
Example 1 with AvroKeyValueSinkWriter

Use of org.apache.flink.streaming.connectors.fs.AvroKeyValueSinkWriter in the Apache Flink project.

From the class BucketingSinkTest, method testNonRollingAvroKeyValueWithCompressionWriter. The writer packs each record into an Avro key/value container file, optionally compressed with a codec such as Snappy.

/**
 * This tests {@link AvroKeyValueSinkWriter}
 * with non-rolling output and with compression.
 */
@Test
public void testNonRollingAvroKeyValueWithCompressionWriter() throws Exception {
    final String outPath = hdfsURI + "/avro-kv-no-comp-non-rolling-out";
    final int numElements = 20;
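    // Writer configuration: Avro schemas for the key and value, with Snappy block compression enabled.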
    Map<String, String> properties = new HashMap<>();
    Schema keySchema = Schema.create(Schema.Type.INT);
    Schema valueSchema = Schema.create(Schema.Type.STRING);
    properties.put(AvroKeyValueSinkWriter.CONF_OUTPUT_KEY_SCHEMA, keySchema.toString());
    properties.put(AvroKeyValueSinkWriter.CONF_OUTPUT_VALUE_SCHEMA, valueSchema.toString());
    properties.put(AvroKeyValueSinkWriter.CONF_COMPRESS, String.valueOf(true));
    properties.put(AvroKeyValueSinkWriter.CONF_COMPRESS_CODEC, DataFileConstants.SNAPPY_CODEC);
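    // Non-rolling setup: BasePathBucketer keeps every record in a single bucket, and the empty
    // pending prefix/suffix leave the part file under its final name once the sink closes.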
    BucketingSink<Tuple2<Integer, String>> sink = new BucketingSink<Tuple2<Integer, String>>(outPath)
            .setWriter(new AvroKeyValueSinkWriter<Integer, String>(properties))
            .setBucketer(new BasePathBucketer<Tuple2<Integer, String>>())
            .setPartPrefix(PART_PREFIX)
            .setPendingPrefix("")
            .setPendingSuffix("");
    OneInputStreamOperatorTestHarness<Tuple2<Integer, String>, Object> testHarness = createTestSink(sink, 1, 0);
    testHarness.setProcessingTime(0L);
    testHarness.setup();
    testHarness.open();
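    // Push the test records through the operator test harness, as a live stream would.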
    for (int i = 0; i < numElements; i++) {
        testHarness.processElement(new StreamRecord<>(Tuple2.of(i, "message #" + i)));
    }
    testHarness.close();
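    // Re-open the part file and verify every record. Setting the string type to String makes the
    // generic reader return java.lang.String instead of Avro's Utf8, so the assertions below hold.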
    GenericData.setStringType(valueSchema, GenericData.StringType.String);
    Schema elementSchema = AvroKeyValueSinkWriter.AvroKeyValue.getSchema(keySchema, valueSchema);
    FSDataInputStream inStream = dfs.open(new Path(outPath + "/" + PART_PREFIX + "-0-0"));
    SpecificDatumReader<GenericRecord> elementReader = new SpecificDatumReader<>(elementSchema);
    DataFileStream<GenericRecord> dataFileStream = new DataFileStream<>(inStream, elementReader);
    for (int i = 0; i < numElements; i++) {
        AvroKeyValueSinkWriter.AvroKeyValue<Integer, String> wrappedEntry = new AvroKeyValueSinkWriter.AvroKeyValue<>(dataFileStream.next());
        int key = wrappedEntry.getKey();
        Assert.assertEquals(i, key);
        String value = wrappedEntry.getValue();
        Assert.assertEquals("message #" + i, value);
    }
    dataFileStream.close();
    inStream.close();
}
Also used: Path (org.apache.hadoop.fs.Path), HashMap (java.util.HashMap), Schema (org.apache.avro.Schema), DataFileStream (org.apache.avro.file.DataFileStream), TypeHint (org.apache.flink.api.common.typeinfo.TypeHint), Tuple2 (org.apache.flink.api.java.tuple.Tuple2), FSDataInputStream (org.apache.hadoop.fs.FSDataInputStream), SpecificDatumReader (org.apache.avro.specific.SpecificDatumReader), GenericRecord (org.apache.avro.generic.GenericRecord), AvroKeyValueSinkWriter (org.apache.flink.streaming.connectors.fs.AvroKeyValueSinkWriter), Test (org.junit.Test)
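
Outside the test harness, the same writer configuration plugs into an ordinary streaming job through DataStream.addSink. A minimal sketch, assuming a hypothetical DataStream<Tuple2<Integer, String>> named input and a placeholder output path (neither appears in the test above):

// Sketch only: the output path and the `input` stream are assumptions, not part of the test.
Map<String, String> properties = new HashMap<>();
properties.put(AvroKeyValueSinkWriter.CONF_OUTPUT_KEY_SCHEMA, Schema.create(Schema.Type.INT).toString());
properties.put(AvroKeyValueSinkWriter.CONF_OUTPUT_VALUE_SCHEMA, Schema.create(Schema.Type.STRING).toString());
properties.put(AvroKeyValueSinkWriter.CONF_COMPRESS, String.valueOf(true));
properties.put(AvroKeyValueSinkWriter.CONF_COMPRESS_CODEC, DataFileConstants.SNAPPY_CODEC);

BucketingSink<Tuple2<Integer, String>> sink =
        new BucketingSink<Tuple2<Integer, String>>("hdfs:///tmp/avro-kv-out")
                .setWriter(new AvroKeyValueSinkWriter<Integer, String>(properties));
input.addSink(sink);

The imports match the "Also used" list above. Note that without the test's empty pending prefix/suffix, BucketingSink decorates in-flight files with its default in-progress/pending markers until a checkpoint finalizes them.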
