
Example 1 with NonRecordContainer

Use of io.confluent.kafka.serializers.NonRecordContainer in the project kafka-connect-storage-cloud by confluentinc.

From the class AvroRecordWriterProvider, method getRecordWriter:

@Override
public RecordWriter getRecordWriter(final S3SinkConnectorConfig conf, final String filename) {
    // This is not meant to be a thread-safe writer!
    return new RecordWriter() {

        final DataFileWriter<Object> writer = new DataFileWriter<>(new GenericDatumWriter<>());

        Schema schema = null;

        S3OutputStream s3out;

        @Override
        public void write(SinkRecord record) {
            if (schema == null) {
                schema = record.valueSchema();
                try {
                    log.info("Opening record writer for: {}", filename);
                    s3out = storage.create(filename, true);
                    org.apache.avro.Schema avroSchema = avroData.fromConnectSchema(schema);
                    writer.setCodec(CodecFactory.fromString(conf.getAvroCodec()));
                    writer.create(avroSchema, s3out);
                } catch (IOException e) {
                    throw new ConnectException(e);
                }
            }
            log.trace("Sink record: {}", record);
            Object value = avroData.fromConnectData(schema, record.value());
            try {
                // Unwrap NonRecordContainers to just their value to properly handle these types
                if (value instanceof NonRecordContainer) {
                    value = ((NonRecordContainer) value).getValue();
                }
                writer.append(value);
            } catch (IOException e) {
                throw new ConnectException(e);
            }
        }

        @Override
        public void commit() {
            try {
                // Flush is required here, because closing the writer will close the underlying S3
                // output stream before committing any data to S3.
                writer.flush();
                s3out.commit();
                writer.close();
            } catch (IOException e) {
                throw new ConnectException(e);
            }
        }

        @Override
        public void close() {
            try {
                writer.close();
            } catch (IOException e) {
                throw new ConnectException(e);
            }
        }
    };
}
Also used: DataFileWriter(org.apache.avro.file.DataFileWriter) Schema(org.apache.kafka.connect.data.Schema) S3OutputStream(io.confluent.connect.s3.storage.S3OutputStream) GenericDatumWriter(org.apache.avro.generic.GenericDatumWriter) IOException(java.io.IOException) SinkRecord(org.apache.kafka.connect.sink.SinkRecord) RecordWriter(io.confluent.connect.storage.format.RecordWriter) NonRecordContainer(io.confluent.kafka.serializers.NonRecordContainer) ConnectException(org.apache.kafka.connect.errors.ConnectException)
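AvroData.fromConnectData wraps values with primitive (non-record) schemas in a NonRecordContainer, which DataFileWriter.append cannot serialize directly, hence the unwrap before appending. Below is a minimal stand-in sketch of that pattern; Container here is a hypothetical placeholder for io.confluent.kafka.serializers.NonRecordContainer so the example stays free of the Confluent serializer dependency:

```java
public class UnwrapSketch {
    // Hypothetical stand-in for NonRecordContainer: pairs a primitive value
    // with its schema (the schema is omitted here to keep the sketch minimal).
    static final class Container {
        private final Object value;
        Container(Object value) { this.value = value; }
        Object getValue() { return value; }
    }

    // The same check the examples perform before DataFileWriter.append:
    // hand the raw datum, not the container, to the writer.
    static Object unwrap(Object value) {
        if (value instanceof Container) {
            return ((Container) value).getValue();
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(unwrap(new Container("hello"))); // prints hello
        System.out.println(unwrap(42));                     // prints 42
    }
}
```

The real NonRecordContainer also exposes the Avro schema alongside the value; only the value is needed at append time because the writer was already created with the file's schema.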

Example 2 with NonRecordContainer

Use of io.confluent.kafka.serializers.NonRecordContainer in the project kafka-connect-storage-cloud by confluentinc.

From the class DataWriterAvroTest, method verifyContents:

protected void verifyContents(List<SinkRecord> expectedRecords, int startIndex, Collection<Object> records) {
    Schema expectedSchema = null;
    for (Object avroRecord : records) {
        if (expectedSchema == null) {
            expectedSchema = expectedRecords.get(startIndex).valueSchema();
        }
        Object expectedValue = SchemaProjector.project(expectedRecords.get(startIndex).valueSchema(), expectedRecords.get(startIndex++).value(), expectedSchema);
        Object value = format.getAvroData().fromConnectData(expectedSchema, expectedValue);
        // Unwrap NonRecordContainers to just their value to properly handle these types
        if (value instanceof NonRecordContainer) {
            value = ((NonRecordContainer) value).getValue();
        }
        if (avroRecord instanceof Utf8) {
            assertEquals(value, avroRecord.toString());
        } else {
            assertEquals(value, avroRecord);
        }
    }
}
Also used: Schema(org.apache.kafka.connect.data.Schema) FieldSchema(org.apache.hadoop.hive.metastore.api.FieldSchema) Utf8(org.apache.avro.util.Utf8) NonRecordContainer(io.confluent.kafka.serializers.NonRecordContainer)
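The toString() branch above exists because Avro reads string data back as org.apache.avro.util.Utf8, which implements CharSequence but is not a java.lang.String, so String.equals never matches it even when the characters are identical. A small stdlib-only sketch of the same pitfall, with StringBuilder standing in for Utf8:

```java
public class Utf8CompareSketch {
    public static void main(String[] args) {
        // StringBuilder stands in for Avro's Utf8 here: both are
        // CharSequences with the same characters as the String, but
        // String.equals rejects any non-String argument.
        CharSequence cs = new StringBuilder("abc");
        System.out.println("abc".equals(cs));            // false: wrong class
        System.out.println("abc".equals(cs.toString())); // true after conversion
    }
}
```

This is why the test converts the Avro record to a String before asserting equality instead of comparing the objects directly.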

Example 3 with NonRecordContainer

Use of io.confluent.kafka.serializers.NonRecordContainer in the project kafka-connect-storage-cloud by confluentinc.

From the class AvroUtils, method putRecords:

public static byte[] putRecords(Collection<SinkRecord> records, AvroData avroData) throws IOException {
    final DataFileWriter<Object> writer = new DataFileWriter<>(new GenericDatumWriter<>());
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Schema schema = null;
    for (SinkRecord record : records) {
        if (schema == null) {
            schema = record.valueSchema();
            org.apache.avro.Schema avroSchema = avroData.fromConnectSchema(schema);
            writer.create(avroSchema, out);
        }
        Object value = avroData.fromConnectData(schema, record.value());
        // Unwrap NonRecordContainers to just their value to properly handle these types
        if (value instanceof NonRecordContainer) {
            value = ((NonRecordContainer) value).getValue();
        }
        writer.append(value);
    }
    writer.flush();
    return out.toByteArray();
}
Also used: DataFileWriter(org.apache.avro.file.DataFileWriter) Schema(org.apache.kafka.connect.data.Schema) ByteArrayOutputStream(java.io.ByteArrayOutputStream) SinkRecord(org.apache.kafka.connect.sink.SinkRecord) NonRecordContainer(io.confluent.kafka.serializers.NonRecordContainer)
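Examples 1 and 3 both defer writer creation until the first record arrives, because the Avro file header needs a schema and the schema is only known from the first record's valueSchema(). A minimal stdlib-only sketch of this lazy-init-on-first-record pattern; writeAll and the "schema" string are hypothetical stand-ins for the DataFileWriter and Connect Schema used in the examples above:

```java
import java.util.List;

public class LazyInitSketch {
    static String writeAll(List<String> records) {
        StringBuilder out = new StringBuilder();
        String schema = null; // stays null until the first record is seen
        for (String record : records) {
            if (schema == null) {
                // First record only: derive the "schema" and emit the
                // file "header" once, mirroring writer.create(avroSchema, out).
                schema = "schema-of:" + record;
                out.append("[header ").append(schema).append("]");
            }
            out.append(record); // mirrors writer.append(value)
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeAll(List.of("a", "b"))); // prints [header schema-of:a]ab
    }
}
```

One consequence of this pattern, visible in both examples: all records in one file are projected to the first record's schema, so a schema change mid-batch requires rolling to a new file.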

Aggregations

NonRecordContainer (io.confluent.kafka.serializers.NonRecordContainer): 3
Schema (org.apache.kafka.connect.data.Schema): 3
DataFileWriter (org.apache.avro.file.DataFileWriter): 2
SinkRecord (org.apache.kafka.connect.sink.SinkRecord): 2
S3OutputStream (io.confluent.connect.s3.storage.S3OutputStream): 1
RecordWriter (io.confluent.connect.storage.format.RecordWriter): 1
ByteArrayOutputStream (java.io.ByteArrayOutputStream): 1
IOException (java.io.IOException): 1
GenericDatumWriter (org.apache.avro.generic.GenericDatumWriter): 1
Utf8 (org.apache.avro.util.Utf8): 1
FieldSchema (org.apache.hadoop.hive.metastore.api.FieldSchema): 1
ConnectException (org.apache.kafka.connect.errors.ConnectException): 1