Search in sources :

Example 6 with GenericDatumWriter

use of org.apache.avro.generic.GenericDatumWriter in project druid by druid-io.

the class SchemaRegistryBasedAvroBytesDecoderTest method getAvroDatum.

byte[] getAvroDatum(Schema schema, GenericRecord someAvroDatum) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
    writer.write(someAvroDatum, EncoderFactory.get().directBinaryEncoder(out, null));
    return out.toByteArray();
}
Also used : ByteArrayOutputStream(java.io.ByteArrayOutputStream) GenericDatumWriter(org.apache.avro.generic.GenericDatumWriter) GenericRecord(org.apache.avro.generic.GenericRecord)

Example 7 with GenericDatumWriter

use of org.apache.avro.generic.GenericDatumWriter in project haivvreo by jghoman.

the class AvroContainerOutputFormat method getHiveRecordWriter.

@Override
public FileSinkOperator.RecordWriter getHiveRecordWriter(JobConf jobConf, Path path, Class<? extends Writable> valueClass, boolean isCompressed, Properties properties, Progressable progressable) throws IOException {
    Schema schema;
    try {
        schema = HaivvreoUtils.determineSchemaOrThrowException(jobConf, properties);
    } catch (HaivvreoException e) {
        throw new IOException(e);
    }
    GenericDatumWriter<GenericRecord> gdw = new GenericDatumWriter<GenericRecord>(schema);
    DataFileWriter<GenericRecord> dfw = new DataFileWriter<GenericRecord>(gdw);
    if (isCompressed) {
        int level = jobConf.getInt(DEFLATE_LEVEL_KEY, DEFAULT_DEFLATE_LEVEL);
        String codecName = jobConf.get(OUTPUT_CODEC, DEFLATE_CODEC);
        CodecFactory factory = codecName.equals(DEFLATE_CODEC) ? CodecFactory.deflateCodec(level) : CodecFactory.fromString(codecName);
        dfw.setCodec(factory);
    }
    dfw.create(schema, path.getFileSystem(jobConf).create(path));
    return new AvroGenericRecordWriter(dfw);
}
Also used : Schema(org.apache.avro.Schema) DataFileWriter(org.apache.avro.file.DataFileWriter) IOException(java.io.IOException) GenericDatumWriter(org.apache.avro.generic.GenericDatumWriter) CodecFactory(org.apache.avro.file.CodecFactory) GenericRecord(org.apache.avro.generic.GenericRecord)

Example 8 with GenericDatumWriter

use of org.apache.avro.generic.GenericDatumWriter in project rest.li by linkedin.

the class AvroUtil method jsonFromGenericRecord.

public static String jsonFromGenericRecord(GenericRecord record) throws IOException {
    GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>();
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    AvroAdapter avroAdapter = AvroAdapterFinder.getAvroAdapter();
    Encoder jsonEncoder = avroAdapter.createJsonEncoder(record.getSchema(), outputStream);
    writer.setSchema(record.getSchema());
    writer.write(record, jsonEncoder);
    jsonEncoder.flush();
    return outputStream.toString();
}
Also used : AvroAdapter(com.linkedin.data.avro.AvroAdapter) Encoder(org.apache.avro.io.Encoder) GenericDatumWriter(org.apache.avro.generic.GenericDatumWriter) ByteArrayOutputStream(java.io.ByteArrayOutputStream) GenericRecord(org.apache.avro.generic.GenericRecord)

Example 9 with GenericDatumWriter

use of org.apache.avro.generic.GenericDatumWriter in project databus by linkedin.

the class RelayEventGenerator method populateEvents.

int populateEvents(String source, short id, GenericRecord record, DbusEventKey key, byte[] schemaId, DbusEventsStatisticsCollector statsCollector, DbusEventBufferAppendable buffer) {
    if (record != null && key != null) {
        try {
            // Serialize the row
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            Encoder encoder = new BinaryEncoder(bos);
            GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(record.getSchema());
            writer.write(record, encoder);
            byte[] serializedValue = bos.toByteArray();
            short pPartitionId = RngUtils.randomPositiveShort();
            short lPartitionId = RngUtils.randomPositiveShort();
            long timeStamp = System.currentTimeMillis() * 1000000;
            buffer.appendEvent(key, pPartitionId, lPartitionId, timeStamp, id, schemaId, serializedValue, false, statsCollector);
            return 1;
        } catch (IOException io) {
            LOG.error("Cannot create byte stream payload: " + source);
        }
    }
    return 0;
}
Also used : BinaryEncoder(org.apache.avro.io.BinaryEncoder) Encoder(org.apache.avro.io.Encoder) BinaryEncoder(org.apache.avro.io.BinaryEncoder) ByteArrayOutputStream(java.io.ByteArrayOutputStream) GenericDatumWriter(org.apache.avro.generic.GenericDatumWriter) IOException(java.io.IOException) GenericRecord(org.apache.avro.generic.GenericRecord)

Example 10 with GenericDatumWriter

use of org.apache.avro.generic.GenericDatumWriter in project databus by linkedin.

the class GoldenGateEventProducer method addEventToBuffer.

/**
   *
   * @param dbUpdates  The dbUpdates present in the current transaction
   * @param ti The meta information about the transaction. (See TransactionInfo class for more details).
   * @throws DatabusException
   * @throws UnsupportedKeyException
   */
protected void addEventToBuffer(List<TransactionState.PerSourceTransactionalUpdate> dbUpdates, TransactionInfo ti) throws DatabusException, UnsupportedKeyException {
    if (dbUpdates.size() == 0)
        throw new DatabusException("Cannot handle empty dbUpdates");
    long scn = ti.getScn();
    long timestamp = ti.getTransactionTimeStampNs();
    EventSourceStatistics globalStats = getSource(GLOBAL_SOURCE_ID).getStatisticsBean();
    /**
     * We skip the start scn of the relay, we have already added a EOP for this SCN in the buffer.
     * Why is this not a problem ?
     * There are two cases:
     * 1. When we use the earliest/latest scn if there is no maxScn (We don't really have a start point). So it's really OK to miss the first event.
     * 2. If it's the maxSCN, then event was already seen by the relay.
     */
    if (scn == _startPrevScn.get()) {
        _log.info("Skipping this transaction, EOP already send for this event");
        return;
    }
    getEventBuffer().startEvents();
    int eventsInTransactionCount = 0;
    List<EventReaderSummary> summaries = new ArrayList<EventReaderSummary>();
    for (int i = 0; i < dbUpdates.size(); ++i) {
        GenericRecord record = null;
        TransactionState.PerSourceTransactionalUpdate perSourceUpdate = dbUpdates.get(i);
        short sourceId = (short) perSourceUpdate.getSourceId();
        // prepare stats collection per source
        EventSourceStatistics perSourceStats = getSource(sourceId).getStatisticsBean();
        Iterator<DbUpdateState.DBUpdateImage> dbUpdateIterator = perSourceUpdate.getDbUpdatesSet().iterator();
        int eventsInDbUpdate = 0;
        long dbUpdatesEventsSize = 0;
        long startDbUpdatesMs = System.currentTimeMillis();
        while (//TODO verify if there is any case where we need to rollback.
        dbUpdateIterator.hasNext()) {
            DbUpdateState.DBUpdateImage dbUpdate = dbUpdateIterator.next();
            //Construct the Databus Event key, determine the key type and construct the key
            Object keyObj = obtainKey(dbUpdate);
            DbusEventKey eventKey = new DbusEventKey(keyObj);
            //Get the logicalparition id
            PartitionFunction partitionFunction = _partitionFunctionHashMap.get((int) sourceId);
            short lPartitionId = partitionFunction.getPartition(eventKey);
            record = dbUpdate.getGenericRecord();
            //Write the event to the buffer
            if (record == null)
                throw new DatabusException("Cannot write event to buffer because record = " + record);
            if (record.getSchema() == null)
                throw new DatabusException("The record does not have a schema (null schema)");
            try {
                //Collect stats on number of dbUpdates for one source
                eventsInDbUpdate++;
                //Count of all the events in the current transaction
                eventsInTransactionCount++;
                // Serialize the row
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                Encoder encoder = new BinaryEncoder(bos);
                GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(record.getSchema());
                writer.write(record, encoder);
                byte[] serializedValue = bos.toByteArray();
                //Get the md5 for the schema
                SchemaId schemaId = SchemaId.createWithMd5(dbUpdate.getSchema());
                //Determine the operation type and convert to dbus opcode
                DbusOpcode opCode;
                if (dbUpdate.getOpType() == DbUpdateState.DBUpdateImage.OpType.INSERT || dbUpdate.getOpType() == DbUpdateState.DBUpdateImage.OpType.UPDATE) {
                    opCode = DbusOpcode.UPSERT;
                    if (_log.isDebugEnabled())
                        _log.debug("The event with scn " + scn + " is INSERT/UPDATE");
                } else if (dbUpdate.getOpType() == DbUpdateState.DBUpdateImage.OpType.DELETE) {
                    opCode = DbusOpcode.DELETE;
                    if (_log.isDebugEnabled())
                        _log.debug("The event with scn " + scn + " is DELETE");
                } else {
                    throw new DatabusException("Unknown opcode from dbUpdate for event with scn:" + scn);
                }
                //Construct the dbusEvent info
                DbusEventInfo dbusEventInfo = new DbusEventInfo(opCode, scn, (short) _pConfig.getId(), lPartitionId, timestamp, sourceId, schemaId.getByteArray(), serializedValue, false, false);
                dbusEventInfo.setReplicated(dbUpdate.isReplicated());
                perSourceStats.addEventCycle(1, ti.getTransactionTimeRead(), serializedValue.length, scn);
                globalStats.addEventCycle(1, ti.getTransactionTimeRead(), serializedValue.length, scn);
                long tsEnd = System.currentTimeMillis();
                perSourceStats.addTimeOfLastDBAccess(tsEnd);
                globalStats.addTimeOfLastDBAccess(tsEnd);
                //Append to the event buffer
                getEventBuffer().appendEvent(eventKey, dbusEventInfo, _statsCollector);
                _rc.incrementEventCount();
                dbUpdatesEventsSize += serializedValue.length;
            } catch (IOException io) {
                perSourceStats.addError();
                globalStats.addEmptyEventCycle();
                _log.error("Cannot create byte stream payload: " + dbUpdates.get(i).getSourceId());
            }
        }
        long endDbUpdatesMs = System.currentTimeMillis();
        long dbUpdatesElapsedTimeMs = endDbUpdatesMs - startDbUpdatesMs;
        // Log Event Summary at logical source level
        EventReaderSummary summary = new EventReaderSummary(sourceId, _monitoredSources.get(sourceId).getSourceName(), scn, eventsInDbUpdate, dbUpdatesEventsSize, -1L, /* Not supported */
        dbUpdatesElapsedTimeMs, timestamp, timestamp, -1L);
        if (_eventsLog.isInfoEnabled()) {
            _eventsLog.info(summary.toString());
        }
        summaries.add(summary);
        if (_log.isDebugEnabled())
            _log.debug("There are " + eventsInDbUpdate + " events seen in the current dbUpdate");
    }
    // Log Event Summary at Physical source level
    ReadEventCycleSummary summary = new ReadEventCycleSummary(_pConfig.getName(), summaries, scn, -1);
    if (_eventsLog.isInfoEnabled()) {
        _eventsLog.info(summary.toString());
    }
    _log.info("Writing " + eventsInTransactionCount + " events from transaction with scn: " + scn);
    if (scn <= 0)
        throw new DatabusException("Unable to write events to buffer because of negative/zero scn: " + scn);
    getEventBuffer().endEvents(scn, _statsCollector);
    _scn.set(scn);
    if (getMaxScnReaderWriter() != null) {
        try {
            getMaxScnReaderWriter().saveMaxScn(_scn.get());
        } catch (DatabusException e) {
            _log.error("Cannot save scn = " + _scn + " for physical source = " + getName(), e);
        }
    }
}
Also used : PartitionFunction(com.linkedin.databus2.producers.PartitionFunction) TransactionState(com.linkedin.databus2.ggParser.XmlStateMachine.TransactionState) ArrayList(java.util.ArrayList) EventSourceStatistics(com.linkedin.databus.monitoring.mbean.EventSourceStatistics) DbusEventInfo(com.linkedin.databus.core.DbusEventInfo) ReadEventCycleSummary(com.linkedin.databus2.producers.db.ReadEventCycleSummary) EventReaderSummary(com.linkedin.databus2.producers.db.EventReaderSummary) Encoder(org.apache.avro.io.Encoder) BinaryEncoder(org.apache.avro.io.BinaryEncoder) DbusOpcode(com.linkedin.databus.core.DbusOpcode) GenericRecord(org.apache.avro.generic.GenericRecord) DbUpdateState(com.linkedin.databus2.ggParser.XmlStateMachine.DbUpdateState) ByteArrayOutputStream(java.io.ByteArrayOutputStream) GenericDatumWriter(org.apache.avro.generic.GenericDatumWriter) IOException(java.io.IOException) DatabusException(com.linkedin.databus2.core.DatabusException) BinaryEncoder(org.apache.avro.io.BinaryEncoder) SchemaId(com.linkedin.databus2.schemas.SchemaId) PerSourceTransactionalUpdate(com.linkedin.databus2.ggParser.XmlStateMachine.TransactionState.PerSourceTransactionalUpdate) DbusEventKey(com.linkedin.databus.core.DbusEventKey)

Aggregations

GenericDatumWriter (org.apache.avro.generic.GenericDatumWriter)49 GenericRecord (org.apache.avro.generic.GenericRecord)46 ByteArrayOutputStream (java.io.ByteArrayOutputStream)24 Schema (org.apache.avro.Schema)23 DataFileWriter (org.apache.avro.file.DataFileWriter)17 BinaryEncoder (org.apache.avro.io.BinaryEncoder)17 IOException (java.io.IOException)13 Encoder (org.apache.avro.io.Encoder)12 File (java.io.File)9 Test (org.junit.Test)6 FileOutputStream (java.io.FileOutputStream)4 GenericDatumReader (org.apache.avro.generic.GenericDatumReader)4 ArrayList (java.util.ArrayList)3 Properties (java.util.Properties)3 Producer (kafka.javaapi.producer.Producer)3 ProducerConfig (kafka.producer.ProducerConfig)3 GenericRecordBuilder (org.apache.avro.generic.GenericRecordBuilder)3 JsonEncoder (org.apache.avro.io.JsonEncoder)3 AvroAdapter (com.linkedin.data.avro.AvroAdapter)2 DbusEventInfo (com.linkedin.databus.core.DbusEventInfo)2