
Example 1 with AvroKey

Use of org.apache.avro.mapred.AvroKey in the crunch project by cloudera.

From the class AvroIndexedRecordPartitionerTest, method testGetPartition_IntegerMinValue.

@Test
public void testGetPartition_IntegerMinValue() {
    IndexedRecord indexedRecord = new MockIndexedRecord(Integer.MIN_VALUE);
    AvroKey<IndexedRecord> avroKey = new AvroKey<IndexedRecord>(indexedRecord);
    assertEquals(0, avroPartitioner.getPartition(avroKey, new AvroValue<Object>(), Integer.MAX_VALUE));
}
Also used : IndexedRecord(org.apache.avro.generic.IndexedRecord) AvroKey(org.apache.avro.mapred.AvroKey) AvroValue(org.apache.avro.mapred.AvroValue) Test(org.junit.Test)

Example 2 with AvroKey

Use of org.apache.avro.mapred.AvroKey in the crunch project by cloudera.

From the class AvroIndexedRecordPartitionerTest, method testGetPartition.

@Test
public void testGetPartition() {
    IndexedRecord indexedRecord = new MockIndexedRecord(3);
    AvroKey<IndexedRecord> avroKey = new AvroKey<IndexedRecord>(indexedRecord);
    assertEquals(3, avroPartitioner.getPartition(avroKey, new AvroValue<Object>(), 5));
    assertEquals(1, avroPartitioner.getPartition(avroKey, new AvroValue<Object>(), 2));
}
Also used : IndexedRecord(org.apache.avro.generic.IndexedRecord) AvroKey(org.apache.avro.mapred.AvroKey) AvroValue(org.apache.avro.mapred.AvroValue) Test(org.junit.Test)

Example 3 with AvroKey

Use of org.apache.avro.mapred.AvroKey in the crunch project by cloudera.

From the class AvroIndexedRecordPartitionerTest, method testGetPartition_NegativeHashValue.

@Test
public void testGetPartition_NegativeHashValue() {
    IndexedRecord indexedRecord = new MockIndexedRecord(-3);
    AvroKey<IndexedRecord> avroKey = new AvroKey<IndexedRecord>(indexedRecord);
    assertEquals(3, avroPartitioner.getPartition(avroKey, new AvroValue<Object>(), 5));
    assertEquals(1, avroPartitioner.getPartition(avroKey, new AvroValue<Object>(), 2));
}
Also used : IndexedRecord(org.apache.avro.generic.IndexedRecord) AvroKey(org.apache.avro.mapred.AvroKey) AvroValue(org.apache.avro.mapred.AvroValue) Test(org.junit.Test)
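
The three partitioner tests above pin down an expected mapping from hash code to partition: 3 and -3 both land in partition 3 of 5 and partition 1 of 2, and Integer.MIN_VALUE lands in partition 0. The following is a minimal sketch of one way to satisfy those expectations. It is not the Crunch implementation, and it assumes that MockIndexedRecord returns its constructor argument as its hash code. The class and method names (PartitionSketch, partitionFor) are hypothetical.

```java
final class PartitionSketch {

    private PartitionSketch() {
    }

    /**
     * Maps a hash code to a partition index in [0, numPartitions).
     * Math.abs(Integer.MIN_VALUE) overflows and stays negative, so that
     * value is special-cased to 0 before taking the remainder.
     */
    static int partitionFor(int hash, int numPartitions) {
        int nonNegative = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return nonNegative % numPartitions;
    }

    public static void main(String[] args) {
        // Same inputs as the three tests: (hash, numPartitions)
        System.out.println(partitionFor(Integer.MIN_VALUE, Integer.MAX_VALUE));
        System.out.println(partitionFor(3, 5));
        System.out.println(partitionFor(-3, 2));
    }
}
```

The explicit guard matters because a plain Math.abs(hash) % numPartitions would return a negative index for Integer.MIN_VALUE, which is the case the first test checks.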

Example 4 with AvroKey

Use of org.apache.avro.mapred.AvroKey in the crunch project by cloudera.

From the class SafeAvroSerialization, method getSerializer.

/**
 * Returns the specified output serializer.
 */
public Serializer<AvroWrapper<T>> getSerializer(Class<AvroWrapper<T>> c) {
    // AvroWrapper used for final output, AvroKey or AvroValue for map output
    boolean isFinalOutput = c.equals(AvroWrapper.class);
    Configuration conf = getConf();
    Schema schema = isFinalOutput
        ? AvroJob.getOutputSchema(conf)
        : (AvroKey.class.isAssignableFrom(c)
            ? Pair.getKeySchema(AvroJob.getMapOutputSchema(conf))
            : Pair.getValueSchema(AvroJob.getMapOutputSchema(conf)));
    ReflectDataFactory factory = Avros.getReflectDataFactory(conf);
    ReflectDatumWriter<T> writer = factory.getWriter();
    writer.setSchema(schema);
    return new AvroWrapperSerializer(writer);
}
Also used : Configuration(org.apache.hadoop.conf.Configuration) Schema(org.apache.avro.Schema) AvroKey(org.apache.avro.mapred.AvroKey)
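
The schema choice in getSerializer rests on two class checks: an exact match on the wrapper class means final output, and otherwise an assignability check separates keys from values. The sketch below isolates that dispatch using only the JDK. Wrapper, Key and Value are hypothetical stand-ins for AvroWrapper, AvroKey and AvroValue, and the SchemaKind enum is an illustration rather than anything in Avro or Crunch.

```java
final class SchemaChoiceSketch {

    private SchemaChoiceSketch() {
    }

    // Hypothetical stand-ins: in Avro, AvroKey and AvroValue both extend AvroWrapper.
    static class Wrapper<T> {
    }

    static class Key<T> extends Wrapper<T> {
    }

    static class Value<T> extends Wrapper<T> {
    }

    enum SchemaKind { FINAL_OUTPUT, MAP_OUTPUT_KEY, MAP_OUTPUT_VALUE }

    /** Mirrors the nested ternary: exact wrapper class first, then key versus value. */
    static SchemaKind choose(Class<?> c) {
        if (c.equals(Wrapper.class)) {
            return SchemaKind.FINAL_OUTPUT;
        }
        return Key.class.isAssignableFrom(c)
            ? SchemaKind.MAP_OUTPUT_KEY
            : SchemaKind.MAP_OUTPUT_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(choose(Wrapper.class));
        System.out.println(choose(Key.class));
        System.out.println(choose(Value.class));
    }
}
```

The first check uses equals rather than isAssignableFrom because the key and value classes are themselves subclasses of the wrapper, so an assignability test against the wrapper would match all three.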

Example 5 with AvroKey

Use of org.apache.avro.mapred.AvroKey in the components project by Talend.

From the class SimpleRecordFormatAvroIO, method read.

@Override
public PCollection<IndexedRecord> read(PBegin in) {
    // Reusable coder.
    LazyAvroCoder<Object> lac = LazyAvroCoder.of();
    AvroHdfsFileSource source = AvroHdfsFileSource.of(doAs, path, lac);
    source.getExtraHadoopConfiguration().addFrom(getExtraHadoopConfiguration());
    source.setLimit(limit);
    PCollection<KV<AvroKey, NullWritable>> read =
        in.apply(Read.from(source)).setCoder(source.getDefaultOutputCoder());
    PCollection<AvroKey> pc1 = read.apply(Keys.<AvroKey>create());
    PCollection<Object> pc2 = pc1.apply(ParDo.of(new ExtractRecordFromAvroKey()));
    pc2 = pc2.setCoder(lac);
    PCollection<IndexedRecord> pc3 = pc2.apply(ConvertToIndexedRecord.<Object>of());
    return pc3;
}
Also used : AvroHdfsFileSource(org.talend.components.simplefileio.runtime.sources.AvroHdfsFileSource) ConvertToIndexedRecord(org.talend.components.adapter.beam.transform.ConvertToIndexedRecord) IndexedRecord(org.apache.avro.generic.IndexedRecord) AvroKey(org.apache.avro.mapred.AvroKey) KV(org.apache.beam.sdk.values.KV)
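
The read method above proceeds in stages: read key/value pairs, keep only the keys, then unwrap each key into its record. The sketch below shows the same shape with plain JDK collections instead of Beam PCollections. It assumes that ExtractRecordFromAvroKey simply returns the wrapped datum, which the listing does not show. Key is a hypothetical stand-in for AvroKey, and Void stands in for NullWritable.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class KeyUnwrapSketch {

    private KeyUnwrapSketch() {
    }

    /** Hypothetical stand-in for AvroKey: a wrapper exposing its payload via datum(). */
    static final class Key<T> {
        private final T datum;

        Key(T datum) {
            this.datum = datum;
        }

        T datum() {
            return datum;
        }
    }

    /** Builds a (key, null) pair, like a record read from a key-only Avro file. */
    static <T> Map.Entry<Key<T>, Void> pair(T datum) {
        return new SimpleEntry<Key<T>, Void>(new Key<T>(datum), null);
    }

    /** Drops the values, then unwraps each key into its datum, preserving order. */
    static <T> List<T> unwrap(List<Map.Entry<Key<T>, Void>> pairs) {
        return pairs.stream()
            .map(entry -> entry.getKey())   // analogous to Keys.create()
            .map(key -> key.datum())        // assumed behaviour of ExtractRecordFromAvroKey
            .collect(Collectors.toList());
    }
}
```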

Aggregations

AvroKey (org.apache.avro.mapred.AvroKey): 12
NullWritable (org.apache.hadoop.io.NullWritable): 7
Test (org.junit.Test): 7
GenericRecord (org.apache.avro.generic.GenericRecord): 5
IndexedRecord (org.apache.avro.generic.IndexedRecord): 4
AvroValue (org.apache.avro.mapred.AvroValue): 4
Pair (org.apache.hadoop.mrunit.types.Pair): 4
Configuration (org.apache.hadoop.conf.Configuration): 3
AvroIO (com.google.cloud.dataflow.sdk.io.AvroIO): 2
WindowedValue (com.google.cloud.dataflow.sdk.util.WindowedValue): 2
Schema (org.apache.avro.Schema): 2
BytesWritable (org.apache.hadoop.io.BytesWritable): 2
CannotProvideCoderException (com.google.cloud.dataflow.sdk.coders.CannotProvideCoderException): 1
File (java.io.File): 1
FileInputStream (java.io.FileInputStream): 1
IOException (java.io.IOException): 1
HashMap (java.util.HashMap): 1
Set (java.util.Set): 1
GenericRecordBuilder (org.apache.avro.generic.GenericRecordBuilder): 1
AvroJob (org.apache.avro.mapreduce.AvroJob): 1