
Example 16 with DataFileStream

Use of org.apache.avro.file.DataFileStream in project beam by apache.

Class AvroIOTest, method testMetadata.

@Test
@SuppressWarnings("unchecked")
@Category(NeedsRunner.class)
public void testMetadata() throws Exception {
    List<GenericClass> values = ImmutableList.of(new GenericClass(3, "hi"), new GenericClass(5, "bar"));
    File outputFile = tmpFolder.newFile("output.avro");
    p.apply(Create.of(values))
        .apply(AvroIO.write(GenericClass.class)
            .to(outputFile.getAbsolutePath())
            .withoutSharding()
            .withMetadata(ImmutableMap.<String, Object>of(
                "stringKey", "stringValue", "longKey", 100L, "bytesKey", "bytesValue".getBytes())));
    p.run();
    // Read the header metadata back; try-with-resources closes both streams afterwards.
    try (FileInputStream in = new FileInputStream(outputFile);
        DataFileStream<GenericRecord> dataFileStream = new DataFileStream<>(in, new GenericDatumReader<GenericRecord>())) {
        assertEquals("stringValue", dataFileStream.getMetaString("stringKey"));
        assertEquals(100L, dataFileStream.getMetaLong("longKey"));
        assertArrayEquals("bytesValue".getBytes(), dataFileStream.getMeta("bytesKey"));
    }
}
Also used: GenericDatumReader (org.apache.avro.generic.GenericDatumReader), DataFileStream (org.apache.avro.file.DataFileStream), File (java.io.File), FileInputStream (java.io.FileInputStream), Category (org.junit.experimental.categories.Category), Test (org.junit.Test)
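The assertions above read entries from the header of an Avro object container file. Per the Avro specification, that header is the magic bytes `Obj` plus byte 1, a metadata `map<bytes>` written in blocks whose counts are zigzag varints, and a 16-byte sync marker. The following is a stdlib-only sketch of that layout, not Avro's own implementation; `AvroHeaderMeta`, `metaString` and `metaLong` are hypothetical names that only mirror what `getMetaString`, `getMetaLong` and `getMeta` return. Avro's `DataFileWriter.setMeta(String, long)` stores the number as its decimal string, which is why `metaLong` parses text. The all-zero sync marker is a simplification; real writers use a random one.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Stdlib-only sketch of the Avro container-file header (hypothetical class, not part of Avro). */
final class AvroHeaderMeta {
    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Avro longs: zigzag-encode, then emit 7 bits per byte, low bits first, high bit = "more". */
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    static long readLong(InputStream in) throws IOException {
        long z = 0;
        int shift = 0;
        int b;
        do {
            if ((b = in.read()) < 0) throw new EOFException("truncated varint");
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    /** Avro bytes/string: a length (as an Avro long) followed by the raw bytes. */
    static void writeAvroBytes(ByteArrayOutputStream out, byte[] b) {
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    static byte[] readAvroBytes(InputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad length " + len);
        byte[] b = in.readNBytes((int) len);
        if (b.length != len) throw new EOFException("truncated bytes");
        return b;
    }

    /** Magic, then the metadata map in one block, then a zero count, then a 16-byte sync marker. */
    static byte[] encodeHeader(Map<String, byte[]> meta) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MAGIC, 0, MAGIC.length);
        if (!meta.isEmpty()) {
            writeLong(out, meta.size());
            for (Map.Entry<String, byte[]> e : meta.entrySet()) {
                writeAvroBytes(out, e.getKey().getBytes(StandardCharsets.UTF_8));
                writeAvroBytes(out, e.getValue());
            }
        }
        writeLong(out, 0);               // a block count of 0 ends the map
        out.write(new byte[16], 0, 16);  // all-zero sync marker, for brevity only
        return out.toByteArray();
    }

    /** Reads the magic and the metadata map; a negative block count is followed by the block's byte size. */
    static Map<String, byte[]> readMeta(InputStream in) throws IOException {
        if (!Arrays.equals(in.readNBytes(4), MAGIC)) throw new IOException("not an Avro container file");
        Map<String, byte[]> meta = new LinkedHashMap<>();
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {
                count = -count;
                readLong(in); // block size in bytes, not needed here
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readAvroBytes(in), StandardCharsets.UTF_8);
                meta.put(key, readAvroBytes(in));
            }
        }
        return meta;
    }

    static String metaString(Map<String, byte[]> meta, String key) {
        byte[] v = meta.get(key);
        return v == null ? null : new String(v, StandardCharsets.UTF_8);
    }

    /** Long metadata is stored as its decimal string, so it is parsed from text. */
    static long metaLong(Map<String, byte[]> meta, String key) {
        return Long.parseLong(metaString(meta, key));
    }
}
```

Handing `readMeta` a `FileInputStream` over the `output.avro` written by the test should yield the three custom entries alongside the reserved `avro.schema` and `avro.codec` keys.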

Example 17 with DataFileStream

Use of org.apache.avro.file.DataFileStream in project beam by apache.

Class AvroIOTest, method testAvroIONullCodecWriteAndReadASingleFile.

@Test
@SuppressWarnings("unchecked")
@Category(NeedsRunner.class)
public void testAvroIONullCodecWriteAndReadASingleFile() throws Throwable {
    List<GenericClass> values = ImmutableList.of(new GenericClass(3, "hi"), new GenericClass(5, "bar"));
    File outputFile = tmpFolder.newFile("output.avro");
    p.apply(Create.of(values))
        .apply(AvroIO.write(GenericClass.class)
            .to(outputFile.getAbsolutePath())
            .withoutSharding()
            .withCodec(CodecFactory.nullCodec()));
    p.run();
    PCollection<GenericClass> input = p.apply(AvroIO.read(GenericClass.class).from(outputFile.getAbsolutePath()));
    PAssert.that(input).containsInAnyOrder(values);
    p.run();
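    try (FileInputStream in = new FileInputStream(outputFile);
        DataFileStream<GenericRecord> dataFileStream = new DataFileStream<>(in, new GenericDatumReader<GenericRecord>())) {
        // The codec name is recorded in the reserved "avro.codec" header entry.
        assertEquals("null", dataFileStream.getMetaString("avro.codec"));
    }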
}
Also used: GenericDatumReader (org.apache.avro.generic.GenericDatumReader), DataFileStream (org.apache.avro.file.DataFileStream), File (java.io.File), FileInputStream (java.io.FileInputStream), Category (org.junit.experimental.categories.Category), Test (org.junit.Test)
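The test checks the reserved `avro.codec` header entry. The Avro specification lists `null` and `deflate` as codecs every implementation must support, with `bzip2`, `snappy`, `xz` and `zstandard` as optional, and states that an absent `avro.codec` entry means `null`. The following is a minimal sketch of a lenient codec lookup over an already-decoded metadata map; `AvroCodecCheck` and `codecOf` are hypothetical names, not Avro API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: resolves the codec named in decoded Avro container-file metadata. */
final class AvroCodecCheck {
    /** Codec names from the Avro spec: "null" and "deflate" are required, the rest optional. */
    static final Set<String> SPEC_CODECS = Set.of("null", "deflate", "bzip2", "snappy", "xz", "zstandard");

    static String codecOf(Map<String, byte[]> meta) {
        byte[] raw = meta.get("avro.codec");
        // The spec says an absent avro.codec entry means the "null" (uncompressed) codec.
        String codec = raw == null ? "null" : new String(raw, StandardCharsets.UTF_8);
        if (!SPEC_CODECS.contains(codec)) {
            throw new IllegalArgumentException("codec not listed in the Avro spec: " + codec);
        }
        return codec;
    }
}
```

Because the test sets the codec explicitly with `withCodec`, the header carries `avro.codec` and a direct string comparison works; the default only matters for files whose writer omitted the key.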

Example 18 with DataFileStream

Use of org.apache.avro.file.DataFileStream in project cdap by caskdata.

Class DynamicPartitionerWithAvroTest, method readOutput.

private Set<GenericRecord> readOutput(Location location) throws IOException {
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(SCHEMA);
    Set<GenericRecord> records = new HashSet<>();
    for (Location file : location.list()) {
        if (file.getName().endsWith(".avro")) {
            // try-with-resources closes the stream even if decoding fails part-way through.
            try (DataFileStream<GenericRecord> fileStream = new DataFileStream<>(file.getInputStream(), datumReader)) {
                Iterables.addAll(records, fileStream);
            }
        }
    }
    return records;
}
Also used: GenericDatumReader (org.apache.avro.generic.GenericDatumReader), GenericRecord (org.apache.avro.generic.GenericRecord), DataFileStream (org.apache.avro.file.DataFileStream), HashSet (java.util.HashSet), Location (org.apache.twill.filesystem.Location)
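`readOutput` lists a directory, decodes every `.avro` file and unions the records into one set. The following is a minimal sketch of the same pattern over `java.nio.file` instead of Twill's `Location`, with decoding left to a caller-supplied hook so it runs without Avro on the classpath. `AvroDirReader`, `readAll` and `StreamDecoder` are hypothetical names. Like `location.list()`, `Files.newDirectoryStream` is not recursive; unlike the original, this sketch also skips directories whose names end in `.avro`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of readOutput's pattern over java.nio.file instead of Twill's Location. */
final class AvroDirReader {
    /** Hypothetical decoding hook; with Avro on the classpath it could wrap the stream in a DataFileStream. */
    interface StreamDecoder<T> {
        Iterable<T> decode(InputStream in) throws IOException;
    }

    /** Collects the records of every regular "*.avro" file directly inside dir (not recursive). */
    static <T> Set<T> readAll(Path dir, StreamDecoder<T> decoder) throws IOException {
        Set<T> records = new HashSet<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, "*.avro")) {
            for (Path file : entries) {
                if (!Files.isRegularFile(file)) {
                    continue; // e.g. a directory whose name happens to end in .avro
                }
                // Records are consumed inside the try block: the stream closes as soon as it exits,
                // even if decoding throws part-way through a file.
                try (InputStream in = Files.newInputStream(file)) {
                    for (T record : decoder.decode(in)) {
                        records.add(record);
                    }
                }
            }
        }
        return records;
    }
}
```

With Avro available, the hook could be `in -> new DataFileStream<>(in, datumReader)`, since `DataFileStream` implements `Iterable` and its constructor may throw `IOException`, which the hook permits.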

Aggregations

DataFileStream (org.apache.avro.file.DataFileStream): 18
GenericRecord (org.apache.avro.generic.GenericRecord): 13
Test (org.junit.Test): 10
Schema (org.apache.avro.Schema): 9
GenericDatumReader (org.apache.avro.generic.GenericDatumReader): 9
FileInputStream (java.io.FileInputStream): 8
File (java.io.File): 5
ByteArrayInputStream (java.io.ByteArrayInputStream): 4
HashMap (java.util.HashMap): 4
SpecificDatumReader (org.apache.avro.specific.SpecificDatumReader): 4
Tuple2 (org.apache.flink.api.java.tuple.Tuple2): 4
FSDataInputStream (org.apache.hadoop.fs.FSDataInputStream): 4
Path (org.apache.hadoop.fs.Path): 4
IOException (java.io.IOException): 3
Field (org.apache.avro.Schema.Field): 3
DataFileWriter (org.apache.avro.file.DataFileWriter): 3
Category (org.junit.experimental.categories.Category): 3
DimensionFieldSpec (com.linkedin.pinot.common.data.DimensionFieldSpec): 2
FieldSpec (com.linkedin.pinot.common.data.FieldSpec): 2
MetricFieldSpec (com.linkedin.pinot.common.data.MetricFieldSpec): 2