Example 36 with Partition

use of org.apache.samza.Partition in project samza by apache.

the class TestAvroFileHdfsReader method testRandomRead.

@Test
public void testRandomRead() throws Exception {
    SystemStreamPartition ssp = new SystemStreamPartition("hdfs", "testStream", new Partition(0));
    SingleFileHdfsReader reader = new AvroFileHdfsReader(ssp);
    reader.open(AVRO_FILE, "0");
    for (int i = 0; i < NUM_EVENTS / 2; i++) {
        reader.readNext();
    }
    String offset = reader.nextOffset();
    IncomingMessageEnvelope envelope = reader.readNext();
    Assert.assertEquals(offset, envelope.getOffset());
    GenericRecord record1 = (GenericRecord) envelope.getMessage();
    // advance a few records past the saved offset
    for (int i = 0; i < 5; i++) {
        reader.readNext();
    }
    // seek to the offset within the same reader
    reader.seek(offset);
    Assert.assertEquals(offset, reader.nextOffset());
    envelope = reader.readNext();
    Assert.assertEquals(offset, envelope.getOffset());
    GenericRecord record2 = (GenericRecord) envelope.getMessage();
    Assert.assertEquals(record1, record2);
    reader.close();
    // open a new reader and initialize it with the offset
    reader = new AvroFileHdfsReader(ssp);
    reader.open(AVRO_FILE, offset);
    envelope = reader.readNext();
    Assert.assertEquals(offset, envelope.getOffset());
    GenericRecord record3 = (GenericRecord) envelope.getMessage();
    Assert.assertEquals(record1, record3);
    reader.close();
}
Also used : Partition(org.apache.samza.Partition) SystemStreamPartition(org.apache.samza.system.SystemStreamPartition) IncomingMessageEnvelope(org.apache.samza.system.IncomingMessageEnvelope) GenericRecord(org.apache.avro.generic.GenericRecord) Test(org.junit.Test)

Example 37 with Partition

use of org.apache.samza.Partition in project samza by apache.

the class TestAvroFileHdfsReader method testFileReopen.

@Test
public void testFileReopen() throws Exception {
    SystemStreamPartition ssp = new SystemStreamPartition("hdfs", "testStream", new Partition(0));
    SingleFileHdfsReader reader = new AvroFileHdfsReader(ssp);
    reader.open(AVRO_FILE, "0");
    int index = 0;
    for (; index < NUM_EVENTS / 2; index++) {
        GenericRecord record = (GenericRecord) reader.readNext().getMessage();
        Assert.assertEquals(index, record.get(FIELD_1));
        Assert.assertEquals("string_" + index, record.get(FIELD_2).toString());
    }
    String offset = reader.nextOffset();
    reader.close();
    reader = new AvroFileHdfsReader(ssp);
    reader.open(AVRO_FILE, offset);
    for (; index < NUM_EVENTS; index++) {
        GenericRecord record = (GenericRecord) reader.readNext().getMessage();
        Assert.assertEquals(index, record.get(FIELD_1));
        Assert.assertEquals("string_" + index, record.get(FIELD_2).toString());
    }
    Assert.assertEquals(NUM_EVENTS, index);
    reader.close();
}
Also used : Partition(org.apache.samza.Partition) SystemStreamPartition(org.apache.samza.system.SystemStreamPartition) GenericRecord(org.apache.avro.generic.GenericRecord) Test(org.junit.Test)

Example 38 with Partition

use of org.apache.samza.Partition in project samza by apache.

the class TestAvroFileHdfsReader method testSequentialRead.

@Test
public void testSequentialRead() throws Exception {
    SystemStreamPartition ssp = new SystemStreamPartition("hdfs", "testStream", new Partition(0));
    SingleFileHdfsReader reader = new AvroFileHdfsReader(ssp);
    reader.open(AVRO_FILE, "0");
    int index = 0;
    while (reader.hasNext()) {
        GenericRecord record = (GenericRecord) reader.readNext().getMessage();
        Assert.assertEquals(index, record.get(FIELD_1));
        Assert.assertEquals("string_" + index, record.get(FIELD_2).toString());
        index++;
    }
    Assert.assertEquals(NUM_EVENTS, index);
    reader.close();
}
Also used : Partition(org.apache.samza.Partition) SystemStreamPartition(org.apache.samza.system.SystemStreamPartition) GenericRecord(org.apache.avro.generic.GenericRecord) Test(org.junit.Test)

Example 39 with Partition

use of org.apache.samza.Partition in project samza by apache.

the class TestMultiFileHdfsReader method testReaderReopen.

@Test
public void testReaderReopen() throws Exception {
    SystemStreamPartition ssp = new SystemStreamPartition("hdfs", "testStream", new Partition(0));
    // read until the middle of the first file
    MultiFileHdfsReader multiReader = new MultiFileHdfsReader(HdfsReaderFactory.ReaderType.AVRO, ssp, Arrays.asList(descriptors), "0:0");
    int index = 0;
    String offset = "0:0";
    for (; index < NUM_EVENTS / 2; index++) {
        IncomingMessageEnvelope envelope = multiReader.readNext();
        GenericRecord record = (GenericRecord) envelope.getMessage();
        Assert.assertEquals(index % NUM_EVENTS, record.get(FIELD_1));
        Assert.assertEquals("string_" + (index % NUM_EVENTS), record.get(FIELD_2).toString());
        offset = envelope.getOffset();
    }
    multiReader.close();
    // read until the middle of the second file
    multiReader = new MultiFileHdfsReader(HdfsReaderFactory.ReaderType.AVRO, ssp, Arrays.asList(descriptors), offset);
    // skip one duplicate event
    multiReader.readNext();
    for (; index < NUM_EVENTS + NUM_EVENTS / 2; index++) {
        IncomingMessageEnvelope envelope = multiReader.readNext();
        GenericRecord record = (GenericRecord) envelope.getMessage();
        Assert.assertEquals(index % NUM_EVENTS, record.get(FIELD_1));
        Assert.assertEquals("string_" + (index % NUM_EVENTS), record.get(FIELD_2).toString());
        offset = envelope.getOffset();
    }
    multiReader.close();
    // read the rest of all files
    multiReader = new MultiFileHdfsReader(HdfsReaderFactory.ReaderType.AVRO, ssp, Arrays.asList(descriptors), offset);
    // skip one duplicate event
    multiReader.readNext();
    while (multiReader.hasNext()) {
        IncomingMessageEnvelope envelope = multiReader.readNext();
        GenericRecord record = (GenericRecord) envelope.getMessage();
        Assert.assertEquals(index % NUM_EVENTS, record.get(FIELD_1));
        Assert.assertEquals("string_" + (index % NUM_EVENTS), record.get(FIELD_2).toString());
        index++;
        offset = envelope.getOffset();
    }
    Assert.assertEquals(3 * NUM_EVENTS, index);
    multiReader.close();
    // reopen with the offset of the last record
    multiReader = new MultiFileHdfsReader(HdfsReaderFactory.ReaderType.AVRO, ssp, Arrays.asList(descriptors), offset);
    // skip one duplicate event
    multiReader.readNext();
    Assert.assertFalse(multiReader.hasNext());
    multiReader.close();
}
Also used : Partition(org.apache.samza.Partition) SystemStreamPartition(org.apache.samza.system.SystemStreamPartition) IncomingMessageEnvelope(org.apache.samza.system.IncomingMessageEnvelope) GenericRecord(org.apache.avro.generic.GenericRecord) Test(org.junit.Test)

Example 40 with Partition

use of org.apache.samza.Partition in project samza by apache.

the class TestMultiFileHdfsReader method testOutOfRangeSingleFileOffset.

@Test(expected = SamzaException.class)
public void testOutOfRangeSingleFileOffset() {
    SystemStreamPartition ssp = new SystemStreamPartition("hdfs", "testStream", new Partition(0));
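    // constructing the reader with an out-of-range offset is expected to throw SamzaException
    new MultiFileHdfsReader(HdfsReaderFactory.ReaderType.AVRO, ssp, Arrays.asList(descriptors), "0:1000000&0");
    // only reached if no exception was thrown, which fails the test
    Assert.fail();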
}
Also used : Partition(org.apache.samza.Partition) SystemStreamPartition(org.apache.samza.system.SystemStreamPartition) Test(org.junit.Test)
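
The multi-file tests above pass offsets such as "0:0" and "0:1000000&0". A minimal sketch of how such a string might be split, assuming (this is inferred from the test inputs, not confirmed by the snippets) that the part before the first ":" is a file index and the remainder is the offset within that file. The class MultiFileOffsetSketch and its ParsedOffset type are hypothetical names for illustration and are not part of Samza.

```java
import java.util.Objects;

// Hypothetical helper, not part of Samza: splits a composite offset like "0:0".
final class MultiFileOffsetSketch {
    // Assumed delimiter, inferred from the offsets used in the tests above.
    private static final String DELIMITER = ":";

    // Simple value holder for the two assumed parts of the offset.
    static final class ParsedOffset {
        final int fileIndex;
        final String singleFileOffset;

        ParsedOffset(int fileIndex, String singleFileOffset) {
            this.fileIndex = fileIndex;
            this.singleFileOffset = singleFileOffset;
        }
    }

    // Splits at the first delimiter; everything after it is kept verbatim.
    static ParsedOffset parse(String offset) {
        Objects.requireNonNull(offset, "offset");
        int split = offset.indexOf(DELIMITER);
        if (split <= 0 || split == offset.length() - 1) {
            throw new IllegalArgumentException("Expected <fileIndex>:<offset>, got: " + offset);
        }
        int fileIndex = Integer.parseInt(offset.substring(0, split));
        return new ParsedOffset(fileIndex, offset.substring(split + 1));
    }

    public static void main(String[] args) {
        ParsedOffset parsed = parse("0:1000000&0");
        System.out.println(parsed.fileIndex + " -> " + parsed.singleFileOffset);
    }
}
```

The sketch only separates the two parts; deciding whether the per-file offset is in range is left to the reader implementation, which is what testOutOfRangeSingleFileOffset exercises.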

Aggregations

Partition (org.apache.samza.Partition): 42
Test (org.junit.Test): 31
SystemStreamPartition (org.apache.samza.system.SystemStreamPartition): 30
List (java.util.List): 15
HashMap (java.util.HashMap): 13
IncomingMessageEnvelope (org.apache.samza.system.IncomingMessageEnvelope): 11
ArrayList (java.util.ArrayList): 10
SystemStreamPartitionMetadata (org.apache.samza.system.SystemStreamMetadata.SystemStreamPartitionMetadata): 8
HashSet (java.util.HashSet): 7
FileMetadata (org.apache.samza.system.hdfs.partitioner.FileSystemAdapter.FileMetadata): 7
GenericRecord (org.apache.avro.generic.GenericRecord): 6
TaskName (org.apache.samza.container.TaskName): 6
SamzaException (org.apache.samza.SamzaException): 5
Config (org.apache.samza.config.Config): 5
SystemStreamMetadata (org.apache.samza.system.SystemStreamMetadata): 5
SystemStream (org.apache.samza.system.SystemStream): 4
LinkedHashMap (java.util.LinkedHashMap): 3
MapConfig (org.apache.samza.config.MapConfig): 3
SinglePartitionWithoutOffsetsSystemAdmin (org.apache.samza.util.SinglePartitionWithoutOffsetsSystemAdmin): 3
MetricsRegistryMap (org.apache.samza.metrics.MetricsRegistryMap): 2