use of org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO.HadoopInputFormatBoundedSource in project beam by apache.
the class HadoopFormatIOReadTest method testSkipKeyValueClone.
/**
 * This test validates that when the reader is instructed not to clone key-value records, the
 * records it emits are exactly the same instances as those produced by the source, no matter
 * whether they are mutable or immutable. Turning this override on is useful when key-value
 * translation functions are used, as it avoids a possibly unnecessary copy.
*/
@Test
public void testSkipKeyValueClone() throws Exception {
SerializableConfiguration serConf =
    loadTestConfiguration(EmployeeInputFormat.class, Text.class, Employee.class);
// with skip clone 'true' it should produce the exact same instances of key/value
List<BoundedSource<KV<Text, Employee>>> sources =
    new HadoopInputFormatBoundedSource<>(
            serConf,
            WritableCoder.of(Text.class),
            AvroCoder.of(Employee.class),
            new SingletonTextFn(),
            new SingletonEmployeeFn(),
            true,
            true)
        .split(0, p.getOptions());
for (BoundedSource<KV<Text, Employee>> source : sources) {
List<KV<Text, Employee>> elems = SourceTestUtils.readFromSource(source, p.getOptions());
for (KV<Text, Employee> elem : elems) {
Assert.assertSame(SingletonTextFn.TEXT, elem.getKey());
Assert.assertEquals(SingletonTextFn.TEXT, elem.getKey());
Assert.assertSame(SingletonEmployeeFn.EMPLOYEE, elem.getValue());
Assert.assertEquals(SingletonEmployeeFn.EMPLOYEE, elem.getValue());
}
}
// with skip clone 'false' it should produce different (cloned) instances of key/value
sources =
    new HadoopInputFormatBoundedSource<>(
            serConf,
            WritableCoder.of(Text.class),
            AvroCoder.of(Employee.class),
            new SingletonTextFn(),
            new SingletonEmployeeFn(),
            false,
            false)
        .split(0, p.getOptions());
for (BoundedSource<KV<Text, Employee>> source : sources) {
List<KV<Text, Employee>> elems = SourceTestUtils.readFromSource(source, p.getOptions());
for (KV<Text, Employee> elem : elems) {
Assert.assertNotSame(SingletonTextFn.TEXT, elem.getKey());
Assert.assertEquals(SingletonTextFn.TEXT, elem.getKey());
Assert.assertNotSame(SingletonEmployeeFn.EMPLOYEE, elem.getValue());
Assert.assertEquals(SingletonEmployeeFn.EMPLOYEE, elem.getValue());
}
}
}
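SingletonTextFn and SingletonEmployeeFn are test helpers that are not shown on this page. A minimal sketch of what such a key-translation function might look like, assuming it extends Beam's SimpleFunction; the literal string is invented, and only the shared TEXT constant and singleton behavior are implied by the assertions above:

import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.hadoop.io.Text;

// Hypothetical sketch: always returns the same shared Text instance, which is
// what makes assertSame(SingletonTextFn.TEXT, elem.getKey()) hold when the
// reader skips cloning.
class SingletonTextFn extends SimpleFunction<Text, Text> {
  static final Text TEXT = new Text("some static text");

  @Override
  public Text apply(Text input) {
    // Ignore the input and hand back the singleton, making instance identity
    // observable to the test.
    return TEXT;
  }
}

With cloning enabled, the reader copies the translated record via its coder, so the test sees an equal but distinct instance; with cloning skipped, the singleton itself flows through.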
use of org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO.HadoopInputFormatBoundedSource in project beam by apache.
the class HadoopFormatIOReadTest method testReadDisplayData.
/**
 * This test validates the functionality of {@link
 * HadoopInputFormatBoundedSource#populateDisplayData(DisplayData.Builder)}.
 */
@Test
public void testReadDisplayData() {
HadoopInputFormatBoundedSource<Text, Employee> boundedSource =
    new HadoopInputFormatBoundedSource<>(
        serConf,
        WritableCoder.of(Text.class),
        AvroCoder.of(Employee.class),
        null, // No key translation required.
        null, // No value translation required.
        new SerializableSplit(),
        false,
        false);
DisplayData displayData = DisplayData.from(boundedSource);
assertThat(
    displayData,
    hasDisplayItem(
        "mapreduce.job.inputformat.class",
        serConf.get().get("mapreduce.job.inputformat.class")));
assertThat(displayData, hasDisplayItem("key.class", serConf.get().get("key.class")));
assertThat(displayData, hasDisplayItem("value.class", serConf.get().get("value.class")));
}
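The test only asserts on the resulting DisplayData; for context, a populateDisplayData override could look roughly like the following sketch. This is an illustrative assumption, not the actual Beam implementation:

import org.apache.beam.sdk.transforms.display.DisplayData;
import org.apache.hadoop.conf.Configuration;

// Sketch: copy selected Hadoop Configuration entries into Beam display data so
// runners and monitoring UIs can surface them.
@Override
public void populateDisplayData(DisplayData.Builder builder) {
  super.populateDisplayData(builder);
  Configuration hadoopConf = serConf.get();
  for (String key :
      new String[] {"mapreduce.job.inputformat.class", "key.class", "value.class"}) {
    String value = hadoopConf.get(key);
    if (value != null) {
      builder.add(DisplayData.item(key, value));
    }
  }
}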
use of org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO.HadoopInputFormatBoundedSource in project beam by apache.
the class HadoopFormatIOReadTest method testReadIfCreateRecordReaderFails.
/**
 * This test validates the behavior of {@link HadoopInputFormatBoundedSource} when RecordReader
 * creation fails.
*/
@Test
public void testReadIfCreateRecordReaderFails() throws Exception {
thrown.expect(Exception.class);
thrown.expectMessage("Exception in creating RecordReader");
InputFormat<Text, Employee> mockInputFormat = Mockito.mock(EmployeeInputFormat.class);
Mockito.when(
        mockInputFormat.createRecordReader(
            Mockito.any(InputSplit.class), Mockito.any(TaskAttemptContext.class)))
    .thenThrow(new IOException("Exception in creating RecordReader"));
HadoopInputFormatBoundedSource<Text, Employee> boundedSource =
    new HadoopInputFormatBoundedSource<>(
        serConf,
        WritableCoder.of(Text.class),
        AvroCoder.of(Employee.class),
        null, // No key translation required.
        null, // No value translation required.
        new SerializableSplit(),
        false,
        false);
boundedSource.setInputFormatObj(mockInputFormat);
SourceTestUtils.readFromSource(boundedSource, p.getOptions());
}
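The thrown and p fields used above are fixtures of the test class that this page does not show. They are presumably declared along these lines (standard JUnit 4 and Beam testing idioms, assumed rather than copied from the source):

import org.junit.Rule;
import org.junit.rules.ExpectedException;
import org.apache.beam.sdk.testing.TestPipeline;

// Presumed fixtures: the rule backing thrown.expect(...) / expectMessage(...)
// and the pipeline whose options are passed to SourceTestUtils.readFromSource.
@Rule public ExpectedException thrown = ExpectedException.none();
@Rule public final transient TestPipeline p = TestPipeline.create();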
use of org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO.HadoopInputFormatBoundedSource in project beam by apache.
the class HadoopFormatIOReadTest method testGetFractionConsumedForBadProgressValue.
/**
 * This test validates the method getFractionConsumed() when a bad progress value is returned by
 * the InputFormat's RecordReader.
*/
@Test
public void testGetFractionConsumedForBadProgressValue() throws Exception {
InputFormat<Text, Employee> mockInputFormat = Mockito.mock(EmployeeInputFormat.class);
EmployeeRecordReader mockReader = Mockito.mock(EmployeeRecordReader.class);
Mockito.when(mockInputFormat.createRecordReader(Mockito.any(), Mockito.any()))
    .thenReturn(mockReader);
Mockito.when(mockReader.nextKeyValue()).thenReturn(true);
// Set the progress to a bad value outside the valid range of 0 to 1.
Mockito.when(mockReader.getProgress()).thenReturn(2.0F);
InputSplit mockInputSplit = Mockito.mock(NewObjectsEmployeeInputSplit.class);
HadoopInputFormatBoundedSource<Text, Employee> boundedSource =
    new HadoopInputFormatBoundedSource<>(
        serConf,
        WritableCoder.of(Text.class),
        AvroCoder.of(Employee.class),
        null, // No key translation required.
        null, // No value translation required.
        new SerializableSplit(mockInputSplit),
        false,
        false);
boundedSource.setInputFormatObj(mockInputFormat);
BoundedReader<KV<Text, Employee>> reader = boundedSource.createReader(p.getOptions());
assertEquals(Double.valueOf(0), reader.getFractionConsumed());
boolean start = reader.start();
assertTrue(start);
if (start) {
boolean advance = reader.advance();
assertEquals(null, reader.getFractionConsumed());
assertTrue(advance);
if (advance) {
advance = reader.advance();
assertEquals(null, reader.getFractionConsumed());
}
}
// Validate that getFractionConsumed() returns null after a few reads, since getProgress()
// returns the invalid value 2.0, which is outside the range of 0 to 1.
assertEquals(null, reader.getFractionConsumed());
reader.close();
}
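For context, the defensive logic this test exercises could look roughly like the sketch below. It is an illustrative assumption, not the actual Beam implementation, and currentReader is a hypothetical field holding the wrapped Hadoop RecordReader:

// Sketch of a getFractionConsumed() that tolerates a misbehaving RecordReader.
@Override
public Double getFractionConsumed() {
  if (currentReader == null) {
    return 0.0; // Reading has not started yet.
  }
  try {
    float progress = currentReader.getProgress();
    // A RecordReader must report progress in [0, 1]; treat anything else as
    // unknown rather than returning a misleading fraction.
    if (progress < 0.0f || progress > 1.0f) {
      return null;
    }
    return (double) progress;
  } catch (IOException | InterruptedException e) {
    return null;
  }
}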
use of org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO.HadoopInputFormatBoundedSource in project beam by apache.
the class HadoopFormatIOReadTest method testReadWithNullCreateRecordReader.
/**
 * This test validates the behavior of {@link HadoopInputFormatBoundedSource} if {@link
 * InputFormat#createRecordReader(InputSplit, TaskAttemptContext)} returns null.
*/
@Test
public void testReadWithNullCreateRecordReader() throws Exception {
InputFormat<Text, Employee> mockInputFormat = Mockito.mock(EmployeeInputFormat.class);
thrown.expect(IOException.class);
thrown.expectMessage(
    String.format("Null RecordReader object returned by %s", mockInputFormat.getClass()));
Mockito.when(
        mockInputFormat.createRecordReader(
            Mockito.any(InputSplit.class), Mockito.any(TaskAttemptContext.class)))
    .thenReturn(null);
HadoopInputFormatBoundedSource<Text, Employee> boundedSource =
    new HadoopInputFormatBoundedSource<>(
        serConf,
        WritableCoder.of(Text.class),
        AvroCoder.of(Employee.class),
        null, // No key translation required.
        null, // No value translation required.
        new SerializableSplit(),
        false,
        false);
boundedSource.setInputFormatObj(mockInputFormat);
SourceTestUtils.readFromSource(boundedSource, p.getOptions());
}
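The expected message implies a null guard inside the source roughly like the following sketch; inputFormatObj, split, and context are hypothetical names, and this is not the actual Beam implementation:

// Sketch: fail fast with a descriptive message when the InputFormat hands
// back a null RecordReader instead of failing later with an opaque NPE.
RecordReader<K, V> recordReader = inputFormatObj.createRecordReader(split, context);
if (recordReader == null) {
  throw new IOException(
      String.format("Null RecordReader object returned by %s", inputFormatObj.getClass()));
}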