Search in sources :

Example 1 with SerializableSplit

use of org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.SerializableSplit in project beam by apache.

the class HadoopInputFormatIOTest method testComputeSplitsIfGetSplitsReturnsEmptyList.

/**
   * This test validates behavior of
   * {@link HadoopInputFormatBoundedSource#computeSplitsIfNecessary() computeSplits()} when Hadoop
   * InputFormat's {@link InputFormat#getSplits(JobContext)} returns empty list.
   */
@Test
public void testComputeSplitsIfGetSplitsReturnsEmptyList() throws Exception {
    InputFormat<?, ?> mockInputFormat = Mockito.mock(EmployeeInputFormat.class);
    SerializableSplit mockInputSplit = Mockito.mock(SerializableSplit.class);
    Mockito.when(mockInputFormat.getSplits(Mockito.any(JobContext.class))).thenReturn(new ArrayList<InputSplit>());
    HadoopInputFormatBoundedSource<Text, Employee> hifSource = new HadoopInputFormatBoundedSource<Text, Employee>(serConf, WritableCoder.of(Text.class), AvroCoder.of(Employee.class), // No key translation required.
    null, // No value translation required.
    null, mockInputSplit);
    thrown.expect(IOException.class);
    thrown.expectMessage("Error in computing splits, getSplits() returns a empty list");
    hifSource.setInputFormatObj(mockInputFormat);
    hifSource.computeSplitsIfNecessary();
}
Also used : HadoopInputFormatBoundedSource(org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.HadoopInputFormatBoundedSource) SerializableSplit(org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.SerializableSplit) Text(org.apache.hadoop.io.Text) JobContext(org.apache.hadoop.mapreduce.JobContext) InputSplit(org.apache.hadoop.mapreduce.InputSplit) NewObjectsEmployeeInputSplit(org.apache.beam.sdk.io.hadoop.inputformat.EmployeeInputFormat.NewObjectsEmployeeInputSplit) Test(org.junit.Test)

Example 2 with SerializableSplit

use of org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.SerializableSplit in project beam by apache.

the class HadoopInputFormatIOTest method testGetCurrentSourceFunction.

/**
   * This test verifies that the method
   * {@link HadoopInputFormatBoundedSource.HadoopInputFormatReader#getCurrentSource()
   * getCurrentSource()} returns correct source object.
   */
@Test
public void testGetCurrentSourceFunction() throws Exception {
    SerializableSplit split = new SerializableSplit();
    BoundedSource<KV<Text, Employee>> source = new HadoopInputFormatBoundedSource<Text, Employee>(serConf, WritableCoder.of(Text.class), AvroCoder.of(Employee.class), // No key translation required.
    null, // No value translation required.
    null, split);
    BoundedReader<KV<Text, Employee>> hifReader = source.createReader(p.getOptions());
    BoundedSource<KV<Text, Employee>> hifSource = hifReader.getCurrentSource();
    assertEquals(hifSource, source);
}
Also used : HadoopInputFormatBoundedSource(org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.HadoopInputFormatBoundedSource) SerializableSplit(org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.SerializableSplit) Text(org.apache.hadoop.io.Text) KV(org.apache.beam.sdk.values.KV) Test(org.junit.Test)

Example 3 with SerializableSplit

use of org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.SerializableSplit in project beam by apache.

the class HadoopInputFormatIOTest method testReadIfCreateRecordReaderFails.

/**
   * This test validates behavior of {@link HadoopInputFormatBoundedSource} if RecordReader object
   * creation fails.
   */
@Test
public void testReadIfCreateRecordReaderFails() throws Exception {
    thrown.expect(Exception.class);
    thrown.expectMessage("Exception in creating RecordReader");
    InputFormat<Text, Employee> mockInputFormat = Mockito.mock(EmployeeInputFormat.class);
    Mockito.when(mockInputFormat.createRecordReader(Mockito.any(InputSplit.class), Mockito.any(TaskAttemptContext.class))).thenThrow(new IOException("Exception in creating RecordReader"));
    HadoopInputFormatBoundedSource<Text, Employee> boundedSource = new HadoopInputFormatBoundedSource<Text, Employee>(serConf, WritableCoder.of(Text.class), AvroCoder.of(Employee.class), // No key translation required.
    null, // No value translation required.
    null, new SerializableSplit());
    boundedSource.setInputFormatObj(mockInputFormat);
    SourceTestUtils.readFromSource(boundedSource, p.getOptions());
}
Also used : HadoopInputFormatBoundedSource(org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.HadoopInputFormatBoundedSource) SerializableSplit(org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.SerializableSplit) Text(org.apache.hadoop.io.Text) TaskAttemptContext(org.apache.hadoop.mapreduce.TaskAttemptContext) IOException(java.io.IOException) InputSplit(org.apache.hadoop.mapreduce.InputSplit) NewObjectsEmployeeInputSplit(org.apache.beam.sdk.io.hadoop.inputformat.EmployeeInputFormat.NewObjectsEmployeeInputSplit) Test(org.junit.Test)

Example 4 with SerializableSplit

use of org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.SerializableSplit in project beam by apache.

the class HadoopInputFormatIOTest method testComputeSplitsIfGetSplitsReturnsListHavingNullValues.

/**
   * This test validates behavior of
   * {@link HadoopInputFormatBoundedSource#computeSplitsIfNecessary() computeSplits()} if Hadoop
   * InputFormat's {@link InputFormat#getSplits() getSplits()} returns InputSplit list having some
   * null values.
   */
@Test
public void testComputeSplitsIfGetSplitsReturnsListHavingNullValues() throws Exception {
    // InputSplit list having null value.
    InputSplit mockInputSplit = Mockito.mock(InputSplit.class, Mockito.withSettings().extraInterfaces(Writable.class));
    List<InputSplit> inputSplitList = new ArrayList<InputSplit>();
    inputSplitList.add(mockInputSplit);
    inputSplitList.add(null);
    InputFormat<Text, Employee> mockInputFormat = Mockito.mock(EmployeeInputFormat.class);
    Mockito.when(mockInputFormat.getSplits(Mockito.any(JobContext.class))).thenReturn(inputSplitList);
    HadoopInputFormatBoundedSource<Text, Employee> hifSource = new HadoopInputFormatBoundedSource<Text, Employee>(serConf, WritableCoder.of(Text.class), AvroCoder.of(Employee.class), // No key translation required.
    null, // No value translation required.
    null, new SerializableSplit());
    thrown.expect(IOException.class);
    thrown.expectMessage("Error in computing splits, split is null in InputSplits list populated " + "by getSplits() : ");
    hifSource.setInputFormatObj(mockInputFormat);
    hifSource.computeSplitsIfNecessary();
}
Also used : ArrayList(java.util.ArrayList) HadoopInputFormatBoundedSource(org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.HadoopInputFormatBoundedSource) Writable(org.apache.hadoop.io.Writable) LongWritable(org.apache.hadoop.io.LongWritable) SerializableSplit(org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.SerializableSplit) Text(org.apache.hadoop.io.Text) JobContext(org.apache.hadoop.mapreduce.JobContext) InputSplit(org.apache.hadoop.mapreduce.InputSplit) NewObjectsEmployeeInputSplit(org.apache.beam.sdk.io.hadoop.inputformat.EmployeeInputFormat.NewObjectsEmployeeInputSplit) Test(org.junit.Test)

Example 5 with SerializableSplit

use of org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.SerializableSplit in project beam by apache.

the class HadoopInputFormatIOTest method testComputeSplitsIfGetSplitsReturnsNullValue.

/**
   * This test validates behavior of
   * {@link HadoopInputFormatBoundedSource#computeSplitsIfNecessary() computeSplits()} when Hadoop
   * InputFormat's {@link InputFormat#getSplits() getSplits()} returns NULL value.
   */
@Test
public void testComputeSplitsIfGetSplitsReturnsNullValue() throws Exception {
    InputFormat<Text, Employee> mockInputFormat = Mockito.mock(EmployeeInputFormat.class);
    SerializableSplit mockInputSplit = Mockito.mock(SerializableSplit.class);
    Mockito.when(mockInputFormat.getSplits(Mockito.any(JobContext.class))).thenReturn(null);
    HadoopInputFormatBoundedSource<Text, Employee> hifSource = new HadoopInputFormatBoundedSource<Text, Employee>(serConf, WritableCoder.of(Text.class), AvroCoder.of(Employee.class), // No key translation required.
    null, // No value translation required.
    null, mockInputSplit);
    thrown.expect(IOException.class);
    thrown.expectMessage("Error in computing splits, getSplits() returns null.");
    hifSource.setInputFormatObj(mockInputFormat);
    hifSource.computeSplitsIfNecessary();
}
Also used : HadoopInputFormatBoundedSource(org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.HadoopInputFormatBoundedSource) SerializableSplit(org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.SerializableSplit) Text(org.apache.hadoop.io.Text) JobContext(org.apache.hadoop.mapreduce.JobContext) Test(org.junit.Test)

Aggregations

HadoopInputFormatBoundedSource (org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.HadoopInputFormatBoundedSource)10 SerializableSplit (org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO.SerializableSplit)10 Text (org.apache.hadoop.io.Text)10 Test (org.junit.Test)10 NewObjectsEmployeeInputSplit (org.apache.beam.sdk.io.hadoop.inputformat.EmployeeInputFormat.NewObjectsEmployeeInputSplit)7 InputSplit (org.apache.hadoop.mapreduce.InputSplit)7 KV (org.apache.beam.sdk.values.KV)4 TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext)4 JobContext (org.apache.hadoop.mapreduce.JobContext)3 EmployeeRecordReader (org.apache.beam.sdk.io.hadoop.inputformat.EmployeeInputFormat.EmployeeRecordReader)2 IOException (java.io.IOException)1 ArrayList (java.util.ArrayList)1 DisplayData (org.apache.beam.sdk.transforms.display.DisplayData)1 LongWritable (org.apache.hadoop.io.LongWritable)1 Writable (org.apache.hadoop.io.Writable)1 InputFormat (org.apache.hadoop.mapreduce.InputFormat)1