
Example 21 with RecordReader

Use of org.apache.hadoop.mapreduce.RecordReader in project druid by druid-io.

From the class BaseParquetInputTest, method getFirstRow:

static Object getFirstRow(Job job, String parserType, String parquetPath) throws IOException, InterruptedException {
    // Wrap the local Parquet file in a single FileSplit covering the whole file.
    File testFile = new File(parquetPath);
    Path path = new Path(testFile.getAbsoluteFile().toURI());
    FileSplit split = new FileSplit(path, 0, testFile.length(), null);
    // Instantiate the InputFormat registered for this parser type, plus a minimal task context.
    InputFormat inputFormat = ReflectionUtils.newInstance(INPUT_FORMAT_CLASSES.get(parserType), job.getConfiguration());
    TaskAttemptContext context = new TaskAttemptContextImpl(job.getConfiguration(), new TaskAttemptID());
    // Read and return only the first record.
    try (RecordReader reader = inputFormat.createRecordReader(split, context)) {
        reader.initialize(split, context);
        reader.nextKeyValue();
        return reader.getCurrentValue();
    }
}
Also used: Path (org.apache.hadoop.fs.Path), InputFormat (org.apache.hadoop.mapreduce.InputFormat), TaskAttemptID (org.apache.hadoop.mapreduce.TaskAttemptID), TaskAttemptContextImpl (org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl), RecordReader (org.apache.hadoop.mapreduce.RecordReader), TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext), FileSplit (org.apache.hadoop.mapreduce.lib.input.FileSplit), File (java.io.File)
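
A hedged usage sketch for this helper (assumed to run inside a test in the same package; the "parquet-avro" parser key and the file path are illustrative placeholders, not values taken from the Druid sources):

    // Hypothetical caller: print the first record of a local Parquet test file.
    // Job.getInstance may throw IOException, so the surrounding test declares it.
    Job job = Job.getInstance(new Configuration());
    Object firstRow = getFirstRow(job, "parquet-avro", "example/test_file.parquet");  // assumed key and path
    System.out.println(firstRow);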

Example 22 with RecordReader

Use of org.apache.hadoop.mapreduce.RecordReader in project mongo-hadoop by mongodb.

From the class GridFSInputFormatTest, method testRecordReader:

@Test
public void testRecordReader() throws IOException, InterruptedException {
    List<InputSplit> splits = getSplits();
    Configuration conf = getConfiguration();
    // Split README by sections in Markdown.
    MongoConfigUtil.setGridFSDelimiterPattern(conf, "#+");
    TaskAttemptContext context = mockTaskAttemptContext(conf);
    List<String> sections = new ArrayList<String>();
    for (InputSplit split : splits) {
        RecordReader reader = new GridFSInputFormat.GridFSTextRecordReader();
        reader.initialize(split, context);
        while (reader.nextKeyValue()) {
            sections.add(reader.getCurrentValue().toString());
        }
    }
    assertEquals(Arrays.asList(readmeSections), sections);
}
Also used: Configuration (org.apache.hadoop.conf.Configuration), RecordReader (org.apache.hadoop.mapreduce.RecordReader), ArrayList (java.util.ArrayList), TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext), InputSplit (org.apache.hadoop.mapreduce.InputSplit), Test (org.junit.Test), BaseHadoopTest (com.mongodb.hadoop.testutils.BaseHadoopTest)
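
The getSplits, getConfiguration, and mockTaskAttemptContext helpers are not part of this excerpt. A minimal Mockito-based sketch of what mockTaskAttemptContext might look like (an assumption, not the actual mongo-hadoop helper; Mockito on the test classpath is also assumed):

    // Hypothetical sketch: the test only needs getConfiguration() from the context,
    // so stubbing that single call is enough.
    private static TaskAttemptContext mockTaskAttemptContext(final Configuration conf) {
        TaskAttemptContext context = Mockito.mock(TaskAttemptContext.class);
        Mockito.when(context.getConfiguration()).thenReturn(conf);
        return context;
    }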

Example 23 with RecordReader

Use of org.apache.hadoop.mapreduce.RecordReader in project elephant-bird by twitter.

From the class TestLocationAsTuple, method testSimpleLoad:

@Test
public void testSimpleLoad() throws IOException {
    Configuration conf = new Configuration();
    Job job = EasyMock.createMock(Job.class);
    EasyMock.expect(HadoopCompat.getConfiguration(job)).andStubReturn(conf);
    EasyMock.replay(job);
    LoadFunc loader = new LocationAsTuple();
    loader.setUDFContextSignature("foo");
    loader.setLocation("a\tb", job);
    RecordReader reader = EasyMock.createMock(RecordReader.class);
    PigSplit split = EasyMock.createMock(PigSplit.class);
    EasyMock.expect(split.getConf()).andStubReturn(conf);
    loader.prepareToRead(reader, split);
    Tuple next = loader.getNext();
    assertEquals("a", next.get(0));
    assertEquals("b", next.get(1));
}
Also used: Configuration (org.apache.hadoop.conf.Configuration), RecordReader (org.apache.hadoop.mapreduce.RecordReader), PigSplit (org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit), Job (org.apache.hadoop.mapreduce.Job), Tuple (org.apache.pig.data.Tuple), LoadFunc (org.apache.pig.LoadFunc), Test (org.junit.Test)

Example 24 with RecordReader

Use of org.apache.hadoop.mapreduce.RecordReader in project elephant-bird by twitter.

From the class TestLocationAsTuple, method testTokenizedLoad:

@Test
public void testTokenizedLoad() throws IOException {
    Configuration conf = new Configuration();
    Job job = EasyMock.createMock(Job.class);
    EasyMock.expect(HadoopCompat.getConfiguration(job)).andStubReturn(conf);
    EasyMock.replay(job);
    LoadFunc loader = new LocationAsTuple(",");
    loader.setUDFContextSignature("foo");
    loader.setLocation("a,b\tc", job);
    RecordReader reader = EasyMock.createMock(RecordReader.class);
    PigSplit split = EasyMock.createMock(PigSplit.class);
    EasyMock.expect(split.getConf()).andStubReturn(conf);
    loader.prepareToRead(reader, split);
    Tuple next = loader.getNext();
    assertEquals("a", next.get(0));
    assertEquals("b\tc", next.get(1));
}
Also used: Configuration (org.apache.hadoop.conf.Configuration), RecordReader (org.apache.hadoop.mapreduce.RecordReader), PigSplit (org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit), Job (org.apache.hadoop.mapreduce.Job), Tuple (org.apache.pig.data.Tuple), LoadFunc (org.apache.pig.LoadFunc), Test (org.junit.Test)
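
The two tests above differ only in the delimiter passed to LocationAsTuple. The assertions can be reproduced with plain java.util.StringTokenizer, shown here purely to illustrate the splitting behavior (this is not the elephant-bird implementation):

    // Default tokenization: whitespace (including '\t') separates tokens.
    StringTokenizer defaultSplit = new StringTokenizer("a\tb");      // yields "a", then "b"
    // Explicit "," delimiter: the tab stays inside the second token.
    StringTokenizer commaSplit = new StringTokenizer("a,b\tc", ","); // yields "a", then "b\tc"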

Aggregations

RecordReader (org.apache.hadoop.mapreduce.RecordReader): 24 usages
TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext): 17 usages
Configuration (org.apache.hadoop.conf.Configuration): 13 usages
InputSplit (org.apache.hadoop.mapreduce.InputSplit): 13 usages
TaskAttemptID (org.apache.hadoop.mapreduce.TaskAttemptID): 11 usages
TaskAttemptContextImpl (org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl): 11 usages
InputFormat (org.apache.hadoop.mapreduce.InputFormat): 9 usages
Job (org.apache.hadoop.mapreduce.Job): 8 usages
Test (org.junit.Test): 8 usages
ArrayList (java.util.ArrayList): 7 usages
Path (org.apache.hadoop.fs.Path): 7 usages
FileSplit (org.apache.hadoop.mapreduce.lib.input.FileSplit): 6 usages
IOException (java.io.IOException): 4 usages
File (java.io.File): 3 usages
FileSystem (org.apache.hadoop.fs.FileSystem): 3 usages
Mapper (org.apache.hadoop.mapreduce.Mapper): 3 usages
WrappedMapper (org.apache.hadoop.mapreduce.lib.map.WrappedMapper): 3 usages
Scan (org.apache.hadoop.hbase.client.Scan): 2 usages
RecordWriter (org.apache.hadoop.mapreduce.RecordWriter): 2 usages
JobContextImpl (org.apache.hadoop.mapreduce.task.JobContextImpl): 2 usages
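
Taken together, the aggregated classes outline the recurring pattern in these examples: build an InputSplit and a TaskAttemptContext, ask an InputFormat for a RecordReader, then iterate nextKeyValue/getCurrentValue. A minimal, self-contained sketch of that pattern, using TextInputFormat and a local text file as stand-ins (both are illustrative choices, not taken from the projects above):

    import java.io.File;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.TaskAttemptID;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;

    public class RecordReaderSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // One split covering an illustrative local file.
            File file = new File("example.txt");
            FileSplit split = new FileSplit(new Path(file.getAbsoluteFile().toURI()), 0, file.length(), null);
            // Minimal task context, as in the Parquet example above.
            TaskAttemptContext context = new TaskAttemptContextImpl(conf, new TaskAttemptID());
            TextInputFormat format = new TextInputFormat();
            try (RecordReader<?, ?> reader = format.createRecordReader(split, context)) {
                reader.initialize(split, context);
                while (reader.nextKeyValue()) {
                    System.out.println(reader.getCurrentValue()); // one line of text per record
                }
            }
        }
    }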