
Example 21 with InputFormat

Use of org.apache.hadoop.mapreduce.InputFormat in project druid by druid-io.

From the class BaseParquetInputTest, method getFirstRow.

static Object getFirstRow(Job job, String parserType, String parquetPath) throws IOException, InterruptedException {
    File testFile = new File(parquetPath);
    Path path = new Path(testFile.getAbsoluteFile().toURI());
    // A single split covering the whole file, with no preferred host locations.
    FileSplit split = new FileSplit(path, 0, testFile.length(), null);
    // Look up the InputFormat class for this parser type and instantiate it.
    InputFormat inputFormat = ReflectionUtils.newInstance(INPUT_FORMAT_CLASSES.get(parserType), job.getConfiguration());
    TaskAttemptContext context = new TaskAttemptContextImpl(job.getConfiguration(), new TaskAttemptID());
    try (RecordReader reader = inputFormat.createRecordReader(split, context)) {
        reader.initialize(split, context);
        // Advance to the first record; the boolean result is not checked here.
        reader.nextKeyValue();
        return reader.getCurrentValue();
    }
}
Also used: Path (org.apache.hadoop.fs.Path), InputFormat (org.apache.hadoop.mapreduce.InputFormat), TaskAttemptID (org.apache.hadoop.mapreduce.TaskAttemptID), TaskAttemptContextImpl (org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl), RecordReader (org.apache.hadoop.mapreduce.RecordReader), TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext), FileSplit (org.apache.hadoop.mapreduce.lib.input.FileSplit), File (java.io.File)
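The snippet above calls initialize, then nextKeyValue, then getCurrentValue, and it ignores the boolean that nextKeyValue returns. The following is a minimal sketch of that same call order with the return value checked, so that an empty input yields no record. It does not use Hadoop: SimpleRecordReader, LineReader and firstRecord are hypothetical stand-ins built on the Java standard library so the example can run on its own.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;

public class FirstRecordSketch {

    /** Hypothetical stand-in that mirrors the initialize / nextKeyValue / getCurrentValue call order. */
    interface SimpleRecordReader<V> extends Closeable {
        void initialize() throws IOException;
        boolean nextKeyValue() throws IOException;
        V getCurrentValue();
    }

    /** Hypothetical reader that treats each line of an in-memory string as one record. */
    static final class LineReader implements SimpleRecordReader<String> {
        private final String content;
        private BufferedReader in;
        private String current;

        LineReader(String content) {
            this.content = content;
        }

        @Override
        public void initialize() {
            in = new BufferedReader(new StringReader(content));
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            // readLine returns null at end of input, which means "no more records".
            current = in.readLine();
            return current != null;
        }

        @Override
        public String getCurrentValue() {
            return current;
        }

        @Override
        public void close() throws IOException {
            if (in != null) {
                in.close();
            }
        }
    }

    /** Returns the first record, or an empty Optional when the input has no records. */
    static Optional<String> firstRecord(String content) throws IOException {
        try (LineReader reader = new LineReader(content)) {
            reader.initialize();
            // Only read the current value when the reader reports that a record exists.
            return reader.nextKeyValue() ? Optional.of(reader.getCurrentValue()) : Optional.empty();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstRecord("a,1\nb,2")); // prints Optional[a,1]
        System.out.println(firstRecord(""));         // prints Optional.empty
    }
}
```

In a test helper such as getFirstRow, skipping the check is reasonable when the fixture file is known to contain at least one row. Checking the result mainly matters when the input may be empty.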

Aggregations

InputFormat (org.apache.hadoop.mapreduce.InputFormat): 21
Configuration (org.apache.hadoop.conf.Configuration): 11
Path (org.apache.hadoop.fs.Path): 11
InputSplit (org.apache.hadoop.mapreduce.InputSplit): 10
Job (org.apache.hadoop.mapreduce.Job): 9
RecordReader (org.apache.hadoop.mapreduce.RecordReader): 9
TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext): 9
TaskAttemptID (org.apache.hadoop.mapreduce.TaskAttemptID): 7
TaskAttemptContextImpl (org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl): 7
Test (org.junit.Test): 7
ArrayList (java.util.ArrayList): 5
HashMap (java.util.HashMap): 3
Map (java.util.Map): 3
Mapper (org.apache.hadoop.mapreduce.Mapper): 3
FileSplit (org.apache.hadoop.mapreduce.lib.input.FileSplit): 3
File (java.io.File): 2
List (java.util.List): 2
KV (org.apache.beam.sdk.values.KV): 2
Text (org.apache.hadoop.io.Text): 2
FileInputFormat (org.apache.hadoop.mapreduce.lib.input.FileInputFormat): 2