Example 21 with JobContext

Use of org.apache.hadoop.mapreduce.JobContext in project parquet-mr by apache.

The class PerfTest2, method load.

static void load(String out, int colsToLoad, StringBuilder results) throws Exception {
    StringBuilder schemaString = new StringBuilder("a0: chararray");
    for (int i = 1; i < colsToLoad; i++) {
        schemaString.append(", a" + i + ": chararray");
    }
    long t0 = System.currentTimeMillis();
    Job job = new Job(conf);
    int loadjobId = jobid++;
    LoadFunc loadFunc = new ParquetLoader(schemaString.toString());
    loadFunc.setUDFContextSignature("sigLoader" + loadjobId);
    String absPath = loadFunc.relativeToAbsolutePath(out, new Path(new File(".").getAbsoluteFile().toURI()));
    loadFunc.setLocation(absPath, job);
    // that's how the base class is defined
    @SuppressWarnings("unchecked") InputFormat<Void, Tuple> inputFormat = loadFunc.getInputFormat();
    JobContext jobContext = ContextUtil.newJobContext(ContextUtil.getConfiguration(job), new JobID("jt", loadjobId));
    List<InputSplit> splits = inputFormat.getSplits(jobContext);
    int i = 0;
    int taskid = 0;
    for (InputSplit split : splits) {
        TaskAttemptContext taskAttemptContext = ContextUtil.newTaskAttemptContext(ContextUtil.getConfiguration(job), new TaskAttemptID("jt", loadjobId, true, taskid++, 0));
        RecordReader<Void, Tuple> recordReader = inputFormat.createRecordReader(split, taskAttemptContext);
        loadFunc.prepareToRead(recordReader, null);
        recordReader.initialize(split, taskAttemptContext);
        Tuple t;
        while ((t = loadFunc.getNext()) != null) {
            if (DEBUG)
                System.out.println(t);
            ++i;
        }
    }
    assertEquals(ROW_COUNT, i);
    long t1 = System.currentTimeMillis();
    results.append((t1 - t0) + " ms to read " + colsToLoad + " columns\n");
}
Also used : Path(org.apache.hadoop.fs.Path) TaskAttemptID(org.apache.hadoop.mapreduce.TaskAttemptID) TaskAttemptContext(org.apache.hadoop.mapreduce.TaskAttemptContext) JobContext(org.apache.hadoop.mapreduce.JobContext) Job(org.apache.hadoop.mapreduce.Job) File(java.io.File) InputSplit(org.apache.hadoop.mapreduce.InputSplit) Tuple(org.apache.pig.data.Tuple) JobID(org.apache.hadoop.mapreduce.JobID) LoadFunc(org.apache.pig.LoadFunc)
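The example above goes through parquet's ContextUtil shim to stay compatible across Hadoop versions. On Hadoop 2.x the same contexts can be built directly from JobContextImpl and TaskAttemptContextImpl; a minimal sketch follows, where the helper class and method names are illustrative and not part of parquet-mr.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.TaskType;
import org.apache.hadoop.mapreduce.task.JobContextImpl;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;

// Hypothetical helper class: builds the contexts directly, which is roughly
// what shim utilities such as ContextUtil do when running on Hadoop 2.x.
public class JobContextSketch {

    static JobContext newJobContext(Configuration conf, int jobId) {
        return new JobContextImpl(conf, new JobID("jt", jobId));
    }

    static TaskAttemptContext newTaskAttemptContext(Configuration conf, int jobId, int taskId) {
        TaskAttemptID attemptId = new TaskAttemptID("jt", jobId, TaskType.MAP, taskId, 0);
        return new TaskAttemptContextImpl(conf, attemptId);
    }
}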

Example 22 with JobContext

Use of org.apache.hadoop.mapreduce.JobContext in project cdap by caskdata.

The class DatasetInputFormat, method getSplits.

@Override
public InputSplit[] getSplits(JobConf jobConf, int numSplits) throws IOException {
    try (DatasetAccessor datasetAccessor = new DatasetAccessor(jobConf)) {
        try {
            datasetAccessor.initialize();
        } catch (Exception e) {
            throw new IOException("Could not get dataset", e);
        }
        try (RecordScannable recordScannable = datasetAccessor.getDataset()) {
            Job job = new Job(jobConf);
            JobContext jobContext = ShimLoader.getHadoopShims().newJobContext(job);
            Path[] tablePaths = FileInputFormat.getInputPaths(jobContext);
            List<Split> dsSplits = recordScannable.getSplits();
            InputSplit[] inputSplits = new InputSplit[dsSplits.size()];
            for (int i = 0; i < dsSplits.size(); i++) {
                inputSplits[i] = new DatasetInputSplit(dsSplits.get(i), tablePaths[0]);
            }
            return inputSplits;
        }
    }
}
Also used : Path(org.apache.hadoop.fs.Path) IOException(java.io.IOException) RecordScannable(co.cask.cdap.api.data.batch.RecordScannable) JobContext(org.apache.hadoop.mapreduce.JobContext) Job(org.apache.hadoop.mapreduce.Job) Split(co.cask.cdap.api.data.batch.Split) FileSplit(org.apache.hadoop.mapred.FileSplit) InputSplit(org.apache.hadoop.mapred.InputSplit)
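The CDAP example relies on Hive's ShimLoader to turn the old-API JobConf into a new-API JobContext before reading the configured input paths. Without that shim, and assuming Hadoop 2.x, the same lookup can be sketched by wrapping the JobConf in a JobContextImpl; the class and method names below are hypothetical.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.task.JobContextImpl;

// Hypothetical helper: wraps an old-API JobConf in a new-API JobContext so
// the mapreduce FileInputFormat helpers can be used to read the input paths.
public class InputPathsSketch {

    static Path[] inputPaths(JobConf jobConf) {
        JobContext jobContext = new JobContextImpl(jobConf, new JobID("jt", 0));
        return FileInputFormat.getInputPaths(jobContext);
    }
}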

Example 23 with JobContext

Use of org.apache.hadoop.mapreduce.JobContext in project cdap by caskdata.

The class MultipleOutputsCommitter, method commitJob.

@Override
public void commitJob(JobContext jobContext) throws IOException {
    rootOutputcommitter.commitJob(jobContext);
    for (Map.Entry<String, OutputCommitter> committer : committers.entrySet()) {
        JobContext namedJobContext = MultipleOutputs.getNamedJobContext(jobContext, committer.getKey());
        committer.getValue().commitJob(namedJobContext);
    }
}
Also used : OutputCommitter(org.apache.hadoop.mapreduce.OutputCommitter) JobContext(org.apache.hadoop.mapreduce.JobContext) Map(java.util.Map)

Example 24 with JobContext

Use of org.apache.hadoop.mapreduce.JobContext in project cdap by caskdata.

The class MultipleOutputsCommitter, method abortJob.

@Override
public void abortJob(JobContext jobContext, JobStatus.State state) throws IOException {
    rootOutputcommitter.abortJob(jobContext, state);
    for (Map.Entry<String, OutputCommitter> committer : committers.entrySet()) {
        JobContext namedJobContext = MultipleOutputs.getNamedJobContext(jobContext, committer.getKey());
        committer.getValue().abortJob(namedJobContext, state);
    }
}
Also used : OutputCommitter(org.apache.hadoop.mapreduce.OutputCommitter) JobContext(org.apache.hadoop.mapreduce.JobContext) Map(java.util.Map)
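Examples 23 and 24 show the fan-out pattern of a composite OutputCommitter: the root committer runs first, then every named committer receives the same lifecycle call with a per-output JobContext. A stripped-down sketch of that pattern follows; it is not the CDAP implementation (in particular, it forwards the original JobContext instead of building one per named output via getNamedJobContext), and the class name is hypothetical.

import java.io.IOException;
import java.util.Map;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.JobStatus;
import org.apache.hadoop.mapreduce.OutputCommitter;

// Hypothetical, simplified composite committer: each job-level call is
// forwarded to the root committer and then to every named committer.
public abstract class DelegatingCommitterSketch extends OutputCommitter {

    private final OutputCommitter root;
    private final Map<String, OutputCommitter> named;

    protected DelegatingCommitterSketch(OutputCommitter root, Map<String, OutputCommitter> named) {
        this.root = root;
        this.named = named;
    }

    @Override
    public void setupJob(JobContext jobContext) throws IOException {
        root.setupJob(jobContext);
        for (OutputCommitter committer : named.values()) {
            committer.setupJob(jobContext);
        }
    }

    @Override
    public void commitJob(JobContext jobContext) throws IOException {
        root.commitJob(jobContext);
        for (OutputCommitter committer : named.values()) {
            committer.commitJob(jobContext);
        }
    }

    @Override
    public void abortJob(JobContext jobContext, JobStatus.State state) throws IOException {
        root.abortJob(jobContext, state);
        for (OutputCommitter committer : named.values()) {
            committer.abortJob(jobContext, state);
        }
    }
}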

Example 25 with JobContext

Use of org.apache.hadoop.mapreduce.JobContext in project mongo-hadoop by mongodb.

The class GridFSInputFormatTest, method testReadWholeFileNoDelimiter.

@Test
public void testReadWholeFileNoDelimiter() throws IOException, InterruptedException {
    Configuration conf = getConfiguration();
    MongoConfigUtil.setGridFSWholeFileSplit(conf, true);
    JobContext jobContext = mockJobContext(conf);
    List<InputSplit> splits = inputFormat.getSplits(jobContext);
    // Empty delimiter == no delimiter.
    MongoConfigUtil.setGridFSDelimiterPattern(conf, "");
    TaskAttemptContext context = mockTaskAttemptContext(conf);
    assertEquals(1, splits.size());
    String fileText = null;
    for (InputSplit split : splits) {
        GridFSInputFormat.GridFSTextRecordReader reader = new GridFSInputFormat.GridFSTextRecordReader();
        reader.initialize(split, context);
        int i;
        for (i = 0; reader.nextKeyValue(); ++i) {
            fileText = reader.getCurrentValue().toString();
        }
        assertEquals(1, i);
    }
    assertEquals(fileContents.toString(), fileText);
}
Also used : Configuration(org.apache.hadoop.conf.Configuration) TaskAttemptContext(org.apache.hadoop.mapreduce.TaskAttemptContext) JobContext(org.apache.hadoop.mapreduce.JobContext) InputSplit(org.apache.hadoop.mapreduce.InputSplit) Test(org.junit.Test) BaseHadoopTest(com.mongodb.hadoop.testutils.BaseHadoopTest)
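mockJobContext and mockTaskAttemptContext are helpers defined elsewhere in the mongo-hadoop test class, so their implementation is not shown here. A plausible sketch with Mockito, assuming the code under test only needs getConfiguration from the contexts:

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Hypothetical stand-ins for the test helpers: stub only getConfiguration,
// which is all the GridFS reader needs from these contexts in this test.
final class ContextMocks {

    static JobContext mockJobContext(Configuration conf) {
        JobContext jobContext = mock(JobContext.class);
        when(jobContext.getConfiguration()).thenReturn(conf);
        return jobContext;
    }

    static TaskAttemptContext mockTaskAttemptContext(Configuration conf) {
        TaskAttemptContext context = mock(TaskAttemptContext.class);
        when(context.getConfiguration()).thenReturn(conf);
        return context;
    }
}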

Aggregations

JobContext (org.apache.hadoop.mapreduce.JobContext): 85
Configuration (org.apache.hadoop.conf.Configuration): 41
Job (org.apache.hadoop.mapreduce.Job): 35
TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext): 34
Test (org.junit.Test): 31
JobContextImpl (org.apache.hadoop.mapreduce.task.JobContextImpl): 29
InputSplit (org.apache.hadoop.mapreduce.InputSplit): 28
TaskAttemptContextImpl (org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl): 25
Path (org.apache.hadoop.fs.Path): 24
IOException (java.io.IOException): 22
File (java.io.File): 19
TaskAttemptID (org.apache.hadoop.mapreduce.TaskAttemptID): 16
ArrayList (java.util.ArrayList): 13
RecordWriter (org.apache.hadoop.mapreduce.RecordWriter): 11
JobConf (org.apache.hadoop.mapred.JobConf): 10
OutputCommitter (org.apache.hadoop.mapreduce.OutputCommitter): 10
LongWritable (org.apache.hadoop.io.LongWritable): 9
MapFile (org.apache.hadoop.io.MapFile): 9
JobID (org.apache.hadoop.mapreduce.JobID): 7
FileSystem (org.apache.hadoop.fs.FileSystem): 6