
Example 46 with OutputCommitter

Use of org.apache.hadoop.mapreduce.OutputCommitter in project cdap by caskdata.

The setupJob method of the MultipleOutputsCommitter class:

@Override
public void setupJob(JobContext jobContext) throws IOException {
    // Delegate job setup to each named output's committer, giving each one
    // a JobContext scoped to its own named output.
    for (Map.Entry<String, OutputCommitter> committer : committers.entrySet()) {
        JobContext namedJobContext = MultipleOutputs.getNamedJobContext(jobContext, committer.getKey());
        committer.getValue().setupJob(namedJobContext);
    }
}
Also used: OutputCommitter (org.apache.hadoop.mapreduce.OutputCommitter), JobContext (org.apache.hadoop.mapreduce.JobContext), Map (java.util.Map)
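A composite committer like the one above must apply the same delegation to every lifecycle call, not just setupJob. The following is a minimal, self-contained sketch of that pattern; the Committer interface, CompositeCommitter class, and output names ("text", "avro") are hypothetical stand-ins for illustration, not Hadoop's OutputCommitter or the CDAP source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompositeCommitterDemo {

    // Hypothetical stand-in for one lifecycle method of Hadoop's OutputCommitter.
    interface Committer {
        void setupJob(String jobId);
    }

    // Delegates the lifecycle call to every named committer, in registration
    // order, mirroring MultipleOutputsCommitter.setupJob above.
    static class CompositeCommitter implements Committer {
        private final Map<String, Committer> delegates;

        CompositeCommitter(Map<String, Committer> delegates) {
            this.delegates = delegates;
        }

        @Override
        public void setupJob(String jobId) {
            for (Committer c : delegates.values()) {
                c.setupJob(jobId);
            }
        }
    }

    // Records the order in which delegates are invoked.
    static final List<String> calls = new ArrayList<>();

    public static void main(String[] args) {
        Map<String, Committer> delegates = new LinkedHashMap<>();
        delegates.put("text", id -> calls.add("text:" + id));
        delegates.put("avro", id -> calls.add("avro:" + id));

        new CompositeCommitter(delegates).setupJob("job_1");
        System.out.println(calls);
        // prints: [text:job_1, avro:job_1]
    }
}
```

Because the delegates live in a LinkedHashMap, the first output registered is always the first one set up, which is exactly the ordering guarantee the CDAP code relies on.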

Example 47 with OutputCommitter

Use of org.apache.hadoop.mapreduce.OutputCommitter in project cdap by caskdata.

The getOutputCommitter method of the MultipleOutputsMainOutputWrapper class:

@Override
public synchronized OutputCommitter getOutputCommitter(TaskAttemptContext context) throws IOException, InterruptedException {
    // Return a MultipleOutputsCommitter that commits for the root output format
    // as well as all delegate output formats.
    if (committer == null) {
        // use a linked hash map: it preserves the order of insertion, so the output committers are called in the
        // same order as outputs were added. This makes multi-output a little more predictable (and testable).
        Map<String, OutputCommitter> committers = new LinkedHashMap<>();
        for (String name : MultipleOutputs.getNamedOutputsList(context)) {
            Class<? extends OutputFormat> namedOutputFormatClass = MultipleOutputs.getNamedOutputFormatClass(context, name);
            TaskAttemptContext namedContext = MultipleOutputs.getNamedTaskContext(context, name);
            OutputFormat<K, V> outputFormat = ReflectionUtils.newInstance(namedOutputFormatClass, namedContext.getConfiguration());
            committers.put(name, outputFormat.getOutputCommitter(namedContext));
        }
        committer = new MultipleOutputsCommitter(committers);
    }
    return committer;
}
Also used: OutputCommitter (org.apache.hadoop.mapreduce.OutputCommitter), TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext), LinkedHashMap (java.util.LinkedHashMap)
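The comment in the snippet above hinges on LinkedHashMap's iteration-order guarantee: unlike HashMap, it iterates entries in insertion order. A small standalone sketch of that difference, with hypothetical output names chosen for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class InsertionOrderDemo {
    public static void main(String[] args) {
        // LinkedHashMap iterates keys in the order they were inserted, so
        // committers registered first are set up and committed first.
        Map<String, Integer> ordered = new LinkedHashMap<>();
        ordered.put("textOutput", 1);     // hypothetical named outputs
        ordered.put("avroOutput", 2);
        ordered.put("parquetOutput", 3);

        System.out.println(String.join(",", ordered.keySet()));
        // prints: textOutput,avroOutput,parquetOutput
    }
}
```

A plain HashMap makes no ordering promise at all, which is why the CDAP code calls the choice out explicitly: deterministic committer order makes multi-output behavior predictable and testable.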

Aggregations

OutputCommitter (org.apache.hadoop.mapreduce.OutputCommitter): 47
Test (org.junit.Test): 29
Configuration (org.apache.hadoop.conf.Configuration): 23
TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext): 18
JobContext (org.apache.hadoop.mapreduce.JobContext): 13
CommitterEventHandler (org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler): 13
JobEvent (org.apache.hadoop.mapreduce.v2.app.job.event.JobEvent): 11
AsyncDispatcher (org.apache.hadoop.yarn.event.AsyncDispatcher): 11
TaskAttemptID (org.apache.hadoop.mapreduce.TaskAttemptID): 10
IOException (java.io.IOException): 8
JobTaskEvent (org.apache.hadoop.mapreduce.v2.app.job.event.JobTaskEvent): 8
HashMap (java.util.HashMap): 7
JobId (org.apache.hadoop.mapreduce.v2.api.records.JobId): 7
ArrayList (java.util.ArrayList): 6
Map (java.util.Map): 6
NullWritable (org.apache.hadoop.io.NullWritable): 6
TaskId (org.apache.hadoop.mapreduce.v2.api.records.TaskId): 6
AppContext (org.apache.hadoop.mapreduce.v2.app.AppContext): 6
JobStartEvent (org.apache.hadoop.mapreduce.v2.app.job.event.JobStartEvent): 6
TaskAttemptEvent (org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEvent): 6