
Example 1 with OutputCollector

use of org.apache.hadoop.mapred.OutputCollector in project hadoop by apache.

the class PipeMapRed method waitOutputThreads.

void waitOutputThreads() throws IOException {
    try {
        if (outThread_ == null) {
            // This happens only when the reducer has empty input, so reduce() is
            // never called in this task. A reducer that still generates output in
            // that case is very uncommon, and we may not need to support it.
            // We therefore do not write this output to HDFS, but we still
            // consume it so that the reducer does not hang forever.
            OutputCollector collector = new OutputCollector() {

                public void collect(Object key, Object value) throws IOException {
                    // Just consume it; no need to write the record anywhere.
                }
            };
            //dummy reporter
            Reporter reporter = Reporter.NULL;
            startOutputThreads(collector, reporter);
        }
        int exitVal = sim.waitFor();
        // how'd it go?
        if (exitVal != 0) {
            if (nonZeroExitIsFailure_) {
                throw new RuntimeException("PipeMapRed.waitOutputThreads(): subprocess failed with code " + exitVal);
            } else {
                LOG.info("PipeMapRed.waitOutputThreads(): subprocess exited with " + "code " + exitVal + " in " + PipeMapRed.class.getName());
            }
        }
        if (outThread_ != null) {
            outThread_.join(joinDelay_);
        }
        if (errThread_ != null) {
            errThread_.join(joinDelay_);
        }
        if (outerrThreadsThrowable != null) {
            throw new RuntimeException(outerrThreadsThrowable);
        }
    } catch (InterruptedException e) {
        // ignore
    }
}
Also used: OutputCollector (org.apache.hadoop.mapred.OutputCollector), Reporter (org.apache.hadoop.mapred.Reporter)
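
The discard-everything collector in Example 1 can be tried in isolation. The following is a minimal sketch, not the Hadoop streaming code: `Collector` is a hypothetical stand-in that only mirrors the `collect(key, value) throws IOException` shape of `org.apache.hadoop.mapred.OutputCollector`, so the block compiles without Hadoop on the classpath, and `feed` is an illustrative helper that does not exist in Hadoop.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in mirroring the single-method shape of
// org.apache.hadoop.mapred.OutputCollector; not a Hadoop type.
interface Collector<K, V> {
    void collect(K key, V value) throws IOException;
}

class DiscardingCollectorDemo {
    // Illustrative helper: hands every line to the collector with a null key
    // and returns how many records were handed over.
    static int feed(Collector<String, String> collector, List<String> lines) throws IOException {
        int handed = 0;
        for (String line : lines) {
            collector.collect(null, line);
            handed++;
        }
        return handed;
    }

    public static void main(String[] args) throws IOException {
        // Consumes records and drops them, so the producer never blocks.
        Collector<String, String> discard = (key, value) -> { };

        // For contrast: a collector that actually keeps what it receives.
        List<String> stored = new ArrayList<>();
        Collector<String, String> keep = (key, value) -> stored.add(value);

        List<String> lines = Arrays.asList("a", "b", "c");
        System.out.println("discarded: " + feed(discard, lines));
        System.out.println("kept: " + feed(keep, lines) + " -> " + stored);
    }
}
```

Both collectors accept every record; only the second one retains anything. That is the point of the anonymous class in the excerpt: output is drained so the subprocess can finish, but nothing is written.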

Example 2 with OutputCollector

use of org.apache.hadoop.mapred.OutputCollector in project elephant-bird by twitter.

the class LzoBinaryScheme method sink.

@Override
public void sink(FlowProcess<JobConf> flowProcess, SinkCall<T, OutputCollector> sinkCall) throws IOException {
    OutputCollector collector = sinkCall.getOutput();
    TupleEntry entry = sinkCall.getOutgoingEntry();
    T writable = sinkCall.getContext();
    writable.set((M) entry.getTuple().getObject(0));
    collector.collect(null, writable);
}
Also used: OutputCollector (org.apache.hadoop.mapred.OutputCollector), TupleEntry (cascading.tuple.TupleEntry)

Example 3 with OutputCollector

use of org.apache.hadoop.mapred.OutputCollector in project parquet-mr by apache.

the class ParquetValueScheme method sink.

@SuppressWarnings("unchecked")
@Override
public void sink(FlowProcess<? extends JobConf> fp, SinkCall<Object[], OutputCollector> sc) throws IOException {
    TupleEntry tuple = sc.getOutgoingEntry();
    if (tuple.size() != 1) {
        throw new RuntimeException("ParquetValueScheme expects tuples with an arity of exactly 1, but found " + tuple.getFields());
    }
    T value = (T) tuple.getObject(0);
    OutputCollector output = sc.getOutput();
    output.collect(null, value);
}
Also used: OutputCollector (org.apache.hadoop.mapred.OutputCollector), TupleEntry (cascading.tuple.TupleEntry)
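
Examples 2 and 3 share one pattern: take the single field of an outgoing tuple and emit it as the value with a `null` key. A rough sketch of that pattern follows. It uses only the JDK: `ValueCollector` is a hypothetical stand-in for the `collect(key, value)` contract, and a plain `List` stands in for the Cascading tuple. Neither is a Hadoop or Cascading type.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the collect(key, value) contract; not a Hadoop type.
interface ValueCollector<K, V> {
    void collect(K key, V value) throws IOException;
}

class ValueOnlySinkDemo {
    // Emits the single field of a tuple as the value, with a null key,
    // and rejects tuples whose arity is not exactly 1.
    static <V> void sink(List<V> tuple, ValueCollector<Object, V> out) throws IOException {
        if (tuple.size() != 1) {
            throw new IllegalArgumentException(
                    "expected a tuple of arity exactly 1, but found " + tuple.size());
        }
        out.collect(null, tuple.get(0));
    }

    public static void main(String[] args) throws IOException {
        List<String> written = new ArrayList<>();
        // Records only the value; the key is ignored because it is always null here.
        ValueCollector<Object, String> out = (key, value) -> written.add(value);

        sink(Arrays.asList("record-1"), out);
        System.out.println(written);
    }
}
```

The sketch throws `IllegalArgumentException` for a wrong arity, whereas the Parquet excerpt throws a plain `RuntimeException`; that choice is an assumption of the sketch.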

Example 4 with OutputCollector

use of org.apache.hadoop.mapred.OutputCollector in project tez by apache.

the class MapProcessor method runOldMapper.

void runOldMapper(final JobConf job, final MRTaskReporter reporter, final MRInputLegacy input, final KeyValueWriter output) throws IOException, InterruptedException {
    // Initialize input in-line since it sets parameters which may be used by the processor.
    // Done only for MRInput.
    // TODO use new method in MRInput to get required info
    // input.initialize(job, master);
    InputSplit inputSplit = input.getOldInputSplit();
    updateJobWithSplit(job, inputSplit);
    RecordReader in = new OldRecordReader(input);
    OutputCollector collector = new OldOutputCollector(output);
    MapRunnable runner = (MapRunnable) ReflectionUtils.newInstance(job.getMapRunnerClass(), job);
    runner.run(in, collector, (Reporter) reporter);
    // Set progress to 1.0f if there was no exception.
    reporter.setProgress(1.0f);
    // start the sort phase only if there are reducers
    this.statusUpdate();
}
Also used: OutputCollector (org.apache.hadoop.mapred.OutputCollector), MapRunnable (org.apache.hadoop.mapred.MapRunnable), RecordReader (org.apache.hadoop.mapred.RecordReader), InputSplit (org.apache.hadoop.mapred.InputSplit)

Example 5 with OutputCollector

use of org.apache.hadoop.mapred.OutputCollector in project tez by apache.

the class ReduceProcessor method runOldReducer.

void runOldReducer(JobConf job, final MRTaskReporter reporter, KeyValuesReader input, RawComparator comparator, Class keyClass, Class valueClass, final KeyValueWriter output) throws IOException, InterruptedException {
    Reducer reducer = ReflectionUtils.newInstance(job.getReducerClass(), job);
    // make output collector
    OutputCollector collector = new OutputCollector() {

        public void collect(Object key, Object value) throws IOException {
            output.write(key, value);
        }
    };
    // apply reduce function
    try {
        ReduceValuesIterator values = new ReduceValuesIterator(input, reporter, reduceInputValueCounter);
        values.informReduceProgress();
        while (values.more()) {
            reduceInputKeyCounter.increment(1);
            reducer.reduce(values.getKey(), values, collector, reporter);
            values.informReduceProgress();
        }
        // Set progress to 1.0f if there was no exception.
        reporter.setProgress(1.0f);
        // Clean up: repeated in catch block below
        reducer.close();
        // End of clean up.
    } catch (IOException ioe) {
        try {
            reducer.close();
        } catch (IOException ignored) {
        }
        throw ioe;
    }
}
Also used: OutputCollector (org.apache.hadoop.mapred.OutputCollector), IOException (java.io.IOException), Reducer (org.apache.hadoop.mapred.Reducer)
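
Example 5 combines two ideas: a collector that forwards each record to a writer, and a reducer that is closed both on success and on failure. The sketch below illustrates both with JDK types only. It swaps in try-with-resources for the duplicated `reducer.close()` calls in the excerpt, so it is not what the Tez code does. One difference is that it closes the reducer on any exception, not only on `IOException`. `KeyValueCollector`, `SimpleReducer`, `SumReducer` and `ReduceLoopDemo` are hypothetical names introduced for illustration.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; none of these are Hadoop or Tez types.
interface KeyValueCollector<K, V> {
    void collect(K key, V value) throws IOException;
}

interface SimpleReducer<K, V> extends Closeable {
    void reduce(K key, List<V> values, KeyValueCollector<K, V> out) throws IOException;
}

// Sums the values of each key and remembers whether close() was called.
class SumReducer implements SimpleReducer<String, Integer> {
    boolean closed = false;

    @Override
    public void reduce(String key, List<Integer> values,
                       KeyValueCollector<String, Integer> out) throws IOException {
        if (values.isEmpty()) {
            throw new IOException("no values for key " + key);
        }
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        out.collect(key, sum);
    }

    @Override
    public void close() {
        closed = true;
    }
}

class ReduceLoopDemo {
    static void runReducer(SumReducer reducer, Map<String, List<Integer>> input,
                           Map<String, Integer> sink) throws IOException {
        // Adapter: every collected record is forwarded to the sink map.
        KeyValueCollector<String, Integer> collector = (key, value) -> sink.put(key, value);
        // try-with-resources closes the reducer whether or not reduce() throws.
        try (SumReducer r = reducer) {
            for (Map.Entry<String, List<Integer>> e : input.entrySet()) {
                r.reduce(e.getKey(), e.getValue(), collector);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<Integer>> input = new LinkedHashMap<>();
        input.put("a", Arrays.asList(1, 2, 3));
        input.put("b", Arrays.asList(10));
        Map<String, Integer> sink = new LinkedHashMap<>();
        SumReducer reducer = new SumReducer();
        runReducer(reducer, input, sink);
        System.out.println(sink + " closed=" + reducer.closed);
    }
}
```

If `close()` itself could throw, try-with-resources would attach that exception to the primary one as a suppressed exception. The excerpt instead catches and drops the secondary `IOException`.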

Aggregations

OutputCollector (org.apache.hadoop.mapred.OutputCollector): 17
TupleEntry (cascading.tuple.TupleEntry): 9
ImmutableBytesWritable (org.apache.hadoop.hbase.io.ImmutableBytesWritable): 4
Tuple (cascading.tuple.Tuple): 3
JobConf (org.apache.hadoop.mapred.JobConf): 3
RecordReader (org.apache.hadoop.mapred.RecordReader): 3
Reporter (org.apache.hadoop.mapred.Reporter): 3
Test (org.junit.Test): 3
FlowProcess (cascading.flow.FlowProcess): 2
HadoopFlowProcess (cascading.flow.hadoop.HadoopFlowProcess): 2
TempHfs (cascading.tap.hadoop.util.TempHfs): 2
Fields (cascading.tuple.Fields): 2
Configuration (org.apache.hadoop.conf.Configuration): 2
Put (org.apache.hadoop.hbase.client.Put): 2
Result (org.apache.hadoop.hbase.client.Result): 2
InputSplit (org.apache.hadoop.mapred.InputSplit): 2
Reducer (org.apache.hadoop.mapred.Reducer): 2
IOException (java.io.IOException): 1
AtomicBoolean (java.util.concurrent.atomic.AtomicBoolean): 1
Cell (org.apache.hadoop.hbase.Cell): 1