
Example 11 with MapReduceContext

Use of co.cask.cdap.api.mapreduce.MapReduceContext in project cdap by caskdata.

The initialize method of the class HitCounterProgram.

@Override
public void initialize() throws Exception {
    MapReduceContext context = getContext();
    // Configure the underlying Hadoop job with the mapper and reducer classes.
    Job job = context.getHadoopJob();
    job.setMapperClass(Emitter.class);
    job.setReducerClass(Counter.class);
    // Read from the log stream and write the hit counts to the dataset.
    context.addInput(Input.ofStream(LogAnalysisApp.LOG_STREAM));
    context.addOutput(Output.ofDataset(LogAnalysisApp.HIT_COUNT_STORE));
}
Also used: MapReduceContext (co.cask.cdap.api.mapreduce.MapReduceContext), Job (org.apache.hadoop.mapreduce.Job)
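
The Emitter and Counter classes wired up above are ordinary Hadoop Mapper and Reducer implementations. The following is a minimal sketch of what they might look like, assuming the input is a line-oriented log stream and the output dataset accepts byte[] keys and values; the actual LogAnalysisApp classes may differ.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical mapper: emits (url, 1) for every log line.
class Emitter extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text url = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Assumes the requested URL is the first whitespace-separated token of the log line.
        url.set(value.toString().split("\\s+")[0]);
        context.write(url, ONE);
    }
}

// Hypothetical reducer: sums the counts per URL and writes them to the hit-count dataset.
class Counter extends Reducer<Text, IntWritable, byte[], byte[]> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        context.write(key.copyBytes(), String.valueOf(sum).getBytes(StandardCharsets.UTF_8));
    }
}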

Example 12 with MapReduceContext

Use of co.cask.cdap.api.mapreduce.MapReduceContext in project cdap by caskdata.

The initialize method of the class DataCleansingMapReduce.

@Override
public void initialize() throws Exception {
    MapReduceContext context = getContext();
    partitionCommitter = PartitionBatchInput.setInput(context, DataCleansing.RAW_RECORDS, new KVTableStatePersistor(DataCleansing.CONSUMING_STATE, "state.key"));
    // Each run writes its output to a time partition taken from the runtime arguments
    Long timeKey = Long.valueOf(context.getRuntimeArguments().get(OUTPUT_PARTITION_KEY));
    PartitionKey outputKey = PartitionKey.builder().addLongField("time", timeKey).build();
    Map<String, String> metadataToAssign = ImmutableMap.of("source.program", "DataCleansingMapReduce");
    // set up two outputs - one for invalid records and one for valid records
    Map<String, String> invalidRecordsArgs = new HashMap<>();
    PartitionedFileSetArguments.setOutputPartitionKey(invalidRecordsArgs, outputKey);
    PartitionedFileSetArguments.setOutputPartitionMetadata(invalidRecordsArgs, metadataToAssign);
    context.addOutput(Output.ofDataset(DataCleansing.INVALID_RECORDS, invalidRecordsArgs));
    Map<String, String> cleanRecordsArgs = new HashMap<>();
    PartitionedFileSetArguments.setDynamicPartitioner(cleanRecordsArgs, TimeAndZipPartitioner.class);
    PartitionedFileSetArguments.setOutputPartitionMetadata(cleanRecordsArgs, metadataToAssign);
    context.addOutput(Output.ofDataset(DataCleansing.CLEAN_RECORDS, cleanRecordsArgs));
    Job job = context.getHadoopJob();
    job.setMapperClass(SchemaMatchingFilter.class);
    job.setNumReduceTasks(0);
    // simply propagate the schema (if any) to be used by the mapper
    String schemaJson = context.getRuntimeArguments().get(SCHEMA_KEY);
    if (schemaJson != null) {
        job.getConfiguration().set(SCHEMA_KEY, schemaJson);
    }
}
Also used: MapReduceContext (co.cask.cdap.api.mapreduce.MapReduceContext), KVTableStatePersistor (co.cask.cdap.api.dataset.lib.partitioned.KVTableStatePersistor), HashMap (java.util.HashMap), PartitionKey (co.cask.cdap.api.dataset.lib.PartitionKey), Job (org.apache.hadoop.mapreduce.Job)
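
On the mapper side, the schema propagated through the job configuration can be read back in setup(). Below is a minimal sketch of that half of the contract, assuming the SCHEMA_KEY constant matches the one used in initialize() and a mapper with Text input; the real SchemaMatchingFilter and its routing of clean versus invalid records are not shown here.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical sketch of a schema-filtering mapper; key/value types are assumptions.
class SchemaMatchingFilter extends Mapper<LongWritable, Text, NullWritable, Text> {
    // Assumed to be the same constant used by DataCleansingMapReduce above.
    static final String SCHEMA_KEY = "schema.key";
    private String schemaJson;

    @Override
    protected void setup(Context context) {
        // Picks up the schema that initialize() stored via job.getConfiguration().set(SCHEMA_KEY, ...).
        schemaJson = context.getConfiguration().get(SCHEMA_KEY);
    }

    @Override
    protected void map(LongWritable key, Text record, Context context) throws IOException, InterruptedException {
        // Forward only records that match the schema; the real job routes non-matching
        // records to the INVALID_RECORDS output rather than dropping them.
        if (schemaJson == null || matchesSchema(schemaJson, record.toString())) {
            context.write(NullWritable.get(), record);
        }
    }

    private boolean matchesSchema(String schemaJson, String record) {
        // Placeholder for the actual schema-validation logic.
        return true;
    }
}

Note that this example also expects the OUTPUT_PARTITION_KEY and SCHEMA_KEY values to be supplied as runtime arguments when the MapReduce program is started, since initialize() reads them from context.getRuntimeArguments().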

Aggregations

MapReduceContext (co.cask.cdap.api.mapreduce.MapReduceContext) 12
Job (org.apache.hadoop.mapreduce.Job) 12
PartitionKey (co.cask.cdap.api.dataset.lib.PartitionKey) 3
HashMap (java.util.HashMap) 3
PartitionedFileSet (co.cask.cdap.api.dataset.lib.PartitionedFileSet) 2
WorkflowToken (co.cask.cdap.api.workflow.WorkflowToken) 2
Resources (co.cask.cdap.api.Resources) 1
TimePartitionedFileSet (co.cask.cdap.api.dataset.lib.TimePartitionedFileSet) 1
KVTableStatePersistor (co.cask.cdap.api.dataset.lib.partitioned.KVTableStatePersistor) 1
MacroEvaluator (co.cask.cdap.api.macro.MacroEvaluator) 1
Value (co.cask.cdap.api.workflow.Value) 1
BatchAggregator (co.cask.cdap.etl.api.batch.BatchAggregator) 1
BatchJoiner (co.cask.cdap.etl.api.batch.BatchJoiner) 1
BatchSinkContext (co.cask.cdap.etl.api.batch.BatchSinkContext) 1
BatchSourceContext (co.cask.cdap.etl.api.batch.BatchSourceContext) 1
BatchPhaseSpec (co.cask.cdap.etl.batch.BatchPhaseSpec) 1
DefaultAggregatorContext (co.cask.cdap.etl.batch.DefaultAggregatorContext) 1
DefaultJoinerContext (co.cask.cdap.etl.batch.DefaultJoinerContext) 1
PipelinePluginInstantiator (co.cask.cdap.etl.batch.PipelinePluginInstantiator) 1
StageFailureException (co.cask.cdap.etl.batch.StageFailureException) 1