Search in sources:

Example 1 with DatasetStatePersistor

Use of co.cask.cdap.api.dataset.lib.DatasetStatePersistor in project cdap by caskdata.

From the class PartitionBatchInput, the method setInput:

/**
 * Used from the initialize method of the implementing batch job to configure a set of {@link Partition}s of a
 * {@link PartitionedFileSet} as input to the run of the batch job. It does this by reading back the previous
 * consumer state, determining the new partitions to read, computing the new state, and persisting it. It then
 * configures the dataset as input to the MapReduce context that is passed in.
 *
 * @param mapreduceContext the MapReduce context used to access the PartitionedFileSet, and on which the input is
 *                         configured
 * @param partitionedFileSetName the name of the {@link PartitionedFileSet} to consume partitions from
 * @param statePersistor a {@link DatasetStatePersistor} that defines how the partition consumer state is managed
 * @param consumerConfiguration defines parameters for the partition consumption
 * @return a BatchPartitionCommitter used to persist the state of the partition consumer
 */
public static BatchPartitionCommitter setInput(MapReduceContext mapreduceContext,
                                               String partitionedFileSetName,
                                               DatasetStatePersistor statePersistor,
                                               ConsumerConfiguration consumerConfiguration) {
    PartitionedFileSet partitionedFileSet = mapreduceContext.getDataset(partitionedFileSetName);
    // Wrap the user-supplied state persistor so it can be invoked via the MapReduce context.
    final PartitionConsumer partitionConsumer = new ConcurrentPartitionConsumer(
        partitionedFileSet, new DelegatingStatePersistor(mapreduceContext, statePersistor), consumerConfiguration);
    // Determine the new partitions to process, updating and persisting the consumer state.
    final List<PartitionDetail> consumedPartitions = partitionConsumer.consumePartitions().getPartitions();
    // Register the consumed partitions as input to this run of the batch job.
    Map<String, String> arguments = new HashMap<>();
    PartitionedFileSetArguments.addInputPartitions(arguments, consumedPartitions);
    mapreduceContext.addInput(Input.ofDataset(partitionedFileSetName, arguments));
    // The returned committer marks the partitions as processed (or available again) once the job finishes.
    return succeeded -> partitionConsumer.onFinish(consumedPartitions, succeeded);
}

Aggregations

Beta (co.cask.cdap.api.annotation.Beta): 1
Input (co.cask.cdap.api.data.batch.Input): 1
DatasetStatePersistor (co.cask.cdap.api.dataset.lib.DatasetStatePersistor): 1
Partition (co.cask.cdap.api.dataset.lib.Partition): 1
PartitionDetail (co.cask.cdap.api.dataset.lib.PartitionDetail): 1
PartitionedFileSet (co.cask.cdap.api.dataset.lib.PartitionedFileSet): 1
PartitionedFileSetArguments (co.cask.cdap.api.dataset.lib.PartitionedFileSetArguments): 1
MapReduceContext (co.cask.cdap.api.mapreduce.MapReduceContext): 1
HashMap (java.util.HashMap): 1
List (java.util.List): 1
Map (java.util.Map): 1
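
The statePersistor argument accepts any DatasetStatePersistor, which reads the previously persisted consumer state and writes the new state against a dataset obtained from the given DatasetContext. Below is a minimal, hypothetical sketch backed by a KeyValueTable; the readState/persistState signatures are assumed from the @Beta CDAP interface, and the table name and state key are constructor arguments chosen for illustration.

import co.cask.cdap.api.data.DatasetContext;
import co.cask.cdap.api.dataset.lib.DatasetStatePersistor;
import co.cask.cdap.api.dataset.lib.KeyValueTable;
import java.nio.charset.StandardCharsets;
import javax.annotation.Nullable;

public class KeyValueTableStatePersistor implements DatasetStatePersistor {
    private final String tableName;
    private final byte[] stateKey;

    public KeyValueTableStatePersistor(String tableName, String stateKey) {
        this.tableName = tableName;
        this.stateKey = stateKey.getBytes(StandardCharsets.UTF_8);
    }

    @Nullable
    @Override
    public byte[] readState(DatasetContext datasetContext) {
        // Returns null on the first run, before any state has been persisted.
        KeyValueTable stateTable = datasetContext.getDataset(tableName);
        return stateTable.read(stateKey);
    }

    @Override
    public void persistState(DatasetContext datasetContext, byte[] state) {
        KeyValueTable stateTable = datasetContext.getDataset(tableName);
        stateTable.write(stateKey, state);
    }
}

CDAP ships a comparable ready-made KeyValueTable-backed implementation, KVTableStatePersistor, which can be used instead of writing your own.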