
Example 26 with PartitionDetail

Use of co.cask.cdap.api.dataset.lib.PartitionDetail in project cdap by caskdata.

The setInput method of the class PartitionBatchInput:

/**
 * Used from the initialize method of the implementing batch job to configure, as input, a set of
 * {@link Partition}s of a {@link PartitionedFileSet} to be processed by the run of the batch job.
 * It does this by reading back the previous state, determining the new partitions to read, computing the new
 * state, and persisting this new state. It then configures this dataset as input to the MapReduce context that is
 * passed in.
 *
 * @param mapreduceContext the MapReduce context used to access the PartitionedFileSet, and on which the input is
 *                         configured
 * @param partitionedFileSetName the name of the {@link PartitionedFileSet} to consume partitions from
 * @param statePersistor a {@link DatasetStatePersistor} that defines how the partition consumer state is
 *                       managed
 * @param consumerConfiguration defines parameters for the partition consumption
 * @return a BatchPartitionCommitter used to persist the state of the partition consumer
 */
public static BatchPartitionCommitter setInput(MapReduceContext mapreduceContext, String partitionedFileSetName,
                                               DatasetStatePersistor statePersistor,
                                               ConsumerConfiguration consumerConfiguration) {
    PartitionedFileSet partitionedFileSet = mapreduceContext.getDataset(partitionedFileSetName);
    // Read back the previously persisted consumer state and determine the partitions to process in this run.
    final PartitionConsumer partitionConsumer =
        new ConcurrentPartitionConsumer(partitionedFileSet,
                                        new DelegatingStatePersistor(mapreduceContext, statePersistor),
                                        consumerConfiguration);
    final List<PartitionDetail> consumedPartitions = partitionConsumer.consumePartitions().getPartitions();
    // Configure the consumed partitions as the input of the MapReduce run.
    Map<String, String> arguments = new HashMap<>();
    PartitionedFileSetArguments.addInputPartitions(arguments, consumedPartitions);
    mapreduceContext.addInput(Input.ofDataset(partitionedFileSetName, arguments));
    // The returned committer marks the consumed partitions as processed (or returns them to the
    // working set) once the job finishes.
    return succeeded -> partitionConsumer.onFinish(consumedPartitions, succeeded);
}
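For context, a minimal sketch of how a MapReduce program might call setInput from its initialize method and commit the consumer state in destroy, following the pattern above. The program name, dataset names, and state row key here are illustrative assumptions, not taken from the example; KVTableStatePersistor and ConsumerConfiguration.DEFAULT are assumed to be the stock implementations shipped with CDAP.

import co.cask.cdap.api.ProgramStatus;
import co.cask.cdap.api.dataset.lib.partitioned.ConsumerConfiguration;
import co.cask.cdap.api.dataset.lib.partitioned.KVTableStatePersistor;
import co.cask.cdap.api.dataset.lib.partitioned.PartitionBatchInput;
import co.cask.cdap.api.mapreduce.AbstractMapReduce;

// Illustrative program: names of the datasets and the state row key are hypothetical.
public class EventCleansingMapReduce extends AbstractMapReduce {
    private PartitionBatchInput.BatchPartitionCommitter partitionCommitter;

    @Override
    public void initialize() throws Exception {
        // Consume the not-yet-processed partitions of "rawEvents" as this run's input;
        // consumer state is kept under the row key "state.key" in the "consumingState" table.
        partitionCommitter = PartitionBatchInput.setInput(getContext(), "rawEvents",
                                                          new KVTableStatePersistor("consumingState", "state.key"),
                                                          ConsumerConfiguration.DEFAULT);
    }

    @Override
    public void destroy() {
        // Mark the consumed partitions as processed if the run completed, or return them
        // to the working set otherwise, and persist the updated consumer state.
        boolean succeeded = getContext().getState().getStatus() == ProgramStatus.COMPLETED;
        partitionCommitter.onFinish(succeeded);
    }
}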
Also used : Input (co.cask.cdap.api.data.batch.Input), PartitionDetail (co.cask.cdap.api.dataset.lib.PartitionDetail), DatasetStatePersistor (co.cask.cdap.api.dataset.lib.DatasetStatePersistor), MapReduceContext (co.cask.cdap.api.mapreduce.MapReduceContext), List (java.util.List), PartitionedFileSet (co.cask.cdap.api.dataset.lib.PartitionedFileSet), Beta (co.cask.cdap.api.annotation.Beta), Map (java.util.Map), HashMap (java.util.HashMap), PartitionedFileSetArguments (co.cask.cdap.api.dataset.lib.PartitionedFileSetArguments), Partition (co.cask.cdap.api.dataset.lib.Partition)
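Since the DatasetStatePersistor parameter determines where the consumer state lives, here is a minimal sketch of a custom implementation backed by a KeyValueTable. It assumes the interface's two methods, readState and persistState, each receiving a DatasetContext; the class name and table handling are hypothetical, and in practice the bundled KVTableStatePersistor already covers this case.

import java.nio.charset.StandardCharsets;
import javax.annotation.Nullable;

import co.cask.cdap.api.data.DatasetContext;
import co.cask.cdap.api.dataset.lib.DatasetStatePersistor;
import co.cask.cdap.api.dataset.lib.KeyValueTable;

// Sketch only: persists the partition consumer state as a single row of a KeyValueTable.
public class SimpleKVStatePersistor implements DatasetStatePersistor {
    private final String tableName;
    private final byte[] stateKey;

    public SimpleKVStatePersistor(String tableName, String stateKey) {
        this.tableName = tableName;
        this.stateKey = stateKey.getBytes(StandardCharsets.UTF_8);
    }

    @Nullable
    @Override
    public byte[] readState(DatasetContext datasetContext) {
        // Returns null on the first run, before any state has been persisted.
        KeyValueTable table = datasetContext.getDataset(tableName);
        return table.read(stateKey);
    }

    @Override
    public void persistState(DatasetContext datasetContext, byte[] state) {
        KeyValueTable table = datasetContext.getDataset(tableName);
        table.write(stateKey, state);
    }
}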

Aggregations

PartitionDetail (co.cask.cdap.api.dataset.lib.PartitionDetail): 26 uses
PartitionedFileSet (co.cask.cdap.api.dataset.lib.PartitionedFileSet): 18 uses
PartitionKey (co.cask.cdap.api.dataset.lib.PartitionKey): 12 uses
Test (org.junit.Test): 11 uses
TransactionAware (org.apache.tephra.TransactionAware): 10 uses
TransactionExecutor (org.apache.tephra.TransactionExecutor): 10 uses
Location (org.apache.twill.filesystem.Location): 9 uses
IOException (java.io.IOException): 8 uses
HashMap (java.util.HashMap): 8 uses
HashSet (java.util.HashSet): 8 uses
DataSetException (co.cask.cdap.api.dataset.DataSetException): 6 uses
PartitionNotFoundException (co.cask.cdap.api.dataset.PartitionNotFoundException): 5 uses
PartitionAlreadyExistsException (co.cask.cdap.api.dataset.lib.PartitionAlreadyExistsException): 5 uses
PartitionOutput (co.cask.cdap.api.dataset.lib.PartitionOutput): 5 uses
ImmutableMap (com.google.common.collect.ImmutableMap): 4 uses
Predicate (co.cask.cdap.api.Predicate): 3 uses
KeyValueTable (co.cask.cdap.api.dataset.lib.KeyValueTable): 3 uses
Partition (co.cask.cdap.api.dataset.lib.Partition): 3 uses
PartitionFilter (co.cask.cdap.api.dataset.lib.PartitionFilter): 3 uses
ConcurrentPartitionConsumer (co.cask.cdap.api.dataset.lib.partitioned.ConcurrentPartitionConsumer): 3 uses