Example 1 with AggregatorGroupByFunction

Use of io.cdap.cdap.etl.spark.function.AggregatorGroupByFunction in project cdap by caskdata.

From the class BaseRDDCollection, method aggregate.

@Override
public SparkCollection<RecordInfo<Object>> aggregate(StageSpec stageSpec, @Nullable Integer partitions, StageStatisticsCollector collector) {
    PluginFunctionContext pluginFunctionContext = new PluginFunctionContext(stageSpec, sec, collector);
    // Pair each input record with its group key(s); a flat-map, so one record may emit zero or more pairs.
    PairFlatMapFunction<T, Object, T> groupByFunction = new AggregatorGroupByFunction<>(pluginFunctionContext, functionCacheFactory.newCache());
    JavaPairRDD<Object, T> keyedCollection = rdd.flatMapToPair(groupByFunction);
    // Bring records with the same key together, using the requested partition count when one is given.
    JavaPairRDD<Object, Iterable<T>> groupedCollection = partitions == null ? keyedCollection.groupByKey() : keyedCollection.groupByKey(partitions);
    // Aggregate each (key, records) group into zero or more output records.
    FlatMapFunction<Tuple2<Object, Iterable<T>>, RecordInfo<Object>> sparkAggregateFunction = new AggregatorAggregateFunction<>(pluginFunctionContext, functionCacheFactory.newCache());
    return wrap(groupedCollection.flatMap(sparkAggregateFunction));
}
Also used: PluginFunctionContext (io.cdap.cdap.etl.spark.function.PluginFunctionContext), RecordInfo (io.cdap.cdap.etl.common.RecordInfo), Tuple2 (scala.Tuple2), AggregatorAggregateFunction (io.cdap.cdap.etl.spark.function.AggregatorAggregateFunction), AggregatorGroupByFunction (io.cdap.cdap.etl.spark.function.AggregatorGroupByFunction)
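
The aggregate method above follows a three-step shape: key each record, group by key, then aggregate each group. The sketch below mirrors that shape with plain Java collections rather than Spark, so it runs without a cluster. It is only an illustration: the class GroupThenAggregateSketch, the methods groupByKeys and aggregate, and the sample data are hypothetical names introduced here and are not part of CDAP or Spark.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical stand-in for the flatMapToPair -> groupByKey -> flatMap pipeline.
public class GroupThenAggregateSketch {

    // Steps 1 and 2: each record may emit zero or more keys (like flatMapToPair),
    // and records sharing a key are collected together (like groupByKey).
    static <T, K> Map<K, List<T>> groupByKeys(List<T> records, Function<T, List<K>> keyFn) {
        Map<K, List<T>> grouped = new LinkedHashMap<>();
        for (T record : records) {
            for (K key : keyFn.apply(record)) {
                grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(record);
            }
        }
        return grouped;
    }

    // Step 3: each (key, records) group yields zero or more outputs (like flatMap).
    static <T, K, R> List<R> aggregate(Map<K, List<T>> grouped,
                                       BiFunction<K, List<T>, List<R>> aggFn) {
        List<R> out = new ArrayList<>();
        for (Map.Entry<K, List<T>> entry : grouped.entrySet()) {
            out.addAll(aggFn.apply(entry.getKey(), entry.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "banana", "blueberry", "cherry");
        // Key each word by its first letter, then count the words in each group.
        Map<Character, List<String>> grouped = groupByKeys(words, w -> List.of(w.charAt(0)));
        List<String> counts = aggregate(grouped, (k, vs) -> List.of(k + "=" + vs.size()));
        System.out.println(counts); // prints [a=2, b=2, c=1]
    }
}
```

Unlike this in-memory version, Spark's groupByKey shuffles data between executors, which is why the original method accepts an optional partition count.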