Example 1 with AggregatorGroupByFunction

Use of co.cask.cdap.etl.spark.function.AggregatorGroupByFunction in project cdap by caskdata.

Class RDDCollection, method aggregate.

@Override
public SparkCollection<RecordInfo<Object>> aggregate(StageSpec stageSpec, @Nullable Integer partitions, StageStatisticsCollector collector) {
    PluginFunctionContext pluginFunctionContext = new PluginFunctionContext(stageSpec, sec, collector);
    // Key each record: the group-by function emits (key, record) pairs.
    PairFlatMapFunc<T, Object, T> groupByFunction = new AggregatorGroupByFunction<>(pluginFunctionContext);
    PairFlatMapFunction<T, Object, T> sparkGroupByFunction = Compat.convert(groupByFunction);
    JavaPairRDD<Object, T> keyedCollection = rdd.flatMapToPair(sparkGroupByFunction);
    // Collect records that share a key, using the requested partition count if one was given.
    JavaPairRDD<Object, Iterable<T>> groupedCollection = partitions == null ? keyedCollection.groupByKey() : keyedCollection.groupByKey(partitions);
    // Turn each (key, records) group into zero or more output records.
    FlatMapFunc<Tuple2<Object, Iterable<T>>, RecordInfo<Object>> aggregateFunction = new AggregatorAggregateFunction<>(pluginFunctionContext);
    FlatMapFunction<Tuple2<Object, Iterable<T>>, RecordInfo<Object>> sparkAggregateFunction = Compat.convert(aggregateFunction);
    return wrap(groupedCollection.flatMap(sparkAggregateFunction));
}
Also used: RecordInfo (co.cask.cdap.etl.common.RecordInfo), AggregatorAggregateFunction (co.cask.cdap.etl.spark.function.AggregatorAggregateFunction), PluginFunctionContext (co.cask.cdap.etl.spark.function.PluginFunctionContext), Tuple2 (scala.Tuple2), AggregatorGroupByFunction (co.cask.cdap.etl.spark.function.AggregatorGroupByFunction)
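
The method above follows a three-step shape: flatMapToPair to key the records, groupByKey to collect them, and flatMap to aggregate each group. The following is a minimal sketch of that shape using only java.util collections, with no Spark or CDAP dependency. The class GroupThenAggregateSketch, the interfaces GroupBy and Aggregate, and the helper countByFirstLetter are hypothetical names introduced for illustration; they are not part of CDAP or Spark, and the sketch does not reproduce how AggregatorGroupByFunction or AggregatorAggregateFunction are implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, in-memory illustration of "key, group, aggregate".
final class GroupThenAggregateSketch {

    // Hypothetical stand-in for a group-by function: a record may emit zero or more keys.
    interface GroupBy<T, K> {
        List<K> keysFor(T record);
    }

    // Hypothetical stand-in for an aggregate function: a group may emit zero or more outputs.
    interface Aggregate<T, K, O> {
        List<O> apply(K key, List<T> group);
    }

    static <T, K, O> List<O> aggregate(List<T> records, GroupBy<T, K> groupBy, Aggregate<T, K, O> aggregate) {
        // Steps 1 and 2: emit (key, record) pairs and collect the records that share a key.
        // LinkedHashMap keeps keys in first-seen order so the output is deterministic.
        Map<K, List<T>> grouped = new LinkedHashMap<>();
        for (T record : records) {
            for (K key : groupBy.keysFor(record)) {
                grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(record);
            }
        }
        // Step 3: flat-map every (key, group) entry into output records.
        List<O> output = new ArrayList<>();
        for (Map.Entry<K, List<T>> entry : grouped.entrySet()) {
            output.addAll(aggregate.apply(entry.getKey(), entry.getValue()));
        }
        return output;
    }

    // Example aggregation: count words per first letter, dropping empty strings.
    static List<String> countByFirstLetter(List<String> words) {
        return GroupThenAggregateSketch.<String, Character, String>aggregate(
            words,
            word -> {
                if (word.isEmpty()) {
                    return Collections.<Character>emptyList();
                }
                return Collections.singletonList(word.charAt(0));
            },
            (key, group) -> Collections.singletonList(key + "=" + group.size()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "", "blueberry", "cherry");
        System.out.println(countByFirstLetter(words)); // prints [a=2, b=2, c=1]
    }
}
```

Because the group-by step returns a list of keys, a record can be dropped (empty list) or placed in several groups, which is why the RDD version uses flatMapToPair rather than mapToPair.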

Example 2 with AggregatorGroupByFunction

Use of co.cask.cdap.etl.spark.function.AggregatorGroupByFunction in project cdap by caskdata.

Class RDDCollection, method aggregate.

@Override
public SparkCollection<Tuple2<Boolean, Object>> aggregate(StageInfo stageInfo, @Nullable Integer partitions) {
    PluginFunctionContext pluginFunctionContext = new PluginFunctionContext(stageInfo, sec);
    // Key each record: the group-by function emits (key, record) pairs.
    PairFlatMapFunc<T, Object, T> groupByFunction = new AggregatorGroupByFunction<>(pluginFunctionContext);
    PairFlatMapFunction<T, Object, T> sparkGroupByFunction = Compat.convert(groupByFunction);
    JavaPairRDD<Object, T> keyedCollection = rdd.flatMapToPair(sparkGroupByFunction);
    // Collect records that share a key, using the requested partition count if one was given.
    JavaPairRDD<Object, Iterable<T>> groupedCollection = partitions == null ? keyedCollection.groupByKey() : keyedCollection.groupByKey(partitions);
    // Turn each (key, records) group into zero or more (Boolean, Object) output tuples.
    FlatMapFunc<Tuple2<Object, Iterable<T>>, Tuple2<Boolean, Object>> aggregateFunction = new AggregatorAggregateFunction<>(pluginFunctionContext);
    FlatMapFunction<Tuple2<Object, Iterable<T>>, Tuple2<Boolean, Object>> sparkAggregateFunction = Compat.convert(aggregateFunction);
    return wrap(groupedCollection.flatMap(sparkAggregateFunction));
}
Also used: AggregatorAggregateFunction (co.cask.cdap.etl.spark.function.AggregatorAggregateFunction), PluginFunctionContext (co.cask.cdap.etl.spark.function.PluginFunctionContext), Tuple2 (scala.Tuple2), AggregatorGroupByFunction (co.cask.cdap.etl.spark.function.AggregatorGroupByFunction)

Aggregations

AggregatorAggregateFunction (co.cask.cdap.etl.spark.function.AggregatorAggregateFunction): 2
AggregatorGroupByFunction (co.cask.cdap.etl.spark.function.AggregatorGroupByFunction): 2
PluginFunctionContext (co.cask.cdap.etl.spark.function.PluginFunctionContext): 2
Tuple2 (scala.Tuple2): 2
RecordInfo (co.cask.cdap.etl.common.RecordInfo): 1