Example 6 with GroupReduceFunction

Use of org.apache.flink.api.common.functions.GroupReduceFunction in the Apache Flink project.

From the class IterationWithChainingITCase, method testProgram.

@Override
protected void testProgram() throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(4);
    DataSet<Tuple2<Integer, CoordVector>> initialInput = env.readFile(new PointInFormat(), dataPath).setParallelism(1).name("Input");
    IterativeDataSet<Tuple2<Integer, CoordVector>> iteration = initialInput.iterate(2).name("Loop");
    // identity group-reduce; the identity map that follows should get chained to it by the runtime
    DataSet<Tuple2<Integer, CoordVector>> identity = iteration.groupBy(0).reduceGroup(new GroupReduceFunction<Tuple2<Integer, CoordVector>, Tuple2<Integer, CoordVector>>() {

        @Override
        public void reduce(Iterable<Tuple2<Integer, CoordVector>> values, Collector<Tuple2<Integer, CoordVector>> out) throws Exception {
            for (Tuple2<Integer, CoordVector> value : values) {
                out.collect(value);
            }
        }
    }).map(new MapFunction<Tuple2<Integer, CoordVector>, Tuple2<Integer, CoordVector>>() {

        @Override
        public Tuple2<Integer, CoordVector> map(Tuple2<Integer, CoordVector> value) throws Exception {
            return value;
        }
    });
    iteration.closeWith(identity).writeAsFormattedText(resultPath, new PointFormatter());
    env.execute("Iteration with chained map test");
    compareResultsByLinesInMemory(DATA_POINTS, resultPath);
}
Also used: ExecutionEnvironment (org.apache.flink.api.java.ExecutionEnvironment), GroupReduceFunction (org.apache.flink.api.common.functions.GroupReduceFunction), CoordVector (org.apache.flink.test.util.CoordVector), PointFormatter (org.apache.flink.test.util.PointFormatter), PointInFormat (org.apache.flink.test.util.PointInFormat), Tuple2 (org.apache.flink.api.java.tuple.Tuple2), Collector (org.apache.flink.util.Collector)
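
For reference, the iterate/closeWith pattern the test builds on, reduced to a minimal standalone sketch (toy data, illustrative only; the identity reduce and map from the test are replaced by a single increment map):

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// start a bulk iteration with at most 10 supersteps
IterativeDataSet<Long> loop = env.fromElements(1L, 2L, 3L).iterate(10);
// the step function: any chain of transformations on the iteration's data set
DataSet<Long> step = loop.map(new MapFunction<Long, Long>() {

    @Override
    public Long map(Long value) {
        return value + 1;
    }
});
// closeWith feeds the step result back in and returns the final iteration output
DataSet<Long> result = loop.closeWith(step);
result.print();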

Example 7 with GroupReduceFunction

Use of org.apache.flink.api.common.functions.GroupReduceFunction in the Apache Flink project.

From the class GroupReduceOperatorTest, method testGroupReduceCollection.

@Test
public void testGroupReduceCollection() {
    try {
        final GroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>> reducer = new GroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {

            @Override
            public void reduce(Iterable<Tuple2<String, Integer>> values, Collector<Tuple2<String, Integer>> out) throws Exception {
                Iterator<Tuple2<String, Integer>> input = values.iterator();
                Tuple2<String, Integer> result = input.next();
                int sum = result.f1;
                while (input.hasNext()) {
                    Tuple2<String, Integer> next = input.next();
                    sum += next.f1;
                }
                result.f1 = sum;
                out.collect(result);
            }
        };
        GroupReduceOperatorBase<Tuple2<String, Integer>, Tuple2<String, Integer>, GroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>> op =
                new GroupReduceOperatorBase<Tuple2<String, Integer>, Tuple2<String, Integer>, GroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>>(
                        reducer,
                        new UnaryOperatorInformation<Tuple2<String, Integer>, Tuple2<String, Integer>>(
                                TypeInfoParser.<Tuple2<String, Integer>>parse("Tuple2<String, Integer>"),
                                TypeInfoParser.<Tuple2<String, Integer>>parse("Tuple2<String, Integer>")),
                        new int[] { 0 },
                        "TestReducer");
        List<Tuple2<String, Integer>> input = new ArrayList<Tuple2<String, Integer>>(asList(
                new Tuple2<String, Integer>("foo", 1),
                new Tuple2<String, Integer>("foo", 3),
                new Tuple2<String, Integer>("bar", 2),
                new Tuple2<String, Integer>("bar", 4)));
        ExecutionConfig executionConfig = new ExecutionConfig();
        executionConfig.disableObjectReuse();
        List<Tuple2<String, Integer>> resultMutableSafe = op.executeOnCollections(input, null, executionConfig);
        executionConfig.enableObjectReuse();
        List<Tuple2<String, Integer>> resultRegular = op.executeOnCollections(input, null, executionConfig);
        Set<Tuple2<String, Integer>> resultSetMutableSafe = new HashSet<Tuple2<String, Integer>>(resultMutableSafe);
        Set<Tuple2<String, Integer>> resultSetRegular = new HashSet<Tuple2<String, Integer>>(resultRegular);
        Set<Tuple2<String, Integer>> expectedResult = new HashSet<Tuple2<String, Integer>>(asList(new Tuple2<String, Integer>("foo", 4), new Tuple2<String, Integer>("bar", 6)));
        assertEquals(expectedResult, resultSetMutableSafe);
        assertEquals(expectedResult, resultSetRegular);
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}
Also used: RichGroupReduceFunction (org.apache.flink.api.common.functions.RichGroupReduceFunction), GroupReduceFunction (org.apache.flink.api.common.functions.GroupReduceFunction), ArrayList (java.util.ArrayList), ExecutionConfig (org.apache.flink.api.common.ExecutionConfig), Tuple2 (org.apache.flink.api.java.tuple.Tuple2), Collector (org.apache.flink.util.Collector), HashSet (java.util.HashSet), Test (org.junit.Test)
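
A note on the two runs above: with enableObjectReuse(), the runtime may hand the reducer the same Tuple2 instance on every iterator.next() call, which is why the test executes the operator in both modes. A reuse-safe variant of the same per-key sum (a minimal sketch, not part of the test) copies field values out instead of holding on to the first record:

GroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>> reuseSafeSum =
        new GroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {

    @Override
    public void reduce(Iterable<Tuple2<String, Integer>> values, Collector<Tuple2<String, Integer>> out) {
        String key = null;
        int sum = 0;
        for (Tuple2<String, Integer> value : values) {
            // copy the fields; do not keep a reference to 'value', it may be recycled
            key = value.f0;
            sum += value.f1;
        }
        // emit a fresh tuple rather than a possibly reused input object
        out.collect(new Tuple2<String, Integer>(key, sum));
    }
};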

Example 8 with GroupReduceFunction

Use of org.apache.flink.api.common.functions.GroupReduceFunction in the Apache Flink project.

From the class AggregateOperator, method translateToDataFlow.

@SuppressWarnings("unchecked")
@Override
@Internal
protected org.apache.flink.api.common.operators.base.GroupReduceOperatorBase<IN, IN, GroupReduceFunction<IN, IN>> translateToDataFlow(Operator<IN> input) {
    // sanity check
    if (this.aggregationFunctions.isEmpty() || this.aggregationFunctions.size() != this.fields.size()) {
        throw new IllegalStateException();
    }
    // construct the aggregation function
    AggregationFunction<Object>[] aggFunctions = new AggregationFunction[this.aggregationFunctions.size()];
    int[] fields = new int[this.fields.size()];
    StringBuilder genName = new StringBuilder();
    for (int i = 0; i < fields.length; i++) {
        aggFunctions[i] = (AggregationFunction<Object>) this.aggregationFunctions.get(i);
        fields[i] = this.fields.get(i);
        genName.append(aggFunctions[i].toString()).append('(').append(fields[i]).append(')').append(',');
    }
    genName.append(" at ").append(aggregateLocationName);
    genName.setLength(genName.length() - 1);
    @SuppressWarnings("rawtypes") RichGroupReduceFunction<IN, IN> function = new AggregatingUdf(aggFunctions, fields);
    String name = getName() != null ? getName() : genName.toString();
    // distinguish between grouped reduce and non-grouped reduce
    if (this.grouping == null) {
        // non grouped aggregation
        UnaryOperatorInformation<IN, IN> operatorInfo = new UnaryOperatorInformation<>(getInputType(), getResultType());
        GroupReduceOperatorBase<IN, IN, GroupReduceFunction<IN, IN>> po = new GroupReduceOperatorBase<IN, IN, GroupReduceFunction<IN, IN>>(function, operatorInfo, new int[0], name);
        po.setCombinable(true);
        // set input
        po.setInput(input);
        // set parallelism
        po.setParallelism(this.getParallelism());
        return po;
    }
    if (this.grouping.getKeys() instanceof Keys.ExpressionKeys) {
        // grouped aggregation
        int[] logicalKeyPositions = this.grouping.getKeys().computeLogicalKeyPositions();
        UnaryOperatorInformation<IN, IN> operatorInfo = new UnaryOperatorInformation<>(getInputType(), getResultType());
        GroupReduceOperatorBase<IN, IN, GroupReduceFunction<IN, IN>> po = new GroupReduceOperatorBase<IN, IN, GroupReduceFunction<IN, IN>>(function, operatorInfo, logicalKeyPositions, name);
        po.setCombinable(true);
        po.setInput(input);
        po.setParallelism(this.getParallelism());
        po.setCustomPartitioner(grouping.getCustomPartitioner());
        SingleInputSemanticProperties props = new SingleInputSemanticProperties();
        for (int keyField : logicalKeyPositions) {
            boolean keyFieldUsedInAgg = false;
            for (int aggField : fields) {
                if (keyField == aggField) {
                    keyFieldUsedInAgg = true;
                    break;
                }
            }
            if (!keyFieldUsedInAgg) {
                props.addForwardedField(keyField, keyField);
            }
        }
        po.setSemanticProperties(props);
        return po;
    } else if (this.grouping.getKeys() instanceof Keys.SelectorFunctionKeys) {
        throw new UnsupportedOperationException("Aggregate does not support grouping with KeySelector functions, yet.");
    } else {
        throw new UnsupportedOperationException("Unrecognized key type.");
    }
}
Also used: GroupReduceFunction (org.apache.flink.api.common.functions.GroupReduceFunction), RichGroupReduceFunction (org.apache.flink.api.common.functions.RichGroupReduceFunction), AggregationFunction (org.apache.flink.api.java.aggregation.AggregationFunction), UnaryOperatorInformation (org.apache.flink.api.common.operators.UnaryOperatorInformation), Keys (org.apache.flink.api.common.operators.Keys), GroupReduceOperatorBase (org.apache.flink.api.common.operators.base.GroupReduceOperatorBase), SingleInputSemanticProperties (org.apache.flink.api.common.operators.SingleInputSemanticProperties), Internal (org.apache.flink.annotation.Internal)
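
For orientation, this translation backs the DataSet aggregate API; a grouped sum such as the following ends up as the combinable, grouped GroupReduceOperatorBase constructed above (a minimal sketch with made-up data, using org.apache.flink.api.java.aggregation.Aggregations):

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Tuple2<String, Integer>> tuples = env.fromElements(
        new Tuple2<String, Integer>("foo", 1),
        new Tuple2<String, Integer>("foo", 3),
        new Tuple2<String, Integer>("bar", 2));
// groupBy(0).aggregate(SUM, 1) creates an AggregateOperator whose
// translateToDataFlow produces the grouped group-reduce shown above
tuples.groupBy(0).aggregate(Aggregations.SUM, 1).print();
// expected output: (foo, 4), (bar, 2)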

Example 9 with GroupReduceFunction

Use of org.apache.flink.api.common.functions.GroupReduceFunction in the Apache Flink project.

From the class TransitiveClosureNaive, method main.

public static void main(String... args) throws Exception {
    // Checking input parameters
    final ParameterTool params = ParameterTool.fromArgs(args);
    // set up execution environment
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // make parameters available in the web interface
    env.getConfig().setGlobalJobParameters(params);
    final int maxIterations = params.getInt("iterations", 10);
    DataSet<Tuple2<Long, Long>> edges;
    if (params.has("edges")) {
        edges = env.readCsvFile(params.get("edges")).fieldDelimiter(" ").types(Long.class, Long.class);
    } else {
        System.out.println("Executing TransitiveClosureNaive example with default edges data set.");
        System.out.println("Use --edges to specify file input.");
        edges = ConnectedComponentsData.getDefaultEdgeDataSet(env);
    }
    IterativeDataSet<Tuple2<Long, Long>> paths = edges.iterate(maxIterations);
    DataSet<Tuple2<Long, Long>> nextPaths = paths.join(edges).where(1).equalTo(0).with(new JoinFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>>() {

        /**
         * left: Path (z,x) - x is reachable by z
         * right: Edge (x,y) - edge x-->y exists
         * out: Path (z,y) - y is reachable by z
         */
        @Override
        public Tuple2<Long, Long> join(Tuple2<Long, Long> left, Tuple2<Long, Long> right) throws Exception {
            return new Tuple2<Long, Long>(left.f0, right.f1);
        }
    }).withForwardedFieldsFirst("0").withForwardedFieldsSecond("1").union(paths).groupBy(0, 1).reduceGroup(new GroupReduceFunction<Tuple2<Long, Long>, Tuple2<Long, Long>>() {

        @Override
        public void reduce(Iterable<Tuple2<Long, Long>> values, Collector<Tuple2<Long, Long>> out) throws Exception {
            out.collect(values.iterator().next());
        }
    }).withForwardedFields("0;1");
    DataSet<Tuple2<Long, Long>> newPaths = paths.coGroup(nextPaths).where(0).equalTo(0).with(new CoGroupFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>>() {

        // note: this set is a field of the function instance, so it accumulates across groups
        Set<Tuple2<Long, Long>> prevSet = new HashSet<Tuple2<Long, Long>>();

        @Override
        public void coGroup(Iterable<Tuple2<Long, Long>> prevPaths, Iterable<Tuple2<Long, Long>> nextPaths, Collector<Tuple2<Long, Long>> out) throws Exception {
            for (Tuple2<Long, Long> prev : prevPaths) {
                prevSet.add(prev);
            }
            for (Tuple2<Long, Long> next : nextPaths) {
                if (!prevSet.contains(next)) {
                    out.collect(next);
                }
            }
        }
    }).withForwardedFieldsFirst("0").withForwardedFieldsSecond("0");
    DataSet<Tuple2<Long, Long>> transitiveClosure = paths.closeWith(nextPaths, newPaths);
    // emit result
    if (params.has("output")) {
        transitiveClosure.writeAsCsv(params.get("output"), "\n", " ");
        // execute program explicitly, because file sinks are lazy
        env.execute("Transitive Closure Example");
    } else {
        System.out.println("Printing result to stdout. Use --output to specify output path.");
        transitiveClosure.print();
    }
}
Also used: ParameterTool (org.apache.flink.api.java.utils.ParameterTool), ExecutionEnvironment (org.apache.flink.api.java.ExecutionEnvironment), GroupReduceFunction (org.apache.flink.api.common.functions.GroupReduceFunction), Tuple2 (org.apache.flink.api.java.tuple.Tuple2), Collector (org.apache.flink.util.Collector), HashSet (java.util.HashSet)
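
Because main is declared with varargs, the example can also be driven programmatically (a minimal sketch, assuming the Flink batch examples jar is on the classpath):

import org.apache.flink.examples.java.graph.TransitiveClosureNaive;

public class RunTransitiveClosure {
    public static void main(String[] args) throws Exception {
        // no --edges/--output given: uses the built-in default edge data set
        // and prints the transitive closure to stdout
        TransitiveClosureNaive.main("--iterations", "5");
    }
}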

Example 10 with GroupReduceFunction

Use of org.apache.flink.api.common.functions.GroupReduceFunction in the Apache Flink project.

From the class GroupReduceOperatorTest, method testGroupReduceCollectionWithRuntimeContext.

@Test
public void testGroupReduceCollectionWithRuntimeContext() {
    try {
        final String taskName = "Test Task";
        final AtomicBoolean opened = new AtomicBoolean();
        final AtomicBoolean closed = new AtomicBoolean();
        final RichGroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>> reducer = new RichGroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {

            @Override
            public void reduce(Iterable<Tuple2<String, Integer>> values, Collector<Tuple2<String, Integer>> out) throws Exception {
                Iterator<Tuple2<String, Integer>> input = values.iterator();
                Tuple2<String, Integer> result = input.next();
                int sum = result.f1;
                while (input.hasNext()) {
                    Tuple2<String, Integer> next = input.next();
                    sum += next.f1;
                }
                result.f1 = sum;
                out.collect(result);
            }

            @Override
            public void open(Configuration parameters) throws Exception {
                opened.set(true);
                RuntimeContext ctx = getRuntimeContext();
                assertEquals(0, ctx.getIndexOfThisSubtask());
                assertEquals(1, ctx.getNumberOfParallelSubtasks());
                assertEquals(taskName, ctx.getTaskName());
            }

            @Override
            public void close() throws Exception {
                closed.set(true);
            }
        };
        GroupReduceOperatorBase<Tuple2<String, Integer>, Tuple2<String, Integer>, GroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>> op =
                new GroupReduceOperatorBase<Tuple2<String, Integer>, Tuple2<String, Integer>, GroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>>(
                        reducer,
                        new UnaryOperatorInformation<Tuple2<String, Integer>, Tuple2<String, Integer>>(
                                TypeInfoParser.<Tuple2<String, Integer>>parse("Tuple2<String, Integer>"),
                                TypeInfoParser.<Tuple2<String, Integer>>parse("Tuple2<String, Integer>")),
                        new int[] { 0 },
                        "TestReducer");
        List<Tuple2<String, Integer>> input = new ArrayList<Tuple2<String, Integer>>(asList(
                new Tuple2<String, Integer>("foo", 1),
                new Tuple2<String, Integer>("foo", 3),
                new Tuple2<String, Integer>("bar", 2),
                new Tuple2<String, Integer>("bar", 4)));
        final TaskInfo taskInfo = new TaskInfo(taskName, 1, 0, 1, 0);
        ExecutionConfig executionConfig = new ExecutionConfig();
        executionConfig.disableObjectReuse();
        List<Tuple2<String, Integer>> resultMutableSafe = op.executeOnCollections(input,
                new RuntimeUDFContext(taskInfo, null, executionConfig,
                        new HashMap<String, Future<Path>>(),
                        new HashMap<String, Accumulator<?, ?>>(),
                        new UnregisteredMetricsGroup()),
                executionConfig);
        executionConfig.enableObjectReuse();
        List<Tuple2<String, Integer>> resultRegular = op.executeOnCollections(input,
                new RuntimeUDFContext(taskInfo, null, executionConfig,
                        new HashMap<String, Future<Path>>(),
                        new HashMap<String, Accumulator<?, ?>>(),
                        new UnregisteredMetricsGroup()),
                executionConfig);
        Set<Tuple2<String, Integer>> resultSetMutableSafe = new HashSet<Tuple2<String, Integer>>(resultMutableSafe);
        Set<Tuple2<String, Integer>> resultSetRegular = new HashSet<Tuple2<String, Integer>>(resultRegular);
        Set<Tuple2<String, Integer>> expectedResult = new HashSet<Tuple2<String, Integer>>(asList(new Tuple2<String, Integer>("foo", 4), new Tuple2<String, Integer>("bar", 6)));
        assertEquals(expectedResult, resultSetMutableSafe);
        assertEquals(expectedResult, resultSetRegular);
        assertTrue(opened.get());
        assertTrue(closed.get());
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}
Also used: RichGroupReduceFunction (org.apache.flink.api.common.functions.RichGroupReduceFunction), UnregisteredMetricsGroup (org.apache.flink.metrics.groups.UnregisteredMetricsGroup), Configuration (org.apache.flink.configuration.Configuration), HashMap (java.util.HashMap), ArrayList (java.util.ArrayList), ExecutionConfig (org.apache.flink.api.common.ExecutionConfig), TaskInfo (org.apache.flink.api.common.TaskInfo), Collector (org.apache.flink.util.Collector), RuntimeUDFContext (org.apache.flink.api.common.functions.util.RuntimeUDFContext), HashSet (java.util.HashSet), Path (org.apache.flink.core.fs.Path), GroupReduceFunction (org.apache.flink.api.common.functions.GroupReduceFunction), AtomicBoolean (java.util.concurrent.atomic.AtomicBoolean), Tuple2 (org.apache.flink.api.java.tuple.Tuple2), RuntimeContext (org.apache.flink.api.common.functions.RuntimeContext), Test (org.junit.Test)
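
The open()/close() assertions above exercise the standard rich-function lifecycle: in a real job, each parallel task instance calls open() once before the first reduce() and close() once after the last. A minimal sketch of a lifecycle-aware reducer (illustrative only; the counter name is made up):

RichGroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>> lifecycleAware =
        new RichGroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {

    @Override
    public void open(Configuration parameters) throws Exception {
        // runs once per parallel task instance, before any reduce() call;
        // the usual place to set up per-task resources
        getRuntimeContext().getIntCounter("groups").resetLocal();
    }

    @Override
    public void reduce(Iterable<Tuple2<String, Integer>> values, Collector<Tuple2<String, Integer>> out) {
        getRuntimeContext().getIntCounter("groups").add(1);
        for (Tuple2<String, Integer> value : values) {
            out.collect(value);
        }
    }

    @Override
    public void close() {
        // runs once per parallel task instance, after the last reduce() call
    }
};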

Aggregations

GroupReduceFunction (org.apache.flink.api.common.functions.GroupReduceFunction) 13
RichGroupReduceFunction (org.apache.flink.api.common.functions.RichGroupReduceFunction) 11
Tuple2 (org.apache.flink.api.java.tuple.Tuple2) 10
Test (org.junit.Test) 8
ExecutionConfig (org.apache.flink.api.common.ExecutionConfig) 7
TupleTypeInfo (org.apache.flink.api.java.typeutils.TupleTypeInfo) 5
GroupReduceDriver (org.apache.flink.runtime.operators.GroupReduceDriver) 5
Collector (org.apache.flink.util.Collector) 5
RegularToMutableObjectIterator (org.apache.flink.runtime.util.RegularToMutableObjectIterator) 4
HashSet (java.util.HashSet) 3
UnaryOperatorInformation (org.apache.flink.api.common.operators.UnaryOperatorInformation) 3
GroupReduceOperatorBase (org.apache.flink.api.common.operators.base.GroupReduceOperatorBase) 3
ExecutionEnvironment (org.apache.flink.api.java.ExecutionEnvironment) 3
IntValue (org.apache.flink.types.IntValue) 3
StringValue (org.apache.flink.types.StringValue) 3
ArrayList (java.util.ArrayList) 2
Keys (org.apache.flink.api.common.operators.Keys) 2
SingleInputSemanticProperties (org.apache.flink.api.common.operators.SingleInputSemanticProperties) 2
AggregationFunction (org.apache.flink.api.java.aggregation.AggregationFunction) 2
HashMap (java.util.HashMap) 1