
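All of the examples below exercise IdentityMapper, a trivial test function from Flink's optimizer test utilities that emits every record unchanged. The class itself is not reproduced on this page; the following minimal sketch (a reconstruction for reference, not the original source) shows what such an identity mapper looks like:

import org.apache.flink.api.common.functions.MapFunction;

public class IdentityMapper<T> implements MapFunction<T, T> {

    // Emits the input record unchanged. Because no field is modified,
    // the tests below can pair this mapper with withForwardedFields(...)
    // to tell the optimizer that partitioning and sort properties
    // survive the map.
    @Override
    public T map(T value) {
        return value;
    }
}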
Example 11 with IdentityMapper

Use of org.apache.flink.optimizer.testfunctions.IdentityMapper in project flink by apache.

From the class SortPartialReuseTest, method testCustomPartitioningNotReused.

@Test
public void testCustomPartitioningNotReused() {
    try {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        @SuppressWarnings("unchecked") DataSet<Tuple3<Long, Long, Long>> input = env.fromElements(new Tuple3<Long, Long, Long>(0L, 0L, 0L));
        input.partitionCustom(new Partitioner<Long>() {

            @Override
            public int partition(Long key, int numPartitions) {
                return 0;
            }
        }, 0).map(new IdentityMapper<Tuple3<Long, Long, Long>>()).withForwardedFields("0", "1", "2").groupBy(0, 1).reduceGroup(new IdentityGroupReducerCombinable<Tuple3<Long, Long, Long>>()).withForwardedFields("0", "1", "2").groupBy(1).reduceGroup(new IdentityGroupReducerCombinable<Tuple3<Long, Long, Long>>()).output(new DiscardingOutputFormat<Tuple3<Long, Long, Long>>());
        Plan p = env.createProgramPlan();
        OptimizedPlan op = compileNoStats(p);
        SinkPlanNode sink = op.getDataSinks().iterator().next();
        SingleInputPlanNode reducer2 = (SingleInputPlanNode) sink.getInput().getSource();
        SingleInputPlanNode combiner = (SingleInputPlanNode) reducer2.getInput().getSource();
        SingleInputPlanNode reducer1 = (SingleInputPlanNode) combiner.getInput().getSource();
        assertEquals(ShipStrategyType.FORWARD, sink.getInput().getShipStrategy());
        // the custom partitioning is not reused: the second grouping must
        // re-partition by hash and re-establish the sort
        assertEquals(ShipStrategyType.PARTITION_HASH, reducer2.getInput().getShipStrategy());
        assertEquals(LocalStrategy.COMBININGSORT, reducer2.getInput().getLocalStrategy());
        assertEquals(ShipStrategyType.FORWARD, combiner.getInput().getShipStrategy());
        assertEquals(LocalStrategy.NONE, combiner.getInput().getLocalStrategy());
        assertEquals(ShipStrategyType.FORWARD, reducer1.getInput().getShipStrategy());
        assertEquals(LocalStrategy.COMBININGSORT, reducer1.getInput().getLocalStrategy());
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}
Also used: ExecutionEnvironment (org.apache.flink.api.java.ExecutionEnvironment), Plan (org.apache.flink.api.common.Plan), OptimizedPlan (org.apache.flink.optimizer.plan.OptimizedPlan), SingleInputPlanNode (org.apache.flink.optimizer.plan.SingleInputPlanNode), IdentityMapper (org.apache.flink.optimizer.testfunctions.IdentityMapper), Tuple3 (org.apache.flink.api.java.tuple.Tuple3), IdentityGroupReducerCombinable (org.apache.flink.optimizer.testfunctions.IdentityGroupReducerCombinable), SinkPlanNode (org.apache.flink.optimizer.plan.SinkPlanNode), Test (org.junit.Test)
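The combiner plan node asserted on above appears because IdentityGroupReducerCombinable contributes a combine phase in addition to reduce. A rough sketch of such a combinable identity group reducer, assuming Flink's GroupReduceFunction and GroupCombineFunction interfaces (the real test class may differ in detail):

import org.apache.flink.api.common.functions.GroupCombineFunction;
import org.apache.flink.api.common.functions.GroupReduceFunction;
import org.apache.flink.util.Collector;

public class IdentityGroupReducerCombinable<T>
        implements GroupReduceFunction<T, T>, GroupCombineFunction<T, T> {

    // Re-emits every record of the group unchanged.
    @Override
    public void reduce(Iterable<T> values, Collector<T> out) {
        for (T value : values) {
            out.collect(value);
        }
    }

    // The combine phase does the same; its presence is what lets the
    // optimizer choose LocalStrategy.COMBININGSORT and insert the
    // combiner node checked by the test.
    @Override
    public void combine(Iterable<T> values, Collector<T> out) {
        reduce(values, out);
    }
}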

Example 12 with IdentityMapper

Use of org.apache.flink.optimizer.testfunctions.IdentityMapper in project flink by apache.

From the class JobGraphGeneratorTest, method testGeneratingJobGraphWithUnconsumedResultPartition.

@Test
public void testGeneratingJobGraphWithUnconsumedResultPartition() {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Tuple2<Long, Long>> input = env.fromElements(new Tuple2<>(1L, 2L)).setParallelism(1);
    DataSet<Tuple2<Long, Long>> ds = input.map(new IdentityMapper<>()).setParallelism(3);
    AbstractID intermediateDataSetID = new AbstractID();
    // this output branch will be excluded.
    ds.output(BlockingShuffleOutputFormat.createOutputFormat(intermediateDataSetID)).setParallelism(1);
    // this is the normal output branch.
    ds.output(new DiscardingOutputFormat<>()).setParallelism(1);
    JobGraph jobGraph = compileJob(env);
    Assert.assertEquals(3, jobGraph.getVerticesSortedTopologicallyFromSources().size());
    JobVertex mapVertex = jobGraph.getVerticesSortedTopologicallyFromSources().get(1);
    Assert.assertThat(mapVertex, Matchers.instanceOf(JobVertex.class));
    // there are 2 output results; one of them is ResultPartitionType.BLOCKING_PERSISTENT
    Assert.assertEquals(2, mapVertex.getProducedDataSets().size());
    Assert.assertTrue(
            mapVertex.getProducedDataSets().stream()
                    .anyMatch(
                            dataSet ->
                                    dataSet.getId().equals(new IntermediateDataSetID(intermediateDataSetID))
                                            && dataSet.getResultType() == ResultPartitionType.BLOCKING_PERSISTENT));
}
Also used: CoreMatchers.is (org.hamcrest.CoreMatchers.is), LongSumAggregator (org.apache.flink.api.common.aggregators.LongSumAggregator), JobVertex (org.apache.flink.runtime.jobgraph.JobVertex), Tuple2 (org.apache.flink.api.java.tuple.Tuple2), JobGraph (org.apache.flink.runtime.jobgraph.JobGraph), ResultPartitionType (org.apache.flink.runtime.io.network.partition.ResultPartitionType), HashMap (java.util.HashMap), JobType (org.apache.flink.runtime.jobgraph.JobType), MapFunction (org.apache.flink.api.common.functions.MapFunction), DataSink (org.apache.flink.api.java.operators.DataSink), Assert.assertThat (org.junit.Assert.assertThat), DataSet (org.apache.flink.api.java.DataSet), JobGraphUtils (org.apache.flink.runtime.jobgraph.JobGraphUtils), DeltaIteration (org.apache.flink.api.java.operators.DeltaIteration), ResourceSpec (org.apache.flink.api.common.operators.ResourceSpec), Map (java.util.Map), Plan (org.apache.flink.api.common.Plan), Optimizer (org.apache.flink.optimizer.Optimizer), BlockingShuffleOutputFormat (org.apache.flink.api.java.io.BlockingShuffleOutputFormat), IdentityMapper (org.apache.flink.optimizer.testfunctions.IdentityMapper), Method (java.lang.reflect.Method), Path (java.nio.file.Path), OptimizedPlan (org.apache.flink.optimizer.plan.OptimizedPlan), DiscardingOutputFormat (org.apache.flink.api.java.io.DiscardingOutputFormat), Files (java.nio.file.Files), AbstractID (org.apache.flink.util.AbstractID), Assert.assertNotNull (org.junit.Assert.assertNotNull), IterativeDataSet (org.apache.flink.api.java.operators.IterativeDataSet), Configuration (org.apache.flink.configuration.Configuration), Matchers (org.hamcrest.Matchers), Assert.assertTrue (org.junit.Assert.assertTrue), Test (org.junit.Test), IOException (java.io.IOException), IntermediateDataSetID (org.apache.flink.runtime.jobgraph.IntermediateDataSetID), DistributedCache (org.apache.flink.api.common.cache.DistributedCache), Operator (org.apache.flink.api.java.operators.Operator), FilterFunction (org.apache.flink.api.common.functions.FilterFunction), JobID (org.apache.flink.api.common.JobID), Rule (org.junit.Rule), ExecutionEnvironment (org.apache.flink.api.java.ExecutionEnvironment), Assert.assertFalse (org.junit.Assert.assertFalse), Assert (org.junit.Assert), TemporaryFolder (org.junit.rules.TemporaryFolder), Assert.assertEquals (org.junit.Assert.assertEquals)

Example 13 with IdentityMapper

Use of org.apache.flink.optimizer.testfunctions.IdentityMapper in project flink by apache.

From the class DistinctAndGroupingOptimizerTest, method testDistinctDestroysPartitioningOfNonDistinctFields.

@Test
public void testDistinctDestroysPartitioningOfNonDistinctFields() {
    try {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);
        @SuppressWarnings("unchecked") DataSet<Tuple2<Long, Long>> data = env.fromElements(new Tuple2<Long, Long>(0L, 0L), new Tuple2<Long, Long>(1L, 1L)).map(new IdentityMapper<Tuple2<Long, Long>>()).setParallelism(4);
        data.distinct(1).groupBy(0).sum(1).output(new DiscardingOutputFormat<Tuple2<Long, Long>>());
        Plan p = env.createProgramPlan();
        OptimizedPlan op = compileNoStats(p);
        SinkPlanNode sink = op.getDataSinks().iterator().next();
        SingleInputPlanNode reducer = (SingleInputPlanNode) sink.getInput().getSource();
        SingleInputPlanNode combiner = (SingleInputPlanNode) reducer.getInput().getSource();
        SingleInputPlanNode distinctReducer = (SingleInputPlanNode) combiner.getInput().getSource();
        assertEquals(ShipStrategyType.FORWARD, sink.getInput().getShipStrategy());
        // reducer must repartition, because it works on a different field
        assertEquals(ShipStrategyType.PARTITION_HASH, reducer.getInput().getShipStrategy());
        assertEquals(ShipStrategyType.FORWARD, combiner.getInput().getShipStrategy());
        // distinct reducer is partitioned
        assertEquals(ShipStrategyType.PARTITION_HASH, distinctReducer.getInput().getShipStrategy());
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}
Also used: ExecutionEnvironment (org.apache.flink.api.java.ExecutionEnvironment), Plan (org.apache.flink.api.common.Plan), OptimizedPlan (org.apache.flink.optimizer.plan.OptimizedPlan), SingleInputPlanNode (org.apache.flink.optimizer.plan.SingleInputPlanNode), IdentityMapper (org.apache.flink.optimizer.testfunctions.IdentityMapper), Tuple2 (org.apache.flink.api.java.tuple.Tuple2), SinkPlanNode (org.apache.flink.optimizer.plan.SinkPlanNode), Test (org.junit.Test)

Example 14 with IdentityMapper

Use of org.apache.flink.optimizer.testfunctions.IdentityMapper in project flink by apache.

From the class IterationCompilerTest, method testWorksetIterationWithUnionRoot.

@Test
public void testWorksetIterationWithUnionRoot() {
    try {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(43);
        DataSet<Tuple2<Long, Long>> input = env.generateSequence(1, 20).map(new MapFunction<Long, Tuple2<Long, Long>>() {

            @Override
            public Tuple2<Long, Long> map(Long value) {
                return null;
            }
        });
        DeltaIteration<Tuple2<Long, Long>, Tuple2<Long, Long>> iter = input.iterateDelta(input, 100, 0);
        iter.closeWith(
                iter.getWorkset()
                        .map(new IdentityMapper<Tuple2<Long, Long>>())
                        .union(iter.getWorkset().map(new IdentityMapper<Tuple2<Long, Long>>())),
                iter.getWorkset()
                        .map(new IdentityMapper<Tuple2<Long, Long>>())
                        .union(iter.getWorkset().map(new IdentityMapper<Tuple2<Long, Long>>())))
                .output(new DiscardingOutputFormat<Tuple2<Long, Long>>());
        Plan p = env.createProgramPlan();
        OptimizedPlan op = compileNoStats(p);
        SinkPlanNode sink = op.getDataSinks().iterator().next();
        WorksetIterationPlanNode iterNode = (WorksetIterationPlanNode) sink.getInput().getSource();
        // make sure that the root is part of the dynamic path
        // the "NoOp"a that come after the union.
        SingleInputPlanNode nextWorksetNoop = (SingleInputPlanNode) iterNode.getNextWorkSetPlanNode();
        SingleInputPlanNode solutionDeltaNoop = (SingleInputPlanNode) iterNode.getSolutionSetDeltaPlanNode();
        NAryUnionPlanNode nextWorksetUnion = (NAryUnionPlanNode) nextWorksetNoop.getInput().getSource();
        NAryUnionPlanNode solutionDeltaUnion = (NAryUnionPlanNode) solutionDeltaNoop.getInput().getSource();
        assertTrue(nextWorksetNoop.isOnDynamicPath());
        assertTrue(nextWorksetNoop.getCostWeight() >= 1);
        assertTrue(solutionDeltaNoop.isOnDynamicPath());
        assertTrue(solutionDeltaNoop.getCostWeight() >= 1);
        assertTrue(nextWorksetUnion.isOnDynamicPath());
        assertTrue(nextWorksetUnion.getCostWeight() >= 1);
        assertTrue(solutionDeltaUnion.isOnDynamicPath());
        assertTrue(solutionDeltaUnion.getCostWeight() >= 1);
        new JobGraphGenerator().compileJobGraph(op);
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}
Also used: ExecutionEnvironment (org.apache.flink.api.java.ExecutionEnvironment), WorksetIterationPlanNode (org.apache.flink.optimizer.plan.WorksetIterationPlanNode), Plan (org.apache.flink.api.common.Plan), OptimizedPlan (org.apache.flink.optimizer.plan.OptimizedPlan), SingleInputPlanNode (org.apache.flink.optimizer.plan.SingleInputPlanNode), NAryUnionPlanNode (org.apache.flink.optimizer.plan.NAryUnionPlanNode), IdentityMapper (org.apache.flink.optimizer.testfunctions.IdentityMapper), Tuple2 (org.apache.flink.api.java.tuple.Tuple2), JobGraphGenerator (org.apache.flink.optimizer.plantranslate.JobGraphGenerator), SinkPlanNode (org.apache.flink.optimizer.plan.SinkPlanNode), Test (org.junit.Test)
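The closeWith call in this example is dense; the following stripped-down sketch (illustrative only, built from the same DataSet API calls the test above uses) restates the delta-iteration contract, in particular the argument order closeWith(solutionSetDelta, nextWorkset):

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

DataSet<Tuple2<Long, Long>> initial = env.fromElements(new Tuple2<>(1L, 1L));

// iterateDelta(initialWorkset, maxIterations, keyPosition): the solution
// set is the DataSet this is called on, keyed here by field 0.
DeltaIteration<Tuple2<Long, Long>, Tuple2<Long, Long>> iteration =
        initial.iterateDelta(initial, 100, 0);

// Each superstep derives the solution-set delta and the next workset
// from the current workset (trivially the identity here).
DataSet<Tuple2<Long, Long>> step =
        iteration.getWorkset().map(new IdentityMapper<Tuple2<Long, Long>>());

// closeWith(solutionSetDelta, nextWorkset); the iteration terminates when
// the workset becomes empty or maxIterations is reached.
iteration.closeWith(step, step)
        .output(new DiscardingOutputFormat<Tuple2<Long, Long>>());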

Example 15 with IdentityMapper

Use of org.apache.flink.optimizer.testfunctions.IdentityMapper in project flink by apache.

From the class IterationCompilerTest, method testIterationWithUnionRoot.

@Test
public void testIterationWithUnionRoot() {
    try {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(43);
        IterativeDataSet<Long> iteration = env.generateSequence(-4, 1000).iterate(100);
        iteration.closeWith(
                iteration.map(new IdentityMapper<Long>())
                        .union(iteration.map(new IdentityMapper<Long>())))
                .output(new DiscardingOutputFormat<Long>());
        Plan p = env.createProgramPlan();
        OptimizedPlan op = compileNoStats(p);
        SinkPlanNode sink = op.getDataSinks().iterator().next();
        BulkIterationPlanNode iterNode = (BulkIterationPlanNode) sink.getInput().getSource();
        // make sure that the root is part of the dynamic path
        // the "NoOp" that comes after the union.
        SingleInputPlanNode noop = (SingleInputPlanNode) iterNode.getRootOfStepFunction();
        NAryUnionPlanNode union = (NAryUnionPlanNode) noop.getInput().getSource();
        assertTrue(noop.isOnDynamicPath());
        assertTrue(noop.getCostWeight() >= 1);
        assertTrue(union.isOnDynamicPath());
        assertTrue(union.getCostWeight() >= 1);
        // see that the jobgraph generator can translate this
        new JobGraphGenerator().compileJobGraph(op);
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}
Also used: ExecutionEnvironment (org.apache.flink.api.java.ExecutionEnvironment), Plan (org.apache.flink.api.common.Plan), OptimizedPlan (org.apache.flink.optimizer.plan.OptimizedPlan), SingleInputPlanNode (org.apache.flink.optimizer.plan.SingleInputPlanNode), NAryUnionPlanNode (org.apache.flink.optimizer.plan.NAryUnionPlanNode), IdentityMapper (org.apache.flink.optimizer.testfunctions.IdentityMapper), JobGraphGenerator (org.apache.flink.optimizer.plantranslate.JobGraphGenerator), SinkPlanNode (org.apache.flink.optimizer.plan.SinkPlanNode), BulkIterationPlanNode (org.apache.flink.optimizer.plan.BulkIterationPlanNode), Test (org.junit.Test)

Aggregations

Plan (org.apache.flink.api.common.Plan)27 ExecutionEnvironment (org.apache.flink.api.java.ExecutionEnvironment)27 OptimizedPlan (org.apache.flink.optimizer.plan.OptimizedPlan)27 IdentityMapper (org.apache.flink.optimizer.testfunctions.IdentityMapper)27 Test (org.junit.Test)27 SinkPlanNode (org.apache.flink.optimizer.plan.SinkPlanNode)16 SingleInputPlanNode (org.apache.flink.optimizer.plan.SingleInputPlanNode)15 IdentityGroupReducer (org.apache.flink.optimizer.testfunctions.IdentityGroupReducer)9 Tuple2 (org.apache.flink.api.java.tuple.Tuple2)7 JobGraphGenerator (org.apache.flink.optimizer.plantranslate.JobGraphGenerator)7 ShipStrategyType (org.apache.flink.runtime.operators.shipping.ShipStrategyType)4 DiscardingOutputFormat (org.apache.flink.api.java.io.DiscardingOutputFormat)3 Tuple3 (org.apache.flink.api.java.tuple.Tuple3)3 DualInputPlanNode (org.apache.flink.optimizer.plan.DualInputPlanNode)3 NAryUnionPlanNode (org.apache.flink.optimizer.plan.NAryUnionPlanNode)3 IdentityGroupReducerCombinable (org.apache.flink.optimizer.testfunctions.IdentityGroupReducerCombinable)3 InvalidProgramException (org.apache.flink.api.common.InvalidProgramException)2 DataSet (org.apache.flink.api.java.DataSet)2 IdentityCrosser (org.apache.flink.optimizer.testfunctions.IdentityCrosser)2 Assert (org.junit.Assert)2