Examples with IdentityMapper - org.apache.flink.optimizer.testfunctions.IdentityMapper

Example 6 with IdentityMapper

use of org.apache.flink.optimizer.testfunctions.IdentityMapper in project flink by apache.

the class ParallelismChangeTest method checkPropertyHandlingWithIncreasingGlobalParallelism2.

/**
 * Simple Job: Map -> Reduce -> Map -> Reduce. All functions preserve all fields (hence all
 * properties).
 *
 * <p>Increases parallelism between 2nd map and 2nd reduce, so the hash partitioning from 1st
 * reduce is not reusable. Expected to re-establish partitioning between map and reduce (hash).
 */
@Test
public void checkPropertyHandlingWithIncreasingGlobalParallelism2() {
    final int p = DEFAULT_PARALLELISM;
    // construct the plan
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(p);
    DataSet<Long> set1 = env.generateSequence(0, 1).setParallelism(p);
    set1.map(new IdentityMapper<Long>()).withForwardedFields("*").setParallelism(p).name("Map1").groupBy("*").reduceGroup(new IdentityGroupReducer<Long>()).withForwardedFields("*").setParallelism(p).name("Reduce1").map(new IdentityMapper<Long>()).withForwardedFields("*").setParallelism(p).name("Map2").groupBy("*").reduceGroup(new IdentityGroupReducer<Long>()).withForwardedFields("*").setParallelism(p * 2).name("Reduce2").output(new DiscardingOutputFormat<Long>()).setParallelism(p * 2).name("Sink");
    Plan plan = env.createProgramPlan();
    // submit the plan to the compiler
    OptimizedPlan oPlan = compileNoStats(plan);
    // check the optimized Plan
    // when reducer 1 distributes its data across the instances of map2, it needs to employ a
    // local hash method,
    // because map2 has twice as many instances and key/value pairs with the same key need to be
    // processed by the same
    // mapper respectively reducer
    SinkPlanNode sinkNode = oPlan.getDataSinks().iterator().next();
    SingleInputPlanNode red2Node = (SingleInputPlanNode) sinkNode.getPredecessor();
    SingleInputPlanNode map2Node = (SingleInputPlanNode) red2Node.getPredecessor();
    ShipStrategyType mapIn = map2Node.getInput().getShipStrategy();
    ShipStrategyType reduceIn = red2Node.getInput().getShipStrategy();
    Assert.assertEquals("Invalid ship strategy for an operator.", ShipStrategyType.FORWARD, mapIn);
    Assert.assertEquals("Invalid ship strategy for an operator.", ShipStrategyType.PARTITION_HASH, reduceIn);
}

Also used : ExecutionEnvironment(org.apache.flink.api.java.ExecutionEnvironment) Plan(org.apache.flink.api.common.Plan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) ShipStrategyType(org.apache.flink.runtime.operators.shipping.ShipStrategyType) SingleInputPlanNode(org.apache.flink.optimizer.plan.SingleInputPlanNode) IdentityMapper(org.apache.flink.optimizer.testfunctions.IdentityMapper) IdentityGroupReducer(org.apache.flink.optimizer.testfunctions.IdentityGroupReducer) SinkPlanNode(org.apache.flink.optimizer.plan.SinkPlanNode) Test(org.junit.Test)

Example 7 with IdentityMapper

use of org.apache.flink.optimizer.testfunctions.IdentityMapper in project flink by apache.

the class NestedIterationsTest method testBulkIterationInClosure.

@Test
public void testBulkIterationInClosure() {
    try {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Long> data1 = env.generateSequence(1, 100);
        DataSet<Long> data2 = env.generateSequence(1, 100);
        IterativeDataSet<Long> firstIteration = data1.iterate(100);
        DataSet<Long> firstResult = firstIteration.closeWith(firstIteration.map(new IdentityMapper<Long>()));
        IterativeDataSet<Long> mainIteration = data2.map(new IdentityMapper<Long>()).iterate(100);
        DataSet<Long> joined = mainIteration.join(firstResult).where(new IdentityKeyExtractor<Long>()).equalTo(new IdentityKeyExtractor<Long>()).with(new DummyFlatJoinFunction<Long>());
        DataSet<Long> mainResult = mainIteration.closeWith(joined);
        mainResult.output(new DiscardingOutputFormat<Long>());
        Plan p = env.createProgramPlan();
        // optimizer should be able to translate this
        OptimizedPlan op = compileNoStats(p);
        // job graph generator should be able to translate this
        new JobGraphGenerator().compileJobGraph(op);
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}

Also used : ExecutionEnvironment(org.apache.flink.api.java.ExecutionEnvironment) IdentityKeyExtractor(org.apache.flink.optimizer.testfunctions.IdentityKeyExtractor) Plan(org.apache.flink.api.common.Plan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) IdentityMapper(org.apache.flink.optimizer.testfunctions.IdentityMapper) JobGraphGenerator(org.apache.flink.optimizer.plantranslate.JobGraphGenerator) Test(org.junit.Test)

Example 8 with IdentityMapper

use of org.apache.flink.optimizer.testfunctions.IdentityMapper in project flink by apache.

the class PipelineBreakerTest method testPipelineBreakerBroadcastedPartialSolution.

/**
 * <pre>
 *                                +----------- ITERATION ---------+
 *                                |                               |
 *                               +--+                           +----+
 *  (source 1) ----------------->|PS| ------------ +        +-->|next|---> (sink)
 *                               +--+              | (BC)   |   +----+
 *                                |                V        |     |
 *  (source 2) --> (map) --+------|-----------> (MAPPER) ---+     |
 *                         |      |                ^              |
 *                         |      |                | (BC)         |
 *                         |      +----------------|--------------+
 *                         |                       |
 *                         +--(map) --> (reduce) --+
 * </pre>
 */
@Test
public void testPipelineBreakerBroadcastedPartialSolution() {
    try {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().setExecutionMode(ExecutionMode.PIPELINED);
        env.setParallelism(64);
        DataSet<Long> initialSource = env.generateSequence(1, 10);
        IterativeDataSet<Long> iteration = initialSource.iterate(100);
        DataSet<Long> sourceWithMapper = env.generateSequence(1, 10).map(new IdentityMapper<Long>());
        DataSet<Long> bcInput1 = sourceWithMapper.map(new IdentityMapper<Long>()).reduce(new SelectOneReducer<Long>());
        DataSet<Long> result = sourceWithMapper.map(new IdentityMapper<Long>()).withBroadcastSet(iteration, "bc2").withBroadcastSet(bcInput1, "bc1");
        iteration.closeWith(result).output(new DiscardingOutputFormat<Long>());
        Plan p = env.createProgramPlan();
        OptimizedPlan op = compileNoStats(p);
        SinkPlanNode sink = op.getDataSinks().iterator().next();
        BulkIterationPlanNode iterationPlanNode = (BulkIterationPlanNode) sink.getInput().getSource();
        SingleInputPlanNode mapper = (SingleInputPlanNode) iterationPlanNode.getRootOfStepFunction();
        assertEquals(TempMode.CACHED, mapper.getInput().getTempMode());
        assertEquals(DataExchangeMode.BATCH, mapper.getInput().getDataExchangeMode());
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}

Example 9 with IdentityMapper

use of org.apache.flink.optimizer.testfunctions.IdentityMapper in project flink by apache.

the class CoGroupCustomPartitioningTest method testIncompatibleHashAndCustomPartitioning.

@Test
public void testIncompatibleHashAndCustomPartitioning() {
    try {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Tuple3<Long, Long, Long>> input = env.fromElements(new Tuple3<Long, Long, Long>(0L, 0L, 0L));
        DataSet<Tuple3<Long, Long, Long>> partitioned = input.partitionCustom(new Partitioner<Long>() {

            @Override
            public int partition(Long key, int numPartitions) {
                return 0;
            }
        }, 0).map(new IdentityMapper<Tuple3<Long, Long, Long>>()).withForwardedFields("0", "1", "2");
        DataSet<Tuple3<Long, Long, Long>> grouped = partitioned.distinct(0, 1).groupBy(1).sortGroup(0, Order.ASCENDING).reduceGroup(new IdentityGroupReducerCombinable<Tuple3<Long, Long, Long>>()).withForwardedFields("0", "1");
        grouped.coGroup(partitioned).where(0).equalTo(0).with(new DummyCoGroupFunction<Tuple3<Long, Long, Long>, Tuple3<Long, Long, Long>>()).output(new DiscardingOutputFormat<Tuple2<Tuple3<Long, Long, Long>, Tuple3<Long, Long, Long>>>());
        Plan p = env.createProgramPlan();
        OptimizedPlan op = compileNoStats(p);
        SinkPlanNode sink = op.getDataSinks().iterator().next();
        DualInputPlanNode coGroup = (DualInputPlanNode) sink.getInput().getSource();
        assertEquals(ShipStrategyType.PARTITION_HASH, coGroup.getInput1().getShipStrategy());
        assertTrue(coGroup.getInput2().getShipStrategy() == ShipStrategyType.PARTITION_HASH || coGroup.getInput2().getShipStrategy() == ShipStrategyType.FORWARD);
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}

Also used : ExecutionEnvironment(org.apache.flink.api.java.ExecutionEnvironment) Plan(org.apache.flink.api.common.Plan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) InvalidProgramException(org.apache.flink.api.common.InvalidProgramException) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) DualInputPlanNode(org.apache.flink.optimizer.plan.DualInputPlanNode) IdentityMapper(org.apache.flink.optimizer.testfunctions.IdentityMapper) Tuple2(org.apache.flink.api.java.tuple.Tuple2) Tuple3(org.apache.flink.api.java.tuple.Tuple3) IdentityGroupReducerCombinable(org.apache.flink.optimizer.testfunctions.IdentityGroupReducerCombinable) SinkPlanNode(org.apache.flink.optimizer.plan.SinkPlanNode) DummyCoGroupFunction(org.apache.flink.optimizer.testfunctions.DummyCoGroupFunction) Test(org.junit.Test)

Example 10 with IdentityMapper

use of org.apache.flink.optimizer.testfunctions.IdentityMapper in project flink by apache.

the class SortPartialReuseTest method testPartialPartitioningReuse.

@Test
public void testPartialPartitioningReuse() {
    try {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        @SuppressWarnings("unchecked") DataSet<Tuple3<Long, Long, Long>> input = env.fromElements(new Tuple3<Long, Long, Long>(0L, 0L, 0L));
        input.partitionByHash(0).map(new IdentityMapper<Tuple3<Long, Long, Long>>()).withForwardedFields("0", "1", "2").groupBy(0, 1).reduceGroup(new IdentityGroupReducerCombinable<Tuple3<Long, Long, Long>>()).withForwardedFields("0", "1", "2").groupBy(0).reduceGroup(new IdentityGroupReducerCombinable<Tuple3<Long, Long, Long>>()).output(new DiscardingOutputFormat<Tuple3<Long, Long, Long>>());
        Plan p = env.createProgramPlan();
        OptimizedPlan op = compileNoStats(p);
        SinkPlanNode sink = op.getDataSinks().iterator().next();
        SingleInputPlanNode reducer2 = (SingleInputPlanNode) sink.getInput().getSource();
        SingleInputPlanNode reducer1 = (SingleInputPlanNode) reducer2.getInput().getSource();
        assertEquals(ShipStrategyType.FORWARD, sink.getInput().getShipStrategy());
        // should be locally forwarding, reusing sort and partitioning
        assertEquals(ShipStrategyType.FORWARD, reducer2.getInput().getShipStrategy());
        assertEquals(LocalStrategy.NONE, reducer2.getInput().getLocalStrategy());
        assertEquals(ShipStrategyType.FORWARD, reducer1.getInput().getShipStrategy());
        assertEquals(LocalStrategy.COMBININGSORT, reducer1.getInput().getLocalStrategy());
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}

Also used : ExecutionEnvironment(org.apache.flink.api.java.ExecutionEnvironment) Plan(org.apache.flink.api.common.Plan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) SingleInputPlanNode(org.apache.flink.optimizer.plan.SingleInputPlanNode) IdentityMapper(org.apache.flink.optimizer.testfunctions.IdentityMapper) Tuple3(org.apache.flink.api.java.tuple.Tuple3) IdentityGroupReducerCombinable(org.apache.flink.optimizer.testfunctions.IdentityGroupReducerCombinable) SinkPlanNode(org.apache.flink.optimizer.plan.SinkPlanNode) Test(org.junit.Test)

Aggregations

Plan (org.apache.flink.api.common.Plan)27 ExecutionEnvironment (org.apache.flink.api.java.ExecutionEnvironment)27 OptimizedPlan (org.apache.flink.optimizer.plan.OptimizedPlan)27 IdentityMapper (org.apache.flink.optimizer.testfunctions.IdentityMapper)27 Test (org.junit.Test)27 SinkPlanNode (org.apache.flink.optimizer.plan.SinkPlanNode)16 SingleInputPlanNode (org.apache.flink.optimizer.plan.SingleInputPlanNode)15 IdentityGroupReducer (org.apache.flink.optimizer.testfunctions.IdentityGroupReducer)9 Tuple2 (org.apache.flink.api.java.tuple.Tuple2)7 JobGraphGenerator (org.apache.flink.optimizer.plantranslate.JobGraphGenerator)7 ShipStrategyType (org.apache.flink.runtime.operators.shipping.ShipStrategyType)4 DiscardingOutputFormat (org.apache.flink.api.java.io.DiscardingOutputFormat)3 Tuple3 (org.apache.flink.api.java.tuple.Tuple3)3 DualInputPlanNode (org.apache.flink.optimizer.plan.DualInputPlanNode)3 NAryUnionPlanNode (org.apache.flink.optimizer.plan.NAryUnionPlanNode)3 IdentityGroupReducerCombinable (org.apache.flink.optimizer.testfunctions.IdentityGroupReducerCombinable)3 InvalidProgramException (org.apache.flink.api.common.InvalidProgramException)2 DataSet (org.apache.flink.api.java.DataSet)2 IdentityCrosser (org.apache.flink.optimizer.testfunctions.IdentityCrosser)2 Assert (org.junit.Assert)2