Search in sources :

Example 51 with DualInputPlanNode

use of org.apache.flink.optimizer.plan.DualInputPlanNode in project flink by apache.

the class UnionReplacementTest method testConsecutiveUnionsWithBroadcast.

/**
	 *
	 * Checks that a plan with consecutive UNIONs followed by broadcast-fwd JOIN is correctly translated.
	 *
	 * The program can be illustrated as follows:
	 *
	 * Src1 -\
	 *        >-> Union12--<
	 * Src2 -/              \
	 *                       >-> Union123 --> bc-fwd-Join -> Output
	 * Src3 ----------------/             /
	 *                                   /
	 * Src4 ----------------------------/
	 *
	 * In the resulting plan, the broadcasting must be
	 * pushed to the inputs of the unions (Src1, Src2, Src3).
	 *
	 */
@Test
public void testConsecutiveUnionsWithBroadcast() throws Exception {
    // -----------------------------------------------------------------------------------------
    // Build test program
    // -----------------------------------------------------------------------------------------
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(DEFAULT_PARALLELISM);
    DataSet<Tuple2<Long, Long>> src1 = env.fromElements(new Tuple2<>(0L, 0L));
    DataSet<Tuple2<Long, Long>> src2 = env.fromElements(new Tuple2<>(0L, 0L));
    DataSet<Tuple2<Long, Long>> src3 = env.fromElements(new Tuple2<>(0L, 0L));
    DataSet<Tuple2<Long, Long>> src4 = env.fromElements(new Tuple2<>(0L, 0L));
    DataSet<Tuple2<Long, Long>> union12 = src1.union(src2);
    DataSet<Tuple2<Long, Long>> union123 = union12.union(src3);
    union123.join(src4, JoinOperatorBase.JoinHint.BROADCAST_HASH_FIRST).where(0).equalTo(0).name("join").output(new DiscardingOutputFormat<Tuple2<Tuple2<Long, Long>, Tuple2<Long, Long>>>()).name("out");
    // -----------------------------------------------------------------------------------------
    // Verify optimized plan
    // -----------------------------------------------------------------------------------------
    OptimizedPlan optimizedPlan = compileNoStats(env.createProgramPlan());
    OptimizerPlanNodeResolver resolver = getOptimizerPlanNodeResolver(optimizedPlan);
    DualInputPlanNode join = resolver.getNode("join");
    // check input of join is broadcasted
    assertEquals("First join input should be fully replicated.", PartitioningProperty.FULL_REPLICATION, join.getInput1().getGlobalProperties().getPartitioning());
    NAryUnionPlanNode union = (NAryUnionPlanNode) join.getInput1().getSource();
    // check that all union inputs are broadcasted
    for (Channel c : union.getInputs()) {
        assertEquals("Union input should be fully replicated", PartitioningProperty.FULL_REPLICATION, c.getGlobalProperties().getPartitioning());
        assertEquals("Union input channel should be broadcasting", ShipStrategyType.BROADCAST, c.getShipStrategy());
    }
}
Also used : DualInputPlanNode(org.apache.flink.optimizer.plan.DualInputPlanNode) NAryUnionPlanNode(org.apache.flink.optimizer.plan.NAryUnionPlanNode) ExecutionEnvironment(org.apache.flink.api.java.ExecutionEnvironment) Tuple2(org.apache.flink.api.java.tuple.Tuple2) Channel(org.apache.flink.optimizer.plan.Channel) DiscardingOutputFormat(org.apache.flink.api.java.io.DiscardingOutputFormat) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) Test(org.junit.Test)

Example 52 with DualInputPlanNode

use of org.apache.flink.optimizer.plan.DualInputPlanNode in project flink by apache.

the class WorksetIterationsRecordApiCompilerTest method testRecordApiWithDirectSoltionSetUpdate.

@Test
public void testRecordApiWithDirectSoltionSetUpdate() {
    Plan plan = getTestPlan(true, false);
    OptimizedPlan oPlan;
    try {
        oPlan = compileNoStats(plan);
    } catch (CompilerException ce) {
        ce.printStackTrace();
        fail("The pact compiler is unable to compile this plan correctly.");
        // silence the compiler
        return;
    }
    OptimizerPlanNodeResolver resolver = getOptimizerPlanNodeResolver(oPlan);
    DualInputPlanNode joinWithInvariantNode = resolver.getNode(JOIN_WITH_INVARIANT_NAME);
    DualInputPlanNode joinWithSolutionSetNode = resolver.getNode(JOIN_WITH_SOLUTION_SET);
    SingleInputPlanNode worksetReducer = resolver.getNode(NEXT_WORKSET_REDUCER_NAME);
    // iteration preserves partitioning in reducer, so the first partitioning is out of the loop, 
    // the in-loop partitioning is before the final reducer
    // verify joinWithInvariant
    assertEquals(ShipStrategyType.FORWARD, joinWithInvariantNode.getInput1().getShipStrategy());
    assertEquals(ShipStrategyType.PARTITION_HASH, joinWithInvariantNode.getInput2().getShipStrategy());
    assertEquals(list0, joinWithInvariantNode.getKeysForInput1());
    assertEquals(list0, joinWithInvariantNode.getKeysForInput2());
    // verify joinWithSolutionSet
    assertEquals(ShipStrategyType.FORWARD, joinWithSolutionSetNode.getInput1().getShipStrategy());
    assertEquals(ShipStrategyType.FORWARD, joinWithSolutionSetNode.getInput2().getShipStrategy());
    // verify reducer
    assertEquals(ShipStrategyType.FORWARD, worksetReducer.getInput().getShipStrategy());
    assertEquals(list0, worksetReducer.getKeys(0));
    // verify solution delta
    assertEquals(1, joinWithSolutionSetNode.getOutgoingChannels().size());
    assertEquals(ShipStrategyType.FORWARD, joinWithSolutionSetNode.getOutgoingChannels().get(0).getShipStrategy());
    new JobGraphGenerator().compileJobGraph(oPlan);
}
Also used : DualInputPlanNode(org.apache.flink.optimizer.plan.DualInputPlanNode) SingleInputPlanNode(org.apache.flink.optimizer.plan.SingleInputPlanNode) JobGraphGenerator(org.apache.flink.optimizer.plantranslate.JobGraphGenerator) Plan(org.apache.flink.api.common.Plan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) Test(org.junit.Test)

Example 53 with DualInputPlanNode

use of org.apache.flink.optimizer.plan.DualInputPlanNode in project flink by apache.

the class ReplicatingDataSourceTest method checkJoinWithReplicatedSourceInputBehindMap.

/**
	 * Tests join program with replicated data source behind map.
	 */
@Test
public void checkJoinWithReplicatedSourceInputBehindMap() {
    ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
    env.setParallelism(DEFAULT_PARALLELISM);
    TupleTypeInfo<Tuple1<String>> typeInfo = TupleTypeInfo.getBasicTupleTypeInfo(String.class);
    ReplicatingInputFormat<Tuple1<String>, FileInputSplit> rif = new ReplicatingInputFormat<Tuple1<String>, FileInputSplit>(new TupleCsvInputFormat<Tuple1<String>>(new Path("/some/path"), typeInfo));
    DataSet<Tuple1<String>> source1 = env.createInput(rif, new TupleTypeInfo<Tuple1<String>>(BasicTypeInfo.STRING_TYPE_INFO));
    DataSet<Tuple1<String>> source2 = env.readCsvFile("/some/otherpath").types(String.class);
    DataSink<Tuple2<Tuple1<String>, Tuple1<String>>> out = source1.map(new IdMap()).join(source2).where("*").equalTo("*").writeAsText("/some/newpath");
    Plan plan = env.createProgramPlan();
    // submit the plan to the compiler
    OptimizedPlan oPlan = compileNoStats(plan);
    // check the optimized Plan
    // when join should have forward strategy on both sides
    SinkPlanNode sinkNode = oPlan.getDataSinks().iterator().next();
    DualInputPlanNode joinNode = (DualInputPlanNode) sinkNode.getPredecessor();
    ShipStrategyType joinIn1 = joinNode.getInput1().getShipStrategy();
    ShipStrategyType joinIn2 = joinNode.getInput2().getShipStrategy();
    Assert.assertEquals("Invalid ship strategy for an operator.", ShipStrategyType.FORWARD, joinIn1);
    Assert.assertEquals("Invalid ship strategy for an operator.", ShipStrategyType.FORWARD, joinIn2);
}
Also used : Path(org.apache.flink.core.fs.Path) ExecutionEnvironment(org.apache.flink.api.java.ExecutionEnvironment) Plan(org.apache.flink.api.common.Plan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) ShipStrategyType(org.apache.flink.runtime.operators.shipping.ShipStrategyType) DualInputPlanNode(org.apache.flink.optimizer.plan.DualInputPlanNode) ReplicatingInputFormat(org.apache.flink.api.common.io.ReplicatingInputFormat) FileInputSplit(org.apache.flink.core.fs.FileInputSplit) Tuple1(org.apache.flink.api.java.tuple.Tuple1) Tuple2(org.apache.flink.api.java.tuple.Tuple2) SinkPlanNode(org.apache.flink.optimizer.plan.SinkPlanNode) Test(org.junit.Test)

Example 54 with DualInputPlanNode

use of org.apache.flink.optimizer.plan.DualInputPlanNode in project flink by apache.

the class ReplicatingDataSourceTest method checkJoinWithReplicatedSourceInputBehindFlatMap.

/**
	 * Tests join program with replicated data source behind flatMap.
	 */
@Test
public void checkJoinWithReplicatedSourceInputBehindFlatMap() {
    ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
    env.setParallelism(DEFAULT_PARALLELISM);
    TupleTypeInfo<Tuple1<String>> typeInfo = TupleTypeInfo.getBasicTupleTypeInfo(String.class);
    ReplicatingInputFormat<Tuple1<String>, FileInputSplit> rif = new ReplicatingInputFormat<Tuple1<String>, FileInputSplit>(new TupleCsvInputFormat<Tuple1<String>>(new Path("/some/path"), typeInfo));
    DataSet<Tuple1<String>> source1 = env.createInput(rif, new TupleTypeInfo<Tuple1<String>>(BasicTypeInfo.STRING_TYPE_INFO));
    DataSet<Tuple1<String>> source2 = env.readCsvFile("/some/otherpath").types(String.class);
    DataSink<Tuple2<Tuple1<String>, Tuple1<String>>> out = source1.flatMap(new IdFlatMap()).join(source2).where("*").equalTo("*").writeAsText("/some/newpath");
    Plan plan = env.createProgramPlan();
    // submit the plan to the compiler
    OptimizedPlan oPlan = compileNoStats(plan);
    // check the optimized Plan
    // when join should have forward strategy on both sides
    SinkPlanNode sinkNode = oPlan.getDataSinks().iterator().next();
    DualInputPlanNode joinNode = (DualInputPlanNode) sinkNode.getPredecessor();
    ShipStrategyType joinIn1 = joinNode.getInput1().getShipStrategy();
    ShipStrategyType joinIn2 = joinNode.getInput2().getShipStrategy();
    Assert.assertEquals("Invalid ship strategy for an operator.", ShipStrategyType.FORWARD, joinIn1);
    Assert.assertEquals("Invalid ship strategy for an operator.", ShipStrategyType.FORWARD, joinIn2);
}
Also used : Path(org.apache.flink.core.fs.Path) ExecutionEnvironment(org.apache.flink.api.java.ExecutionEnvironment) Plan(org.apache.flink.api.common.Plan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) ShipStrategyType(org.apache.flink.runtime.operators.shipping.ShipStrategyType) DualInputPlanNode(org.apache.flink.optimizer.plan.DualInputPlanNode) ReplicatingInputFormat(org.apache.flink.api.common.io.ReplicatingInputFormat) FileInputSplit(org.apache.flink.core.fs.FileInputSplit) Tuple1(org.apache.flink.api.java.tuple.Tuple1) Tuple2(org.apache.flink.api.java.tuple.Tuple2) SinkPlanNode(org.apache.flink.optimizer.plan.SinkPlanNode) Test(org.junit.Test)

Example 55 with DualInputPlanNode

use of org.apache.flink.optimizer.plan.DualInputPlanNode in project flink by apache.

the class ReplicatingDataSourceTest method checkJoinWithReplicatedSourceInputBehindFilter.

/**
	 * Tests join program with replicated data source behind filter.
	 */
@Test
public void checkJoinWithReplicatedSourceInputBehindFilter() {
    ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
    env.setParallelism(DEFAULT_PARALLELISM);
    TupleTypeInfo<Tuple1<String>> typeInfo = TupleTypeInfo.getBasicTupleTypeInfo(String.class);
    ReplicatingInputFormat<Tuple1<String>, FileInputSplit> rif = new ReplicatingInputFormat<Tuple1<String>, FileInputSplit>(new TupleCsvInputFormat<Tuple1<String>>(new Path("/some/path"), typeInfo));
    DataSet<Tuple1<String>> source1 = env.createInput(rif, new TupleTypeInfo<Tuple1<String>>(BasicTypeInfo.STRING_TYPE_INFO));
    DataSet<Tuple1<String>> source2 = env.readCsvFile("/some/otherpath").types(String.class);
    DataSink<Tuple2<Tuple1<String>, Tuple1<String>>> out = source1.filter(new NoFilter()).join(source2).where("*").equalTo("*").writeAsText("/some/newpath");
    Plan plan = env.createProgramPlan();
    // submit the plan to the compiler
    OptimizedPlan oPlan = compileNoStats(plan);
    // check the optimized Plan
    // when join should have forward strategy on both sides
    SinkPlanNode sinkNode = oPlan.getDataSinks().iterator().next();
    DualInputPlanNode joinNode = (DualInputPlanNode) sinkNode.getPredecessor();
    ShipStrategyType joinIn1 = joinNode.getInput1().getShipStrategy();
    ShipStrategyType joinIn2 = joinNode.getInput2().getShipStrategy();
    Assert.assertEquals("Invalid ship strategy for an operator.", ShipStrategyType.FORWARD, joinIn1);
    Assert.assertEquals("Invalid ship strategy for an operator.", ShipStrategyType.FORWARD, joinIn2);
}
Also used : Path(org.apache.flink.core.fs.Path) ExecutionEnvironment(org.apache.flink.api.java.ExecutionEnvironment) Plan(org.apache.flink.api.common.Plan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) OptimizedPlan(org.apache.flink.optimizer.plan.OptimizedPlan) ShipStrategyType(org.apache.flink.runtime.operators.shipping.ShipStrategyType) DualInputPlanNode(org.apache.flink.optimizer.plan.DualInputPlanNode) ReplicatingInputFormat(org.apache.flink.api.common.io.ReplicatingInputFormat) FileInputSplit(org.apache.flink.core.fs.FileInputSplit) Tuple1(org.apache.flink.api.java.tuple.Tuple1) Tuple2(org.apache.flink.api.java.tuple.Tuple2) SinkPlanNode(org.apache.flink.optimizer.plan.SinkPlanNode) Test(org.junit.Test)

Aggregations

DualInputPlanNode (org.apache.flink.optimizer.plan.DualInputPlanNode)96 Test (org.junit.Test)86 OptimizedPlan (org.apache.flink.optimizer.plan.OptimizedPlan)81 Plan (org.apache.flink.api.common.Plan)76 SinkPlanNode (org.apache.flink.optimizer.plan.SinkPlanNode)67 ExecutionEnvironment (org.apache.flink.api.java.ExecutionEnvironment)65 Tuple3 (org.apache.flink.api.java.tuple.Tuple3)36 Tuple2 (org.apache.flink.api.java.tuple.Tuple2)31 SingleInputPlanNode (org.apache.flink.optimizer.plan.SingleInputPlanNode)27 JobGraphGenerator (org.apache.flink.optimizer.plantranslate.JobGraphGenerator)19 Channel (org.apache.flink.optimizer.plan.Channel)14 WorksetIterationPlanNode (org.apache.flink.optimizer.plan.WorksetIterationPlanNode)13 FieldList (org.apache.flink.api.common.operators.util.FieldList)12 InvalidProgramException (org.apache.flink.api.common.InvalidProgramException)11 DiscardingOutputFormat (org.apache.flink.api.java.io.DiscardingOutputFormat)11 PlanNode (org.apache.flink.optimizer.plan.PlanNode)11 Tuple1 (org.apache.flink.api.java.tuple.Tuple1)10 SourcePlanNode (org.apache.flink.optimizer.plan.SourcePlanNode)10 ShipStrategyType (org.apache.flink.runtime.operators.shipping.ShipStrategyType)10 ReplicatingInputFormat (org.apache.flink.api.common.io.ReplicatingInputFormat)8