
Example 1 with FlattenInstruction

Use of com.google.api.services.dataflow.model.FlattenInstruction in the Apache Beam project: class RemoveFlattenInstructionsFunctionTest, method testFlattenMultiplePCollectionsHavingMultipleConsumers.

@Test
public void testFlattenMultiplePCollectionsHavingMultipleConsumers() {
    Node a = ParallelInstructionNode.create(new ParallelInstruction().setName("A"), Nodes.ExecutionLocation.UNKNOWN);
    Node aPCollection = InstructionOutputNode.create(new InstructionOutput().setName("A.out"), PCOLLECTION_ID);
    Edge aOutput = DefaultEdge.create();
    Node b = ParallelInstructionNode.create(new ParallelInstruction().setName("B"), Nodes.ExecutionLocation.UNKNOWN);
    Edge bOutput = DefaultEdge.create();
    Node bPCollection = InstructionOutputNode.create(new InstructionOutput().setName("B.out"), PCOLLECTION_ID);
    Node flatten = ParallelInstructionNode.create(new ParallelInstruction().setName("Flatten").setFlatten(new FlattenInstruction()), Nodes.ExecutionLocation.UNKNOWN);
    Node flattenPCollection = InstructionOutputNode.create(new InstructionOutput().setName("Flatten.out"), PCOLLECTION_ID);
    Node c = ParallelInstructionNode.create(new ParallelInstruction().setName("C"), Nodes.ExecutionLocation.UNKNOWN);
    Edge cOutput = DefaultEdge.create();
    Node cPCollection = InstructionOutputNode.create(new InstructionOutput().setName("C.out"), PCOLLECTION_ID);
    Node d = ParallelInstructionNode.create(new ParallelInstruction().setName("D"), Nodes.ExecutionLocation.UNKNOWN);
    Edge dOutput = DefaultEdge.create();
    Node dPCollection = InstructionOutputNode.create(new InstructionOutput().setName("D.out"), PCOLLECTION_ID);
    // A --\
    //      -> Flatten --> C
    // B --/-------------> D
    MutableNetwork<Node, Edge> network = createEmptyNetwork();
    network.addNode(a);
    network.addNode(aPCollection);
    network.addNode(b);
    network.addNode(bPCollection);
    network.addNode(flatten);
    network.addNode(flattenPCollection);
    network.addNode(c);
    network.addNode(cPCollection);
    network.addEdge(a, aPCollection, aOutput);
    network.addEdge(aPCollection, flatten, DefaultEdge.create());
    network.addEdge(b, bPCollection, bOutput);
    network.addEdge(bPCollection, flatten, DefaultEdge.create());
    network.addEdge(bPCollection, d, DefaultEdge.create());
    network.addEdge(flatten, flattenPCollection, DefaultEdge.create());
    network.addEdge(flattenPCollection, c, DefaultEdge.create());
    network.addEdge(c, cPCollection, cOutput);
    network.addEdge(d, dPCollection, dOutput);
    // A --\
    //      -> C
    // B --/-> D
    assertThatFlattenIsProperlyRemoved(network);
}
Also used : ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) Node(org.apache.beam.runners.dataflow.worker.graph.Nodes.Node) InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode) InstructionOutput(com.google.api.services.dataflow.model.InstructionOutput) Edge(org.apache.beam.runners.dataflow.worker.graph.Edges.Edge) MultiOutputInfoEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge) DefaultEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge) FlattenInstruction(com.google.api.services.dataflow.model.FlattenInstruction) Test(org.junit.Test)
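
The before and after diagrams in this test suggest what removing a Flatten amounts to: every PCollection that fed the Flatten is wired directly to the consumers of the Flatten's output, and the Flatten and its output node disappear. The following is a minimal sketch of that rewiring using only java.util collections. The class FlattenRemovalSketch, its methods, and the string-keyed adjacency map are hypothetical stand-ins for illustration; this is not Beam's RemoveFlattenInstructionsFunction, and it does not use Guava's MutableNetwork.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: node names stand in for the worker graph's Node objects.
final class FlattenRemovalSketch {

    // Adds a directed edge and registers both endpoints as nodes.
    static void addEdge(Map<String, List<String>> graph, String from, String to) {
        graph.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        graph.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Builds the "before" network drawn in the test's first diagram.
    static Map<String, List<String>> buildExampleGraph() {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        addEdge(graph, "A", "A.out");
        addEdge(graph, "A.out", "Flatten");
        addEdge(graph, "B", "B.out");
        addEdge(graph, "B.out", "Flatten");
        addEdge(graph, "B.out", "D");
        addEdge(graph, "Flatten", "Flatten.out");
        addEdge(graph, "Flatten.out", "C");
        addEdge(graph, "C", "C.out");
        addEdge(graph, "D", "D.out");
        return graph;
    }

    // Redirects every edge into the flatten node to the consumers of its output,
    // then drops the flatten node and its output node.
    // Assumption: the flatten node has exactly one output node.
    static void removeFlatten(Map<String, List<String>> graph, String flatten) {
        String flattenOut = graph.get(flatten).get(0);
        List<String> consumers = new ArrayList<>(graph.get(flattenOut));
        for (List<String> successors : graph.values()) {
            List<String> rewired = new ArrayList<>();
            for (String successor : successors) {
                if (successor.equals(flatten)) {
                    rewired.addAll(consumers);
                } else {
                    rewired.add(successor);
                }
            }
            successors.clear();
            successors.addAll(rewired);
        }
        graph.remove(flatten);
        graph.remove(flattenOut);
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = buildExampleGraph();
        removeFlatten(graph, "Flatten");
        System.out.println(graph);
    }
}
```

In this sketch B.out keeps its direct edge to D and gains an edge to C, which matches the "after" diagram where B feeds both C and D.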

Example 2 with FlattenInstruction

Use of com.google.api.services.dataflow.model.FlattenInstruction in the Apache Beam project: class RemoveFlattenInstructionsFunctionTest, method testRemoveFlattenOnMultiOutputInstruction.

@Test
public void testRemoveFlattenOnMultiOutputInstruction() {
    Node a = ParallelInstructionNode.create(new ParallelInstruction().setName("A"), Nodes.ExecutionLocation.UNKNOWN);
    Node aOut1PCollection = InstructionOutputNode.create(new InstructionOutput().setName("A.out1"), PCOLLECTION_ID);
    Node aOut2PCollection = InstructionOutputNode.create(new InstructionOutput().setName("A.out2"), PCOLLECTION_ID);
    Node aOut3PCollection = InstructionOutputNode.create(new InstructionOutput().setName("A.out3"), PCOLLECTION_ID);
    Edge aOut1 = MultiOutputInfoEdge.create(new MultiOutputInfo().setTag("out1"));
    Edge aOut2 = MultiOutputInfoEdge.create(new MultiOutputInfo().setTag("out2"));
    Edge aOut3 = MultiOutputInfoEdge.create(new MultiOutputInfo().setTag("out3"));
    Edge aOut1PCollectionEdge = DefaultEdge.create();
    Node b = ParallelInstructionNode.create(new ParallelInstruction().setName("B"), Nodes.ExecutionLocation.UNKNOWN);
    Node bOut1PCollection = InstructionOutputNode.create(new InstructionOutput().setName("B.out1"), PCOLLECTION_ID);
    Node bOut2PCollection = InstructionOutputNode.create(new InstructionOutput().setName("B.out2"), PCOLLECTION_ID);
    Edge bOut1 = MultiOutputInfoEdge.create(new MultiOutputInfo().setTag("out1"));
    Edge bOut2 = MultiOutputInfoEdge.create(new MultiOutputInfo().setTag("out2"));
    Edge bOut1PCollectionEdge = DefaultEdge.create();
    Node flatten = ParallelInstructionNode.create(new ParallelInstruction().setName("Flatten").setFlatten(new FlattenInstruction()), Nodes.ExecutionLocation.UNKNOWN);
    Node flattenPCollection = InstructionOutputNode.create(new InstructionOutput().setName("Flatten.out"), PCOLLECTION_ID);
    Node c = ParallelInstructionNode.create(new ParallelInstruction().setName("C"), Nodes.ExecutionLocation.UNKNOWN);
    Edge cOutput = DefaultEdge.create();
    Node cPCollection = InstructionOutputNode.create(new InstructionOutput().setName("C.out"), PCOLLECTION_ID);
    Node d = ParallelInstructionNode.create(new ParallelInstruction().setName("D"), Nodes.ExecutionLocation.UNKNOWN);
    Edge dOutput = DefaultEdge.create();
    Node dPCollection = InstructionOutputNode.create(new InstructionOutput().setName("D.out"), PCOLLECTION_ID);
    Node e = ParallelInstructionNode.create(new ParallelInstruction().setName("E"), Nodes.ExecutionLocation.UNKNOWN);
    Edge eOutput = DefaultEdge.create();
    Node ePCollection = InstructionOutputNode.create(new InstructionOutput().setName("E.out"), PCOLLECTION_ID);
    //   /-out1-> C
    // A -out2-\
    //   \-out3--> Flatten --> D
    // B -out2-/
    //   \-out1-> E
    MutableNetwork<Node, Edge> network = createEmptyNetwork();
    network.addNode(a);
    network.addNode(aOut1PCollection);
    network.addNode(aOut2PCollection);
    network.addNode(aOut3PCollection);
    network.addNode(b);
    network.addNode(bOut1PCollection);
    network.addNode(bOut2PCollection);
    network.addNode(flatten);
    network.addNode(flattenPCollection);
    network.addNode(c);
    network.addNode(cPCollection);
    network.addNode(d);
    network.addNode(dPCollection);
    network.addNode(e);
    network.addNode(ePCollection);
    network.addEdge(a, aOut1PCollection, aOut1);
    network.addEdge(a, aOut2PCollection, aOut2);
    network.addEdge(a, aOut3PCollection, aOut3);
    network.addEdge(aOut1PCollection, c, aOut1PCollectionEdge);
    network.addEdge(aOut2PCollection, flatten, DefaultEdge.create());
    network.addEdge(aOut3PCollection, flatten, DefaultEdge.create());
    network.addEdge(b, bOut1PCollection, bOut1);
    network.addEdge(b, bOut2PCollection, bOut2);
    network.addEdge(bOut1PCollection, e, bOut1PCollectionEdge);
    network.addEdge(bOut2PCollection, flatten, DefaultEdge.create());
    network.addEdge(flatten, flattenPCollection, DefaultEdge.create());
    network.addEdge(flattenPCollection, d, DefaultEdge.create());
    network.addEdge(c, cPCollection, cOutput);
    network.addEdge(d, dPCollection, dOutput);
    network.addEdge(e, ePCollection, eOutput);
    //   /-out1-> C
    // A -out2-\
    //   \-out3--> D
    // B -out2-/
    //   \-out1-> E
    assertThatFlattenIsProperlyRemoved(network);
}
Also used : ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) MultiOutputInfo(com.google.api.services.dataflow.model.MultiOutputInfo) Node(org.apache.beam.runners.dataflow.worker.graph.Nodes.Node) InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode) InstructionOutput(com.google.api.services.dataflow.model.InstructionOutput) Edge(org.apache.beam.runners.dataflow.worker.graph.Edges.Edge) MultiOutputInfoEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge) DefaultEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge) FlattenInstruction(com.google.api.services.dataflow.model.FlattenInstruction) Test(org.junit.Test)

Example 3 with FlattenInstruction

Use of com.google.api.services.dataflow.model.FlattenInstruction in the Apache Beam project: class MapTaskToNetworkFunctionTest, method testParallelEdgeFlatten.

@Test
public void testParallelEdgeFlatten() {
    //                  /---\
    // Read --> Read.out --> Flatten
    //                  \---/
    InstructionOutput readOutput = createInstructionOutput("Read.out");
    ParallelInstruction read = createParallelInstruction("Read", readOutput);
    read.setRead(new ReadInstruction());
    FlattenInstruction flattenInstruction = new FlattenInstruction();
    flattenInstruction.setInputs(ImmutableList.of(
        createInstructionInput(0, 0), // Read.out
        createInstructionInput(0, 0), // Read.out
        createInstructionInput(0, 0))); // Read.out
    InstructionOutput flattenOutput = createInstructionOutput("Flatten.out");
    ParallelInstruction flatten = createParallelInstruction("Flatten", flattenOutput);
    flatten.setFlatten(flattenInstruction);
    MapTask mapTask = new MapTask();
    mapTask.setInstructions(ImmutableList.of(read, flatten));
    mapTask.setFactory(Transport.getJsonFactory());
    Network<Node, Edge> network = new MapTaskToNetworkFunction(IdGenerators.decrementingLongs()).apply(mapTask);
    assertNetworkProperties(network);
    assertEquals(4, network.nodes().size());
    assertEquals(5, network.edges().size());
    ParallelInstructionNode readNode = get(network, read);
    InstructionOutputNode readOutputNode = getOnlySuccessor(network, readNode);
    assertEquals(readOutput, readOutputNode.getInstructionOutput());
    ParallelInstructionNode flattenNode = getOnlySuccessor(network, readOutputNode);
    // Assert that the three parallel edges are maintained
    assertEquals(3, network.edgesConnecting(readOutputNode, flattenNode).size());
    InstructionOutputNode flattenOutputNode = getOnlySuccessor(network, flattenNode);
    assertEquals(flattenOutput, flattenOutputNode.getInstructionOutput());
}
Also used : ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) MapTask(com.google.api.services.dataflow.model.MapTask) ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode) Node(org.apache.beam.runners.dataflow.worker.graph.Nodes.Node) InstructionOutput(com.google.api.services.dataflow.model.InstructionOutput) ReadInstruction(com.google.api.services.dataflow.model.ReadInstruction) FlattenInstruction(com.google.api.services.dataflow.model.FlattenInstruction) Edge(org.apache.beam.runners.dataflow.worker.graph.Edges.Edge) DefaultEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge) MultiOutputInfoEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge) Test(org.junit.Test)
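
The counts asserted in this test follow from treating the graph as a multigraph, where edges with the same endpoints are still counted separately: four nodes (Read, Read.out, Flatten, Flatten.out) and five edges (one out of Read, three parallel ones into Flatten, one out of Flatten). Below is a small sketch of that bookkeeping with plain java.util collections. ParallelEdgeSketch and its Edge class are hypothetical names for illustration and are not the Guava Network API that the test itself uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: a list of edges acting as a tiny multigraph.
final class ParallelEdgeSketch {

    // A directed edge between two named nodes.
    static final class Edge {
        final String from;
        final String to;

        Edge(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    // Edges are kept in a list, so repeated endpoints remain separate entries.
    static List<Edge> buildEdges() {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge("Read", "Read.out"));
        for (int i = 0; i < 3; i++) {
            // One edge per Flatten input; all three inputs refer to Read.out.
            edges.add(new Edge("Read.out", "Flatten"));
        }
        edges.add(new Edge("Flatten", "Flatten.out"));
        return edges;
    }

    // Collects the distinct node names that appear as edge endpoints.
    static Set<String> nodes(List<Edge> edges) {
        Set<String> nodes = new LinkedHashSet<>();
        for (Edge edge : edges) {
            nodes.add(edge.from);
            nodes.add(edge.to);
        }
        return nodes;
    }

    // Counts the edges that run from one node to another.
    static int edgesConnecting(List<Edge> edges, String from, String to) {
        int count = 0;
        for (Edge edge : edges) {
            if (edge.from.equals(from) && edge.to.equals(to)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Edge> edges = buildEdges();
        System.out.println("nodes=" + nodes(edges).size() + " edges=" + edges.size());
        System.out.println("parallel=" + edgesConnecting(edges, "Read.out", "Flatten"));
    }
}
```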

Example 4 with FlattenInstruction

Use of com.google.api.services.dataflow.model.FlattenInstruction in the Apache Beam project: class IntrinsicMapTaskExecutorFactoryTest, method createFlattenInstruction.

static ParallelInstruction createFlattenInstruction(int producerIndex1, int producerOutputNum1, int producerIndex2, int producerOutputNum2, String systemName) {
    List<InstructionInput> cloudInputs = new ArrayList<>();
    InstructionInput cloudInput1 = new InstructionInput();
    cloudInput1.setProducerInstructionIndex(producerIndex1);
    cloudInput1.setOutputNum(producerOutputNum1);
    cloudInputs.add(cloudInput1);
    InstructionInput cloudInput2 = new InstructionInput();
    cloudInput2.setProducerInstructionIndex(producerIndex2);
    cloudInput2.setOutputNum(producerOutputNum2);
    cloudInputs.add(cloudInput2);
    FlattenInstruction flattenInstruction = new FlattenInstruction();
    flattenInstruction.setInputs(cloudInputs);
    InstructionOutput output = new InstructionOutput();
    output.setName("flatten_output_name");
    output.setCodec(CloudObjects.asCloudObject(StringUtf8Coder.of(), /* sdkComponents= */ null));
    output.setOriginalName("originalName");
    output.setSystemName("systemName");
    ParallelInstruction instruction = new ParallelInstruction();
    instruction.setFlatten(flattenInstruction);
    instruction.setOutputs(Arrays.asList(output));
    instruction.setSystemName(systemName);
    instruction.setOriginalName(systemName + "OriginalName");
    return instruction;
}
Also used : ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) ArrayList(java.util.ArrayList) InstructionOutput(com.google.api.services.dataflow.model.InstructionOutput) InstructionInput(com.google.api.services.dataflow.model.InstructionInput) FlattenInstruction(com.google.api.services.dataflow.model.FlattenInstruction)
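
Each InstructionInput built by this helper identifies its source by position: producerInstructionIndex is the zero-based index of the producing instruction in the enclosing MapTask's instruction list, and outputNum is the zero-based index of one of that producer's outputs. The sketch below resolves such a pair against a plain list of output names. InputResolutionSketch and resolve are hypothetical names for illustration, and the sample data mirrors the Read/Flatten layout of Example 3 rather than anything defined in this helper.

```java
import java.util.List;

// Hypothetical illustration only: instructionOutputs.get(i) lists the output
// names of instruction i, standing in for a MapTask's instruction list.
final class InputResolutionSketch {

    // Looks up the output that a (producerInstructionIndex, outputNum) pair refers to.
    static String resolve(
            List<List<String>> instructionOutputs, int producerInstructionIndex, int outputNum) {
        return instructionOutputs.get(producerInstructionIndex).get(outputNum);
    }

    public static void main(String[] args) {
        // Instruction 0 is a read with one output; instruction 1 is the flatten.
        List<List<String>> outputs = List.of(List.of("Read.out"), List.of("Flatten.out"));
        // An input created with (0, 0) refers to the first output of instruction 0.
        System.out.println(resolve(outputs, 0, 0));
    }
}
```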

Example 5 with FlattenInstruction

Use of com.google.api.services.dataflow.model.FlattenInstruction in the Apache Beam project: class DeduceNodeLocationsFunctionTest, method testGraphWithNonDeducibleNodes.

/**
 * Tests that graphs with deducible and non-deducible nodes are maintained correctly.
 */
@Test
public void testGraphWithNonDeducibleNodes() throws Exception {
    // A --> out1 --\
    //               --> Flatten --> D
    // B --> out2 --/-->C
    Node a = createReadNode("A", CUSTOM_SOURCE);
    Node out1 = InstructionOutputNode.create(new InstructionOutput(), "fakeId");
    Node b = createReadNode("B", RUNNER_SOURCE);
    Node out2 = InstructionOutputNode.create(new InstructionOutput(), "fakeId");
    Node c = createParDoNode("C", "RunnerDoFn");
    Node flatten = ParallelInstructionNode.create(new ParallelInstruction().setName("Flatten").setFlatten(new FlattenInstruction()), Nodes.ExecutionLocation.UNKNOWN);
    Node d = createParDoNode("D", DO_FN);
    MutableNetwork<Node, Edge> network = createEmptyNetwork();
    network.addNode(a);
    network.addNode(out1);
    network.addNode(b);
    network.addNode(out2);
    network.addNode(c);
    network.addNode(flatten);
    network.addNode(d);
    network.addEdge(a, out1, DefaultEdge.create());
    network.addEdge(b, out2, DefaultEdge.create());
    network.addEdge(out1, flatten, DefaultEdge.create());
    network.addEdge(out2, flatten, DefaultEdge.create());
    network.addEdge(out2, c, DefaultEdge.create());
    network.addEdge(flatten, d, DefaultEdge.create());
    Network<Node, Edge> inputNetwork = ImmutableNetwork.copyOf(network);
    network = new DeduceNodeLocationsFunction().apply(network);
    assertThatNetworksAreIdentical(inputNetwork, network);
    assertAllNodesDeducedExceptFlattens(network);
}
Also used : ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode) Node(org.apache.beam.runners.dataflow.worker.graph.Nodes.Node) InstructionOutput(com.google.api.services.dataflow.model.InstructionOutput) FlattenInstruction(com.google.api.services.dataflow.model.FlattenInstruction) Edge(org.apache.beam.runners.dataflow.worker.graph.Edges.Edge) DefaultEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge) Test(org.junit.Test)

Aggregations

FlattenInstruction (com.google.api.services.dataflow.model.FlattenInstruction): 8
InstructionOutput (com.google.api.services.dataflow.model.InstructionOutput): 8
ParallelInstruction (com.google.api.services.dataflow.model.ParallelInstruction): 8
DefaultEdge (org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge): 7
Edge (org.apache.beam.runners.dataflow.worker.graph.Edges.Edge): 7
InstructionOutputNode (org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode): 7
Node (org.apache.beam.runners.dataflow.worker.graph.Nodes.Node): 7
ParallelInstructionNode (org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode): 7
Test (org.junit.Test): 7
MultiOutputInfoEdge (org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge): 6
MapTask (com.google.api.services.dataflow.model.MapTask): 2
MultiOutputInfo (com.google.api.services.dataflow.model.MultiOutputInfo): 2
ReadInstruction (com.google.api.services.dataflow.model.ReadInstruction): 2
InstructionInput (com.google.api.services.dataflow.model.InstructionInput): 1
ArrayList (java.util.ArrayList): 1