Search in sources :

Example 1 with MultiOutputInfo

use of com.google.api.services.dataflow.model.MultiOutputInfo in project beam by apache.

the class MapTaskToNetworkFunctionTest method testMultipleOutput.

@Test
public void testMultipleOutput() {
    // /---> WriteA
    // Read ---> ParDo
    // \---> WriteB
    InstructionOutput readOutput = createInstructionOutput("Read.out");
    ParallelInstruction read = createParallelInstruction("Read", readOutput);
    read.setRead(new ReadInstruction());
    MultiOutputInfo parDoMultiOutput1 = createMultiOutputInfo("output1");
    MultiOutputInfo parDoMultiOutput2 = createMultiOutputInfo("output2");
    ParDoInstruction parDoInstruction = new ParDoInstruction();
    // Read.out
    parDoInstruction.setInput(createInstructionInput(0, 0));
    parDoInstruction.setMultiOutputInfos(ImmutableList.of(parDoMultiOutput1, parDoMultiOutput2));
    InstructionOutput parDoOutput1 = createInstructionOutput("ParDo.out1");
    InstructionOutput parDoOutput2 = createInstructionOutput("ParDo.out2");
    ParallelInstruction parDo = createParallelInstruction("ParDo", parDoOutput1, parDoOutput2);
    parDo.setParDo(parDoInstruction);
    WriteInstruction writeAInstruction = new WriteInstruction();
    // ParDo.out1
    writeAInstruction.setInput(createInstructionInput(1, 0));
    ParallelInstruction writeA = createParallelInstruction("WriteA");
    writeA.setWrite(writeAInstruction);
    WriteInstruction writeBInstruction = new WriteInstruction();
    // ParDo.out2
    writeBInstruction.setInput(createInstructionInput(1, 1));
    ParallelInstruction writeB = createParallelInstruction("WriteB");
    writeB.setWrite(writeBInstruction);
    MapTask mapTask = new MapTask();
    mapTask.setInstructions(ImmutableList.of(read, parDo, writeA, writeB));
    mapTask.setFactory(Transport.getJsonFactory());
    Network<Node, Edge> network = new MapTaskToNetworkFunction(IdGenerators.decrementingLongs()).apply(mapTask);
    assertNetworkProperties(network);
    assertEquals(7, network.nodes().size());
    assertEquals(6, network.edges().size());
    ParallelInstructionNode parDoNode = get(network, parDo);
    ParallelInstructionNode writeANode = get(network, writeA);
    ParallelInstructionNode writeBNode = get(network, writeB);
    InstructionOutputNode parDoOutput1Node = getOnlyPredecessor(network, writeANode);
    assertEquals(parDoOutput1, parDoOutput1Node.getInstructionOutput());
    InstructionOutputNode parDoOutput2Node = getOnlyPredecessor(network, writeBNode);
    assertEquals(parDoOutput2, parDoOutput2Node.getInstructionOutput());
    assertThat(network.successors(parDoNode), Matchers.<Node>containsInAnyOrder(parDoOutput1Node, parDoOutput2Node));
    assertEquals(parDoMultiOutput1, ((MultiOutputInfoEdge) Iterables.getOnlyElement(network.edgesConnecting(parDoNode, parDoOutput1Node))).getMultiOutputInfo());
    assertEquals(parDoMultiOutput2, ((MultiOutputInfoEdge) Iterables.getOnlyElement(network.edgesConnecting(parDoNode, parDoOutput2Node))).getMultiOutputInfo());
}
Also used : MultiOutputInfo(com.google.api.services.dataflow.model.MultiOutputInfo) InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode) Node(org.apache.beam.runners.dataflow.worker.graph.Nodes.Node) InstructionOutput(com.google.api.services.dataflow.model.InstructionOutput) ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode) ReadInstruction(com.google.api.services.dataflow.model.ReadInstruction) ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) ParDoInstruction(com.google.api.services.dataflow.model.ParDoInstruction) InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) MapTask(com.google.api.services.dataflow.model.MapTask) WriteInstruction(com.google.api.services.dataflow.model.WriteInstruction) Edge(org.apache.beam.runners.dataflow.worker.graph.Edges.Edge) DefaultEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge) MultiOutputInfoEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge) Test(org.junit.Test)

Example 2 with MultiOutputInfo

use of com.google.api.services.dataflow.model.MultiOutputInfo in project beam by apache.

the class MapTaskToNetworkFunctionTest method createMultiOutputInfo.

private static MultiOutputInfo createMultiOutputInfo(String tag) {
    MultiOutputInfo rval = new MultiOutputInfo();
    rval.setTag(tag);
    return rval;
}
Also used : MultiOutputInfo(com.google.api.services.dataflow.model.MultiOutputInfo)

Example 3 with MultiOutputInfo

use of com.google.api.services.dataflow.model.MultiOutputInfo in project beam by apache.

the class FixMultiOutputInfosOnParDoInstructionsTest method createMapTaskWithParDo.

private static MapTask createMapTaskWithParDo(int numOutputs, String... tags) {
    ParDoInstruction parDoInstruction = new ParDoInstruction();
    parDoInstruction.setNumOutputs(numOutputs);
    List<MultiOutputInfo> multiOutputInfos = new ArrayList<>(tags.length);
    for (String tag : tags) {
        MultiOutputInfo multiOutputInfo = new MultiOutputInfo();
        multiOutputInfo.setTag(tag);
        multiOutputInfos.add(multiOutputInfo);
    }
    parDoInstruction.setMultiOutputInfos(multiOutputInfos);
    ParallelInstruction instruction = new ParallelInstruction();
    instruction.setParDo(parDoInstruction);
    MapTask mapTask = new MapTask();
    mapTask.setInstructions(ImmutableList.of(instruction));
    return mapTask;
}
Also used : ParDoInstruction(com.google.api.services.dataflow.model.ParDoInstruction) ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) MultiOutputInfo(com.google.api.services.dataflow.model.MultiOutputInfo) MapTask(com.google.api.services.dataflow.model.MapTask) ArrayList(java.util.ArrayList)

Example 4 with MultiOutputInfo

use of com.google.api.services.dataflow.model.MultiOutputInfo in project beam by apache.

the class EdgesTest method testInstructionOutputNode.

@Test
public void testInstructionOutputNode() {
    MultiOutputInfo param = new MultiOutputInfo();
    MultiOutputInfoEdge edge = MultiOutputInfoEdge.create(param);
    assertSame(param, edge.getMultiOutputInfo());
    assertNotEquals(edge, MultiOutputInfoEdge.create(param));
    assertNotEquals(edge, edge.clone());
    assertSame(param, MultiOutputInfoEdge.create(param).clone().getMultiOutputInfo());
}
Also used : MultiOutputInfo(com.google.api.services.dataflow.model.MultiOutputInfo) MultiOutputInfoEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge) Test(org.junit.Test)

Example 5 with MultiOutputInfo

use of com.google.api.services.dataflow.model.MultiOutputInfo in project beam by apache.

the class RemoveFlattenInstructionsFunctionTest method testRemoveFlattenOnMultiOutputInstruction.

@Test
public void testRemoveFlattenOnMultiOutputInstruction() {
    Node a = ParallelInstructionNode.create(new ParallelInstruction().setName("A"), Nodes.ExecutionLocation.UNKNOWN);
    Node aOut1PCollection = InstructionOutputNode.create(new InstructionOutput().setName("A.out1"), PCOLLECTION_ID);
    Node aOut2PCollection = InstructionOutputNode.create(new InstructionOutput().setName("A.out2"), PCOLLECTION_ID);
    Node aOut3PCollection = InstructionOutputNode.create(new InstructionOutput().setName("A.out3"), PCOLLECTION_ID);
    Edge aOut1 = MultiOutputInfoEdge.create(new MultiOutputInfo().setTag("out1"));
    Edge aOut2 = MultiOutputInfoEdge.create(new MultiOutputInfo().setTag("out2"));
    Edge aOut3 = MultiOutputInfoEdge.create(new MultiOutputInfo().setTag("out3"));
    Edge aOut1PCollectionEdge = DefaultEdge.create();
    Node b = ParallelInstructionNode.create(new ParallelInstruction().setName("B"), Nodes.ExecutionLocation.UNKNOWN);
    Node bOut1PCollection = InstructionOutputNode.create(new InstructionOutput().setName("B.out1"), PCOLLECTION_ID);
    Node bOut2PCollection = InstructionOutputNode.create(new InstructionOutput().setName("B.out1"), PCOLLECTION_ID);
    Edge bOut1 = MultiOutputInfoEdge.create(new MultiOutputInfo().setTag("out1"));
    Edge bOut2 = MultiOutputInfoEdge.create(new MultiOutputInfo().setTag("out2"));
    Edge bOut1PCollectionEdge = DefaultEdge.create();
    Node flatten = ParallelInstructionNode.create(new ParallelInstruction().setName("Flatten").setFlatten(new FlattenInstruction()), Nodes.ExecutionLocation.UNKNOWN);
    Node flattenPCollection = InstructionOutputNode.create(new InstructionOutput().setName("Flatten.out"), PCOLLECTION_ID);
    Node c = ParallelInstructionNode.create(new ParallelInstruction().setName("C"), Nodes.ExecutionLocation.UNKNOWN);
    Edge cOutput = DefaultEdge.create();
    Node cPCollection = InstructionOutputNode.create(new InstructionOutput().setName("C.out"), PCOLLECTION_ID);
    Node d = ParallelInstructionNode.create(new ParallelInstruction().setName("D"), Nodes.ExecutionLocation.UNKNOWN);
    Edge dOutput = DefaultEdge.create();
    Node dPCollection = InstructionOutputNode.create(new InstructionOutput().setName("D.out"), PCOLLECTION_ID);
    Node e = ParallelInstructionNode.create(new ParallelInstruction().setName("E"), Nodes.ExecutionLocation.UNKNOWN);
    Edge eOutput = DefaultEdge.create();
    Node ePCollection = InstructionOutputNode.create(new InstructionOutput().setName("E.out"), PCOLLECTION_ID);
    // /-out1-> C
    // A -out2-\
    // \-out3--> Flatten --> D
    // B -out2-/
    // \-out1-> E
    MutableNetwork<Node, Edge> network = createEmptyNetwork();
    network.addNode(a);
    network.addNode(aOut1PCollection);
    network.addNode(aOut2PCollection);
    network.addNode(aOut3PCollection);
    network.addNode(b);
    network.addNode(bOut1PCollection);
    network.addNode(bOut2PCollection);
    network.addNode(flatten);
    network.addNode(flattenPCollection);
    network.addNode(c);
    network.addNode(cPCollection);
    network.addNode(d);
    network.addNode(dPCollection);
    network.addNode(e);
    network.addNode(ePCollection);
    network.addEdge(a, aOut1PCollection, aOut1);
    network.addEdge(a, aOut2PCollection, aOut2);
    network.addEdge(a, aOut3PCollection, aOut3);
    network.addEdge(aOut1PCollection, c, aOut1PCollectionEdge);
    network.addEdge(aOut2PCollection, flatten, DefaultEdge.create());
    network.addEdge(aOut3PCollection, flatten, DefaultEdge.create());
    network.addEdge(b, bOut1PCollection, bOut1);
    network.addEdge(b, bOut2PCollection, bOut2);
    network.addEdge(bOut1PCollection, e, bOut1PCollectionEdge);
    network.addEdge(bOut2PCollection, flatten, DefaultEdge.create());
    network.addEdge(flatten, flattenPCollection, DefaultEdge.create());
    network.addEdge(flattenPCollection, d, DefaultEdge.create());
    network.addEdge(c, cPCollection, cOutput);
    network.addEdge(d, dPCollection, dOutput);
    network.addEdge(e, ePCollection, eOutput);
    // /-out1-> C
    // A -out2-\
    // \-out3--> D
    // B -out2-/
    // \-out1-> E
    assertThatFlattenIsProperlyRemoved(network);
}
Also used : ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) MultiOutputInfo(com.google.api.services.dataflow.model.MultiOutputInfo) Node(org.apache.beam.runners.dataflow.worker.graph.Nodes.Node) InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode) InstructionOutput(com.google.api.services.dataflow.model.InstructionOutput) Edge(org.apache.beam.runners.dataflow.worker.graph.Edges.Edge) MultiOutputInfoEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge) DefaultEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge) FlattenInstruction(com.google.api.services.dataflow.model.FlattenInstruction) Test(org.junit.Test)

Aggregations

MultiOutputInfo (com.google.api.services.dataflow.model.MultiOutputInfo)13 ParallelInstruction (com.google.api.services.dataflow.model.ParallelInstruction)10 InstructionOutput (com.google.api.services.dataflow.model.InstructionOutput)8 ParDoInstruction (com.google.api.services.dataflow.model.ParDoInstruction)8 MultiOutputInfoEdge (org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge)8 Edge (org.apache.beam.runners.dataflow.worker.graph.Edges.Edge)7 InstructionOutputNode (org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode)7 Node (org.apache.beam.runners.dataflow.worker.graph.Nodes.Node)7 ParallelInstructionNode (org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode)7 DefaultEdge (org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge)6 Test (org.junit.Test)5 ReadInstruction (com.google.api.services.dataflow.model.ReadInstruction)4 CloudObject (org.apache.beam.runners.dataflow.util.CloudObject)4 MapTask (com.google.api.services.dataflow.model.MapTask)3 FlattenInstruction (com.google.api.services.dataflow.model.FlattenInstruction)2 InstructionInput (com.google.api.services.dataflow.model.InstructionInput)2 IOException (java.io.IOException)2 ArrayList (java.util.ArrayList)2 HashMap (java.util.HashMap)2 Map (java.util.Map)2