Example 66 with ParallelInstruction

Use of com.google.api.services.dataflow.model.ParallelInstruction in the Apache Beam project.

From the class MapTaskToNetworkFunctionTest, method testPartialGroupByKey.

@Test
public void testPartialGroupByKey() {
    // Read --> PGBK --> Write
    InstructionOutput readOutput = createInstructionOutput("Read.out");
    ParallelInstruction read = createParallelInstruction("Read", readOutput);
    read.setRead(new ReadInstruction());
    PartialGroupByKeyInstruction pgbkInstruction = new PartialGroupByKeyInstruction();
    // Read.out
    pgbkInstruction.setInput(createInstructionInput(0, 0));
    InstructionOutput pgbkOutput = createInstructionOutput("PGBK.out");
    ParallelInstruction pgbk = createParallelInstruction("PGBK", pgbkOutput);
    pgbk.setPartialGroupByKey(pgbkInstruction);
    WriteInstruction writeInstruction = new WriteInstruction();
    // PGBK.out
    writeInstruction.setInput(createInstructionInput(1, 0));
    ParallelInstruction write = createParallelInstruction("Write");
    write.setWrite(writeInstruction);
    MapTask mapTask = new MapTask();
    mapTask.setInstructions(ImmutableList.of(read, pgbk, write));
    mapTask.setFactory(Transport.getJsonFactory());
    Network<Node, Edge> network = new MapTaskToNetworkFunction(IdGenerators.decrementingLongs()).apply(mapTask);
    assertNetworkProperties(network);
    assertEquals(5, network.nodes().size());
    assertEquals(4, network.edges().size());
    ParallelInstructionNode readNode = get(network, read);
    InstructionOutputNode readOutputNode = getOnlySuccessor(network, readNode);
    assertEquals(readOutput, readOutputNode.getInstructionOutput());
    ParallelInstructionNode pgbkNode = getOnlySuccessor(network, readOutputNode);
    InstructionOutputNode pgbkOutputNode = getOnlySuccessor(network, pgbkNode);
    assertEquals(pgbkOutput, pgbkOutputNode.getInstructionOutput());
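    // Assert on the node found in the network, not on the local `write` instruction (which is never null).
    ParallelInstructionNode writeNode = getOnlySuccessor(network, pgbkOutputNode);
    assertNotNull(writeNode);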
}
Also used : ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) MapTask(com.google.api.services.dataflow.model.MapTask) ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode) Node(org.apache.beam.runners.dataflow.worker.graph.Nodes.Node) InstructionOutput(com.google.api.services.dataflow.model.InstructionOutput) WriteInstruction(com.google.api.services.dataflow.model.WriteInstruction) PartialGroupByKeyInstruction(com.google.api.services.dataflow.model.PartialGroupByKeyInstruction) ReadInstruction(com.google.api.services.dataflow.model.ReadInstruction) Edge(org.apache.beam.runners.dataflow.worker.graph.Edges.Edge) DefaultEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge) MultiOutputInfoEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge) Test(org.junit.Test)
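
The node and edge counts asserted in this test appear to follow from the shape of the map task: one node per instruction, one node per declared output, one edge from each instruction to each of its outputs, and one edge from an output to the instruction that consumes it. The sketch below reproduces only that arithmetic with plain Java. It does not call MapTaskToNetworkFunction; the class NetworkShapeSketch and its Instruction type are hypothetical, and reading createInstructionInput(0, 0) as (producer instruction index, output number) is an assumption based on the comments in the test.

```java
import java.util.Arrays;
import java.util.List;

public class NetworkShapeSketch {
    /** Hypothetical stand-in for a ParallelInstruction: a name, an output count, and an optional producer. */
    static final class Instruction {
        final String name;
        final int outputCount;
        final int producerIndex; // index of the producing instruction, or -1 for a source

        Instruction(String name, int outputCount, int producerIndex) {
            this.name = name;
            this.outputCount = outputCount;
            this.producerIndex = producerIndex;
        }
    }

    /** Returns {nodeCount, edgeCount} under the assumed model described above. */
    static int[] countNodesAndEdges(List<Instruction> instructions) {
        int nodes = 0;
        int edges = 0;
        for (Instruction instruction : instructions) {
            nodes += 1 + instruction.outputCount; // the instruction plus each of its outputs
            edges += instruction.outputCount;     // instruction -> each of its outputs
            if (instruction.producerIndex >= 0) {
                edges += 1;                       // upstream output -> this instruction
            }
        }
        return new int[] {nodes, edges};
    }

    public static void main(String[] args) {
        // Read --> PGBK --> Write, mirroring the test above.
        List<Instruction> mapTask = Arrays.asList(
            new Instruction("Read", 1, -1),
            new Instruction("PGBK", 1, 0),
            new Instruction("Write", 0, 1));
        int[] counts = countNodesAndEdges(mapTask);
        System.out.println(counts[0] + " nodes, " + counts[1] + " edges"); // prints "5 nodes, 4 edges"
    }
}
```

Under the same assumptions, a two-instruction Read --> Write task yields 3 nodes and 2 edges.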

Example 67 with ParallelInstruction

Use of com.google.api.services.dataflow.model.ParallelInstruction in the Apache Beam project.

From the class MapTaskToNetworkFunctionTest, method createParallelInstruction.

private static ParallelInstruction createParallelInstruction(String name, InstructionOutput... outputs) {
    ParallelInstruction rval = new ParallelInstruction();
    rval.setName(name);
    rval.setOutputs(Arrays.asList(outputs));
    return rval;
}
Also used : ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction)
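
The helper above is also called with no outputs at all, as in createParallelInstruction("Write"). The following sketch shows what the varargs-to-list step does in that case, using plain strings in place of InstructionOutput. The class VarargsOutputsSketch and the method toOutputList are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class VarargsOutputsSketch {
    /** Hypothetical helper mirroring Arrays.asList(outputs); strings stand in for InstructionOutput. */
    static List<String> toOutputList(String... outputs) {
        // With no arguments, `outputs` is an empty array (not null), so the list is empty.
        return Arrays.asList(outputs);
    }

    public static void main(String[] args) {
        System.out.println(toOutputList("Read.out").size()); // prints 1
        System.out.println(toOutputList().size());           // prints 0
        System.out.println(toOutputList().isEmpty());        // prints true
    }
}
```

The list returned by Arrays.asList is a fixed-size view of the array, so adding to it later throws UnsupportedOperationException.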

Example 68 with ParallelInstruction

Use of com.google.api.services.dataflow.model.ParallelInstruction in the Apache Beam project.

From the class MapTaskToNetworkFunctionTest, method testWrite.

@Test
public void testWrite() {
    InstructionOutput readOutput = createInstructionOutput("Read.out");
    ParallelInstruction read = createParallelInstruction("Read", readOutput);
    read.setRead(new ReadInstruction());
    WriteInstruction writeInstruction = new WriteInstruction();
    // Read.out
    writeInstruction.setInput(createInstructionInput(0, 0));
    ParallelInstruction write = createParallelInstruction("Write");
    write.setWrite(writeInstruction);
    MapTask mapTask = new MapTask();
    mapTask.setInstructions(ImmutableList.of(read, write));
    mapTask.setFactory(Transport.getJsonFactory());
    Network<Node, Edge> network = new MapTaskToNetworkFunction(IdGenerators.decrementingLongs()).apply(mapTask);
    assertNetworkProperties(network);
    assertEquals(3, network.nodes().size());
    assertEquals(2, network.edges().size());
    ParallelInstructionNode readNode = get(network, read);
    InstructionOutputNode readOutputNode = getOnlySuccessor(network, readNode);
    assertEquals(readOutput, readOutputNode.getInstructionOutput());
    ParallelInstructionNode writeNode = getOnlySuccessor(network, readOutputNode);
    assertNotNull(writeNode);
}
Also used : ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) MapTask(com.google.api.services.dataflow.model.MapTask) ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode) Node(org.apache.beam.runners.dataflow.worker.graph.Nodes.Node) InstructionOutput(com.google.api.services.dataflow.model.InstructionOutput) WriteInstruction(com.google.api.services.dataflow.model.WriteInstruction) ReadInstruction(com.google.api.services.dataflow.model.ReadInstruction) Edge(org.apache.beam.runners.dataflow.worker.graph.Edges.Edge) DefaultEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge) MultiOutputInfoEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge) Test(org.junit.Test)

Example 69 with ParallelInstruction

Use of com.google.api.services.dataflow.model.ParallelInstruction in the Apache Beam project.

From the class ReplacePgbkWithPrecombineFunctionTest, method testPrecombinePgbkIsReplaced.

@Test
public void testPrecombinePgbkIsReplaced() throws Exception {
    // Network:
    // out1 --> precombine_pgbk --> out2
    Map<String, Object> valueCombiningFn = new HashMap<>();
    Node out1 = createInstructionOutputNode("out1");
    String pgbkName = "precombine_pgbk";
    Node precombinePgbk = createPrecombinePgbkNode(pgbkName, valueCombiningFn);
    Node out2 = createInstructionOutputNode("out2");
    MutableNetwork<Node, Edge> network = createEmptyNetwork();
    network.addNode(out1);
    network.addNode(precombinePgbk);
    network.addNode(out2);
    network.addEdge(out1, precombinePgbk, DefaultEdge.create());
    network.addEdge(precombinePgbk, out2, DefaultEdge.create());
    Network<Node, Edge> inputNetwork = ImmutableNetwork.copyOf(network);
    network = new ReplacePgbkWithPrecombineFunction().apply(network);
    // Assert that network has same structure (same number of nodes and paths).
    assertEquals(inputNetwork.nodes().size(), network.nodes().size());
    assertEquals(inputNetwork.edges().size(), network.edges().size());
    List<List<Node>> oldPaths = Networks.allPathsFromRootsToLeaves(inputNetwork);
    List<List<Node>> newPaths = Networks.allPathsFromRootsToLeaves(network);
    assertEquals(oldPaths.size(), newPaths.size());
    // Assert that the pgbk node has been replaced.
    for (Node node : network.nodes()) {
        if (node instanceof ParallelInstructionNode) {
            ParallelInstructionNode createdCombineNode = (ParallelInstructionNode) node;
            ParallelInstruction parallelInstruction = createdCombineNode.getParallelInstruction();
            // JUnit's assertEquals takes the expected value first.
            assertEquals(pgbkName, parallelInstruction.getName());
            assertNull(parallelInstruction.getPartialGroupByKey());
            assertNotNull(parallelInstruction.getParDo());
            ParDoInstruction parDoInstruction = parallelInstruction.getParDo();
            assertEquals(valueCombiningFn, parDoInstruction.getUserFn());
            break;
        }
    }
}
Also used : HashMap(java.util.HashMap) InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode) Node(org.apache.beam.runners.dataflow.worker.graph.Nodes.Node) ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) ParDoInstruction(com.google.api.services.dataflow.model.ParDoInstruction) List(java.util.List) Edge(org.apache.beam.runners.dataflow.worker.graph.Edges.Edge) DefaultEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge) Test(org.junit.Test)
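
The test above checks that replacing a node leaves the network's structure (node and edge counts) unchanged. A minimal sketch of that idea on a plain adjacency map follows. It is not the Beam implementation: ReplaceNodeSketch, replaceNode, and edgeCount are hypothetical names, nodes are represented by string labels, and the replacement label is assumed not to be in the graph already.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class ReplaceNodeSketch {
    /** Returns a copy of the graph (node -> successors) with oldNode swapped for newNode everywhere. */
    static Map<String, Set<String>> replaceNode(
            Map<String, Set<String>> graph, String oldNode, String newNode) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : graph.entrySet()) {
            String source = entry.getKey().equals(oldNode) ? newNode : entry.getKey();
            Set<String> successors = new LinkedHashSet<>();
            for (String target : entry.getValue()) {
                successors.add(target.equals(oldNode) ? newNode : target);
            }
            result.put(source, successors);
        }
        return result;
    }

    /** Counts directed edges by summing the successor sets. */
    static int edgeCount(Map<String, Set<String>> graph) {
        int count = 0;
        for (Set<String> successors : graph.values()) {
            count += successors.size();
        }
        return count;
    }

    public static void main(String[] args) {
        // out1 --> precombine_pgbk --> out2, with the node kind written into the label.
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        graph.put("out1", new LinkedHashSet<>(Arrays.asList("precombine_pgbk[PGBK]")));
        graph.put("precombine_pgbk[PGBK]", new LinkedHashSet<>(Arrays.asList("out2")));
        graph.put("out2", new LinkedHashSet<String>());

        Map<String, Set<String>> replaced =
            replaceNode(graph, "precombine_pgbk[PGBK]", "precombine_pgbk[ParDo]");

        System.out.println(replaced.size() == graph.size());           // same number of nodes
        System.out.println(edgeCount(replaced) == edgeCount(graph));   // same number of edges
        System.out.println(replaced.containsKey("precombine_pgbk[PGBK]")); // old node is gone
    }
}
```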

Example 70 with ParallelInstruction

Use of com.google.api.services.dataflow.model.ParallelInstruction in the Apache Beam project.

From the class LengthPrefixUnknownCodersTest, method testLengthPrefixReadInstructionCoder.

@Test
public void testLengthPrefixReadInstructionCoder() throws Exception {
    ReadInstruction readInstruction = new ReadInstruction();
    readInstruction.setSource(new Source().setCodec(CloudObjects.asCloudObject(windowedValueCoder, /*sdkComponents=*/ null)));
    // `instruction` is declared outside this method.
    instruction.setRead(readInstruction);
    ParallelInstruction prefixedInstruction = forParallelInstruction(instruction, false);
    assertEqualsAsJson(CloudObjects.asCloudObject(prefixedWindowedValueCoder, /*sdkComponents=*/ null), prefixedInstruction.getRead().getSource().getCodec());
    // Should not mutate the instruction.
    assertEqualsAsJson(readInstruction.getSource().getCodec(), CloudObjects.asCloudObject(windowedValueCoder, /*sdkComponents=*/ null));
}
Also used : LengthPrefixUnknownCoders.forParallelInstruction(org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCoders.forParallelInstruction) ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) ReadInstruction(com.google.api.services.dataflow.model.ReadInstruction) Source(com.google.api.services.dataflow.model.Source) Test(org.junit.Test)
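
The last assertion in the test above checks that the transformation returns a modified copy and leaves its input untouched. The sketch below illustrates that non-mutating pattern with plain maps. It does not reproduce LengthPrefixUnknownCoders or CloudObject: the class NonMutatingWrapSketch, the method withLengthPrefix, and the map keys "kind" and "component" are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class NonMutatingWrapSketch {
    /** Returns a new map that wraps a copy of the codec; the argument itself is never modified. */
    static Map<String, Object> withLengthPrefix(Map<String, Object> codec) {
        Map<String, Object> wrapped = new HashMap<>();
        wrapped.put("kind", "length_prefix");                        // hypothetical marker value
        wrapped.put("component", new HashMap<String, Object>(codec)); // defensive copy of the input
        return wrapped;
    }

    public static void main(String[] args) {
        Map<String, Object> original = new HashMap<>();
        original.put("kind", "custom");

        // Snapshot the input so it can be compared after the call.
        Map<String, Object> snapshot = new HashMap<>(original);
        Map<String, Object> prefixed = withLengthPrefix(original);

        System.out.println(prefixed.get("kind"));      // prints "length_prefix"
        System.out.println(original.equals(snapshot)); // prints "true": the input is unchanged
    }
}
```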