Example 1 with InstructionOutput

use of com.google.api.services.dataflow.model.InstructionOutput in project beam by apache.

the class LengthPrefixUnknownCodersTest method createInstructionOutputNode.

private static InstructionOutputNode createInstructionOutputNode(String name, Coder<?> coder) {
    InstructionOutput instructionOutput = new InstructionOutput().setName(name).setCodec(CloudObjects.asCloudObject(coder, /*sdkComponents=*/ null));
    instructionOutput.setFactory(new JacksonFactory());
    return InstructionOutputNode.create(instructionOutput, "fakeId");
}
Also used : InstructionOutput(com.google.api.services.dataflow.model.InstructionOutput) LengthPrefixUnknownCoders.forInstructionOutput(org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCoders.forInstructionOutput) JacksonFactory(com.google.api.client.json.jackson2.JacksonFactory)

Example 2 with InstructionOutput

use of com.google.api.services.dataflow.model.InstructionOutput in project beam by apache.

the class RemoveFlattenInstructionsFunctionTest method testFlattenMultiplePCollectionsHavingMultipleConsumers.

@Test
public void testFlattenMultiplePCollectionsHavingMultipleConsumers() {
    Node a = ParallelInstructionNode.create(new ParallelInstruction().setName("A"), Nodes.ExecutionLocation.UNKNOWN);
    Node aPCollection = InstructionOutputNode.create(new InstructionOutput().setName("A.out"), PCOLLECTION_ID);
    Edge aOutput = DefaultEdge.create();
    Node b = ParallelInstructionNode.create(new ParallelInstruction().setName("B"), Nodes.ExecutionLocation.UNKNOWN);
    Edge bOutput = DefaultEdge.create();
    Node bPCollection = InstructionOutputNode.create(new InstructionOutput().setName("B.out"), PCOLLECTION_ID);
    Node flatten = ParallelInstructionNode.create(new ParallelInstruction().setName("Flatten").setFlatten(new FlattenInstruction()), Nodes.ExecutionLocation.UNKNOWN);
    Node flattenPCollection = InstructionOutputNode.create(new InstructionOutput().setName("Flatten.out"), PCOLLECTION_ID);
    Node c = ParallelInstructionNode.create(new ParallelInstruction().setName("C"), Nodes.ExecutionLocation.UNKNOWN);
    Edge cOutput = DefaultEdge.create();
    Node cPCollection = InstructionOutputNode.create(new InstructionOutput().setName("C.out"), PCOLLECTION_ID);
    Node d = ParallelInstructionNode.create(new ParallelInstruction().setName("D"), Nodes.ExecutionLocation.UNKNOWN);
    Edge dOutput = DefaultEdge.create();
    Node dPCollection = InstructionOutputNode.create(new InstructionOutput().setName("D.out"), PCOLLECTION_ID);
    // A --\
    //      -> Flatten --> C
    // B --/-------------> D
    MutableNetwork<Node, Edge> network = createEmptyNetwork();
    network.addNode(a);
    network.addNode(aPCollection);
    network.addNode(b);
    network.addNode(bPCollection);
    network.addNode(flatten);
    network.addNode(flattenPCollection);
    network.addNode(c);
    network.addNode(cPCollection);
    network.addEdge(a, aPCollection, aOutput);
    network.addEdge(aPCollection, flatten, DefaultEdge.create());
    network.addEdge(b, bPCollection, bOutput);
    network.addEdge(bPCollection, flatten, DefaultEdge.create());
    network.addEdge(bPCollection, d, DefaultEdge.create());
    network.addEdge(flatten, flattenPCollection, DefaultEdge.create());
    network.addEdge(flattenPCollection, c, DefaultEdge.create());
    network.addEdge(c, cPCollection, cOutput);
    network.addEdge(d, dPCollection, dOutput);
    // A --\
    //      -> C
    // B --/-> D
    assertThatFlattenIsProperlyRemoved(network);
}
Also used : ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) Node(org.apache.beam.runners.dataflow.worker.graph.Nodes.Node) InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode) InstructionOutput(com.google.api.services.dataflow.model.InstructionOutput) Edge(org.apache.beam.runners.dataflow.worker.graph.Edges.Edge) MultiOutputInfoEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge) DefaultEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge) FlattenInstruction(com.google.api.services.dataflow.model.FlattenInstruction) Test(org.junit.Test)
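The rewrite this test checks can be illustrated without the Beam worker classes. The following is a minimal stand-alone sketch under stated assumptions: nodes are plain strings, edges are successor sets, and `FlattenRemovalSketch` and `removeFlatten` are hypothetical names, not Beam's `RemoveFlattenInstructionsFunction`. It connects each input of the flatten directly to each consumer of the flatten's output, then drops both flatten nodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of removing a Flatten node from a directed graph. */
final class FlattenRemovalSketch {
    // Adjacency map: node name -> successor names, kept in insertion order.
    private final Map<String, Set<String>> successors = new LinkedHashMap<>();

    /** Adds a directed edge, creating either endpoint if it is missing. */
    void addEdge(String from, String to) {
        successors.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
        successors.computeIfAbsent(to, k -> new LinkedHashSet<>());
    }

    boolean contains(String node) {
        return successors.containsKey(node);
    }

    Set<String> successorsOf(String node) {
        return successors.getOrDefault(node, new LinkedHashSet<>());
    }

    List<String> predecessorsOf(String node) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : successors.entrySet()) {
            if (entry.getValue().contains(node)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Connects every flatten input to every consumer of the flatten output, then removes both nodes. */
    void removeFlatten(String flatten, String flattenOutput) {
        List<String> inputs = predecessorsOf(flatten);
        List<String> consumers = new ArrayList<>(successorsOf(flattenOutput));
        for (String input : inputs) {
            for (String consumer : consumers) {
                addEdge(input, consumer);
            }
        }
        removeNode(flatten);
        removeNode(flattenOutput);
    }

    private void removeNode(String node) {
        successors.remove(node);
        for (Set<String> targets : successors.values()) {
            targets.remove(node);
        }
    }

    public static void main(String[] args) {
        // Same shape as the test above: A and B feed Flatten, B also feeds D, Flatten feeds C.
        FlattenRemovalSketch g = new FlattenRemovalSketch();
        g.addEdge("A", "A.out");
        g.addEdge("A.out", "Flatten");
        g.addEdge("B", "B.out");
        g.addEdge("B.out", "Flatten");
        g.addEdge("B.out", "D");
        g.addEdge("Flatten", "Flatten.out");
        g.addEdge("Flatten.out", "C");
        g.removeFlatten("Flatten", "Flatten.out");
        System.out.println("A.out -> " + g.successorsOf("A.out"));
        System.out.println("B.out -> " + g.successorsOf("B.out"));
    }
}
```

In this model the PCollection nodes `A.out` and `B.out` end up feeding `C` directly, while the existing `B.out` to `D` edge is left alone, which matches the second diagram in the test.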

Example 3 with InstructionOutput

use of com.google.api.services.dataflow.model.InstructionOutput in project beam by apache.

the class RemoveFlattenInstructionsFunctionTest method testRemoveFlattenOnMultiOutputInstruction.

@Test
public void testRemoveFlattenOnMultiOutputInstruction() {
    Node a = ParallelInstructionNode.create(new ParallelInstruction().setName("A"), Nodes.ExecutionLocation.UNKNOWN);
    Node aOut1PCollection = InstructionOutputNode.create(new InstructionOutput().setName("A.out1"), PCOLLECTION_ID);
    Node aOut2PCollection = InstructionOutputNode.create(new InstructionOutput().setName("A.out2"), PCOLLECTION_ID);
    Node aOut3PCollection = InstructionOutputNode.create(new InstructionOutput().setName("A.out3"), PCOLLECTION_ID);
    Edge aOut1 = MultiOutputInfoEdge.create(new MultiOutputInfo().setTag("out1"));
    Edge aOut2 = MultiOutputInfoEdge.create(new MultiOutputInfo().setTag("out2"));
    Edge aOut3 = MultiOutputInfoEdge.create(new MultiOutputInfo().setTag("out3"));
    Edge aOut1PCollectionEdge = DefaultEdge.create();
    Node b = ParallelInstructionNode.create(new ParallelInstruction().setName("B"), Nodes.ExecutionLocation.UNKNOWN);
    Node bOut1PCollection = InstructionOutputNode.create(new InstructionOutput().setName("B.out1"), PCOLLECTION_ID);
    Node bOut2PCollection = InstructionOutputNode.create(new InstructionOutput().setName("B.out2"), PCOLLECTION_ID);
    Edge bOut1 = MultiOutputInfoEdge.create(new MultiOutputInfo().setTag("out1"));
    Edge bOut2 = MultiOutputInfoEdge.create(new MultiOutputInfo().setTag("out2"));
    Edge bOut1PCollectionEdge = DefaultEdge.create();
    Node flatten = ParallelInstructionNode.create(new ParallelInstruction().setName("Flatten").setFlatten(new FlattenInstruction()), Nodes.ExecutionLocation.UNKNOWN);
    Node flattenPCollection = InstructionOutputNode.create(new InstructionOutput().setName("Flatten.out"), PCOLLECTION_ID);
    Node c = ParallelInstructionNode.create(new ParallelInstruction().setName("C"), Nodes.ExecutionLocation.UNKNOWN);
    Edge cOutput = DefaultEdge.create();
    Node cPCollection = InstructionOutputNode.create(new InstructionOutput().setName("C.out"), PCOLLECTION_ID);
    Node d = ParallelInstructionNode.create(new ParallelInstruction().setName("D"), Nodes.ExecutionLocation.UNKNOWN);
    Edge dOutput = DefaultEdge.create();
    Node dPCollection = InstructionOutputNode.create(new InstructionOutput().setName("D.out"), PCOLLECTION_ID);
    Node e = ParallelInstructionNode.create(new ParallelInstruction().setName("E"), Nodes.ExecutionLocation.UNKNOWN);
    Edge eOutput = DefaultEdge.create();
    Node ePCollection = InstructionOutputNode.create(new InstructionOutput().setName("E.out"), PCOLLECTION_ID);
    //  /-out1-> C
    // A -out2-\
    //  \-out3--> Flatten --> D
    // B -out2-/
    //  \-out1-> E
    MutableNetwork<Node, Edge> network = createEmptyNetwork();
    network.addNode(a);
    network.addNode(aOut1PCollection);
    network.addNode(aOut2PCollection);
    network.addNode(aOut3PCollection);
    network.addNode(b);
    network.addNode(bOut1PCollection);
    network.addNode(bOut2PCollection);
    network.addNode(flatten);
    network.addNode(flattenPCollection);
    network.addNode(c);
    network.addNode(cPCollection);
    network.addNode(d);
    network.addNode(dPCollection);
    network.addNode(e);
    network.addNode(ePCollection);
    network.addEdge(a, aOut1PCollection, aOut1);
    network.addEdge(a, aOut2PCollection, aOut2);
    network.addEdge(a, aOut3PCollection, aOut3);
    network.addEdge(aOut1PCollection, c, aOut1PCollectionEdge);
    network.addEdge(aOut2PCollection, flatten, DefaultEdge.create());
    network.addEdge(aOut3PCollection, flatten, DefaultEdge.create());
    network.addEdge(b, bOut1PCollection, bOut1);
    network.addEdge(b, bOut2PCollection, bOut2);
    network.addEdge(bOut1PCollection, e, bOut1PCollectionEdge);
    network.addEdge(bOut2PCollection, flatten, DefaultEdge.create());
    network.addEdge(flatten, flattenPCollection, DefaultEdge.create());
    network.addEdge(flattenPCollection, d, DefaultEdge.create());
    network.addEdge(c, cPCollection, cOutput);
    network.addEdge(d, dPCollection, dOutput);
    network.addEdge(e, ePCollection, eOutput);
    //  /-out1-> C
    // A -out2-\
    //  \-out3--> D
    // B -out2-/
    //  \-out1-> E
    assertThatFlattenIsProperlyRemoved(network);
}
Also used : ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) MultiOutputInfo(com.google.api.services.dataflow.model.MultiOutputInfo) Node(org.apache.beam.runners.dataflow.worker.graph.Nodes.Node) InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode) InstructionOutput(com.google.api.services.dataflow.model.InstructionOutput) Edge(org.apache.beam.runners.dataflow.worker.graph.Edges.Edge) MultiOutputInfoEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge) DefaultEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge) FlattenInstruction(com.google.api.services.dataflow.model.FlattenInstruction) Test(org.junit.Test)

Example 4 with InstructionOutput

use of com.google.api.services.dataflow.model.InstructionOutput in project beam by apache.

the class StreamingDataflowWorkerTest method testExceptionInvalidatesCache.

@Test
public void testExceptionInvalidatesCache() throws Exception {
    // We'll need to force the system to limit bundles to one message at a time.
    // Sequence is as follows:
    // 01. GetWork[0] (token 0)
    // 02. Create counter reader
    // 03. Counter yields 0
    // 04. GetData[0] (state as null)
    // 05. Read state as null
    // 06. Set state as 42
    // 07. THROW on taking counter reader checkpoint
    // 08. Create counter reader
    // 09. Counter yields 0
    // 10. GetData[1] (state as null)
    // 11. Read state as null (*** not 42 ***)
    // 12. Take counter reader checkpoint as 0
    // 13. CommitWork[0] (message 0:0, state 42, checkpoint 0)
    // 14. GetWork[1] (token 1, checkpoint as 0)
    // 15. Counter yields 1
    // 16. Read (cached) state as 42
    // 17. Take counter reader checkpoint 1
    // 18. CommitWork[1] (message 0:1, checkpoint 1)
    // 19. GetWork[2] (token 2, checkpoint as 1)
    // 20. Counter yields 2
    // 21. THROW on processElement
    // 22. Recreate reader from checkpoint 1
    // 23. Counter yields 2 (*** not eof ***)
    // 24. GetData[2] (state as 42)
    // 25. Read state as 42
    // 26. Take counter reader checkpoint 2
    // 27. CommitWork[2] (message 0:2, checkpoint 2)
    FakeWindmillServer server = new FakeWindmillServer(errorCollector);
    server.setExpectedExceptionCount(2);
    DataflowPipelineOptions options = createTestingPipelineOptions(server);
    options.setNumWorkers(1);
    DataflowPipelineDebugOptions debugOptions = options.as(DataflowPipelineDebugOptions.class);
    debugOptions.setUnboundedReaderMaxElements(1);
    CloudObject codec = CloudObjects.asCloudObject(WindowedValue.getFullCoder(ValueWithRecordId.ValueWithRecordIdCoder.of(KvCoder.of(VarIntCoder.of(), VarIntCoder.of())), GlobalWindow.Coder.INSTANCE), /*sdkComponents=*/ null);
    TestCountingSource counter = new TestCountingSource(3).withThrowOnFirstSnapshot(true);
    List<ParallelInstruction> instructions = Arrays.asList(
        new ParallelInstruction()
            .setOriginalName("OriginalReadName")
            .setSystemName("Read")
            .setName(DEFAULT_PARDO_USER_NAME)
            .setRead(new ReadInstruction().setSource(CustomSources.serializeToCloudSource(counter, options).setCodec(codec)))
            .setOutputs(Arrays.asList(
                new InstructionOutput()
                    .setName("read_output")
                    .setOriginalName(DEFAULT_OUTPUT_ORIGINAL_NAME)
                    .setSystemName(DEFAULT_OUTPUT_SYSTEM_NAME)
                    .setCodec(codec))),
        makeDoFnInstruction(new TestExceptionInvalidatesCacheFn(), 0, StringUtf8Coder.of(), WindowingStrategy.globalDefault()),
        makeSinkInstruction(StringUtf8Coder.of(), 1, GlobalWindow.Coder.INSTANCE));
    StreamingDataflowWorker worker = makeWorker(instructions, options.as(StreamingDataflowWorkerOptions.class), true);
    worker.setRetryLocallyDelayMs(100);
    worker.start();
    // Three GetData requests
    for (int i = 0; i < 3; i++) {
        ByteString state;
        if (i == 0 || i == 1) {
            state = ByteString.EMPTY;
        } else {
            state = ByteString.copyFrom(new byte[] { 42 });
        }
        Windmill.GetDataResponse.Builder dataResponse = Windmill.GetDataResponse.newBuilder();
        dataResponse
            .addDataBuilder()
            .setComputationId(DEFAULT_COMPUTATION_ID)
            .addDataBuilder()
            .setKey(ByteString.copyFromUtf8("0000000000000001"))
            .setShardingKey(1)
            .addValuesBuilder()
            .setTag(ByteString.copyFromUtf8("//+uint"))
            .setStateFamily(DEFAULT_PARDO_STATE_FAMILY)
            .getValueBuilder()
            .setTimestamp(0)
            .setData(state);
        server.addDataToOffer(dataResponse.build());
    }
    // Three GetWork requests and commits
    for (int i = 0; i < 3; i++) {
        StringBuilder sb = new StringBuilder();
        sb.append("work {\n");
        sb.append("  computation_id: \"computation\"\n");
        sb.append("  input_data_watermark: 0\n");
        sb.append("  work {\n");
        sb.append("    key: \"0000000000000001\"\n");
        sb.append("    sharding_key: 1\n");
        sb.append("    work_token: ");
        sb.append(i);
        sb.append("    cache_token: 1");
        sb.append("\n");
        if (i > 0) {
            int previousCheckpoint = i - 1;
            sb.append("    source_state {\n");
            sb.append("      state: \"");
            sb.append((char) previousCheckpoint);
            sb.append("\"\n");
            // We'll elide the finalize ids since it's not necessary to trigger the finalizer
            // for this test.
            sb.append("    }\n");
        }
        sb.append("  }\n");
        sb.append("}\n");
        server.addWorkToOffer(buildInput(sb.toString(), null));
        Map<Long, Windmill.WorkItemCommitRequest> result = server.waitForAndGetCommits(1);
        Windmill.WorkItemCommitRequest commit = result.get((long) i);
        UnsignedLong finalizeId = UnsignedLong.fromLongBits(commit.getSourceStateUpdates().getFinalizeIds(0));
        sb = new StringBuilder();
        sb.append("key: \"0000000000000001\"\n");
        sb.append("sharding_key: 1\n");
        sb.append("work_token: ");
        sb.append(i);
        sb.append("\n");
        sb.append("cache_token: 1\n");
        sb.append("output_messages {\n");
        sb.append("  destination_stream_id: \"out\"\n");
        sb.append("  bundles {\n");
        sb.append("    key: \"0000000000000001\"\n");
        int messageNum = i;
        sb.append("    messages {\n");
        sb.append("      timestamp: ");
        sb.append(messageNum * 1000);
        sb.append("\n");
        sb.append("      data: \"0:");
        sb.append(messageNum);
        sb.append("\"\n");
        sb.append("    }\n");
        sb.append("    messages_ids: \"\"\n");
        sb.append("  }\n");
        sb.append("}\n");
        if (i == 0) {
            sb.append("value_updates {\n");
            sb.append("  tag: \"//+uint\"\n");
            sb.append("  value {\n");
            sb.append("    timestamp: 0\n");
            sb.append("    data: \"");
            sb.append((char) 42);
            sb.append("\"\n");
            sb.append("  }\n");
            sb.append("  state_family: \"parDoStateFamily\"\n");
            sb.append("}\n");
        }
        int sourceState = i;
        sb.append("source_state_updates {\n");
        sb.append("  state: \"");
        sb.append((char) sourceState);
        sb.append("\"\n");
        sb.append("  finalize_ids: ");
        sb.append(finalizeId);
        sb.append("}\n");
        sb.append("source_watermark: ");
        sb.append((sourceState + 1) * 1000);
        sb.append("\n");
        sb.append("source_backlog_bytes: 7\n");
        // Output timers are cleared before comparing; they are irrelevant for the current test.
        assertThat(
            setValuesTimestamps(commit.toBuilder().clearOutputTimers()).build(),
            equalTo(
                setMessagesMetadata(
                    PaneInfo.NO_FIRING,
                    CoderUtils.encodeToByteArray(CollectionCoder.of(GlobalWindow.Coder.INSTANCE), ImmutableList.of(GlobalWindow.INSTANCE)),
                    parseCommitRequest(sb.toString()))
                .build()));
    }
}
Also used : DataflowPipelineOptions(org.apache.beam.runners.dataflow.options.DataflowPipelineOptions) UnsignedLong(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.primitives.UnsignedLong) ByteString(org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString) InstructionOutput(com.google.api.services.dataflow.model.InstructionOutput) ReadInstruction(com.google.api.services.dataflow.model.ReadInstruction) WorkItemCommitRequest(org.apache.beam.runners.dataflow.worker.windmill.Windmill.WorkItemCommitRequest) StreamingDataflowWorkerOptions(org.apache.beam.runners.dataflow.worker.options.StreamingDataflowWorkerOptions) ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) TestCountingSource(org.apache.beam.runners.dataflow.worker.testing.TestCountingSource) AtomicLong(java.util.concurrent.atomic.AtomicLong) DataflowCounterUpdateExtractor.splitIntToLong(org.apache.beam.runners.dataflow.worker.counters.DataflowCounterUpdateExtractor.splitIntToLong) ComputationGetDataResponse(org.apache.beam.runners.dataflow.worker.windmill.Windmill.ComputationGetDataResponse) KeyedGetDataResponse(org.apache.beam.runners.dataflow.worker.windmill.Windmill.KeyedGetDataResponse) GetDataResponse(org.apache.beam.runners.dataflow.worker.windmill.Windmill.GetDataResponse) Windmill(org.apache.beam.runners.dataflow.worker.windmill.Windmill) DataflowPipelineDebugOptions(org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions) Test(org.junit.Test)
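Steps 06 to 11 of the sequence in this test (state set to 42, an exception, then a retry that reads null rather than 42) describe a cache that must forget uncommitted writes when processing fails. The sketch below illustrates that idea only. `StateCacheSketch`, its two maps and its `process` method are hypothetical stand-ins, not the worker's actual state cache or the Windmill API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a worker-local state cache that is invalidated when a work item fails. */
final class StateCacheSketch {
    // Stand-in for durable state held by the backend.
    private final Map<String, Integer> committed = new HashMap<>();
    // Worker-local cache; may hold writes that were never committed.
    private final Map<String, Integer> cache = new HashMap<>();

    /** Reads from the cache, falling back to committed state; returns null if never written. */
    Integer read(String key) {
        if (cache.containsKey(key)) {
            return cache.get(key);
        }
        Integer value = committed.get(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    /** Buffers a write in the cache; it becomes durable only if the work item succeeds. */
    void write(String key, int value) {
        cache.put(key, value);
    }

    /** Runs one work item: commits cached writes on success, drops them on failure. */
    boolean process(String key, Runnable work) {
        try {
            work.run();
            if (cache.containsKey(key)) {
                committed.put(key, cache.get(key));
            }
            return true;
        } catch (RuntimeException e) {
            // Invalidate so the retry cannot observe the failed attempt's write.
            cache.remove(key);
            return false;
        }
    }

    public static void main(String[] args) {
        StateCacheSketch state = new StateCacheSketch();
        // First attempt writes 42, then fails before committing.
        state.process("key", () -> {
            state.write("key", 42);
            throw new IllegalStateException("simulated checkpoint failure");
        });
        System.out.println("after failed attempt: " + state.read("key"));
        // Retry succeeds, so the write is committed and later reads can be served from the cache.
        state.process("key", () -> state.write("key", 42));
        System.out.println("after successful retry: " + state.read("key"));
    }
}
```

Without the `cache.remove(key)` call in the failure path, the retry would read 42 from the cache even though that value was never committed, which is the situation the test marks with `*** not 42 ***`.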

Example 5 with InstructionOutput

use of com.google.api.services.dataflow.model.InstructionOutput in project beam by apache.

the class IntrinsicMapTaskExecutorFactory method createOutputReceiversTransform.

/**
 * Returns a function which can convert {@link InstructionOutput}s into {@link OutputReceiver}s.
 */
static Function<Node, Node> createOutputReceiversTransform(final String stageName, final CounterFactory counterFactory) {
    return new TypeSafeNodeFunction<InstructionOutputNode>(InstructionOutputNode.class) {

        @Override
        public Node typedApply(InstructionOutputNode input) {
            InstructionOutput cloudOutput = input.getInstructionOutput();
            OutputReceiver outputReceiver = new OutputReceiver();
            Coder<?> coder = CloudObjects.coderFromCloudObject(CloudObject.fromSpec(cloudOutput.getCodec()));
            @SuppressWarnings("unchecked")
            ElementCounter outputCounter = new DataflowOutputCounter(
                cloudOutput.getName(),
                new ElementByteSizeObservableCoder<>(coder),
                counterFactory,
                NameContext.create(stageName, cloudOutput.getOriginalName(), cloudOutput.getSystemName(), cloudOutput.getName()));
            outputReceiver.addOutputCounter(outputCounter);
            return OutputReceiverNode.create(outputReceiver, coder, input.getPcollectionId());
        }
    };
}
Also used : InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) InstructionOutput(com.google.api.services.dataflow.model.InstructionOutput) OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) TypeSafeNodeFunction(org.apache.beam.runners.dataflow.worker.graph.Networks.TypeSafeNodeFunction) ElementCounter(org.apache.beam.runners.dataflow.worker.util.common.worker.ElementCounter)
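The pattern in this method, a function over all nodes that only transforms nodes of one type, can be sketched in isolation. The version below is a simplified stand-in that uses `Object` instead of Beam's `Node`. `TypedFunctionSketch`, `TypeSafeFunction` and `stringLengthTransform` are hypothetical names, and the pass-through of non-matching inputs is an assumption about how such a helper behaves, not a statement about Beam's `TypeSafeNodeFunction`.

```java
import java.util.function.Function;

/** Hypothetical sketch of a type-filtered transform; not Beam's TypeSafeNodeFunction. */
final class TypedFunctionSketch {

    /** Applies typedApply to inputs of the given class and returns every other input unchanged. */
    abstract static class TypeSafeFunction<T> implements Function<Object, Object> {
        private final Class<T> type;

        TypeSafeFunction(Class<T> type) {
            this.type = type;
        }

        @Override
        public final Object apply(Object input) {
            // Assumption: inputs of other types pass through untouched.
            return type.isInstance(input) ? typedApply(type.cast(input)) : input;
        }

        abstract Object typedApply(T input);
    }

    /** Example transform: replaces each String with its length and leaves other values alone. */
    static Function<Object, Object> stringLengthTransform() {
        return new TypeSafeFunction<String>(String.class) {
            @Override
            Object typedApply(String input) {
                return input.length();
            }
        };
    }

    public static void main(String[] args) {
        Function<Object, Object> transform = stringLengthTransform();
        System.out.println(transform.apply("abc"));
        System.out.println(transform.apply(7L));
    }
}
```

Passing the `Class<T>` token to the constructor is what allows the runtime `isInstance` check, since the generic parameter alone is erased at runtime.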

Aggregations

InstructionOutput (com.google.api.services.dataflow.model.InstructionOutput): 36
ParallelInstruction (com.google.api.services.dataflow.model.ParallelInstruction): 27
Test (org.junit.Test): 20
InstructionOutputNode (org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode): 19
DefaultEdge (org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge): 17
Edge (org.apache.beam.runners.dataflow.worker.graph.Edges.Edge): 17
Node (org.apache.beam.runners.dataflow.worker.graph.Nodes.Node): 17
ParallelInstructionNode (org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode): 17
MultiOutputInfoEdge (org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge): 14
ReadInstruction (com.google.api.services.dataflow.model.ReadInstruction): 13
CloudObject (org.apache.beam.runners.dataflow.util.CloudObject): 12
ParDoInstruction (com.google.api.services.dataflow.model.ParDoInstruction): 10
InstructionInput (com.google.api.services.dataflow.model.InstructionInput): 9
FlattenInstruction (com.google.api.services.dataflow.model.FlattenInstruction): 8
MultiOutputInfo (com.google.api.services.dataflow.model.MultiOutputInfo): 8
MapTask (com.google.api.services.dataflow.model.MapTask): 7
ArrayList (java.util.ArrayList): 6
SdkComponents (org.apache.beam.runners.core.construction.SdkComponents): 6
ByteString (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString): 6
HashMap (java.util.HashMap): 5