Search in sources:

Example 1 with ParallelInstruction

Use of com.google.api.services.dataflow.model.ParallelInstruction in project beam by apache.

From the class BatchDataflowWorkerTest, method testWhenNoWorkIsReturnedThatWeImmediatelyRetry.

@Test
public void testWhenNoWorkIsReturnedThatWeImmediatelyRetry() throws Exception {
    final String workItemId = "14";
    BatchDataflowWorker worker =
        new BatchDataflowWorker(
            null, /* pipeline */
            SdkHarnessRegistries.emptySdkHarnessRegistry(),
            mockWorkUnitClient,
            IntrinsicMapTaskExecutorFactory.defaultFactory(),
            options);
    WorkItem workItem = new WorkItem();
    workItem.setId(Long.parseLong(workItemId));
    workItem.setJobId("SuccessfulEmptyMapTask");
    workItem.setInitialReportIndex(12L);
    workItem.setMapTask(
        new MapTask()
            .setInstructions(new ArrayList<ParallelInstruction>())
            .setStageName("testStage"));
    workItem.setLeaseExpireTime(TimeUtil.toCloudTime(Instant.now()));
    workItem.setReportStatusInterval(TimeUtil.toCloudDuration(Duration.standardMinutes(1)));
    when(mockWorkUnitClient.getWorkItem())
        .thenReturn(Optional.<WorkItem>absent())
        .thenReturn(Optional.of(workItem));
    assertTrue(worker.getAndPerformWork());
    verify(mockWorkUnitClient).reportWorkItemStatus(MockitoHamcrest.argThat(new TypeSafeMatcher<WorkItemStatus>() {

        @Override
        public void describeTo(Description description) {
        }

        @Override
        protected boolean matchesSafely(WorkItemStatus item) {
            assertTrue(item.getCompleted());
            assertEquals(workItemId, item.getWorkItemId());
            return true;
        }
    }));
}
Also used: ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction), TypeSafeMatcher(org.hamcrest.TypeSafeMatcher), Description(org.hamcrest.Description), WorkItemStatus(com.google.api.services.dataflow.model.WorkItemStatus), MapTask(com.google.api.services.dataflow.model.MapTask), WorkItem(com.google.api.services.dataflow.model.WorkItem), Test(org.junit.Test)
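
The test above passes an empty instruction list, so the MapTask completes without doing any work. A MapTask carrying real work would populate that list with ParallelInstruction entries; here is a minimal sketch, with all field values illustrative rather than taken from Beam:

ParallelInstruction readInstruction = new ParallelInstruction()
    // Hypothetical names; real values are assigned by the Dataflow service.
    .setName("read")
    .setOriginalName("readOriginal")
    .setSystemName("readSystem")
    .setRead(new ReadInstruction().setSource(new Source()));
workItem.setMapTask(new MapTask()
    .setInstructions(Collections.singletonList(readInstruction))
    .setStageName("testStage"));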

Example 2 with ParallelInstruction

Use of com.google.api.services.dataflow.model.ParallelInstruction in project beam by apache.

From the class BeamFnMapTaskExecutorFactory, method createWriteOperation.

OperationNode createWriteOperation(
    ParallelInstructionNode node,
    PipelineOptions options,
    SinkFactory sinkFactory,
    DataflowExecutionContext executionContext,
    DataflowOperationContext context)
    throws Exception {
    ParallelInstruction instruction = node.getParallelInstruction();
    WriteInstruction write = instruction.getWrite();
    Coder<?> coder = CloudObjects.coderFromCloudObject(CloudObject.fromSpec(write.getSink().getCodec()));
    CloudObject cloudSink = CloudObject.fromSpec(write.getSink().getSpec());
    Sink<?> sink = sinkFactory.create(cloudSink, coder, options, executionContext, context);
    return OperationNode.create(WriteOperation.create(sink, EMPTY_OUTPUT_RECEIVER_ARRAY, context));
}
Also used: ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction), CloudObject(org.apache.beam.runners.dataflow.util.CloudObject), WriteInstruction(com.google.api.services.dataflow.model.WriteInstruction)
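
Note that createWriteOperation reads only two fields off the instruction: the sink's spec and its codec. A sketch of the minimal ParallelInstruction shape it consumes, assuming placeholder class names in the CloudObject specs:

Sink sink = new Sink()
    // "ExampleSink" and "ExampleCoder" are placeholders, not real Dataflow specs.
    .setSpec(CloudObject.forClassName("ExampleSink"))
    .setCodec(CloudObject.forClassName("ExampleCoder"));
ParallelInstruction instruction = new ParallelInstruction()
    .setWrite(new WriteInstruction().setSink(sink));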

Example 3 with ParallelInstruction

Use of com.google.api.services.dataflow.model.ParallelInstruction in project beam by apache.

From the class BeamFnMapTaskExecutorFactory, method createOperationTransformForParallelInstructionNodes.

/**
 * Creates an {@link Operation} from the given {@link ParallelInstruction} definition using the
 * provided {@link ReaderFactory}.
 */
Function<Node, Node> createOperationTransformForParallelInstructionNodes(
    final String stageName,
    final Network<Node, Edge> network,
    final PipelineOptions options,
    final ReaderFactory readerFactory,
    final SinkFactory sinkFactory,
    final DataflowExecutionContext<?> executionContext) {
    return new TypeSafeNodeFunction<ParallelInstructionNode>(ParallelInstructionNode.class) {

        @Override
        public Node typedApply(ParallelInstructionNode node) {
            ParallelInstruction instruction = node.getParallelInstruction();
            NameContext nameContext = NameContext.create(stageName, instruction.getOriginalName(), instruction.getSystemName(), instruction.getName());
            try {
                DataflowOperationContext context = executionContext.createOperationContext(nameContext);
                if (instruction.getRead() != null) {
                    return createReadOperation(network, node, options, readerFactory, executionContext, context);
                } else if (instruction.getWrite() != null) {
                    return createWriteOperation(node, options, sinkFactory, executionContext, context);
                } else if (instruction.getParDo() != null) {
                    return createParDoOperation(network, node, options, executionContext, context);
                } else if (instruction.getPartialGroupByKey() != null) {
                    return createPartialGroupByKeyOperation(network, node, options, executionContext, context);
                } else if (instruction.getFlatten() != null) {
                    return createFlattenOperation(network, node, context);
                } else {
                    throw new IllegalArgumentException(String.format("Unexpected instruction: %s", instruction));
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    };
}
Also used: ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction), ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode), NameContext(org.apache.beam.runners.dataflow.worker.counters.NameContext), TypeSafeNodeFunction(org.apache.beam.runners.dataflow.worker.graph.Networks.TypeSafeNodeFunction)
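
The dispatch above keys on whichever payload field of the ParallelInstruction is non-null. For instance, an instruction carrying only a FlattenInstruction is routed to createFlattenOperation; a sketch with illustrative names:

ParallelInstruction flatten = new ParallelInstruction()
    .setName("flatten")
    .setOriginalName("flattenOriginal")
    .setSystemName("flattenSystem")
    .setFlatten(new FlattenInstruction());
// getRead(), getWrite(), getParDo(), and getPartialGroupByKey() are all null,
// so typedApply falls through to the getFlatten() branch.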

Example 4 with ParallelInstruction

Use of com.google.api.services.dataflow.model.ParallelInstruction in project beam by apache.

From the class BeamFnMapTaskExecutorFactory, method createReadOperation.

OperationNode createReadOperation(
    Network<Node, Edge> network,
    ParallelInstructionNode node,
    PipelineOptions options,
    ReaderFactory readerFactory,
    DataflowExecutionContext<?> executionContext,
    DataflowOperationContext operationContext)
    throws Exception {
    ParallelInstruction instruction = node.getParallelInstruction();
    ReadInstruction read = instruction.getRead();
    Source cloudSource = CloudSourceUtils.flattenBaseSpecs(read.getSource());
    CloudObject sourceSpec = CloudObject.fromSpec(cloudSource.getSpec());
    Coder<?> coder = CloudObjects.coderFromCloudObject(CloudObject.fromSpec(cloudSource.getCodec()));
    NativeReader<?> reader = readerFactory.create(sourceSpec, coder, options, executionContext, operationContext);
    OutputReceiver[] receivers = getOutputReceivers(network, node);
    return OperationNode.create(ReadOperation.create(reader, receivers, operationContext));
}
Also used: ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction), CloudObject(org.apache.beam.runners.dataflow.util.CloudObject), OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver), ReadInstruction(com.google.api.services.dataflow.model.ReadInstruction), Source(com.google.api.services.dataflow.model.Source)
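
The Source consumed here mirrors the Sink in Example 2: a spec plus a codec, after CloudSourceUtils.flattenBaseSpecs folds any baseSpecs entries into the top-level spec. A sketch of the expected input shape, again with placeholder class names:

Source cloudSource = new Source()
    // Placeholder specs; a real Source carries service-provided values.
    .setSpec(CloudObject.forClassName("ExampleSource"))
    .setCodec(CloudObject.forClassName("ExampleCoder"));
ParallelInstruction instruction = new ParallelInstruction()
    .setRead(new ReadInstruction().setSource(cloudSource));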

Example 5 with ParallelInstruction

Use of com.google.api.services.dataflow.model.ParallelInstruction in project beam by apache.

From the class BeamFnMapTaskExecutorFactory, method createParDoOperation.

private OperationNode createParDoOperation(
    Network<Node, Edge> network,
    ParallelInstructionNode node,
    PipelineOptions options,
    DataflowExecutionContext<?> executionContext,
    DataflowOperationContext operationContext)
    throws Exception {
    ParallelInstruction instruction = node.getParallelInstruction();
    ParDoInstruction parDo = instruction.getParDo();
    TupleTag<?> mainOutputTag = tupleTag(parDo.getMultiOutputInfos().get(0));
    ImmutableMap.Builder<TupleTag<?>, Integer> outputTagsToReceiverIndicesBuilder = ImmutableMap.builder();
    int successorOffset = 0;
    for (Node successor : network.successors(node)) {
        for (Edge edge : network.edgesConnecting(node, successor)) {
            outputTagsToReceiverIndicesBuilder.put(tupleTag(((MultiOutputInfoEdge) edge).getMultiOutputInfo()), successorOffset);
        }
        successorOffset += 1;
    }
    ParDoFn fn =
        parDoFnFactory.create(
            options,
            CloudObject.fromSpec(parDo.getUserFn()),
            parDo.getSideInputs(),
            mainOutputTag,
            outputTagsToReceiverIndicesBuilder.build(),
            executionContext,
            operationContext);
    OutputReceiver[] receivers = getOutputReceivers(network, node);
    return OperationNode.create(new ParDoOperation(fn, receivers, operationContext));
}
Also used: RegisterRequestNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.RegisterRequestNode), FetchAndFilterStreamingSideInputsNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.FetchAndFilterStreamingSideInputsNode), InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode), OperationNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.OperationNode), ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode), Node(org.apache.beam.runners.dataflow.worker.graph.Nodes.Node), ExecutableStageNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ExecutableStageNode), RemoteGrpcPortNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.RemoteGrpcPortNode), OutputReceiverNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.OutputReceiverNode), TupleTag(org.apache.beam.sdk.values.TupleTag), OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver), MultiOutputInfoEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge), ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn), ImmutableMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap), ParDoOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation), ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction), ParDoInstruction(com.google.api.services.dataflow.model.ParDoInstruction), Edge(org.apache.beam.runners.dataflow.worker.graph.Edges.Edge)
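
The nested loop above gives every edge into a given successor the same receiver index, so all outputs flowing to one downstream node share a receiver slot. A standalone sketch of that indexing pattern using plain java.util collections, with hypothetical tag names:

Map<String, Integer> tagsToIndices = new LinkedHashMap<>();
List<List<String>> tagsPerSuccessor =
    Arrays.asList(Arrays.asList("out0"), Arrays.asList("out1", "out1Side"));
int successorOffset = 0;
for (List<String> tagsForOneSuccessor : tagsPerSuccessor) {
    for (String tag : tagsForOneSuccessor) {
        // Every edge into the same successor shares one receiver index.
        tagsToIndices.put(tag, successorOffset);
    }
    successorOffset += 1;
}
// tagsToIndices is now {out0=0, out1=1, out1Side=1}.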

Aggregations

ParallelInstruction (com.google.api.services.dataflow.model.ParallelInstruction): 73
Test (org.junit.Test): 39
InstructionOutput (com.google.api.services.dataflow.model.InstructionOutput): 27
ParallelInstructionNode (org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode): 26
CloudObject (org.apache.beam.runners.dataflow.util.CloudObject): 24
Node (org.apache.beam.runners.dataflow.worker.graph.Nodes.Node): 22
InstructionOutputNode (org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode): 21
Edge (org.apache.beam.runners.dataflow.worker.graph.Edges.Edge): 20
ParDoInstruction (com.google.api.services.dataflow.model.ParDoInstruction): 18
ReadInstruction (com.google.api.services.dataflow.model.ReadInstruction): 17
DefaultEdge (org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge): 17
MultiOutputInfoEdge (org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge): 16
Structs.addString (org.apache.beam.runners.dataflow.util.Structs.addString): 12
ByteString (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString): 12
InstructionInput (com.google.api.services.dataflow.model.InstructionInput): 11
MapTask (com.google.api.services.dataflow.model.MapTask): 11
AtomicLong (java.util.concurrent.atomic.AtomicLong): 11
DataflowCounterUpdateExtractor.splitIntToLong (org.apache.beam.runners.dataflow.worker.counters.DataflowCounterUpdateExtractor.splitIntToLong): 11
WorkItemCommitRequest (org.apache.beam.runners.dataflow.worker.windmill.Windmill.WorkItemCommitRequest): 11
UnsignedLong (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.primitives.UnsignedLong): 11