Example 1 with Operation

use of org.apache.beam.runners.dataflow.worker.util.common.worker.Operation in project beam by apache.

the class BeamFnMapTaskExecutorFactory method createOperationTransformForGrpcPortNodes.

private Function<Node, Node> createOperationTransformForGrpcPortNodes(final Network<Node, Edge> network, final FnDataService beamFnDataService, final OperationContext context) {
    return new TypeSafeNodeFunction<RemoteGrpcPortNode>(RemoteGrpcPortNode.class) {

        @Override
        public Node typedApply(RemoteGrpcPortNode input) {
            RegisterAndProcessBundleOperation registerFnOperation = (RegisterAndProcessBundleOperation) Iterables.getOnlyElement(Iterables.filter(network.adjacentNodes(input), OperationNode.class)).getOperation();
            // The coder comes from the one and only adjacent output node
            Coder<?> coder = Iterables.getOnlyElement(Iterables.filter(network.adjacentNodes(input), OutputReceiverNode.class)).getCoder();
            // If the output receiver node is a successor, this port outputs to it, so a read
            // operation is created; otherwise a write operation is created.
            Iterable<OutputReceiverNode> outputReceiverNodes = Iterables.filter(network.successors(input), OutputReceiverNode.class);
            Operation operation;
            if (outputReceiverNodes.iterator().hasNext()) {
                OutputReceiver[] outputReceivers = new OutputReceiver[] { Iterables.getOnlyElement(outputReceiverNodes).getOutputReceiver() };
                operation = new RemoteGrpcPortReadOperation<>(beamFnDataService, input.getPrimitiveTransformId(), registerFnOperation::getProcessBundleInstructionId, (Coder) coder, outputReceivers, context);
            } else {
                operation = new RemoteGrpcPortWriteOperation<>(beamFnDataService, input.getPrimitiveTransformId(), registerFnOperation::getProcessBundleInstructionId, (Coder) coder, context);
            }
            return OperationNode.create(operation);
        }
    };
}
Also used : WindowedValueCoder(org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder) KvCoder(org.apache.beam.sdk.coders.KvCoder) Coder(org.apache.beam.sdk.coders.Coder) RemoteGrpcPortNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.RemoteGrpcPortNode) OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) FlattenOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.FlattenOperation) ParDoOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation) ReadOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation) WriteOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.WriteOperation) RemoteGrpcPortReadOperation(org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortReadOperation) RegisterAndProcessBundleOperation(org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation) RemoteGrpcPortWriteOperation(org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation) ProcessRemoteBundleOperation(org.apache.beam.runners.dataflow.worker.fn.control.ProcessRemoteBundleOperation) Operation(org.apache.beam.runners.dataflow.worker.util.common.worker.Operation) OutputReceiverNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.OutputReceiverNode) OperationNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.OperationNode) TypeSafeNodeFunction(org.apache.beam.runners.dataflow.worker.graph.Networks.TypeSafeNodeFunction)
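
The method above returns a function that rewrites only nodes of one type. The following is a minimal standalone sketch of that pattern, assuming that nodes of any other type pass through unchanged. `TypedFunctionSketch` and the string and integer "nodes" are hypothetical stand-ins for illustration, not Beam's TypeSafeNodeFunction or its node classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a type-filtered transform; not Beam's TypeSafeNodeFunction.
abstract class TypedFunctionSketch<T> implements Function<Object, Object> {
    private final Class<T> type;

    TypedFunctionSketch(Class<T> type) {
        this.type = type;
    }

    @Override
    public final Object apply(Object input) {
        // Assumption: only inputs of the target type are transformed; the rest pass through.
        return type.isInstance(input) ? typedApply(type.cast(input)) : input;
    }

    abstract Object typedApply(T input);

    // Applies the transform to a mixed list of stand-in nodes.
    static List<Object> demo() {
        Function<Object, Object> wrapPorts = new TypedFunctionSketch<Integer>(Integer.class) {
            @Override
            Object typedApply(Integer port) {
                return "operation-for-port-" + port;
            }
        };
        List<Object> replaced = new ArrayList<>();
        for (Object node : new Object[] {"output-receiver", 7}) {
            replaced.add(wrapPorts.apply(node));
        }
        return replaced;
    }
}
```

In this sketch the string node is returned as-is and only the integer node is replaced, which mirrors how a single transform can be applied to every node in a network without type checks at the call site.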

Example 2 with Operation

use of org.apache.beam.runners.dataflow.worker.util.common.worker.Operation in project beam by apache.

the class BeamFnMapTaskExecutorFactory method create.

/**
 * Creates a new {@link DataflowMapTaskExecutor} from the given {@link MapTask} definition using
 * the provided {@link ReaderFactory}.
 */
@Override
public DataflowMapTaskExecutor create(InstructionRequestHandler instructionRequestHandler, GrpcFnServer<GrpcDataService> grpcDataFnServer, Endpoints.ApiServiceDescriptor dataApiServiceDescriptor, GrpcFnServer<GrpcStateService> grpcStateFnServer, MutableNetwork<Node, Edge> network, PipelineOptions options, String stageName, ReaderFactory readerFactory, SinkFactory sinkFactory, DataflowExecutionContext<?> executionContext, CounterSet counterSet, IdGenerator idGenerator) {
    // TODO: remove this once we trust the code paths
    checkArgument(DataflowRunner.hasExperiment(options.as(DataflowPipelineDebugOptions.class), "beam_fn_api"), "%s should only be used when beam_fn_api is enabled", getClass().getSimpleName());
    // Swap out all the InstructionOutput nodes with OutputReceiver nodes
    Networks.replaceDirectedNetworkNodes(network, createOutputReceiversTransform(stageName, counterSet));
    if (DataflowRunner.hasExperiment(options.as(DataflowPipelineDebugOptions.class), "use_executable_stage_bundle_execution")) {
        LOG.debug("Using SingleEnvironmentInstanceJobBundleFactory");
        JobBundleFactory jobBundleFactory = SingleEnvironmentInstanceJobBundleFactory.create(StaticRemoteEnvironmentFactory.forService(instructionRequestHandler), grpcDataFnServer, grpcStateFnServer, idGenerator);
        // If the use_executable_stage_bundle_execution is enabled, use ExecutableStage instead.
        Networks.replaceDirectedNetworkNodes(network, createOperationTransformForExecutableStageNode(network, stageName, executionContext, jobBundleFactory));
    } else {
        // Swap out all the RegisterFnRequest nodes with Operation nodes
        Networks.replaceDirectedNetworkNodes(network, createOperationTransformForRegisterFnNodes(idGenerator, instructionRequestHandler, grpcStateFnServer.getService(), stageName, executionContext));
        // Swap out all the RemoteGrpcPort nodes with Operation nodes, note that it is expected
        // that the RegisterFnRequest nodes have already been replaced.
        // TODO: Set NameContext properly for these operations.
        Networks.replaceDirectedNetworkNodes(network, createOperationTransformForGrpcPortNodes(network, grpcDataFnServer.getService(), executionContext.createOperationContext(NameContext.create(stageName, stageName, stageName, stageName))));
    }
    // Swap out all the FetchAndFilterStreamingSideInput nodes with operation nodes
    Networks.replaceDirectedNetworkNodes(network, createOperationTransformForFetchAndFilterStreamingSideInputNodes(network, idGenerator, instructionRequestHandler, grpcDataFnServer.getService(), dataApiServiceDescriptor, executionContext, stageName));
    // Swap out all the ParallelInstruction nodes with Operation nodes
    Networks.replaceDirectedNetworkNodes(network, createOperationTransformForParallelInstructionNodes(stageName, network, options, readerFactory, sinkFactory, executionContext));
    // Collect all the operations within the network and attach all the operations as receivers
    // to preceding output receivers.
    List<Operation> topoSortedOperations = new ArrayList<>();
    for (OperationNode node : Iterables.filter(Networks.topologicalOrder(network), OperationNode.class)) {
        topoSortedOperations.add(node.getOperation());
        for (Node predecessor : Iterables.filter(network.predecessors(node), OutputReceiverNode.class)) {
            ((OutputReceiverNode) predecessor).getOutputReceiver().addOutput((Receiver) node.getOperation());
        }
    }
    if (LOG.isDebugEnabled()) {
        LOG.debug("Map task network: {}", Networks.toDot(network));
    }
    return BeamFnMapTaskExecutor.withSharedCounterSet(topoSortedOperations, counterSet, executionContext.getExecutionStateTracker());
}
Also used : JobBundleFactory(org.apache.beam.runners.fnexecution.control.JobBundleFactory) SingleEnvironmentInstanceJobBundleFactory(org.apache.beam.runners.fnexecution.control.SingleEnvironmentInstanceJobBundleFactory) OperationNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.OperationNode) RegisterRequestNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.RegisterRequestNode) FetchAndFilterStreamingSideInputsNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.FetchAndFilterStreamingSideInputsNode) InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode) Node(org.apache.beam.runners.dataflow.worker.graph.Nodes.Node) ExecutableStageNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ExecutableStageNode) RemoteGrpcPortNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.RemoteGrpcPortNode) OutputReceiverNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.OutputReceiverNode) ArrayList(java.util.ArrayList) DataflowPipelineDebugOptions(org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions) FlattenOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.FlattenOperation) ParDoOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation) ReadOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation) WriteOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.WriteOperation) RemoteGrpcPortReadOperation(org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortReadOperation) RegisterAndProcessBundleOperation(org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation) RemoteGrpcPortWriteOperation(org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation) ProcessRemoteBundleOperation(org.apache.beam.runners.dataflow.worker.fn.control.ProcessRemoteBundleOperation) Operation(org.apache.beam.runners.dataflow.worker.util.common.worker.Operation)
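
The last loop of `create` collects operations by walking the network in topological order. As a rough illustration of that step, here is a minimal sketch that uses Kahn's algorithm over a plain adjacency map instead of a Guava network. `TopoSketch`, the `op:` and `recv:` name prefixes, and the sample pipeline are all hypothetical, and this is not how `Networks.topologicalOrder` is implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; plain collections stand in for the network of nodes.
final class TopoSketch {
    // Kahn's algorithm over an adjacency map (node -> successors).
    static List<String> topologicalOrder(Map<String, List<String>> successors) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String node : successors.keySet()) {
            inDegree.putIfAbsent(node, 0);
        }
        for (List<String> targets : successors.values()) {
            for (String target : targets) {
                inDegree.merge(target, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : inDegree.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.remove();
            order.add(node);
            for (String target : successors.getOrDefault(node, Collections.<String>emptyList())) {
                // A node becomes ready once all of its predecessors have been emitted.
                if (inDegree.merge(target, -1, Integer::sum) == 0) {
                    ready.add(target);
                }
            }
        }
        if (order.size() != inDegree.size()) {
            throw new IllegalArgumentException("network has a cycle");
        }
        return order;
    }

    // Sorts a small read -> parDo -> write chain and keeps only the operation nodes.
    static List<String> demo() {
        Map<String, List<String>> network = new LinkedHashMap<>();
        // Inserted out of order on purpose: the sort, not insertion order, decides the result.
        network.put("recv:out2", Arrays.asList("op:write"));
        network.put("op:parDo", Arrays.asList("recv:out2"));
        network.put("recv:out1", Arrays.asList("op:parDo"));
        network.put("op:read", Arrays.asList("recv:out1"));
        List<String> operations = new ArrayList<>();
        for (String node : topologicalOrder(network)) {
            if (node.startsWith("op:")) {
                operations.add(node);
            }
        }
        return operations;
    }
}
```

Filtering the sorted nodes down to operations yields a list in which every producer precedes its consumers, which is the property the executor relies on when it receives `topoSortedOperations`.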

Example 3 with Operation

use of org.apache.beam.runners.dataflow.worker.util.common.worker.Operation in project beam by apache.

the class IntrinsicMapTaskExecutorTest method testExceptionInFinishAbortsAllOperations.

@Test
public void testExceptionInFinishAbortsAllOperations() throws Exception {
    Operation o1 = Mockito.mock(Operation.class);
    Operation o2 = Mockito.mock(Operation.class);
    Operation o3 = Mockito.mock(Operation.class);
    Mockito.doThrow(new Exception("in finish")).when(o2).finish();
    ExecutionStateTracker stateTracker = ExecutionStateTracker.newForTest();
    try (IntrinsicMapTaskExecutor executor = IntrinsicMapTaskExecutor.withSharedCounterSet(Arrays.<Operation>asList(o1, o2, o3), counterSet, stateTracker)) {
        executor.execute();
        fail("Should have thrown");
    } catch (Exception e) {
        InOrder inOrder = Mockito.inOrder(o1, o2, o3);
        inOrder.verify(o3).start();
        inOrder.verify(o2).start();
        inOrder.verify(o1).start();
        inOrder.verify(o1).finish();
        inOrder.verify(o2).finish();
        // Order of abort doesn't matter
        Mockito.verify(o1).abort();
        Mockito.verify(o2).abort();
        Mockito.verify(o3).abort();
        Mockito.verifyNoMoreInteractions(o1, o2, o3);
    }
}
Also used : InOrder(org.mockito.InOrder) ExecutionStateTracker(org.apache.beam.runners.core.metrics.ExecutionStateTracker) DataflowExecutionStateTracker(org.apache.beam.runners.dataflow.worker.DataflowExecutionContext.DataflowExecutionStateTracker) ParDoOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation) ReadOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation) Operation(org.apache.beam.runners.dataflow.worker.util.common.worker.Operation) ExpectedException(org.junit.rules.ExpectedException) Test(org.junit.Test)
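
The ordering this test verifies (start in reverse order, finish in forward order, abort every operation once one fails) can be illustrated with a minimal standalone sketch. `SketchOperation` and `LifecycleSketch` are hypothetical names and are not Beam classes; the behaviour is inferred only from the assertions in the test above, not from the executor's source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Operation; not the Beam class.
interface SketchOperation {
    void start() throws Exception;
    void finish() throws Exception;
    void abort();
}

final class LifecycleSketch {
    // Starts operations last-to-first, finishes them first-to-last,
    // and aborts all of them if any step throws.
    static void execute(List<SketchOperation> ops) throws Exception {
        try {
            for (int i = ops.size() - 1; i >= 0; i--) {
                ops.get(i).start();
            }
            for (SketchOperation op : ops) {
                op.finish();
            }
        } catch (Exception e) {
            for (SketchOperation op : ops) {
                op.abort();
            }
            throw e;
        }
    }

    // Runs three recording operations; the one at failingIndex throws in finish().
    static List<String> run(int failingIndex) {
        final List<String> log = new ArrayList<>();
        List<SketchOperation> ops = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            final String name = "o" + (i + 1);
            final boolean fails = i == failingIndex;
            ops.add(new SketchOperation() {
                @Override
                public void start() {
                    log.add(name + ".start");
                }

                @Override
                public void finish() throws Exception {
                    log.add(name + ".finish");
                    if (fails) {
                        throw new Exception("in finish");
                    }
                }

                @Override
                public void abort() {
                    log.add(name + ".abort");
                }
            });
        }
        try {
            execute(ops);
        } catch (Exception e) {
            log.add("caught: " + e.getMessage());
        }
        return log;
    }
}
```

Calling `LifecycleSketch.run(1)` makes the second operation fail in `finish()`, so the third operation is never finished but is still aborted, matching the interactions the Mockito test checks.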

Example 4 with Operation

use of org.apache.beam.runners.dataflow.worker.util.common.worker.Operation in project beam by apache.

the class IntrinsicMapTaskExecutorTest method testValidOperations.

@Test
public void testValidOperations() throws Exception {
    TestOutputReceiver receiver = new TestOutputReceiver(counterSet, NameContextsForTests.nameContextForTest());
    List<Operation> operations = Arrays.<Operation>asList(new TestReadOperation(receiver, createContext("ReadOperation")));
    ExecutionStateTracker stateTracker = ExecutionStateTracker.newForTest();
    try (IntrinsicMapTaskExecutor executor = IntrinsicMapTaskExecutor.withSharedCounterSet(operations, counterSet, stateTracker)) {
        Assert.assertEquals(operations.get(0), executor.getReadOperation());
    }
}
Also used : ExecutionStateTracker(org.apache.beam.runners.core.metrics.ExecutionStateTracker) DataflowExecutionStateTracker(org.apache.beam.runners.dataflow.worker.DataflowExecutionContext.DataflowExecutionStateTracker) ParDoOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation) ReadOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation) Operation(org.apache.beam.runners.dataflow.worker.util.common.worker.Operation) TestOutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver) Test(org.junit.Test)

Example 5 with Operation

use of org.apache.beam.runners.dataflow.worker.util.common.worker.Operation in project beam by apache.

the class IntrinsicMapTaskExecutorTest method testGetMetricContainers.

/**
 * This test makes sure that any metrics reported within an operation are part of the metric
 * containers returned by {@link getMetricContainers}.
 */
@Test
@SuppressWarnings("unchecked")
public void testGetMetricContainers() throws Exception {
    ExecutionStateTracker stateTracker = new DataflowExecutionStateTracker(ExecutionStateSampler.newForTest(), new TestDataflowExecutionState(NameContext.forStage("testStage"), "other", null /* requestingStepName */, null /* sideInputIndex */, null /* metricsContainer */, NoopProfileScope.NOOP), new CounterSet(), PipelineOptionsFactory.create(), "test-work-item-id");
    final String o1 = "o1";
    TestOperationContext context1 = createContext(o1, stateTracker);
    final String o2 = "o2";
    TestOperationContext context2 = createContext(o2, stateTracker);
    final String o3 = "o3";
    TestOperationContext context3 = createContext(o3, stateTracker);
    List<Operation> operations = Arrays.asList(new Operation(new OutputReceiver[] {}, context1) {

        @Override
        public void start() throws Exception {
            super.start();
            try (Closeable scope = context.enterStart()) {
                Metrics.counter("TestMetric", "MetricCounter").inc(1L);
            }
        }
    }, new Operation(new OutputReceiver[] {}, context2) {

        @Override
        public void start() throws Exception {
            super.start();
            try (Closeable scope = context.enterStart()) {
                Metrics.counter("TestMetric", "MetricCounter").inc(2L);
            }
        }
    }, new Operation(new OutputReceiver[] {}, context3) {

        @Override
        public void start() throws Exception {
            super.start();
            try (Closeable scope = context.enterStart()) {
                Metrics.counter("TestMetric", "MetricCounter").inc(3L);
            }
        }
    });
    try (IntrinsicMapTaskExecutor executor = IntrinsicMapTaskExecutor.withSharedCounterSet(operations, counterSet, stateTracker)) {
        // Call execute so that we run all the counters
        executor.execute();
        assertThat(context1.metricsContainer().getUpdates().counterUpdates(), contains(metricUpdate("TestMetric", "MetricCounter", o1, 1L)));
        assertThat(context2.metricsContainer().getUpdates().counterUpdates(), contains(metricUpdate("TestMetric", "MetricCounter", o2, 2L)));
        assertThat(context3.metricsContainer().getUpdates().counterUpdates(), contains(metricUpdate("TestMetric", "MetricCounter", o3, 3L)));
    }
}
Also used : Closeable(java.io.Closeable) OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) TestOutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver) TestDataflowExecutionState(org.apache.beam.runners.dataflow.worker.TestOperationContext.TestDataflowExecutionState) ParDoOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation) ReadOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation) Operation(org.apache.beam.runners.dataflow.worker.util.common.worker.Operation) ExpectedException(org.junit.rules.ExpectedException) CounterSet(org.apache.beam.runners.dataflow.worker.counters.CounterSet) ExecutionStateTracker(org.apache.beam.runners.core.metrics.ExecutionStateTracker) DataflowExecutionStateTracker(org.apache.beam.runners.dataflow.worker.DataflowExecutionContext.DataflowExecutionStateTracker) Test(org.junit.Test)

Aggregations

Operation (org.apache.beam.runners.dataflow.worker.util.common.worker.Operation) 15
ParDoOperation (org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation) 14
ReadOperation (org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation) 14
Test (org.junit.Test) 12
DataflowExecutionStateTracker (org.apache.beam.runners.dataflow.worker.DataflowExecutionContext.DataflowExecutionStateTracker) 11
ExecutionStateTracker (org.apache.beam.runners.core.metrics.ExecutionStateTracker) 10
TestOutputReceiver (org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver) 4
ExpectedException (org.junit.rules.ExpectedException) 4
InOrder (org.mockito.InOrder) 4
DataflowPipelineDebugOptions (org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions) 3
OperationNode (org.apache.beam.runners.dataflow.worker.graph.Nodes.OperationNode) 3
OutputReceiverNode (org.apache.beam.runners.dataflow.worker.graph.Nodes.OutputReceiverNode) 3
FlattenOperation (org.apache.beam.runners.dataflow.worker.util.common.worker.FlattenOperation) 3
OutputReceiver (org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) 3
WriteOperation (org.apache.beam.runners.dataflow.worker.util.common.worker.WriteOperation) 3
ArrayList (java.util.ArrayList) 2
TestDataflowExecutionState (org.apache.beam.runners.dataflow.worker.TestOperationContext.TestDataflowExecutionState) 2
CounterSet (org.apache.beam.runners.dataflow.worker.counters.CounterSet) 2
ProcessRemoteBundleOperation (org.apache.beam.runners.dataflow.worker.fn.control.ProcessRemoteBundleOperation) 2
RegisterAndProcessBundleOperation (org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation) 2