Search in sources :

Example 6 with OutputReceiver

use of org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver in project beam by apache.

the class UserParDoFnFactoryTest method testFactoryDoesNotReuseAfterAborted.

@Test
public void testFactoryDoesNotReuseAfterAborted() throws Exception {
    PipelineOptions options = PipelineOptionsFactory.create();
    CounterSet counters = new CounterSet();
    TestDoFn initialFn = new TestDoFn(Collections.<TupleTag<String>>emptyList());
    CloudObject cloudObject = getCloudObject(initialFn);
    ParDoFn parDoFn = factory.create(options, cloudObject, null, MAIN_OUTPUT, ImmutableMap.<TupleTag<?>, Integer>of(MAIN_OUTPUT, 0), BatchModeExecutionContext.forTesting(options, "testStage"), TestOperationContext.create(counters));
    Receiver rcvr = new OutputReceiver();
    parDoFn.startBundle(rcvr);
    parDoFn.processElement(WindowedValue.valueInGlobalWindow("foo"));
    TestDoFn fn = (TestDoFn) ((SimpleParDoFn) parDoFn).getDoFnInfo().getDoFn();
    parDoFn.abort();
    assertThat(fn.state, equalTo(TestDoFn.State.TORN_DOWN));
    // The fn should not be torn down here
    ParDoFn secondParDoFn = factory.create(options, cloudObject.clone(), null, MAIN_OUTPUT, ImmutableMap.<TupleTag<?>, Integer>of(MAIN_OUTPUT, 0), BatchModeExecutionContext.forTesting(options, "testStage"), TestOperationContext.create(counters));
    secondParDoFn.startBundle(rcvr);
    secondParDoFn.processElement(WindowedValue.valueInGlobalWindow("foo"));
    TestDoFn secondFn = (TestDoFn) ((SimpleParDoFn) secondParDoFn).getDoFnInfo().getDoFn();
    assertThat(secondFn, not(theInstance(fn)));
    assertThat(fn.state, equalTo(TestDoFn.State.TORN_DOWN));
    assertThat(secondFn.state, equalTo(TestDoFn.State.PROCESSING));
}
Also used : CounterSet(org.apache.beam.runners.dataflow.worker.counters.CounterSet) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) Receiver(org.apache.beam.runners.dataflow.worker.util.common.worker.Receiver) OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) Test(org.junit.Test)

Example 7 with OutputReceiver

use of org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver in project beam by apache.

the class UserParDoFnFactoryTest method testCleanupRegistered.

@Test
public void testCleanupRegistered() throws Exception {
    PipelineOptions options = PipelineOptionsFactory.create();
    CounterSet counters = new CounterSet();
    DoFn<?, ?> initialFn = new TestStatefulDoFn();
    CloudObject cloudObject = getCloudObject(initialFn, WindowingStrategy.globalDefault().withWindowFn(FixedWindows.of(Duration.millis(10))));
    TimerInternals timerInternals = mock(TimerInternals.class);
    DataflowStepContext stepContext = mock(DataflowStepContext.class);
    when(stepContext.timerInternals()).thenReturn(timerInternals);
    DataflowExecutionContext<DataflowStepContext> executionContext = mock(DataflowExecutionContext.class);
    TestOperationContext operationContext = TestOperationContext.create(counters);
    when(executionContext.getStepContext(operationContext)).thenReturn(stepContext);
    when(executionContext.getSideInputReader(any(), any(), any())).thenReturn(NullSideInputReader.empty());
    ParDoFn parDoFn = factory.create(options, cloudObject, Collections.emptyList(), MAIN_OUTPUT, ImmutableMap.of(MAIN_OUTPUT, 0), executionContext, operationContext);
    Receiver rcvr = new OutputReceiver();
    parDoFn.startBundle(rcvr);
    IntervalWindow firstWindow = new IntervalWindow(new Instant(0), new Instant(10));
    parDoFn.processElement(WindowedValue.of("foo", new Instant(1), firstWindow, PaneInfo.NO_FIRING));
    verify(stepContext).setStateCleanupTimer(SimpleParDoFn.CLEANUP_TIMER_ID, firstWindow, IntervalWindow.getCoder(), firstWindow.maxTimestamp().plus(Duration.millis(1L)), firstWindow.maxTimestamp().plus(Duration.millis(1L)));
}
Also used : Instant(org.joda.time.Instant) Receiver(org.apache.beam.runners.dataflow.worker.util.common.worker.Receiver) OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) DataflowStepContext(org.apache.beam.runners.dataflow.worker.DataflowExecutionContext.DataflowStepContext) TimerInternals(org.apache.beam.runners.core.TimerInternals) CounterSet(org.apache.beam.runners.dataflow.worker.counters.CounterSet) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) IntervalWindow(org.apache.beam.sdk.transforms.windowing.IntervalWindow) Test(org.junit.Test)

Example 8 with OutputReceiver

use of org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver in project beam by apache.

the class UserParDoFnFactoryTest method testFactoryReuseInStep.

@Test
public void testFactoryReuseInStep() throws Exception {
    PipelineOptions options = PipelineOptionsFactory.create();
    CounterSet counters = new CounterSet();
    TestDoFn initialFn = new TestDoFn(Collections.<TupleTag<String>>emptyList());
    CloudObject cloudObject = getCloudObject(initialFn);
    TestOperationContext operationContext = TestOperationContext.create(counters);
    ParDoFn parDoFn = factory.create(options, cloudObject, null, MAIN_OUTPUT, ImmutableMap.<TupleTag<?>, Integer>of(MAIN_OUTPUT, 0), BatchModeExecutionContext.forTesting(options, "testStage"), operationContext);
    Receiver rcvr = new OutputReceiver();
    parDoFn.startBundle(rcvr);
    parDoFn.processElement(WindowedValue.valueInGlobalWindow("foo"));
    TestDoFn fn = (TestDoFn) ((SimpleParDoFn) parDoFn).getDoFnInfo().getDoFn();
    assertThat(fn, not(theInstance(initialFn)));
    parDoFn.finishBundle();
    assertThat(fn.state, equalTo(TestDoFn.State.FINISHED));
    // The fn should be reused for the second call to create
    ParDoFn secondParDoFn = factory.create(options, cloudObject, null, MAIN_OUTPUT, ImmutableMap.<TupleTag<?>, Integer>of(MAIN_OUTPUT, 0), BatchModeExecutionContext.forTesting(options, "testStage"), operationContext);
    // The fn should still be finished from the last call; it should not be set up again
    assertThat(fn.state, equalTo(TestDoFn.State.FINISHED));
    secondParDoFn.startBundle(rcvr);
    secondParDoFn.processElement(WindowedValue.valueInGlobalWindow("spam"));
    TestDoFn reobtainedFn = (TestDoFn) ((SimpleParDoFn) secondParDoFn).getDoFnInfo().getDoFn();
    secondParDoFn.finishBundle();
    assertThat(reobtainedFn.state, equalTo(TestDoFn.State.FINISHED));
    assertThat(fn, theInstance(reobtainedFn));
}
Also used : CounterSet(org.apache.beam.runners.dataflow.worker.counters.CounterSet) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) Receiver(org.apache.beam.runners.dataflow.worker.util.common.worker.Receiver) OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) Test(org.junit.Test)

Example 9 with OutputReceiver

use of org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver in project beam by apache.

the class IntrinsicMapTaskExecutorTest method testGetMetricContainers.

@Test
@SuppressWarnings("unchecked")
public /**
 * This test makes sure that any metrics reported within an operation are part of the metric
 * containers returned by {@link getMetricContainers}.
 */
void testGetMetricContainers() throws Exception {
    ExecutionStateTracker stateTracker = new DataflowExecutionStateTracker(ExecutionStateSampler.newForTest(), new TestDataflowExecutionState(NameContext.forStage("testStage"), "other", null, /* requestingStepName */
    null, /* sideInputIndex */
    null, /* metricsContainer */
    NoopProfileScope.NOOP), new CounterSet(), PipelineOptionsFactory.create(), "test-work-item-id");
    final String o1 = "o1";
    TestOperationContext context1 = createContext(o1, stateTracker);
    final String o2 = "o2";
    TestOperationContext context2 = createContext(o2, stateTracker);
    final String o3 = "o3";
    TestOperationContext context3 = createContext(o3, stateTracker);
    List<Operation> operations = Arrays.asList(new Operation(new OutputReceiver[] {}, context1) {

        @Override
        public void start() throws Exception {
            super.start();
            try (Closeable scope = context.enterStart()) {
                Metrics.counter("TestMetric", "MetricCounter").inc(1L);
            }
        }
    }, new Operation(new OutputReceiver[] {}, context2) {

        @Override
        public void start() throws Exception {
            super.start();
            try (Closeable scope = context.enterStart()) {
                Metrics.counter("TestMetric", "MetricCounter").inc(2L);
            }
        }
    }, new Operation(new OutputReceiver[] {}, context3) {

        @Override
        public void start() throws Exception {
            super.start();
            try (Closeable scope = context.enterStart()) {
                Metrics.counter("TestMetric", "MetricCounter").inc(3L);
            }
        }
    });
    try (IntrinsicMapTaskExecutor executor = IntrinsicMapTaskExecutor.withSharedCounterSet(operations, counterSet, stateTracker)) {
        // Call execute so that we run all the counters
        executor.execute();
        assertThat(context1.metricsContainer().getUpdates().counterUpdates(), contains(metricUpdate("TestMetric", "MetricCounter", o1, 1L)));
        assertThat(context2.metricsContainer().getUpdates().counterUpdates(), contains(metricUpdate("TestMetric", "MetricCounter", o2, 2L)));
        assertThat(context3.metricsContainer().getUpdates().counterUpdates(), contains(metricUpdate("TestMetric", "MetricCounter", o3, 3L)));
    }
}
Also used : Closeable(java.io.Closeable) OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) TestOutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver) TestDataflowExecutionState(org.apache.beam.runners.dataflow.worker.TestOperationContext.TestDataflowExecutionState) ParDoOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation) ReadOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation) Operation(org.apache.beam.runners.dataflow.worker.util.common.worker.Operation) ExpectedException(org.junit.rules.ExpectedException) CounterSet(org.apache.beam.runners.dataflow.worker.counters.CounterSet) ExecutionStateTracker(org.apache.beam.runners.core.metrics.ExecutionStateTracker) DataflowExecutionStateTracker(org.apache.beam.runners.dataflow.worker.DataflowExecutionContext.DataflowExecutionStateTracker) DataflowExecutionStateTracker(org.apache.beam.runners.dataflow.worker.DataflowExecutionContext.DataflowExecutionStateTracker) Test(org.junit.Test)

Example 10 with OutputReceiver

use of org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver in project beam by apache.

the class IntrinsicMapTaskExecutorFactory method createOutputReceiversTransform.

/**
 * Returns a function which can convert {@link InstructionOutput}s into {@link OutputReceiver}s.
 */
static Function<Node, Node> createOutputReceiversTransform(final String stageName, final CounterFactory counterFactory) {
    return new TypeSafeNodeFunction<InstructionOutputNode>(InstructionOutputNode.class) {

        @Override
        public Node typedApply(InstructionOutputNode input) {
            InstructionOutput cloudOutput = input.getInstructionOutput();
            OutputReceiver outputReceiver = new OutputReceiver();
            Coder<?> coder = CloudObjects.coderFromCloudObject(CloudObject.fromSpec(cloudOutput.getCodec()));
            @SuppressWarnings("unchecked") ElementCounter outputCounter = new DataflowOutputCounter(cloudOutput.getName(), new ElementByteSizeObservableCoder<>(coder), counterFactory, NameContext.create(stageName, cloudOutput.getOriginalName(), cloudOutput.getSystemName(), cloudOutput.getName()));
            outputReceiver.addOutputCounter(outputCounter);
            return OutputReceiverNode.create(outputReceiver, coder, input.getPcollectionId());
        }
    };
}
Also used : InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) InstructionOutput(com.google.api.services.dataflow.model.InstructionOutput) OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) TypeSafeNodeFunction(org.apache.beam.runners.dataflow.worker.graph.Networks.TypeSafeNodeFunction) ElementCounter(org.apache.beam.runners.dataflow.worker.util.common.worker.ElementCounter)

Aggregations

OutputReceiver (org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver)22 CloudObject (org.apache.beam.runners.dataflow.util.CloudObject)11 ParDoFn (org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn)10 Test (org.junit.Test)9 ParDoOperation (org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation)7 ParallelInstruction (com.google.api.services.dataflow.model.ParallelInstruction)6 CounterSet (org.apache.beam.runners.dataflow.worker.counters.CounterSet)6 Receiver (org.apache.beam.runners.dataflow.worker.util.common.worker.Receiver)6 PipelineOptions (org.apache.beam.sdk.options.PipelineOptions)6 TypeSafeNodeFunction (org.apache.beam.runners.dataflow.worker.graph.Networks.TypeSafeNodeFunction)5 OutputReceiverNode (org.apache.beam.runners.dataflow.worker.graph.Nodes.OutputReceiverNode)5 InstructionOutputNode (org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode)4 KvCoder (org.apache.beam.sdk.coders.KvCoder)4 WindowedValueCoder (org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder)4 OperationNode (org.apache.beam.runners.dataflow.worker.graph.Nodes.OperationNode)3 ElementCounter (org.apache.beam.runners.dataflow.worker.util.common.worker.ElementCounter)3 Operation (org.apache.beam.runners.dataflow.worker.util.common.worker.Operation)3 ReadOperation (org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation)3 InstructionOutput (com.google.api.services.dataflow.model.InstructionOutput)2 ParDoInstruction (com.google.api.services.dataflow.model.ParDoInstruction)2