Example 6 with ParDoFn

use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in project beam by apache.

the class UserParDoFnFactoryTest method testCleanupRegistered.

@Test
public void testCleanupRegistered() throws Exception {
    PipelineOptions options = PipelineOptionsFactory.create();
    CounterSet counters = new CounterSet();
    DoFn<?, ?> initialFn = new TestStatefulDoFn();
    CloudObject cloudObject = getCloudObject(initialFn, WindowingStrategy.globalDefault().withWindowFn(FixedWindows.of(Duration.millis(10))));
    TimerInternals timerInternals = mock(TimerInternals.class);
    DataflowStepContext stepContext = mock(DataflowStepContext.class);
    when(stepContext.timerInternals()).thenReturn(timerInternals);
    DataflowExecutionContext<DataflowStepContext> executionContext = mock(DataflowExecutionContext.class);
    TestOperationContext operationContext = TestOperationContext.create(counters);
    when(executionContext.getStepContext(operationContext)).thenReturn(stepContext);
    when(executionContext.getSideInputReader(any(), any(), any())).thenReturn(NullSideInputReader.empty());
    ParDoFn parDoFn = factory.create(options, cloudObject, Collections.emptyList(), MAIN_OUTPUT, ImmutableMap.of(MAIN_OUTPUT, 0), executionContext, operationContext);
    Receiver rcvr = new OutputReceiver();
    parDoFn.startBundle(rcvr);
    IntervalWindow firstWindow = new IntervalWindow(new Instant(0), new Instant(10));
    parDoFn.processElement(WindowedValue.of("foo", new Instant(1), firstWindow, PaneInfo.NO_FIRING));
    verify(stepContext).setStateCleanupTimer(SimpleParDoFn.CLEANUP_TIMER_ID, firstWindow, IntervalWindow.getCoder(), firstWindow.maxTimestamp().plus(Duration.millis(1L)), firstWindow.maxTimestamp().plus(Duration.millis(1L)));
}
Also used : Instant(org.joda.time.Instant) Receiver(org.apache.beam.runners.dataflow.worker.util.common.worker.Receiver) OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) DataflowStepContext(org.apache.beam.runners.dataflow.worker.DataflowExecutionContext.DataflowStepContext) TimerInternals(org.apache.beam.runners.core.TimerInternals) CounterSet(org.apache.beam.runners.dataflow.worker.counters.CounterSet) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) IntervalWindow(org.apache.beam.sdk.transforms.windowing.IntervalWindow) Test(org.junit.Test)
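
The verify call in testCleanupRegistered expects the cleanup timer at firstWindow.maxTimestamp() plus one millisecond. An IntervalWindow's maximum timestamp is one millisecond before its end, so for the window [0, 10) the expected timer lands at instant 10. The arithmetic can be sketched as follows, using java.time instead of Joda-Time so that it runs without Beam on the classpath. The class and method names (CleanupTimerSketch, maxTimestamp, cleanupTime) are hypothetical and not part of Beam.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical stand-in for the window arithmetic; not Beam code.
final class CleanupTimerSketch {

    /** Last timestamp inside a half-open window ending at `end`: one millisecond earlier. */
    static Instant maxTimestamp(Instant end) {
        return end.minus(Duration.ofMillis(1));
    }

    /** Timestamp the test expects for the cleanup timer: maxTimestamp plus one millisecond. */
    static Instant cleanupTime(Instant end) {
        return maxTimestamp(end).plus(Duration.ofMillis(1));
    }

    public static void main(String[] args) {
        // Mirrors the test's window: start 0 ms, end 10 ms.
        Instant end = Instant.ofEpochMilli(10);
        System.out.println(maxTimestamp(end).toEpochMilli()); // prints 9
        System.out.println(cleanupTime(end).toEpochMilli());  // prints 10
    }
}
```

With no extra allowed lateness, as in the test's windowing strategy, the cleanup time coincides with the window's end.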

Example 7 with ParDoFn

use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in project beam by apache.

the class UserParDoFnFactoryTest method testFactoryReuseInStep.

@Test
public void testFactoryReuseInStep() throws Exception {
    PipelineOptions options = PipelineOptionsFactory.create();
    CounterSet counters = new CounterSet();
    TestDoFn initialFn = new TestDoFn(Collections.<TupleTag<String>>emptyList());
    CloudObject cloudObject = getCloudObject(initialFn);
    TestOperationContext operationContext = TestOperationContext.create(counters);
    ParDoFn parDoFn = factory.create(options, cloudObject, null, MAIN_OUTPUT, ImmutableMap.<TupleTag<?>, Integer>of(MAIN_OUTPUT, 0), BatchModeExecutionContext.forTesting(options, "testStage"), operationContext);
    Receiver rcvr = new OutputReceiver();
    parDoFn.startBundle(rcvr);
    parDoFn.processElement(WindowedValue.valueInGlobalWindow("foo"));
    TestDoFn fn = (TestDoFn) ((SimpleParDoFn) parDoFn).getDoFnInfo().getDoFn();
    assertThat(fn, not(theInstance(initialFn)));
    parDoFn.finishBundle();
    assertThat(fn.state, equalTo(TestDoFn.State.FINISHED));
    // The fn should be reused for the second call to create
    ParDoFn secondParDoFn = factory.create(options, cloudObject, null, MAIN_OUTPUT, ImmutableMap.<TupleTag<?>, Integer>of(MAIN_OUTPUT, 0), BatchModeExecutionContext.forTesting(options, "testStage"), operationContext);
    // The fn should still be finished from the last call; it should not be set up again
    assertThat(fn.state, equalTo(TestDoFn.State.FINISHED));
    secondParDoFn.startBundle(rcvr);
    secondParDoFn.processElement(WindowedValue.valueInGlobalWindow("spam"));
    TestDoFn reobtainedFn = (TestDoFn) ((SimpleParDoFn) secondParDoFn).getDoFnInfo().getDoFn();
    secondParDoFn.finishBundle();
    assertThat(reobtainedFn.state, equalTo(TestDoFn.State.FINISHED));
    assertThat(fn, theInstance(reobtainedFn));
}
Also used : CounterSet(org.apache.beam.runners.dataflow.worker.counters.CounterSet) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) Receiver(org.apache.beam.runners.dataflow.worker.util.common.worker.Receiver) OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) Test(org.junit.Test)
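
testFactoryReuseInStep asserts that, once the first bundle has finished, a second call to create hands back the same DoFn instance rather than a fresh one. The general idea of returning an instance to a pool and handing it out again can be sketched as below. This is only an illustration under assumptions: InstancePoolSketch and its acquire and release methods are hypothetical names, and the code does not reflect how Beam's DoFnInstanceManagers is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical minimal instance pool; not Beam's implementation.
final class InstancePoolSketch<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;

    InstancePoolSketch(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns an idle instance if one exists, otherwise creates a new one. */
    T acquire() {
        T pooled = idle.poll();
        return pooled != null ? pooled : factory.get();
    }

    /** Makes the instance available to later acquire() calls. */
    void release(T instance) {
        idle.push(instance);
    }

    public static void main(String[] args) {
        InstancePoolSketch<Object> pool = new InstancePoolSketch<>(Object::new);
        Object first = pool.acquire();
        pool.release(first);               // analogous to finishing the first bundle
        Object second = pool.acquire();    // analogous to the second create call
        System.out.println(first == second); // prints true
    }
}
```

An instance that is never released is not handed out again, so a later acquire() creates a new one.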

Example 8 with ParDoFn

use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in project beam by apache.

the class SimpleParDoFnTest method testStateTracking.

@Test
public void testStateTracking() throws Exception {
    ExecutionStateTracker tracker = ExecutionStateTracker.newForTest();
    TestOperationContext operationContext = TestOperationContext.create(new CounterSet(), NameContextsForTests.nameContextForTest(), new MetricsContainerImpl(NameContextsForTests.ORIGINAL_NAME), tracker);
    class StateTestingDoFn extends DoFn<Integer, String> {

        private boolean startCalled = false;

        @StartBundle
        public void startBundle() throws Exception {
            startCalled = true;
            assertThat(tracker.getCurrentState(), equalTo(operationContext.getStartState()));
        }

        @ProcessElement
        public void processElement(ProcessContext c) throws Exception {
            assertThat(startCalled, equalTo(true));
            assertThat(tracker.getCurrentState(), equalTo(operationContext.getProcessState()));
        }
    }
    StateTestingDoFn fn = new StateTestingDoFn();
    DoFnInfo<?, ?> fnInfo = DoFnInfo.forFn(fn, WindowingStrategy.globalDefault(), null /* side input views */, null /* input coder */, MAIN_OUTPUT, DoFnSchemaInformation.create(), Collections.emptyMap());
    ParDoFn userParDoFn = new SimpleParDoFn<>(options, DoFnInstanceManagers.singleInstance(fnInfo), NullSideInputReader.empty(), MAIN_OUTPUT, ImmutableMap.of(MAIN_OUTPUT, 0, new TupleTag<>("declared"), 1), BatchModeExecutionContext.forTesting(options, operationContext.counterFactory(), "testStage").getStepContext(operationContext), operationContext, DoFnSchemaInformation.create(), Collections.emptyMap(), SimpleDoFnRunnerFactory.INSTANCE);
    // This test ensures proper behavior of the state sampling even with lazy initialization.
    try (Closeable trackerCloser = tracker.activate()) {
        try (Closeable processCloser = operationContext.enterProcess()) {
            userParDoFn.processElement(WindowedValue.valueInGlobalWindow(5));
        }
    }
}
Also used : MetricsContainerImpl(org.apache.beam.runners.core.metrics.MetricsContainerImpl) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) DoFn(org.apache.beam.sdk.transforms.DoFn) CounterSet(org.apache.beam.runners.dataflow.worker.counters.CounterSet) ExecutionStateTracker(org.apache.beam.runners.core.metrics.ExecutionStateTracker) Closeable(java.io.Closeable) TupleTag(org.apache.beam.sdk.values.TupleTag) Test(org.junit.Test)

Example 9 with ParDoFn

use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in project beam by apache.

the class SimpleParDoFnTest method testUnexpectedNumberOfReceivers.

@Test
@SuppressWarnings("AssertionFailureIgnored")
public void testUnexpectedNumberOfReceivers() throws Exception {
    TestDoFn fn = new TestDoFn(Collections.emptyList());
    DoFnInfo<?, ?> fnInfo = DoFnInfo.forFn(fn, WindowingStrategy.globalDefault(), null /* side input views */, null /* input coder */, MAIN_OUTPUT, DoFnSchemaInformation.create(), Collections.emptyMap());
    TestReceiver receiver = new TestReceiver();
    ParDoFn userParDoFn = new SimpleParDoFn<>(options, DoFnInstanceManagers.singleInstance(fnInfo), new EmptySideInputReader(), MAIN_OUTPUT, ImmutableMap.of(MAIN_OUTPUT, 0), BatchModeExecutionContext.forTesting(options, "testStage").getStepContext(operationContext), operationContext, DoFnSchemaInformation.create(), Collections.emptyMap(), SimpleDoFnRunnerFactory.INSTANCE);
    try {
        userParDoFn.startBundle();
        fail("should have failed");
    } catch (Throwable exn) {
        assertThat(exn.toString(), containsString("unexpected number of receivers"));
    }
    try {
        userParDoFn.startBundle(receiver, receiver);
        fail("should have failed");
    } catch (Throwable exn) {
        assertThat(exn.toString(), containsString("unexpected number of receivers"));
    }
}
Also used : ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) Test(org.junit.Test)

Example 10 with ParDoFn

use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in project beam by apache.

the class SimpleParDoFnTest method testOutputReceivers.

@Test
public void testOutputReceivers() throws Exception {
    TestDoFn fn = new TestDoFn(ImmutableList.of(new TupleTag<>("tag1"), new TupleTag<>("tag2"), new TupleTag<>("tag3")));
    DoFnInfo<?, ?> fnInfo = DoFnInfo.forFn(fn, WindowingStrategy.globalDefault(), null /* side input views */, null /* input coder */, MAIN_OUTPUT, DoFnSchemaInformation.create(), Collections.emptyMap());
    TestReceiver receiver = new TestReceiver();
    TestReceiver receiver1 = new TestReceiver();
    TestReceiver receiver2 = new TestReceiver();
    TestReceiver receiver3 = new TestReceiver();
    ParDoFn userParDoFn = new SimpleParDoFn<>(options, DoFnInstanceManagers.cloningPool(fnInfo, options), new EmptySideInputReader(), MAIN_OUTPUT, ImmutableMap.of(MAIN_OUTPUT, 0, new TupleTag<String>("tag1"), 1, new TupleTag<String>("tag2"), 2, new TupleTag<String>("tag3"), 3), BatchModeExecutionContext.forTesting(options, "testStage").getStepContext(operationContext), operationContext, DoFnSchemaInformation.create(), Collections.emptyMap(), SimpleDoFnRunnerFactory.INSTANCE);
    userParDoFn.startBundle(receiver, receiver1, receiver2, receiver3);
    userParDoFn.processElement(WindowedValue.valueInGlobalWindow(3));
    userParDoFn.processElement(WindowedValue.valueInGlobalWindow(42));
    userParDoFn.processElement(WindowedValue.valueInGlobalWindow(666));
    userParDoFn.finishBundle();
    Object[] expectedReceivedElems = { WindowedValue.valueInGlobalWindow("processing: 3"), WindowedValue.valueInGlobalWindow("processing: 42"), WindowedValue.valueInGlobalWindow("processing: 666"), WindowedValue.valueInGlobalWindow("finished") };
    assertArrayEquals(expectedReceivedElems, receiver.receivedElems.toArray());
    Object[] expectedReceivedElems1 = { WindowedValue.valueInGlobalWindow("tag1: processing: 3"), WindowedValue.valueInGlobalWindow("tag1: processing: 42"), WindowedValue.valueInGlobalWindow("tag1: processing: 666"), WindowedValue.valueInGlobalWindow("tag1: finished") };
    assertArrayEquals(expectedReceivedElems1, receiver1.receivedElems.toArray());
    Object[] expectedReceivedElems2 = { WindowedValue.valueInGlobalWindow("tag2: processing: 3"), WindowedValue.valueInGlobalWindow("tag2: processing: 42"), WindowedValue.valueInGlobalWindow("tag2: processing: 666"), WindowedValue.valueInGlobalWindow("tag2: finished") };
    assertArrayEquals(expectedReceivedElems2, receiver2.receivedElems.toArray());
    Object[] expectedReceivedElems3 = { WindowedValue.valueInGlobalWindow("tag3: processing: 3"), WindowedValue.valueInGlobalWindow("tag3: processing: 42"), WindowedValue.valueInGlobalWindow("tag3: processing: 666"), WindowedValue.valueInGlobalWindow("tag3: finished") };
    assertArrayEquals(expectedReceivedElems3, receiver3.receivedElems.toArray());
}
Also used : TupleTag(org.apache.beam.sdk.values.TupleTag) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) Test(org.junit.Test)

Aggregations

ParDoFn (org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) 34
Test (org.junit.Test) 26
CloudObject (org.apache.beam.runners.dataflow.util.CloudObject) 18
OutputReceiver (org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) 10
Coder (org.apache.beam.sdk.coders.Coder) 9
KvCoder (org.apache.beam.sdk.coders.KvCoder) 9
CounterSet (org.apache.beam.runners.dataflow.worker.counters.CounterSet) 7
StringUtf8Coder (org.apache.beam.sdk.coders.StringUtf8Coder) 7
PipelineOptions (org.apache.beam.sdk.options.PipelineOptions) 7
ElementByteSizeObservableCoder (org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.ElementByteSizeObservableCoder) 6
BatchSideInputPGBKParDoFn (org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.BatchSideInputPGBKParDoFn) 6
StreamingSideInputPGBKParDoFn (org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.StreamingSideInputPGBKParDoFn) 6
SimplePartialGroupByKeyParDoFn (org.apache.beam.runners.dataflow.worker.util.common.worker.SimplePartialGroupByKeyParDoFn) 6
TestOutputReceiver (org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver) 6
BigEndianIntegerCoder (org.apache.beam.sdk.coders.BigEndianIntegerCoder) 6
IterableCoder (org.apache.beam.sdk.coders.IterableCoder) 6
TupleTag (org.apache.beam.sdk.values.TupleTag) 6
ArrayList (java.util.ArrayList) 5
Structs.addString (org.apache.beam.runners.dataflow.util.Structs.addString) 5
Receiver (org.apache.beam.runners.dataflow.worker.util.common.worker.Receiver) 5