Example 16 with RemoteBundle

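Use of org.apache.beam.runners.fnexecution.control.RemoteBundle in project beam by apache.

From the class ExecutableStageDoFnOperatorTest, method testStageBundleClosed. The Mockito mocks stageBundleFactory and stageContext, and the helper getOperator(...), are declared elsewhere in the test class.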

@Test
public void testStageBundleClosed() throws Exception {
    TupleTag<Integer> mainOutput = new TupleTag<>("main-output");
    DoFnOperator.MultiOutputOutputManagerFactory<Integer> outputManagerFactory = new DoFnOperator.MultiOutputOutputManagerFactory(mainOutput, VoidCoder.of(), new SerializablePipelineOptions(FlinkPipelineOptions.defaults()));
    ExecutableStageDoFnOperator<Integer, Integer> operator = getOperator(mainOutput, Collections.emptyList(), outputManagerFactory);
    OneInputStreamOperatorTestHarness<WindowedValue<Integer>, WindowedValue<Integer>> testHarness = new OneInputStreamOperatorTestHarness<>(operator);
    RemoteBundle bundle = Mockito.mock(RemoteBundle.class);
    when(bundle.getInputReceivers()).thenReturn(ImmutableMap.<String, FnDataReceiver<WindowedValue>>builder().put("input", Mockito.mock(FnDataReceiver.class)).build());
    when(stageBundleFactory.getBundle(any(), any(), any(), any(), any(), any())).thenReturn(bundle);
    testHarness.open();
    testHarness.close();
    verify(stageBundleFactory).getInstructionRequestHandler();
    verify(stageBundleFactory).close();
    verify(stageContext).close();
    verifyNoMoreInteractions(stageBundleFactory);
    // close() also disposes of the operator; call cleanUp() again to verify that
    // no new bundle is created afterwards
    operator.cleanUp();
    verifyNoMoreInteractions(bundle);
}
Also used: FnDataReceiver(org.apache.beam.sdk.fn.data.FnDataReceiver) TupleTag(org.apache.beam.sdk.values.TupleTag) KeyedOneInputStreamOperatorTestHarness(org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness) OneInputStreamOperatorTestHarness(org.apache.flink.streaming.util.OneInputStreamOperatorTestHarness) WindowedValue(org.apache.beam.sdk.util.WindowedValue) StreamRecordStripper.stripStreamRecordFromWindowedValue(org.apache.beam.runners.flink.translation.wrappers.streaming.StreamRecordStripper.stripStreamRecordFromWindowedValue) SerializablePipelineOptions(org.apache.beam.runners.core.construction.SerializablePipelineOptions) RemoteBundle(org.apache.beam.runners.fnexecution.control.RemoteBundle) Test(org.junit.Test) FlinkStateInternalsTest(org.apache.beam.runners.flink.streaming.FlinkStateInternalsTest)
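Example 16 suggests that the operator asks stageBundleFactory for a bundle only once input arrives: after open() and close() with no elements, the only interactions verified are getInstructionRequestHandler() and close(), and the later cleanUp() call must not touch the bundle. The stdlib-only sketch below models that lazy-bundle pattern under that assumption; ToyBundle, ToyBundleFactory and LazyBundleOperator are hypothetical stand-ins, not Beam classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a remote bundle: buffers inputs until it is closed.
final class ToyBundle implements AutoCloseable {
  final List<Integer> inputs = new ArrayList<>();
  boolean closed;

  void accept(int value) {
    if (closed) {
      throw new IllegalStateException("bundle already closed");
    }
    inputs.add(value);
  }

  @Override
  public void close() {
    closed = true;
  }
}

// Hypothetical factory that counts how many bundles have been requested.
final class ToyBundleFactory {
  int bundlesCreated;

  ToyBundle getBundle() {
    bundlesCreated++;
    return new ToyBundle();
  }
}

// Hypothetical operator that opens a bundle lazily, on the first element.
final class LazyBundleOperator {
  private final ToyBundleFactory factory;
  private ToyBundle current; // null while no bundle is open

  LazyBundleOperator(ToyBundleFactory factory) {
    this.factory = factory;
  }

  void processElement(int value) {
    if (current == null) {
      current = factory.getBundle();
    }
    current.accept(value);
  }

  // Closes the open bundle, if any; never requests a new one.
  private void finishBundle() {
    if (current != null) {
      current.close();
      current = null;
    }
  }

  void close() {
    finishBundle();
  }

  // Safe to call again after close(): there is no bundle left to finish.
  void cleanUp() {
    finishBundle();
  }
}
```

Tracking the open bundle in a nullable field keeps close() and cleanUp() idempotent, which is the property the test pins down with verifyNoMoreInteractions.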

Example 17 with RemoteBundle

Use of org.apache.beam.runners.fnexecution.control.RemoteBundle in project beam by apache.

From the class ExecutableStageDoFnOperatorTest, method testEnsureStateCleanupWithKeyedInput.

@Test
@SuppressWarnings("unchecked")
public void testEnsureStateCleanupWithKeyedInput() throws Exception {
    TupleTag<Integer> mainOutput = new TupleTag<>("main-output");
    DoFnOperator.MultiOutputOutputManagerFactory<Integer> outputManagerFactory = new DoFnOperator.MultiOutputOutputManagerFactory(mainOutput, VarIntCoder.of(), new SerializablePipelineOptions(FlinkPipelineOptions.defaults()));
    VarIntCoder keyCoder = VarIntCoder.of();
    ExecutableStageDoFnOperator<Integer, Integer> operator = getOperator(mainOutput, Collections.emptyList(), outputManagerFactory, WindowingStrategy.globalDefault(), keyCoder, WindowedValue.getFullCoder(keyCoder, GlobalWindow.Coder.INSTANCE));
    KeyedOneInputStreamOperatorTestHarness<Integer, WindowedValue<Integer>, WindowedValue<Integer>> testHarness = new KeyedOneInputStreamOperatorTestHarness(operator, val -> val, new CoderTypeInformation<>(keyCoder, FlinkPipelineOptions.defaults()));
    RemoteBundle bundle = Mockito.mock(RemoteBundle.class);
    when(bundle.getInputReceivers()).thenReturn(ImmutableMap.<String, FnDataReceiver<WindowedValue>>builder().put("input", Mockito.mock(FnDataReceiver.class)).build());
    when(stageBundleFactory.getBundle(any(), any(), any(), any(), any(), any())).thenReturn(bundle);
    testHarness.open();
    Object doFnRunner = Whitebox.getInternalState(operator, "doFnRunner");
    assertThat(doFnRunner, instanceOf(DoFnRunnerWithMetricsUpdate.class));
    // There should be a StatefulDoFnRunner installed which takes care of clearing state
    Object statefulDoFnRunner = Whitebox.getInternalState(doFnRunner, "delegate");
    assertThat(statefulDoFnRunner, instanceOf(StatefulDoFnRunner.class));
}
Also used: FnDataReceiver(org.apache.beam.sdk.fn.data.FnDataReceiver) VarIntCoder(org.apache.beam.sdk.coders.VarIntCoder) TupleTag(org.apache.beam.sdk.values.TupleTag) KeyedOneInputStreamOperatorTestHarness(org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness) WindowedValue(org.apache.beam.sdk.util.WindowedValue) StreamRecordStripper.stripStreamRecordFromWindowedValue(org.apache.beam.runners.flink.translation.wrappers.streaming.StreamRecordStripper.stripStreamRecordFromWindowedValue) StatefulDoFnRunner(org.apache.beam.runners.core.StatefulDoFnRunner) MutableObject(org.apache.beam.repackaged.core.org.apache.commons.lang3.mutable.MutableObject) SerializablePipelineOptions(org.apache.beam.runners.core.construction.SerializablePipelineOptions) DoFnRunnerWithMetricsUpdate(org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate) RemoteBundle(org.apache.beam.runners.fnexecution.control.RemoteBundle) Test(org.junit.Test) FlinkStateInternalsTest(org.apache.beam.runners.flink.streaming.FlinkStateInternalsTest)
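Example 17 reads private fields through Whitebox.getInternalState to confirm that the operator's runner is a DoFnRunnerWithMetricsUpdate whose "delegate" is a StatefulDoFnRunner. A minimal sketch of the same kind of check using only java.lang.reflect follows; the runner classes and the Reflect.getInternalState helper are hypothetical and only mimic the shape of that wrapper chain.

```java
import java.lang.reflect.Field;

// Hypothetical runner abstraction and wrappers, loosely shaped like the
// DoFnRunnerWithMetricsUpdate -> StatefulDoFnRunner chain in Example 17.
interface ToyRunner {
  void process(String element);
}

final class BaseRunner implements ToyRunner {
  @Override
  public void process(String element) {
    // No-op for the sketch.
  }
}

final class StatefulRunner implements ToyRunner {
  private final ToyRunner delegate;

  StatefulRunner(ToyRunner delegate) {
    this.delegate = delegate;
  }

  @Override
  public void process(String element) {
    delegate.process(element);
  }
}

final class MetricsRunner implements ToyRunner {
  private final ToyRunner delegate;

  MetricsRunner(ToyRunner delegate) {
    this.delegate = delegate;
  }

  @Override
  public void process(String element) {
    delegate.process(element);
  }
}

final class Reflect {
  // Reads a possibly private field by name, walking up the class hierarchy.
  static Object getInternalState(Object target, String fieldName) {
    for (Class<?> type = target.getClass(); type != null; type = type.getSuperclass()) {
      try {
        Field field = type.getDeclaredField(fieldName);
        field.setAccessible(true);
        return field.get(target);
      } catch (NoSuchFieldException e) {
        // Not declared on this class; try the superclass.
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
    }
    throw new IllegalArgumentException(
        "No field '" + fieldName + "' on " + target.getClass().getName());
  }
}
```

Reflective checks like this couple a test to private field names, so renaming "delegate" breaks the test without any compile error.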

Example 18 with RemoteBundle

Use of org.apache.beam.runners.fnexecution.control.RemoteBundle in project beam by apache.

From the class ExecutableStageDoFnOperatorTest, method testEnsureStateCleanupOnFinalWatermark. stateId is a String field declared elsewhere in the test class.

@Test
public void testEnsureStateCleanupOnFinalWatermark() throws Exception {
    TupleTag<Integer> mainOutput = new TupleTag<>("main-output");
    DoFnOperator.MultiOutputOutputManagerFactory<Integer> outputManagerFactory = new DoFnOperator.MultiOutputOutputManagerFactory(mainOutput, VoidCoder.of(), new SerializablePipelineOptions(FlinkPipelineOptions.defaults()));
    StringUtf8Coder keyCoder = StringUtf8Coder.of();
    WindowingStrategy windowingStrategy = WindowingStrategy.globalDefault();
    Coder<BoundedWindow> windowCoder = windowingStrategy.getWindowFn().windowCoder();
    KvCoder<String, Integer> kvCoder = KvCoder.of(keyCoder, VarIntCoder.of());
    ExecutableStageDoFnOperator<Integer, Integer> operator = getOperator(mainOutput, Collections.emptyList(), outputManagerFactory, windowingStrategy, keyCoder, WindowedValue.getFullCoder(kvCoder, windowCoder));
    KeyedOneInputStreamOperatorTestHarness<ByteBuffer, WindowedValue<KV<String, Integer>>, WindowedValue<Integer>> testHarness = new KeyedOneInputStreamOperatorTestHarness(operator, operator.keySelector, new CoderTypeInformation<>(FlinkKeyUtils.ByteBufferCoder.of(), FlinkPipelineOptions.defaults()));
    RemoteBundle bundle = Mockito.mock(RemoteBundle.class);
    when(bundle.getInputReceivers()).thenReturn(ImmutableMap.<String, FnDataReceiver<WindowedValue>>builder().put("input", Mockito.mock(FnDataReceiver.class)).build());
    when(stageBundleFactory.getBundle(any(), any(), any(), any(), any(), any())).thenReturn(bundle);
    testHarness.open();
    KeyedStateBackend<ByteBuffer> keyedStateBackend = operator.getKeyedStateBackend();
    ByteBuffer key = FlinkKeyUtils.encodeKey("key1", keyCoder);
    keyedStateBackend.setCurrentKey(key);
    // create some state which can be cleaned up
    assertThat(testHarness.numKeyedStateEntries(), is(0));
    StateNamespace stateNamespace = StateNamespaces.window(windowCoder, GlobalWindow.INSTANCE);
    // State from the SDK Harness is stored as ByteStrings
    BagState<ByteString> state = operator.keyedStateInternals.state(stateNamespace, StateTags.bag(stateId, ByteStringCoder.of()));
    state.add(ByteString.copyFrom("userstate".getBytes(Charsets.UTF_8)));
    // No timers have been set for cleanup
    assertThat(testHarness.numEventTimeTimers(), is(0));
    // State has been created
    assertThat(testHarness.numKeyedStateEntries(), is(1));
    // Generate final watermark to trigger state cleanup
    testHarness.processWatermark(new Watermark(BoundedWindow.TIMESTAMP_MAX_VALUE.plus(Duration.millis(1)).getMillis()));
    assertThat(testHarness.numKeyedStateEntries(), is(0));
}
Also used: ByteString(org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString) TupleTag(org.apache.beam.sdk.values.TupleTag) ArgumentMatchers.anyString(org.mockito.ArgumentMatchers.anyString) WindowingStrategy(org.apache.beam.sdk.values.WindowingStrategy) KeyedOneInputStreamOperatorTestHarness(org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness) WindowedValue(org.apache.beam.sdk.util.WindowedValue) StreamRecordStripper.stripStreamRecordFromWindowedValue(org.apache.beam.runners.flink.translation.wrappers.streaming.StreamRecordStripper.stripStreamRecordFromWindowedValue) StringUtf8Coder(org.apache.beam.sdk.coders.StringUtf8Coder) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) SerializablePipelineOptions(org.apache.beam.runners.core.construction.SerializablePipelineOptions) FnDataReceiver(org.apache.beam.sdk.fn.data.FnDataReceiver) ByteBuffer(java.nio.ByteBuffer) StateNamespace(org.apache.beam.runners.core.StateNamespace) RemoteBundle(org.apache.beam.runners.fnexecution.control.RemoteBundle) Watermark(org.apache.flink.streaming.api.watermark.Watermark) Test(org.junit.Test) FlinkStateInternalsTest(org.apache.beam.runners.flink.streaming.FlinkStateInternalsTest)
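Example 18 creates one bag-state entry for key1 in the global window, checks that no event-time timer was registered, and then emits a watermark 1 ms past BoundedWindow.TIMESTAMP_MAX_VALUE, after which the keyed state must be empty. The sketch below illustrates watermark-driven state garbage collection under a simplifying assumption: state scoped to a window is dropped as soon as the watermark is strictly past that window's max timestamp. ToyWindowedState is hypothetical, and its TIMESTAMP_MAX_MILLIS constant assumes Beam derives TIMESTAMP_MAX_VALUE from Long.MAX_VALUE microseconds; the Flink runner's real cleanup logic is more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical keyed state store in which each entry is scoped to (key, window).
final class ToyWindowedState {
  // Assumption: mirrors Beam's BoundedWindow.TIMESTAMP_MAX_VALUE, i.e.
  // Long.MAX_VALUE microseconds expressed in milliseconds.
  static final long TIMESTAMP_MAX_MILLIS = Long.MAX_VALUE / 1000;

  // State scope: a user key plus the max timestamp of its window.
  private static final class Scope {
    final String key;
    final long windowMaxTimestamp;

    Scope(String key, long windowMaxTimestamp) {
      this.key = key;
      this.windowMaxTimestamp = windowMaxTimestamp;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Scope)) {
        return false;
      }
      Scope other = (Scope) o;
      return key.equals(other.key) && windowMaxTimestamp == other.windowMaxTimestamp;
    }

    @Override
    public int hashCode() {
      return Objects.hash(key, windowMaxTimestamp);
    }
  }

  private final Map<Scope, List<String>> entries = new HashMap<>();

  // Adds a value to the bag-like state for (key, window), creating it if needed.
  void append(String key, long windowMaxTimestamp, String value) {
    entries.computeIfAbsent(new Scope(key, windowMaxTimestamp), s -> new ArrayList<>())
        .add(value);
  }

  int numEntries() {
    return entries.size();
  }

  // Simplified GC rule: once the watermark is strictly past a window's max
  // timestamp, that window is complete and its state can be dropped.
  void advanceWatermark(long watermarkMillis) {
    entries.keySet().removeIf(scope -> scope.windowMaxTimestamp < watermarkMillis);
  }
}
```

Under this assumption the constant is 9223372036854775 ms, so a final watermark one millisecond later still fits comfortably in a long.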

Example 19 with RemoteBundle

Use of org.apache.beam.runners.fnexecution.control.RemoteBundle in project beam by apache.

From the class ExecutableStageDoFnOperatorTest, method expectedInputsAreSent.

@Test
public void expectedInputsAreSent() throws Exception {
    TupleTag<Integer> mainOutput = new TupleTag<>("main-output");
    DoFnOperator.MultiOutputOutputManagerFactory<Integer> outputManagerFactory = new DoFnOperator.MultiOutputOutputManagerFactory(mainOutput, VoidCoder.of(), new SerializablePipelineOptions(FlinkPipelineOptions.defaults()));
    ExecutableStageDoFnOperator<Integer, Integer> operator = getOperator(mainOutput, Collections.emptyList(), outputManagerFactory);
    @SuppressWarnings("unchecked") RemoteBundle bundle = Mockito.mock(RemoteBundle.class);
    when(stageBundleFactory.getBundle(any(), any(), any(), any(), any(), any())).thenReturn(bundle);
    @SuppressWarnings("unchecked") FnDataReceiver<WindowedValue<?>> receiver = Mockito.mock(FnDataReceiver.class);
    when(bundle.getInputReceivers()).thenReturn(ImmutableMap.of("input", receiver));
    WindowedValue<Integer> one = WindowedValue.valueInGlobalWindow(1);
    WindowedValue<Integer> two = WindowedValue.valueInGlobalWindow(2);
    WindowedValue<Integer> three = WindowedValue.valueInGlobalWindow(3);
    OneInputStreamOperatorTestHarness<WindowedValue<Integer>, WindowedValue<Integer>> testHarness = new OneInputStreamOperatorTestHarness<>(operator);
    testHarness.open();
    testHarness.processElement(new StreamRecord<>(one));
    testHarness.processElement(new StreamRecord<>(two));
    testHarness.processElement(new StreamRecord<>(three));
    verify(receiver).accept(one);
    verify(receiver).accept(two);
    verify(receiver).accept(three);
    verifyNoMoreInteractions(receiver);
    testHarness.close();
}
Also used: TupleTag(org.apache.beam.sdk.values.TupleTag) KeyedOneInputStreamOperatorTestHarness(org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness) OneInputStreamOperatorTestHarness(org.apache.flink.streaming.util.OneInputStreamOperatorTestHarness) WindowedValue(org.apache.beam.sdk.util.WindowedValue) StreamRecordStripper.stripStreamRecordFromWindowedValue(org.apache.beam.runners.flink.translation.wrappers.streaming.StreamRecordStripper.stripStreamRecordFromWindowedValue) SerializablePipelineOptions(org.apache.beam.runners.core.construction.SerializablePipelineOptions) RemoteBundle(org.apache.beam.runners.fnexecution.control.RemoteBundle) Test(org.junit.Test) FlinkStateInternalsTest(org.apache.beam.runners.flink.streaming.FlinkStateInternalsTest)
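Example 19 uses Mockito to verify that each of the three elements reaches the "input" receiver and that nothing else does. A hand-written recording fake is an alternative when call order matters, since separate verify(...) calls do not check order (Mockito's InOrder does). RecordingReceiver and ForwardingOperator below are hypothetical, and java.util.function.Consumer stands in for FnDataReceiver.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical test fake: records every accepted value, in arrival order.
final class RecordingReceiver<T> implements Consumer<T> {
  private final List<T> received = new ArrayList<>();

  @Override
  public void accept(T value) {
    received.add(value);
  }

  List<T> received() {
    return Collections.unmodifiableList(received);
  }
}

// Hypothetical operator that forwards every element to one input receiver.
final class ForwardingOperator<T> {
  private final Consumer<? super T> input;

  ForwardingOperator(Consumer<? super T> input) {
    this.input = input;
  }

  void processElement(T element) {
    input.accept(element);
  }
}
```

Comparing the whole recorded list checks the values, their order, and that nothing extra arrived, which covers what the three verify(...) calls plus verifyNoMoreInteractions(receiver) assert, with ordering on top.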

Example 20 with RemoteBundle

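Use of org.apache.beam.runners.fnexecution.control.RemoteBundle in project beam by apache.

From the class SparkExecutableStageFunction, method call. The fields pipelineOptions, contextFactory, jobInfo, stagePayload, outputMap, windowCoder and currentTimerKey, and the helpers getStateRequestHandler, processElements and getBundleProgressHandler, are defined elsewhere in the class.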

@Override
public Iterator<RawUnionValue> call(Iterator<WindowedValue<InputT>> inputs) throws Exception {
    SparkPipelineOptions options = pipelineOptions.get().as(SparkPipelineOptions.class);
    // Register standard file systems.
    FileSystems.setDefaultPipelineOptions(options);
    // Skip processing when there are no inputs; otherwise this may cause validation errors (e.g. ParDoTest)
    if (!inputs.hasNext()) {
        return Collections.emptyIterator();
    }
    try (ExecutableStageContext stageContext = contextFactory.get(jobInfo)) {
        ExecutableStage executableStage = ExecutableStage.fromPayload(stagePayload);
        try (StageBundleFactory stageBundleFactory = stageContext.getStageBundleFactory(executableStage)) {
            ConcurrentLinkedQueue<RawUnionValue> collector = new ConcurrentLinkedQueue<>();
            StateRequestHandler stateRequestHandler = getStateRequestHandler(executableStage, stageBundleFactory.getProcessBundleDescriptor());
            if (executableStage.getTimers().isEmpty()) {
                ReceiverFactory receiverFactory = new ReceiverFactory(collector, outputMap);
                processElements(stateRequestHandler, receiverFactory, null, stageBundleFactory, inputs);
                return collector.iterator();
            }
            // Used with Batch, we know that all the data is available for this key. We can't use the
            // timer manager from the context because it doesn't exist. So we create one and advance
            // time to the end after processing all elements.
            final InMemoryTimerInternals timerInternals = new InMemoryTimerInternals();
            timerInternals.advanceProcessingTime(Instant.now());
            timerInternals.advanceSynchronizedProcessingTime(Instant.now());
            ReceiverFactory receiverFactory = new ReceiverFactory(collector, outputMap);
            TimerReceiverFactory timerReceiverFactory = new TimerReceiverFactory(stageBundleFactory, (Timer<?> timer, TimerInternals.TimerData timerData) -> {
                currentTimerKey = timer.getUserKey();
                if (timer.getClearBit()) {
                    timerInternals.deleteTimer(timerData);
                } else {
                    timerInternals.setTimer(timerData);
                }
            }, windowCoder);
            // Process inputs.
            processElements(stateRequestHandler, receiverFactory, timerReceiverFactory, stageBundleFactory, inputs);
            // Finish any pending windows by advancing the input watermark to infinity.
            timerInternals.advanceInputWatermark(BoundedWindow.TIMESTAMP_MAX_VALUE);
            // Finally, advance the processing time to infinity to fire any timers.
            timerInternals.advanceProcessingTime(BoundedWindow.TIMESTAMP_MAX_VALUE);
            timerInternals.advanceSynchronizedProcessingTime(BoundedWindow.TIMESTAMP_MAX_VALUE);
            // Fire timers until none remain; firing a timer may itself set new timers
            while (timerInternals.hasPendingTimers()) {
                try (RemoteBundle bundle = stageBundleFactory.getBundle(receiverFactory, timerReceiverFactory, stateRequestHandler, getBundleProgressHandler())) {
                    PipelineTranslatorUtils.fireEligibleTimers(timerInternals, bundle.getTimerReceivers(), currentTimerKey);
                }
            }
            return collector.iterator();
        }
    }
}
Also used: TimerReceiverFactory(org.apache.beam.runners.fnexecution.control.TimerReceiverFactory) OutputReceiverFactory(org.apache.beam.runners.fnexecution.control.OutputReceiverFactory) StateRequestHandler(org.apache.beam.runners.fnexecution.state.StateRequestHandler) RawUnionValue(org.apache.beam.sdk.transforms.join.RawUnionValue) InMemoryTimerInternals(org.apache.beam.runners.core.InMemoryTimerInternals) SparkPipelineOptions(org.apache.beam.runners.spark.SparkPipelineOptions) StageBundleFactory(org.apache.beam.runners.fnexecution.control.StageBundleFactory) Timer(org.apache.beam.runners.core.construction.Timer) ExecutableStageContext(org.apache.beam.runners.fnexecution.control.ExecutableStageContext) ExecutableStage(org.apache.beam.runners.core.construction.graph.ExecutableStage) ConcurrentLinkedQueue(java.util.concurrent.ConcurrentLinkedQueue) RemoteBundle(org.apache.beam.runners.fnexecution.control.RemoteBundle)
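After advancing the watermark and processing time to TIMESTAMP_MAX_VALUE, Example 20 keeps opening a bundle and firing eligible timers while any remain, because firing a timer can set new ones. The sketch below shows that drain-until-empty loop under the assumption that every pending timer is eligible (time has already reached "infinity"); ToyTimers is a hypothetical class, not InMemoryTimerInternals, and each round stands in for one RemoteBundle.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.BiConsumer;

// Hypothetical timer store; every pending timer is assumed eligible to fire
// because event and processing time have already been advanced to "infinity".
final class ToyTimers {
  static final class Timer {
    final String id;
    final long timestamp;

    Timer(String id, long timestamp) {
      this.id = id;
      this.timestamp = timestamp;
    }
  }

  private final PriorityQueue<Timer> pending =
      new PriorityQueue<>(Comparator.comparingLong((Timer t) -> t.timestamp));

  void set(String id, long timestamp) {
    pending.add(new Timer(id, timestamp));
  }

  boolean hasPendingTimers() {
    return !pending.isEmpty();
  }

  // Fires timers in rounds until none remain. Each round (one "bundle") fires the
  // timers pending when it starts, in timestamp order; timers set by a callback
  // during a round wait for the next round. Returns the number of rounds.
  int drain(BiConsumer<Timer, ToyTimers> onFire, List<String> firedLog) {
    int rounds = 0;
    while (hasPendingTimers()) {
      rounds++;
      List<Timer> eligible = new ArrayList<>();
      while (!pending.isEmpty()) {
        eligible.add(pending.poll());
      }
      for (Timer timer : eligible) {
        firedLog.add(timer.id);
        onFire.accept(timer, this);
      }
    }
    return rounds;
  }
}
```

The loop terminates only if timer callbacks eventually stop setting new timers; a callback that always re-arms itself would keep the loop running indefinitely.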

Aggregations

RemoteBundle (org.apache.beam.runners.fnexecution.control.RemoteBundle): 22
WindowedValue (org.apache.beam.sdk.util.WindowedValue): 18
FnDataReceiver (org.apache.beam.sdk.fn.data.FnDataReceiver): 12
Test (org.junit.Test): 12
BundleProgressHandler (org.apache.beam.runners.fnexecution.control.BundleProgressHandler): 9
TimerReceiverFactory (org.apache.beam.runners.fnexecution.control.TimerReceiverFactory): 9
StateRequestHandler (org.apache.beam.runners.fnexecution.state.StateRequestHandler): 9
SerializablePipelineOptions (org.apache.beam.runners.core.construction.SerializablePipelineOptions): 8
StreamRecordStripper.stripStreamRecordFromWindowedValue (org.apache.beam.runners.flink.translation.wrappers.streaming.StreamRecordStripper.stripStreamRecordFromWindowedValue): 8
OutputReceiverFactory (org.apache.beam.runners.fnexecution.control.OutputReceiverFactory): 8
TupleTag (org.apache.beam.sdk.values.TupleTag): 8
KeyedOneInputStreamOperatorTestHarness (org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness): 8
FlinkStateInternalsTest (org.apache.beam.runners.flink.streaming.FlinkStateInternalsTest): 7
BundleCheckpointHandler (org.apache.beam.runners.fnexecution.control.BundleCheckpointHandler): 7
BundleFinalizationHandler (org.apache.beam.runners.fnexecution.control.BundleFinalizationHandler): 7
StringUtf8Coder (org.apache.beam.sdk.coders.StringUtf8Coder): 7
ByteString (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString): 7
StageBundleFactory (org.apache.beam.runners.fnexecution.control.StageBundleFactory): 6
HashMap (java.util.HashMap): 5
InMemoryTimerInternals (org.apache.beam.runners.core.InMemoryTimerInternals): 5