Example 1 with ReadableState

use of org.apache.beam.sdk.state.ReadableState in project beam by apache.

the class StateMerging method mergeWatermarks.

/**
 * Merge all watermark state in {@code sources} (which must include {@code result} if non-empty)
 * into {@code result}, where the final merge result window is {@code resultWindow}.
 */
public static <W extends BoundedWindow> void mergeWatermarks(Collection<WatermarkHoldState> sources, WatermarkHoldState result, W resultWindow) {
    if (sources.isEmpty()) {
        // Nothing to merge.
        return;
    }
    if (sources.size() == 1 && sources.contains(result) && result.getTimestampCombiner().dependsOnlyOnEarliestTimestamp()) {
        // Nothing to merge.
        return;
    }
    if (result.getTimestampCombiner().dependsOnlyOnWindow()) {
        // Clear sources.
        for (WatermarkHoldState source : sources) {
            source.clear();
        }
        // Update directly from window-derived hold.
        Instant hold = result.getTimestampCombiner().assign(resultWindow, BoundedWindow.TIMESTAMP_MIN_VALUE);
        checkState(hold.isAfter(BoundedWindow.TIMESTAMP_MIN_VALUE));
        result.add(hold);
    } else {
        // Collect the source holds as ReadableState handles (no readLater() is issued here).
        List<ReadableState<Instant>> futures = new ArrayList<>(sources.size());
        for (WatermarkHoldState source : sources) {
            futures.add(source);
        }
        // Read.
        List<Instant> outputTimesToMerge = new ArrayList<>(sources.size());
        for (ReadableState<Instant> future : futures) {
            Instant sourceOutputTime = future.read();
            if (sourceOutputTime != null) {
                outputTimesToMerge.add(sourceOutputTime);
            }
        }
        // Clear sources.
        for (WatermarkHoldState source : sources) {
            source.clear();
        }
        if (!outputTimesToMerge.isEmpty()) {
            // Merge and update.
            result.add(result.getTimestampCombiner().merge(resultWindow, outputTimesToMerge));
        }
    }
}
Also used : Instant(org.joda.time.Instant) ArrayList(java.util.ArrayList) ReadableState(org.apache.beam.sdk.state.ReadableState) WatermarkHoldState(org.apache.beam.sdk.state.WatermarkHoldState)
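
The non-window branch of mergeWatermarks above reads every source, clears every source, and only then writes a single merged hold. A minimal sketch of that ordering follows. It is not Beam code: HoldCell is a hypothetical in-memory stand-in for WatermarkHoldState, java.time.Instant replaces the Joda Instant, and "keep the earliest" is an assumed substitute for whatever the TimestampCombiner would compute.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class MergeHoldsSketch {

    /** Hypothetical in-memory stand-in for a watermark hold cell; not a Beam class. */
    static final class HoldCell {
        private Instant value; // null means "no hold set"

        Instant read() {
            return value;
        }

        void add(Instant hold) {
            // Assumed combiner for this sketch: keep the earliest hold seen so far.
            if (value == null || hold.isBefore(value)) {
                value = hold;
            }
        }

        void clear() {
            value = null;
        }
    }

    /** Reads every source, clears them all, then writes one merged hold into result. */
    static void mergeHolds(Collection<HoldCell> sources, HoldCell result) {
        if (sources.isEmpty()) {
            return; // nothing to merge
        }
        // Read first: result may itself be one of the sources, so its value
        // has to be captured before anything is cleared.
        List<Instant> toMerge = new ArrayList<>(sources.size());
        for (HoldCell source : sources) {
            Instant hold = source.read();
            if (hold != null) {
                toMerge.add(hold);
            }
        }
        for (HoldCell source : sources) {
            source.clear();
        }
        if (!toMerge.isEmpty()) {
            result.add(Collections.min(toMerge)); // Instant is Comparable
        }
    }

    public static void main(String[] args) {
        HoldCell a = new HoldCell();
        HoldCell b = new HoldCell();
        HoldCell unset = new HoldCell();
        a.add(Instant.ofEpochMilli(2_000));
        b.add(Instant.ofEpochMilli(1_000));
        mergeHolds(Arrays.asList(a, b, unset), a); // a doubles as the merge result
        System.out.println("result hold: " + a.read());
        System.out.println("source b after merge: " + b.read());
    }
}
```

Reading before clearing matters because the result cell is allowed to appear among the sources; clearing first would lose its hold.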

Example 2 with ReadableState

use of org.apache.beam.sdk.state.ReadableState in project beam by apache.

the class WatermarkHold method extractAndRelease.

/**
 * Return (a future for) the earliest hold for {@code context}. Clear all the holds after
 * reading, but add/restore an end-of-window or garbage collection hold if required.
 *
 * <p>The returned timestamp is the output timestamp according to the {@link TimestampCombiner}
 * from the windowing strategy of this {@link WatermarkHold}, combined across all the non-late
 * elements in the current pane. If there is no such value the timestamp is the end
 * of the window.
 */
public ReadableState<OldAndNewHolds> extractAndRelease(final ReduceFn<?, ?, ?, W>.Context<?, ?, ?, W> context, final boolean isFinished) {
    WindowTracing.debug("WatermarkHold.extractAndRelease: for key:{}; window:{}; inputWatermark:{}; " + "outputWatermark:{}", context.key(), context.window(), timerInternals.currentInputWatermarkTime(), timerInternals.currentOutputWatermarkTime());
    final WatermarkHoldState elementHoldState = context.state().access(elementHoldTag);
    final WatermarkHoldState extraHoldState = context.state().access(EXTRA_HOLD_TAG);
    return new ReadableState<OldAndNewHolds>() {

        @Override
        public ReadableState<OldAndNewHolds> readLater() {
            elementHoldState.readLater();
            extraHoldState.readLater();
            return this;
        }

        @Override
        public OldAndNewHolds read() {
            // Read both the element and extra holds.
            Instant elementHold = elementHoldState.read();
            Instant extraHold = extraHoldState.read();
            Instant oldHold;
            // Find the minimum, accounting for null.
            if (elementHold == null) {
                oldHold = extraHold;
            } else if (extraHold == null) {
                oldHold = elementHold;
            } else if (elementHold.isBefore(extraHold)) {
                oldHold = elementHold;
            } else {
                oldHold = extraHold;
            }
            if (oldHold == null || oldHold.isAfter(context.window().maxTimestamp())) {
                // If no hold (eg because all elements came in behind the output watermark), or
                // the hold was for garbage collection, take the end of window as the result.
                WindowTracing.debug("WatermarkHold.extractAndRelease.read: clipping from {} to end of window " + "for key:{}; window:{}", oldHold, context.key(), context.window());
                oldHold = context.window().maxTimestamp();
            }
            WindowTracing.debug("WatermarkHold.extractAndRelease.read: clearing for key:{}; window:{}", context.key(), context.window());
            // Clear the underlying state to allow the output watermark to progress.
            elementHoldState.clear();
            extraHoldState.clear();
            @Nullable Instant newHold = null;
            if (!isFinished) {
                // Only need to leave behind an end-of-window or garbage collection hold
                // if future elements will be processed.
                newHold = addEndOfWindowOrGarbageCollectionHolds(context, true);
            }
            return new OldAndNewHolds(oldHold, newHold);
        }
    };
}
Also used : Instant(org.joda.time.Instant) ReadableState(org.apache.beam.sdk.state.ReadableState) WatermarkHoldState(org.apache.beam.sdk.state.WatermarkHoldState) Nullable(javax.annotation.Nullable)
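
The read() method above does two small things that are easy to get wrong: it takes the minimum of two holds where either may be null, and it falls back to the end of the window when no usable hold remains. The sketch below isolates just those two steps. The class and method names are hypothetical, and java.time.Instant is used in place of the Joda Instant so the example needs no extra dependency.

```java
import java.time.Instant;

public class EarliestHoldSketch {

    /** Returns the earlier of two possibly-null holds, or null if both are null. */
    static Instant earlier(Instant a, Instant b) {
        if (a == null) {
            return b;
        }
        if (b == null) {
            return a;
        }
        return a.isBefore(b) ? a : b;
    }

    /** Uses the end of the window when there is no hold or the hold lies past the window. */
    static Instant clipToWindow(Instant hold, Instant windowMaxTimestamp) {
        if (hold == null || hold.isAfter(windowMaxTimestamp)) {
            return windowMaxTimestamp;
        }
        return hold;
    }

    public static void main(String[] args) {
        Instant windowMax = Instant.ofEpochMilli(9_999);
        Instant elementHold = Instant.ofEpochMilli(4_000);
        Instant gcHold = Instant.ofEpochMilli(20_000); // past the window, like a garbage collection hold

        // Element hold wins because it is earlier and inside the window.
        System.out.println(clipToWindow(earlier(elementHold, gcHold), windowMax));
        // Only a hold past the window remains, so the result is clipped to the window end.
        System.out.println(clipToWindow(earlier(null, gcHold), windowMax));
        // No holds at all: the window end is used.
        System.out.println(clipToWindow(earlier(null, null), windowMax));
    }
}
```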

Example 3 with ReadableState

use of org.apache.beam.sdk.state.ReadableState in project beam by apache.

the class PaneInfoTracker method getNextPaneInfo.

/**
 * Return a ({@link ReadableState} for) the pane info appropriate for {@code context}. The pane
 * info includes the timing for the pane, whose calculation is quite subtle.
 *
 * @param isFinal should be {@code true} only if the triggering machinery can guarantee no further
 *     firings for the
 */
public ReadableState<PaneInfo> getNextPaneInfo(ReduceFn<?, ?, ?, ?>.Context context, final boolean isFinal) {
    final Object key = context.key();
    final ReadableState<PaneInfo> previousPaneFuture = context.state().access(PaneInfoTracker.PANE_INFO_TAG);
    final Instant windowMaxTimestamp = context.window().maxTimestamp();
    return new ReadableState<PaneInfo>() {

        @Override
        // just prefetch calls to readLater
        @SuppressFBWarnings("RV_RETURN_VALUE_IGNORED_NO_SIDE_EFFECT")
        public ReadableState<PaneInfo> readLater() {
            previousPaneFuture.readLater();
            return this;
        }

        @Override
        public PaneInfo read() {
            PaneInfo previousPane = previousPaneFuture.read();
            return describePane(key, windowMaxTimestamp, previousPane, isFinal);
        }
    };
}
Also used : PaneInfo(org.apache.beam.sdk.transforms.windowing.PaneInfo) Instant(org.joda.time.Instant) ReadableState(org.apache.beam.sdk.state.ReadableState)
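
getNextPaneInfo above returns a derived state: readLater() forwards the prefetch hint to the underlying state, and read() computes the result only when it is finally called. The sketch below shows that wrapper shape in isolation. The Readable interface, Cell class, and derive helper are all hypothetical stand-ins written for this example; they mirror only the two methods used above and are not the Beam API.

```java
import java.util.function.Function;

public class DerivedStateSketch {

    /** Hypothetical stand-in with the two methods used above; not the Beam interface. */
    interface Readable<T> {
        T read();

        Readable<T> readLater();
    }

    /** In-memory cell that counts prefetch hints so the delegation is observable. */
    static final class Cell<T> implements Readable<T> {
        T value;
        int prefetchHints;

        Cell(T value) {
            this.value = value;
        }

        @Override
        public T read() {
            return value;
        }

        @Override
        public Readable<T> readLater() {
            prefetchHints++; // a real backend might start an asynchronous fetch here
            return this;
        }
    }

    /** Wraps source so that fn is applied only when read() is called. */
    static <A, B> Readable<B> derive(final Readable<A> source, final Function<A, B> fn) {
        return new Readable<B>() {
            @Override
            public B read() {
                return fn.apply(source.read());
            }

            @Override
            public Readable<B> readLater() {
                source.readLater(); // forward the hint to the underlying state
                return this;
            }
        };
    }

    public static void main(String[] args) {
        Cell<Integer> previousPaneIndex = new Cell<>(3);
        Readable<Integer> nextPaneIndex = derive(previousPaneIndex, i -> i + 1);
        nextPaneIndex.readLater();
        previousPaneIndex.value = 7; // a change made before read() is still observed
        System.out.println(nextPaneIndex.read());
        System.out.println("prefetch hints forwarded: " + previousPaneIndex.prefetchHints);
    }
}
```

Because nothing is computed until read(), a caller can issue readLater() on several such wrappers first and read them afterwards.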

Example 4 with ReadableState

use of org.apache.beam.sdk.state.ReadableState in project beam by apache.

the class RemoteExecutionTest method testExecutionWithUserStateCaching.

@Test
public void testExecutionWithUserStateCaching() throws Exception {
    Pipeline p = Pipeline.create();
    launchSdkHarness(p.getOptions());
    final String stateId = "foo";
    final String stateId2 = "bar";
    p.apply("impulse", Impulse.create()).apply("create", ParDo.of(new DoFn<byte[], KV<String, String>>() {

        @ProcessElement
        public void process(ProcessContext ctxt) {
        }
    })).setCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())).apply("userState", ParDo.of(new DoFn<KV<String, String>, KV<String, String>>() {

        @StateId(stateId)
        private final StateSpec<BagState<String>> bufferState = StateSpecs.bag(StringUtf8Coder.of());

        @StateId(stateId2)
        private final StateSpec<BagState<String>> bufferState2 = StateSpecs.bag(StringUtf8Coder.of());

        @ProcessElement
        public void processElement(@Element KV<String, String> element, @StateId(stateId) BagState<String> state, @StateId(stateId2) BagState<String> state2, OutputReceiver<KV<String, String>> r) {
            for (String value : state.read()) {
                r.output(KV.of(element.getKey(), value));
            }
            ReadableState<Boolean> isEmpty = state2.isEmpty();
            if (isEmpty.read()) {
                r.output(KV.of(element.getKey(), "Empty"));
            } else {
                state2.clear();
            }
        }
    })).apply("gbk", GroupByKey.create());
    RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p);
    FusedPipeline fused = GreedyPipelineFuser.fuse(pipelineProto);
    Optional<ExecutableStage> optionalStage = Iterables.tryFind(fused.getFusedStages(), (ExecutableStage stage) -> !stage.getUserStates().isEmpty());
    checkState(optionalStage.isPresent(), "Expected a stage with user state.");
    ExecutableStage stage = optionalStage.get();
    ExecutableProcessBundleDescriptor descriptor = ProcessBundleDescriptors.fromExecutableStage("test_stage", stage, dataServer.getApiServiceDescriptor(), stateServer.getApiServiceDescriptor());
    BundleProcessor processor = controlClient.getProcessor(descriptor.getProcessBundleDescriptor(), descriptor.getRemoteInputDestinations(), stateDelegator);
    Map<String, Coder> remoteOutputCoders = descriptor.getRemoteOutputCoders();
    Map<String, Collection<WindowedValue<?>>> outputValues = new HashMap<>();
    Map<String, RemoteOutputReceiver<?>> outputReceivers = new HashMap<>();
    for (Entry<String, Coder> remoteOutputCoder : remoteOutputCoders.entrySet()) {
        List<WindowedValue<?>> outputContents = Collections.synchronizedList(new ArrayList<>());
        outputValues.put(remoteOutputCoder.getKey(), outputContents);
        outputReceivers.put(remoteOutputCoder.getKey(), RemoteOutputReceiver.of((Coder<WindowedValue<?>>) remoteOutputCoder.getValue(), outputContents::add));
    }
    Map<String, List<ByteString>> userStateData = ImmutableMap.of(stateId, new ArrayList(Arrays.asList(ByteString.copyFrom(CoderUtils.encodeToByteArray(StringUtf8Coder.of(), "A", Coder.Context.NESTED)), ByteString.copyFrom(CoderUtils.encodeToByteArray(StringUtf8Coder.of(), "B", Coder.Context.NESTED)), ByteString.copyFrom(CoderUtils.encodeToByteArray(StringUtf8Coder.of(), "C", Coder.Context.NESTED)))), stateId2, new ArrayList(Arrays.asList(ByteString.copyFrom(CoderUtils.encodeToByteArray(StringUtf8Coder.of(), "D", Coder.Context.NESTED)))));
    StoringStateRequestHandler stateRequestHandler = new StoringStateRequestHandler(StateRequestHandlers.forBagUserStateHandlerFactory(descriptor, new BagUserStateHandlerFactory<ByteString, Object, BoundedWindow>() {

        @Override
        public BagUserStateHandler<ByteString, Object, BoundedWindow> forUserState(String pTransformId, String userStateId, Coder<ByteString> keyCoder, Coder<Object> valueCoder, Coder<BoundedWindow> windowCoder) {
            return new BagUserStateHandler<ByteString, Object, BoundedWindow>() {

                @Override
                public Iterable<Object> get(ByteString key, BoundedWindow window) {
                    return (Iterable) userStateData.get(userStateId);
                }

                @Override
                public void append(ByteString key, BoundedWindow window, Iterator<Object> values) {
                    Iterators.addAll(userStateData.get(userStateId), (Iterator) values);
                }

                @Override
                public void clear(ByteString key, BoundedWindow window) {
                    userStateData.get(userStateId).clear();
                }
            };
        }
    }));
    try (RemoteBundle bundle = processor.newBundle(outputReceivers, stateRequestHandler, BundleProgressHandler.ignored())) {
        Iterables.getOnlyElement(bundle.getInputReceivers().values()).accept(valueInGlobalWindow(KV.of("X", "Y")));
    }
    try (RemoteBundle bundle2 = processor.newBundle(outputReceivers, stateRequestHandler, BundleProgressHandler.ignored())) {
        Iterables.getOnlyElement(bundle2.getInputReceivers().values()).accept(valueInGlobalWindow(KV.of("X", "Z")));
    }
    for (Collection<WindowedValue<?>> windowedValues : outputValues.values()) {
        assertThat(windowedValues, containsInAnyOrder(valueInGlobalWindow(KV.of("X", "A")), valueInGlobalWindow(KV.of("X", "B")), valueInGlobalWindow(KV.of("X", "C")), valueInGlobalWindow(KV.of("X", "A")), valueInGlobalWindow(KV.of("X", "B")), valueInGlobalWindow(KV.of("X", "C")), valueInGlobalWindow(KV.of("X", "Empty"))));
    }
    assertThat(userStateData.get(stateId), IsIterableContainingInOrder.contains(ByteString.copyFrom(CoderUtils.encodeToByteArray(StringUtf8Coder.of(), "A", Coder.Context.NESTED)), ByteString.copyFrom(CoderUtils.encodeToByteArray(StringUtf8Coder.of(), "B", Coder.Context.NESTED)), ByteString.copyFrom(CoderUtils.encodeToByteArray(StringUtf8Coder.of(), "C", Coder.Context.NESTED))));
    assertThat(userStateData.get(stateId2), IsEmptyIterable.emptyIterable());
    // 3 Requests expected: state read, state2 read, and state2 clear
    assertEquals(3, stateRequestHandler.getRequestCount());
    ByteString.Output out = ByteString.newOutput();
    StringUtf8Coder.of().encode("X", out);
    assertEquals(stateId, stateRequestHandler.receivedRequests.get(0).getStateKey().getBagUserState().getUserStateId());
    assertEquals(out.toByteString(), stateRequestHandler.receivedRequests.get(0).getStateKey().getBagUserState().getKey());
    assertTrue(stateRequestHandler.receivedRequests.get(0).hasGet());
    assertEquals(stateId2, stateRequestHandler.receivedRequests.get(1).getStateKey().getBagUserState().getUserStateId());
    assertEquals(out.toByteString(), stateRequestHandler.receivedRequests.get(1).getStateKey().getBagUserState().getKey());
    assertTrue(stateRequestHandler.receivedRequests.get(1).hasGet());
    assertEquals(stateId2, stateRequestHandler.receivedRequests.get(2).getStateKey().getBagUserState().getUserStateId());
    assertEquals(out.toByteString(), stateRequestHandler.receivedRequests.get(2).getStateKey().getBagUserState().getKey());
    assertTrue(stateRequestHandler.receivedRequests.get(2).hasClear());
}
Also used : IsEmptyIterable(org.hamcrest.collection.IsEmptyIterable) ConcurrentHashMap(java.util.concurrent.ConcurrentHashMap) HashMap(java.util.HashMap) ByteString(org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString) ArrayList(java.util.ArrayList) RunnerApi(org.apache.beam.model.pipeline.v1.RunnerApi) BundleProcessor(org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor) WindowedValue(org.apache.beam.sdk.util.WindowedValue) Iterator(java.util.Iterator) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) PCollectionList(org.apache.beam.sdk.values.PCollectionList) List(java.util.List) ExecutableStage(org.apache.beam.runners.core.construction.graph.ExecutableStage) BagState(org.apache.beam.sdk.state.BagState) ExecutableProcessBundleDescriptor(org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor) KvCoder(org.apache.beam.sdk.coders.KvCoder) Coder(org.apache.beam.sdk.coders.Coder) StringUtf8Coder(org.apache.beam.sdk.coders.StringUtf8Coder) BigEndianLongCoder(org.apache.beam.sdk.coders.BigEndianLongCoder) BagUserStateHandlerFactory(org.apache.beam.runners.fnexecution.state.StateRequestHandlers.BagUserStateHandlerFactory) BagUserStateHandler(org.apache.beam.runners.fnexecution.state.StateRequestHandlers.BagUserStateHandler) FusedPipeline(org.apache.beam.runners.core.construction.graph.FusedPipeline) ReadableState(org.apache.beam.sdk.state.ReadableState) KV(org.apache.beam.sdk.values.KV) Pipeline(org.apache.beam.sdk.Pipeline) Collection(java.util.Collection) PCollection(org.apache.beam.sdk.values.PCollection) Test(org.junit.Test)
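
The expected output in the test above (A, B, C twice, then a single "Empty") follows from the per-element logic in the "userState" DoFn being run once per bundle. The sketch below replays that logic with plain lists as hypothetical stand-ins for the two bag states; it models none of the runner, caching, or state-request machinery, and the "key=value" strings are just an assumed way to print the KV pairs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UserStateFlowSketch {

    /** Replays the per-element logic above; the lists stand in for the two bag states. */
    static void processElement(String key, List<String> state, List<String> state2, List<String> out) {
        // Emit every value buffered in the first bag.
        for (String value : state) {
            out.add(key + "=" + value);
        }
        // Emit "Empty" only when the second bag has nothing in it; otherwise clear it.
        if (state2.isEmpty()) {
            out.add(key + "=Empty");
        } else {
            state2.clear();
        }
    }

    public static void main(String[] args) {
        List<String> state = new ArrayList<>(Arrays.asList("A", "B", "C"));
        List<String> state2 = new ArrayList<>(Arrays.asList("D"));
        List<String> out = new ArrayList<>();

        processElement("X", state, state2, out); // first bundle: state2 holds "D", so it is cleared
        processElement("X", state, state2, out); // second bundle: state2 is now empty

        System.out.println(out);
    }
}
```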

Example 5 with ReadableState

use of org.apache.beam.sdk.state.ReadableState in project beam by apache.

the class PaneInfoTracker method getNextPaneInfo.

/**
 * Return a ({@link ReadableState} for) the pane info appropriate for {@code context}. The pane
 * info includes the timing for the pane, whose calculation is quite subtle.
 *
 * @param isFinal should be {@code true} only if the triggering machinery can guarantee
 *     no further firings for the
 */
public ReadableState<PaneInfo> getNextPaneInfo(ReduceFn<?, ?, ?, ?>.Context<?, ?, ?, ?> context, final boolean isFinal) {
    final Object key = context.key();
    final ReadableState<PaneInfo> previousPaneFuture = context.state().access(PaneInfoTracker.PANE_INFO_TAG);
    final Instant windowMaxTimestamp = context.window().maxTimestamp();
    return new ReadableState<PaneInfo>() {

        @Override
        public ReadableState<PaneInfo> readLater() {
            previousPaneFuture.readLater();
            return this;
        }

        @Override
        public PaneInfo read() {
            PaneInfo previousPane = previousPaneFuture.read();
            return describePane(key, windowMaxTimestamp, previousPane, isFinal);
        }
    };
}
Also used : PaneInfo(org.apache.beam.sdk.transforms.windowing.PaneInfo) Instant(org.joda.time.Instant) ReadableState(org.apache.beam.sdk.state.ReadableState)

Aggregations

ReadableState (org.apache.beam.sdk.state.ReadableState) 6
Instant (org.joda.time.Instant) 5
WatermarkHoldState (org.apache.beam.sdk.state.WatermarkHoldState) 3
ArrayList (java.util.ArrayList) 2
PaneInfo (org.apache.beam.sdk.transforms.windowing.PaneInfo) 2
Collection (java.util.Collection) 1
HashMap (java.util.HashMap) 1
Iterator (java.util.Iterator) 1
List (java.util.List) 1
ConcurrentHashMap (java.util.concurrent.ConcurrentHashMap) 1
Nullable (javax.annotation.Nullable) 1
RunnerApi (org.apache.beam.model.pipeline.v1.RunnerApi) 1
ExecutableStage (org.apache.beam.runners.core.construction.graph.ExecutableStage) 1
FusedPipeline (org.apache.beam.runners.core.construction.graph.FusedPipeline) 1
ExecutableProcessBundleDescriptor (org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor) 1
BundleProcessor (org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor) 1
BagUserStateHandler (org.apache.beam.runners.fnexecution.state.StateRequestHandlers.BagUserStateHandler) 1
BagUserStateHandlerFactory (org.apache.beam.runners.fnexecution.state.StateRequestHandlers.BagUserStateHandlerFactory) 1
Pipeline (org.apache.beam.sdk.Pipeline) 1
BigEndianLongCoder (org.apache.beam.sdk.coders.BigEndianLongCoder) 1