Search in sources :

Example 11 with KeyedWorkItem

use of org.apache.beam.runners.core.KeyedWorkItem in project beam by apache.

the class KeyedPValueTrackingVisitorTest method keyedInputWithKeyPreserving.

@Test
public void keyedInputWithKeyPreserving() {
    PCollection<KV<String, WindowedValue<KV<String, Integer>>>> input = p.apply(Create.of(KV.of("hello", WindowedValue.of(KV.of("hello", 3), new Instant(0), new IntervalWindow(new Instant(0), new Instant(9)), PaneInfo.NO_FIRING))).withCoder(KvCoder.of(StringUtf8Coder.of(), WindowedValue.getValueOnlyCoder(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of())))));
    TupleTag<KeyedWorkItem<String, KV<String, Integer>>> keyedTag = new TupleTag<>();
    PCollection<KeyedWorkItem<String, KV<String, Integer>>> keyed = input.apply(new DirectGroupByKeyOnly<>()).apply(new DirectGroupAlsoByWindow<>(WindowingStrategy.globalDefault(), WindowingStrategy.globalDefault())).apply(ParDo.of(new ParDoMultiOverrideFactory.ToKeyedWorkItem<String, Integer>()).withOutputTags(keyedTag, TupleTagList.empty())).get(keyedTag).setCoder(KeyedWorkItemCoder.of(StringUtf8Coder.of(), KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of()), GlobalWindow.Coder.INSTANCE));
    p.traverseTopologically(visitor);
    assertThat(visitor.getKeyedPValues(), hasItem(keyed));
}
Also used : Instant(org.joda.time.Instant) TupleTag(org.apache.beam.sdk.values.TupleTag) KV(org.apache.beam.sdk.values.KV) KeyedWorkItem(org.apache.beam.runners.core.KeyedWorkItem) DirectGroupAlsoByWindow(org.apache.beam.runners.direct.DirectGroupByKey.DirectGroupAlsoByWindow) IntervalWindow(org.apache.beam.sdk.transforms.windowing.IntervalWindow) Test(org.junit.Test)

Example 12 with KeyedWorkItem

use of org.apache.beam.runners.core.KeyedWorkItem in project beam by apache.

the class SplittableProcessElementsEvaluatorFactory method createEvaluator.

@SuppressWarnings({ "unchecked", "rawtypes" })
private TransformEvaluator<KeyedWorkItem<String, ElementAndRestriction<InputT, RestrictionT>>> createEvaluator(AppliedPTransform<PCollection<KeyedWorkItem<String, ElementAndRestriction<InputT, RestrictionT>>>, PCollectionTuple, ProcessElements<InputT, OutputT, RestrictionT, TrackerT>> application, CommittedBundle<InputT> inputBundle) throws Exception {
    final ProcessElements<InputT, OutputT, RestrictionT, TrackerT> transform = application.getTransform();
    ProcessFn<InputT, OutputT, RestrictionT, TrackerT> processFn = transform.newProcessFn(transform.getFn());
    DoFnLifecycleManager fnManager = DoFnLifecycleManager.of(processFn);
    processFn = ((ProcessFn<InputT, OutputT, RestrictionT, TrackerT>) fnManager.<KeyedWorkItem<String, ElementAndRestriction<InputT, RestrictionT>>, OutputT>get());
    String stepName = evaluationContext.getStepName(application);
    final DirectExecutionContext.DirectStepContext stepContext = evaluationContext.getExecutionContext(application, inputBundle.getKey()).getStepContext(stepName);
    final ParDoEvaluator<KeyedWorkItem<String, ElementAndRestriction<InputT, RestrictionT>>> parDoEvaluator = delegateFactory.createParDoEvaluator(application, inputBundle.getKey(), transform.getSideInputs(), transform.getMainOutputTag(), transform.getAdditionalOutputTags().getAll(), stepContext, processFn, fnManager);
    processFn.setStateInternalsFactory(new StateInternalsFactory<String>() {

        @SuppressWarnings({ "unchecked", "rawtypes" })
        @Override
        public StateInternals stateInternalsForKey(String key) {
            return (StateInternals) stepContext.stateInternals();
        }
    });
    processFn.setTimerInternalsFactory(new TimerInternalsFactory<String>() {

        @Override
        public TimerInternals timerInternalsForKey(String key) {
            return stepContext.timerInternals();
        }
    });
    OutputWindowedValue<OutputT> outputWindowedValue = new OutputWindowedValue<OutputT>() {

        private final OutputManager outputManager = parDoEvaluator.getOutputManager();

        @Override
        public void outputWindowedValue(OutputT output, Instant timestamp, Collection<? extends BoundedWindow> windows, PaneInfo pane) {
            outputManager.output(transform.getMainOutputTag(), WindowedValue.of(output, timestamp, windows, pane));
        }

        @Override
        public <AdditionalOutputT> void outputWindowedValue(TupleTag<AdditionalOutputT> tag, AdditionalOutputT output, Instant timestamp, Collection<? extends BoundedWindow> windows, PaneInfo pane) {
            outputManager.output(tag, WindowedValue.of(output, timestamp, windows, pane));
        }
    };
    processFn.setProcessElementInvoker(new OutputAndTimeBoundedSplittableProcessElementInvoker<InputT, OutputT, RestrictionT, TrackerT>(transform.getFn(), evaluationContext.getPipelineOptions(), outputWindowedValue, evaluationContext.createSideInputReader(transform.getSideInputs()), // DirectRunner.
    Executors.newSingleThreadScheduledExecutor(new ThreadFactoryBuilder().setThreadFactory(MoreExecutors.platformThreadFactory()).setDaemon(true).setNameFormat("direct-splittable-process-element-checkpoint-executor").build()), 10000, Duration.standardSeconds(10)));
    return DoFnLifecycleManagerRemovingTransformEvaluator.wrapping(parDoEvaluator, fnManager);
}
Also used : ProcessFn(org.apache.beam.runners.core.SplittableParDoViaKeyedWorkItems.ProcessFn) TupleTag(org.apache.beam.sdk.values.TupleTag) PaneInfo(org.apache.beam.sdk.transforms.windowing.PaneInfo) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) ThreadFactoryBuilder(com.google.common.util.concurrent.ThreadFactoryBuilder) ElementAndRestriction(org.apache.beam.runners.core.construction.ElementAndRestriction) OutputWindowedValue(org.apache.beam.runners.core.OutputWindowedValue) Instant(org.joda.time.Instant) KeyedWorkItem(org.apache.beam.runners.core.KeyedWorkItem) TimerInternals(org.apache.beam.runners.core.TimerInternals) StateInternals(org.apache.beam.runners.core.StateInternals) Collection(java.util.Collection) PCollection(org.apache.beam.sdk.values.PCollection) OutputManager(org.apache.beam.runners.core.DoFnRunners.OutputManager)

Example 13 with KeyedWorkItem

use of org.apache.beam.runners.core.KeyedWorkItem in project beam by apache.

the class ApexParDoOperator method processElementInReadyWindows.

private Iterable<WindowedValue<InputT>> processElementInReadyWindows(WindowedValue<InputT> elem) {
    try {
        pushbackDoFnRunner.startBundle();
        if (currentKeyStateInternals != null) {
            InputT value = elem.getValue();
            final Object key;
            final Coder<Object> keyCoder;
            @SuppressWarnings({ "rawtypes", "unchecked" }) WindowedValueCoder<InputT> wvCoder = (WindowedValueCoder) inputCoder;
            if (value instanceof KeyedWorkItem) {
                key = ((KeyedWorkItem) value).key();
                @SuppressWarnings({ "rawtypes", "unchecked" }) KeyedWorkItemCoder<Object, ?> kwiCoder = (KeyedWorkItemCoder) wvCoder.getValueCoder();
                keyCoder = kwiCoder.getKeyCoder();
            } else {
                key = ((KV) value).getKey();
                @SuppressWarnings({ "rawtypes", "unchecked" }) KvCoder<Object, ?> kwiCoder = (KvCoder) wvCoder.getValueCoder();
                keyCoder = kwiCoder.getKeyCoder();
            }
            ((StateInternalsProxy) currentKeyStateInternals).setKey(key);
            currentKeyTimerInternals.setContext(key, keyCoder, new Instant(this.currentInputWatermark), new Instant(this.currentOutputWatermark));
        }
        Iterable<WindowedValue<InputT>> pushedBack = pushbackDoFnRunner.processElementInReadyWindows(elem);
        pushbackDoFnRunner.finishBundle();
        return pushedBack;
    } catch (UserCodeException ue) {
        if (ue.getCause() instanceof AssertionError) {
            ApexRunner.ASSERTION_ERROR.set((AssertionError) ue.getCause());
        }
        throw ue;
    }
}
Also used : StateInternalsProxy(org.apache.beam.runners.apex.translation.utils.StateInternalsProxy) KeyedWorkItemCoder(org.apache.beam.runners.core.KeyedWorkItemCoder) Instant(org.joda.time.Instant) KvCoder(org.apache.beam.sdk.coders.KvCoder) KeyedWorkItem(org.apache.beam.runners.core.KeyedWorkItem) WindowedValueCoder(org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder) UserCodeException(org.apache.beam.sdk.util.UserCodeException) WindowedValue(org.apache.beam.sdk.util.WindowedValue) OutputWindowedValue(org.apache.beam.runners.core.OutputWindowedValue)

Example 14 with KeyedWorkItem

use of org.apache.beam.runners.core.KeyedWorkItem in project beam by apache.

the class SplittableProcessElementsEvaluatorFactory method createEvaluator.

@SuppressWarnings({ "unchecked", "rawtypes" })
private TransformEvaluator<KeyedWorkItem<byte[], KV<InputT, RestrictionT>>> createEvaluator(AppliedPTransform<PCollection<KeyedWorkItem<byte[], KV<InputT, RestrictionT>>>, PCollectionTuple, ProcessElements<InputT, OutputT, RestrictionT, PositionT, WatermarkEstimatorStateT>> application, CommittedBundle<InputT> inputBundle) throws Exception {
    final ProcessElements<InputT, OutputT, RestrictionT, PositionT, WatermarkEstimatorStateT> transform = application.getTransform();
    final DoFnLifecycleManagerRemovingTransformEvaluator<KeyedWorkItem<byte[], KV<InputT, RestrictionT>>> evaluator = delegateFactory.createEvaluator((AppliedPTransform) application, (PCollection<KeyedWorkItem<byte[], KV<InputT, RestrictionT>>>) inputBundle.getPCollection(), inputBundle.getKey(), application.getTransform().getSideInputs(), application.getTransform().getMainOutputTag(), application.getTransform().getAdditionalOutputTags().getAll(), DoFnSchemaInformation.create(), application.getTransform().getSideInputMapping());
    final ParDoEvaluator<KeyedWorkItem<byte[], KV<InputT, RestrictionT>>> pde = evaluator.getParDoEvaluator();
    final ProcessFn<InputT, OutputT, RestrictionT, PositionT, WatermarkEstimatorStateT> processFn = (ProcessFn<InputT, OutputT, RestrictionT, PositionT, WatermarkEstimatorStateT>) ProcessFnRunner.class.cast(pde.getFnRunner()).getFn();
    final DirectExecutionContext.DirectStepContext stepContext = pde.getStepContext();
    processFn.setStateInternalsFactory(key -> stepContext.stateInternals());
    processFn.setTimerInternalsFactory(key -> stepContext.timerInternals());
    OutputWindowedValue<OutputT> outputWindowedValue = new OutputWindowedValue<OutputT>() {

        private final OutputManager outputManager = pde.getOutputManager();

        @Override
        public void outputWindowedValue(OutputT output, Instant timestamp, Collection<? extends BoundedWindow> windows, PaneInfo pane) {
            outputManager.output(transform.getMainOutputTag(), WindowedValue.of(output, timestamp, windows, pane));
        }

        @Override
        public <AdditionalOutputT> void outputWindowedValue(TupleTag<AdditionalOutputT> tag, AdditionalOutputT output, Instant timestamp, Collection<? extends BoundedWindow> windows, PaneInfo pane) {
            outputManager.output(tag, WindowedValue.of(output, timestamp, windows, pane));
        }
    };
    SideInputReader sideInputReader = evaluationContext.createSideInputReader(transform.getSideInputs());
    processFn.setSideInputReader(sideInputReader);
    processFn.setProcessElementInvoker(new OutputAndTimeBoundedSplittableProcessElementInvoker<>(transform.getFn(), options, outputWindowedValue, sideInputReader, ses, // splittable DoFn's in that respect.
    100, Duration.standardSeconds(1), stepContext::bundleFinalizer));
    return evaluator;
}
Also used : OutputWindowedValue(org.apache.beam.runners.core.OutputWindowedValue) ProcessFn(org.apache.beam.runners.core.SplittableParDoViaKeyedWorkItems.ProcessFn) Instant(org.joda.time.Instant) TupleTag(org.apache.beam.sdk.values.TupleTag) SideInputReader(org.apache.beam.runners.core.SideInputReader) KeyedWorkItem(org.apache.beam.runners.core.KeyedWorkItem) PaneInfo(org.apache.beam.sdk.transforms.windowing.PaneInfo) Collection(java.util.Collection) PCollection(org.apache.beam.sdk.values.PCollection) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) OutputManager(org.apache.beam.runners.core.DoFnRunners.OutputManager)

Example 15 with KeyedWorkItem

use of org.apache.beam.runners.core.KeyedWorkItem in project beam by apache.

the class FlinkStreamingPortablePipelineTranslator method addGBK.

private <K, V> SingleOutputStreamOperator<WindowedValue<KV<K, Iterable<V>>>> addGBK(DataStream<WindowedValue<KV<K, V>>> inputDataStream, WindowingStrategy<?, ?> windowingStrategy, WindowedValueCoder<KV<K, V>> windowedInputCoder, String operatorName, StreamingTranslationContext context) {
    KvCoder<K, V> inputElementCoder = (KvCoder<K, V>) windowedInputCoder.getValueCoder();
    SingletonKeyedWorkItemCoder<K, V> workItemCoder = SingletonKeyedWorkItemCoder.of(inputElementCoder.getKeyCoder(), inputElementCoder.getValueCoder(), windowingStrategy.getWindowFn().windowCoder());
    WindowedValue.FullWindowedValueCoder<KeyedWorkItem<K, V>> windowedWorkItemCoder = WindowedValue.getFullCoder(workItemCoder, windowingStrategy.getWindowFn().windowCoder());
    CoderTypeInformation<WindowedValue<KeyedWorkItem<K, V>>> workItemTypeInfo = new CoderTypeInformation<>(windowedWorkItemCoder, context.getPipelineOptions());
    DataStream<WindowedValue<KeyedWorkItem<K, V>>> workItemStream = inputDataStream.flatMap(new FlinkStreamingTransformTranslators.ToKeyedWorkItem<>(context.getPipelineOptions())).returns(workItemTypeInfo).name("ToKeyedWorkItem");
    WorkItemKeySelector<K, V> keySelector = new WorkItemKeySelector<>(inputElementCoder.getKeyCoder(), new SerializablePipelineOptions(context.getPipelineOptions()));
    KeyedStream<WindowedValue<KeyedWorkItem<K, V>>, ByteBuffer> keyedWorkItemStream = workItemStream.keyBy(keySelector);
    SystemReduceFn<K, V, Iterable<V>, Iterable<V>, BoundedWindow> reduceFn = SystemReduceFn.buffering(inputElementCoder.getValueCoder());
    Coder<Iterable<V>> accumulatorCoder = IterableCoder.of(inputElementCoder.getValueCoder());
    Coder<WindowedValue<KV<K, Iterable<V>>>> outputCoder = WindowedValue.getFullCoder(KvCoder.of(inputElementCoder.getKeyCoder(), accumulatorCoder), windowingStrategy.getWindowFn().windowCoder());
    TypeInformation<WindowedValue<KV<K, Iterable<V>>>> outputTypeInfo = new CoderTypeInformation<>(outputCoder, context.getPipelineOptions());
    TupleTag<KV<K, Iterable<V>>> mainTag = new TupleTag<>("main output");
    WindowDoFnOperator<K, V, Iterable<V>> doFnOperator = new WindowDoFnOperator<>(reduceFn, operatorName, windowedWorkItemCoder, mainTag, Collections.emptyList(), new DoFnOperator.MultiOutputOutputManagerFactory<>(mainTag, outputCoder, new SerializablePipelineOptions(context.getPipelineOptions())), windowingStrategy, new HashMap<>(), /* side-input mapping */
    Collections.emptyList(), /* side inputs */
    context.getPipelineOptions(), inputElementCoder.getKeyCoder(), keySelector);
    return keyedWorkItemStream.transform(operatorName, outputTypeInfo, doFnOperator);
}
Also used : WorkItemKeySelector(org.apache.beam.runners.flink.translation.wrappers.streaming.WorkItemKeySelector) TupleTag(org.apache.beam.sdk.values.TupleTag) WindowDoFnOperator(org.apache.beam.runners.flink.translation.wrappers.streaming.WindowDoFnOperator) DoFnOperator(org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator) ExecutableStageDoFnOperator(org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator) KV(org.apache.beam.sdk.values.KV) WindowedValue(org.apache.beam.sdk.util.WindowedValue) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) SerializablePipelineOptions(org.apache.beam.runners.core.construction.SerializablePipelineOptions) CoderTypeInformation(org.apache.beam.runners.flink.translation.types.CoderTypeInformation) WindowDoFnOperator(org.apache.beam.runners.flink.translation.wrappers.streaming.WindowDoFnOperator) KvCoder(org.apache.beam.sdk.coders.KvCoder) KV(org.apache.beam.sdk.values.KV) KeyedWorkItem(org.apache.beam.runners.core.KeyedWorkItem) ByteBuffer(java.nio.ByteBuffer)

Aggregations

KeyedWorkItem (org.apache.beam.runners.core.KeyedWorkItem)20 TupleTag (org.apache.beam.sdk.values.TupleTag)16 KV (org.apache.beam.sdk.values.KV)15 Instant (org.joda.time.Instant)14 Test (org.junit.Test)13 WindowedValue (org.apache.beam.sdk.util.WindowedValue)11 ListOutputManager (org.apache.beam.runners.dataflow.worker.util.ListOutputManager)9 ByteString (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString)9 InputMessageBundle (org.apache.beam.runners.dataflow.worker.windmill.Windmill.InputMessageBundle)6 WorkItem (org.apache.beam.runners.dataflow.worker.windmill.Windmill.WorkItem)6 BoundedWindow (org.apache.beam.sdk.transforms.windowing.BoundedWindow)4 IntervalWindow (org.apache.beam.sdk.transforms.windowing.IntervalWindow)4 PCollection (org.apache.beam.sdk.values.PCollection)4 OutputManager (org.apache.beam.runners.core.DoFnRunners.OutputManager)3 OutputWindowedValue (org.apache.beam.runners.core.OutputWindowedValue)3 KvCoder (org.apache.beam.sdk.coders.KvCoder)3 PaneInfo (org.apache.beam.sdk.transforms.windowing.PaneInfo)3 Matchers.emptyIterable (org.hamcrest.Matchers.emptyIterable)3 Collection (java.util.Collection)2 List (java.util.List)2