Search in sources:

Example 1 with InMemoryTimerInternals

Use of org.apache.beam.runners.core.InMemoryTimerInternals in project beam by apache.

From the class FlinkStatefulDoFnFunction, method reduce().

@Override
public void reduce(Iterable<WindowedValue<KV<K, V>>> values, Collector<WindowedValue<OutputT>> out) throws Exception {
    RuntimeContext runtimeContext = getRuntimeContext();
    DoFnRunners.OutputManager outputManager;
    if (outputMap.size() == 1) {
        outputManager = new FlinkDoFnFunction.DoFnOutputManager(out);
    } else {
        // there are additional outputs: route them through the multi-output manager
        outputManager = new FlinkDoFnFunction.MultiDoFnOutputManager((Collector) out, outputMap);
    }
    final Iterator<WindowedValue<KV<K, V>>> iterator = values.iterator();
    // get the first value, we need this for initializing the state internals with the key.
    // we are guaranteed to have a first value, otherwise reduce() would not have been called.
    WindowedValue<KV<K, V>> currentValue = iterator.next();
    final K key = currentValue.getValue().getKey();
    final InMemoryStateInternals<K> stateInternals = InMemoryStateInternals.forKey(key);
    // Used with Batch, we know that all the data is available for this key. We can't use the
    // timer manager from the context because it doesn't exist. So we create one and advance
    // time to the end after processing all elements.
    final InMemoryTimerInternals timerInternals = new InMemoryTimerInternals();
    timerInternals.advanceProcessingTime(Instant.now());
    timerInternals.advanceSynchronizedProcessingTime(Instant.now());
    List<TupleTag<?>> additionalOutputTags = Lists.newArrayList(outputMap.keySet());
    DoFnRunner<KV<K, V>, OutputT> doFnRunner =
        DoFnRunners.simpleRunner(
            serializedOptions.getPipelineOptions(),
            dofn,
            new FlinkSideInputReader(sideInputs, runtimeContext),
            outputManager,
            mainOutputTag,
            additionalOutputTags,
            new FlinkNoOpStepContext() {

                @Override
                public StateInternals stateInternals() {
                    return stateInternals;
                }

                @Override
                public TimerInternals timerInternals() {
                    return timerInternals;
                }
            },
            windowingStrategy);
    if ((serializedOptions.getPipelineOptions().as(FlinkPipelineOptions.class)).getEnableMetrics()) {
        doFnRunner = new DoFnRunnerWithMetricsUpdate<>(stepName, doFnRunner, getRuntimeContext());
    }
    doFnRunner.startBundle();
    doFnRunner.processElement(currentValue);
    while (iterator.hasNext()) {
        currentValue = iterator.next();
        doFnRunner.processElement(currentValue);
    }
    // Finish any pending windows by advancing the input watermark to infinity.
    timerInternals.advanceInputWatermark(BoundedWindow.TIMESTAMP_MAX_VALUE);
    // Finally, advance the processing time to infinity to fire any timers.
    timerInternals.advanceProcessingTime(BoundedWindow.TIMESTAMP_MAX_VALUE);
    timerInternals.advanceSynchronizedProcessingTime(BoundedWindow.TIMESTAMP_MAX_VALUE);
    fireEligibleTimers(timerInternals, doFnRunner);
    doFnRunner.finishBundle();
}
Also used: DoFnRunners (org.apache.beam.runners.core.DoFnRunners), TupleTag (org.apache.beam.sdk.values.TupleTag), WindowedValue (org.apache.beam.sdk.util.WindowedValue), KV (org.apache.beam.sdk.values.KV), Collector (org.apache.flink.util.Collector), InMemoryTimerInternals (org.apache.beam.runners.core.InMemoryTimerInternals), TimerInternals (org.apache.beam.runners.core.TimerInternals), InMemoryStateInternals (org.apache.beam.runners.core.InMemoryStateInternals), StateInternals (org.apache.beam.runners.core.StateInternals), RuntimeContext (org.apache.flink.api.common.functions.RuntimeContext)
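The fireEligibleTimers(timerInternals, doFnRunner) call is defined elsewhere in FlinkStatefulDoFnFunction and is not shown in this snippet. A minimal sketch of such a drain loop, assuming a hypothetical fireTimer helper (not part of the snippet) that rebuilds the timer's window from its StateNamespace and forwards it to the runner, could look like this:

private void fireEligibleTimers(
        InMemoryTimerInternals timerInternals, DoFnRunner<KV<K, V>, OutputT> runner)
        throws Exception {
    boolean hasFired;
    do {
        hasFired = false;
        TimerInternals.TimerData timer;
        // Event-time timers became eligible when the input watermark was advanced.
        while ((timer = timerInternals.removeNextEventTimer()) != null) {
            hasFired = true;
            fireTimer(timer, runner); // hypothetical helper, see note above
        }
        // Processing-time timers became eligible when processing time was advanced.
        while ((timer = timerInternals.removeNextProcessingTimer()) != null) {
            hasFired = true;
            fireTimer(timer, runner);
        }
        // Synchronized processing-time timers, advanced alongside processing time.
        while ((timer = timerInternals.removeNextSynchronizedProcessingTimer()) != null) {
            hasFired = true;
            fireTimer(timer, runner);
        }
        // Firing a timer may set further timers, so loop until a full pass fires nothing.
    } while (hasFired);
}

The outer loop matters because an onTimer callback can itself set new timers that are already past the advanced watermark.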

Example 2 with InMemoryTimerInternals

Use of org.apache.beam.runners.core.InMemoryTimerInternals in project beam by apache.

From the class SparkGroupAlsoByWindowViaOutputBufferFn, method call().

@Override
public Iterable<WindowedValue<KV<K, Iterable<InputT>>>> call(WindowedValue<KV<K, Iterable<WindowedValue<InputT>>>> windowedValue) throws Exception {
    K key = windowedValue.getValue().getKey();
    Iterable<WindowedValue<InputT>> values = windowedValue.getValue().getValue();
    //------ based on GroupAlsoByWindowsViaOutputBufferDoFn ------//
    // Used with Batch, we know that all the data is available for this key. We can't use the
    // timer manager from the context because it doesn't exist. So we create one and emulate the
    // watermark, knowing that we have all data and it is in timestamp order.
    InMemoryTimerInternals timerInternals = new InMemoryTimerInternals();
    timerInternals.advanceProcessingTime(Instant.now());
    timerInternals.advanceSynchronizedProcessingTime(Instant.now());
    StateInternals stateInternals = stateInternalsFactory.stateInternalsForKey(key);
    GABWOutputWindowedValue<K, InputT> outputter = new GABWOutputWindowedValue<>();
    ReduceFnRunner<K, InputT, Iterable<InputT>, W> reduceFnRunner =
        new ReduceFnRunner<>(
            key,
            windowingStrategy,
            ExecutableTriggerStateMachine.create(
                TriggerStateMachines.stateMachineForTrigger(
                    TriggerTranslation.toProto(windowingStrategy.getTrigger()))),
            stateInternals,
            timerInternals,
            outputter,
            new UnsupportedSideInputReader("GroupAlsoByWindow"),
            reduceFn,
            runtimeContext.getPipelineOptions());
    // Process the grouped values.
    reduceFnRunner.processElements(values);
    // Finish any pending windows by advancing the input watermark to infinity.
    timerInternals.advanceInputWatermark(BoundedWindow.TIMESTAMP_MAX_VALUE);
    // Finally, advance the processing time to infinity to fire any timers.
    timerInternals.advanceProcessingTime(BoundedWindow.TIMESTAMP_MAX_VALUE);
    timerInternals.advanceSynchronizedProcessingTime(BoundedWindow.TIMESTAMP_MAX_VALUE);
    fireEligibleTimers(timerInternals, reduceFnRunner);
    reduceFnRunner.persist();
    return outputter.getOutputs();
}
Also used: ReduceFnRunner (org.apache.beam.runners.core.ReduceFnRunner), InMemoryTimerInternals (org.apache.beam.runners.core.InMemoryTimerInternals), WindowedValue (org.apache.beam.sdk.util.WindowedValue), OutputWindowedValue (org.apache.beam.runners.core.OutputWindowedValue), UnsupportedSideInputReader (org.apache.beam.runners.core.UnsupportedSideInputReader), StateInternals (org.apache.beam.runners.core.StateInternals)
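As in the previous example, fireEligibleTimers is defined elsewhere in the class. A minimal sketch of the ReduceFnRunner variant, assuming the collected TimerData batch is handed to ReduceFnRunner.onTimers, could look like this:

private void fireEligibleTimers(
        InMemoryTimerInternals timerInternals,
        ReduceFnRunner<K, InputT, Iterable<InputT>, W> reduceFnRunner)
        throws Exception {
    List<TimerInternals.TimerData> timers = new ArrayList<>(); // java.util
    do {
        timers.clear();
        TimerInternals.TimerData timer;
        // Collect every timer that became eligible after time was advanced to the end.
        while ((timer = timerInternals.removeNextEventTimer()) != null) {
            timers.add(timer);
        }
        while ((timer = timerInternals.removeNextProcessingTimer()) != null) {
            timers.add(timer);
        }
        while ((timer = timerInternals.removeNextSynchronizedProcessingTimer()) != null) {
            timers.add(timer);
        }
        if (!timers.isEmpty()) {
            // Firing timers may schedule new ones, hence the outer loop.
            reduceFnRunner.onTimers(timers);
        }
    } while (!timers.isEmpty());
}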

Example 3 with InMemoryTimerInternals

Use of org.apache.beam.runners.core.InMemoryTimerInternals in project beam by apache.

From the class MultiDoFnFunction, method call().

@Override
public Iterable<Tuple2<TupleTag<?>, WindowedValue<?>>> call(Iterator<WindowedValue<InputT>> iter) throws Exception {
    DoFnOutputManager outputManager = new DoFnOutputManager();
    final InMemoryTimerInternals timerInternals;
    final StepContext context;
    // Currently, stateful ParDo is only supported in batch mode.
    if (stateful) {
        Object key = null;
        if (iter.hasNext()) {
            WindowedValue<InputT> currentValue = iter.next();
            key = ((KV) currentValue.getValue()).getKey();
            iter = Iterators.concat(Iterators.singletonIterator(currentValue), iter);
        }
        final InMemoryStateInternals<?> stateInternals = InMemoryStateInternals.forKey(key);
        timerInternals = new InMemoryTimerInternals();
        context = new StepContext() {

            @Override
            public StateInternals stateInternals() {
                return stateInternals;
            }

            @Override
            public TimerInternals timerInternals() {
                return timerInternals;
            }
        };
    } else {
        timerInternals = null;
        context = new SparkProcessContext.NoOpStepContext();
    }
    final DoFnRunner<InputT, OutputT> doFnRunner =
        DoFnRunners.simpleRunner(
            runtimeContext.getPipelineOptions(),
            doFn,
            new SparkSideInputReader(sideInputs),
            outputManager,
            mainOutputTag,
            additionalOutputTags,
            context,
            windowingStrategy);
    DoFnRunnerWithMetrics<InputT, OutputT> doFnRunnerWithMetrics =
        new DoFnRunnerWithMetrics<>(stepName, doFnRunner, metricsAccum);
    return new SparkProcessContext<>(
            doFn,
            doFnRunnerWithMetrics,
            outputManager,
            stateful
                ? new TimerDataIterator(timerInternals)
                : Collections.<TimerInternals.TimerData>emptyIterator())
        .processPartition(iter);
}
Also used: StepContext (org.apache.beam.runners.core.StepContext), InMemoryTimerInternals (org.apache.beam.runners.core.InMemoryTimerInternals), TimerInternals (org.apache.beam.runners.core.TimerInternals), InMemoryStateInternals (org.apache.beam.runners.core.InMemoryStateInternals), StateInternals (org.apache.beam.runners.core.StateInternals), SparkSideInputReader (org.apache.beam.runners.spark.util.SparkSideInputReader)
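Unlike the previous two examples, call() does not advance the watermark itself; that is deferred to the TimerDataIterator handed to SparkProcessContext, which runs only after every element of the partition has been processed. A minimal sketch of such an iterator, written here as an assumption rather than copied from MultiDoFnFunction, could look like this:

private static class TimerDataIterator implements Iterator<TimerInternals.TimerData> {

    private final InMemoryTimerInternals timerInternals;
    private boolean advanced;
    private TimerInternals.TimerData next;

    TimerDataIterator(InMemoryTimerInternals timerInternals) {
        this.timerInternals = timerInternals;
    }

    @Override
    public boolean hasNext() {
        if (next == null) {
            if (!advanced) {
                try {
                    // All input has been consumed: finish pending windows and
                    // make every remaining timer eligible to fire.
                    timerInternals.advanceInputWatermark(BoundedWindow.TIMESTAMP_MAX_VALUE);
                    timerInternals.advanceProcessingTime(BoundedWindow.TIMESTAMP_MAX_VALUE);
                    timerInternals.advanceSynchronizedProcessingTime(BoundedWindow.TIMESTAMP_MAX_VALUE);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
                advanced = true;
            }
            // Hand out eligible timers one at a time, event time first.
            next = timerInternals.removeNextEventTimer();
            if (next == null) {
                next = timerInternals.removeNextProcessingTimer();
            }
            if (next == null) {
                next = timerInternals.removeNextSynchronizedProcessingTimer();
            }
        }
        return next != null;
    }

    @Override
    public TimerInternals.TimerData next() {
        if (!hasNext()) {
            throw new NoSuchElementException(); // java.util
        }
        TimerInternals.TimerData result = next;
        next = null;
        return result;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }
}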

Aggregations

InMemoryTimerInternals (org.apache.beam.runners.core.InMemoryTimerInternals) 3
StateInternals (org.apache.beam.runners.core.StateInternals) 3
InMemoryStateInternals (org.apache.beam.runners.core.InMemoryStateInternals) 2
TimerInternals (org.apache.beam.runners.core.TimerInternals) 2
WindowedValue (org.apache.beam.sdk.util.WindowedValue) 2
DoFnRunners (org.apache.beam.runners.core.DoFnRunners) 1
OutputWindowedValue (org.apache.beam.runners.core.OutputWindowedValue) 1
ReduceFnRunner (org.apache.beam.runners.core.ReduceFnRunner) 1
StepContext (org.apache.beam.runners.core.StepContext) 1
UnsupportedSideInputReader (org.apache.beam.runners.core.UnsupportedSideInputReader) 1
SparkSideInputReader (org.apache.beam.runners.spark.util.SparkSideInputReader) 1
KV (org.apache.beam.sdk.values.KV) 1
TupleTag (org.apache.beam.sdk.values.TupleTag) 1
RuntimeContext (org.apache.flink.api.common.functions.RuntimeContext) 1
Collector (org.apache.flink.util.Collector) 1