Search in sources :

Example 1 with ExecutableStageContext

use of org.apache.beam.runners.fnexecution.control.ExecutableStageContext in project beam by apache.

the class SparkExecutableStageFunction method call.

@Override
public Iterator<RawUnionValue> call(Iterator<WindowedValue<InputT>> inputs) throws Exception {
    SparkPipelineOptions options = pipelineOptions.get().as(SparkPipelineOptions.class);
    // Register standard file systems.
    FileSystems.setDefaultPipelineOptions(options);
    // Otherwise, this may cause validation errors (e.g. ParDoTest)
    if (!inputs.hasNext()) {
        return Collections.emptyIterator();
    }
    try (ExecutableStageContext stageContext = contextFactory.get(jobInfo)) {
        ExecutableStage executableStage = ExecutableStage.fromPayload(stagePayload);
        try (StageBundleFactory stageBundleFactory = stageContext.getStageBundleFactory(executableStage)) {
            ConcurrentLinkedQueue<RawUnionValue> collector = new ConcurrentLinkedQueue<>();
            StateRequestHandler stateRequestHandler = getStateRequestHandler(executableStage, stageBundleFactory.getProcessBundleDescriptor());
            if (executableStage.getTimers().size() == 0) {
                ReceiverFactory receiverFactory = new ReceiverFactory(collector, outputMap);
                processElements(stateRequestHandler, receiverFactory, null, stageBundleFactory, inputs);
                return collector.iterator();
            }
            // Used with Batch, we know that all the data is available for this key. We can't use the
            // timer manager from the context because it doesn't exist. So we create one and advance
            // time to the end after processing all elements.
            final InMemoryTimerInternals timerInternals = new InMemoryTimerInternals();
            timerInternals.advanceProcessingTime(Instant.now());
            timerInternals.advanceSynchronizedProcessingTime(Instant.now());
            ReceiverFactory receiverFactory = new ReceiverFactory(collector, outputMap);
            TimerReceiverFactory timerReceiverFactory = new TimerReceiverFactory(stageBundleFactory, (Timer<?> timer, TimerInternals.TimerData timerData) -> {
                currentTimerKey = timer.getUserKey();
                if (timer.getClearBit()) {
                    timerInternals.deleteTimer(timerData);
                } else {
                    timerInternals.setTimer(timerData);
                }
            }, windowCoder);
            // Process inputs.
            processElements(stateRequestHandler, receiverFactory, timerReceiverFactory, stageBundleFactory, inputs);
            // Finish any pending windows by advancing the input watermark to infinity.
            timerInternals.advanceInputWatermark(BoundedWindow.TIMESTAMP_MAX_VALUE);
            // Finally, advance the processing time to infinity to fire any timers.
            timerInternals.advanceProcessingTime(BoundedWindow.TIMESTAMP_MAX_VALUE);
            timerInternals.advanceSynchronizedProcessingTime(BoundedWindow.TIMESTAMP_MAX_VALUE);
            // itself)
            while (timerInternals.hasPendingTimers()) {
                try (RemoteBundle bundle = stageBundleFactory.getBundle(receiverFactory, timerReceiverFactory, stateRequestHandler, getBundleProgressHandler())) {
                    PipelineTranslatorUtils.fireEligibleTimers(timerInternals, bundle.getTimerReceivers(), currentTimerKey);
                }
            }
            return collector.iterator();
        }
    }
}
Also used : TimerReceiverFactory(org.apache.beam.runners.fnexecution.control.TimerReceiverFactory) OutputReceiverFactory(org.apache.beam.runners.fnexecution.control.OutputReceiverFactory) StateRequestHandler(org.apache.beam.runners.fnexecution.state.StateRequestHandler) RawUnionValue(org.apache.beam.sdk.transforms.join.RawUnionValue) InMemoryTimerInternals(org.apache.beam.runners.core.InMemoryTimerInternals) SparkPipelineOptions(org.apache.beam.runners.spark.SparkPipelineOptions) StageBundleFactory(org.apache.beam.runners.fnexecution.control.StageBundleFactory) Timer(org.apache.beam.runners.core.construction.Timer) ExecutableStageContext(org.apache.beam.runners.fnexecution.control.ExecutableStageContext) TimerReceiverFactory(org.apache.beam.runners.fnexecution.control.TimerReceiverFactory) ExecutableStage(org.apache.beam.runners.core.construction.graph.ExecutableStage) ConcurrentLinkedQueue(java.util.concurrent.ConcurrentLinkedQueue) RemoteBundle(org.apache.beam.runners.fnexecution.control.RemoteBundle)

Aggregations

ConcurrentLinkedQueue (java.util.concurrent.ConcurrentLinkedQueue)1 InMemoryTimerInternals (org.apache.beam.runners.core.InMemoryTimerInternals)1 Timer (org.apache.beam.runners.core.construction.Timer)1 ExecutableStage (org.apache.beam.runners.core.construction.graph.ExecutableStage)1 ExecutableStageContext (org.apache.beam.runners.fnexecution.control.ExecutableStageContext)1 OutputReceiverFactory (org.apache.beam.runners.fnexecution.control.OutputReceiverFactory)1 RemoteBundle (org.apache.beam.runners.fnexecution.control.RemoteBundle)1 StageBundleFactory (org.apache.beam.runners.fnexecution.control.StageBundleFactory)1 TimerReceiverFactory (org.apache.beam.runners.fnexecution.control.TimerReceiverFactory)1 StateRequestHandler (org.apache.beam.runners.fnexecution.state.StateRequestHandler)1 SparkPipelineOptions (org.apache.beam.runners.spark.SparkPipelineOptions)1 RawUnionValue (org.apache.beam.sdk.transforms.join.RawUnionValue)1