Example 6 with FlinkMetricContainer

Use of org.apache.beam.runners.flink.metrics.FlinkMetricContainer in the Apache Beam project.

The method open of the class DoFnOperator.

@Override
public void open() throws Exception {
    // WindowDoFnOperator needs state and timers to obtain its DoFn, so getDoFn() must wait
    // until StateInternals and TimerInternals are ready.
    // open() is called after initializeState().
    this.doFn = getDoFn();
    FlinkPipelineOptions options = serializedOptions.get().as(FlinkPipelineOptions.class);
    doFnInvoker = DoFnInvokers.tryInvokeSetupFor(doFn, options);
    StepContext stepContext = new FlinkStepContext();
    doFnRunner = DoFnRunners.simpleRunner(options, doFn, sideInputReader, outputManager, mainOutputTag,
            additionalOutputTags, stepContext, getInputCoder(), outputCoders, windowingStrategy,
            doFnSchemaInformation, sideInputMapping);
    if (requiresStableInput) {
        // put this in front of the root FnRunner before any additional wrappers
        doFnRunner = bufferingDoFnRunner = BufferingDoFnRunner.create(doFnRunner, "stable-input-buffer",
                windowedInputCoder, windowingStrategy.getWindowFn().windowCoder(), getOperatorStateBackend(),
                getKeyedStateBackend(), options.getNumConcurrentCheckpoints(), serializedOptions);
    }
    doFnRunner = createWrappingDoFnRunner(doFnRunner, stepContext);
    earlyBindStateIfNeeded();
    if (!options.getDisableMetrics()) {
        flinkMetricContainer = new FlinkMetricContainer(getRuntimeContext());
        doFnRunner = new DoFnRunnerWithMetricsUpdate<>(stepName, doFnRunner, flinkMetricContainer);
        String checkpointMetricNamespace = options.getReportCheckpointDuration();
        if (checkpointMetricNamespace != null) {
            MetricName checkpointMetric = MetricName.named(checkpointMetricNamespace, "checkpoint_duration");
            checkpointStats = new CheckpointStats(() -> flinkMetricContainer.getMetricsContainer(stepName).getDistribution(checkpointMetric));
        }
    }
    elementCount = 0L;
    lastFinishBundleTime = getProcessingTimeService().getCurrentProcessingTime();
    // Schedule timer to check timeout of finish bundle.
    long bundleCheckPeriod = Math.max(maxBundleTimeMills / 2, 1);
    checkFinishBundleTimer = getProcessingTimeService().scheduleAtFixedRate(timestamp -> checkInvokeFinishBundleByTime(), bundleCheckPeriod, bundleCheckPeriod);
    if (doFn instanceof SplittableParDoViaKeyedWorkItems.ProcessFn) {
        pushbackDoFnRunner = new ProcessFnRunner<>((DoFnRunner) doFnRunner, sideInputs, sideInputHandler);
    } else {
        pushbackDoFnRunner = SimplePushbackSideInputDoFnRunner.create(doFnRunner, sideInputs, sideInputHandler);
    }
    bundleFinalizer = new InMemoryBundleFinalizer();
    pendingFinalizations = new LinkedHashMap<>();
}
Also used : MetricName(org.apache.beam.sdk.metrics.MetricName) InternalTimeServiceManager(org.apache.flink.streaming.api.operators.InternalTimeServiceManager) FlinkMetricContainer(org.apache.beam.runners.flink.metrics.FlinkMetricContainer) Joiner(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Joiner) ImmutableMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) TimerInternals(org.apache.beam.runners.core.TimerInternals) DoFnSignatures(org.apache.beam.sdk.transforms.reflect.DoFnSignatures) Map(java.util.Map) InternalTimerService(org.apache.flink.streaming.api.operators.InternalTimerService) GlobalWindow(org.apache.beam.sdk.transforms.windowing.GlobalWindow) OperatorStateBackend(org.apache.flink.runtime.state.OperatorStateBackend) FlinkBroadcastStateInternals(org.apache.beam.runners.flink.translation.wrappers.streaming.state.FlinkBroadcastStateInternals) StateSnapshotContext(org.apache.flink.runtime.state.StateSnapshotContext) InternalTimer(org.apache.flink.streaming.api.operators.InternalTimer) OutputTag(org.apache.flink.util.OutputTag) Serializable(java.io.Serializable) Workarounds(org.apache.beam.runners.flink.translation.utils.Workarounds) Stream(java.util.stream.Stream) StructuredCoder(org.apache.beam.sdk.coders.StructuredCoder) DoFnInvokers(org.apache.beam.sdk.transforms.reflect.DoFnInvokers) OneInputStreamOperator(org.apache.flink.streaming.api.operators.OneInputStreamOperator) StatefulDoFnRunner(org.apache.beam.runners.core.StatefulDoFnRunner) VoidNamespace(org.apache.flink.runtime.state.VoidNamespace) KV(org.apache.beam.sdk.values.KV) PushbackSideInputDoFnRunner(org.apache.beam.runners.core.PushbackSideInputDoFnRunner) BundleFinalizer(org.apache.beam.sdk.transforms.DoFn.BundleFinalizer) MapStateDescriptor(org.apache.flink.api.common.state.MapStateDescriptor) ArrayList(java.util.ArrayList) LinkedHashMap(java.util.LinkedHashMap) InternalPriorityQueue(org.apache.flink.runtime.state.InternalPriorityQueue) 
CoderTypeSerializer(org.apache.beam.runners.flink.translation.types.CoderTypeSerializer) TupleTag(org.apache.beam.sdk.values.TupleTag) Output(org.apache.flink.streaming.api.operators.Output) StateInternals(org.apache.beam.runners.core.StateInternals) SideInputReader(org.apache.beam.runners.core.SideInputReader) DoFn(org.apache.beam.sdk.transforms.DoFn) TwoInputStreamOperator(org.apache.flink.streaming.api.operators.TwoInputStreamOperator) WindowNamespace(org.apache.beam.runners.core.StateNamespaces.WindowNamespace) NullSideInputReader(org.apache.beam.runners.core.NullSideInputReader) IOException(java.io.IOException) VisibleForTesting(org.apache.flink.annotation.VisibleForTesting) NoopLock(org.apache.beam.sdk.util.NoopLock) Lock(java.util.concurrent.locks.Lock) MapState(org.apache.flink.api.common.state.MapState) PCollectionView(org.apache.beam.sdk.values.PCollectionView) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) VarIntCoder(org.apache.beam.sdk.coders.VarIntCoder) FileSystems(org.apache.beam.sdk.io.FileSystems) TimeDomain(org.apache.beam.sdk.state.TimeDomain) SplittableParDoViaKeyedWorkItems(org.apache.beam.runners.core.SplittableParDoViaKeyedWorkItems) StateSpec(org.apache.beam.sdk.state.StateSpec) ScheduledFuture(java.util.concurrent.ScheduledFuture) StateNamespace(org.apache.beam.runners.core.StateNamespace) SerializablePipelineOptions(org.apache.beam.runners.core.construction.SerializablePipelineOptions) WindowedValue(org.apache.beam.sdk.util.WindowedValue) FlinkPipelineOptions(org.apache.beam.runners.flink.FlinkPipelineOptions) DoFnRunner(org.apache.beam.runners.core.DoFnRunner) CheckpointingMode(org.apache.flink.streaming.api.CheckpointingMode) LoggerFactory(org.slf4j.LoggerFactory) StepContext(org.apache.beam.runners.core.StepContext) StringSerializer(org.apache.flink.api.common.typeutils.base.StringSerializer) DoFnRunners(org.apache.beam.runners.core.DoFnRunners) ByteBuffer(java.nio.ByteBuffer) 
DoFnSchemaInformation(org.apache.beam.sdk.transforms.DoFnSchemaInformation) ListState(org.apache.flink.api.common.state.ListState) ChainingStrategy(org.apache.flink.streaming.api.operators.ChainingStrategy) CheckpointStats(org.apache.beam.runners.flink.translation.utils.CheckpointStats) Iterables(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables) ListStateDescriptor(org.apache.flink.api.common.state.ListStateDescriptor) DoFnInvoker(org.apache.beam.sdk.transforms.reflect.DoFnInvoker) KeySelector(org.apache.flink.api.java.functions.KeySelector) StreamTask(org.apache.flink.streaming.runtime.tasks.StreamTask) Collection(java.util.Collection) Collectors(java.util.stream.Collectors) List(java.util.List) Preconditions.checkArgument(org.apache.flink.util.Preconditions.checkArgument) Optional(java.util.Optional) SuppressFBWarnings(edu.umd.cs.findbugs.annotations.SuppressFBWarnings) WindowingStrategy(org.apache.beam.sdk.values.WindowingStrategy) StreamConfig(org.apache.flink.streaming.api.graph.StreamConfig) StateAndTimerBundleCheckpointHandler(org.apache.beam.runners.fnexecution.control.BundleCheckpointHandlers.StateAndTimerBundleCheckpointHandler) Coder(org.apache.beam.sdk.coders.Coder) Watermark(org.apache.flink.streaming.api.watermark.Watermark) HashMap(java.util.HashMap) ProcessFnRunner(org.apache.beam.runners.core.ProcessFnRunner) RawUnionValue(org.apache.beam.sdk.transforms.join.RawUnionValue) StreamRecord(org.apache.flink.streaming.runtime.streamrecord.StreamRecord) SideInputHandler(org.apache.beam.runners.core.SideInputHandler) FlinkStateInternals(org.apache.beam.runners.flink.translation.wrappers.streaming.state.FlinkStateInternals) TimerData(org.apache.beam.runners.core.TimerInternals.TimerData) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) Nullable(org.checkerframework.checker.nullness.qual.Nullable) DoFnRunnerWithMetricsUpdate(org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate) 
OutputStream(java.io.OutputStream) DoFnSignature(org.apache.beam.sdk.transforms.reflect.DoFnSignature) Triggerable(org.apache.flink.streaming.api.operators.Triggerable) Logger(org.slf4j.Logger) Iterator(java.util.Iterator) KeyedStateBackend(org.apache.flink.runtime.state.KeyedStateBackend) SimplePushbackSideInputDoFnRunner(org.apache.beam.runners.core.SimplePushbackSideInputDoFnRunner) InMemoryBundleFinalizer(org.apache.beam.runners.core.InMemoryBundleFinalizer) Preconditions(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions) Instant(org.joda.time.Instant) BufferingDoFnRunner(org.apache.beam.runners.flink.translation.wrappers.streaming.stableinput.BufferingDoFnRunner) InputStream(java.io.InputStream) StateInitializationContext(org.apache.flink.runtime.state.StateInitializationContext)
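
The open method above schedules a periodic check with a period of Math.max(maxBundleTimeMills / 2, 1). The following is a minimal, dependency-free sketch of that idea. The class BundleTimeoutSketch, its injectable clock, and the assumption that the callback finishes a bundle once the elapsed time reaches the maximum bundle time are all hypothetical; the body of checkInvokeFinishBundleByTime is not shown in the example.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a time-based finish-bundle check; not Beam's actual class. */
final class BundleTimeoutSketch {
    private final long maxBundleTimeMillis;
    private final LongSupplier clock; // injectable clock so the sketch runs deterministically
    private long lastFinishBundleTime;
    private int finishedBundles = 0;

    BundleTimeoutSketch(long maxBundleTimeMillis, LongSupplier clock) {
        this.maxBundleTimeMillis = maxBundleTimeMillis;
        this.clock = clock;
        this.lastFinishBundleTime = clock.getAsLong();
    }

    /** Same arithmetic as in open(): half the maximum bundle time, but at least 1 ms. */
    static long checkPeriod(long maxBundleTimeMillis) {
        return Math.max(maxBundleTimeMillis / 2, 1);
    }

    /** Assumed behavior of the periodic callback: finish the bundle once it is old enough. */
    void checkFinishBundleByTime() {
        long now = clock.getAsLong();
        if (now - lastFinishBundleTime >= maxBundleTimeMillis) {
            finishedBundles++;
            lastFinishBundleTime = now;
        }
    }

    int finishedBundles() {
        return finishedBundles;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        BundleTimeoutSketch sketch = new BundleTimeoutSketch(1000, () -> now[0]);
        long period = checkPeriod(1000);
        // Simulate four timer ticks instead of using a real scheduler.
        for (int tick = 1; tick <= 4; tick++) {
            now[0] += period;
            sketch.checkFinishBundleByTime();
        }
        System.out.println("period=" + period + " finished=" + sketch.finishedBundles());
    }
}
```

Checking at half the maximum bundle time means a bundle is closed at most about one and a half times the configured limit after it started, and the lower bound of 1 avoids a zero period when the limit is 1 ms or less.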

Example 7 with FlinkMetricContainer

Use of org.apache.beam.runners.flink.metrics.FlinkMetricContainer in the Apache Beam project.

The method open of the class UnboundedSourceWrapper.

/**
 * Initialize and restore state before starting execution of the source.
 */
@Override
public void open(Configuration parameters) throws Exception {
    FileSystems.setDefaultPipelineOptions(serializedOptions.get());
    runtimeContext = (StreamingRuntimeContext) getRuntimeContext();
    metricContainer = new FlinkMetricContainer(runtimeContext);
    // figure out which split sources we're responsible for
    int subtaskIndex = runtimeContext.getIndexOfThisSubtask();
    int numSubtasks = runtimeContext.getNumberOfParallelSubtasks();
    localSplitSources = new ArrayList<>();
    localReaders = new ArrayList<>();
    pendingCheckpoints = new LinkedHashMap<>();
    if (isRestored) {
        // restore the splitSources from the checkpoint to ensure consistent ordering
        for (KV<? extends UnboundedSource<OutputT, CheckpointMarkT>, CheckpointMarkT> restored : stateForCheckpoint.get()) {
            localSplitSources.add(restored.getKey());
            localReaders.add(restored.getKey().createReader(serializedOptions.get(), restored.getValue()));
        }
    } else {
        // initialize localReaders and localSources from scratch
        for (int i = 0; i < splitSources.size(); i++) {
            if (i % numSubtasks == subtaskIndex) {
                UnboundedSource<OutputT, CheckpointMarkT> source = splitSources.get(i);
                UnboundedSource.UnboundedReader<OutputT> reader = source.createReader(serializedOptions.get(), null);
                localSplitSources.add(source);
                localReaders.add(reader);
            }
        }
    }
    LOG.info("Unbounded Flink Source {}/{} is reading from sources: {}", subtaskIndex + 1, numSubtasks, localSplitSources);
}
Also used : FlinkMetricContainer(org.apache.beam.runners.flink.metrics.FlinkMetricContainer) UnboundedSource(org.apache.beam.sdk.io.UnboundedSource)
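
When the source is not restored from a checkpoint, the loop above assigns split i to the subtask for which i % numSubtasks == subtaskIndex. The sketch below isolates that round-robin rule using plain integers instead of UnboundedSource instances; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the round-robin split assignment; works on split indices only. */
final class SplitAssignmentSketch {

    /** Returns the indices of the splits owned by the given subtask. */
    static List<Integer> splitsFor(int numSplits, int numSubtasks, int subtaskIndex) {
        List<Integer> owned = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            if (i % numSubtasks == subtaskIndex) {
                owned.add(i);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        int numSplits = 7;
        int numSubtasks = 3;
        for (int subtask = 0; subtask < numSubtasks; subtask++) {
            System.out.println("subtask " + subtask + " reads splits "
                    + splitsFor(numSplits, numSubtasks, subtask));
        }
    }
}
```

Every split is owned by exactly one subtask, and a subtask whose index is not reached by any split simply ends up with an empty list.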

Example 8 with FlinkMetricContainer

Use of org.apache.beam.runners.flink.metrics.FlinkMetricContainer in the Apache Beam project.

The method open of the class FlinkDoFnFunction.

@Override
public void open(Configuration parameters) {
    // Note that the SerializablePipelineOptions already initialize FileSystems in the readObject()
    // deserialization method. However, this is a hack, and we want to properly initialize the
    // options where they are needed.
    PipelineOptions options = serializedOptions.get();
    FileSystems.setDefaultPipelineOptions(options);
    doFnInvoker = DoFnInvokers.tryInvokeSetupFor(doFn, options);
    metricContainer = new FlinkMetricContainer(getRuntimeContext());
    // setup DoFnRunner
    final RuntimeContext runtimeContext = getRuntimeContext();
    final DoFnRunners.OutputManager outputManager;
    if (outputMap.size() == 1) {
        outputManager = new DoFnOutputManager();
    } else {
        // it has some additional outputs
        outputManager = new MultiDoFnOutputManager(outputMap);
    }
    final List<TupleTag<?>> additionalOutputTags = Lists.newArrayList(outputMap.keySet());
    DoFnRunner<InputT, OutputT> doFnRunner = DoFnRunners.simpleRunner(options, doFn,
            new FlinkSideInputReader(sideInputs, runtimeContext), outputManager, mainOutputTag,
            additionalOutputTags, new FlinkNoOpStepContext(), inputCoder, outputCoderMap, windowingStrategy,
            doFnSchemaInformation, sideInputMapping);
    if (!serializedOptions.get().as(FlinkPipelineOptions.class).getDisableMetrics()) {
        doFnRunner = new DoFnRunnerWithMetricsUpdate<>(stepName, doFnRunner, metricContainer);
    }
    this.collectorAware = (CollectorAware) outputManager;
    this.doFnRunner = doFnRunner;
}
Also used : DoFnRunners(org.apache.beam.runners.core.DoFnRunners) TupleTag(org.apache.beam.sdk.values.TupleTag) SerializablePipelineOptions(org.apache.beam.runners.core.construction.SerializablePipelineOptions) FlinkPipelineOptions(org.apache.beam.runners.flink.FlinkPipelineOptions) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) RuntimeContext(org.apache.flink.api.common.functions.RuntimeContext) FlinkMetricContainer(org.apache.beam.runners.flink.metrics.FlinkMetricContainer)
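
The method above wraps the base runner in DoFnRunnerWithMetricsUpdate only when metrics are not disabled, which is an instance of the decorator pattern. A minimal sketch of that conditional wrapping follows. The Runner interface, the CountingRunner decorator, and its per-step element counter are invented for illustration and do not reproduce what the real wrapper does internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of conditionally decorating a runner with metrics reporting. */
final class MetricsDecoratorSketch {

    /** Much-reduced, hypothetical runner interface. */
    interface Runner<T> {
        void processElement(T element);
    }

    /** Hypothetical decorator: counts elements per step, then delegates to the wrapped runner. */
    static final class CountingRunner<T> implements Runner<T> {
        private final String stepName;
        private final Runner<T> delegate;
        private final Map<String, Long> counters;

        CountingRunner(String stepName, Runner<T> delegate, Map<String, Long> counters) {
            this.stepName = stepName;
            this.delegate = delegate;
            this.counters = counters;
        }

        @Override
        public void processElement(T element) {
            counters.merge(stepName, 1L, Long::sum);
            delegate.processElement(element);
        }
    }

    /** Mirrors the shape of the if-block in open(): wrap only when metrics are enabled. */
    static <T> Runner<T> build(
            Runner<T> base, boolean disableMetrics, String stepName, Map<String, Long> counters) {
        Runner<T> runner = base;
        if (!disableMetrics) {
            runner = new CountingRunner<>(stepName, runner, counters);
        }
        return runner;
    }

    public static void main(String[] args) {
        List<String> output = new ArrayList<>();
        Map<String, Long> counters = new HashMap<>();
        Runner<String> runner = build(output::add, false, "my-step", counters);
        runner.processElement("a");
        runner.processElement("b");
        System.out.println("output=" + output + " counters=" + counters);
    }
}
```

Because the decorator implements the same interface as the runner it wraps, the rest of the code holds a single runner reference and does not need to know whether metrics are enabled.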

Example 9 with FlinkMetricContainer

Use of org.apache.beam.runners.flink.metrics.FlinkMetricContainer in the Apache Beam project.

The method open of the class FlinkExecutableStageFunction.

@Override
public void open(Configuration parameters) {
    FlinkPipelineOptions options = pipelineOptions.get().as(FlinkPipelineOptions.class);
    // Register standard file systems.
    FileSystems.setDefaultPipelineOptions(options);
    executableStage = ExecutableStage.fromPayload(stagePayload);
    runtimeContext = getRuntimeContext();
    metricContainer = new FlinkMetricContainer(runtimeContext);
    // TODO: Wire this into the distributed cache and make it pluggable.
    stageContext = contextFactory.get(jobInfo);
    stageBundleFactory = stageContext.getStageBundleFactory(executableStage);
    // NOTE: It's safe to reuse the state handler between partitions because each partition uses the
    // same backing runtime context and broadcast variables. We use checkState below to catch errors
    // in backward-incompatible Flink changes.
    stateRequestHandler = getStateRequestHandler(executableStage, stageBundleFactory.getProcessBundleDescriptor(), runtimeContext);
    progressHandler = new BundleProgressHandler() {

        @Override
        public void onProgress(ProcessBundleProgressResponse progress) {
            metricContainer.updateMetrics(stepName, progress.getMonitoringInfosList());
        }

        @Override
        public void onCompleted(ProcessBundleResponse response) {
            metricContainer.updateMetrics(stepName, response.getMonitoringInfosList());
        }
    };
    // TODO(BEAM-11021): Support bundle finalization in portable batch.
    finalizationHandler = bundleId -> {
        throw new UnsupportedOperationException("Portable Flink runner doesn't support bundle finalization in batch mode. For more details, please refer to https://issues.apache.org/jira/browse/BEAM-11021.");
    };
    bundleCheckpointHandler = getBundleCheckpointHandler(executableStage);
}
Also used : FlinkPipelineOptions(org.apache.beam.runners.flink.FlinkPipelineOptions) BundleProgressHandler(org.apache.beam.runners.fnexecution.control.BundleProgressHandler) ProcessBundleProgressResponse(org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleProgressResponse) ProcessBundleResponse(org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleResponse) FlinkMetricContainer(org.apache.beam.runners.flink.metrics.FlinkMetricContainer)
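
In the method above, both callbacks of the anonymous BundleProgressHandler forward their monitoring infos to the same metric container under the same step name. The sketch below shows that forwarding shape with plain maps. The BundleHandler interface, the MetricSink class, and its "latest value wins" update rule are hypothetical stand-ins, not the behavior of FlinkMetricContainer.updateMetrics.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of two bundle callbacks feeding one shared metric sink. */
final class ProgressHandlerSketch {

    /** Hypothetical, reduced handler interface with a progress and a completion callback. */
    interface BundleHandler {
        void onProgress(Map<String, Long> monitoringInfos);

        void onCompleted(Map<String, Long> monitoringInfos);
    }

    /** Hypothetical sink: keeps the most recently reported value per step and metric name. */
    static final class MetricSink {
        private final Map<String, Map<String, Long>> byStep = new LinkedHashMap<>();

        void update(String stepName, Map<String, Long> monitoringInfos) {
            byStep.computeIfAbsent(stepName, k -> new LinkedHashMap<>()).putAll(monitoringInfos);
        }

        Long get(String stepName, String metricName) {
            Map<String, Long> metrics = byStep.get(stepName);
            return metrics == null ? null : metrics.get(metricName);
        }
    }

    /** Builds a handler whose callbacks both forward to the sink under one step name. */
    static BundleHandler forwardingTo(MetricSink sink, String stepName) {
        return new BundleHandler() {
            @Override
            public void onProgress(Map<String, Long> monitoringInfos) {
                sink.update(stepName, monitoringInfos);
            }

            @Override
            public void onCompleted(Map<String, Long> monitoringInfos) {
                sink.update(stepName, monitoringInfos);
            }
        };
    }

    public static void main(String[] args) {
        MetricSink sink = new MetricSink();
        BundleHandler handler = forwardingTo(sink, "my-step");
        handler.onProgress(Map.of("elements", 3L));
        handler.onCompleted(Map.of("elements", 10L));
        System.out.println("elements=" + sink.get("my-step", "elements"));
    }
}
```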

Example 10 with FlinkMetricContainer

Use of org.apache.beam.runners.flink.metrics.FlinkMetricContainer in the Apache Beam project.

The method open of the class SourceInputFormat.

@Override
public void open(SourceInputSplit<T> sourceInputSplit) throws IOException {
    metricContainer = new FlinkMetricContainer(getRuntimeContext());
    readerInvoker = new ReaderInvocationUtil<>(stepName, serializedOptions.get(), metricContainer);
    reader = ((BoundedSource<T>) sourceInputSplit.getSource()).createReader(options);
    inputAvailable = readerInvoker.invokeStart(reader);
}
Also used : FlinkMetricContainer(org.apache.beam.runners.flink.metrics.FlinkMetricContainer)
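
The method above stores the result of invokeStart in inputAvailable, which follows the reader contract where start() reports whether a first record exists and advance() reports whether a further one does. A minimal sketch of the read loop that such a flag supports is shown below. SimpleReader, ListReader, and readAll are hypothetical and only mimic the start/advance/getCurrent shape.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical illustration of a start/advance read loop driven by an availability flag. */
final class ReaderLoopSketch {

    /** Hypothetical, reduced reader interface. */
    interface SimpleReader<T> {
        boolean start(); // true if a first record is available

        boolean advance(); // true if another record is available

        T getCurrent(); // valid only after start() or advance() returned true
    }

    /** Hypothetical reader backed by an in-memory list. */
    static final class ListReader<T> implements SimpleReader<T> {
        private final Iterator<T> iterator;
        private T current;
        private boolean hasCurrent;

        ListReader(List<T> items) {
            this.iterator = items.iterator();
        }

        @Override
        public boolean start() {
            return advance();
        }

        @Override
        public boolean advance() {
            hasCurrent = iterator.hasNext();
            if (hasCurrent) {
                current = iterator.next();
            }
            return hasCurrent;
        }

        @Override
        public T getCurrent() {
            if (!hasCurrent) {
                throw new NoSuchElementException("no current record");
            }
            return current;
        }
    }

    /** Drains the reader: the flag set by start() is refreshed by every advance(). */
    static <T> List<T> readAll(SimpleReader<T> reader) {
        List<T> records = new ArrayList<>();
        boolean inputAvailable = reader.start();
        while (inputAvailable) {
            records.add(reader.getCurrent());
            inputAvailable = reader.advance();
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(readAll(new ListReader<>(List.of("a", "b", "c"))));
    }
}
```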

Aggregations

FlinkMetricContainer (org.apache.beam.runners.flink.metrics.FlinkMetricContainer) 12
Test (org.junit.Test) 5
FlinkPipelineOptions (org.apache.beam.runners.flink.FlinkPipelineOptions) 4
SerializablePipelineOptions (org.apache.beam.runners.core.construction.SerializablePipelineOptions) 3
PipelineOptions (org.apache.beam.sdk.options.PipelineOptions) 3
Configuration (org.apache.flink.configuration.Configuration) 3
ArrayList (java.util.ArrayList) 2
DoFnRunners (org.apache.beam.runners.core.DoFnRunners) 2
WindowedValue (org.apache.beam.sdk.util.WindowedValue) 2
TupleTag (org.apache.beam.sdk.values.TupleTag) 2
Watermark (org.apache.flink.streaming.api.watermark.Watermark) 2
SuppressFBWarnings (edu.umd.cs.findbugs.annotations.SuppressFBWarnings) 1
IOException (java.io.IOException) 1
InputStream (java.io.InputStream) 1
OutputStream (java.io.OutputStream) 1
Serializable (java.io.Serializable) 1
ByteBuffer (java.nio.ByteBuffer) 1
Collection (java.util.Collection) 1
HashMap (java.util.HashMap) 1
Iterator (java.util.Iterator) 1