Search in sources :

Example 26 with Coder

use of org.apache.beam.model.pipeline.v1.RunnerApi.Coder in project beam by apache.

the class FlinkStreamingPortablePipelineTranslator method translateFlatten.

private <T> void translateFlatten(String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext context) {
    RunnerApi.PTransform transform = pipeline.getComponents().getTransformsOrThrow(id);
    Map<String, String> allInputs = transform.getInputsMap();
    if (allInputs.isEmpty()) {
        // create an empty dummy source to satisfy downstream operations
        // we cannot create an empty source in Flink, therefore we have to
        // add the flatMap that simply never forwards the single element
        long shutdownAfterIdleSourcesMs = context.getPipelineOptions().getShutdownSourcesAfterIdleMs();
        DataStreamSource<WindowedValue<byte[]>> dummySource = context.getExecutionEnvironment().addSource(new ImpulseSourceFunction(shutdownAfterIdleSourcesMs));
        DataStream<WindowedValue<T>> result = dummySource.<WindowedValue<T>>flatMap((s, collector) -> {
        // never return anything
        }).returns(new CoderTypeInformation<>(WindowedValue.getFullCoder((Coder<T>) VoidCoder.of(), GlobalWindow.Coder.INSTANCE), context.getPipelineOptions()));
        context.addDataStream(Iterables.getOnlyElement(transform.getOutputsMap().values()), result);
    } else {
        DataStream<T> result = null;
        // Determine DataStreams that we use as input several times. For those, we need to uniquify
        // input streams because Flink seems to swallow watermarks when we have a union of one and
        // the same stream.
        HashMultiset<DataStream<T>> inputCounts = HashMultiset.create();
        for (String input : allInputs.values()) {
            DataStream<T> current = context.getDataStreamOrThrow(input);
            inputCounts.add(current, 1);
        }
        for (String input : allInputs.values()) {
            DataStream<T> current = context.getDataStreamOrThrow(input);
            final int timesRequired = inputCounts.count(current);
            if (timesRequired > 1) {
                current = current.flatMap(new FlatMapFunction<T, T>() {

                    private static final long serialVersionUID = 1L;

                    @Override
                    public void flatMap(T t, Collector<T> collector) {
                        collector.collect(t);
                    }
                });
            }
            result = (result == null) ? current : result.union(current);
        }
        context.addDataStream(Iterables.getOnlyElement(transform.getOutputsMap().values()), result);
    }
}
Also used : SingletonKeyedWorkItemCoder(org.apache.beam.runners.flink.translation.wrappers.streaming.SingletonKeyedWorkItemCoder) WindowedValueCoder(org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder) FlinkExecutableStageContextFactory(org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageContextFactory) CoderUtils(org.apache.beam.sdk.util.CoderUtils) WireCoders(org.apache.beam.runners.fnexecution.wire.WireCoders) UnboundedSource(org.apache.beam.sdk.io.UnboundedSource) PCollectionViews(org.apache.beam.sdk.values.PCollectionViews) SdkHarnessClient(org.apache.beam.runners.fnexecution.control.SdkHarnessClient) ImmutableMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) RunnerPCollectionView(org.apache.beam.runners.core.construction.RunnerPCollectionView) ImmutableSet(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet) TestStreamSource(org.apache.beam.runners.flink.translation.wrappers.streaming.io.TestStreamSource) Map(java.util.Map) TestStreamTranslation(org.apache.beam.runners.core.construction.TestStreamTranslation) GlobalWindow(org.apache.beam.sdk.transforms.windowing.GlobalWindow) JsonNode(com.fasterxml.jackson.databind.JsonNode) CoderTypeInformation(org.apache.beam.runners.flink.translation.types.CoderTypeInformation) KvCoder(org.apache.beam.sdk.coders.KvCoder) PTransformTranslation(org.apache.beam.runners.core.construction.PTransformTranslation) WindowDoFnOperator(org.apache.beam.runners.flink.translation.wrappers.streaming.WindowDoFnOperator) Set(java.util.Set) OutputTag(org.apache.flink.util.OutputTag) ExecutableStage(org.apache.beam.runners.core.construction.graph.ExecutableStage) ExecutableStageTranslation.generateNameFromStagePayload(org.apache.beam.runners.core.construction.ExecutableStageTranslation.generateNameFromStagePayload) FlatMapFunction(org.apache.flink.api.common.functions.FlatMapFunction) CoderException(org.apache.beam.sdk.coders.CoderException) WindowingStrategyTranslation(org.apache.beam.runners.core.construction.WindowingStrategyTranslation) TestStream(org.apache.beam.sdk.testing.TestStream) PipelineTranslatorUtils.instantiateCoder(org.apache.beam.runners.fnexecution.translation.PipelineTranslatorUtils.instantiateCoder) ValueWithRecordId(org.apache.beam.sdk.values.ValueWithRecordId) KV(org.apache.beam.sdk.values.KV) TypeDescriptor(org.apache.beam.sdk.values.TypeDescriptor) ArrayList(java.util.ArrayList) LinkedHashMap(java.util.LinkedHashMap) StreamingImpulseSource(org.apache.beam.runners.flink.translation.wrappers.streaming.io.StreamingImpulseSource) RichMapFunction(org.apache.flink.api.common.functions.RichMapFunction) Collector(org.apache.flink.util.Collector) TupleTag(org.apache.beam.sdk.values.TupleTag) Maps(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps) BiMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.BiMap) InvalidProtocolBufferException(org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.InvalidProtocolBufferException) RunnerApi(org.apache.beam.model.pipeline.v1.RunnerApi) QueryablePipeline(org.apache.beam.runners.core.construction.graph.QueryablePipeline) IterableCoder(org.apache.beam.sdk.coders.IterableCoder) SingleOutputStreamOperator(org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator) IOException(java.io.IOException) DedupingOperator(org.apache.beam.runners.flink.translation.wrappers.streaming.io.DedupingOperator) BoundedSource(org.apache.beam.sdk.io.BoundedSource) TreeMap(java.util.TreeMap) PCollectionView(org.apache.beam.sdk.values.PCollectionView) AutoService(com.google.auto.service.AutoService) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) PipelineNode(org.apache.beam.runners.core.construction.graph.PipelineNode) VoidCoder(org.apache.beam.sdk.coders.VoidCoder) UnboundedSourceWrapper(org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper) FileSystems(org.apache.beam.sdk.io.FileSystems) SystemReduceFn(org.apache.beam.runners.core.SystemReduceFn) SerializablePipelineOptions(org.apache.beam.runners.core.construction.SerializablePipelineOptions) WindowedValue(org.apache.beam.sdk.util.WindowedValue) PipelineTranslatorUtils.getWindowingStrategy(org.apache.beam.runners.fnexecution.translation.PipelineTranslatorUtils.getWindowingStrategy) WorkItemKeySelector(org.apache.beam.runners.flink.translation.wrappers.streaming.WorkItemKeySelector) KvToByteBufferKeySelector(org.apache.beam.runners.flink.translation.wrappers.streaming.KvToByteBufferKeySelector) SerializableFunction(org.apache.beam.sdk.transforms.SerializableFunction) RehydratedComponents(org.apache.beam.runners.core.construction.RehydratedComponents) DoFnOperator(org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator) ByteBuffer(java.nio.ByteBuffer) Sets(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Sets) Locale(java.util.Locale) Iterables(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables) JobInfo(org.apache.beam.runners.fnexecution.provisioning.JobInfo) TypeInformation(org.apache.flink.api.common.typeinfo.TypeInformation) KeyedWorkItem(org.apache.beam.runners.core.KeyedWorkItem) KeySelector(org.apache.flink.api.java.functions.KeySelector) TwoInputTransformation(org.apache.flink.streaming.api.transformations.TwoInputTransformation) KeyedStream(org.apache.flink.streaming.api.datastream.KeyedStream) String.format(java.lang.String.format) ModelCoders(org.apache.beam.runners.core.construction.ModelCoders) UnionCoder(org.apache.beam.sdk.transforms.join.UnionCoder) JobExecutionResult(org.apache.flink.api.common.JobExecutionResult) List(java.util.List) TypeDescriptors(org.apache.beam.sdk.values.TypeDescriptors) WindowingStrategy(org.apache.beam.sdk.values.WindowingStrategy) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment) HashMultiset(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.HashMultiset) PipelineTranslatorUtils.createOutputMap(org.apache.beam.runners.fnexecution.translation.PipelineTranslatorUtils.createOutputMap) ReadTranslation(org.apache.beam.runners.core.construction.ReadTranslation) ExecutableStageDoFnOperator(org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator) Coder(org.apache.beam.sdk.coders.Coder) HashMap(java.util.HashMap) DataStreamSource(org.apache.flink.streaming.api.datastream.DataStreamSource) RawUnionValue(org.apache.beam.sdk.transforms.join.RawUnionValue) ImpulseSourceFunction(org.apache.beam.runners.flink.translation.functions.ImpulseSourceFunction) ViewFn(org.apache.beam.sdk.transforms.ViewFn) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) SdfByteBufferKeySelector(org.apache.beam.runners.flink.translation.wrappers.streaming.SdfByteBufferKeySelector) NativeTransforms(org.apache.beam.runners.core.construction.NativeTransforms) ObjectMapper(com.fasterxml.jackson.databind.ObjectMapper) Configuration(org.apache.flink.configuration.Configuration) Lists(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists) DataStream(org.apache.flink.streaming.api.datastream.DataStream) ByteArrayCoder(org.apache.beam.sdk.coders.ByteArrayCoder) SourceInputFormat(org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat) Collections(java.util.Collections) DataStream(org.apache.flink.streaming.api.datastream.DataStream) ImpulseSourceFunction(org.apache.beam.runners.flink.translation.functions.ImpulseSourceFunction) RunnerApi(org.apache.beam.model.pipeline.v1.RunnerApi) WindowedValue(org.apache.beam.sdk.util.WindowedValue) FlatMapFunction(org.apache.flink.api.common.functions.FlatMapFunction) Collector(org.apache.flink.util.Collector)

Example 27 with Coder

use of org.apache.beam.model.pipeline.v1.RunnerApi.Coder in project beam by apache.

the class FlinkStreamingPortablePipelineTranslator method translateExecutableStage.

private <InputT, OutputT> void translateExecutableStage(String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext context) {
    // TODO: Fail on splittable DoFns.
    // TODO: Special-case single outputs to avoid multiplexing PCollections.
    RunnerApi.Components components = pipeline.getComponents();
    RunnerApi.PTransform transform = components.getTransformsOrThrow(id);
    Map<String, String> outputs = transform.getOutputsMap();
    final RunnerApi.ExecutableStagePayload stagePayload;
    try {
        stagePayload = RunnerApi.ExecutableStagePayload.parseFrom(transform.getSpec().getPayload());
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    String inputPCollectionId = stagePayload.getInput();
    final TransformedSideInputs transformedSideInputs;
    if (stagePayload.getSideInputsCount() > 0) {
        transformedSideInputs = transformSideInputs(stagePayload, components, context);
    } else {
        transformedSideInputs = new TransformedSideInputs(Collections.emptyMap(), null);
    }
    Map<TupleTag<?>, OutputTag<WindowedValue<?>>> tagsToOutputTags = Maps.newLinkedHashMap();
    Map<TupleTag<?>, Coder<WindowedValue<?>>> tagsToCoders = Maps.newLinkedHashMap();
    // TODO: does it matter which output we designate as "main"
    final TupleTag<OutputT> mainOutputTag = outputs.isEmpty() ? null : new TupleTag(outputs.keySet().iterator().next());
    // associate output tags with ids, output manager uses these Integer ids to serialize state
    BiMap<String, Integer> outputIndexMap = createOutputMap(outputs.keySet());
    Map<String, Coder<WindowedValue<?>>> outputCoders = Maps.newHashMap();
    Map<TupleTag<?>, Integer> tagsToIds = Maps.newHashMap();
    Map<String, TupleTag<?>> collectionIdToTupleTag = Maps.newHashMap();
    // order output names for deterministic mapping
    for (String localOutputName : new TreeMap<>(outputIndexMap).keySet()) {
        String collectionId = outputs.get(localOutputName);
        Coder<WindowedValue<?>> windowCoder = (Coder) instantiateCoder(collectionId, components);
        outputCoders.put(localOutputName, windowCoder);
        TupleTag<?> tupleTag = new TupleTag<>(localOutputName);
        CoderTypeInformation<WindowedValue<?>> typeInformation = new CoderTypeInformation(windowCoder, context.getPipelineOptions());
        tagsToOutputTags.put(tupleTag, new OutputTag<>(localOutputName, typeInformation));
        tagsToCoders.put(tupleTag, windowCoder);
        tagsToIds.put(tupleTag, outputIndexMap.get(localOutputName));
        collectionIdToTupleTag.put(collectionId, tupleTag);
    }
    final SingleOutputStreamOperator<WindowedValue<OutputT>> outputStream;
    DataStream<WindowedValue<InputT>> inputDataStream = context.getDataStreamOrThrow(inputPCollectionId);
    CoderTypeInformation<WindowedValue<OutputT>> outputTypeInformation = !outputs.isEmpty() ? new CoderTypeInformation(outputCoders.get(mainOutputTag.getId()), context.getPipelineOptions()) : null;
    ArrayList<TupleTag<?>> additionalOutputTags = Lists.newArrayList();
    for (TupleTag<?> tupleTag : tagsToCoders.keySet()) {
        if (!mainOutputTag.getId().equals(tupleTag.getId())) {
            additionalOutputTags.add(tupleTag);
        }
    }
    final Coder<WindowedValue<InputT>> windowedInputCoder = instantiateCoder(inputPCollectionId, components);
    final boolean stateful = stagePayload.getUserStatesCount() > 0 || stagePayload.getTimersCount() > 0;
    final boolean hasSdfProcessFn = stagePayload.getComponents().getTransformsMap().values().stream().anyMatch(pTransform -> pTransform.getSpec().getUrn().equals(PTransformTranslation.SPLITTABLE_PROCESS_SIZED_ELEMENTS_AND_RESTRICTIONS_URN));
    Coder keyCoder = null;
    KeySelector<WindowedValue<InputT>, ?> keySelector = null;
    if (stateful || hasSdfProcessFn) {
        // Stateful/SDF stages are only allowed of KV input.
        Coder valueCoder = ((WindowedValue.FullWindowedValueCoder) windowedInputCoder).getValueCoder();
        if (!(valueCoder instanceof KvCoder)) {
            throw new IllegalStateException(String.format(Locale.ENGLISH, "The element coder for stateful DoFn '%s' must be KvCoder but is: %s", inputPCollectionId, valueCoder.getClass().getSimpleName()));
        }
        if (stateful) {
            keyCoder = ((KvCoder) valueCoder).getKeyCoder();
            keySelector = new KvToByteBufferKeySelector(keyCoder, new SerializablePipelineOptions(context.getPipelineOptions()));
        } else {
            // as the key.
            if (!(((KvCoder) valueCoder).getKeyCoder() instanceof KvCoder)) {
                throw new IllegalStateException(String.format(Locale.ENGLISH, "The element coder for splittable DoFn '%s' must be KVCoder(KvCoder, DoubleCoder) but is: %s", inputPCollectionId, valueCoder.getClass().getSimpleName()));
            }
            keyCoder = ((KvCoder) ((KvCoder) valueCoder).getKeyCoder()).getKeyCoder();
            keySelector = new SdfByteBufferKeySelector(keyCoder, new SerializablePipelineOptions(context.getPipelineOptions()));
        }
        inputDataStream = inputDataStream.keyBy(keySelector);
    }
    DoFnOperator.MultiOutputOutputManagerFactory<OutputT> outputManagerFactory = new DoFnOperator.MultiOutputOutputManagerFactory<>(mainOutputTag, tagsToOutputTags, tagsToCoders, tagsToIds, new SerializablePipelineOptions(context.getPipelineOptions()));
    DoFnOperator<InputT, OutputT> doFnOperator = new ExecutableStageDoFnOperator<>(transform.getUniqueName(), windowedInputCoder, Collections.emptyMap(), mainOutputTag, additionalOutputTags, outputManagerFactory, transformedSideInputs.unionTagToView, new ArrayList<>(transformedSideInputs.unionTagToView.values()), getSideInputIdToPCollectionViewMap(stagePayload, components), context.getPipelineOptions(), stagePayload, context.getJobInfo(), FlinkExecutableStageContextFactory.getInstance(), collectionIdToTupleTag, getWindowingStrategy(inputPCollectionId, components), keyCoder, keySelector);
    final String operatorName = generateNameFromStagePayload(stagePayload);
    if (transformedSideInputs.unionTagToView.isEmpty()) {
        outputStream = inputDataStream.transform(operatorName, outputTypeInformation, doFnOperator);
    } else {
        DataStream<RawUnionValue> sideInputStream = transformedSideInputs.unionedSideInputs.broadcast();
        if (stateful || hasSdfProcessFn) {
            // We have to manually construct the two-input transform because we're not
            // allowed to have only one input keyed, normally. Since Flink 1.5.0 it's
            // possible to use the Broadcast State Pattern which provides a more elegant
            // way to process keyed main input with broadcast state, but it's not feasible
            // here because it breaks the DoFnOperator abstraction.
            TwoInputTransformation<WindowedValue<KV<?, InputT>>, RawUnionValue, WindowedValue<OutputT>> rawFlinkTransform = new TwoInputTransformation(inputDataStream.getTransformation(), sideInputStream.getTransformation(), transform.getUniqueName(), doFnOperator, outputTypeInformation, inputDataStream.getParallelism());
            rawFlinkTransform.setStateKeyType(((KeyedStream) inputDataStream).getKeyType());
            rawFlinkTransform.setStateKeySelectors(((KeyedStream) inputDataStream).getKeySelector(), null);
            outputStream = new SingleOutputStreamOperator(inputDataStream.getExecutionEnvironment(), // we have to cheat around the ctor being protected
            rawFlinkTransform) {
            };
        } else {
            outputStream = inputDataStream.connect(sideInputStream).transform(operatorName, outputTypeInformation, doFnOperator);
        }
    }
    // Assign a unique but consistent id to re-map operator state
    outputStream.uid(transform.getUniqueName());
    if (mainOutputTag != null) {
        context.addDataStream(outputs.get(mainOutputTag.getId()), outputStream);
    }
    for (TupleTag<?> tupleTag : additionalOutputTags) {
        context.addDataStream(outputs.get(tupleTag.getId()), outputStream.getSideOutput(tagsToOutputTags.get(tupleTag)));
    }
}
Also used : KvToByteBufferKeySelector(org.apache.beam.runners.flink.translation.wrappers.streaming.KvToByteBufferKeySelector) TupleTag(org.apache.beam.sdk.values.TupleTag) RunnerApi(org.apache.beam.model.pipeline.v1.RunnerApi) WindowedValue(org.apache.beam.sdk.util.WindowedValue) OutputTag(org.apache.flink.util.OutputTag) SerializablePipelineOptions(org.apache.beam.runners.core.construction.SerializablePipelineOptions) RawUnionValue(org.apache.beam.sdk.transforms.join.RawUnionValue) SingleOutputStreamOperator(org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator) WindowDoFnOperator(org.apache.beam.runners.flink.translation.wrappers.streaming.WindowDoFnOperator) DoFnOperator(org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator) ExecutableStageDoFnOperator(org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator) SdfByteBufferKeySelector(org.apache.beam.runners.flink.translation.wrappers.streaming.SdfByteBufferKeySelector) TwoInputTransformation(org.apache.flink.streaming.api.transformations.TwoInputTransformation) ExecutableStageDoFnOperator(org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator) CoderTypeInformation(org.apache.beam.runners.flink.translation.types.CoderTypeInformation) SingletonKeyedWorkItemCoder(org.apache.beam.runners.flink.translation.wrappers.streaming.SingletonKeyedWorkItemCoder) WindowedValueCoder(org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder) KvCoder(org.apache.beam.sdk.coders.KvCoder) PipelineTranslatorUtils.instantiateCoder(org.apache.beam.runners.fnexecution.translation.PipelineTranslatorUtils.instantiateCoder) IterableCoder(org.apache.beam.sdk.coders.IterableCoder) VoidCoder(org.apache.beam.sdk.coders.VoidCoder) UnionCoder(org.apache.beam.sdk.transforms.join.UnionCoder) Coder(org.apache.beam.sdk.coders.Coder) ByteArrayCoder(org.apache.beam.sdk.coders.ByteArrayCoder) KvCoder(org.apache.beam.sdk.coders.KvCoder) IOException(java.io.IOException)

Example 28 with Coder

use of org.apache.beam.model.pipeline.v1.RunnerApi.Coder in project beam by apache.

the class SamzaDoFnRunners method createPortable.

/**
 * Create DoFnRunner for portable runner.
 */
@SuppressWarnings("unchecked")
public static <InT, FnOutT> DoFnRunner<InT, FnOutT> createPortable(String transformId, String bundleStateId, Coder<WindowedValue<InT>> windowedValueCoder, ExecutableStage executableStage, Map<?, PCollectionView<?>> sideInputMapping, SideInputHandler sideInputHandler, SamzaStoreStateInternals.Factory<?> nonKeyedStateInternalsFactory, SamzaTimerInternalsFactory<?> timerInternalsFactory, SamzaPipelineOptions pipelineOptions, DoFnRunners.OutputManager outputManager, StageBundleFactory stageBundleFactory, TupleTag<FnOutT> mainOutputTag, Map<String, TupleTag<?>> idToTupleTagMap, Context context, String transformFullName) {
    // storing events within a bundle in states
    final BagState<WindowedValue<InT>> bundledEventsBag = nonKeyedStateInternalsFactory.stateInternalsForKey(null).state(StateNamespaces.global(), StateTags.bag(bundleStateId, windowedValueCoder));
    final StateRequestHandler stateRequestHandler = SamzaStateRequestHandlers.of(transformId, context.getTaskContext(), pipelineOptions, executableStage, stageBundleFactory, (Map<RunnerApi.ExecutableStagePayload.SideInputId, PCollectionView<?>>) sideInputMapping, sideInputHandler);
    final SamzaExecutionContext executionContext = (SamzaExecutionContext) context.getApplicationContainerContext();
    final DoFnRunner<InT, FnOutT> underlyingRunner = new SdkHarnessDoFnRunner<>(timerInternalsFactory, WindowUtils.getWindowStrategy(executableStage.getInputPCollection().getId(), executableStage.getComponents()), outputManager, stageBundleFactory, idToTupleTagMap, bundledEventsBag, stateRequestHandler);
    return pipelineOptions.getEnableMetrics() ? DoFnRunnerWithMetrics.wrap(underlyingRunner, executionContext.getMetricsContainer(), transformFullName) : underlyingRunner;
}
Also used : SamzaExecutionContext(org.apache.beam.runners.samza.SamzaExecutionContext) RunnerApi(org.apache.beam.model.pipeline.v1.RunnerApi) StateRequestHandler(org.apache.beam.runners.fnexecution.state.StateRequestHandler) PCollectionView(org.apache.beam.sdk.values.PCollectionView) WindowedValue(org.apache.beam.sdk.util.WindowedValue)

Example 29 with Coder

use of org.apache.beam.model.pipeline.v1.RunnerApi.Coder in project beam by apache.

the class DataflowPipelineTranslatorTest method testStreamingSplittableParDoTranslation.

/**
 * Smoke test to fail fast if translation of a splittable ParDo in streaming breaks.
 */
@Test
public void testStreamingSplittableParDoTranslation() throws Exception {
    DataflowPipelineOptions options = buildPipelineOptions();
    DataflowRunner runner = DataflowRunner.fromOptions(options);
    options.setStreaming(true);
    DataflowPipelineTranslator translator = DataflowPipelineTranslator.fromOptions(options);
    Pipeline pipeline = Pipeline.create(options);
    PCollection<String> windowedInput = pipeline.apply(Create.of("a")).apply(Window.into(FixedWindows.of(Duration.standardMinutes(1))));
    windowedInput.apply(ParDo.of(new TestSplittableFn()));
    runner.replaceV1Transforms(pipeline);
    SdkComponents sdkComponents = createSdkComponents(options);
    RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true);
    Job job = translator.translate(pipeline, pipelineProto, sdkComponents, runner, Collections.emptyList()).getJob();
    // The job should contain a SplittableParDo.ProcessKeyedElements step, translated as
    // "SplittableProcessKeyed".
    List<Step> steps = job.getSteps();
    Step processKeyedStep = null;
    for (Step step : steps) {
        if ("SplittableProcessKeyed".equals(step.getKind())) {
            assertNull(processKeyedStep);
            processKeyedStep = step;
        }
    }
    assertNotNull(processKeyedStep);
    @SuppressWarnings({ "unchecked", "rawtypes" }) DoFnInfo<String, Integer> fnInfo = (DoFnInfo<String, Integer>) SerializableUtils.deserializeFromByteArray(jsonStringToByteArray(getString(processKeyedStep.getProperties(), PropertyNames.SERIALIZED_FN)), "DoFnInfo");
    assertThat(fnInfo.getDoFn(), instanceOf(TestSplittableFn.class));
    assertThat(fnInfo.getWindowingStrategy().getWindowFn(), Matchers.<WindowFn>equalTo(FixedWindows.of(Duration.standardMinutes(1))));
    assertThat(fnInfo.getInputCoder(), instanceOf(StringUtf8Coder.class));
    Coder<?> restrictionCoder = CloudObjects.coderFromCloudObject((CloudObject) Structs.getObject(processKeyedStep.getProperties(), PropertyNames.RESTRICTION_CODER));
    assertEquals(KvCoder.of(SerializableCoder.of(OffsetRange.class), VoidCoder.of()), restrictionCoder);
}
Also used : DataflowPipelineOptions(org.apache.beam.runners.dataflow.options.DataflowPipelineOptions) DoFnInfo(org.apache.beam.sdk.util.DoFnInfo) Structs.getString(org.apache.beam.runners.dataflow.util.Structs.getString) ByteString(org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString) Step(com.google.api.services.dataflow.model.Step) SdkComponents(org.apache.beam.runners.core.construction.SdkComponents) Pipeline(org.apache.beam.sdk.Pipeline) RunnerApi(org.apache.beam.model.pipeline.v1.RunnerApi) StringUtf8Coder(org.apache.beam.sdk.coders.StringUtf8Coder) Job(com.google.api.services.dataflow.model.Job) Test(org.junit.Test)

Example 30 with Coder

use of org.apache.beam.model.pipeline.v1.RunnerApi.Coder in project beam by apache.

the class ProcessBundleHandlerTest method setupProcessBundleHandlerForSimpleRecordingDoFn.

private ProcessBundleHandler setupProcessBundleHandlerForSimpleRecordingDoFn(List<String> dataOutput, List<Timers> timerOutput, boolean enableOutputEmbedding) throws Exception {
    DoFnWithExecutionInformation doFnWithExecutionInformation = DoFnWithExecutionInformation.of(new SimpleDoFn(), SimpleDoFn.MAIN_OUTPUT_TAG, Collections.emptyMap(), DoFnSchemaInformation.create());
    RunnerApi.FunctionSpec functionSpec = RunnerApi.FunctionSpec.newBuilder().setUrn(ParDoTranslation.CUSTOM_JAVA_DO_FN_URN).setPayload(ByteString.copyFrom(SerializableUtils.serializeToByteArray(doFnWithExecutionInformation))).build();
    RunnerApi.ParDoPayload parDoPayload = ParDoPayload.newBuilder().setDoFn(functionSpec).putTimerFamilySpecs("tfs-" + SimpleDoFn.TIMER_FAMILY_ID, TimerFamilySpec.newBuilder().setTimeDomain(RunnerApi.TimeDomain.Enum.EVENT_TIME).setTimerFamilyCoderId("timer-coder").build()).build();
    BeamFnApi.ProcessBundleDescriptor processBundleDescriptor = ProcessBundleDescriptor.newBuilder().putTransforms("2L", PTransform.newBuilder().setSpec(FunctionSpec.newBuilder().setUrn(DATA_INPUT_URN).build()).putOutputs("2L-output", "2L-output-pc").build()).putTransforms("3L", PTransform.newBuilder().setSpec(FunctionSpec.newBuilder().setUrn(PTransformTranslation.PAR_DO_TRANSFORM_URN).setPayload(parDoPayload.toByteString())).putInputs("3L-input", "2L-output-pc").build()).putPcollections("2L-output-pc", PCollection.newBuilder().setWindowingStrategyId("window-strategy").setCoderId("2L-output-coder").setIsBounded(IsBounded.Enum.BOUNDED).build()).putWindowingStrategies("window-strategy", WindowingStrategy.newBuilder().setWindowCoderId("window-strategy-coder").setWindowFn(FunctionSpec.newBuilder().setUrn("beam:window_fn:global_windows:v1")).setOutputTime(OutputTime.Enum.END_OF_WINDOW).setAccumulationMode(AccumulationMode.Enum.ACCUMULATING).setTrigger(Trigger.newBuilder().setAlways(Always.getDefaultInstance())).setClosingBehavior(ClosingBehavior.Enum.EMIT_ALWAYS).setOnTimeBehavior(OnTimeBehavior.Enum.FIRE_ALWAYS).build()).setTimerApiServiceDescriptor(ApiServiceDescriptor.newBuilder().setUrl("url").build()).putCoders("string_coder", CoderTranslation.toProto(StringUtf8Coder.of()).getCoder()).putCoders("2L-output-coder", Coder.newBuilder().setSpec(FunctionSpec.newBuilder().setUrn(ModelCoders.KV_CODER_URN).build()).addComponentCoderIds("string_coder").addComponentCoderIds("string_coder").build()).putCoders("window-strategy-coder", Coder.newBuilder().setSpec(FunctionSpec.newBuilder().setUrn(ModelCoders.GLOBAL_WINDOW_CODER_URN).build()).build()).putCoders("timer-coder", Coder.newBuilder().setSpec(FunctionSpec.newBuilder().setUrn(ModelCoders.TIMER_CODER_URN)).addComponentCoderIds("string_coder").addComponentCoderIds("window-strategy-coder").build()).build();
    Map<String, BeamFnApi.ProcessBundleDescriptor> fnApiRegistry = ImmutableMap.of("1L", processBundleDescriptor);
    Map<String, PTransformRunnerFactory> urnToPTransformRunnerFactoryMap = Maps.newHashMap(REGISTERED_RUNNER_FACTORIES);
    urnToPTransformRunnerFactoryMap.put(DATA_INPUT_URN, (PTransformRunnerFactory<Object>) (context) -> {
        context.addIncomingDataEndpoint(ApiServiceDescriptor.getDefaultInstance(), KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()), (input) -> {
            dataOutput.add(input.getValue());
        });
        return null;
    });
    Mockito.doAnswer((invocation) -> new BeamFnDataOutboundAggregator(PipelineOptionsFactory.create(), invocation.getArgument(1), new StreamObserver<Elements>() {

        @Override
        public void onNext(Elements elements) {
            for (Timers timer : elements.getTimersList()) {
                timerOutput.addAll(elements.getTimersList());
            }
        }

        @Override
        public void onError(Throwable throwable) {
        }

        @Override
        public void onCompleted() {
        }
    }, invocation.getArgument(2))).when(beamFnDataClient).createOutboundAggregator(any(), any(), anyBoolean());
    return new ProcessBundleHandler(PipelineOptionsFactory.create(), enableOutputEmbedding ? Collections.singleton(BeamUrns.getUrn(StandardRunnerProtocols.Enum.CONTROL_RESPONSE_ELEMENTS_EMBEDDING)) : Collections.emptySet(), fnApiRegistry::get, beamFnDataClient, null, /* beamFnStateClient */
    null, /* finalizeBundleHandler */
    new ShortIdMap(), urnToPTransformRunnerFactoryMap, Caches.noop(), new BundleProcessorCache());
}
Also used : BeamFnDataOutboundAggregator(org.apache.beam.sdk.fn.data.BeamFnDataOutboundAggregator) Elements(org.apache.beam.model.fnexecution.v1.BeamFnApi.Elements) TimerSpecs(org.apache.beam.sdk.state.TimerSpecs) Assert.assertNotSame(org.junit.Assert.assertNotSame) ImmutableMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) MockitoAnnotations(org.mockito.MockitoAnnotations) FunctionSpec(org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec) MetricsContainerStepMap(org.apache.beam.runners.core.metrics.MetricsContainerStepMap) Arrays.asList(java.util.Arrays.asList) Mockito.doAnswer(org.mockito.Mockito.doAnswer) Map(java.util.Map) GlobalWindow(org.apache.beam.sdk.transforms.windowing.GlobalWindow) Uninterruptibles(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Uninterruptibles) ApiServiceDescriptor(org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor) ShortIdMap(org.apache.beam.runners.core.metrics.ShortIdMap) KvCoder(org.apache.beam.sdk.coders.KvCoder) PTransformTranslation(org.apache.beam.runners.core.construction.PTransformTranslation) TimerEndpoint(org.apache.beam.sdk.fn.data.TimerEndpoint) Set(java.util.Set) BeamFnApi(org.apache.beam.model.fnexecution.v1.BeamFnApi) Data(org.apache.beam.model.fnexecution.v1.BeamFnApi.Elements.Data) ProcessBundleDescriptor(org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleDescriptor) StandardRunnerProtocols(org.apache.beam.model.pipeline.v1.RunnerApi.StandardRunnerProtocols) Matchers.contains(org.hamcrest.Matchers.contains) PTransformRunnerFactory(org.apache.beam.fn.harness.PTransformRunnerFactory) Matchers.is(org.hamcrest.Matchers.is) Mockito.eq(org.mockito.Mockito.eq) Mockito.mock(org.mockito.Mockito.mock) ClosingBehavior(org.apache.beam.model.pipeline.v1.RunnerApi.ClosingBehavior) KV(org.apache.beam.sdk.values.KV) TimerMap(org.apache.beam.sdk.state.TimerMap) ExecutionStateTracker(org.apache.beam.runners.core.metrics.ExecutionStateTracker) Mock(org.mockito.Mock) BundleFinalizer(org.apache.beam.sdk.transforms.DoFn.BundleFinalizer) RunWith(org.junit.runner.RunWith) TimerFamilySpec(org.apache.beam.model.pipeline.v1.RunnerApi.TimerFamilySpec) ArgumentMatchers.anyBoolean(org.mockito.ArgumentMatchers.anyBoolean) Supplier(java.util.function.Supplier) ArrayList(java.util.ArrayList) Assert.assertSame(org.junit.Assert.assertSame) Timers(org.apache.beam.model.fnexecution.v1.BeamFnApi.Elements.Timers) PCollectionConsumerRegistry(org.apache.beam.fn.harness.data.PCollectionConsumerRegistry) PCollection(org.apache.beam.model.pipeline.v1.RunnerApi.PCollection) TimerSpec(org.apache.beam.sdk.state.TimerSpec) TupleTag(org.apache.beam.sdk.values.TupleTag) Cache(org.apache.beam.fn.harness.Cache) BeamFnDataClient(org.apache.beam.fn.harness.data.BeamFnDataClient) Maps(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps) MatcherAssert.assertThat(org.hamcrest.MatcherAssert.assertThat) Before(org.junit.Before) RunnerApi(org.apache.beam.model.pipeline.v1.RunnerApi) DoFn(org.apache.beam.sdk.transforms.DoFn) PTransformFunctionRegistry(org.apache.beam.fn.harness.data.PTransformFunctionRegistry) CloseableFnDataReceiver(org.apache.beam.sdk.fn.data.CloseableFnDataReceiver) ProcessBundleRequest(org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleRequest) TimerFamilyDeclaration(org.apache.beam.sdk.transforms.reflect.DoFnSignature.TimerFamilyDeclaration) Assert.assertTrue(org.junit.Assert.assertTrue) Mockito.times(org.mockito.Mockito.times) IOException(java.io.IOException) Test(org.junit.Test) OnTimeBehavior(org.apache.beam.model.pipeline.v1.RunnerApi.OnTimeBehavior) ProgressRequestCallback(org.apache.beam.fn.harness.PTransformRunnerFactory.ProgressRequestCallback) BeamUrns(org.apache.beam.runners.core.construction.BeamUrns) DataEndpoint(org.apache.beam.sdk.fn.data.DataEndpoint) Assert.assertNull(org.junit.Assert.assertNull) AccumulationMode(org.apache.beam.model.pipeline.v1.RunnerApi.AccumulationMode) OutputTime(org.apache.beam.model.pipeline.v1.RunnerApi.OutputTime) Preconditions.checkState(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState) Timer(org.apache.beam.runners.core.construction.Timer) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) TimeDomain(org.apache.beam.sdk.state.TimeDomain) Assert.assertEquals(org.junit.Assert.assertEquals) StateResponse(org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse) Coder(org.apache.beam.model.pipeline.v1.RunnerApi.Coder) BeamFnStateClient(org.apache.beam.fn.harness.state.BeamFnStateClient) CoderTranslation(org.apache.beam.runners.core.construction.CoderTranslation) Mockito.argThat(org.mockito.Mockito.argThat) DoFnSchemaInformation(org.apache.beam.sdk.transforms.DoFnSchemaInformation) BundleProcessorCache(org.apache.beam.fn.harness.control.ProcessBundleHandler.BundleProcessorCache) IsBounded(org.apache.beam.model.pipeline.v1.RunnerApi.IsBounded) Mockito.verifyNoMoreInteractions(org.mockito.Mockito.verifyNoMoreInteractions) Iterables(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables) BundleProcessor(org.apache.beam.fn.harness.control.ProcessBundleHandler.BundleProcessor) REGISTERED_RUNNER_FACTORIES(org.apache.beam.fn.harness.control.ProcessBundleHandler.REGISTERED_RUNNER_FACTORIES) PaneInfo(org.apache.beam.sdk.transforms.windowing.PaneInfo) Collection(java.util.Collection) DoFnWithExecutionInformation(org.apache.beam.sdk.util.DoFnWithExecutionInformation) ModelCoders(org.apache.beam.runners.core.construction.ModelCoders) List(java.util.List) InstructionRequest(org.apache.beam.model.fnexecution.v1.BeamFnApi.InstructionRequest) StateRequest(org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest) SerializableUtils(org.apache.beam.sdk.util.SerializableUtils) Matchers.containsInAnyOrder(org.hamcrest.Matchers.containsInAnyOrder) Trigger(org.apache.beam.model.pipeline.v1.RunnerApi.Trigger) Matchers.equalTo(org.hamcrest.Matchers.equalTo) StreamObserver(org.apache.beam.vendor.grpc.v1p43p2.io.grpc.stub.StreamObserver) WindowingStrategy(org.apache.beam.model.pipeline.v1.RunnerApi.WindowingStrategy) ThrowingRunnable(org.apache.beam.sdk.function.ThrowingRunnable) ArgumentMatchers.any(org.mockito.ArgumentMatchers.any) Always(org.apache.beam.model.pipeline.v1.RunnerApi.Trigger.Always) PTransform(org.apache.beam.model.pipeline.v1.RunnerApi.PTransform) Assert.assertThrows(org.junit.Assert.assertThrows) IsEmptyCollection.empty(org.hamcrest.collection.IsEmptyCollection.empty) CompletableFuture(java.util.concurrent.CompletableFuture) BeamFnStateGrpcClientCache(org.apache.beam.fn.harness.state.BeamFnStateGrpcClientCache) PipelineOptionsFactory(org.apache.beam.sdk.options.PipelineOptionsFactory) BeamFnDataReadRunner(org.apache.beam.fn.harness.BeamFnDataReadRunner) StringUtf8Coder(org.apache.beam.sdk.coders.StringUtf8Coder) HashSet(java.util.HashSet) CacheToken(org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleRequest.CacheToken) ParDoPayload(org.apache.beam.model.pipeline.v1.RunnerApi.ParDoPayload) InstructionResponse(org.apache.beam.model.fnexecution.v1.BeamFnApi.InstructionResponse) ByteString(org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString) ParDoTranslation(org.apache.beam.runners.core.construction.ParDoTranslation) Assert.assertNotNull(org.junit.Assert.assertNotNull) Mockito.when(org.mockito.Mockito.when) JUnit4(org.junit.runners.JUnit4) Mockito.verify(org.mockito.Mockito.verify) TimeUnit(java.util.concurrent.TimeUnit) Mockito(org.mockito.Mockito) Matchers.emptyIterable(org.hamcrest.Matchers.emptyIterable) Instant(org.joda.time.Instant) Caches(org.apache.beam.fn.harness.Caches) CallbackRegistration(org.apache.beam.fn.harness.control.FinalizeBundleHandler.CallbackRegistration) Collections(java.util.Collections) BeamFnDataOutboundAggregator(org.apache.beam.sdk.fn.data.BeamFnDataOutboundAggregator) StreamObserver(org.apache.beam.vendor.grpc.v1p43p2.io.grpc.stub.StreamObserver) ProcessBundleDescriptor(org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleDescriptor) ParDoPayload(org.apache.beam.model.pipeline.v1.RunnerApi.ParDoPayload) BeamFnApi(org.apache.beam.model.fnexecution.v1.BeamFnApi) ProcessBundleDescriptor(org.apache.beam.model.fnexecution.v1.BeamFnApi.ProcessBundleDescriptor) BundleProcessorCache(org.apache.beam.fn.harness.control.ProcessBundleHandler.BundleProcessorCache) FunctionSpec(org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec) DoFnWithExecutionInformation(org.apache.beam.sdk.util.DoFnWithExecutionInformation) ByteString(org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString) Elements(org.apache.beam.model.fnexecution.v1.BeamFnApi.Elements) ShortIdMap(org.apache.beam.runners.core.metrics.ShortIdMap) RunnerApi(org.apache.beam.model.pipeline.v1.RunnerApi) PTransformRunnerFactory(org.apache.beam.fn.harness.PTransformRunnerFactory) Timers(org.apache.beam.model.fnexecution.v1.BeamFnApi.Elements.Timers)

Aggregations

RunnerApi (org.apache.beam.model.pipeline.v1.RunnerApi)48 Coder (org.apache.beam.sdk.coders.Coder)33 WindowedValue (org.apache.beam.sdk.util.WindowedValue)32 KvCoder (org.apache.beam.sdk.coders.KvCoder)30 Test (org.junit.Test)30 Map (java.util.Map)23 ByteString (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString)23 HashMap (java.util.HashMap)21 KV (org.apache.beam.sdk.values.KV)20 ArrayList (java.util.ArrayList)19 IOException (java.io.IOException)18 StringUtf8Coder (org.apache.beam.sdk.coders.StringUtf8Coder)17 List (java.util.List)16 ExecutableStage (org.apache.beam.runners.core.construction.graph.ExecutableStage)16 BoundedWindow (org.apache.beam.sdk.transforms.windowing.BoundedWindow)15 ImmutableMap (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap)15 Collection (java.util.Collection)13 Pipeline (org.apache.beam.sdk.Pipeline)13 FusedPipeline (org.apache.beam.runners.core.construction.graph.FusedPipeline)12 ConcurrentHashMap (java.util.concurrent.ConcurrentHashMap)11