
Example 76 with Coder

use of org.apache.beam.sdk.coders.Coder in project beam by apache.

the class DoFnOperator method createWrappingDoFnRunner.

// Subclasses may override this. For example, SplittableDoFnOperator does not create a
// stateful DoFn runner, because ProcessFn, which executes a splittable DoFn, does not
// follow the normal DoFn rules, and WindowDoFnOperator uses LateDataDroppingDoFnRunner.
protected DoFnRunner<InputT, OutputT> createWrappingDoFnRunner(DoFnRunner<InputT, OutputT> wrappedRunner, StepContext stepContext) {
    if (keyCoder != null) {
        StatefulDoFnRunner.CleanupTimer<InputT> cleanupTimer = new StatefulDoFnRunner.TimeInternalsCleanupTimer<InputT>(timerInternals, windowingStrategy) {

            @Override
            public void setForWindow(InputT input, BoundedWindow window) {
                if (!window.equals(GlobalWindow.INSTANCE) || usesOnWindowExpiration) {
                    // Skip setting a cleanup timer for the global window as these timers
                    // lead to potentially unbounded state growth in the runner, depending on key
                    // cardinality. Cleanup for global window will be performed upon arrival of the
                    // final watermark.
                    // In the case of OnWindowExpiration, we still set the timer.
                    super.setForWindow(input, window);
                }
            }
        };
        // The window type is not known here, so the raw Coder type is used.
        // @SuppressWarnings({"unchecked", "rawtypes"})
        Coder windowCoder = windowingStrategy.getWindowFn().windowCoder();
        @SuppressWarnings({ "unchecked", "rawtypes" })
        StatefulDoFnRunner.StateCleaner<?> stateCleaner = new StatefulDoFnRunner.StateInternalsStateCleaner<>(doFn, keyedStateInternals, windowCoder);
        return DoFnRunners.defaultStatefulDoFnRunner(doFn, getInputCoder(), wrappedRunner, stepContext, windowingStrategy, cleanupTimer, stateCleaner, true);
    } else {
        return doFnRunner;
    }
}
Also used : StructuredCoder(org.apache.beam.sdk.coders.StructuredCoder) VarIntCoder(org.apache.beam.sdk.coders.VarIntCoder) Coder(org.apache.beam.sdk.coders.Coder) StatefulDoFnRunner(org.apache.beam.runners.core.StatefulDoFnRunner) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow)
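
The guard inside setForWindow above reduces to a two-input boolean rule. The following is a minimal stand-alone sketch of that rule only, with no Beam dependency; the class name CleanupTimerSketch and the method shouldSetCleanupTimer are hypothetical and do not exist in Beam.

```java
class CleanupTimerSketch {

    // Mirrors the condition in setForWindow: a cleanup timer is set for every
    // non-global window, and for the global window only when the DoFn uses
    // OnWindowExpiration. (Hypothetical helper, not part of Beam.)
    static boolean shouldSetCleanupTimer(boolean isGlobalWindow, boolean usesOnWindowExpiration) {
        return !isGlobalWindow || usesOnWindowExpiration;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        // Print the full truth table for the two inputs.
        for (boolean global : flags) {
            for (boolean expiry : flags) {
                System.out.printf(
                    "globalWindow=%b usesOnWindowExpiration=%b -> setTimer=%b%n",
                    global, expiry, shouldSetCleanupTimer(global, expiry));
            }
        }
    }
}
```

The only combination that skips the timer is the global window without OnWindowExpiration, which is the case the original comment describes as risking unbounded state growth.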

Example 77 with Coder

use of org.apache.beam.sdk.coders.Coder in project beam by apache.

the class DataflowPipelineTranslatorTest method testStreamingGroupIntoBatchesWithShardedKeyTranslationUnifiedWorker.

@Test
public void testStreamingGroupIntoBatchesWithShardedKeyTranslationUnifiedWorker() throws Exception {
    List<String> experiments = new ArrayList<>(ImmutableList.of(GcpOptions.STREAMING_ENGINE_EXPERIMENT, GcpOptions.WINDMILL_SERVICE_EXPERIMENT, "use_runner_v2"));
    JobSpecification jobSpec = runStreamingGroupIntoBatchesAndGetJobSpec(true, experiments);
    List<Step> steps = jobSpec.getJob().getSteps();
    Step shardedStateStep = steps.get(steps.size() - 1);
    Map<String, Object> properties = shardedStateStep.getProperties();
    assertTrue(properties.containsKey(PropertyNames.USES_KEYED_STATE));
    assertTrue(properties.containsKey(PropertyNames.ALLOWS_SHARDABLE_STATE));
    assertEquals("true", getString(properties, PropertyNames.ALLOWS_SHARDABLE_STATE));
    assertTrue(properties.containsKey(PropertyNames.PRESERVES_KEYS));
    assertEquals("true", getString(properties, PropertyNames.PRESERVES_KEYS));
    // Also check that the runner proto is correctly populated.
    Map<String, RunnerApi.PTransform> transformMap = jobSpec.getPipelineProto().getComponents().getTransformsMap();
    boolean transformFound = false;
    for (Map.Entry<String, RunnerApi.PTransform> transform : transformMap.entrySet()) {
        RunnerApi.FunctionSpec spec = transform.getValue().getSpec();
        if (spec.getUrn().equals(PTransformTranslation.GROUP_INTO_BATCHES_WITH_SHARDED_KEY_URN)) {
            for (String subtransform : transform.getValue().getSubtransformsList()) {
                RunnerApi.PTransform ptransform = transformMap.get(subtransform);
                if (ptransform.getSpec().getUrn().equals(PTransformTranslation.GROUP_INTO_BATCHES_URN)) {
                    transformFound = true;
                }
            }
        }
    }
    assertTrue(transformFound);
    boolean coderFound = false;
    Map<String, RunnerApi.Coder> coderMap = jobSpec.getPipelineProto().getComponents().getCodersMap();
    for (Map.Entry<String, RunnerApi.Coder> coder : coderMap.entrySet()) {
        if (coder.getValue().getSpec().getUrn().equals(ModelCoders.SHARDED_KEY_CODER_URN)) {
            coderFound = true;
        }
    }
    assertTrue(coderFound);
}
Also used : SerializableCoder(org.apache.beam.sdk.coders.SerializableCoder) KvCoder(org.apache.beam.sdk.coders.KvCoder) VarIntCoder(org.apache.beam.sdk.coders.VarIntCoder) VoidCoder(org.apache.beam.sdk.coders.VoidCoder) Coder(org.apache.beam.sdk.coders.Coder) StringUtf8Coder(org.apache.beam.sdk.coders.StringUtf8Coder) ArrayList(java.util.ArrayList) Structs.getString(org.apache.beam.runners.dataflow.util.Structs.getString) ByteString(org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString) Step(com.google.api.services.dataflow.model.Step) RunnerApi(org.apache.beam.model.pipeline.v1.RunnerApi) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) JobSpecification(org.apache.beam.runners.dataflow.DataflowPipelineTranslator.JobSpecification) ImmutableMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) Map(java.util.Map) PTransform(org.apache.beam.sdk.transforms.PTransform) Test(org.junit.Test)

Example 78 with Coder

use of org.apache.beam.sdk.coders.Coder in project beam by apache.

the class CloudObjectTranslators method addComponents.

private static CloudObject addComponents(CloudObject base, List<? extends Coder<?>> components, SdkComponents sdkComponents) {
    if (!components.isEmpty()) {
        List<CloudObject> cloudComponents = new ArrayList<>(components.size());
        for (Coder<?> component : components) {
            cloudComponents.add(CloudObjects.asCloudObject(component, sdkComponents));
        }
        Structs.addList(base, PropertyNames.COMPONENT_ENCODINGS, cloudComponents);
    }
    return base;
}
Also used : CoGbkResultCoder(org.apache.beam.sdk.transforms.join.CoGbkResult.CoGbkResultCoder) Coder(org.apache.beam.sdk.coders.Coder) MapCoder(org.apache.beam.sdk.coders.MapCoder) IntervalWindowCoder(org.apache.beam.sdk.transforms.windowing.IntervalWindow.IntervalWindowCoder) LengthPrefixCoder(org.apache.beam.sdk.coders.LengthPrefixCoder) CustomCoder(org.apache.beam.sdk.coders.CustomCoder) KvCoder(org.apache.beam.sdk.coders.KvCoder) IterableCoder(org.apache.beam.sdk.coders.IterableCoder) FullWindowedValueCoder(org.apache.beam.sdk.util.WindowedValue.FullWindowedValueCoder) VarLongCoder(org.apache.beam.sdk.coders.VarLongCoder) UnionCoder(org.apache.beam.sdk.transforms.join.UnionCoder) ByteArrayCoder(org.apache.beam.sdk.coders.ByteArrayCoder) IterableLikeCoder(org.apache.beam.sdk.coders.IterableLikeCoder) NullableCoder(org.apache.beam.sdk.coders.NullableCoder) TimestampPrefixingWindowCoder(org.apache.beam.sdk.coders.TimestampPrefixingWindowCoder) ArrayList(java.util.ArrayList)

Example 79 with Coder

use of org.apache.beam.sdk.coders.Coder in project beam by apache.

the class DirectRunnerTest method byteArrayCountShouldSucceed.

@Test
public void byteArrayCountShouldSucceed() {
    Pipeline p = getPipeline();
    SerializableFunction<Integer, byte[]> getBytes = input -> {
        try {
            return CoderUtils.encodeToByteArray(VarIntCoder.of(), input);
        } catch (CoderException e) {
            fail("Unexpected Coder Exception " + e);
            throw new AssertionError("Unreachable");
        }
    };
    TypeDescriptor<byte[]> td = new TypeDescriptor<byte[]>() {};
    PCollection<byte[]> foos = p.apply(Create.of(1, 1, 1, 2, 2, 3)).apply(MapElements.into(td).via(getBytes));
    PCollection<byte[]> msync = p.apply(Create.of(1, -2, -8, -16)).apply(MapElements.into(td).via(getBytes));
    PCollection<byte[]> bytes = PCollectionList.of(foos).and(msync).apply(Flatten.pCollections());
    PCollection<KV<byte[], Long>> counts = bytes.apply(Count.perElement());
    PCollection<KV<Integer, Long>> countsBackToString = counts.apply(MapElements.via(new SimpleFunction<KV<byte[], Long>, KV<Integer, Long>>() {

        @Override
        public KV<Integer, Long> apply(KV<byte[], Long> input) {
            try {
                return KV.of(CoderUtils.decodeFromByteArray(VarIntCoder.of(), input.getKey()), input.getValue());
            } catch (CoderException e) {
                fail("Unexpected Coder Exception " + e);
                throw new AssertionError("Unreachable");
            }
        }
    }));
    Map<Integer, Long> expected = ImmutableMap.<Integer, Long>builder().put(1, 4L).put(2, 2L).put(3, 1L).put(-2, 1L).put(-8, 1L).put(-16, 1L).build();
    PAssert.thatMap(countsBackToString).isEqualTo(expected);
    // PAssert assertions are only evaluated when the pipeline actually runs.
    p.run();
}
Also used : Count(org.apache.beam.sdk.transforms.Count) Arrays(java.util.Arrays) SerializableCoder(org.apache.beam.sdk.coders.SerializableCoder) PBegin(org.apache.beam.sdk.values.PBegin) Matchers.isA(org.hamcrest.Matchers.isA) CoderUtils(org.apache.beam.sdk.util.CoderUtils) PipelineResult(org.apache.beam.sdk.PipelineResult) UnboundedSource(org.apache.beam.sdk.io.UnboundedSource) ListCoder(org.apache.beam.sdk.coders.ListCoder) SerializableFunction(org.apache.beam.sdk.transforms.SerializableFunction) ImmutableMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) SimpleFunction(org.apache.beam.sdk.transforms.SimpleFunction) ThrowableMessageMatcher(org.junit.internal.matchers.ThrowableMessageMatcher) Future(java.util.concurrent.Future) DirectPipelineResult(org.apache.beam.runners.direct.DirectRunner.DirectPipelineResult) PCollectionList(org.apache.beam.sdk.values.PCollectionList) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) Create(org.apache.beam.sdk.transforms.Create) Map(java.util.Map) Window(org.apache.beam.sdk.transforms.windowing.Window) Assert.fail(org.junit.Assert.fail) Flatten(org.apache.beam.sdk.transforms.Flatten) MapElements(org.apache.beam.sdk.transforms.MapElements) ConcurrentHashMap(java.util.concurrent.ConcurrentHashMap) Sum(org.apache.beam.sdk.transforms.Sum) BlockingQueue(java.util.concurrent.BlockingQueue) GenerateSequence(org.apache.beam.sdk.io.GenerateSequence) VarLongCoder(org.apache.beam.sdk.coders.VarLongCoder) Executors(java.util.concurrent.Executors) Serializable(java.io.Serializable) ArrayBlockingQueue(java.util.concurrent.ArrayBlockingQueue) CoderException(org.apache.beam.sdk.coders.CoderException) List(java.util.List) ParDo(org.apache.beam.sdk.transforms.ParDo) Matchers.equalTo(org.hamcrest.Matchers.equalTo) TypeDescriptors(org.apache.beam.sdk.values.TypeDescriptors) Optional(java.util.Optional) State(org.apache.beam.sdk.PipelineResult.State) Matchers.greaterThan(org.hamcrest.Matchers.greaterThan) 
Matchers.is(org.hamcrest.Matchers.is) GlobalWindows(org.apache.beam.sdk.transforms.windowing.GlobalWindows) KV(org.apache.beam.sdk.values.KV) TypeDescriptor(org.apache.beam.sdk.values.TypeDescriptor) AfterWatermark(org.apache.beam.sdk.transforms.windowing.AfterWatermark) Default(org.apache.beam.sdk.options.Default) Duration(org.joda.time.Duration) RunWith(org.junit.runner.RunWith) Coder(org.apache.beam.sdk.coders.Coder) Callable(java.util.concurrent.Callable) PipelineOptionsFactory(org.apache.beam.sdk.options.PipelineOptionsFactory) PTransform(org.apache.beam.sdk.transforms.PTransform) Read(org.apache.beam.sdk.io.Read) PipelineRunner(org.apache.beam.sdk.PipelineRunner) TestPipeline(org.apache.beam.sdk.testing.TestPipeline) JsonIgnore(com.fasterxml.jackson.annotation.JsonIgnore) MatcherAssert.assertThat(org.hamcrest.MatcherAssert.assertThat) Pipeline(org.apache.beam.sdk.Pipeline) NoSuchElementException(java.util.NoSuchElementException) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) ExpectedException(org.junit.rules.ExpectedException) ExecutorService(java.util.concurrent.ExecutorService) Nullable(org.checkerframework.checker.nullness.qual.Nullable) OutputStream(java.io.OutputStream) DoFn(org.apache.beam.sdk.transforms.DoFn) DisplayData(org.apache.beam.sdk.transforms.display.DisplayData) CountingSource(org.apache.beam.sdk.io.CountingSource) PDone(org.apache.beam.sdk.values.PDone) PAssert(org.apache.beam.sdk.testing.PAssert) IllegalMutationException(org.apache.beam.sdk.util.IllegalMutationException) Matchers(org.hamcrest.Matchers) IOException(java.io.IOException) Test(org.junit.Test) JUnit4(org.junit.runners.JUnit4) PCollection(org.apache.beam.sdk.values.PCollection) AtomicLong(java.util.concurrent.atomic.AtomicLong) BoundedSource(org.apache.beam.sdk.io.BoundedSource) Rule(org.junit.Rule) Preconditions.checkState(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState) 
BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) Preconditions(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions) Instant(org.joda.time.Instant) AtomicCoder(org.apache.beam.sdk.coders.AtomicCoder) VarIntCoder(org.apache.beam.sdk.coders.VarIntCoder) Assert.assertEquals(org.junit.Assert.assertEquals) InputStream(java.io.InputStream)
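
The test above counts elements whose keys are encoded byte arrays. As a rough stand-alone illustration of the same idea without Beam, the sketch below encodes integers with a base-128 varint, counts them by encoded content, and decodes the keys back. The varint layout is an assumption and may not match VarIntCoder byte for byte; the class VarIntCountSketch and its methods are hypothetical names introduced here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

class VarIntCountSketch {

    // Assumed layout: 7 payload bits per byte, high bit set when more bytes follow.
    // The int is treated as unsigned, so negative values take five bytes.
    static byte[] encodeVarInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        do {
            int b = value & 0x7F;
            value >>>= 7;
            if (value != 0) {
                b |= 0x80;
            }
            out.write(b);
        } while (value != 0);
        return out.toByteArray();
    }

    // Inverse of encodeVarInt for an array holding exactly one encoded value.
    static int decodeVarInt(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            shift += 7;
        }
        return result;
    }

    // byte[] uses identity equals/hashCode, so keys are wrapped in ByteBuffer,
    // which compares by content.
    static Map<Integer, Long> countByEncodedKey(int... values) {
        Map<ByteBuffer, Long> counts = new HashMap<>();
        for (int v : values) {
            counts.merge(ByteBuffer.wrap(encodeVarInt(v)), 1L, Long::sum);
        }
        Map<Integer, Long> decoded = new HashMap<>();
        for (Map.Entry<ByteBuffer, Long> e : counts.entrySet()) {
            decoded.put(decodeVarInt(e.getKey().array()), e.getValue());
        }
        return decoded;
    }

    public static void main(String[] args) {
        // Same inputs as the two Create.of(...) calls in the test above.
        System.out.println(countByEncodedKey(1, 1, 1, 2, 2, 3, 1, -2, -8, -16));
    }
}
```

Wrapping the array in a ByteBuffer is one way to get content-based equality in a plain HashMap; it says nothing about how a Beam runner compares encoded keys internally.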

Example 80 with Coder

use of org.apache.beam.sdk.coders.Coder in project beam by apache.

the class ParDoEvaluator method create.

public static <InputT, OutputT> ParDoEvaluator<InputT> create(EvaluationContext evaluationContext, PipelineOptions options, DirectStepContext stepContext, AppliedPTransform<?, ?, ?> application, Coder<InputT> inputCoder, WindowingStrategy<?, ? extends BoundedWindow> windowingStrategy, DoFn<InputT, OutputT> fn, StructuralKey<?> key, List<PCollectionView<?>> sideInputs, TupleTag<OutputT> mainOutputTag, List<TupleTag<?>> additionalOutputTags, Map<TupleTag<?>, PCollection<?>> outputs, DoFnSchemaInformation doFnSchemaInformation, Map<String, PCollectionView<?>> sideInputMapping, DoFnRunnerFactory<InputT, OutputT> runnerFactory) {
    BundleOutputManager outputManager = createOutputManager(evaluationContext, key, outputs);
    ReadyCheckingSideInputReader sideInputReader = evaluationContext.createSideInputReader(sideInputs);
    Map<TupleTag<?>, Coder<?>> outputCoders = outputs.entrySet().stream().collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue().getCoder()));
    PushbackSideInputDoFnRunner<InputT, OutputT> runner = runnerFactory.createRunner(options, fn, sideInputs, sideInputReader, outputManager, mainOutputTag, additionalOutputTags, stepContext, inputCoder, outputCoders, windowingStrategy, doFnSchemaInformation, sideInputMapping);
    return create(runner, stepContext, application, outputManager);
}
Also used : StatefulDoFnRunner(org.apache.beam.runners.core.StatefulDoFnRunner) UserCodeException(org.apache.beam.sdk.util.UserCodeException) PushbackSideInputDoFnRunner(org.apache.beam.runners.core.PushbackSideInputDoFnRunner) WindowedValue(org.apache.beam.sdk.util.WindowedValue) KeyedWorkItemCoder(org.apache.beam.runners.core.KeyedWorkItemCoder) DoFnRunner(org.apache.beam.runners.core.DoFnRunner) Coder(org.apache.beam.sdk.coders.Coder) HashMap(java.util.HashMap) DoFnRunners(org.apache.beam.runners.core.DoFnRunners) DoFnSchemaInformation(org.apache.beam.sdk.transforms.DoFnSchemaInformation) DoFnSignatures(org.apache.beam.sdk.transforms.reflect.DoFnSignatures) TupleTag(org.apache.beam.sdk.values.TupleTag) Map(java.util.Map) Preconditions.checkArgument(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument) TimerData(org.apache.beam.runners.core.TimerInternals.TimerData) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) AppliedPTransform(org.apache.beam.sdk.runners.AppliedPTransform) StructuralKey(org.apache.beam.runners.local.StructuralKey) DoFn(org.apache.beam.sdk.transforms.DoFn) ReadyCheckingSideInputReader(org.apache.beam.runners.core.ReadyCheckingSideInputReader) PCollection(org.apache.beam.sdk.values.PCollection) Collectors(java.util.stream.Collectors) List(java.util.List) DirectStepContext(org.apache.beam.runners.direct.DirectExecutionContext.DirectStepContext) SimplePushbackSideInputDoFnRunner(org.apache.beam.runners.core.SimplePushbackSideInputDoFnRunner) PCollectionView(org.apache.beam.sdk.values.PCollectionView) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) OutputManager(org.apache.beam.runners.core.DoFnRunners.OutputManager) ImmutableList(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList) WindowingStrategy(org.apache.beam.sdk.values.WindowingStrategy)

Aggregations

Coder (org.apache.beam.sdk.coders.Coder) 119
KvCoder (org.apache.beam.sdk.coders.KvCoder) 75
WindowedValue (org.apache.beam.sdk.util.WindowedValue) 55
StringUtf8Coder (org.apache.beam.sdk.coders.StringUtf8Coder) 44
Test (org.junit.Test) 43
HashMap (java.util.HashMap) 42
ArrayList (java.util.ArrayList) 38
Map (java.util.Map) 36
BoundedWindow (org.apache.beam.sdk.transforms.windowing.BoundedWindow) 35
List (java.util.List) 32
KV (org.apache.beam.sdk.values.KV) 30
RunnerApi (org.apache.beam.model.pipeline.v1.RunnerApi) 28
IterableCoder (org.apache.beam.sdk.coders.IterableCoder) 28
PCollection (org.apache.beam.sdk.values.PCollection) 28
TupleTag (org.apache.beam.sdk.values.TupleTag) 24
ByteString (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString) 23
IOException (java.io.IOException) 22
PCollectionView (org.apache.beam.sdk.values.PCollectionView) 22
Instant (org.joda.time.Instant) 21
WindowingStrategy (org.apache.beam.sdk.values.WindowingStrategy) 20