
Example 6 with PValue

Use of org.apache.beam.sdk.values.PValue in project beam by apache.

From the class ReplacementOutputsTest, method singletonSucceeds:

@Test
public void singletonSucceeds() {
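    // ints and replacementInts are PCollection fixtures defined elsewhere in ReplacementOutputsTest.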
    Map<PValue, ReplacementOutput> replacements = ReplacementOutputs.singleton(ints.expand(), replacementInts);
    assertThat(replacements, Matchers.<PValue>hasKey(replacementInts));
    ReplacementOutput replacement = replacements.get(replacementInts);
    Map.Entry<TupleTag<?>, PValue> taggedInts = Iterables.getOnlyElement(ints.expand().entrySet());
    assertThat(replacement.getOriginal().getTag(), Matchers.<TupleTag<?>>equalTo(taggedInts.getKey()));
    assertThat(replacement.getOriginal().getValue(), equalTo(taggedInts.getValue()));
    assertThat(replacement.getReplacement().getValue(), Matchers.<PValue>equalTo(replacementInts));
}
Also used: ReplacementOutput (org.apache.beam.sdk.runners.PTransformOverrideFactory.ReplacementOutput), TupleTag (org.apache.beam.sdk.values.TupleTag), PValue (org.apache.beam.sdk.values.PValue), TaggedPValue (org.apache.beam.sdk.values.TaggedPValue), ImmutableMap (com.google.common.collect.ImmutableMap), Map (java.util.Map), Test (org.junit.Test)
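
For context, ReplacementOutputs.singleton is the usual building block for a PTransformOverrideFactory.mapOutputs implementation when the replaced transform has exactly one output. A minimal sketch, assuming the Beam 2.x factory interface this test exercises (the PCollection&lt;Long&gt; output type is illustrative):

@Override
public Map<PValue, ReplacementOutput> mapOutputs(Map<TupleTag<?>, PValue> outputs, PCollection<Long> newOutput) {
    // outputs is the expansion of the original transform's single output;
    // newOutput is the corresponding value produced by the replacement.
    return ReplacementOutputs.singleton(outputs, newOutput);
}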

Example 7 with PValue

Use of org.apache.beam.sdk.values.PValue in project beam by apache.

From the class SdkComponentsTest, method translatePipeline:

@Test
public void translatePipeline() {
    BigEndianLongCoder customCoder = BigEndianLongCoder.of();
    PCollection<Long> elems = pipeline.apply(GenerateSequence.from(0L).to(207L));
    PCollection<Long> counted = elems.apply(Count.<Long>globally()).setCoder(customCoder);
    PCollection<Long> windowed = counted.apply(
        Window.<Long>into(FixedWindows.of(Duration.standardMinutes(7)))
            .triggering(AfterWatermark.pastEndOfWindow()
                .withEarlyFirings(AfterPane.elementCountAtLeast(19)))
            .accumulatingFiredPanes()
            .withAllowedLateness(Duration.standardMinutes(3L)));
    final WindowingStrategy<?, ?> windowedStrategy = windowed.getWindowingStrategy();
    PCollection<KV<String, Long>> keyed = windowed.apply(WithKeys.<String, Long>of("foo"));
    PCollection<KV<String, Iterable<Long>>> grouped = keyed.apply(GroupByKey.<String, Long>create());
    final RunnerApi.Pipeline pipelineProto = SdkComponents.translatePipeline(pipeline);
    pipeline.traverseTopologically(new PipelineVisitor.Defaults() {

        Set<Node> transforms = new HashSet<>();

        Set<PCollection<?>> pcollections = new HashSet<>();

        Set<Equivalence.Wrapper<? extends Coder<?>>> coders = new HashSet<>();

        Set<WindowingStrategy<?, ?>> windowingStrategies = new HashSet<>();

        @Override
        public void leaveCompositeTransform(Node node) {
            if (node.isRootNode()) {
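                // At the root the traversal has seen the whole graph, so the counts
                // collected below can be compared against the translated proto.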
                assertThat("Unexpected number of PTransforms", pipelineProto.getComponents().getTransformsCount(), equalTo(transforms.size()));
                assertThat("Unexpected number of PCollections", pipelineProto.getComponents().getPcollectionsCount(), equalTo(pcollections.size()));
                assertThat("Unexpected number of Coders", pipelineProto.getComponents().getCodersCount(), equalTo(coders.size()));
                assertThat("Unexpected number of Windowing Strategies", pipelineProto.getComponents().getWindowingStrategiesCount(), equalTo(windowingStrategies.size()));
            } else {
                transforms.add(node);
            }
        }

        @Override
        public void visitPrimitiveTransform(Node node) {
            transforms.add(node);
        }

        @Override
        public void visitValue(PValue value, Node producer) {
            if (value instanceof PCollection) {
                PCollection<?> pc = (PCollection<?>) value;
                pcollections.add(pc);
                addCoders(pc.getCoder());
                windowingStrategies.add(pc.getWindowingStrategy());
                addCoders(pc.getWindowingStrategy().getWindowFn().windowCoder());
            }
        }

        private void addCoders(Coder<?> coder) {
            coders.add(Equivalence.<Coder<?>>identity().wrap(coder));
            if (coder instanceof StructuredCoder) {
                for (Coder<?> component : ((StructuredCoder<?>) coder).getComponents()) {
                    addCoders(component);
                }
            }
        }
    });
}
Also used: Node (org.apache.beam.sdk.runners.TransformHierarchy.Node), WindowingStrategy (org.apache.beam.sdk.values.WindowingStrategy), RunnerApi (org.apache.beam.sdk.common.runner.v1.RunnerApi), PipelineVisitor (org.apache.beam.sdk.Pipeline.PipelineVisitor), BigEndianLongCoder (org.apache.beam.sdk.coders.BigEndianLongCoder), HashSet (java.util.HashSet), Coder (org.apache.beam.sdk.coders.Coder), SetCoder (org.apache.beam.sdk.coders.SetCoder), StringUtf8Coder (org.apache.beam.sdk.coders.StringUtf8Coder), KvCoder (org.apache.beam.sdk.coders.KvCoder), IterableCoder (org.apache.beam.sdk.coders.IterableCoder), VarLongCoder (org.apache.beam.sdk.coders.VarLongCoder), StructuredCoder (org.apache.beam.sdk.coders.StructuredCoder), ByteArrayCoder (org.apache.beam.sdk.coders.ByteArrayCoder), KV (org.apache.beam.sdk.values.KV), PValue (org.apache.beam.sdk.values.PValue), PCollection (org.apache.beam.sdk.values.PCollection), Test (org.junit.Test)
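
The traversal idiom above generalizes to any pipeline introspection. A minimal sketch, using only the visitor hooks already shown, that collects every PValue a pipeline produces:

final Set<PValue> values = new HashSet<>();
pipeline.traverseTopologically(new PipelineVisitor.Defaults() {
    @Override
    public void visitValue(PValue value, Node producer) {
        // Called once per produced value, after its producing transform is visited.
        values.add(value);
    }
});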

Example 8 with PValue

Use of org.apache.beam.sdk.values.PValue in project beam by apache.

From the class PTransformTranslationTest, method multiMultiParDo:

private static AppliedPTransform<?, ?, ?> multiMultiParDo(Pipeline pipeline) {
    PCollectionView<String> view = pipeline.apply(Create.of("foo")).apply(View.<String>asSingleton());
    PCollection<Long> input = pipeline.apply(GenerateSequence.from(0));
    ParDo.MultiOutput<Long, KV<Long, String>> parDo = ParDo.of(new TestDoFn()).withSideInputs(view).withOutputTags(new TupleTag<KV<Long, String>>() {
    }, TupleTagList.of(new TupleTag<KV<String, Long>>() {
    }));
    PCollectionTuple output = input.apply(parDo);
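    // An AppliedPTransform's inputs are the tagged expansion of the main input
    // plus any additional (side) inputs.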
    Map<TupleTag<?>, PValue> inputs = new HashMap<>();
    inputs.putAll(parDo.getAdditionalInputs());
    inputs.putAll(input.expand());
    return AppliedPTransform
        .<PCollection<Long>, PCollectionTuple, ParDo.MultiOutput<Long, KV<Long, String>>>of(
            "MultiParDoInAndOut", inputs, output.expand(), parDo, pipeline);
}
Also used: HashMap (java.util.HashMap), TupleTag (org.apache.beam.sdk.values.TupleTag), KV (org.apache.beam.sdk.values.KV), PValue (org.apache.beam.sdk.values.PValue), PCollection (org.apache.beam.sdk.values.PCollection), ParDo (org.apache.beam.sdk.transforms.ParDo), PCollectionTuple (org.apache.beam.sdk.values.PCollectionTuple)
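
Note the contract at work in inputs.putAll(input.expand()): every PValue can expand() into a map of tagged component values, and a plain PCollection expands to exactly one entry, its own tag mapped to itself. A hypothetical illustration:

PCollection<Long> nums = pipeline.apply(GenerateSequence.from(0));
Map<TupleTag<?>, PValue> expansion = nums.expand();
// expansion.size() == 1: the collection's own tag mapped to the collection itself.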

Example 9 with PValue

Use of org.apache.beam.sdk.values.PValue in project beam by apache.

From the class WatermarkManager, method refreshWatermarks:

private Set<AppliedPTransform<?, ?, ?>> refreshWatermarks(AppliedPTransform<?, ?, ?> toRefresh) {
    TransformWatermarks myWatermarks = transformToWatermarks.get(toRefresh);
    WatermarkUpdate updateResult = myWatermarks.refresh();
    if (updateResult.isAdvanced()) {
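        // The watermark advanced; every downstream consumer of this transform's
        // outputs may now be able to advance as well.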
        Set<AppliedPTransform<?, ?, ?>> additionalRefreshes = new HashSet<>();
        for (PValue outputPValue : toRefresh.getOutputs().values()) {
            additionalRefreshes.addAll(graph.getPrimitiveConsumers(outputPValue));
        }
        return additionalRefreshes;
    }
    return Collections.emptySet();
}
Also used: AppliedPTransform (org.apache.beam.sdk.runners.AppliedPTransform), PValue (org.apache.beam.sdk.values.PValue), HashSet (java.util.HashSet)
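
refreshWatermarks returns the downstream transforms whose watermarks may now advance, so a caller is expected to keep refreshing until that set drains. A hypothetical driver loop (the work queue and the rootTransforms starting set are illustrative, not WatermarkManager's actual code):

Deque<AppliedPTransform<?, ?, ?>> pending = new ArrayDeque<>(rootTransforms);
while (!pending.isEmpty()) {
    // Each refresh may unlock further refreshes downstream; enqueue them.
    pending.addAll(refreshWatermarks(pending.poll()));
}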

Example 10 with PValue

Use of org.apache.beam.sdk.values.PValue in project beam by apache.

From the class StreamingTransformTranslator, method flattenPColl:

private static <T> TransformEvaluator<Flatten.PCollections<T>> flattenPColl() {
    return new TransformEvaluator<Flatten.PCollections<T>>() {

        @SuppressWarnings("unchecked")
        @Override
        public void evaluate(Flatten.PCollections<T> transform, EvaluationContext context) {
            Map<TupleTag<?>, PValue> pcs = context.getInputs(transform);
            // Since this is a streaming pipeline, at least one of the PCollections being
            // flattened is unbounded, i.e. backed by a DStream, so the unified result may
            // itself be an unbounded DStream.
            final List<JavaDStream<WindowedValue<T>>> dStreams = new ArrayList<>();
            final List<Integer> streamingSources = new ArrayList<>();
            for (PValue pv : pcs.values()) {
                checkArgument(pv instanceof PCollection, "Flatten had non-PCollection value in input: %s of type %s", pv, pv.getClass().getSimpleName());
                PCollection<T> pcol = (PCollection<T>) pv;
                Dataset dataset = context.borrowDataset(pcol);
                if (dataset instanceof UnboundedDataset) {
                    UnboundedDataset<T> unboundedDataset = (UnboundedDataset<T>) dataset;
                    streamingSources.addAll(unboundedDataset.getStreamSources());
                    dStreams.add(unboundedDataset.getDStream());
                } else {
                    // Bounded input: wrap its single RDD in a queue-backed DStream.
                    Queue<JavaRDD<WindowedValue<T>>> q = new LinkedBlockingQueue<>();
                    q.offer(((BoundedDataset) dataset).getRDD());
                    //TODO: this is not recoverable from checkpoint!
                    JavaDStream<WindowedValue<T>> dStream = context.getStreamingContext().queueStream(q);
                    dStreams.add(dStream);
                }
            }
            // start by unifying streams into a single stream.
            JavaDStream<WindowedValue<T>> unifiedStreams = context.getStreamingContext().union(dStreams.remove(0), dStreams);
            context.putDataset(transform, new UnboundedDataset<>(unifiedStreams, streamingSources));
        }

        @Override
        public String toNativeString() {
            return "streamingContext.union(...)";
        }
    };
}
Also used: Dataset (org.apache.beam.runners.spark.translation.Dataset), BoundedDataset (org.apache.beam.runners.spark.translation.BoundedDataset), Flatten (org.apache.beam.sdk.transforms.Flatten), ArrayList (java.util.ArrayList), TupleTag (org.apache.beam.sdk.values.TupleTag), PValue (org.apache.beam.sdk.values.PValue), JavaDStream (org.apache.spark.streaming.api.java.JavaDStream), LinkedBlockingQueue (java.util.concurrent.LinkedBlockingQueue), TransformEvaluator (org.apache.beam.runners.spark.translation.TransformEvaluator), JavaRDD (org.apache.spark.api.java.JavaRDD), PCollection (org.apache.beam.sdk.values.PCollection), WindowedValue (org.apache.beam.sdk.util.WindowedValue), EvaluationContext (org.apache.beam.runners.spark.translation.EvaluationContext)
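
The bounded branch above relies on a standard Spark trick: a queue-backed DStream emits each queued RDD once. A minimal standalone sketch, assuming an existing JavaStreamingContext jssc and a JavaRDD&lt;String&gt; boundedRdd:

Queue<JavaRDD<String>> queue = new LinkedBlockingQueue<>();
queue.offer(boundedRdd);
// queueStream emits one queued RDD per batch interval; with a single RDD queued,
// the stream produces it once and is then empty. As the TODO above notes, this
// wrapping is not recoverable from a checkpoint.
JavaDStream<String> oneShot = jssc.queueStream(queue);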

Aggregations

PValue (org.apache.beam.sdk.values.PValue): 28
TupleTag (org.apache.beam.sdk.values.TupleTag): 13
PCollection (org.apache.beam.sdk.values.PCollection): 12
Test (org.junit.Test): 9
TaggedPValue (org.apache.beam.sdk.values.TaggedPValue): 7
HashSet (java.util.HashSet): 5
Map (java.util.Map): 5
Node (org.apache.beam.sdk.runners.TransformHierarchy.Node): 5
WindowedValue (org.apache.beam.sdk.util.WindowedValue): 5
ImmutableMap (com.google.common.collect.ImmutableMap): 4
ReplacementOutput (org.apache.beam.sdk.runners.PTransformOverrideFactory.ReplacementOutput): 4
PTransform (org.apache.beam.sdk.transforms.PTransform): 4
PCollectionTuple (org.apache.beam.sdk.values.PCollectionTuple): 4
JavaRDD (org.apache.spark.api.java.JavaRDD): 4
DoFn (org.apache.beam.sdk.transforms.DoFn): 3
ParDo (org.apache.beam.sdk.transforms.ParDo): 3
ImmutableList (com.google.common.collect.ImmutableList): 2
HashMap (java.util.HashMap): 2
MetricsContainerStepMap (org.apache.beam.runners.core.metrics.MetricsContainerStepMap): 2
EvaluationContext (org.apache.beam.runners.spark.translation.EvaluationContext): 2