
Example 1 with PTransform

Use of org.apache.beam.sdk.transforms.PTransform in project beam by apache.

From the class PipelineTest, method testReplaceAll.

@Test
public void testReplaceAll() {
    pipeline.enableAbandonedNodeEnforcement(false);
    pipeline.apply("unbounded", GenerateSequence.from(0));
    pipeline.apply("bounded", GenerateSequence.from(0).to(100));
    pipeline.replaceAll(ImmutableList.of(PTransformOverride.of(new PTransformMatcher() {

        @Override
        public boolean matches(AppliedPTransform<?, ?, ?> application) {
            return application.getTransform() instanceof GenerateSequence;
        }
    }, new GenerateSequenceToCreateOverride()), PTransformOverride.of(new PTransformMatcher() {

        @Override
        public boolean matches(AppliedPTransform<?, ?, ?> application) {
            return application.getTransform() instanceof Create.Values;
        }
    }, new CreateValuesToEmptyFlattenOverride())));
    pipeline.traverseTopologically(new PipelineVisitor.Defaults() {

        @Override
        public CompositeBehavior enterCompositeTransform(Node node) {
            if (!node.isRootNode()) {
                assertThat(
                    node.getTransform().getClass(),
                    not(anyOf(
                        Matchers.<Class<? extends PTransform>>equalTo(GenerateSequence.class),
                        Matchers.<Class<? extends PTransform>>equalTo(Create.Values.class))));
            }
            return CompositeBehavior.ENTER_TRANSFORM;
        }
    });
}
Also used : PTransformMatcher(org.apache.beam.sdk.runners.PTransformMatcher) Node(org.apache.beam.sdk.runners.TransformHierarchy.Node) GenerateSequence(org.apache.beam.sdk.io.GenerateSequence) Create(org.apache.beam.sdk.transforms.Create) PipelineVisitor(org.apache.beam.sdk.Pipeline.PipelineVisitor) PTransformOverride(org.apache.beam.sdk.runners.PTransformOverride) PTransform(org.apache.beam.sdk.transforms.PTransform) AppliedPTransform(org.apache.beam.sdk.runners.AppliedPTransform) Test(org.junit.Test)

Example 2 with PTransform

Use of org.apache.beam.sdk.transforms.PTransform in project beam by apache.

From the class TranslationContext, method optimizeStreams.

private void optimizeStreams(DAG.StreamMeta streamMeta, Map.Entry<PCollection, Pair<OutputPortInfo, List<InputPortInfo>>> streamEntry) {
    DAG.Locality loc = null;
    List<InputPortInfo> sinks = streamEntry.getValue().getRight();
    OutputPortInfo source = streamEntry.getValue().getLeft();
    PTransform sourceTransform = source.transform.getTransform();
    if (sourceTransform instanceof ParDo.MultiOutput || sourceTransform instanceof Window.Assign) {
        // source qualifies for chaining, check sink(s)
        for (InputPortInfo sink : sinks) {
            PTransform transform = sink.transform.getTransform();
            if (transform instanceof ParDo.MultiOutput) {
                ParDo.MultiOutput<?, ?> t = (ParDo.MultiOutput<?, ?>) transform;
                if (!t.getSideInputs().isEmpty()) {
                    loc = DAG.Locality.CONTAINER_LOCAL;
                    break;
                } else {
                    loc = DAG.Locality.THREAD_LOCAL;
                }
            } else if (transform instanceof Window.Assign) {
                loc = DAG.Locality.THREAD_LOCAL;
            } else {
                // cannot chain, if there is any other sink
                loc = null;
                break;
            }
        }
    }
    streamMeta.setLocality(loc);
}
Also used : Window(org.apache.beam.sdk.transforms.windowing.Window) ParDo(org.apache.beam.sdk.transforms.ParDo) DAG(com.datatorrent.api.DAG) PTransform(org.apache.beam.sdk.transforms.PTransform) AppliedPTransform(org.apache.beam.sdk.runners.AppliedPTransform)
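
The chaining decision in optimizeStreams can be read in isolation from Beam and Apex. The following is a minimal sketch of that decision only, written against hypothetical stand-in types (Locality, Kind and Step are illustrative names, not Beam or Apex classes): a stream is chained only when the source and every sink are a ParDo or a window assignment, and any ParDo sink with side inputs forces container-local placement.

```java
import java.util.Arrays;
import java.util.List;

public class LocalitySketch {
    // Hypothetical stand-in for the two DAG.Locality values used in the example.
    public enum Locality { THREAD_LOCAL, CONTAINER_LOCAL }

    // Hypothetical, simplified classification of a transform.
    public enum Kind { PAR_DO, WINDOW_ASSIGN, OTHER }

    // Hypothetical description of one end of a stream.
    public static final class Step {
        final Kind kind;
        final int sideInputs;

        public Step(Kind kind, int sideInputs) {
            this.kind = kind;
            this.sideInputs = sideInputs;
        }
    }

    // Returns the locality for a stream, or null when it cannot be chained.
    public static Locality chooseLocality(Step source, List<Step> sinks) {
        if (source.kind == Kind.OTHER) {
            return null; // source does not qualify for chaining
        }
        Locality loc = null;
        for (Step sink : sinks) {
            if (sink.kind == Kind.PAR_DO) {
                if (sink.sideInputs > 0) {
                    // mirrors the early break in the example: stop at the first such sink
                    return Locality.CONTAINER_LOCAL;
                }
                loc = Locality.THREAD_LOCAL;
            } else if (sink.kind == Kind.WINDOW_ASSIGN) {
                loc = Locality.THREAD_LOCAL;
            } else {
                return null; // any other sink prevents chaining
            }
        }
        return loc;
    }

    public static void main(String[] args) {
        Step parDo = new Step(Kind.PAR_DO, 0);
        Step window = new Step(Kind.WINDOW_ASSIGN, 0);
        System.out.println(chooseLocality(parDo, Arrays.asList(parDo, window))); // prints THREAD_LOCAL
    }
}
```

As in the example, the result depends on sink order: a ParDo sink with side inputs ends the scan, so later sinks are not inspected.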

Example 3 with PTransform

Use of org.apache.beam.sdk.transforms.PTransform in project components by Talend.

From the class PubSubOutputRuntimeTestIT, method outputCsv.

private void outputCsv(Pipeline pipeline) throws IOException {
    String testID = "csvBasicTest" + new Random().nextInt();
    final String fieldDelimited = ";";
    List<Person> expectedPersons = Person.genRandomList(testID, maxRecords);
    List<String> expectedMessages = new ArrayList<>();
    List<String[]> sendMessages = new ArrayList<>();
    for (Person person : expectedPersons) {
        expectedMessages.add(person.toCSV(fieldDelimited));
        sendMessages.add(person.toCSV(fieldDelimited).split(fieldDelimited));
    }
    PubSubOutputRuntime outputRuntime = new PubSubOutputRuntime();
    outputRuntime.initialize(runtimeContainer, createOutput(createDatasetFromCSV(createDatastore(), topicName, fieldDelimited)));
    PCollection<IndexedRecord> records = (PCollection<IndexedRecord>) pipeline.apply(Create.of(sendMessages)).apply((PTransform) ConvertToIndexedRecord.of());
    records.setCoder(LazyAvroCoder.of()).apply(outputRuntime);
    pipeline.run().waitUntilFinish();
    List<String> actual = new ArrayList<>();
    while (true) {
        List<ReceivedMessage> messages = client.pull(subscriptionName, maxRecords);
        List<String> ackIds = new ArrayList<>();
        for (ReceivedMessage message : messages) {
            actual.add(new String(message.getMessage().decodeData()));
            ackIds.add(message.getAckId());
        }
        client.ack(subscriptionName, ackIds);
        if (actual.size() >= maxRecords) {
            break;
        }
    }
    assertThat(actual, containsInAnyOrder(expectedMessages.toArray()));
}
Also used : ConvertToIndexedRecord(org.talend.components.adapter.beam.transform.ConvertToIndexedRecord) IndexedRecord(org.apache.avro.generic.IndexedRecord) ArrayList(java.util.ArrayList) ReceivedMessage(com.google.api.services.pubsub.model.ReceivedMessage) PCollection(org.apache.beam.sdk.values.PCollection) Random(java.util.Random) PTransform(org.apache.beam.sdk.transforms.PTransform)
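
The pull loop in outputCsv repeats until maxRecords messages have arrived, so it does not return if some messages never show up. A minimal sketch of a bounded variant follows, assuming a hypothetical pull supplier in place of client.pull and a hypothetical maxAttempts limit; neither is part of the Talend or Pub/Sub API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class BoundedPollSketch {
    // Calls pull until `expected` items have been collected or `maxAttempts`
    // pulls have been made, whichever comes first, and returns what was collected.
    public static List<String> pollUntil(Supplier<List<String>> pull, int expected, int maxAttempts) {
        List<String> received = new ArrayList<>();
        for (int attempt = 0; attempt < maxAttempts && received.size() < expected; attempt++) {
            received.addAll(pull.get());
        }
        return received;
    }

    public static void main(String[] args) {
        // Hypothetical batches standing in for successive pull results.
        Iterator<List<String>> batches = Arrays.asList(
            Arrays.asList("a", "b"),
            Arrays.asList("c")).iterator();
        System.out.println(pollUntil(batches::next, 3, 5)); // prints [a, b, c]
    }
}
```

A test using this shape would then assert on the size of the returned list, which turns a hang into an ordinary assertion failure.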

Example 4 with PTransform

Use of org.apache.beam.sdk.transforms.PTransform in project component-runtime by Talend.

From the class BeamProcessorChainImpl, method extractDoFn.

private static Collection<DoFn<?, ?>> extractDoFn(final CapturingPipeline.TransformWithCoder step, final CoderRegistry coderRegistry) {
    final CapturingPipeline capturingPipeline = new CapturingPipeline(PipelineOptionsFactory.create());
    if (coderRegistry != null) {
        capturingPipeline.setCoderRegistry(coderRegistry);
    }
    final POutput apply = capturingPipeline.apply(new PTransform<PBegin, PCollection<Object>>() {

        @Override
        public PCollection<Object> expand(final PBegin input) {
            return PCollection.createPrimitiveOutputInternal(capturingPipeline, WindowingStrategy.globalDefault(), PCollection.IsBounded.BOUNDED, TypingCoder.INSTANCE);
        }

        @Override
        protected Coder<?> getDefaultOutputCoder() {
            return TypingCoder.INSTANCE;
        }
    }).apply(step.getTransform());
    if (PCollectionTuple.class.isInstance(apply) && step.getCoders() != null) {
        final Map<TupleTag<?>, PCollection<?>> all = PCollectionTuple.class.cast(apply).getAll();
        step.getCoders().forEach((k, v) -> {
            final PCollection<?> collection = all.get(k);
            if (collection != null) {
                collection.setCoder(Coder.class.cast(v));
            }
        });
    } else if (PCollection.class.isInstance(apply) && step.getCoders() != null && !step.getCoders().isEmpty()) {
        PCollection.class.cast(apply).setCoder(Coder.class.cast(step.getCoders().values().iterator().next()));
    }
    final CapturingPipeline.SinkExtractor sinkExtractor = new CapturingPipeline.SinkExtractor();
    capturingPipeline.traverseTopologically(sinkExtractor);
    return sinkExtractor.getOutputs();
}
Also used : Coder(org.apache.beam.sdk.coders.Coder) TupleTag(org.apache.beam.sdk.values.TupleTag) PBegin(org.apache.beam.sdk.values.PBegin) PCollection(org.apache.beam.sdk.values.PCollection) POutput(org.apache.beam.sdk.values.POutput) PCollectionTuple(org.apache.beam.sdk.values.PCollectionTuple) PTransform(org.apache.beam.sdk.transforms.PTransform)

Example 5 with PTransform

Use of org.apache.beam.sdk.transforms.PTransform in project beam by apache.

From the class TransformHierarchyTest, method replaceSucceeds.

@Test
public void replaceSucceeds() {
    PTransform<?, ?> enclosingPT = new PTransform<PInput, POutput>() {

        @Override
        public POutput expand(PInput input) {
            return PDone.in(input.getPipeline());
        }
    };
    TransformHierarchy.Node enclosing = hierarchy.pushNode("Enclosing", PBegin.in(pipeline), enclosingPT);
    Create.Values<Long> originalTransform = Create.of(1L);
    TransformHierarchy.Node original = hierarchy.pushNode("Create", PBegin.in(pipeline), originalTransform);
    assertThat(hierarchy.getCurrent(), equalTo(original));
    PCollection<Long> originalOutput = pipeline.apply(originalTransform);
    hierarchy.setOutput(originalOutput);
    hierarchy.popNode();
    assertThat(original.finishedSpecifying, is(true));
    hierarchy.setOutput(PDone.in(pipeline));
    hierarchy.popNode();
    assertThat(hierarchy.getCurrent(), not(equalTo(enclosing)));
    Read.Bounded<Long> replacementTransform = Read.from(CountingSource.upTo(1L));
    PCollection<Long> replacementOutput = pipeline.apply(replacementTransform);
    Node replacement = hierarchy.replaceNode(original, PBegin.in(pipeline), replacementTransform);
    assertThat(hierarchy.getCurrent(), equalTo(replacement));
    hierarchy.setOutput(replacementOutput);
    TaggedPValue taggedReplacement = TaggedPValue.ofExpandedValue(replacementOutput);
    Map<PCollection<?>, ReplacementOutput> replacementOutputs = Collections.singletonMap(replacementOutput, ReplacementOutput.of(TaggedPValue.ofExpandedValue(originalOutput), taggedReplacement));
    hierarchy.replaceOutputs(replacementOutputs);
    assertThat(replacement.getInputs(), equalTo(original.getInputs()));
    assertThat(replacement.getEnclosingNode(), equalTo(original.getEnclosingNode()));
    assertThat(replacement.getEnclosingNode(), equalTo(enclosing));
    assertThat(replacement.getTransform(), equalTo(replacementTransform));
    // The tags of the replacement transform are matched to the appropriate PValues of the original
    assertThat(replacement.getOutputs().keySet(), Matchers.contains(taggedReplacement.getTag()));
    assertThat(replacement.getOutputs().values(), Matchers.contains(originalOutput));
    hierarchy.popNode();
}
Also used : Node(org.apache.beam.sdk.runners.TransformHierarchy.Node) PInput(org.apache.beam.sdk.values.PInput) Read(org.apache.beam.sdk.io.Read) PCollection(org.apache.beam.sdk.values.PCollection) ReplacementOutput(org.apache.beam.sdk.runners.PTransformOverrideFactory.ReplacementOutput) Create(org.apache.beam.sdk.transforms.Create) TaggedPValue(org.apache.beam.sdk.values.TaggedPValue) PTransform(org.apache.beam.sdk.transforms.PTransform) Test(org.junit.Test)

Aggregations

PTransform (org.apache.beam.sdk.transforms.PTransform) 41
PCollection (org.apache.beam.sdk.values.PCollection) 29
Test (org.junit.Test) 18
AppliedPTransform (org.apache.beam.sdk.runners.AppliedPTransform) 11
PBegin (org.apache.beam.sdk.values.PBegin) 11
IOException (java.io.IOException) 10
ArrayList (java.util.ArrayList) 10
List (java.util.List) 10
Map (java.util.Map) 10
TupleTag (org.apache.beam.sdk.values.TupleTag) 10
DoFn (org.apache.beam.sdk.transforms.DoFn) 9
Coder (org.apache.beam.sdk.coders.Coder) 8
Create (org.apache.beam.sdk.transforms.Create) 8
ParDo (org.apache.beam.sdk.transforms.ParDo) 7
PDone (org.apache.beam.sdk.values.PDone) 7
PCollectionTuple (org.apache.beam.sdk.values.PCollectionTuple) 6
Collection (java.util.Collection) 5
HashMap (java.util.HashMap) 5
Collectors.toList (java.util.stream.Collectors.toList) 5
Schema (org.apache.beam.sdk.schemas.Schema) 5