
Example 11 with PTransform

Use of org.apache.beam.sdk.transforms.PTransform in project beam by apache.

From the class ProjectionProducerVisitor, the method enterCompositeTransform:

@Override
public CompositeBehavior enterCompositeTransform(Node node) {
    PTransform<?, ?> transform = node.getTransform();
    // TODO(BEAM-13658) Support inputs other than PBegin.
    if (!node.getInputs().isEmpty()) {
        return CompositeBehavior.DO_NOT_ENTER_TRANSFORM;
    }
    if (!(transform instanceof ProjectionProducer)) {
        return CompositeBehavior.ENTER_TRANSFORM;
    }
    ProjectionProducer<PTransform<?, ?>> pushdownProjector = (ProjectionProducer<PTransform<?, ?>>) transform;
    if (!pushdownProjector.supportsProjectionPushdown()) {
        return CompositeBehavior.ENTER_TRANSFORM;
    }
    ImmutableMap.Builder<PCollection<?>, FieldAccessDescriptor> builder = ImmutableMap.builder();
    for (PCollection<?> output : node.getOutputs().values()) {
        FieldAccessDescriptor fieldAccess = pCollectionFieldAccess.get(output);
        if (fieldAccess != null && !fieldAccess.getAllFields()) {
            builder.put(output, fieldAccess);
        }
    }
    Map<PCollection<?>, FieldAccessDescriptor> localOpportunities = builder.build();
    if (localOpportunities.isEmpty()) {
        return CompositeBehavior.ENTER_TRANSFORM;
    }
    pushdownOpportunities.put(pushdownProjector, localOpportunities);
    // If there are nested PushdownProjector implementations, apply only the outermost one.
    return CompositeBehavior.DO_NOT_ENTER_TRANSFORM;
}
Also used: PCollection (org.apache.beam.sdk.values.PCollection), FieldAccessDescriptor (org.apache.beam.sdk.schemas.FieldAccessDescriptor), ProjectionProducer (org.apache.beam.sdk.schemas.ProjectionProducer), ImmutableMap (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap), PTransform (org.apache.beam.sdk.transforms.PTransform)
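The control flow of the visitor above can be sketched independently of the Beam API. The following self-contained illustration (class, enum, and method names are invented stand-ins, not Beam's real types) shows why DO_NOT_ENTER_TRANSFORM is returned only when a pushdown-capable transform has at least one narrowed field access, which is what keeps the traversal at the outermost pushdown opportunity:

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in for Beam's CompositeBehavior decision; not the real API.
public class PushdownDecisionSketch {
    enum Behavior { ENTER_TRANSFORM, DO_NOT_ENTER_TRANSFORM }

    // Mirrors the visitor's flow: keep descending unless the transform supports
    // pushdown AND at least one output has a narrowed (non-all-fields) access.
    static Behavior decide(boolean supportsPushdown, Map<String, String> narrowedFieldAccess) {
        if (!supportsPushdown) {
            return Behavior.ENTER_TRANSFORM;
        }
        if (narrowedFieldAccess.isEmpty()) {
            return Behavior.ENTER_TRANSFORM;
        }
        // An opportunity was recorded; stop descending so only the outermost
        // pushdown-capable composite is applied.
        return Behavior.DO_NOT_ENTER_TRANSFORM;
    }

    public static void main(String[] args) {
        assert decide(false, Map.of("out", "field_a")) == Behavior.ENTER_TRANSFORM;
        assert decide(true, Collections.emptyMap()) == Behavior.ENTER_TRANSFORM;
        assert decide(true, Map.of("out", "field_a")) == Behavior.DO_NOT_ENTER_TRANSFORM;
    }
}
```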

Example 12 with PTransform

Use of org.apache.beam.sdk.transforms.PTransform in project beam by apache.

From the class FlinkStreamingTransformTranslatorsTest, the method applyReadSourceTransform:

private Object applyReadSourceTransform(PTransform<?, ?> transform, PCollection.IsBounded isBounded, StreamExecutionEnvironment env) {
    FlinkStreamingPipelineTranslator.StreamTransformTranslator<PTransform<?, ?>> translator = getReadSourceTranslator();
    FlinkStreamingTranslationContext ctx = new FlinkStreamingTranslationContext(env, PipelineOptionsFactory.create());
    Pipeline pipeline = Pipeline.create();
    PCollection<String> pc = PCollection.createPrimitiveOutputInternal(pipeline, WindowingStrategy.globalDefault(), isBounded, StringUtf8Coder.of());
    pc.setName("output");
    Map<TupleTag<?>, PValue> outputs = new HashMap<>();
    outputs.put(new TupleTag<>(), pc);
    AppliedPTransform<?, ?, ?> appliedTransform = AppliedPTransform.of("test-transform", Collections.emptyMap(), PValues.fullyExpand(outputs), transform, ResourceHints.create(), Pipeline.create());
    ctx.setCurrentTransform(appliedTransform);
    translator.translateNode(transform, ctx);
    return ctx.getInputDataStream(pc).getTransformation();
}
Also used: HashMap (java.util.HashMap), TupleTag (org.apache.beam.sdk.values.TupleTag), PValue (org.apache.beam.sdk.values.PValue), Pipeline (org.apache.beam.sdk.Pipeline), PTransform (org.apache.beam.sdk.transforms.PTransform), AppliedPTransform (org.apache.beam.sdk.runners.AppliedPTransform)

Example 13 with PTransform

Use of org.apache.beam.sdk.transforms.PTransform in project beam by apache.

From the class DataflowRunnerTest, the test method testApplyIsScopedToExactClass:

@Test
public void testApplyIsScopedToExactClass() throws IOException {
    DataflowPipelineOptions options = buildPipelineOptions();
    Pipeline p = Pipeline.create(options);
    Create.TimestampedValues<String> transform = Create.timestamped(Arrays.asList(TimestampedValue.of("TestString", Instant.now())));
    p.apply(transform);
    CompositeTransformRecorder recorder = new CompositeTransformRecorder();
    p.traverseTopologically(recorder);
    // The recorder will also have seen a Create.Values composite as well, but we can't obtain that
    // transform.
    assertThat("Expected to have seen CreateTimestamped composite transform.", recorder.getCompositeTransforms(), hasItem(transform));
    assertThat("Expected to have two composites, CreateTimestamped and Create.Values", recorder.getCompositeTransforms(), hasItem(Matchers.<PTransform<?, ?>>isA((Class) Create.Values.class)));
}
Also used: DataflowPipelineOptions (org.apache.beam.runners.dataflow.options.DataflowPipelineOptions), Create (org.apache.beam.sdk.transforms.Create), PValues (org.apache.beam.sdk.values.PValues), Matchers.containsString (org.hamcrest.Matchers.containsString), TestPipeline (org.apache.beam.sdk.testing.TestPipeline), Pipeline (org.apache.beam.sdk.Pipeline), AppliedPTransform (org.apache.beam.sdk.runners.AppliedPTransform), PTransform (org.apache.beam.sdk.transforms.PTransform), Test (org.junit.Test), PrepareForTest (org.powermock.core.classloader.annotations.PrepareForTest)
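The CompositeTransformRecorder used in this test walks the pipeline graph via traverseTopologically and collects the composite transforms it enters. Stripped of the Beam types, the pattern is an ordinary depth-first visitor accumulating matches; the sketch below uses invented names (Node, recordComposites) purely for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CompositeRecorderSketch {
    // Minimal stand-in for a pipeline node: a transform name plus child nodes.
    static class Node {
        final String name;
        final List<Node> children;
        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
        boolean isComposite() { return !children.isEmpty(); }
    }

    // Depth-first traversal that records every composite it enters, the way the
    // recorder records composites during Pipeline.traverseTopologically.
    static List<String> recordComposites(Node root) {
        List<String> seen = new ArrayList<>();
        if (root.isComposite()) {
            seen.add(root.name);
            for (Node child : root.children) {
                seen.addAll(recordComposites(child));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Mimics the test: a CreateTimestamped composite wrapping Create.Values.
        Node pipeline = new Node("CreateTimestamped",
            new Node("Create.Values", new Node("Read")),
            new Node("ParDo"));
        List<String> composites = recordComposites(pipeline);
        assert composites.contains("CreateTimestamped");
        assert composites.contains("Create.Values");
    }
}
```

Like the test's two assertions, both the outer composite and the nested Create.Values composite are seen, while leaf (primitive) transforms are not recorded.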

Example 14 with PTransform

Use of org.apache.beam.sdk.transforms.PTransform in project beam by apache.

From the class ReadSourceTranslatorBatch, the method translateTransform:

@SuppressWarnings("unchecked")
@Override
public void translateTransform(PTransform<PBegin, PCollection<T>> transform, AbstractTranslationContext context) {
    AppliedPTransform<PBegin, PCollection<T>, PTransform<PBegin, PCollection<T>>> rootTransform = (AppliedPTransform<PBegin, PCollection<T>, PTransform<PBegin, PCollection<T>>>) context.getCurrentTransform();
    BoundedSource<T> source;
    try {
        source = ReadTranslation.boundedSourceFromTransform(rootTransform);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    SparkSession sparkSession = context.getSparkSession();
    String serializedSource = Base64Serializer.serializeUnchecked(source);
    Dataset<Row> rowDataset = sparkSession.read().format(sourceProviderClass).option(BEAM_SOURCE_OPTION, serializedSource).option(DEFAULT_PARALLELISM, String.valueOf(context.getSparkSession().sparkContext().defaultParallelism())).option(PIPELINE_OPTIONS, context.getSerializableOptions().toString()).load();
    // extract windowedValue from Row
    WindowedValue.FullWindowedValueCoder<T> windowedValueCoder = WindowedValue.FullWindowedValueCoder.of(source.getOutputCoder(), GlobalWindow.Coder.INSTANCE);
    Dataset<WindowedValue<T>> dataset = rowDataset.map(RowHelpers.extractWindowedValueFromRowMapFunction(windowedValueCoder), EncoderHelpers.fromBeamCoder(windowedValueCoder));
    PCollection<T> output = (PCollection<T>) context.getOutput();
    context.putDataset(output, dataset);
}
Also used: SparkSession (org.apache.spark.sql.SparkSession), IOException (java.io.IOException), PBegin (org.apache.beam.sdk.values.PBegin), PCollection (org.apache.beam.sdk.values.PCollection), AppliedPTransform (org.apache.beam.sdk.runners.AppliedPTransform), WindowedValue (org.apache.beam.sdk.util.WindowedValue), Row (org.apache.spark.sql.Row), PTransform (org.apache.beam.sdk.transforms.PTransform)
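Example 14 ships the BoundedSource to Spark as a Base64 string, because Spark's DataSource options only carry plain strings. Base64Serializer is Beam-internal, but the round trip can be illustrated with JDK classes alone; this sketch (class and method names invented) uses java.util.Base64 plus Java serialization as a stand-in, wrapping the checked IOException in a RuntimeException just as the translator does:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

public class Base64RoundTrip {
    // Serialize any Serializable to a Base64 string so it can be passed
    // through a string-only options map.
    static String toBase64(Serializable obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(obj);
            }
            return Base64.getEncoder().encodeToString(bos.toByteArray());
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Decode and deserialize on the other side (e.g. inside the source provider).
    static Object fromBase64(String s) {
        try {
            byte[] bytes = Base64.getDecoder().decode(s);
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                return ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = toBase64("my-source-config");
        assert fromBase64(encoded).equals("my-source-config");
    }
}
```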

Example 15 with PTransform

Use of org.apache.beam.sdk.transforms.PTransform in project beam by apache.

From the class PTransformTranslation, the method toProto:

/**
   * Translates an {@link AppliedPTransform} into a runner API proto.
   *
   * <p>Does not register the {@code appliedPTransform} within the provided {@link SdkComponents}.
   */
static RunnerApi.PTransform toProto(AppliedPTransform<?, ?, ?> appliedPTransform, List<AppliedPTransform<?, ?, ?>> subtransforms, SdkComponents components) throws IOException {
    RunnerApi.PTransform.Builder transformBuilder = RunnerApi.PTransform.newBuilder();
    for (Map.Entry<TupleTag<?>, PValue> taggedInput : appliedPTransform.getInputs().entrySet()) {
        checkArgument(taggedInput.getValue() instanceof PCollection, "Unexpected input type %s", taggedInput.getValue().getClass());
        transformBuilder.putInputs(toProto(taggedInput.getKey()), components.registerPCollection((PCollection<?>) taggedInput.getValue()));
    }
    for (Map.Entry<TupleTag<?>, PValue> taggedOutput : appliedPTransform.getOutputs().entrySet()) {
        // TODO: Remove gating
        if (taggedOutput.getValue() instanceof PCollection) {
            checkArgument(taggedOutput.getValue() instanceof PCollection, "Unexpected output type %s", taggedOutput.getValue().getClass());
            transformBuilder.putOutputs(toProto(taggedOutput.getKey()), components.registerPCollection((PCollection<?>) taggedOutput.getValue()));
        }
    }
    for (AppliedPTransform<?, ?, ?> subtransform : subtransforms) {
        transformBuilder.addSubtransforms(components.getExistingPTransformId(subtransform));
    }
    transformBuilder.setUniqueName(appliedPTransform.getFullName());
    // TODO: Display Data
    PTransform<?, ?> transform = appliedPTransform.getTransform();
    if (KNOWN_PAYLOAD_TRANSLATORS.containsKey(transform.getClass())) {
        FunctionSpec payload = KNOWN_PAYLOAD_TRANSLATORS.get(transform.getClass()).translate(appliedPTransform, components);
        transformBuilder.setSpec(payload);
    }
    return transformBuilder.build();
}
Also used: PCollection (org.apache.beam.sdk.values.PCollection), FunctionSpec (org.apache.beam.sdk.common.runner.v1.RunnerApi.FunctionSpec), TupleTag (org.apache.beam.sdk.values.TupleTag), PValue (org.apache.beam.sdk.values.PValue), ImmutableMap (com.google.common.collect.ImmutableMap), Map (java.util.Map), PTransform (org.apache.beam.sdk.transforms.PTransform), AppliedPTransform (org.apache.beam.sdk.runners.AppliedPTransform)
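A detail worth noting in toProto: inputs that are not PCollections fail checkArgument, while non-PCollection outputs are silently skipped (the gated branch). That asymmetry can be sketched without the proto builder; everything below (PValueLike, registerOutputs, etc.) is an invented stand-in, not Beam's API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ProtoTranslationSketch {
    // Hypothetical value hierarchy: only "PCollection"-like values are registered.
    interface PValueLike {}
    static class PCollectionLike implements PValueLike {
        final String id;
        PCollectionLike(String id) { this.id = id; }
    }
    static class OtherValue implements PValueLike {}

    // Mirrors the output loop in toProto: non-PCollection outputs are skipped
    // rather than rejected, so only PCollection outputs reach the proto.
    static Map<String, String> registerOutputs(Map<String, PValueLike> tagged) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, PValueLike> e : tagged.entrySet()) {
            if (e.getValue() instanceof PCollectionLike) {
                out.put(e.getKey(), ((PCollectionLike) e.getValue()).id);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, PValueLike> tagged = new LinkedHashMap<>();
        tagged.put("main", new PCollectionLike("pc1"));
        tagged.put("side", new OtherValue());
        Map<String, String> result = registerOutputs(tagged);
        assert result.size() == 1;
        assert result.get("main").equals("pc1");
    }
}
```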

Aggregations

PTransform (org.apache.beam.sdk.transforms.PTransform): 41
PCollection (org.apache.beam.sdk.values.PCollection): 29
Test (org.junit.Test): 18
AppliedPTransform (org.apache.beam.sdk.runners.AppliedPTransform): 11
PBegin (org.apache.beam.sdk.values.PBegin): 11
IOException (java.io.IOException): 10
ArrayList (java.util.ArrayList): 10
List (java.util.List): 10
Map (java.util.Map): 10
TupleTag (org.apache.beam.sdk.values.TupleTag): 10
DoFn (org.apache.beam.sdk.transforms.DoFn): 9
Coder (org.apache.beam.sdk.coders.Coder): 8
Create (org.apache.beam.sdk.transforms.Create): 8
ParDo (org.apache.beam.sdk.transforms.ParDo): 7
PDone (org.apache.beam.sdk.values.PDone): 7
PCollectionTuple (org.apache.beam.sdk.values.PCollectionTuple): 6
Collection (java.util.Collection): 5
HashMap (java.util.HashMap): 5
Collectors.toList (java.util.stream.Collectors.toList): 5
Schema (org.apache.beam.sdk.schemas.Schema): 5