
Example 1 with ProjectionProducer

Use of org.apache.beam.sdk.schemas.ProjectionProducer in the Apache Beam project.

Class ProjectionProducerVisitorTest, method testPushdownProducersWithMultipleOutputs_returnsMultiplePushdowns.

@Test
public void testPushdownProducersWithMultipleOutputs_returnsMultiplePushdowns() {
    Pipeline p = Pipeline.create();
    PTransform<PBegin, PCollectionTuple> source = new MultipleOutputSourceWithPushdown();
    PCollectionTuple outputs = p.apply(source);
    Map<PCollection<?>, FieldAccessDescriptor> pCollectionFieldAccess =
        ImmutableMap.of(
            outputs.get("output1"), FieldAccessDescriptor.withFieldNames("field1", "field2"),
            outputs.get("output2"), FieldAccessDescriptor.withFieldNames("field3", "field4"));
    ProjectionProducerVisitor visitor = new ProjectionProducerVisitor(pCollectionFieldAccess);
    p.traverseTopologically(visitor);
    Map<ProjectionProducer<PTransform<?, ?>>, Map<PCollection<?>, FieldAccessDescriptor>> pushdownOpportunities = visitor.getPushdownOpportunities();
    Assert.assertEquals(1, pushdownOpportunities.size());
    Map<PCollection<?>, FieldAccessDescriptor> opportunitiesForSource = pushdownOpportunities.get(source);
    Assert.assertNotNull(opportunitiesForSource);
    Assert.assertEquals(2, opportunitiesForSource.size());
    FieldAccessDescriptor fieldAccessDescriptor1 = opportunitiesForSource.get(outputs.get("output1"));
    Assert.assertNotNull(fieldAccessDescriptor1);
    Assert.assertFalse(fieldAccessDescriptor1.getAllFields());
    assertThat(fieldAccessDescriptor1.fieldNamesAccessed(), containsInAnyOrder("field1", "field2"));
    FieldAccessDescriptor fieldAccessDescriptor2 = opportunitiesForSource.get(outputs.get("output2"));
    Assert.assertNotNull(fieldAccessDescriptor2);
    Assert.assertFalse(fieldAccessDescriptor2.getAllFields());
    assertThat(fieldAccessDescriptor2.fieldNamesAccessed(), containsInAnyOrder("field3", "field4"));
}
Also used : PCollection(org.apache.beam.sdk.values.PCollection) FieldAccessDescriptor(org.apache.beam.sdk.schemas.FieldAccessDescriptor) ProjectionProducer(org.apache.beam.sdk.schemas.ProjectionProducer) PCollectionTuple(org.apache.beam.sdk.values.PCollectionTuple) PBegin(org.apache.beam.sdk.values.PBegin) ImmutableMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) Map(java.util.Map) Pipeline(org.apache.beam.sdk.Pipeline) Test(org.junit.Test)

Example 2 with ProjectionProducer

Use of org.apache.beam.sdk.schemas.ProjectionProducer in the Apache Beam project.

Class PCollectionOutputTagVisitor, method visitValue.

@Override
public void visitValue(PValue value, Node producer) {
    for (Entry<ProjectionProducer<PTransform<?, ?>>, Map<PCollection<?>, FieldAccessDescriptor>> entry : pCollFieldAccess.entrySet()) {
        FieldAccessDescriptor fieldAccess = entry.getValue().get(value);
        if (fieldAccess == null) {
            continue;
        }
        BiMap<PCollection<?>, TupleTag<?>> outputs = ImmutableBiMap.copyOf(producer.getOutputs()).inverse();
        TupleTag<?> tag = outputs.get(value);
        Preconditions.checkArgumentNotNull(tag, "PCollection %s not found in outputs of producer %s", value, producer);
        ImmutableMap.Builder<TupleTag<?>, FieldAccessDescriptor> tagEntryBuilder = tagFieldAccess.build().get(entry.getKey());
        if (tagEntryBuilder == null) {
            tagEntryBuilder = ImmutableMap.builder();
            tagFieldAccess.put(entry.getKey(), tagEntryBuilder);
        }
        tagEntryBuilder.put(tag, fieldAccess);
    }
}
Also used : FieldAccessDescriptor(org.apache.beam.sdk.schemas.FieldAccessDescriptor) PCollection(org.apache.beam.sdk.values.PCollection) ProjectionProducer(org.apache.beam.sdk.schemas.ProjectionProducer) TupleTag(org.apache.beam.sdk.values.TupleTag) ImmutableMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) ImmutableBiMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableBiMap) Map(java.util.Map) BiMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.BiMap)
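
The visitValue method above finds the TupleTag for a PCollection by inverting the producer's tag-to-output map. The following is a minimal, stdlib-only sketch of that inversion. It is not the Beam implementation: plain strings stand in for TupleTag and PCollection, and the class name InverseLookupSketch and the method invert are hypothetical names chosen for illustration. The Beam code uses Guava's ImmutableBiMap instead.

```java
import java.util.HashMap;
import java.util.Map;

/** Stdlib-only sketch of inverting a one-to-one map (hypothetical helper, not a Beam class). */
final class InverseLookupSketch {

    /**
     * Builds a value -> key view of the given map.
     * Assumes non-null keys and rejects duplicate values, because an
     * inverse lookup is only well defined for a one-to-one mapping.
     */
    static <K, V> Map<V, K> invert(Map<K, V> forward) {
        Map<V, K> inverse = new HashMap<>();
        for (Map.Entry<K, V> entry : forward.entrySet()) {
            K previous = inverse.put(entry.getValue(), entry.getKey());
            if (previous != null) {
                throw new IllegalArgumentException(
                        "Value " + entry.getValue() + " is mapped from more than one key");
            }
        }
        return inverse;
    }

    public static void main(String[] args) {
        // Strings stand in for TupleTag (keys) and PCollection (values).
        Map<String, String> outputsByTag = new HashMap<>();
        outputsByTag.put("tag1", "pcollectionA");
        outputsByTag.put("tag2", "pcollectionB");

        Map<String, String> tagsByOutput = invert(outputsByTag);
        System.out.println(tagsByOutput.get("pcollectionB")); // prints "tag2"
    }
}
```

A missing entry in the inverse map corresponds to the case that visitValue guards against with Preconditions.checkArgumentNotNull.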

Example 3 with ProjectionProducer

Use of org.apache.beam.sdk.schemas.ProjectionProducer in the Apache Beam project.

Class ProjectionPushdownOptimizer, method optimize.

/**
 * Performs all known projection pushdown optimizations in-place on a Pipeline.
 *
 * <p>A pushdown optimization is possible wherever there is a {@link ProjectionProducer} that
 * produces a {@link PCollection} that is consumed by one or more PTransforms with an annotated
 * {@link FieldAccessDescriptor}, where the number of fields consumed is less than the number of
 * fields produced. The optimizer replaces the {@link ProjectionProducer} with the result of
 * calling {@link ProjectionProducer#actuateProjectionPushdown(Map)} on that producer with those
 * PCollections/fields.
 *
 * <p>Currently only supports pushdown on {@link ProjectionProducer} instances that are applied
 * directly to {@link PBegin} (https://issues.apache.org/jira/browse/BEAM-13658).
 */
public static void optimize(Pipeline pipeline) {
    // Compute which Schema fields are (or conversely, are not) accessed in a pipeline.
    FieldAccessVisitor fieldAccessVisitor = new FieldAccessVisitor();
    pipeline.traverseTopologically(fieldAccessVisitor);
    // Find transforms in this pipeline which both: 1. support projection pushdown and 2. output
    // unused fields.
    ProjectionProducerVisitor pushdownProjectorVisitor = new ProjectionProducerVisitor(fieldAccessVisitor.getPCollectionFieldAccess());
    pipeline.traverseTopologically(pushdownProjectorVisitor);
    Map<ProjectionProducer<PTransform<?, ?>>, Map<PCollection<?>, FieldAccessDescriptor>> pushdownOpportunities = pushdownProjectorVisitor.getPushdownOpportunities();
    // Translate target PCollections to their output TupleTags.
    PCollectionOutputTagVisitor outputTagVisitor = new PCollectionOutputTagVisitor(pushdownOpportunities);
    pipeline.traverseTopologically(outputTagVisitor);
    Map<ProjectionProducer<PTransform<?, ?>>, Map<TupleTag<?>, FieldAccessDescriptor>> taggedFieldAccess = outputTagVisitor.getTaggedFieldAccess();
    // Replace each eligible transform with a modified transform that omits the unused fields.
    for (Entry<ProjectionProducer<PTransform<?, ?>>, Map<TupleTag<?>, FieldAccessDescriptor>> entry : taggedFieldAccess.entrySet()) {
        for (Entry<TupleTag<?>, FieldAccessDescriptor> outputFields : entry.getValue().entrySet()) {
            LOG.info("Optimizing transform {}: output {} will contain reduced field set {}", entry.getKey(), outputFields.getKey(), outputFields.getValue().fieldNamesAccessed());
        }
        PTransformMatcher matcher = application -> application.getTransform() == entry.getKey();
        PushdownOverrideFactory<?, ?> overrideFactory = new PushdownOverrideFactory<>(entry.getValue());
        pipeline.replaceAll(ImmutableList.of(PTransformOverride.of(matcher, overrideFactory)));
    }
}
Also used : Preconditions(org.apache.beam.sdk.util.Preconditions) PBegin(org.apache.beam.sdk.values.PBegin) Logger(org.slf4j.Logger) ProjectionProducer(org.apache.beam.sdk.schemas.ProjectionProducer) LoggerFactory(org.slf4j.LoggerFactory) PTransformOverride(org.apache.beam.sdk.runners.PTransformOverride) PCollection(org.apache.beam.sdk.values.PCollection) Collectors(java.util.stream.Collectors) PTransform(org.apache.beam.sdk.transforms.PTransform) POutput(org.apache.beam.sdk.values.POutput) PTransformOverrideFactory(org.apache.beam.sdk.runners.PTransformOverrideFactory) TupleTag(org.apache.beam.sdk.values.TupleTag) Map(java.util.Map) FieldAccessDescriptor(org.apache.beam.sdk.schemas.FieldAccessDescriptor) Iterables(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables) Entry(java.util.Map.Entry) TaggedPValue(org.apache.beam.sdk.values.TaggedPValue) ImmutableList(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList) Pipeline(org.apache.beam.sdk.Pipeline) PTransformMatcher(org.apache.beam.sdk.runners.PTransformMatcher) SimpleEntry(java.util.AbstractMap.SimpleEntry) AppliedPTransform(org.apache.beam.sdk.runners.AppliedPTransform)
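
The Javadoc of optimize states that a pushdown is possible where the fields consumed from an output are fewer than the fields produced. A simplified, Beam-free sketch of that condition follows. Everything in it is an assumption made for illustration: output names and field names are plain strings, and PushdownSketch, findOpportunities, and fields are hypothetical names, not part of the Beam API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Simplified model of finding projection pushdown opportunities (not a Beam class). */
final class PushdownSketch {

    /**
     * Returns, per output name, the accessed field set whenever it is a
     * strictly smaller subset of the produced field set. Outputs without
     * field-access information are skipped, since nothing is known about
     * what their consumers read.
     */
    static Map<String, Set<String>> findOpportunities(
            Map<String, Set<String>> producedFields,
            Map<String, Set<String>> accessedFields) {
        Map<String, Set<String>> opportunities = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> output : producedFields.entrySet()) {
            Set<String> accessed = accessedFields.get(output.getKey());
            if (accessed == null) {
                continue; // no field-access information for this output
            }
            Set<String> produced = output.getValue();
            if (accessed.size() < produced.size() && produced.containsAll(accessed)) {
                opportunities.put(output.getKey(), accessed);
            }
        }
        return opportunities;
    }

    /** Small helper to build an ordered set of field names. */
    static Set<String> fields(String... names) {
        return new LinkedHashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> produced = new LinkedHashMap<>();
        produced.put("output1", fields("field1", "field2", "field3"));
        produced.put("output2", fields("field4"));

        Map<String, Set<String>> accessed = new LinkedHashMap<>();
        accessed.put("output1", fields("field1", "field2"));
        accessed.put("output2", fields("field4"));

        // Only output1 reads fewer fields than it produces.
        System.out.println(findOpportunities(produced, accessed).keySet()); // prints "[output1]"
    }
}
```

In the real optimizer the equivalent information comes from FieldAccessVisitor and ProjectionProducerVisitor, and the replacement itself is done through PTransformOverride rather than by returning a map.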

Example 4 with ProjectionProducer

Use of org.apache.beam.sdk.schemas.ProjectionProducer in the Apache Beam project.

Class ProjectionProducerVisitor, method enterCompositeTransform.

@Override
public CompositeBehavior enterCompositeTransform(Node node) {
    PTransform<?, ?> transform = node.getTransform();
    // TODO(BEAM-13658) Support inputs other than PBegin.
    if (!node.getInputs().isEmpty()) {
        return CompositeBehavior.DO_NOT_ENTER_TRANSFORM;
    }
    if (!(transform instanceof ProjectionProducer)) {
        return CompositeBehavior.ENTER_TRANSFORM;
    }
    ProjectionProducer<PTransform<?, ?>> pushdownProjector = (ProjectionProducer<PTransform<?, ?>>) transform;
    if (!pushdownProjector.supportsProjectionPushdown()) {
        return CompositeBehavior.ENTER_TRANSFORM;
    }
    ImmutableMap.Builder<PCollection<?>, FieldAccessDescriptor> builder = ImmutableMap.builder();
    for (PCollection<?> output : node.getOutputs().values()) {
        FieldAccessDescriptor fieldAccess = pCollectionFieldAccess.get(output);
        if (fieldAccess != null && !fieldAccess.getAllFields()) {
            builder.put(output, fieldAccess);
        }
    }
    Map<PCollection<?>, FieldAccessDescriptor> localOpportunities = builder.build();
    if (localOpportunities.isEmpty()) {
        return CompositeBehavior.ENTER_TRANSFORM;
    }
    pushdownOpportunities.put(pushdownProjector, localOpportunities);
    // If there are nested ProjectionProducer implementations, apply only the outermost one.
    return CompositeBehavior.DO_NOT_ENTER_TRANSFORM;
}
Also used : PCollection(org.apache.beam.sdk.values.PCollection) FieldAccessDescriptor(org.apache.beam.sdk.schemas.FieldAccessDescriptor) ProjectionProducer(org.apache.beam.sdk.schemas.ProjectionProducer) ImmutableMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) PTransform(org.apache.beam.sdk.transforms.PTransform)
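
The traversal rule in enterCompositeTransform can be summarized as: skip anything that has inputs, record the outermost producer that supports pushdown and has unused fields, and otherwise keep descending. The sketch below models that rule without Beam. The Node class, its boolean flags, and the collect method are hypothetical stand-ins for illustration only; in Beam the decisions come from node.getInputs(), supportsProjectionPushdown(), and the field-access map.

```java
import java.util.ArrayList;
import java.util.List;

/** Beam-free model of the "outermost producer wins" traversal rule. */
final class OutermostProducerSketch {

    /** Hypothetical stand-in for a pipeline node; not a Beam type. */
    static final class Node {
        final String name;
        final boolean hasInputs;        // false models a transform applied to PBegin
        final boolean supportsPushdown; // models supportsProjectionPushdown()
        final boolean hasUnusedFields;  // models a non-empty set of local opportunities
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean hasInputs, boolean supportsPushdown, boolean hasUnusedFields) {
            this.name = name;
            this.hasInputs = hasInputs;
            this.supportsPushdown = supportsPushdown;
            this.hasUnusedFields = hasUnusedFields;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Collects the names of the outermost nodes that qualify for pushdown. */
    static void collect(Node node, List<String> found) {
        if (node.hasInputs) {
            return; // corresponds to DO_NOT_ENTER_TRANSFORM for non-root transforms
        }
        if (node.supportsPushdown && node.hasUnusedFields) {
            found.add(node.name);
            return; // outermost producer recorded; nested producers are ignored
        }
        for (Node child : node.children) {
            collect(child, found); // corresponds to ENTER_TRANSFORM
        }
    }

    public static void main(String[] args) {
        Node nested = new Node("outer", false, true, true)
                .add(new Node("inner", false, true, true));
        List<String> found = new ArrayList<>();
        collect(nested, found);
        System.out.println(found); // prints "[outer]"
    }
}
```

If the outer node did not support pushdown, the traversal would descend and record the inner node instead.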

Example 5 with ProjectionProducer

Use of org.apache.beam.sdk.schemas.ProjectionProducer in the Apache Beam project.

Class ProjectionProducerVisitorTest, method testMissingFieldAccessInformation_returnsNoPushdown.

@Test
public void testMissingFieldAccessInformation_returnsNoPushdown() {
    Pipeline p = Pipeline.create();
    p.apply(new SimpleSource());
    Map<PCollection<?>, FieldAccessDescriptor> pCollectionFieldAccess = ImmutableMap.of();
    ProjectionProducerVisitor visitor = new ProjectionProducerVisitor(pCollectionFieldAccess);
    p.traverseTopologically(visitor);
    Map<ProjectionProducer<PTransform<?, ?>>, Map<PCollection<?>, FieldAccessDescriptor>> pushdownOpportunities = visitor.getPushdownOpportunities();
    Assert.assertTrue(pushdownOpportunities.isEmpty());
}
Also used : PCollection(org.apache.beam.sdk.values.PCollection) FieldAccessDescriptor(org.apache.beam.sdk.schemas.FieldAccessDescriptor) ProjectionProducer(org.apache.beam.sdk.schemas.ProjectionProducer) ImmutableMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) Map(java.util.Map) Pipeline(org.apache.beam.sdk.Pipeline) Test(org.junit.Test)
