Example 41 with Components

use of org.apache.beam.model.pipeline.v1.RunnerApi.Components in project beam by apache.

the class QueryablePipeline method buildNetwork.

private MutableNetwork<PipelineNode, PipelineEdge> buildNetwork(Collection<String> transformIds, Components components) {
    MutableNetwork<PipelineNode, PipelineEdge> network = NetworkBuilder.directed().allowsParallelEdges(true).allowsSelfLoops(false).build();
    Set<PCollectionNode> unproducedCollections = new HashSet<>();
    for (String transformId : transformIds) {
        PTransform transform = components.getTransformsOrThrow(transformId);
        PTransformNode transformNode = PipelineNode.pTransform(transformId, transform);
        network.addNode(transformNode);
        for (String produced : transform.getOutputsMap().values()) {
            PCollectionNode producedNode = PipelineNode.pCollection(produced, components.getPcollectionsOrThrow(produced));
            network.addNode(producedNode);
            network.addEdge(transformNode, producedNode, new PerElementEdge());
            checkArgument(network.inDegree(producedNode) == 1, "A %s should have exactly one producing %s, but found %s:\nPCollection:\n%s\nProducers:\n%s", PCollectionNode.class.getSimpleName(), PTransformNode.class.getSimpleName(), network.predecessors(producedNode).size(), producedNode, network.predecessors(producedNode));
            unproducedCollections.remove(producedNode);
        }
        for (Map.Entry<String, String> consumed : transform.getInputsMap().entrySet()) {
            // This loop may add an edge between the consumed PCollection and the current PTransform.
            // The local name of the transform must be used to determine the type of edge.
            String pcollectionId = consumed.getValue();
            PCollectionNode consumedNode = PipelineNode.pCollection(pcollectionId, components.getPcollectionsOrThrow(pcollectionId));
            if (network.addNode(consumedNode)) {
                // This node has been added to the network for the first time, so it has no producer.
                unproducedCollections.add(consumedNode);
            }
            if (getLocalSideInputNames(transform).contains(consumed.getKey())) {
                network.addEdge(consumedNode, transformNode, new SingletonEdge());
            } else {
                network.addEdge(consumedNode, transformNode, new PerElementEdge());
            }
        }
    }
    checkArgument(unproducedCollections.isEmpty(), "%ss %s were consumed but never produced", PCollectionNode.class.getSimpleName(), unproducedCollections);
    return network;
}
Also used : PTransformNode(org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode) PCollectionNode(org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode) Map(java.util.Map) HashSet(java.util.HashSet) LinkedHashSet(java.util.LinkedHashSet) PTransform(org.apache.beam.model.pipeline.v1.RunnerApi.PTransform)
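
The two checks in buildNetwork (every PCollection has exactly one producing transform, and every consumed PCollection is produced by some transform) can be illustrated without Beam or Guava on the classpath. The following is a minimal sketch under assumptions, not the Beam implementation: `ProducerCheckSketch` and `validate` are hypothetical names, and plain maps from transform id to PCollection ids stand in for `RunnerApi.Components` and the `MutableNetwork`.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the producer/consumer checks performed in buildNetwork.
final class ProducerCheckSketch {

    // Returns a map from PCollection id to the id of its single producing transform.
    // Throws IllegalArgumentException if a PCollection has more than one producer,
    // or if a PCollection is consumed but never produced.
    static Map<String, String> validate(
            Map<String, List<String>> outputsByTransform,
            Map<String, List<String>> inputsByTransform) {
        Map<String, String> producerOf = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : outputsByTransform.entrySet()) {
            for (String pcollection : entry.getValue()) {
                String previous = producerOf.put(pcollection, entry.getKey());
                if (previous != null) {
                    throw new IllegalArgumentException(
                            "PCollection " + pcollection + " has more than one producer: "
                                    + previous + ", " + entry.getKey());
                }
            }
        }
        // Collect everything that is consumed, then drop whatever has a producer.
        Set<String> unproduced = new LinkedHashSet<>();
        for (List<String> inputs : inputsByTransform.values()) {
            unproduced.addAll(inputs);
        }
        unproduced.removeAll(producerOf.keySet());
        if (!unproduced.isEmpty()) {
            throw new IllegalArgumentException(
                    "PCollections " + unproduced + " were consumed but never produced");
        }
        return producerOf;
    }

    public static void main(String[] args) {
        Map<String, List<String>> outputs = new HashMap<>();
        outputs.put("read", List.of("pc1"));
        outputs.put("parse", List.of("pc2"));
        Map<String, List<String>> inputs = new HashMap<>();
        inputs.put("parse", List.of("pc1"));
        // A well-formed two-step pipeline passes both checks.
        System.out.println(validate(outputs, inputs));
    }
}
```

Unlike the original, this sketch does not distinguish side inputs from per-element inputs; it only mirrors the two `checkArgument` conditions.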

Example 42 with Components

use of org.apache.beam.model.pipeline.v1.RunnerApi.Components in project beam by apache.

the class QueryablePipeline method getPrimitiveTransformIds.

/**
 * Returns the ids of the primitive transforms in the given {@link RunnerApi.Components},
 * descending to leaf sub-transforms where a primitive transform has any.
 */
@VisibleForTesting
static Collection<String> getPrimitiveTransformIds(RunnerApi.Components components) {
    Collection<String> ids = new LinkedHashSet<>();
    for (Map.Entry<String, PTransform> transformEntry : components.getTransformsMap().entrySet()) {
        PTransform transform = transformEntry.getValue();
        boolean isPrimitive = isPrimitiveTransform(transform);
        if (isPrimitive) {
            // Sometimes "primitive" transforms have sub-transforms (and even deeper-nested
            // descendants), due to runners either rewriting them in terms of runner-specific
            // transforms, or SDKs constructing them in terms of other underlying transforms
            // (see https://issues.apache.org/jira/browse/BEAM-5441).
            // We consider any "leaf" descendants of these "primitive" transforms to be the true
            // "primitives" that we preserve here; in the common case, this is just the
            // "primitive" itself, which has no descendants.
            Deque<String> transforms = new ArrayDeque<>();
            transforms.push(transformEntry.getKey());
            while (!transforms.isEmpty()) {
                String id = transforms.pop();
                PTransform next = components.getTransformsMap().get(id);
                List<String> subtransforms = next.getSubtransformsList();
                if (subtransforms.isEmpty()) {
                    ids.add(id);
                } else {
                    transforms.addAll(subtransforms);
                }
            }
        }
    }
    return ids;
}
Also used : LinkedHashSet(java.util.LinkedHashSet) Map(java.util.Map) ArrayDeque(java.util.ArrayDeque) PTransform(org.apache.beam.model.pipeline.v1.RunnerApi.PTransform) VisibleForTesting(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting)
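
The leaf-descendant traversal described in the comment above can be tried in isolation. This is a minimal sketch under assumptions: `LeafTransformSketch` and `leafDescendants` are hypothetical names, and a plain map from transform id to child ids stands in for `PTransform.getSubtransformsList()`.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the work-list loop in getPrimitiveTransformIds.
final class LeafTransformSketch {

    // subtransforms maps a transform id to the ids of its direct children.
    // An id that is absent from the map, or maps to an empty list, is a leaf.
    static Collection<String> leafDescendants(
            String rootId, Map<String, List<String>> subtransforms) {
        Collection<String> leaves = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(rootId);
        while (!pending.isEmpty()) {
            String id = pending.pop();
            List<String> children = subtransforms.getOrDefault(id, List.of());
            if (children.isEmpty()) {
                // No sub-transforms: this is one of the "true" primitives.
                leaves.add(id);
            } else {
                // Composite: defer to its children.
                pending.addAll(children);
            }
        }
        return leaves;
    }

    public static void main(String[] args) {
        Map<String, List<String>> subtransforms = new HashMap<>();
        subtransforms.put("root", List.of("left", "right"));
        subtransforms.put("left", List.of("leaf1", "leaf2"));
        System.out.println(leafDescendants("root", subtransforms));
    }
}
```

A transform with no sub-transforms yields just itself, which matches the common case the comment mentions.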

Example 43 with Components

use of org.apache.beam.model.pipeline.v1.RunnerApi.Components in project beam by apache.

the class CreatePCollectionViewTranslationTest method testEncodedProto.

@Test
public void testEncodedProto() throws Exception {
    SdkComponents components = SdkComponents.create();
    components.registerEnvironment(Environments.createDockerEnvironment("java"));
    components.registerPCollection(testPCollection);
    AppliedPTransform<?, ?, ?> appliedPTransform = AppliedPTransform.of("foo", PValues.expandInput(testPCollection), PValues.expandOutput(createViewTransform.getView()), createViewTransform, ResourceHints.create(), p);
    FunctionSpec payload = PTransformTranslation.toProto(appliedPTransform, components).getSpec();
    // Checks that the payload is what it should be
    PCollectionView<?> deserializedView = (PCollectionView<?>) SerializableUtils.deserializeFromByteArray(payload.getPayload().toByteArray(), PCollectionView.class.getSimpleName());
    assertThat(deserializedView, Matchers.equalTo(createViewTransform.getView()));
}
Also used : PCollectionView(org.apache.beam.sdk.values.PCollectionView) CreatePCollectionView(org.apache.beam.sdk.transforms.View.CreatePCollectionView) FunctionSpec(org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec) Test(org.junit.Test)

Example 44 with Components

use of org.apache.beam.model.pipeline.v1.RunnerApi.Components in project beam by apache.

the class EnvironmentsTest method getEnvironmentPTransform.

@Test
public void getEnvironmentPTransform() throws IOException {
    Pipeline p = Pipeline.create();
    SdkComponents components = SdkComponents.create();
    Environment env = Environments.createDockerEnvironment("java");
    components.registerEnvironment(env);
    ParDoPayload payload = ParDoTranslation.translateParDo(ParDo.of(new DoFn<String, String>() {

        @ProcessElement
        public void process(ProcessContext ctxt) {
        }
    }).withOutputTags(new TupleTag<>(), TupleTagList.empty()), PCollection.createPrimitiveOutputInternal(p, WindowingStrategy.globalDefault(), IsBounded.BOUNDED, StringUtf8Coder.of()), DoFnSchemaInformation.create(), Pipeline.create(), components);
    RehydratedComponents rehydratedComponents = RehydratedComponents.forComponents(components.toComponents());
    PTransform ptransform = PTransform.newBuilder().setSpec(FunctionSpec.newBuilder().setUrn(PTransformTranslation.PAR_DO_TRANSFORM_URN).setPayload(payload.toByteString()).build()).setEnvironmentId(components.getOnlyEnvironmentId()).build();
    Environment env1 = Environments.getEnvironment(ptransform, rehydratedComponents).get();
    assertThat(env1, equalTo(components.toComponents().getEnvironmentsOrThrow(ptransform.getEnvironmentId())));
}
Also used : ParDoPayload(org.apache.beam.model.pipeline.v1.RunnerApi.ParDoPayload) Environment(org.apache.beam.model.pipeline.v1.RunnerApi.Environment) TupleTag(org.apache.beam.sdk.values.TupleTag) Pipeline(org.apache.beam.sdk.Pipeline) PTransform(org.apache.beam.model.pipeline.v1.RunnerApi.PTransform) Test(org.junit.Test)

Example 45 with Components

use of org.apache.beam.model.pipeline.v1.RunnerApi.Components in project beam by apache.

the class OutputDeduplicator method ensureSingleProducer.

/**
 * Ensure that no {@link PCollection} output by any of the {@code stages} or {@code
 * unfusedTransforms} is produced by more than one of those stages or transforms.
 *
 * <p>For each {@link PCollection} output by multiple stages and/or transforms, each producer is
 * rewritten to produce a partial {@link PCollection}, which are then flattened together via an
 * introduced Flatten node which produces the original output.
 */
static DeduplicationResult ensureSingleProducer(QueryablePipeline pipeline, Collection<ExecutableStage> stages, Collection<PTransformNode> unfusedTransforms) {
    RunnerApi.Components.Builder unzippedComponents = pipeline.getComponents().toBuilder();
    Multimap<PCollectionNode, StageOrTransform> pcollectionProducers = getProducers(pipeline, stages, unfusedTransforms);
    Multimap<StageOrTransform, PCollectionNode> requiresNewOutput = HashMultimap.create();
    // ExecutableStage must also be rewritten to have updated outputs and transforms.
    for (Map.Entry<PCollectionNode, Collection<StageOrTransform>> collectionProducer : pcollectionProducers.asMap().entrySet()) {
        if (collectionProducer.getValue().size() > 1) {
            for (StageOrTransform producer : collectionProducer.getValue()) {
                requiresNewOutput.put(producer, collectionProducer.getKey());
            }
        }
    }
    Map<ExecutableStage, ExecutableStage> updatedStages = new LinkedHashMap<>();
    Map<String, PTransformNode> updatedTransforms = new LinkedHashMap<>();
    Multimap<String, PCollectionNode> originalToPartial = HashMultimap.create();
    for (Map.Entry<StageOrTransform, Collection<PCollectionNode>> deduplicationTargets : requiresNewOutput.asMap().entrySet()) {
        if (deduplicationTargets.getKey().getStage() != null) {
            StageDeduplication deduplication = deduplicatePCollections(deduplicationTargets.getKey().getStage(), deduplicationTargets.getValue(), unzippedComponents::containsPcollections);
            for (Entry<String, PCollectionNode> originalToPartialReplacement : deduplication.getOriginalToPartialPCollections().entrySet()) {
                originalToPartial.put(originalToPartialReplacement.getKey(), originalToPartialReplacement.getValue());
                unzippedComponents.putPcollections(originalToPartialReplacement.getValue().getId(), originalToPartialReplacement.getValue().getPCollection());
            }
            updatedStages.put(deduplicationTargets.getKey().getStage(), deduplication.getUpdatedStage());
        } else if (deduplicationTargets.getKey().getTransform() != null) {
            PTransformDeduplication deduplication = deduplicatePCollections(deduplicationTargets.getKey().getTransform(), deduplicationTargets.getValue(), unzippedComponents::containsPcollections);
            for (Entry<String, PCollectionNode> originalToPartialReplacement : deduplication.getOriginalToPartialPCollections().entrySet()) {
                originalToPartial.put(originalToPartialReplacement.getKey(), originalToPartialReplacement.getValue());
                unzippedComponents.putPcollections(originalToPartialReplacement.getValue().getId(), originalToPartialReplacement.getValue().getPCollection());
            }
            updatedTransforms.put(deduplicationTargets.getKey().getTransform().getId(), deduplication.getUpdatedTransform());
        } else {
            throw new IllegalStateException(String.format("%s with no %s or %s", StageOrTransform.class.getSimpleName(), ExecutableStage.class.getSimpleName(), PTransformNode.class.getSimpleName()));
        }
    }
    Set<PTransformNode> introducedFlattens = new LinkedHashSet<>();
    for (Map.Entry<String, Collection<PCollectionNode>> partialFlattenTargets : originalToPartial.asMap().entrySet()) {
        String flattenId = SyntheticComponents.uniqueId("unzipped_flatten", unzippedComponents::containsTransforms);
        PTransform flattenPartialPCollections = createFlattenOfPartials(flattenId, partialFlattenTargets.getKey(), partialFlattenTargets.getValue());
        unzippedComponents.putTransforms(flattenId, flattenPartialPCollections);
        introducedFlattens.add(PipelineNode.pTransform(flattenId, flattenPartialPCollections));
    }
    Components components = unzippedComponents.build();
    return DeduplicationResult.of(components, introducedFlattens, updatedStages, updatedTransforms);
}
Also used : LinkedHashSet(java.util.LinkedHashSet) PTransformNode(org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode) LinkedHashMap(java.util.LinkedHashMap) SyntheticComponents(org.apache.beam.runners.core.construction.SyntheticComponents) Components(org.apache.beam.model.pipeline.v1.RunnerApi.Components) Entry(java.util.Map.Entry) PTransform(org.apache.beam.model.pipeline.v1.RunnerApi.PTransform) PCollectionNode(org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode) Collection(java.util.Collection) PCollection(org.apache.beam.model.pipeline.v1.RunnerApi.PCollection) Map(java.util.Map)
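
The rewrite described in the Javadoc of ensureSingleProducer (each producer of a shared PCollection emits a partial PCollection, and an introduced Flatten recombines the partials into the original) can be sketched with plain collections. Everything here is an assumption made for illustration: `SingleProducerSketch`, `Result`, and the `:partial:` naming scheme are hypothetical, whereas the original derives collision-free ids via `SyntheticComponents.uniqueId`. The sketch also returns the outputs of every producer, while the original reports only the stages and transforms that were updated.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the output deduplication in ensureSingleProducer.
final class SingleProducerSketch {

    static final class Result {
        // Producer id -> output PCollection ids after rewriting.
        final Map<String, List<String>> updatedOutputs = new LinkedHashMap<>();
        // Original PCollection id -> partial PCollection ids flattened into it.
        final Map<String, List<String>> flattens = new LinkedHashMap<>();
    }

    static Result ensureSingleProducer(Map<String, List<String>> outputsByProducer) {
        // Invert the input: PCollection id -> ids of everything that produces it.
        Map<String, List<String>> producers = new LinkedHashMap<>();
        outputsByProducer.forEach((producer, outputs) -> {
            for (String pcollection : outputs) {
                producers.computeIfAbsent(pcollection, k -> new ArrayList<>()).add(producer);
            }
        });

        Result result = new Result();
        outputsByProducer.forEach((producer, outputs) -> {
            List<String> rewritten = new ArrayList<>();
            for (String pcollection : outputs) {
                if (producers.get(pcollection).size() > 1) {
                    // Shared output: emit a partial and register it with the flatten.
                    String partial = pcollection + ":partial:" + producer;
                    rewritten.add(partial);
                    result.flattens
                            .computeIfAbsent(pcollection, k -> new ArrayList<>())
                            .add(partial);
                } else {
                    rewritten.add(pcollection);
                }
            }
            result.updatedOutputs.put(producer, rewritten);
        });
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        outputs.put("stageA", List.of("pc1", "pc2"));
        outputs.put("stageB", List.of("pc1"));
        Result result = ensureSingleProducer(outputs);
        System.out.println(result.updatedOutputs);
        System.out.println(result.flattens);
    }
}
```

After the rewrite, each partial has exactly one producer, and the original PCollection id is produced only by its flatten.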

Aggregations

Test (org.junit.Test) 55
Components (org.apache.beam.model.pipeline.v1.RunnerApi.Components) 49
RunnerApi (org.apache.beam.model.pipeline.v1.RunnerApi) 40
PTransform (org.apache.beam.model.pipeline.v1.RunnerApi.PTransform) 31
PTransformNode (org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode) 20
Map (java.util.Map) 16
WindowedValue (org.apache.beam.sdk.util.WindowedValue) 16
IOException (java.io.IOException) 15
PCollectionNode (org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode) 15
PCollection (org.apache.beam.model.pipeline.v1.RunnerApi.PCollection) 14
Coder (org.apache.beam.sdk.coders.Coder) 14
SdkComponents (org.apache.beam.runners.core.construction.SdkComponents) 13
Pipeline (org.apache.beam.sdk.Pipeline) 13
ByteString (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString) 12
FunctionSpec (org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec) 11
KvCoder (org.apache.beam.sdk.coders.KvCoder) 11
BoundedWindow (org.apache.beam.sdk.transforms.windowing.BoundedWindow) 11
ArrayList (java.util.ArrayList) 10
List (java.util.List) 10
Environment (org.apache.beam.model.pipeline.v1.RunnerApi.Environment) 10