Example 21 with CloudObject

Use of org.apache.beam.runners.dataflow.util.CloudObject in project beam by apache.

From the class UserParDoFnFactory, method create.

@Override
public ParDoFn create(
        PipelineOptions options, CloudObject cloudUserFn,
        @Nullable List<SideInputInfo> sideInputInfos, TupleTag<?> mainOutputTag,
        Map<TupleTag<?>, Integer> outputTupleTagsToReceiverIndices,
        DataflowExecutionContext<?> executionContext,
        DataflowOperationContext operationContext) throws Exception {
    DoFnInstanceManager instanceManager = fnCache.get(
            operationContext.nameContext().systemName(),
            () -> DoFnInstanceManagers.cloningPool(doFnExtractor.getDoFnInfo(cloudUserFn), options));
    DoFnInfo<?, ?> doFnInfo = instanceManager.peek();
    DataflowExecutionContext.DataflowStepContext stepContext =
            executionContext.getStepContext(operationContext);
    Iterable<PCollectionView<?>> sideInputViews = doFnInfo.getSideInputViews();
    SideInputReader sideInputReader =
            executionContext.getSideInputReader(sideInputInfos, sideInputViews, operationContext);
    if (doFnInfo.getDoFn() instanceof BatchStatefulParDoOverrides.BatchStatefulDoFn) {
        // HACK: BatchStatefulDoFn is a class from DataflowRunner's overrides
        // that just instructs the worker to execute it differently. This will
        // be replaced by metadata in the Runner API payload.
        BatchStatefulParDoOverrides.BatchStatefulDoFn fn =
                (BatchStatefulParDoOverrides.BatchStatefulDoFn) doFnInfo.getDoFn();
        DoFn underlyingFn = fn.getUnderlyingDoFn();
        return new BatchModeUngroupingParDoFn(
                (BatchModeExecutionContext.StepContext) stepContext,
                new SimpleParDoFn(
                        options, DoFnInstanceManagers.singleInstance(doFnInfo.withFn(underlyingFn)),
                        sideInputReader, doFnInfo.getMainOutput(), outputTupleTagsToReceiverIndices,
                        stepContext, operationContext, doFnInfo.getDoFnSchemaInformation(),
                        doFnInfo.getSideInputMapping(), runnerFactory));
    } else if (doFnInfo.getDoFn() instanceof StreamingPCollectionViewWriterFn) {
        // HACK: StreamingPCollectionViewWriterFn is a class from
        // DataflowPipelineTranslator. Using the class as an indicator is a
        // migration path to simply having an indicator string.
        checkArgument(
                stepContext instanceof StreamingModeExecutionContext.StreamingModeStepContext,
                "stepContext must be a StreamingModeStepContext to use StreamingPCollectionViewWriterFn");
        DataflowRunner.StreamingPCollectionViewWriterFn<Object> writerFn =
                (StreamingPCollectionViewWriterFn<Object>) doFnInfo.getDoFn();
        return new StreamingPCollectionViewWriterParDoFn(
                (StreamingModeExecutionContext.StreamingModeStepContext) stepContext,
                writerFn.getView().getTagInternal(), writerFn.getDataCoder(),
                (Coder<BoundedWindow>) doFnInfo.getWindowingStrategy().getWindowFn().windowCoder());
    } else {
        return new SimpleParDoFn(
                options, instanceManager, sideInputReader, doFnInfo.getMainOutput(),
                outputTupleTagsToReceiverIndices, stepContext, operationContext,
                doFnInfo.getDoFnSchemaInformation(), doFnInfo.getSideInputMapping(), runnerFactory);
    }
}
Also used: Coder (org.apache.beam.sdk.coders.Coder), SideInputReader (org.apache.beam.runners.core.SideInputReader), BatchStatefulParDoOverrides (org.apache.beam.runners.dataflow.BatchStatefulParDoOverrides), StreamingPCollectionViewWriterFn (org.apache.beam.runners.dataflow.DataflowRunner.StreamingPCollectionViewWriterFn), PCollectionView (org.apache.beam.sdk.values.PCollectionView), DoFn (org.apache.beam.sdk.transforms.DoFn), ParDoFn (org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn), CloudObject (org.apache.beam.runners.dataflow.util.CloudObject)
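
For orientation: a CloudObject is a JSON-style dictionary (it extends the API client's GenericJson, so it is already a Map<String, Object>) tagged with a class name, and the Structs helpers read and write its fields. The sketch below is a minimal, hypothetical round trip; the "ExampleSink" class name and "filename" key are illustrative, not taken from the factory above.

import java.util.Map;
import org.apache.beam.runners.dataflow.util.CloudObject;
import org.apache.beam.runners.dataflow.util.Structs;

public class CloudObjectRoundTrip {
    public static void main(String[] args) throws Exception {
        // Build a spec keyed by an illustrative class name.
        CloudObject spec = CloudObject.forClassName("ExampleSink");
        Structs.addString(spec, "filename", "gs://my-bucket/output");

        // A CloudObject is itself the Map<String, Object> form carried in the
        // Dataflow job description; fromSpec re-wraps such a map and checks
        // that the class-name key is present.
        Map<String, Object> raw = spec;
        CloudObject parsed = CloudObject.fromSpec(raw);
        System.out.println(parsed.getClassName());                  // ExampleSink
        System.out.println(Structs.getString(parsed, "filename"));  // gs://my-bucket/output
    }
}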

Example 22 with CloudObject

Use of org.apache.beam.runners.dataflow.util.CloudObject in project beam by apache.

From the class IntrinsicMapTaskExecutorFactory, method createWriteOperation.

OperationNode createWriteOperation(ParallelInstructionNode node, PipelineOptions options, SinkFactory sinkFactory, DataflowExecutionContext executionContext, DataflowOperationContext context) throws Exception {
    ParallelInstruction instruction = node.getParallelInstruction();
    WriteInstruction write = instruction.getWrite();
    Coder<?> coder = CloudObjects.coderFromCloudObject(CloudObject.fromSpec(write.getSink().getCodec()));
    CloudObject cloudSink = CloudObject.fromSpec(write.getSink().getSpec());
    Sink<?> sink = sinkFactory.create(cloudSink, coder, options, executionContext, context);
    return OperationNode.create(WriteOperation.create(sink, EMPTY_OUTPUT_RECEIVER_ARRAY, context));
}
Also used: ParallelInstruction (com.google.api.services.dataflow.model.ParallelInstruction), CloudObject (org.apache.beam.runners.dataflow.util.CloudObject), WriteInstruction (com.google.api.services.dataflow.model.WriteInstruction)
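
The codec round trip in createWriteOperation can be exercised on its own. A hedged sketch, assuming CloudObjects.asCloudObject accepts a null SdkComponents for the built-in, non-portable coders (as the worker-side code in this version does):

import org.apache.beam.runners.dataflow.util.CloudObject;
import org.apache.beam.runners.dataflow.util.CloudObjects;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarLongCoder;

public class CoderSpecRoundTrip {
    public static void main(String[] args) {
        // Serialize a coder into the CloudObject form that travels in
        // write.getSink().getCodec() above.
        Coder<?> original = KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of());
        CloudObject cloudCoder = CloudObjects.asCloudObject(original, /* sdkComponents= */ null);

        // coderFromCloudObject is the inverse used by createWriteOperation.
        Coder<?> restored = CloudObjects.coderFromCloudObject(cloudCoder);
        System.out.println(original.equals(restored));  // true: structural equality
    }
}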

Example 23 with CloudObject

Use of org.apache.beam.runners.dataflow.util.CloudObject in project beam by apache.

From the class IsmSinkFactory, method create.

@Override
public Sink<?> create(CloudObject spec, @Nullable Coder<?> coder, @Nullable PipelineOptions options, @Nullable DataflowExecutionContext executionContext, DataflowOperationContext operationContext) throws Exception {
    options = checkArgumentNotNull(options);
    coder = checkArgumentNotNull(coder);
    // The validity of this coder is checked in detail by the typed create, below
    @SuppressWarnings("unchecked") Coder<WindowedValue<IsmRecord<Object>>> typedCoder = (Coder<WindowedValue<IsmRecord<Object>>>) coder;
    String filename = getString(spec, WorkerPropertyNames.FILENAME);
    checkArgument(typedCoder instanceof WindowedValueCoder, "%s only supports using %s but got %s.", IsmSink.class, WindowedValueCoder.class, typedCoder);
    WindowedValueCoder<IsmRecord<Object>> windowedCoder = (WindowedValueCoder<IsmRecord<Object>>) typedCoder;
    checkArgument(windowedCoder.getValueCoder() instanceof IsmRecordCoder, "%s only supports using %s but got %s.", IsmSink.class, IsmRecordCoder.class, windowedCoder.getValueCoder());
    @SuppressWarnings("unchecked") IsmRecordCoder<Object> ismCoder = (IsmRecordCoder<Object>) windowedCoder.getValueCoder();
    long bloomFilterSizeLimitBytes =
            Math.max(
                    MIN_BLOOM_FILTER_SIZE_BYTES,
                    DoubleMath.roundToLong(
                            BLOOM_FILTER_SIZE_LIMIT_MULTIPLIER
                                    * options.as(DataflowWorkerHarnessOptions.class).getWorkerCacheMb()
                                    // Note the conversion from MiB to bytes.
                                    * 1024
                                    * 1024,
                            RoundingMode.DOWN));
    return new IsmSink<>(FileSystems.matchNewResource(filename, false), ismCoder, bloomFilterSizeLimitBytes);
}
Also used: WindowedValueCoder (org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder), Coder (org.apache.beam.sdk.coders.Coder), IsmRecordCoder (org.apache.beam.runners.dataflow.internal.IsmFormat.IsmRecordCoder), IsmRecord (org.apache.beam.runners.dataflow.internal.IsmFormat.IsmRecord), Structs.getString (org.apache.beam.runners.dataflow.util.Structs.getString), WindowedValue (org.apache.beam.sdk.util.WindowedValue), CloudObject (org.apache.beam.runners.dataflow.util.CloudObject)
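
The spec consumed by this factory carries just the output path. A minimal sketch of assembling such a spec; the "IsmSink" class name is illustrative, the WorkerPropertyNames import path is assumed, and only the FILENAME property is read by create above:

import org.apache.beam.runners.dataflow.util.CloudObject;
import org.apache.beam.runners.dataflow.util.Structs;
import org.apache.beam.runners.dataflow.worker.util.WorkerPropertyNames;  // package assumed

public class IsmSinkSpecSketch {
    public static void main(String[] args) throws Exception {
        CloudObject spec = CloudObject.forClassName("IsmSink");  // illustrative name
        Structs.addString(spec, WorkerPropertyNames.FILENAME, "gs://my-bucket/shard-00000.ism");
        // This is the value IsmSinkFactory.create extracts before sizing the
        // bloom filter and constructing the IsmSink.
        System.out.println(Structs.getString(spec, WorkerPropertyNames.FILENAME));
    }
}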

Example 24 with CloudObject

Use of org.apache.beam.runners.dataflow.util.CloudObject in project beam by apache.

From the class ReaderRegistry, method create.

/**
 * Creates a {@link NativeReader} from a Dataflow API {@link Source} specification, using the
 * {@link Coder} contained in the {@link Source} specification.
 */
public NativeReader<?> create(Source cloudSource, @Nullable PipelineOptions options, @Nullable DataflowExecutionContext executionContext, DataflowOperationContext operationContext) throws Exception {
    cloudSource = CloudSourceUtils.flattenBaseSpecs(cloudSource);
    CloudObject sourceSpec = CloudObject.fromSpec(cloudSource.getSpec());
    Coder<?> coder = null;
    if (cloudSource.getCodec() != null) {
        coder = CloudObjects.coderFromCloudObject(CloudObject.fromSpec(cloudSource.getCodec()));
    }
    return create(sourceSpec, coder, options, executionContext, operationContext);
}
Also used: CloudObject (org.apache.beam.runners.dataflow.util.CloudObject)
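
A hedged sketch of the inputs this method expects, mirroring its body without invoking a registered reader (the "ExampleTextSource" name is illustrative, not a registered reader class):

import com.google.api.services.dataflow.model.Source;
import org.apache.beam.runners.dataflow.util.CloudObject;
import org.apache.beam.runners.dataflow.util.CloudObjects;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.StringUtf8Coder;

public class SourceSpecInspection {
    public static void main(String[] args) {
        Source cloudSource = new Source();
        cloudSource.setSpec(CloudObject.forClassName("ExampleTextSource"));
        cloudSource.setCodec(CloudObjects.asCloudObject(StringUtf8Coder.of(), null));

        // Mirrors ReaderRegistry.create: the spec names the reader
        // implementation; the codec, when present, is decoded into the Coder
        // handed to the typed create overload.
        CloudObject sourceSpec = CloudObject.fromSpec(cloudSource.getSpec());
        Coder<?> coder = cloudSource.getCodec() == null
                ? null
                : CloudObjects.coderFromCloudObject(CloudObject.fromSpec(cloudSource.getCodec()));
        System.out.println(sourceSpec.getClassName() + " / " + coder);
    }
}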

Example 25 with CloudObject

Use of org.apache.beam.runners.dataflow.util.CloudObject in project beam by apache.

From the class PairWithConstantKeyDoFnFactory, method create.

@Override
public ParDoFn create(PipelineOptions options, CloudObject cloudUserFn, List<SideInputInfo> sideInputInfos, TupleTag<?> mainOutputTag, Map<TupleTag<?>, Integer> outputTupleTagsToReceiverIndices, DataflowExecutionContext<?> executionContext, DataflowOperationContext operationContext) throws Exception {
    Coder<?> coder = CloudObjects.coderFromCloudObject(CloudObject.fromSpec(Structs.getObject(cloudUserFn, PropertyNames.ENCODING)));
    Object key = CoderUtils.decodeFromByteArray(coder, Structs.getBytes(cloudUserFn, WorkerPropertyNames.ENCODED_KEY));
    return new PairWithConstantKeyParDoFn(key);
}
Also used: CloudObject (org.apache.beam.runners.dataflow.util.CloudObject)
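
Decoding here presupposes an encode side: the key's coder is stored under PropertyNames.ENCODING and the key bytes, base64-wrapped as a JSON string, under WorkerPropertyNames.ENCODED_KEY. A hedged sketch of producing such a spec (the class name is illustrative and the WorkerPropertyNames import path is assumed):

import org.apache.beam.runners.dataflow.util.CloudObject;
import org.apache.beam.runners.dataflow.util.CloudObjects;
import org.apache.beam.runners.dataflow.util.PropertyNames;
import org.apache.beam.runners.dataflow.util.Structs;
import org.apache.beam.runners.dataflow.worker.util.WorkerPropertyNames;  // package assumed
import org.apache.beam.sdk.coders.VarLongCoder;
import org.apache.beam.sdk.util.CoderUtils;
import org.apache.beam.sdk.util.StringUtils;

public class PairWithConstantKeySpecSketch {
    public static void main(String[] args) throws Exception {
        VarLongCoder keyCoder = VarLongCoder.of();
        CloudObject cloudUserFn = CloudObject.forClassName("PairWithConstantKeyDoFn");
        // The coder spec that create decodes via Structs.getObject(...).
        Structs.addObject(cloudUserFn, PropertyNames.ENCODING,
                CloudObjects.asCloudObject(keyCoder, /* sdkComponents= */ null));
        // The constant key, encoded with that coder and JSON-string wrapped,
        // matching Structs.getBytes(...) on the decode side.
        Structs.addString(cloudUserFn, WorkerPropertyNames.ENCODED_KEY,
                StringUtils.byteArrayToJsonString(CoderUtils.encodeToByteArray(keyCoder, 42L)));
        System.out.println(cloudUserFn);
    }
}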

Aggregations

CloudObject (org.apache.beam.runners.dataflow.util.CloudObject): 62 usages
ParallelInstruction (com.google.api.services.dataflow.model.ParallelInstruction): 23 usages
Test (org.junit.Test): 21 usages
Source (com.google.api.services.dataflow.model.Source): 15 usages
ParDoInstruction (com.google.api.services.dataflow.model.ParDoInstruction): 13 usages
InstructionOutput (com.google.api.services.dataflow.model.InstructionOutput): 12 usages
ParDoFn (org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn): 11 usages
OutputReceiver (org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver): 10 usages
PipelineOptions (org.apache.beam.sdk.options.PipelineOptions): 10 usages
ByteString (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString): 10 usages
ReadInstruction (com.google.api.services.dataflow.model.ReadInstruction): 9 usages
HashMap (java.util.HashMap): 9 usages
InstructionInput (com.google.api.services.dataflow.model.InstructionInput): 8 usages
Map (java.util.Map): 8 usages
ArrayList (java.util.ArrayList): 7 usages
Structs.getString (org.apache.beam.runners.dataflow.util.Structs.getString): 7 usages
SdkComponents (org.apache.beam.runners.core.construction.SdkComponents): 6 usages
Structs.addString (org.apache.beam.runners.dataflow.util.Structs.addString): 6 usages
ImmutableList (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList): 6 usages
List (java.util.List): 5 usages