Example 31 with Coder

Use of org.apache.beam.sdk.coders.Coder in the Apache Beam project.

Class BoundedDataset, method cache.

@Override
@SuppressWarnings("unchecked")
public void cache(String storageLevel, Coder<?> coder) {
    StorageLevel level = StorageLevel.fromString(storageLevel);
    if (TranslationUtils.canAvoidRddSerialization(level)) {
        // If the storage level is memory-only, skip the overhead of encoding to bytes.
        this.rdd = getRDD().persist(level);
    } else {
        // Caching can trigger serialization, so elements are encoded to bytes first.
        // More details in https://issues.apache.org/jira/browse/BEAM-2669
        Coder<WindowedValue<T>> windowedValueCoder = (Coder<WindowedValue<T>>) coder;
        this.rdd =
            getRDD()
                .map(v -> ValueAndCoderLazySerializable.of(v, windowedValueCoder))
                .persist(level)
                .map(v -> v.getOrDecode(windowedValueCoder));
    }
}
Also used : WindowedValue(org.apache.beam.sdk.util.WindowedValue) JavaRDDLike(org.apache.spark.api.java.JavaRDDLike) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) Coder(org.apache.beam.sdk.coders.Coder) WindowFn(org.apache.beam.sdk.transforms.windowing.WindowFn) PCollection(org.apache.beam.sdk.values.PCollection) Collectors(java.util.stream.Collectors) CoderHelpers(org.apache.beam.runners.spark.coders.CoderHelpers) List(java.util.List) StorageLevel(org.apache.spark.storage.StorageLevel) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) Iterables(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables) GlobalWindows(org.apache.beam.sdk.transforms.windowing.GlobalWindows) JavaRDD(org.apache.spark.api.java.JavaRDD) Nullable(org.checkerframework.checker.nullness.qual.Nullable)
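
The else branch of cache encodes each element to bytes before persisting and decodes it when the cached data is read back. A minimal plain-Java sketch of that round trip follows, with no Spark or Beam dependency. ByteCodec, UTF8, and cacheRoundTrip are hypothetical names invented for illustration, a list stands in for the RDD, and the sketch encodes eagerly for simplicity, whereas the name ValueAndCoderLazySerializable suggests the original defers that work.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class EncodeBeforeCacheSketch {

    /** Hypothetical stand-in for a coder: turns a value into bytes and back. */
    interface ByteCodec<T> {
        byte[] encode(T value);

        T decode(byte[] bytes);
    }

    /** Example codec for strings, used only to exercise the sketch. */
    static final ByteCodec<String> UTF8 =
        new ByteCodec<String>() {
            @Override
            public byte[] encode(String value) {
                return value.getBytes(StandardCharsets.UTF_8);
            }

            @Override
            public String decode(byte[] bytes) {
                return new String(bytes, StandardCharsets.UTF_8);
            }
        };

    /** Simulates caching; a plain list stands in for the cached RDD. */
    static <T> List<T> cacheRoundTrip(List<T> values, ByteCodec<T> codec, boolean memoryOnly) {
        if (memoryOnly) {
            // Objects stay on the heap, so no encoding is needed.
            return new ArrayList<>(values);
        }
        // Encode before "persisting" ...
        List<byte[]> stored = values.stream().map(codec::encode).collect(Collectors.toList());
        // ... and decode again when the cached data is read back.
        return stored.stream().map(codec::decode).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(cacheRoundTrip(Arrays.asList("a", "b"), UTF8, false));
    }
}
```

Both paths return equal elements; the difference is only whether the stored form is the live object or its encoded bytes.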

Example 32 with Coder

Use of org.apache.beam.sdk.coders.Coder in the Apache Beam project.

Class DoFnFunction, method prepareSerialization.

/**
 * Prepares the DoFnFunction class so it can be serialized properly. This involves converting
 * its fields into protobuf messages and byte arrays, which are later converted back into the
 * proper classes during deserialization.
 */
private void prepareSerialization() {
    SdkComponents components = SdkComponents.create();
    components.registerEnvironment(Environments.createOrGetDefaultEnvironment(pipelineOptions.as(PortablePipelineOptions.class)));
    this.serializedOptions = new SerializablePipelineOptions(pipelineOptions).toString();
    doFnwithEx = ParDoTranslation.translateDoFn(this.doFn, mainOutput, sideInputMapping, doFnSchemaInformation, components);
    doFnwithExBytes = doFnwithEx.getPayload().toByteArray();
    outputCodersBytes = new HashMap<>();
    try {
        coderBytes = SerializableUtils.serializeToByteArray(inputCoder);
        windowStrategyProto = WindowingStrategyTranslation.toMessageProto(windowingStrategy, components);
        windowBytes = windowStrategyProto.toByteArray();
        for (Map.Entry<TupleTag<?>, Coder<?>> entry : outputCoders.entrySet()) {
            outputCodersBytes.put(entry.getKey().getId(), SerializableUtils.serializeToByteArray(entry.getValue()));
        }
        sideInputBytes = new HashMap<>();
        for (Map.Entry<TupleTag<?>, WindowingStrategy<?, ?>> entry : sideInputs.entrySet()) {
            windowStrategyProto = WindowingStrategyTranslation.toMessageProto(entry.getValue(), components);
            sideInputBytes.put(entry.getKey().getId(), windowStrategyProto.toByteArray());
        }
        serializedSideOutputs = new ArrayList<>();
        for (TupleTag<?> sideOutput : sideOutputs) {
            serializedSideOutputs.add(sideOutput.getId());
        }
        serializedOutputMap = new HashMap<>();
        for (Map.Entry<TupleTag<?>, Integer> entry : outputMap.entrySet()) {
            serializedOutputMap.put(entry.getKey().getId(), entry.getValue());
        }
    } catch (IOException e) {
        LOG.info(e.getMessage());
    }
}
Also used : Coder(org.apache.beam.sdk.coders.Coder) TupleTag(org.apache.beam.sdk.values.TupleTag) IOException(java.io.IOException) SdkComponents(org.apache.beam.runners.core.construction.SdkComponents) WindowingStrategy(org.apache.beam.sdk.values.WindowingStrategy) SerializablePipelineOptions(org.apache.beam.runners.core.construction.SerializablePipelineOptions) HashMap(java.util.HashMap) Map(java.util.Map)
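
prepareSerialization stores each output coder as a byte array keyed by its tag id. The sketch below shows the same shape using plain Java serialization. It is not Beam's SerializableUtils implementation, which may differ in detail; SerializeToBytesSketch, toBytes, fromBytes, and serializeAll are hypothetical names. Unlike the method above, which logs the IOException and continues, the sketch rethrows it unchecked.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

final class SerializeToBytesSketch {

    /** Serializes one value with standard Java serialization. */
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores a value previously written by toBytes. */
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Maps each id to the serialized form of its value, as the output-coder loop does. */
    static Map<String, byte[]> serializeAll(Map<String, ? extends Serializable> byId) {
        Map<String, byte[]> result = new HashMap<>();
        for (Map.Entry<String, ? extends Serializable> entry : byId.entrySet()) {
            result.put(entry.getKey(), toBytes(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("output0", "placeholder for a coder");
        Map<String, byte[]> bytes = serializeAll(values);
        System.out.println(fromBytes(bytes.get("output0")));
    }
}
```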

Example 33 with Coder

Use of org.apache.beam.sdk.coders.Coder in the Apache Beam project.

Class Twister2SideInputReader, method getMultimapSideInput.

private <T> T getMultimapSideInput(PCollectionView<T> view, BoundedWindow window) {
    Map<BoundedWindow, List<WindowedValue<?>>> partitionedElements = getPartitionedElements(view);
    Map<BoundedWindow, T> resultMap = new HashMap<>();
    ViewFn<MultimapView, T> viewFn = (ViewFn<MultimapView, T>) view.getViewFn();
    for (Map.Entry<BoundedWindow, List<WindowedValue<?>>> elements : partitionedElements.entrySet()) {
        Coder keyCoder = ((KvCoder<?, ?>) view.getCoderInternal()).getKeyCoder();
        resultMap.put(elements.getKey(), viewFn.apply(InMemoryMultimapSideInputView.fromIterable(keyCoder, (Iterable) elements.getValue().stream().map(WindowedValue::getValue).collect(Collectors.toList()))));
    }
    T result = resultMap.get(window);
    if (result == null) {
        result = viewFn.apply(InMemoryMultimapSideInputView.empty());
    }
    return result;
}
Also used : KvCoder(org.apache.beam.sdk.coders.KvCoder) Coder(org.apache.beam.sdk.coders.Coder) HashMap(java.util.HashMap) MultimapView(org.apache.beam.sdk.transforms.Materializations.MultimapView) ViewFn(org.apache.beam.sdk.transforms.ViewFn) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) ArrayList(java.util.ArrayList) List(java.util.List) Map(java.util.Map)
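
getMultimapSideInput builds one view per window and falls back to an empty view when the requested window has no elements. A simplified plain-Java sketch of that lookup pattern follows. PerWindowLookupSketch, partitionByWindow, and viewFor are hypothetical names, windows are plain strings, and a list of values stands in for the multimap view.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PerWindowLookupSketch {

    /** Groups (window, value) pairs by window, preserving arrival order per window. */
    static <W, V> Map<W, List<V>> partitionByWindow(List<Map.Entry<W, V>> elements) {
        Map<W, List<V>> byWindow = new HashMap<>();
        for (Map.Entry<W, V> element : elements) {
            byWindow.computeIfAbsent(element.getKey(), w -> new ArrayList<>()).add(element.getValue());
        }
        return byWindow;
    }

    /** Returns the values for a window, or an empty list when the window has none. */
    static <W, V> List<V> viewFor(Map<W, List<V>> byWindow, W window) {
        List<V> found = byWindow.get(window);
        return found == null ? Collections.<V>emptyList() : found;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> elements = new ArrayList<>();
        elements.add(new AbstractMap.SimpleEntry<>("w1", 1));
        elements.add(new AbstractMap.SimpleEntry<>("w1", 2));
        Map<String, List<Integer>> byWindow = partitionByWindow(elements);
        System.out.println(viewFor(byWindow, "w1"));
        System.out.println(viewFor(byWindow, "w2"));
    }
}
```

The empty fallback means callers never see null for a window that received no data.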

Example 34 with Coder

Use of org.apache.beam.sdk.coders.Coder in the Apache Beam project.

Class WriteFiles, method expand.

@Override
public WriteFilesResult<DestinationT> expand(PCollection<UserT> input) {
    if (input.isBounded() == IsBounded.UNBOUNDED) {
        checkArgument(getWindowedWrites(), "Must use windowed writes when applying %s to an unbounded PCollection", WriteFiles.class.getSimpleName());
        // Check merging window here due to https://issues.apache.org/jira/browse/BEAM-12040.
        if (input.getWindowingStrategy().needsMerge()) {
            checkArgument(getComputeNumShards() != null || getNumShardsProvider() != null, "When applying %s to an unbounded PCollection with merging windows," + " must specify number of output shards explicitly", WriteFiles.class.getSimpleName());
        }
    }
    this.writeOperation = getSink().createWriteOperation();
    if (getWindowedWrites()) {
        this.writeOperation.setWindowedWrites();
    } else {
        // Re-window the data into the global window and remove any existing triggers.
        input = input.apply("RewindowIntoGlobal", Window.<UserT>into(new GlobalWindows()).triggering(DefaultTrigger.of()).discardingFiredPanes());
    }
    Coder<DestinationT> destinationCoder;
    try {
        destinationCoder = getDynamicDestinations().getDestinationCoderWithDefault(input.getPipeline().getCoderRegistry());
        destinationCoder.verifyDeterministic();
    } catch (CannotProvideCoderException | NonDeterministicException e) {
        throw new RuntimeException(e);
    }
    @SuppressWarnings("unchecked")
    Coder<BoundedWindow> windowCoder = (Coder<BoundedWindow>) input.getWindowingStrategy().getWindowFn().windowCoder();
    FileResultCoder<DestinationT> fileResultCoder = FileResultCoder.of(windowCoder, destinationCoder);
    PCollectionView<Integer> numShardsView = (getComputeNumShards() == null) ? null : input.apply(getComputeNumShards());
    boolean fixedSharding = getComputeNumShards() != null || getNumShardsProvider() != null;
    PCollection<List<FileResult<DestinationT>>> tempFileResults;
    if (fixedSharding) {
        tempFileResults = input.apply("WriteShardedBundlesToTempFiles", new WriteShardedBundlesToTempFiles(destinationCoder, fileResultCoder, numShardsView)).apply("GatherTempFileResults", new GatherResults<>(fileResultCoder));
    } else {
        if (input.isBounded() == IsBounded.BOUNDED) {
            tempFileResults = input.apply("WriteUnshardedBundlesToTempFiles", new WriteUnshardedBundlesToTempFiles(destinationCoder, fileResultCoder)).apply("GatherTempFileResults", new GatherResults<>(fileResultCoder));
        } else {
            tempFileResults = input.apply("WriteAutoShardedBundlesToTempFiles", new WriteAutoShardedBundlesToTempFiles(destinationCoder, fileResultCoder));
        }
    }
    return tempFileResults.apply("FinalizeTempFileBundles", new FinalizeTempFileBundles(numShardsView, destinationCoder));
}
Also used : ListCoder(org.apache.beam.sdk.coders.ListCoder) KvCoder(org.apache.beam.sdk.coders.KvCoder) Coder(org.apache.beam.sdk.coders.Coder) StringUtf8Coder(org.apache.beam.sdk.coders.StringUtf8Coder) IterableCoder(org.apache.beam.sdk.coders.IterableCoder) ShardedKeyCoder(org.apache.beam.sdk.coders.ShardedKeyCoder) FileResultCoder(org.apache.beam.sdk.io.FileBasedSink.FileResultCoder) VarIntCoder(org.apache.beam.sdk.coders.VarIntCoder) GlobalWindows(org.apache.beam.sdk.transforms.windowing.GlobalWindows) NonDeterministicException(org.apache.beam.sdk.coders.Coder.NonDeterministicException) CannotProvideCoderException(org.apache.beam.sdk.coders.CannotProvideCoderException) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) PCollectionList(org.apache.beam.sdk.values.PCollectionList) List(java.util.List) TupleTagList(org.apache.beam.sdk.values.TupleTagList) ArrayList(java.util.ArrayList)
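
The branching in expand reduces to two decisions: unbounded input requires windowed writes, and the write path depends on whether sharding is fixed and whether the input is bounded. The sketch below restates only that decision logic in plain Java. ShardingChoiceSketch, Strategy, checkWindowedWrites, and choose are hypothetical names, and boolean flags replace the real PCollection and provider checks.

```java
final class ShardingChoiceSketch {

    /** The three write paths that expand selects between. */
    enum Strategy {
        SHARDED,
        UNSHARDED,
        AUTO_SHARDED
    }

    /** Mirrors the first check in expand: unbounded input must use windowed writes. */
    static void checkWindowedWrites(boolean bounded, boolean windowedWrites) {
        if (!bounded && !windowedWrites) {
            throw new IllegalArgumentException(
                "Must use windowed writes when writing an unbounded collection");
        }
    }

    /** Fixed sharding wins; otherwise boundedness decides between unsharded and auto-sharded. */
    static Strategy choose(boolean bounded, boolean fixedSharding) {
        if (fixedSharding) {
            return Strategy.SHARDED;
        }
        return bounded ? Strategy.UNSHARDED : Strategy.AUTO_SHARDED;
    }

    public static void main(String[] args) {
        checkWindowedWrites(false, true);
        System.out.println(choose(false, false));
    }
}
```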

Example 35 with Coder

Use of org.apache.beam.sdk.coders.Coder in the Apache Beam project.

Class ParDo, method codersForStateSpecTypes.

/**
 * Try to provide coders for as many of the type arguments of given {@link
 * DoFnSignature.StateDeclaration} as possible.
 */
private static <InputT> Coder[] codersForStateSpecTypes(DoFnSignature.StateDeclaration stateDeclaration, CoderRegistry coderRegistry, Coder<InputT> inputCoder) {
    Type stateType = stateDeclaration.stateType().getType();
    if (!(stateType instanceof ParameterizedType)) {
        // No type arguments means no coders to infer.
        return new Coder[0];
    }
    Type[] typeArguments = ((ParameterizedType) stateType).getActualTypeArguments();
    Coder[] coders = new Coder[typeArguments.length];
    for (int i = 0; i < typeArguments.length; i++) {
        Type typeArgument = typeArguments[i];
        TypeDescriptor<?> typeDescriptor = TypeDescriptor.of(typeArgument);
        try {
            coders[i] = coderRegistry.getCoder(typeDescriptor);
        } catch (CannotProvideCoderException e) {
            try {
                coders[i] = coderRegistry.getCoder(typeDescriptor, inputCoder.getEncodedTypeDescriptor(), inputCoder);
            } catch (CannotProvideCoderException ignored) {
                // Not all type arguments will have a registered coder, so this exception is
                // ignored and coders[i] stays null.
            }
        }
    }
    return coders;
}
Also used : ParameterizedType(java.lang.reflect.ParameterizedType) KvCoder(org.apache.beam.sdk.coders.KvCoder) SchemaCoder(org.apache.beam.sdk.schemas.SchemaCoder) Coder(org.apache.beam.sdk.coders.Coder) Type(java.lang.reflect.Type) CannotProvideCoderException(org.apache.beam.sdk.coders.CannotProvideCoderException)
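
codersForStateSpecTypes relies on java.lang.reflect to read the actual type arguments of a parameterized type, then fills one array slot per argument and leaves a slot null when no coder can be found. The sketch below shows that reflection step with the JDK alone. TypeArgumentsSketch, Sample, and resolve are hypothetical names, and a Map from Type to a label stands in for the coder registry.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TypeArgumentsSketch {

    /** Exists only so reflection can see the parameterized type Map<String, List<Integer>>. */
    interface Sample extends Map<String, List<Integer>> {}

    /** Returns one registry entry per type argument; unregistered arguments stay null. */
    static String[] resolve(Type type, Map<Type, String> registry) {
        if (!(type instanceof ParameterizedType)) {
            // No type arguments means nothing to look up.
            return new String[0];
        }
        Type[] typeArguments = ((ParameterizedType) type).getActualTypeArguments();
        String[] resolved = new String[typeArguments.length];
        for (int i = 0; i < typeArguments.length; i++) {
            // Map.get returns null for a missing key, which plays the role of the ignored exception.
            resolved[i] = registry.get(typeArguments[i]);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<Type, String> registry = new HashMap<>();
        registry.put(String.class, "string-coder");
        Type mapType = Sample.class.getGenericInterfaces()[0];
        System.out.println(Arrays.toString(resolve(mapType, registry)));
    }
}
```

Here the first type argument, String, resolves to its registered label, while List<Integer> is not registered and its slot remains null, so callers need to handle partially filled arrays.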

Aggregations

Coder (org.apache.beam.sdk.coders.Coder) 117
KvCoder (org.apache.beam.sdk.coders.KvCoder) 74
WindowedValue (org.apache.beam.sdk.util.WindowedValue) 53
StringUtf8Coder (org.apache.beam.sdk.coders.StringUtf8Coder) 44
Test (org.junit.Test) 43
HashMap (java.util.HashMap) 40
ArrayList (java.util.ArrayList) 36
Map (java.util.Map) 34
BoundedWindow (org.apache.beam.sdk.transforms.windowing.BoundedWindow) 34
List (java.util.List) 31
KV (org.apache.beam.sdk.values.KV) 29
RunnerApi (org.apache.beam.model.pipeline.v1.RunnerApi) 28
IterableCoder (org.apache.beam.sdk.coders.IterableCoder) 28
PCollection (org.apache.beam.sdk.values.PCollection) 28
TupleTag (org.apache.beam.sdk.values.TupleTag) 23
ByteString (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString) 23
IOException (java.io.IOException) 21
PCollectionView (org.apache.beam.sdk.values.PCollectionView) 21
Instant (org.joda.time.Instant) 21
ImmutableMap (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) 20