Example 1 with FileResultCoder

Use of org.apache.beam.sdk.io.FileBasedSink.FileResultCoder in project beam by apache.

In class WriteFiles, method expand.

@Override
public WriteFilesResult<DestinationT> expand(PCollection<UserT> input) {
    if (input.isBounded() == IsBounded.UNBOUNDED) {
        checkArgument(getWindowedWrites(), "Must use windowed writes when applying %s to an unbounded PCollection", WriteFiles.class.getSimpleName());
        // Check merging window here due to https://issues.apache.org/jira/browse/BEAM-12040.
        if (input.getWindowingStrategy().needsMerge()) {
            checkArgument(getComputeNumShards() != null || getNumShardsProvider() != null, "When applying %s to an unbounded PCollection with merging windows," + " must specify number of output shards explicitly", WriteFiles.class.getSimpleName());
        }
    }
    this.writeOperation = getSink().createWriteOperation();
    if (getWindowedWrites()) {
        this.writeOperation.setWindowedWrites();
    } else {
        // Re-window the data into the global window and remove any existing triggers.
        input = input.apply("RewindowIntoGlobal", Window.<UserT>into(new GlobalWindows()).triggering(DefaultTrigger.of()).discardingFiredPanes());
    }
    Coder<DestinationT> destinationCoder;
    try {
        destinationCoder = getDynamicDestinations().getDestinationCoderWithDefault(input.getPipeline().getCoderRegistry());
        destinationCoder.verifyDeterministic();
    } catch (CannotProvideCoderException | NonDeterministicException e) {
        throw new RuntimeException(e);
    }
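    // The window coder comes from the input's WindowFn. The cast is unchecked because
    // the WindowFn's window type is not known statically at this point.
    @SuppressWarnings("unchecked")
    Coder<BoundedWindow> windowCoder = (Coder<BoundedWindow>) input.getWindowingStrategy().getWindowFn().windowCoder();
    // The FileResultCoder is built from the window coder and the destination coder.
    FileResultCoder<DestinationT> fileResultCoder = FileResultCoder.of(windowCoder, destinationCoder);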
    PCollectionView<Integer> numShardsView = (getComputeNumShards() == null) ? null : input.apply(getComputeNumShards());
    boolean fixedSharding = getComputeNumShards() != null || getNumShardsProvider() != null;
    PCollection<List<FileResult<DestinationT>>> tempFileResults;
    if (fixedSharding) {
        tempFileResults = input.apply("WriteShardedBundlesToTempFiles", new WriteShardedBundlesToTempFiles(destinationCoder, fileResultCoder, numShardsView)).apply("GatherTempFileResults", new GatherResults<>(fileResultCoder));
    } else {
        if (input.isBounded() == IsBounded.BOUNDED) {
            tempFileResults = input.apply("WriteUnshardedBundlesToTempFiles", new WriteUnshardedBundlesToTempFiles(destinationCoder, fileResultCoder)).apply("GatherTempFileResults", new GatherResults<>(fileResultCoder));
        } else {
            tempFileResults = input.apply("WriteAutoShardedBundlesToTempFiles", new WriteAutoShardedBundlesToTempFiles(destinationCoder, fileResultCoder));
        }
    }
    return tempFileResults.apply("FinalizeTempFileBundles", new FinalizeTempFileBundles(numShardsView, destinationCoder));
}
Also used: ListCoder (org.apache.beam.sdk.coders.ListCoder), KvCoder (org.apache.beam.sdk.coders.KvCoder), Coder (org.apache.beam.sdk.coders.Coder), StringUtf8Coder (org.apache.beam.sdk.coders.StringUtf8Coder), IterableCoder (org.apache.beam.sdk.coders.IterableCoder), ShardedKeyCoder (org.apache.beam.sdk.coders.ShardedKeyCoder), FileResultCoder (org.apache.beam.sdk.io.FileBasedSink.FileResultCoder), VarIntCoder (org.apache.beam.sdk.coders.VarIntCoder), GlobalWindows (org.apache.beam.sdk.transforms.windowing.GlobalWindows), NonDeterministicException (org.apache.beam.sdk.coders.Coder.NonDeterministicException), CannotProvideCoderException (org.apache.beam.sdk.coders.CannotProvideCoderException), BoundedWindow (org.apache.beam.sdk.transforms.windowing.BoundedWindow), PCollectionList (org.apache.beam.sdk.values.PCollectionList), List (java.util.List), TupleTagList (org.apache.beam.sdk.values.TupleTagList), ArrayList (java.util.ArrayList)
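
The branching in expand can be hard to follow inline, so here is a minimal plain-Java sketch of the two decisions it makes: the precondition checks for unbounded input, and the choice of the temp-file write step. It does not depend on Beam. The class WriteStrategySketch, its methods validate and chooseWriteStep, and their boolean parameters are hypothetical names introduced for illustration only; just the step names and the error messages are taken from the method above.

```java
public class WriteStrategySketch {

    /**
     * Mirrors the checkArgument calls at the top of expand().
     * Returns the error message that would be raised, or null if the
     * configuration is accepted. (Hypothetical helper, not part of Beam.)
     */
    static String validate(boolean unbounded, boolean windowedWrites,
                           boolean mergingWindows, boolean shardsSpecified) {
        if (unbounded) {
            if (!windowedWrites) {
                return "Must use windowed writes when applying WriteFiles to an unbounded PCollection";
            }
            // Merging windows on unbounded input require an explicit shard count.
            if (mergingWindows && !shardsSpecified) {
                return "When applying WriteFiles to an unbounded PCollection with merging windows,"
                        + " must specify number of output shards explicitly";
            }
        }
        return null;
    }

    /**
     * Mirrors the if/else that picks the temp-file write step.
     * fixedSharding corresponds to
     * getComputeNumShards() != null || getNumShardsProvider() != null.
     */
    static String chooseWriteStep(boolean fixedSharding, boolean bounded) {
        if (fixedSharding) {
            return "WriteShardedBundlesToTempFiles";
        }
        return bounded
                ? "WriteUnshardedBundlesToTempFiles"
                : "WriteAutoShardedBundlesToTempFiles";
    }

    public static void main(String[] args) {
        // Fixed sharding wins regardless of boundedness.
        System.out.println(chooseWriteStep(true, false));
        // Without fixed sharding, boundedness decides the step.
        System.out.println(chooseWriteStep(false, true));
        System.out.println(chooseWriteStep(false, false));
        // Unbounded input without windowed writes is rejected.
        System.out.println(validate(true, false, false, false));
    }
}
```

Note that in the method above only the sharded and unsharded paths are followed by GatherTempFileResults; the auto-sharded path feeds FinalizeTempFileBundles directly.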
