
Example 1 with IsmRecordCoder

Use of org.apache.beam.runners.dataflow.internal.IsmFormat.IsmRecordCoder in project beam by apache.

The class IsmSinkFactory, method create.

@Override
public Sink<?> create(CloudObject spec, @Nullable Coder<?> coder, @Nullable PipelineOptions options, @Nullable DataflowExecutionContext executionContext, DataflowOperationContext operationContext) throws Exception {
    options = checkArgumentNotNull(options);
    coder = checkArgumentNotNull(coder);
    // The validity of this coder is checked in detail by the typed create, below
    @SuppressWarnings("unchecked") Coder<WindowedValue<IsmRecord<Object>>> typedCoder = (Coder<WindowedValue<IsmRecord<Object>>>) coder;
    String filename = getString(spec, WorkerPropertyNames.FILENAME);
    checkArgument(typedCoder instanceof WindowedValueCoder, "%s only supports using %s but got %s.", IsmSink.class, WindowedValueCoder.class, typedCoder);
    WindowedValueCoder<IsmRecord<Object>> windowedCoder = (WindowedValueCoder<IsmRecord<Object>>) typedCoder;
    checkArgument(windowedCoder.getValueCoder() instanceof IsmRecordCoder, "%s only supports using %s but got %s.", IsmSink.class, IsmRecordCoder.class, windowedCoder.getValueCoder());
    @SuppressWarnings("unchecked") IsmRecordCoder<Object> ismCoder = (IsmRecordCoder<Object>) windowedCoder.getValueCoder();
    long bloomFilterSizeLimitBytes =
        Math.max(
            MIN_BLOOM_FILTER_SIZE_BYTES,
            DoubleMath.roundToLong(
                BLOOM_FILTER_SIZE_LIMIT_MULTIPLIER
                    // Note the conversion from MiB to bytes
                    * options.as(DataflowWorkerHarnessOptions.class).getWorkerCacheMb() * 1024 * 1024,
                RoundingMode.DOWN));
    return new IsmSink<>(FileSystems.matchNewResource(filename, false), ismCoder, bloomFilterSizeLimitBytes);
}
Also used : WindowedValueCoder(org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder) Coder(org.apache.beam.sdk.coders.Coder) IsmRecordCoder(org.apache.beam.runners.dataflow.internal.IsmFormat.IsmRecordCoder) IsmRecord(org.apache.beam.runners.dataflow.internal.IsmFormat.IsmRecord) Structs.getString(org.apache.beam.runners.dataflow.util.Structs.getString) WindowedValue(org.apache.beam.sdk.util.WindowedValue) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject)
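
The size limit passed to the IsmSink above is plain arithmetic: the worker cache size in MiB is converted to bytes, scaled by a multiplier, rounded down, and clamped to a minimum. A minimal standalone sketch of that computation follows; the constant values are placeholders chosen for illustration, since the real MIN_BLOOM_FILTER_SIZE_BYTES and BLOOM_FILTER_SIZE_LIMIT_MULTIPLIER are defined elsewhere in IsmSinkFactory and do not appear in this snippet.

public class BloomFilterLimitSketch {
    // Hypothetical values for illustration only; not the constants from IsmSinkFactory.
    static final long MIN_BLOOM_FILTER_SIZE_BYTES = 128;
    static final double BLOOM_FILTER_SIZE_LIMIT_MULTIPLIER = 0.10;

    static long bloomFilterSizeLimitBytes(int workerCacheMb) {
        // Convert the worker cache size from MiB to bytes, scale it by the multiplier,
        // round down, and never go below the minimum.
        double scaled = BLOOM_FILTER_SIZE_LIMIT_MULTIPLIER * workerCacheMb * 1024 * 1024;
        return Math.max(MIN_BLOOM_FILTER_SIZE_BYTES, (long) Math.floor(scaled));
    }

    public static void main(String[] args) {
        // A 100 MiB worker cache with a 0.10 multiplier yields 10,485,760 bytes.
        System.out.println(bloomFilterSizeLimitBytes(100));
    }
}

The factory itself uses Guava's DoubleMath.roundToLong with RoundingMode.DOWN; for the non-negative values involved, Math.floor plus a cast behaves the same and keeps this sketch dependency-free.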

Example 2 with IsmRecordCoder

Use of org.apache.beam.runners.dataflow.internal.IsmFormat.IsmRecordCoder in project beam by apache.

The class IsmReaderFactory, method createImpl.

<V> NativeReader<?> createImpl(CloudObject spec, Coder<?> coder, PipelineOptions options, DataflowExecutionContext executionContext, DataflowOperationContext operationContext) throws Exception {
    final ResourceId resourceId = FileSystems.matchNewResource(getString(spec, WorkerPropertyNames.FILENAME), false);
    checkArgument(coder instanceof WindowedValueCoder, "%s only supports using %s but got %s.", IsmReader.class, WindowedValueCoder.class, coder);
    @SuppressWarnings("unchecked") WindowedValueCoder<IsmRecord<V>> windowedCoder = (WindowedValueCoder<IsmRecord<V>>) coder;
    checkArgument(windowedCoder.getValueCoder() instanceof IsmRecordCoder, "%s only supports using %s but got %s.", IsmReader.class, IsmRecordCoder.class, windowedCoder.getValueCoder());
    @SuppressWarnings("unchecked") final IsmRecordCoder<V> ismCoder = (IsmRecordCoder<V>) windowedCoder.getValueCoder();
    checkArgument(executionContext instanceof BatchModeExecutionContext, "%s only supports using %s but got %s.", IsmReader.class, BatchModeExecutionContext.class, executionContext);
    final BatchModeExecutionContext execContext = (BatchModeExecutionContext) executionContext;
    // Share one reader per file: the logical reference cache is keyed by the resource id,
    // so repeated requests for the same file reuse a single IsmReaderImpl.
    return execContext
        .<IsmReaderKey, NativeReader<?>>getLogicalReferenceCache()
        .get(
            new IsmReaderKey(resourceId.toString()),
            () -> new IsmReaderImpl<V>(
                resourceId,
                ismCoder,
                execContext.<IsmReaderImpl.IsmShardKey,
                        WeightedValue<NavigableMap<RandomAccessData, WindowedValue<IsmRecord<V>>>>>
                    getDataCache()));
}
Also used : RandomAccessData(org.apache.beam.runners.dataflow.util.RandomAccessData) IsmRecord(org.apache.beam.runners.dataflow.internal.IsmFormat.IsmRecord) WeightedValue(org.apache.beam.sdk.util.WeightedValue) IsmRecordCoder(org.apache.beam.runners.dataflow.internal.IsmFormat.IsmRecordCoder) WindowedValueCoder(org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder) ResourceId(org.apache.beam.sdk.io.fs.ResourceId) WindowedValue(org.apache.beam.sdk.util.WindowedValue)
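
The return statement above goes through the execution context's logical reference cache, keyed by the file's resource id, so repeated requests for the same file reuse one IsmReaderImpl rather than constructing a new reader each time. A minimal sketch of that one-reader-per-file pattern follows; the Reader interface and the ConcurrentHashMap are stand-ins for the Beam types and the logical reference cache, not the actual Dataflow worker API.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SharedReaderCache {
    // Stand-in for NativeReader<?> / IsmReaderImpl.
    interface Reader {}

    private final Map<String, Reader> cache = new ConcurrentHashMap<>();

    // Repeated requests for the same filename return the same Reader instance,
    // so any expensive per-file initialization happens at most once.
    Reader readerFor(String filename) {
        return cache.computeIfAbsent(filename, f -> new Reader() {});
    }
}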

Example 3 with IsmRecordCoder

Use of org.apache.beam.runners.dataflow.internal.IsmFormat.IsmRecordCoder in project beam by apache.

The class CreateIsmShardKeyAndSortKeyDoFnFactoryTest, method testConversionOfRecord.

@Test
public void testConversionOfRecord() throws Exception {
    ParDoFn parDoFn =
        new CreateIsmShardKeyAndSortKeyDoFnFactory()
            .create(
                null /* pipeline options */,
                CloudObject.fromSpec(
                    ImmutableMap.of(
                        PropertyNames.OBJECT_TYPE_NAME, "CreateIsmShardKeyAndSortKeyDoFn",
                        PropertyNames.ENCODING, createIsmRecordEncoding())),
                null /* side input infos */,
                null /* main output tag */,
                null /* output tag to receiver index */,
                null /* execution context */,
                null /* operation context */);
    List<Object> outputReceiver = new ArrayList<>();
    parDoFn.startBundle(outputReceiver::add);
    parDoFn.processElement(valueInGlobalWindow(KV.of(42, 43)));
    IsmRecordCoder<?> coder =
        (IsmRecordCoder) CloudObjects.coderFromCloudObject(CloudObject.fromSpec(createIsmRecordEncoding()));
    assertThat(
        outputReceiver,
        contains(
            valueInGlobalWindow(
                KV.of(
                    coder.hash(ImmutableList.of(42)) /* hash key */,
                    KV.of(KV.of(42, GlobalWindow.INSTANCE) /* sort key */, 43)))));
}
Also used : IsmRecordCoder(org.apache.beam.runners.dataflow.internal.IsmFormat.IsmRecordCoder) ArrayList(java.util.ArrayList) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) Test(org.junit.Test)
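
The assertion above pins down the shape this DoFn produces: the input KV(42, 43) comes out as KV(coder.hash(ImmutableList.of(42)), KV(KV(42, GlobalWindow.INSTANCE), 43)), i.e. a shard hash key wrapping a (key, window) sort key and the original value. A standalone sketch of that restructuring follows; the KV class, the window marker, and the hash function are simplified stand-ins for illustration (the real hash comes from IsmRecordCoder.hash over the key components), not the Beam classes.

import java.util.Collections;
import java.util.Objects;

public class ShardAndSortKeySketch {

    // Minimal stand-in for org.apache.beam.sdk.values.KV, for illustration only.
    static final class KV<K, V> {
        final K key;
        final V value;
        KV(K key, V value) { this.key = key; this.value = value; }
        static <K, V> KV<K, V> of(K key, V value) { return new KV<>(key, value); }
        @Override public String toString() { return "KV(" + key + ", " + value + ")"; }
    }

    // Stand-in for GlobalWindow.INSTANCE.
    static final String GLOBAL_WINDOW = "GlobalWindow";

    // Stand-in for IsmRecordCoder.hash(ImmutableList.of(key)); the real coder hashes
    // the encoded key components, not a Java hash code.
    static int hashKeyComponents(Object key) {
        return Objects.hash(Collections.singletonList(key));
    }

    // KV(key, value) -> KV(hashKey, KV(sortKey, value)) where sortKey = KV(key, window).
    static KV<Integer, KV<KV<Integer, String>, Integer>> restructure(int key, int value) {
        KV<Integer, String> sortKey = KV.of(key, GLOBAL_WINDOW);
        return KV.of(hashKeyComponents(key), KV.of(sortKey, value));
    }

    public static void main(String[] args) {
        // Mirrors the test input KV(42, 43); prints the nested KV structure.
        System.out.println(restructure(42, 43));
    }
}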

Aggregations

IsmRecordCoder (org.apache.beam.runners.dataflow.internal.IsmFormat.IsmRecordCoder) 3
IsmRecord (org.apache.beam.runners.dataflow.internal.IsmFormat.IsmRecord) 2
CloudObject (org.apache.beam.runners.dataflow.util.CloudObject) 2
WindowedValue (org.apache.beam.sdk.util.WindowedValue) 2
WindowedValueCoder (org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder) 2
ArrayList (java.util.ArrayList) 1
RandomAccessData (org.apache.beam.runners.dataflow.util.RandomAccessData) 1
Structs.getString (org.apache.beam.runners.dataflow.util.Structs.getString) 1
ParDoFn (org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) 1
Coder (org.apache.beam.sdk.coders.Coder) 1
ResourceId (org.apache.beam.sdk.io.fs.ResourceId) 1
WeightedValue (org.apache.beam.sdk.util.WeightedValue) 1
Test (org.junit.Test) 1