Search in sources :

Example 11 with ParDoFn

use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in project beam by apache.

the class PartialGroupByKeyParDoFnsTest method testPartialGroupByKey.

@Test
public void testPartialGroupByKey() throws Exception {
    Coder keyCoder = StringUtf8Coder.of();
    Coder valueCoder = BigEndianIntegerCoder.of();
    TestOutputReceiver receiver = new TestOutputReceiver(new ElementByteSizeObservableCoder(WindowedValue.getValueOnlyCoder(KvCoder.of(keyCoder, IterableCoder.of(valueCoder)))), counterSet, NameContextsForTests.nameContextForTest());
    ParDoFn pgbkParDoFn = new SimplePartialGroupByKeyParDoFn(GroupingTables.buffering(new WindowingCoderGroupingKeyCreator(keyCoder), PairInfo.create(), new CoderSizeEstimator(WindowedValue.getValueOnlyCoder(keyCoder)), new CoderSizeEstimator(valueCoder)), receiver);
    pgbkParDoFn.startBundle(receiver);
    pgbkParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("hi", 4)));
    pgbkParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("there", 5)));
    pgbkParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("hi", 6)));
    pgbkParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("joe", 7)));
    pgbkParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("there", 8)));
    pgbkParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("hi", 9)));
    pgbkParDoFn.finishBundle();
    assertThat(receiver.outputElems, IsIterableContainingInAnyOrder.<Object>containsInAnyOrder(WindowedValue.valueInGlobalWindow(KV.of("hi", Arrays.asList(4, 6, 9))), WindowedValue.valueInGlobalWindow(KV.of("there", Arrays.asList(5, 8))), WindowedValue.valueInGlobalWindow(KV.of("joe", Arrays.asList(7)))));
    // Exact counter values depend on size of encoded data.  If encoding
    // changes, then these expected counters should change to match.
    CounterUpdateExtractor<?> updateExtractor = Mockito.mock(CounterUpdateExtractor.class);
    counterSet.extractUpdates(false, updateExtractor);
    verify(updateExtractor).longSum(getObjectCounterName("test_receiver_out"), false, 3L);
    verify(updateExtractor).longMean(getMeanByteCounterName("test_receiver_out"), false, LongCounterMean.ZERO.addValue(49L, 3));
    verifyNoMoreInteractions(updateExtractor);
}
Also used : CoderSizeEstimator(org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.CoderSizeEstimator) ElementByteSizeObservableCoder(org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.ElementByteSizeObservableCoder) KvCoder(org.apache.beam.sdk.coders.KvCoder) BigEndianIntegerCoder(org.apache.beam.sdk.coders.BigEndianIntegerCoder) Coder(org.apache.beam.sdk.coders.Coder) StringUtf8Coder(org.apache.beam.sdk.coders.StringUtf8Coder) IterableCoder(org.apache.beam.sdk.coders.IterableCoder) WindowingCoderGroupingKeyCreator(org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.WindowingCoderGroupingKeyCreator) ElementByteSizeObservableCoder(org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.ElementByteSizeObservableCoder) BatchSideInputPGBKParDoFn(org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.BatchSideInputPGBKParDoFn) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) StreamingSideInputPGBKParDoFn(org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.StreamingSideInputPGBKParDoFn) SimplePartialGroupByKeyParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.SimplePartialGroupByKeyParDoFn) SimplePartialGroupByKeyParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.SimplePartialGroupByKeyParDoFn) TestOutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver) Test(org.junit.Test)

Example 12 with ParDoFn

use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in project beam by apache.

the class PartialGroupByKeyParDoFnsTest method testCreateWithCombinerAndStreamingSideInputs.

@Test
public void testCreateWithCombinerAndStreamingSideInputs() throws Exception {
    StreamingOptions options = PipelineOptionsFactory.as(StreamingOptions.class);
    options.setStreaming(true);
    Coder keyCoder = StringUtf8Coder.of();
    Coder valueCoder = BigEndianIntegerCoder.of();
    KvCoder<String, Integer> kvCoder = KvCoder.of(keyCoder, valueCoder);
    TestOutputReceiver receiver = new TestOutputReceiver(new ElementByteSizeObservableCoder(WindowedValue.getValueOnlyCoder(kvCoder)), counterSet, NameContextsForTests.nameContextForTest());
    when(mockSideInputReader.isEmpty()).thenReturn(false);
    when(mockStreamingStepContext.stateInternals()).thenReturn((StateInternals) mockStateInternals);
    when(mockStateInternals.state(Matchers.<StateNamespace>any(), Matchers.<StateTag>any())).thenReturn(mockState);
    when(mockState.read()).thenReturn(Maps.newHashMap());
    ParDoFn pgbk = PartialGroupByKeyParDoFns.create(options, kvCoder, AppliedCombineFn.withInputCoder(Sum.ofIntegers(), CoderRegistry.createDefault(), kvCoder, ImmutableList.<PCollectionView<?>>of(), WindowingStrategy.globalDefault()), mockSideInputReader, receiver, mockStreamingStepContext);
    assertTrue(pgbk instanceof StreamingSideInputPGBKParDoFn);
}
Also used : ElementByteSizeObservableCoder(org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.ElementByteSizeObservableCoder) KvCoder(org.apache.beam.sdk.coders.KvCoder) BigEndianIntegerCoder(org.apache.beam.sdk.coders.BigEndianIntegerCoder) Coder(org.apache.beam.sdk.coders.Coder) StringUtf8Coder(org.apache.beam.sdk.coders.StringUtf8Coder) IterableCoder(org.apache.beam.sdk.coders.IterableCoder) PCollectionView(org.apache.beam.sdk.values.PCollectionView) StreamingOptions(org.apache.beam.sdk.options.StreamingOptions) ElementByteSizeObservableCoder(org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.ElementByteSizeObservableCoder) BatchSideInputPGBKParDoFn(org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.BatchSideInputPGBKParDoFn) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) StreamingSideInputPGBKParDoFn(org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.StreamingSideInputPGBKParDoFn) SimplePartialGroupByKeyParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.SimplePartialGroupByKeyParDoFn) TestOutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver) StreamingSideInputPGBKParDoFn(org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.StreamingSideInputPGBKParDoFn) Test(org.junit.Test)

Example 13 with ParDoFn

use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in project beam by apache.

the class PairWithConstantKeyDoFnFactoryTest method testConversionOfRecord.

@Test
public void testConversionOfRecord() throws Exception {
    ParDoFn parDoFn = new PairWithConstantKeyDoFnFactory().create(null, /* pipeline options */
    CloudObject.fromSpec(ImmutableMap.of(PropertyNames.OBJECT_TYPE_NAME, "PairWithConstantKeyDoFn", WorkerPropertyNames.ENCODED_KEY, StringUtils.byteArrayToJsonString(CoderUtils.encodeToByteArray(BigEndianIntegerCoder.of(), 42)), PropertyNames.ENCODING, ImmutableMap.of(PropertyNames.OBJECT_TYPE_NAME, "kind:fixed_big_endian_int32"))), null, /* side input infos */
    null, /* main output tag */
    null, /* output tag to receiver index */
    null, /* exection context */
    null);
    List<Object> outputReceiver = new ArrayList<>();
    parDoFn.startBundle(outputReceiver::add);
    parDoFn.processElement(valueInGlobalWindow(43));
    assertThat(outputReceiver, contains(valueInGlobalWindow(KV.of(42, 43))));
}
Also used : ArrayList(java.util.ArrayList) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) Test(org.junit.Test)

Example 14 with ParDoFn

use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in project beam by apache.

the class ReifyTimestampAndWindowsParDoFnFactoryTest method verifyReifiedIsInTheSameWindows.

private <K, V> void verifyReifiedIsInTheSameWindows(WindowedValue<KV<K, V>> elem) throws Exception {
    ParDoFn reifyFn = new ReifyTimestampAndWindowsParDoFnFactory().create(null, null, null, null, null, null, null);
    SingleValueReceiver<WindowedValue<KV<K, WindowedValue<V>>>> receiver = new SingleValueReceiver<>();
    reifyFn.startBundle(receiver);
    // The important thing to test that is not just a restatement of the ParDoFn is that
    // it only produces one element per input
    reifyFn.processElement(elem);
    assertThat(receiver.reified.getValue().getKey(), equalTo(elem.getValue().getKey()));
    assertThat(receiver.reified.getValue().getValue().getValue(), equalTo(elem.getValue().getValue()));
    assertThat(receiver.reified.getValue().getValue().getTimestamp(), equalTo(elem.getTimestamp()));
    assertThat(receiver.reified.getValue().getValue().getWindows(), equalTo(elem.getWindows()));
    assertThat(receiver.reified.getValue().getValue().getPane(), equalTo(elem.getPane()));
}
Also used : WindowedValue(org.apache.beam.sdk.util.WindowedValue) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn)

Example 15 with ParDoFn

use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in project beam by apache.

the class UserParDoFnFactory method create.

@Override
public ParDoFn create(PipelineOptions options, CloudObject cloudUserFn, @Nullable List<SideInputInfo> sideInputInfos, TupleTag<?> mainOutputTag, Map<TupleTag<?>, Integer> outputTupleTagsToReceiverIndices, DataflowExecutionContext<?> executionContext, DataflowOperationContext operationContext) throws Exception {
    DoFnInstanceManager instanceManager = fnCache.get(operationContext.nameContext().systemName(), () -> DoFnInstanceManagers.cloningPool(doFnExtractor.getDoFnInfo(cloudUserFn), options));
    DoFnInfo<?, ?> doFnInfo = instanceManager.peek();
    DataflowExecutionContext.DataflowStepContext stepContext = executionContext.getStepContext(operationContext);
    Iterable<PCollectionView<?>> sideInputViews = doFnInfo.getSideInputViews();
    SideInputReader sideInputReader = executionContext.getSideInputReader(sideInputInfos, sideInputViews, operationContext);
    if (doFnInfo.getDoFn() instanceof BatchStatefulParDoOverrides.BatchStatefulDoFn) {
        // HACK: BatchStatefulDoFn is a class from DataflowRunner's overrides
        // that just instructs the worker to execute it differently. This will
        // be replaced by metadata in the Runner API payload
        BatchStatefulParDoOverrides.BatchStatefulDoFn fn = (BatchStatefulParDoOverrides.BatchStatefulDoFn) doFnInfo.getDoFn();
        DoFn underlyingFn = fn.getUnderlyingDoFn();
        return new BatchModeUngroupingParDoFn((BatchModeExecutionContext.StepContext) stepContext, new SimpleParDoFn(options, DoFnInstanceManagers.singleInstance(doFnInfo.withFn(underlyingFn)), sideInputReader, doFnInfo.getMainOutput(), outputTupleTagsToReceiverIndices, stepContext, operationContext, doFnInfo.getDoFnSchemaInformation(), doFnInfo.getSideInputMapping(), runnerFactory));
    } else if (doFnInfo.getDoFn() instanceof StreamingPCollectionViewWriterFn) {
        // HACK: StreamingPCollectionViewWriterFn is a class from
        // DataflowPipelineTranslator. Using the class as an indicator is a migration path
        // to simply having an indicator string.
        checkArgument(stepContext instanceof StreamingModeExecutionContext.StreamingModeStepContext, "stepContext must be a StreamingModeStepContext to use StreamingPCollectionViewWriterFn");
        DataflowRunner.StreamingPCollectionViewWriterFn<Object> writerFn = (StreamingPCollectionViewWriterFn<Object>) doFnInfo.getDoFn();
        return new StreamingPCollectionViewWriterParDoFn((StreamingModeExecutionContext.StreamingModeStepContext) stepContext, writerFn.getView().getTagInternal(), writerFn.getDataCoder(), (Coder<BoundedWindow>) doFnInfo.getWindowingStrategy().getWindowFn().windowCoder());
    } else {
        return new SimpleParDoFn(options, instanceManager, sideInputReader, doFnInfo.getMainOutput(), outputTupleTagsToReceiverIndices, stepContext, operationContext, doFnInfo.getDoFnSchemaInformation(), doFnInfo.getSideInputMapping(), runnerFactory);
    }
}
Also used : Coder(org.apache.beam.sdk.coders.Coder) SideInputReader(org.apache.beam.runners.core.SideInputReader) BatchStatefulParDoOverrides(org.apache.beam.runners.dataflow.BatchStatefulParDoOverrides) StreamingPCollectionViewWriterFn(org.apache.beam.runners.dataflow.DataflowRunner.StreamingPCollectionViewWriterFn) PCollectionView(org.apache.beam.sdk.values.PCollectionView) DoFn(org.apache.beam.sdk.transforms.DoFn) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject)

Aggregations

ParDoFn (org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn)34 Test (org.junit.Test)26 CloudObject (org.apache.beam.runners.dataflow.util.CloudObject)18 OutputReceiver (org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver)10 Coder (org.apache.beam.sdk.coders.Coder)9 KvCoder (org.apache.beam.sdk.coders.KvCoder)9 CounterSet (org.apache.beam.runners.dataflow.worker.counters.CounterSet)7 StringUtf8Coder (org.apache.beam.sdk.coders.StringUtf8Coder)7 PipelineOptions (org.apache.beam.sdk.options.PipelineOptions)7 ElementByteSizeObservableCoder (org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.ElementByteSizeObservableCoder)6 BatchSideInputPGBKParDoFn (org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.BatchSideInputPGBKParDoFn)6 StreamingSideInputPGBKParDoFn (org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.StreamingSideInputPGBKParDoFn)6 SimplePartialGroupByKeyParDoFn (org.apache.beam.runners.dataflow.worker.util.common.worker.SimplePartialGroupByKeyParDoFn)6 TestOutputReceiver (org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver)6 BigEndianIntegerCoder (org.apache.beam.sdk.coders.BigEndianIntegerCoder)6 IterableCoder (org.apache.beam.sdk.coders.IterableCoder)6 TupleTag (org.apache.beam.sdk.values.TupleTag)6 ArrayList (java.util.ArrayList)5 Structs.addString (org.apache.beam.runners.dataflow.util.Structs.addString)5 Receiver (org.apache.beam.runners.dataflow.worker.util.common.worker.Receiver)5