Example 21 with ParDoFn

Use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in project beam by apache.

Class CombineValuesFnFactoryTest, method testCombineValuesFnMerge.

@Test
public void testCombineValuesFnMerge() throws Exception {
    TestReceiver receiver = new TestReceiver();
    MeanInts mean = new MeanInts();
    Combine.CombineFn<Integer, CountSum, String> combiner = mean;
    ParDoFn combineParDoFn = createCombineValuesFn(CombinePhase.MERGE, combiner, StringUtf8Coder.of(), BigEndianIntegerCoder.of(), new CountSumCoder(), WindowingStrategy.globalDefault());
    combineParDoFn.startBundle(receiver);
    combineParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("a", Arrays.asList(new CountSum(3, 6), new CountSum(2, 9), new CountSum(1, 12)))));
    combineParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("b", Arrays.asList(new CountSum(2, 20), new CountSum(1, 1)))));
    combineParDoFn.finishBundle();
    Object[] expectedReceivedElems = { WindowedValue.valueInGlobalWindow(KV.of("a", new CountSum(6, 27))), WindowedValue.valueInGlobalWindow(KV.of("b", new CountSum(3, 21))) };
    assertArrayEquals(expectedReceivedElems, receiver.receivedElems.toArray());
}
Also used : Combine(org.apache.beam.sdk.transforms.Combine) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) Structs.addString(org.apache.beam.runners.dataflow.util.Structs.addString) StringUtils.byteArrayToJsonString(org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) Test(org.junit.Test)
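
The MERGE phase exercised above folds several partial accumulators for one key into a single accumulator. The following is a minimal, self-contained sketch of that arithmetic only; it does not use Beam. The CountSum class here is a hypothetical stand-in, and the field order (count first, then sum) is an assumption inferred from the expected values in the tests on this page.

```java
import java.util.Arrays;
import java.util.List;

final class MergeAccumulatorsSketch {
    /** Hypothetical stand-in for the test's CountSum; assumes (count, sum) order. */
    static final class CountSum {
        final long count;
        final double sum;

        CountSum(long count, double sum) {
            this.count = count;
            this.sum = sum;
        }
    }

    /** Merges partial accumulators by adding their counts and their sums. */
    static CountSum merge(List<CountSum> parts) {
        long count = 0;
        double sum = 0.0;
        for (CountSum part : parts) {
            count += part.count;
            sum += part.sum;
        }
        return new CountSum(count, sum);
    }

    public static void main(String[] args) {
        // Same partial accumulators as key "a" in the test above.
        CountSum a = merge(Arrays.asList(
            new CountSum(3, 6), new CountSum(2, 9), new CountSum(1, 12)));
        System.out.println(a.count + " " + a.sum); // prints 6 27.0

        // Same partial accumulators as key "b" in the test above.
        CountSum b = merge(Arrays.asList(new CountSum(2, 20), new CountSum(1, 1)));
        System.out.println(b.count + " " + b.sum); // prints 3 21.0
    }
}
```

The merged values match the expected elements in the test: CountSum(6, 27) for "a" and CountSum(3, 21) for "b".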

Example 22 with ParDoFn

Use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in project beam by apache.

Class CombineValuesFnFactoryTest, method testCombineValuesFnExtract.

@Test
public void testCombineValuesFnExtract() throws Exception {
    TestReceiver receiver = new TestReceiver();
    MeanInts mean = new MeanInts();
    Combine.CombineFn<Integer, CountSum, String> combiner = mean;
    ParDoFn combineParDoFn = createCombineValuesFn(CombinePhase.EXTRACT, combiner, StringUtf8Coder.of(), BigEndianIntegerCoder.of(), new CountSumCoder(), WindowingStrategy.globalDefault());
    combineParDoFn.startBundle(receiver);
    combineParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("a", new CountSum(6, 27))));
    combineParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("b", new CountSum(3, 21))));
    combineParDoFn.finishBundle();
    assertArrayEquals(new Object[] { WindowedValue.valueInGlobalWindow(KV.of("a", String.format("%.1f", 4.5))), WindowedValue.valueInGlobalWindow(KV.of("b", String.format("%.1f", 7.0))) }, receiver.receivedElems.toArray());
}
Also used : Combine(org.apache.beam.sdk.transforms.Combine) Structs.addString(org.apache.beam.runners.dataflow.util.Structs.addString) StringUtils.byteArrayToJsonString(org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) Test(org.junit.Test)
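
The EXTRACT phase exercised above turns a final accumulator into the output value. The sketch below reproduces only the arithmetic behind the expected strings, without Beam. The method name extractMean and the zero-count guard are assumptions for illustration, and Locale.ROOT is added here so the decimal separator does not depend on the machine's locale (the test itself calls String.format without a locale).

```java
import java.util.Locale;

final class ExtractMeanSketch {
    /**
     * Hypothetical extract step: formats sum / count with one decimal place.
     * Returns "0.0" for an empty accumulator to avoid dividing by zero.
     */
    static String extractMean(long count, double sum) {
        double mean = count == 0 ? 0.0 : sum / count;
        return String.format(Locale.ROOT, "%.1f", mean);
    }

    public static void main(String[] args) {
        System.out.println(extractMean(6, 27)); // prints 4.5
        System.out.println(extractMean(3, 21)); // prints 7.0
    }
}
```

With the accumulators fed to the test, 27 / 6 gives "4.5" for key "a" and 21 / 3 gives "7.0" for key "b", matching the asserted output.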

Example 23 with ParDoFn

Use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in project beam by apache.

Class CreateIsmShardKeyAndSortKeyDoFnFactoryTest, method testConversionOfRecord.

@Test
public void testConversionOfRecord() throws Exception {
    ParDoFn parDoFn = new CreateIsmShardKeyAndSortKeyDoFnFactory().create(
        null, /* pipeline options */
        CloudObject.fromSpec(ImmutableMap.of(PropertyNames.OBJECT_TYPE_NAME, "CreateIsmShardKeyAndSortKeyDoFn", PropertyNames.ENCODING, createIsmRecordEncoding())),
        null, /* side input infos */
        null, /* main output tag */
        null, /* output tag to receiver index */
        null, /* execution context */
        null);
    List<Object> outputReceiver = new ArrayList<>();
    parDoFn.startBundle(outputReceiver::add);
    parDoFn.processElement(valueInGlobalWindow(KV.of(42, 43)));
    IsmRecordCoder<?> coder = (IsmRecordCoder) CloudObjects.coderFromCloudObject(CloudObject.fromSpec(createIsmRecordEncoding()));
    assertThat(outputReceiver, contains(valueInGlobalWindow(KV.of(
        coder.hash(ImmutableList.of(42)), /* hash key */
        KV.of(
            KV.of(42, GlobalWindow.INSTANCE), /* sort key */
            43)))));
}
Also used : IsmRecordCoder(org.apache.beam.runners.dataflow.internal.IsmFormat.IsmRecordCoder) ArrayList(java.util.ArrayList) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) Test(org.junit.Test)

Example 24 with ParDoFn

Use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in project beam by apache.

Class PartialGroupByKeyParDoFnsTest, method testPartialGroupByKeyWithCombinerAndSideInputs.

@Test
public void testPartialGroupByKeyWithCombinerAndSideInputs() throws Exception {
    Coder keyCoder = StringUtf8Coder.of();
    Coder valueCoder = BigEndianIntegerCoder.of();
    TestOutputReceiver receiver = new TestOutputReceiver(new ElementByteSizeObservableCoder(WindowedValue.getValueOnlyCoder(KvCoder.of(keyCoder, valueCoder))), counterSet, NameContextsForTests.nameContextForTest());
    Combiner<WindowedValue<String>, Integer, Integer, Integer> combineFn = new TestCombiner();
    ParDoFn pgbkParDoFn = new StreamingSideInputPGBKParDoFn(GroupingTables.combining(new WindowingCoderGroupingKeyCreator(keyCoder), PairInfo.create(), combineFn, new CoderSizeEstimator(WindowedValue.getValueOnlyCoder(keyCoder)), new CoderSizeEstimator(valueCoder)), receiver, mockSideInputFetcher);
    Set<BoundedWindow> readyWindows = ImmutableSet.<BoundedWindow>of(GlobalWindow.INSTANCE);
    when(mockSideInputFetcher.getReadyWindows()).thenReturn(readyWindows);
    when(mockSideInputFetcher.prefetchElements(readyWindows)).thenReturn(ImmutableList.of(elemsBag));
    when(elemsBag.read()).thenReturn(ImmutableList.of(WindowedValue.valueInGlobalWindow(KV.of("hi", 4)), WindowedValue.valueInGlobalWindow(KV.of("there", 5))));
    when(mockSideInputFetcher.storeIfBlocked(Matchers.<WindowedValue<KV<String, Integer>>>any())).thenReturn(false, false, false, true);
    pgbkParDoFn.startBundle(receiver);
    pgbkParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("hi", 6)));
    pgbkParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("joe", 7)));
    pgbkParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("there", 8)));
    pgbkParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("hi", 9)));
    pgbkParDoFn.finishBundle();
    assertThat(receiver.outputElems, IsIterableContainingInAnyOrder.<Object>containsInAnyOrder(WindowedValue.valueInGlobalWindow(KV.of("hi", 10)), WindowedValue.valueInGlobalWindow(KV.of("there", 13)), WindowedValue.valueInGlobalWindow(KV.of("joe", 7))));
    // Exact counter values depend on size of encoded data.  If encoding
    // changes, then these expected counters should change to match.
    CounterUpdateExtractor<?> updateExtractor = Mockito.mock(CounterUpdateExtractor.class);
    counterSet.extractUpdates(false, updateExtractor);
    verify(updateExtractor).longSum(getObjectCounterName("test_receiver_out"), false, 3L);
    verify(updateExtractor).longMean(getMeanByteCounterName("test_receiver_out"), false, LongCounterMean.ZERO.addValue(25L, 3));
    verifyNoMoreInteractions(updateExtractor);
}
Also used : CoderSizeEstimator(org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.CoderSizeEstimator) ElementByteSizeObservableCoder(org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.ElementByteSizeObservableCoder) KvCoder(org.apache.beam.sdk.coders.KvCoder) BigEndianIntegerCoder(org.apache.beam.sdk.coders.BigEndianIntegerCoder) Coder(org.apache.beam.sdk.coders.Coder) StringUtf8Coder(org.apache.beam.sdk.coders.StringUtf8Coder) IterableCoder(org.apache.beam.sdk.coders.IterableCoder) WindowingCoderGroupingKeyCreator(org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.WindowingCoderGroupingKeyCreator) BatchSideInputPGBKParDoFn(org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.BatchSideInputPGBKParDoFn) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) StreamingSideInputPGBKParDoFn(org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.StreamingSideInputPGBKParDoFn) SimplePartialGroupByKeyParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.SimplePartialGroupByKeyParDoFn) KV(org.apache.beam.sdk.values.KV) TestOutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver) WindowedValue(org.apache.beam.sdk.util.WindowedValue) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) Test(org.junit.Test)
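
The expected output of the test above ("hi" = 10, "there" = 13, "joe" = 7) can be reproduced with a plain per-key combining table. The sketch below is one reading of the test, not the Beam implementation: it assumes the combiner sums integers, that the two elements read from the side-input bag ("hi" 4 and "there" 5) are added to the table, and that the fourth processed element ("hi" 9) is withheld because the mocked storeIfBlocked returns true for it. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PartialCombineSketch {
    // Per-key partial sums, kept in insertion order for predictable printing.
    private final Map<String, Integer> table = new LinkedHashMap<>();

    /** Combines a value into the running sum for its key. */
    void add(String key, int value) {
        table.merge(key, value, Integer::sum);
    }

    /** Returns the combined entries and empties the table, like a bundle flush. */
    Map<String, Integer> flush() {
        Map<String, Integer> out = new LinkedHashMap<>(table);
        table.clear();
        return out;
    }

    public static void main(String[] args) {
        PartialCombineSketch pgbk = new PartialCombineSketch();
        // Previously stored elements whose window is now ready (assumed order).
        pgbk.add("hi", 4);
        pgbk.add("there", 5);
        // Elements processed directly; ("hi", 9) is assumed blocked and never added.
        pgbk.add("hi", 6);
        pgbk.add("joe", 7);
        pgbk.add("there", 8);
        System.out.println(pgbk.flush()); // prints {hi=10, there=13, joe=7}
    }
}
```

Three output elements also agree with the longSum counter value of 3 that the test verifies for "test_receiver_out".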

Example 25 with ParDoFn

Use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in project beam by apache.

Class PartialGroupByKeyParDoFnsTest, method testCreateWithCombinerAndStreaming.

@Test
public void testCreateWithCombinerAndStreaming() throws Exception {
    StreamingOptions options = PipelineOptionsFactory.as(StreamingOptions.class);
    options.setStreaming(true);
    Coder keyCoder = StringUtf8Coder.of();
    Coder valueCoder = BigEndianIntegerCoder.of();
    KvCoder<String, Integer> kvCoder = KvCoder.of(keyCoder, valueCoder);
    TestOutputReceiver receiver = new TestOutputReceiver(new ElementByteSizeObservableCoder(WindowedValue.getValueOnlyCoder(kvCoder)), counterSet, NameContextsForTests.nameContextForTest());
    ParDoFn pgbk = PartialGroupByKeyParDoFns.create(options, kvCoder, AppliedCombineFn.withInputCoder(Sum.ofIntegers(), CoderRegistry.createDefault(), kvCoder), NullSideInputReader.empty(), receiver, null);
    assertTrue(pgbk instanceof SimplePartialGroupByKeyParDoFn);
}
Also used : ElementByteSizeObservableCoder(org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.ElementByteSizeObservableCoder) KvCoder(org.apache.beam.sdk.coders.KvCoder) BigEndianIntegerCoder(org.apache.beam.sdk.coders.BigEndianIntegerCoder) Coder(org.apache.beam.sdk.coders.Coder) StringUtf8Coder(org.apache.beam.sdk.coders.StringUtf8Coder) IterableCoder(org.apache.beam.sdk.coders.IterableCoder) StreamingOptions(org.apache.beam.sdk.options.StreamingOptions) BatchSideInputPGBKParDoFn(org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.BatchSideInputPGBKParDoFn) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) StreamingSideInputPGBKParDoFn(org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.StreamingSideInputPGBKParDoFn) SimplePartialGroupByKeyParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.SimplePartialGroupByKeyParDoFn) TestOutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver) Test(org.junit.Test)

Aggregations

ParDoFn (org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) 34
Test (org.junit.Test) 26
CloudObject (org.apache.beam.runners.dataflow.util.CloudObject) 18
OutputReceiver (org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) 10
Coder (org.apache.beam.sdk.coders.Coder) 9
KvCoder (org.apache.beam.sdk.coders.KvCoder) 9
CounterSet (org.apache.beam.runners.dataflow.worker.counters.CounterSet) 7
StringUtf8Coder (org.apache.beam.sdk.coders.StringUtf8Coder) 7
PipelineOptions (org.apache.beam.sdk.options.PipelineOptions) 7
ElementByteSizeObservableCoder (org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.ElementByteSizeObservableCoder) 6
BatchSideInputPGBKParDoFn (org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.BatchSideInputPGBKParDoFn) 6
StreamingSideInputPGBKParDoFn (org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.StreamingSideInputPGBKParDoFn) 6
SimplePartialGroupByKeyParDoFn (org.apache.beam.runners.dataflow.worker.util.common.worker.SimplePartialGroupByKeyParDoFn) 6
TestOutputReceiver (org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver) 6
BigEndianIntegerCoder (org.apache.beam.sdk.coders.BigEndianIntegerCoder) 6
IterableCoder (org.apache.beam.sdk.coders.IterableCoder) 6
TupleTag (org.apache.beam.sdk.values.TupleTag) 6
ArrayList (java.util.ArrayList) 5
Structs.addString (org.apache.beam.runners.dataflow.util.Structs.addString) 5
Receiver (org.apache.beam.runners.dataflow.worker.util.common.worker.Receiver) 5