Example 1 with ParDoFn

Use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in the Apache Beam project.

From the class CombineValuesFnFactoryTest, method testCombineValuesFnAll.

@Test
public void testCombineValuesFnAll() throws Exception {
    TestReceiver receiver = new TestReceiver();
    Combine.CombineFn<Integer, CountSum, String> combiner = new MeanInts();
    ParDoFn combineParDoFn = createCombineValuesFn(
            CombinePhase.ALL,
            combiner,
            StringUtf8Coder.of(),
            BigEndianIntegerCoder.of(),
            new CountSumCoder(),
            WindowingStrategy.globalDefault());
    combineParDoFn.startBundle(receiver);
    combineParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("a", Arrays.asList(5, 6, 7))));
    combineParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("b", Arrays.asList(1, 3, 7))));
    combineParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("c", Arrays.asList(3, 6, 8, 9))));
    combineParDoFn.finishBundle();
    Object[] expectedReceivedElems = {
        WindowedValue.valueInGlobalWindow(KV.of("a", String.format("%.1f", 6.0))),
        WindowedValue.valueInGlobalWindow(KV.of("b", String.format("%.1f", 3.7))),
        WindowedValue.valueInGlobalWindow(KV.of("c", String.format("%.1f", 6.5)))
    };
    assertArrayEquals(expectedReceivedElems, receiver.receivedElems.toArray());
}
Also used : Combine(org.apache.beam.sdk.transforms.Combine) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) Structs.addString(org.apache.beam.runners.dataflow.util.Structs.addString) StringUtils.byteArrayToJsonString(org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) Test(org.junit.Test)
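
The expected strings in the assertion above follow from plain arithmetic: each key's values are counted and summed, and the mean is formatted with one decimal place. The following is a minimal, Beam-free sketch of that calculation. The class MeanSketch and its methods are hypothetical names used for illustration only; they are not the test's actual MeanInts or CountSum implementations, and Locale.ROOT is an assumption added to keep the decimal separator stable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the count/sum bookkeeping behind a mean combiner.
public class MeanSketch {

    // Accumulates the values into a {count, sum} pair.
    static long[] accumulate(List<Integer> values) {
        long count = 0;
        long sum = 0;
        for (int v : values) {
            count++;
            sum += v;
        }
        return new long[] {count, sum};
    }

    // Turns a {count, sum} pair into the mean, formatted with one decimal place.
    static String extract(long[] countSum) {
        double mean = (double) countSum[1] / countSum[0];
        return String.format(Locale.ROOT, "%.1f", mean);
    }

    public static void main(String[] args) {
        System.out.println(extract(accumulate(Arrays.asList(5, 6, 7))));    // 18 / 3 -> 6.0
        System.out.println(extract(accumulate(Arrays.asList(1, 3, 7))));    // 11 / 3 -> 3.7
        System.out.println(extract(accumulate(Arrays.asList(3, 6, 8, 9)))); // 26 / 4 -> 6.5
    }
}
```

The intermediate {count, sum} pairs here are (3, 18), (3, 11) and (4, 26).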

Example 2 with ParDoFn

Use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in the Apache Beam project.

From the class CombineValuesFnFactoryTest, method testCombineValuesFnAdd.

@Test
public void testCombineValuesFnAdd() throws Exception {
    TestReceiver receiver = new TestReceiver();
    MeanInts mean = new MeanInts();
    Combine.CombineFn<Integer, CountSum, String> combiner = mean;
    ParDoFn combineParDoFn = createCombineValuesFn(
            CombinePhase.ADD,
            combiner,
            StringUtf8Coder.of(),
            BigEndianIntegerCoder.of(),
            new CountSumCoder(),
            WindowingStrategy.globalDefault());
    combineParDoFn.startBundle(receiver);
    combineParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("a", Arrays.asList(5, 6, 7))));
    combineParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("b", Arrays.asList(1, 3, 7))));
    combineParDoFn.processElement(WindowedValue.valueInGlobalWindow(KV.of("c", Arrays.asList(3, 6, 8, 9))));
    combineParDoFn.finishBundle();
    Object[] expectedReceivedElems = {
        WindowedValue.valueInGlobalWindow(KV.of("a", new CountSum(3, 18))),
        WindowedValue.valueInGlobalWindow(KV.of("b", new CountSum(3, 11))),
        WindowedValue.valueInGlobalWindow(KV.of("c", new CountSum(4, 26)))
    };
    assertArrayEquals(expectedReceivedElems, receiver.receivedElems.toArray());
}
Also used : Combine(org.apache.beam.sdk.transforms.Combine) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) Structs.addString(org.apache.beam.runners.dataflow.util.Structs.addString) StringUtils.byteArrayToJsonString(org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) Test(org.junit.Test)

Example 3 with ParDoFn

Use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in the Apache Beam project.

From the class BeamFnMapTaskExecutorFactory, method createParDoOperation.

private OperationNode createParDoOperation(
        Network<Node, Edge> network,
        ParallelInstructionNode node,
        PipelineOptions options,
        DataflowExecutionContext<?> executionContext,
        DataflowOperationContext operationContext) throws Exception {
    ParallelInstruction instruction = node.getParallelInstruction();
    ParDoInstruction parDo = instruction.getParDo();
    TupleTag<?> mainOutputTag = tupleTag(parDo.getMultiOutputInfos().get(0));
    ImmutableMap.Builder<TupleTag<?>, Integer> outputTagsToReceiverIndicesBuilder = ImmutableMap.builder();
    int successorOffset = 0;
    for (Node successor : network.successors(node)) {
        for (Edge edge : network.edgesConnecting(node, successor)) {
            outputTagsToReceiverIndicesBuilder.put(tupleTag(((MultiOutputInfoEdge) edge).getMultiOutputInfo()), successorOffset);
        }
        successorOffset += 1;
    }
    ParDoFn fn = parDoFnFactory.create(
            options,
            CloudObject.fromSpec(parDo.getUserFn()),
            parDo.getSideInputs(),
            mainOutputTag,
            outputTagsToReceiverIndicesBuilder.build(),
            executionContext,
            operationContext);
    OutputReceiver[] receivers = getOutputReceivers(network, node);
    return OperationNode.create(new ParDoOperation(fn, receivers, operationContext));
}
Also used : RegisterRequestNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.RegisterRequestNode) FetchAndFilterStreamingSideInputsNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.FetchAndFilterStreamingSideInputsNode) InstructionOutputNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode) OperationNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.OperationNode) ParallelInstructionNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode) Node(org.apache.beam.runners.dataflow.worker.graph.Nodes.Node) ExecutableStageNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.ExecutableStageNode) RemoteGrpcPortNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.RemoteGrpcPortNode) OutputReceiverNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.OutputReceiverNode) TupleTag(org.apache.beam.sdk.values.TupleTag) OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) MultiOutputInfoEdge(org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) ImmutableMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) ParDoOperation(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation) ParallelInstruction(com.google.api.services.dataflow.model.ParallelInstruction) ParDoInstruction(com.google.api.services.dataflow.model.ParDoInstruction) Edge(org.apache.beam.runners.dataflow.worker.graph.Edges.Edge)
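
The nested loop in createParDoOperation gives every output tag the position of the successor node it feeds: all edges leading to the same successor share one index, and the index advances once per successor. The sketch below reproduces only that indexing logic, using plain strings in place of TupleTag and graph nodes. ReceiverIndexSketch, indexTags and the sample tag names are hypothetical and not part of Beam.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of mapping output tags to receiver indices.
public class ReceiverIndexSketch {

    // successors: ordered map from a successor's name to the output tags
    // carried by the edges that connect the current node to that successor.
    static Map<String, Integer> indexTags(Map<String, List<String>> successors) {
        // Note: a plain map overwrites a repeated tag, whereas the original
        // code uses ImmutableMap.Builder, which rejects duplicate keys in build().
        Map<String, Integer> tagToIndex = new LinkedHashMap<>();
        int successorOffset = 0;
        for (List<String> tags : successors.values()) {
            for (String tag : tags) {
                tagToIndex.put(tag, successorOffset);
            }
            successorOffset += 1; // one receiver slot per successor
        }
        return tagToIndex;
    }

    public static void main(String[] args) {
        Map<String, List<String>> successors = new LinkedHashMap<>();
        successors.put("successorA", Arrays.asList("main"));
        successors.put("successorB", Arrays.asList("side1", "side2"));
        System.out.println(indexTags(successors));
    }
}
```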

Example 4 with ParDoFn

Use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in the Apache Beam project.

From the class ToIsmRecordForMultimapDoFnFactoryTest, method testConversionOfRecord.

@Test
public void testConversionOfRecord() throws Exception {
    ParDoFn parDoFn = new ToIsmRecordForMultimapDoFnFactory().create(
            null, /* pipeline options */
            CloudObject.fromSpec(ImmutableMap.of(
                    PropertyNames.OBJECT_TYPE_NAME, "ToIsmRecordForMultimapDoFn",
                    PropertyNames.ENCODING, createIsmRecordEncoding())),
            null, /* side input infos */
            null, /* main output tag */
            null, /* output tag to receiver index */
            null, /* execution context */
            null); /* operation context */
    List<Object> outputReceiver = new ArrayList<>();
    parDoFn.startBundle(outputReceiver::add);
    parDoFn.processElement(valueInGlobalWindow(KV.of(
            12, /* shard key */
            ImmutableList.of(
                    KV.of(KV.of(42, /* user key */ GlobalWindow.INSTANCE), /* sort key */ 4),
                    KV.of(KV.of(42, /* user key */ GlobalWindow.INSTANCE), /* sort key */ 5),
                    KV.of(KV.of(43, /* user key */ GlobalWindow.INSTANCE), /* sort key */ 6),
                    KV.of(KV.of(44, /* user key */ GlobalWindow.INSTANCE), /* sort key */ 7),
                    KV.of(KV.of(44, /* user key */ GlobalWindow.INSTANCE), /* sort key */ 8)))));
    assertThat(outputReceiver, contains(
            valueInGlobalWindow(IsmRecord.of(ImmutableList.of(42, GlobalWindow.INSTANCE, 0L), 4)),
            /* same structural value as above */
            valueInGlobalWindow(IsmRecord.of(ImmutableList.of(42, GlobalWindow.INSTANCE, 1L), 5)),
            valueInGlobalWindow(IsmRecord.of(ImmutableList.of(43, GlobalWindow.INSTANCE, 0L), 6)),
            valueInGlobalWindow(IsmRecord.of(ImmutableList.of(44, GlobalWindow.INSTANCE, 0L), 7)),
            /* same structural value as above and final value */
            valueInGlobalWindow(IsmRecord.of(ImmutableList.of(44, GlobalWindow.INSTANCE, 1L), 8))));
}
Also used : ArrayList(java.util.ArrayList) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) Test(org.junit.Test)

Example 5 with ParDoFn

Use of org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn in the Apache Beam project.

From the class UserParDoFnFactoryTest, method testFactoryDoesNotReuseAfterAborted.

@Test
public void testFactoryDoesNotReuseAfterAborted() throws Exception {
    PipelineOptions options = PipelineOptionsFactory.create();
    CounterSet counters = new CounterSet();
    TestDoFn initialFn = new TestDoFn(Collections.<TupleTag<String>>emptyList());
    CloudObject cloudObject = getCloudObject(initialFn);
    ParDoFn parDoFn = factory.create(
            options,
            cloudObject,
            null,
            MAIN_OUTPUT,
            ImmutableMap.<TupleTag<?>, Integer>of(MAIN_OUTPUT, 0),
            BatchModeExecutionContext.forTesting(options, "testStage"),
            TestOperationContext.create(counters));
    Receiver rcvr = new OutputReceiver();
    parDoFn.startBundle(rcvr);
    parDoFn.processElement(WindowedValue.valueInGlobalWindow("foo"));
    TestDoFn fn = (TestDoFn) ((SimpleParDoFn) parDoFn).getDoFnInfo().getDoFn();
    parDoFn.abort();
    assertThat(fn.state, equalTo(TestDoFn.State.TORN_DOWN));
    // The fn should not be torn down here
    ParDoFn secondParDoFn = factory.create(
            options,
            cloudObject.clone(),
            null,
            MAIN_OUTPUT,
            ImmutableMap.<TupleTag<?>, Integer>of(MAIN_OUTPUT, 0),
            BatchModeExecutionContext.forTesting(options, "testStage"),
            TestOperationContext.create(counters));
    secondParDoFn.startBundle(rcvr);
    secondParDoFn.processElement(WindowedValue.valueInGlobalWindow("foo"));
    TestDoFn secondFn = (TestDoFn) ((SimpleParDoFn) secondParDoFn).getDoFnInfo().getDoFn();
    assertThat(secondFn, not(theInstance(fn)));
    assertThat(fn.state, equalTo(TestDoFn.State.TORN_DOWN));
    assertThat(secondFn.state, equalTo(TestDoFn.State.PROCESSING));
}
Also used : CounterSet(org.apache.beam.runners.dataflow.worker.counters.CounterSet) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) Receiver(org.apache.beam.runners.dataflow.worker.util.common.worker.Receiver) OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) Test(org.junit.Test)

Aggregations

ParDoFn (org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn): 34
Test (org.junit.Test): 26
CloudObject (org.apache.beam.runners.dataflow.util.CloudObject): 18
OutputReceiver (org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver): 10
Coder (org.apache.beam.sdk.coders.Coder): 9
KvCoder (org.apache.beam.sdk.coders.KvCoder): 9
CounterSet (org.apache.beam.runners.dataflow.worker.counters.CounterSet): 7
StringUtf8Coder (org.apache.beam.sdk.coders.StringUtf8Coder): 7
PipelineOptions (org.apache.beam.sdk.options.PipelineOptions): 7
ElementByteSizeObservableCoder (org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.ElementByteSizeObservableCoder): 6
BatchSideInputPGBKParDoFn (org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.BatchSideInputPGBKParDoFn): 6
StreamingSideInputPGBKParDoFn (org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.StreamingSideInputPGBKParDoFn): 6
SimplePartialGroupByKeyParDoFn (org.apache.beam.runners.dataflow.worker.util.common.worker.SimplePartialGroupByKeyParDoFn): 6
TestOutputReceiver (org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver): 6
BigEndianIntegerCoder (org.apache.beam.sdk.coders.BigEndianIntegerCoder): 6
IterableCoder (org.apache.beam.sdk.coders.IterableCoder): 6
TupleTag (org.apache.beam.sdk.values.TupleTag): 6
ArrayList (java.util.ArrayList): 5
Structs.addString (org.apache.beam.runners.dataflow.util.Structs.addString): 5
Receiver (org.apache.beam.runners.dataflow.worker.util.common.worker.Receiver): 5