
Example 6 with TupleTag

Use of org.apache.beam.sdk.values.TupleTag in project beam by apache.

Class ParDoTest, method testParDoEmptyWithTaggedOutput.

@Test
@Category(ValidatesRunner.class)
public void testParDoEmptyWithTaggedOutput() {
    // Anonymous subclasses ("{}") let each tag capture its element type.
    TupleTag<String> mainOutputTag = new TupleTag<String>("main") {};
    TupleTag<String> additionalOutputTag1 = new TupleTag<String>("additional1") {};
    TupleTag<String> additionalOutputTag2 = new TupleTag<String>("additional2") {};
    TupleTag<String> additionalOutputTag3 = new TupleTag<String>("additional3") {};
    TupleTag<String> additionalOutputTagUnwritten = new TupleTag<String>("unwrittenOutput") {};
    PCollectionTuple outputs =
        pipeline
            .apply(Create.empty(VarIntCoder.of()))
            .apply(
                ParDo.of(
                        new TestDoFn(
                            Arrays.<PCollectionView<Integer>>asList(),
                            Arrays.asList(additionalOutputTag1, additionalOutputTag2, additionalOutputTag3)))
                    .withOutputTags(
                        mainOutputTag,
                        TupleTagList.of(additionalOutputTag3)
                            .and(additionalOutputTag1)
                            .and(additionalOutputTagUnwritten)
                            .and(additionalOutputTag2)));
    List<Integer> inputs = Collections.emptyList();
    PAssert.that(outputs.get(mainOutputTag)).satisfies(ParDoTest.HasExpectedOutput.forInput(inputs));
    PAssert.that(outputs.get(additionalOutputTag1)).satisfies(ParDoTest.HasExpectedOutput.forInput(inputs).fromOutput(additionalOutputTag1));
    PAssert.that(outputs.get(additionalOutputTag2)).satisfies(ParDoTest.HasExpectedOutput.forInput(inputs).fromOutput(additionalOutputTag2));
    PAssert.that(outputs.get(additionalOutputTag3)).satisfies(ParDoTest.HasExpectedOutput.forInput(inputs).fromOutput(additionalOutputTag3));
    PAssert.that(outputs.get(additionalOutputTagUnwritten)).empty();
    pipeline.run();
}
Also used : PCollectionView(org.apache.beam.sdk.values.PCollectionView) TupleTag(org.apache.beam.sdk.values.TupleTag) PCollectionTuple(org.apache.beam.sdk.values.PCollectionTuple) StringUtils.byteArrayToJsonString(org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString) Matchers.containsString(org.hamcrest.Matchers.containsString) Category(org.junit.experimental.categories.Category) Test(org.junit.Test)

Example 7 with TupleTag

Use of org.apache.beam.sdk.values.TupleTag in project beam by apache.

Class ParDoTest, method testTaggedOutputUnregisteredExplicitCoder.

@Test
public void testTaggedOutputUnregisteredExplicitCoder() throws Exception {
    pipeline.enableAbandonedNodeEnforcement(false);
    PCollection<Integer> input = pipeline.apply(Create.of(Arrays.asList(1, 2, 3)));
    final TupleTag<Integer> mainOutputTag = new TupleTag<Integer>("main");
    final TupleTag<TestDummy> additionalOutputTag = new TupleTag<TestDummy>("unregisteredSide");
    ParDo.MultiOutput<Integer, Integer> pardo = ParDo.of(new TaggedOutputDummyFn(additionalOutputTag)).withOutputTags(mainOutputTag, TupleTagList.of(additionalOutputTag));
    PCollectionTuple outputTuple = input.apply(pardo);
    outputTuple.get(additionalOutputTag).setCoder(new TestDummyCoder());
    outputTuple.get(additionalOutputTag).apply(View.<TestDummy>asSingleton());
    assertEquals(new TestDummyCoder(), outputTuple.get(additionalOutputTag).getCoder());
    // Check for crashes
    outputTuple.get(additionalOutputTag).finishSpecifyingOutput("ParDo", input, pardo);
    // Check for corruption
    assertEquals(new TestDummyCoder(), outputTuple.get(additionalOutputTag).getCoder());
}
Also used : TupleTag(org.apache.beam.sdk.values.TupleTag) UsesStatefulParDo(org.apache.beam.sdk.testing.UsesStatefulParDo) UsesTimersInParDo(org.apache.beam.sdk.testing.UsesTimersInParDo) PCollectionTuple(org.apache.beam.sdk.values.PCollectionTuple) Test(org.junit.Test)

Example 8 with TupleTag

Use of org.apache.beam.sdk.values.TupleTag in project beam by apache.

Class ParDoTest, method testMultiOutputAppliedMultipleTimesDifferentOutputs.

@Test
public void testMultiOutputAppliedMultipleTimesDifferentOutputs() {
    pipeline.enableAbandonedNodeEnforcement(false);
    PCollection<Long> longs = pipeline.apply(GenerateSequence.from(0));
    TupleTag<Long> mainOut = new TupleTag<>();
    final TupleTag<String> valueAsString = new TupleTag<>();
    final TupleTag<Integer> valueAsInt = new TupleTag<>();
    DoFn<Long, Long> fn = new DoFn<Long, Long>() {

        @ProcessElement
        public void processElement(ProcessContext cxt) {
            cxt.output(cxt.element());
            cxt.output(valueAsString, Long.toString(cxt.element()));
            cxt.output(valueAsInt, Long.valueOf(cxt.element()).intValue());
        }
    };
    ParDo.MultiOutput<Long, Long> parDo = ParDo.of(fn).withOutputTags(mainOut, TupleTagList.of(valueAsString).and(valueAsInt));
    PCollectionTuple firstApplication = longs.apply("first", parDo);
    PCollectionTuple secondApplication = longs.apply("second", parDo);
    assertThat(firstApplication, not(equalTo(secondApplication)));
    assertThat(firstApplication.getAll().keySet(), Matchers.<TupleTag<?>>containsInAnyOrder(mainOut, valueAsString, valueAsInt));
    assertThat(secondApplication.getAll().keySet(), Matchers.<TupleTag<?>>containsInAnyOrder(mainOut, valueAsString, valueAsInt));
}
Also used : TupleTag(org.apache.beam.sdk.values.TupleTag) StringUtils.byteArrayToJsonString(org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString) Matchers.containsString(org.hamcrest.Matchers.containsString) UsesStatefulParDo(org.apache.beam.sdk.testing.UsesStatefulParDo) UsesTimersInParDo(org.apache.beam.sdk.testing.UsesTimersInParDo) PCollectionTuple(org.apache.beam.sdk.values.PCollectionTuple) Test(org.junit.Test)

Example 9 with TupleTag

Use of org.apache.beam.sdk.values.TupleTag in project beam by apache.

Class ReplacementOutputs, method tagged.

public static Map<PValue, ReplacementOutput> tagged(Map<TupleTag<?>, PValue> original, POutput replacement) {
    Map<TupleTag<?>, TaggedPValue> originalTags = new HashMap<>();
    for (Map.Entry<TupleTag<?>, PValue> originalValue : original.entrySet()) {
        originalTags.put(originalValue.getKey(), TaggedPValue.of(originalValue.getKey(), originalValue.getValue()));
    }
    ImmutableMap.Builder<PValue, ReplacementOutput> resultBuilder = ImmutableMap.builder();
    Set<TupleTag<?>> missingTags = new HashSet<>(originalTags.keySet());
    for (Map.Entry<TupleTag<?>, PValue> replacementValue : replacement.expand().entrySet()) {
        TaggedPValue mapped = originalTags.get(replacementValue.getKey());
        checkArgument(mapped != null, "Missing original output for Tag %s and Value %s Between original %s and replacement %s", replacementValue.getKey(), replacementValue.getValue(), original, replacement.expand());
        resultBuilder.put(replacementValue.getValue(), ReplacementOutput.of(mapped, TaggedPValue.of(replacementValue.getKey(), replacementValue.getValue())));
        missingTags.remove(replacementValue.getKey());
    }
    ImmutableMap<PValue, ReplacementOutput> result = resultBuilder.build();
    checkArgument(missingTags.isEmpty(), "Missing replacement for tags %s. Encountered tags: %s", missingTags, result.keySet());
    return result;
}
Also used : HashMap(java.util.HashMap) TupleTag(org.apache.beam.sdk.values.TupleTag) PValue(org.apache.beam.sdk.values.PValue) TaggedPValue(org.apache.beam.sdk.values.TaggedPValue) ImmutableMap(com.google.common.collect.ImmutableMap) ReplacementOutput(org.apache.beam.sdk.runners.PTransformOverrideFactory.ReplacementOutput) Map(java.util.Map) HashSet(java.util.HashSet)
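
The matching logic in Example 9 can be tried without Beam on the classpath. The following is a minimal sketch only, not the Beam implementation: the class name TagPairingSketch and the method pairByTag are hypothetical, plain strings stand in for TupleTag and PValue, and the result is keyed by tag rather than by the replacement value. It keeps the two checks of the original: every replacement tag must exist in the original map, and no original tag may be left without a replacement.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the tag-matching pattern; uses Strings instead of Beam types. */
public class TagPairingSketch {

    /** Returns tag -> {originalValue, replacementValue}; throws if the two tag sets differ. */
    public static Map<String, String[]> pairByTag(
            Map<String, String> original, Map<String, String> replacement) {
        Map<String, String[]> result = new LinkedHashMap<>();
        // Start with every original tag marked as missing; remove each one as it is matched.
        Set<String> missingTags = new HashSet<>(original.keySet());
        for (Map.Entry<String, String> entry : replacement.entrySet()) {
            String tag = entry.getKey();
            if (!original.containsKey(tag)) {
                throw new IllegalArgumentException("Missing original output for tag " + tag);
            }
            result.put(tag, new String[] {original.get(tag), entry.getValue()});
            missingTags.remove(tag);
        }
        if (!missingTags.isEmpty()) {
            throw new IllegalArgumentException("Missing replacement for tags " + missingTags);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> original = new HashMap<>();
        original.put("main", "origMain");
        original.put("side", "origSide");
        Map<String, String> replacement = new HashMap<>();
        replacement.put("main", "newMain");
        replacement.put("side", "newSide");
        Map<String, String[]> paired = pairByTag(original, replacement);
        System.out.println(paired.get("main")[0] + " -> " + paired.get("main")[1]);
    }
}
```

Tracking the unmatched tags in a separate set, as the original method does, lets the second error message list exactly which tags have no replacement.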

Example 10 with TupleTag

Use of org.apache.beam.sdk.values.TupleTag in project beam by apache.

Class DoFnOperatorTest, method testLateDroppingForStatefulFn.

@Test
public void testLateDroppingForStatefulFn() throws Exception {
    WindowingStrategy<Object, IntervalWindow> windowingStrategy = WindowingStrategy.of(FixedWindows.of(new Duration(10)));
    DoFn<Integer, String> fn = new DoFn<Integer, String>() {

        @StateId("state")
        private final StateSpec<ValueState<String>> stateSpec = StateSpecs.value(StringUtf8Coder.of());

        @ProcessElement
        public void processElement(ProcessContext context) {
            context.output(context.element().toString());
        }
    };
    WindowedValue.FullWindowedValueCoder<Integer> windowedValueCoder = WindowedValue.getFullCoder(VarIntCoder.of(), windowingStrategy.getWindowFn().windowCoder());
    TupleTag<String> outputTag = new TupleTag<>("main-output");
    DoFnOperator<Integer, String, WindowedValue<String>> doFnOperator = new DoFnOperator<>(
        fn, "stepName", windowedValueCoder, outputTag, Collections.<TupleTag<?>>emptyList(),
        new DoFnOperator.DefaultOutputManagerFactory<WindowedValue<String>>(), windowingStrategy,
        new HashMap<Integer, PCollectionView<?>>(), /* side-input mapping */
        Collections.<PCollectionView<?>>emptyList(), /* side inputs */
        PipelineOptionsFactory.as(FlinkPipelineOptions.class), VarIntCoder.of());
    OneInputStreamOperatorTestHarness<WindowedValue<Integer>, WindowedValue<String>> testHarness = new KeyedOneInputStreamOperatorTestHarness<>(doFnOperator, new KeySelector<WindowedValue<Integer>, Integer>() {

        @Override
        public Integer getKey(WindowedValue<Integer> integerWindowedValue) throws Exception {
            return integerWindowedValue.getValue();
        }
    }, new CoderTypeInformation<>(VarIntCoder.of()));
    testHarness.open();
    testHarness.processWatermark(0);
    IntervalWindow window1 = new IntervalWindow(new Instant(0), Duration.millis(10));
    // this should not be late
    testHarness.processElement(new StreamRecord<>(WindowedValue.of(13, new Instant(0), window1, PaneInfo.NO_FIRING)));
    assertThat(this.<String>stripStreamRecordFromWindowedValue(testHarness.getOutput()), contains(WindowedValue.of("13", new Instant(0), window1, PaneInfo.NO_FIRING)));
    testHarness.getOutput().clear();
    testHarness.processWatermark(9);
    // this should still not be considered late
    testHarness.processElement(new StreamRecord<>(WindowedValue.of(17, new Instant(0), window1, PaneInfo.NO_FIRING)));
    assertThat(this.<String>stripStreamRecordFromWindowedValue(testHarness.getOutput()), contains(WindowedValue.of("17", new Instant(0), window1, PaneInfo.NO_FIRING)));
    testHarness.getOutput().clear();
    testHarness.processWatermark(10);
    // this should now be considered late
    testHarness.processElement(new StreamRecord<>(WindowedValue.of(17, new Instant(0), window1, PaneInfo.NO_FIRING)));
    assertThat(this.<String>stripStreamRecordFromWindowedValue(testHarness.getOutput()), emptyIterable());
    testHarness.close();
}
Also used : TupleTag(org.apache.beam.sdk.values.TupleTag) FlinkPipelineOptions(org.apache.beam.runners.flink.FlinkPipelineOptions) DoFnOperator(org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator) KeyedOneInputStreamOperatorTestHarness(org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness) StateSpec(org.apache.beam.sdk.state.StateSpec) WindowedValue(org.apache.beam.sdk.util.WindowedValue) IntervalWindow(org.apache.beam.sdk.transforms.windowing.IntervalWindow) Instant(org.joda.time.Instant) Duration(org.joda.time.Duration) PCollectionView(org.apache.beam.sdk.values.PCollectionView) DoFn(org.apache.beam.sdk.transforms.DoFn) Test(org.junit.Test)
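
The three assertions in Example 10 are consistent with a simple lateness rule: an element is dropped once the watermark has passed the window's last timestamp plus the allowed lateness. The sketch below is an illustration under stated assumptions, not the runner's code: the class LatenessSketch and the method isLate are hypothetical, timestamps are plain millisecond longs, the window end is treated as exclusive, and allowed lateness is assumed to be zero because the test does not set one.

```java
/** Hypothetical illustration of the late-data check exercised by the test above. */
public class LatenessSketch {

    /**
     * A window [start, end) has max timestamp end - 1. The element counts as late
     * when that max timestamp plus the allowed lateness lies before the watermark.
     */
    static boolean isLate(long windowEndMillis, long allowedLatenessMillis, long watermarkMillis) {
        long maxTimestamp = windowEndMillis - 1;
        return maxTimestamp + allowedLatenessMillis < watermarkMillis;
    }

    public static void main(String[] args) {
        long windowEnd = 10; // matches the 10 ms fixed window in the test
        System.out.println(isLate(windowEnd, 0, 0));  // false: watermark 0
        System.out.println(isLate(windowEnd, 0, 9));  // false: watermark 9 equals the max timestamp
        System.out.println(isLate(windowEnd, 0, 10)); // true: watermark 10 has passed it
    }
}
```

With a non-zero allowed lateness the cutoff shifts by that amount, so the same element at watermark 10 would still be accepted if, say, 5 ms of lateness were allowed.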

Aggregations

TupleTag (org.apache.beam.sdk.values.TupleTag): 185
Test (org.junit.Test): 100
WindowedValue (org.apache.beam.sdk.util.WindowedValue): 54
KV (org.apache.beam.sdk.values.KV): 54
PCollectionTuple (org.apache.beam.sdk.values.PCollectionTuple): 49
PCollection (org.apache.beam.sdk.values.PCollection): 42
DoFn (org.apache.beam.sdk.transforms.DoFn): 32
Instant (org.joda.time.Instant): 32
SerializablePipelineOptions (org.apache.beam.runners.core.construction.SerializablePipelineOptions): 30
Map (java.util.Map): 29
Pipeline (org.apache.beam.sdk.Pipeline): 29
PCollectionView (org.apache.beam.sdk.values.PCollectionView): 29
HashMap (java.util.HashMap): 27
Coder (org.apache.beam.sdk.coders.Coder): 26
StreamRecordStripper.stripStreamRecordFromWindowedValue (org.apache.beam.runners.flink.translation.wrappers.streaming.StreamRecordStripper.stripStreamRecordFromWindowedValue): 25
Matchers.containsString (org.hamcrest.Matchers.containsString): 25
List (java.util.List): 24
BoundedWindow (org.apache.beam.sdk.transforms.windowing.BoundedWindow): 23
KvCoder (org.apache.beam.sdk.coders.KvCoder): 22
KeyedOneInputStreamOperatorTestHarness (org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness): 22