
Example 86 with DoFn

use of org.apache.beam.sdk.transforms.DoFn in project components by Talend.

the class JmsOutputPTransformRuntime method expand.

@Override
public PDone expand(PCollection<Object> objectPCollection) {
    // TODO change this method's input from PCollection<Object> to PCollection<IndexedRecord>, as the incoming type is always PCollection<IndexedRecord>
    PCollection<IndexedRecord> indexedCollection = objectPCollection.apply("ExtractIndexedRecord", ParDo.of(new DoFn<Object, IndexedRecord>() {

        IndexedRecordConverter converter;

        @DoFn.ProcessElement
        public void processElement(ProcessContext c) throws Exception {
            if (c.element() == null) {
                return;
            }
            if (converter == null) {
                converter = new AvroRegistry().createIndexedRecordConverter(c.element().getClass());
            }
            c.output((IndexedRecord) converter.convertToAvro(c.element()));
        }
    }));
    indexedCollection.setCoder(LazyAvroCoder.of());
    PCollection<String> jmsCollection = indexedCollection.apply("IndexedRecordToJmsRecord", ParDo.of(new DoFn<IndexedRecord, String>() {

        @DoFn.ProcessElement
        public void processElement(ProcessContext c) throws Exception {
            c.output(c.element().get(0).toString());
        }
    }));
    datastoreRuntime = new JmsDatastoreRuntime();
    datastoreRuntime.initialize(null, properties.datasetRef.getReference().getDatastoreProperties());
    if (messageType.equals(JmsMessageType.QUEUE)) {
        return jmsCollection.apply(JmsIO.write().withConnectionFactory(datastoreRuntime.getConnectionFactory()).withQueue(properties.datasetRef.getReference().queueTopicName.getValue()));
    } else if (messageType.equals(JmsMessageType.TOPIC)) {
        return jmsCollection.apply(JmsIO.write().withConnectionFactory(datastoreRuntime.getConnectionFactory()).withTopic(properties.datasetRef.getReference().queueTopicName.getValue()));
    } else {
        throw new TalendRuntimeException(CommonErrorCodes.UNEXPECTED_ARGUMENT);
    }
}
Also used : TalendRuntimeException(org.talend.daikon.exception.TalendRuntimeException) DoFn(org.apache.beam.sdk.transforms.DoFn) AvroRegistry(org.talend.daikon.avro.AvroRegistry) IndexedRecord(org.apache.avro.generic.IndexedRecord) IndexedRecordConverter(org.talend.daikon.avro.converter.IndexedRecordConverter)

Example 87 with DoFn

use of org.apache.beam.sdk.transforms.DoFn in project component-runtime by Talend.

the class TalendIOTest method processorMulti.

@Test
public void processorMulti() {
    final PCollection<SampleLength> out = pipeline.apply(Create.of(new Sample("a"), new Sample("bb")).withCoder(JsonbCoder.of(Sample.class, PLUGIN))).apply(UUID.randomUUID().toString(), ParDo.of(new DoFn<Sample, JsonObject>() {

        @ProcessElement
        public void toData(final ProcessContext sample) {
            sample.output(JSONB.fromJson(JSONB.toJson(sample.element()), JsonObject.class));
        }
    })).setCoder(JsonpJsonObjectCoder.of(PLUGIN)).apply(new ViewsMappingTransform(emptyMap(), PLUGIN)).apply(TalendFn.asFn(new BaseTestProcessor() {

        @Override
        public void onNext(final InputFactory input, final OutputFactory factory) {
            factory.create(Branches.DEFAULT_BRANCH).emit(new SampleLength(JSONB.fromJson(input.read(Branches.DEFAULT_BRANCH).toString(), Sample.class).data.length()));
        }
    })).apply(ParDo.of(new DoFn<JsonObject, SampleLength>() {

        @ProcessElement
        public void onElement(final ProcessContext ctx) {
            ctx.output(JSONB.fromJson(ctx.element().getJsonArray("__default__").getJsonObject(0).toString(), SampleLength.class));
        }
    }));
    PAssert.that(out.apply(UUID.randomUUID().toString(), ParDo.of(new DoFn<SampleLength, Integer>() {

        @ProcessElement
        public void toInt(final ProcessContext pc) {
            pc.output(pc.element().len);
        }
    }))).containsInAnyOrder(1, 2);
    assertEquals(PipelineResult.State.DONE, pipeline.run().getState());
}
Also used : InputFactory(org.talend.sdk.component.runtime.output.InputFactory) DoFn(org.apache.beam.sdk.transforms.DoFn) JsonObject(javax.json.JsonObject) ViewsMappingTransform(org.talend.sdk.component.runtime.beam.transform.ViewsMappingTransform) OutputFactory(org.talend.sdk.component.runtime.output.OutputFactory) Test(org.junit.Test)

Example 88 with DoFn

use of org.apache.beam.sdk.transforms.DoFn in project component-runtime by Talend.

the class InMemoryQueueIOTest method output.

@Test(timeout = 60000)
public void output() {
    final Collection<JsonObject> objects = new ArrayList<>();
    try (final LoopState state = LoopState.newTracker(null)) {
        pipeline.apply(Create.of(IntStream.range(0, 5).mapToObj(RowStruct::new).collect(toList()))).setCoder(SerializableCoder.of(RowStruct.class)).apply(ParDo.of(new DoFn<RowStruct, JsonObject>() {

            @ProcessElement
            public void onElement(final ProcessContext context) {
                final JsonObject object = ComponentManager.instance().getJsonpBuilderFactory().createObjectBuilder().add("id", context.element().id).build();
                context.output(object);
            }
        })).setCoder(JsonpJsonObjectCoder.of(null)).apply(InMemoryQueueIO.to(state));
        pipeline.run().waitUntilFinish();
        JsonObject next;
        do {
            next = state.next();
            if (next != null) {
                objects.add(next);
            }
        } while (next != null);
    }
    assertEquals(5, objects.size());
    assertEquals(IntStream.range(0, 5).boxed().collect(toSet()), objects.stream().mapToInt(o -> o.getInt("id")).boxed().collect(toSet()));
}
Also used : IntStream(java.util.stream.IntStream) DoFn(org.apache.beam.sdk.transforms.DoFn) JsonObject(javax.json.JsonObject) SerializableCoder(org.apache.beam.sdk.coders.SerializableCoder) Collection(java.util.Collection) PipelineResult(org.apache.beam.sdk.PipelineResult) Test(org.junit.Test) PipelineOptionsFactory(org.apache.beam.sdk.options.PipelineOptionsFactory) Serializable(java.io.Serializable) ArrayList(java.util.ArrayList) TimeUnit(java.util.concurrent.TimeUnit) Collectors.toList(java.util.stream.Collectors.toList) Rule(org.junit.Rule) ParDo(org.apache.beam.sdk.transforms.ParDo) Create(org.apache.beam.sdk.transforms.Create) TestPipeline(org.apache.beam.sdk.testing.TestPipeline) Data(lombok.Data) Thread.sleep(java.lang.Thread.sleep) Assertions.assertEquals(org.junit.jupiter.api.Assertions.assertEquals) ComponentManager(org.talend.sdk.component.runtime.manager.ComponentManager) AllArgsConstructor(lombok.AllArgsConstructor) JsonpJsonObjectCoder(org.talend.sdk.component.runtime.beam.coder.JsonpJsonObjectCoder) Collectors.toSet(java.util.stream.Collectors.toSet) CopyOnWriteArrayList(java.util.concurrent.CopyOnWriteArrayList)

Example 89 with DoFn

use of org.apache.beam.sdk.transforms.DoFn in project beam by apache.

the class PTransformMatchersTest method parDoWithFnTypeWithMatchingType.

@Test
public void parDoWithFnTypeWithMatchingType() {
    DoFn<Object, Object> fn = new DoFn<Object, Object>() {

        @ProcessElement
        public void process(ProcessContext ctxt) {
        }
    };
    AppliedPTransform<?, ?, ?> parDoSingle = getAppliedTransform(ParDo.of(fn));
    AppliedPTransform<?, ?, ?> parDoMulti = getAppliedTransform(ParDo.of(fn).withOutputTags(new TupleTag<>(), TupleTagList.empty()));
    PTransformMatcher matcher = PTransformMatchers.parDoWithFnType(fn.getClass());
    assertThat(matcher.matches(parDoSingle), is(true));
    assertThat(matcher.matches(parDoMulti), is(true));
}
Also used : DoFn(org.apache.beam.sdk.transforms.DoFn) PTransformMatcher(org.apache.beam.sdk.runners.PTransformMatcher) TupleTag(org.apache.beam.sdk.values.TupleTag) Test(org.junit.Test)
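
The test in Example 89 passes fn.getClass() rather than DoFn.class to the matcher. The plain-Java sketch below illustrates why that distinction can matter: an anonymous subclass has its own Class object, so an exact class comparison succeeds only for that specific anonymous type. BaseFn, FnTypeMatchSketch and matchesType are hypothetical names introduced for illustration; the sketch assumes the matcher compares classes for exact equality and is not Beam's implementation.

```java
/** Hypothetical stand-in for a function base class such as DoFn. */
abstract class BaseFn {
    abstract void process(Object element);
}

/** Sketch of exact-class matching (an assumption, not Beam's actual code). */
final class FnTypeMatchSketch {

    private FnTypeMatchSketch() {
    }

    /** Returns true only when fn's runtime class is exactly the expected class. */
    static boolean matchesType(Class<?> expected, Object fn) {
        return expected.equals(fn.getClass());
    }

    public static void main(String[] args) {
        // Each anonymous class body below compiles to its own distinct class.
        BaseFn fn = new BaseFn() {
            @Override
            void process(Object element) {
            }
        };
        BaseFn other = new BaseFn() {
            @Override
            void process(Object element) {
            }
        };
        System.out.println(matchesType(fn.getClass(), fn));
        System.out.println(matchesType(BaseFn.class, fn));
        System.out.println(matchesType(fn.getClass(), other));
    }
}
```

Under this assumption, a matcher built from fn.getClass() accepts transforms wrapping that same anonymous type, whether applied as a single-output or multi-output ParDo, and rejects any other function type.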

Example 90 with DoFn

use of org.apache.beam.sdk.transforms.DoFn in project beam by apache.

the class DoFnOperatorTest method testLateDroppingForStatefulFn.

@Test
public void testLateDroppingForStatefulFn() throws Exception {
    WindowingStrategy<Object, IntervalWindow> windowingStrategy = WindowingStrategy.of(FixedWindows.of(Duration.millis(10)));
    DoFn<Integer, String> fn = new DoFn<Integer, String>() {

        @StateId("state")
        private final StateSpec<ValueState<String>> stateSpec = StateSpecs.value(StringUtf8Coder.of());

        @ProcessElement
        public void processElement(ProcessContext context) {
            context.output(context.element().toString());
        }
    };
    VarIntCoder keyCoder = VarIntCoder.of();
    Coder<WindowedValue<Integer>> inputCoder = WindowedValue.getFullCoder(keyCoder, windowingStrategy.getWindowFn().windowCoder());
    Coder<WindowedValue<String>> outputCoder = WindowedValue.getFullCoder(StringUtf8Coder.of(), windowingStrategy.getWindowFn().windowCoder());
    KeySelector<WindowedValue<Integer>, ByteBuffer> keySelector = e -> FlinkKeyUtils.encodeKey(e.getValue(), keyCoder);
    TupleTag<String> outputTag = new TupleTag<>("main-output");
    DoFnOperator<Integer, String> doFnOperator = new DoFnOperator<>(fn, "stepName", inputCoder, Collections.emptyMap(), outputTag, Collections.emptyList(),
        new DoFnOperator.MultiOutputOutputManagerFactory<>(outputTag, outputCoder, new SerializablePipelineOptions(FlinkPipelineOptions.defaults())),
        windowingStrategy,
        new HashMap<>(), /* side-input mapping */
        Collections.emptyList(), /* side inputs */
        FlinkPipelineOptions.defaults(),
        keyCoder, /* key coder */
        keySelector, DoFnSchemaInformation.create(), Collections.emptyMap());
    OneInputStreamOperatorTestHarness<WindowedValue<Integer>, WindowedValue<String>> testHarness = new KeyedOneInputStreamOperatorTestHarness<>(doFnOperator, keySelector, new CoderTypeInformation<>(FlinkKeyUtils.ByteBufferCoder.of(), FlinkPipelineOptions.defaults()));
    testHarness.open();
    testHarness.processWatermark(0);
    IntervalWindow window1 = new IntervalWindow(new Instant(0), Duration.millis(10));
    // this should not be late
    testHarness.processElement(new StreamRecord<>(WindowedValue.of(13, new Instant(0), window1, PaneInfo.NO_FIRING)));
    assertThat(stripStreamRecordFromWindowedValue(testHarness.getOutput()), contains(WindowedValue.of("13", new Instant(0), window1, PaneInfo.NO_FIRING)));
    testHarness.getOutput().clear();
    testHarness.processWatermark(9);
    // this should still not be considered late
    testHarness.processElement(new StreamRecord<>(WindowedValue.of(17, new Instant(0), window1, PaneInfo.NO_FIRING)));
    assertThat(stripStreamRecordFromWindowedValue(testHarness.getOutput()), contains(WindowedValue.of("17", new Instant(0), window1, PaneInfo.NO_FIRING)));
    testHarness.getOutput().clear();
    testHarness.processWatermark(10);
    // this should now be considered late
    testHarness.processElement(new StreamRecord<>(WindowedValue.of(17, new Instant(0), window1, PaneInfo.NO_FIRING)));
    assertThat(stripStreamRecordFromWindowedValue(testHarness.getOutput()), emptyIterable());
    testHarness.close();
}
Also used : StateSpec(org.apache.beam.sdk.state.StateSpec) Arrays(java.util.Arrays) StateNamespace(org.apache.beam.runners.core.StateNamespace) SerializablePipelineOptions(org.apache.beam.runners.core.construction.SerializablePipelineOptions) TimestampCombiner(org.apache.beam.sdk.transforms.windowing.TimestampCombiner) WindowedValue(org.apache.beam.sdk.util.WindowedValue) StreamRecordStripper.stripStreamRecordFromWindowedValue(org.apache.beam.runners.flink.translation.wrappers.streaming.StreamRecordStripper.stripStreamRecordFromWindowedValue) IsIterableContainingInOrder.contains(org.hamcrest.collection.IsIterableContainingInOrder.contains) FlinkPipelineOptions(org.apache.beam.runners.flink.FlinkPipelineOptions) TimerSpecs(org.apache.beam.sdk.state.TimerSpecs) DoFnRunner(org.apache.beam.runners.core.DoFnRunner) FlinkMetricContainer(org.apache.beam.runners.flink.metrics.FlinkMetricContainer) StepContext(org.apache.beam.runners.core.StepContext) ValueState(org.apache.beam.sdk.state.ValueState) ImmutableMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) KeyedOneInputStreamOperatorTestHarness(org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness) TimerInternals(org.apache.beam.runners.core.TimerInternals) ByteBuffer(java.nio.ByteBuffer) DoFnSchemaInformation(org.apache.beam.sdk.transforms.DoFnSchemaInformation) OneInputStreamOperatorTestHarness(org.apache.flink.streaming.util.OneInputStreamOperatorTestHarness) TypeFactory(com.fasterxml.jackson.databind.type.TypeFactory) Create(org.apache.beam.sdk.transforms.Create) TwoInputStreamOperatorTestHarness(org.apache.flink.streaming.util.TwoInputStreamOperatorTestHarness) Iterables(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables) LRUMap(com.fasterxml.jackson.databind.util.LRUMap) Window(org.apache.beam.sdk.transforms.windowing.Window) GlobalWindow(org.apache.beam.sdk.transforms.windowing.GlobalWindow) 
FluentIterable(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.FluentIterable) TypeInformation(org.apache.flink.api.common.typeinfo.TypeInformation) CoderTypeInformation(org.apache.beam.runners.flink.translation.types.CoderTypeInformation) KvCoder(org.apache.beam.sdk.coders.KvCoder) KeySelector(org.apache.flink.api.java.functions.KeySelector) KeyedTwoInputStreamOperatorTestHarness(org.apache.flink.streaming.util.KeyedTwoInputStreamOperatorTestHarness) PaneInfo(org.apache.beam.sdk.transforms.windowing.PaneInfo) FullWindowedValueCoder(org.apache.beam.sdk.util.WindowedValue.FullWindowedValueCoder) OutputTag(org.apache.flink.util.OutputTag) VarLongCoder(org.apache.beam.sdk.coders.VarLongCoder) OperatorSubtaskState(org.apache.flink.runtime.checkpoint.OperatorSubtaskState) Collectors(java.util.stream.Collectors) Matchers.instanceOf(org.hamcrest.Matchers.instanceOf) Objects(java.util.Objects) List(java.util.List) WatermarkHoldState(org.apache.beam.sdk.state.WatermarkHoldState) Matchers.containsInAnyOrder(org.hamcrest.Matchers.containsInAnyOrder) Timer(org.apache.beam.sdk.state.Timer) Matchers.equalTo(org.hamcrest.Matchers.equalTo) Optional(java.util.Optional) ImmutableList(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList) Matchers.greaterThan(org.hamcrest.Matchers.greaterThan) Matchers.is(org.hamcrest.Matchers.is) StateTag(org.apache.beam.runners.core.StateTag) WindowingStrategy(org.apache.beam.sdk.values.WindowingStrategy) StatefulDoFnRunner(org.apache.beam.runners.core.StatefulDoFnRunner) Whitebox(org.powermock.reflect.Whitebox) KV(org.apache.beam.sdk.values.KV) Assert.assertThrows(org.junit.Assert.assertThrows) Duration(org.joda.time.Duration) RunWith(org.junit.runner.RunWith) Coder(org.apache.beam.sdk.coders.Coder) HashMap(java.util.HashMap) View(org.apache.beam.sdk.transforms.View) StateNamespaces(org.apache.beam.runners.core.StateNamespaces) Supplier(java.util.function.Supplier) 
StateTags(org.apache.beam.runners.core.StateTags) ArrayList(java.util.ArrayList) StringUtf8Coder(org.apache.beam.sdk.coders.StringUtf8Coder) RawUnionValue(org.apache.beam.sdk.transforms.join.RawUnionValue) StreamRecord(org.apache.flink.streaming.runtime.streamrecord.StreamRecord) TimerSpec(org.apache.beam.sdk.state.TimerSpec) CoderTypeSerializer(org.apache.beam.runners.flink.translation.types.CoderTypeSerializer) TupleTag(org.apache.beam.sdk.values.TupleTag) MatcherAssert.assertThat(org.hamcrest.MatcherAssert.assertThat) Pipeline(org.apache.beam.sdk.Pipeline) Nullable(org.checkerframework.checker.nullness.qual.Nullable) Before(org.junit.Before) DoFn(org.apache.beam.sdk.transforms.DoFn) PCollectionViewTesting(org.apache.beam.sdk.testing.PCollectionViewTesting) FixedWindows(org.apache.beam.sdk.transforms.windowing.FixedWindows) Function(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Function) Test(org.junit.Test) JUnit4(org.junit.runners.JUnit4) PCollection(org.apache.beam.sdk.values.PCollection) Mockito(org.mockito.Mockito) Matchers.emptyIterable(org.hamcrest.Matchers.emptyIterable) StateSpecs(org.apache.beam.sdk.state.StateSpecs) PCollectionView(org.apache.beam.sdk.values.PCollectionView) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) Instant(org.joda.time.Instant) VarIntCoder(org.apache.beam.sdk.coders.VarIntCoder) IntervalWindow(org.apache.beam.sdk.transforms.windowing.IntervalWindow) Collections(java.util.Collections) TimeDomain(org.apache.beam.sdk.state.TimeDomain) Assert.assertEquals(org.junit.Assert.assertEquals)
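
The three watermark steps in Example 90 (0, 9 and 10 for a 10 ms fixed window starting at 0) can be reproduced with a small plain-Java sketch. It assumes an element counts as late once the watermark has passed the window's maximum timestamp (window end minus one millisecond) plus the allowed lateness, and that allowed lateness is zero, as in the test's default windowing strategy. FixedWindowSketch and LateDroppingSketch are hypothetical names; this is not the Flink runner's implementation.

```java
/** Hypothetical stand-in for a fixed window covering [startMillis, endMillis). */
final class FixedWindowSketch {
    final long startMillis;
    final long endMillis; // exclusive upper bound

    FixedWindowSketch(long startMillis, long sizeMillis) {
        this.startMillis = startMillis;
        this.endMillis = startMillis + sizeMillis;
    }

    /** Largest timestamp that still belongs to this window. */
    long maxTimestamp() {
        return endMillis - 1;
    }
}

/** Sketch of a late-data check under the assumptions stated above. */
final class LateDroppingSketch {

    private LateDroppingSketch() {
    }

    /** True when the watermark is strictly past maxTimestamp + allowedLateness. */
    static boolean isLate(FixedWindowSketch window, long allowedLatenessMillis, long watermarkMillis) {
        return window.maxTimestamp() + allowedLatenessMillis < watermarkMillis;
    }

    public static void main(String[] args) {
        FixedWindowSketch window = new FixedWindowSketch(0, 10);
        for (long watermark : new long[] {0, 9, 10}) {
            System.out.println("watermark=" + watermark + " late=" + isLate(window, 0, watermark));
        }
    }
}
```

With these assumptions the check reports the element as on time at watermarks 0 and 9 and as late at watermark 10, matching the comments in the test.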
