
Example 11 with DataStream

Use of org.apache.flink.streaming.api.datastream.DataStream in project flink by apache.

From the class CEPITCase, method testSimpleAfterMatchSkip.

@Test
public void testSimpleAfterMatchSkip() throws Exception {
    StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment(envConfiguration); // envConfiguration is a field of the parameterized test class
    DataStream<Tuple2<Integer, String>> input =
            env.fromElements(
                    new Tuple2<>(1, "a"),
                    new Tuple2<>(2, "a"),
                    new Tuple2<>(3, "a"),
                    new Tuple2<>(4, "a"));
    // Match two "a" events; after a match, skip past the last matched event.
    Pattern<Tuple2<Integer, String>, ?> pattern =
            Pattern.<Tuple2<Integer, String>>begin("start", AfterMatchSkipStrategy.skipPastLastEvent())
                    .where(new SimpleCondition<Tuple2<Integer, String>>() {

                        @Override
                        public boolean filter(Tuple2<Integer, String> rec) throws Exception {
                            return rec.f1.equals("a");
                        }
                    })
                    .times(2);
    PatternStream<Tuple2<Integer, String>> pStream = CEP.pattern(input, pattern).inProcessingTime();
    // For every match, emit the first event of the "start" pattern.
    DataStream<Tuple2<Integer, String>> result =
            pStream.select(new PatternSelectFunction<Tuple2<Integer, String>, Tuple2<Integer, String>>() {

                @Override
                public Tuple2<Integer, String> select(Map<String, List<Tuple2<Integer, String>>> pattern)
                        throws Exception {
                    return pattern.get("start").get(0);
                }
            });
    List<Tuple2<Integer, String>> resultList = new ArrayList<>();
    DataStreamUtils.collect(result).forEachRemaining(resultList::add);
    resultList.sort(Comparator.comparing(tuple2 -> tuple2.toString()));
    // skipPastLastEvent() leaves only the non-overlapping matches (1,2) and (3,4).
    List<Tuple2<Integer, String>> expected = Arrays.asList(Tuple2.of(1, "a"), Tuple2.of(3, "a"));
    assertEquals(expected, resultList);
}
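For contrast with skipPastLastEvent(), here is a minimal sketch of the same pattern built with the default no-skip strategy. It is not part of the test above; it reuses the CEP classes already shown, and the variable name noSkipPattern is illustrative. Assuming the default contiguity of times(2), the same four-element input would be expected to also produce the overlapping matches starting at events 2 and 3, so selecting the first event of each match would yield 1, 2 and 3 instead of 1 and 3.

// Hedged sketch: same condition, but with AfterMatchSkipStrategy.noSkip().
Pattern<Tuple2<Integer, String>, ?> noSkipPattern =
        Pattern.<Tuple2<Integer, String>>begin("start", AfterMatchSkipStrategy.noSkip())
                .where(new SimpleCondition<Tuple2<Integer, String>>() {

                    @Override
                    public boolean filter(Tuple2<Integer, String> rec) throws Exception {
                        return rec.f1.equals("a");
                    }
                })
                .times(2);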

Example 12 with DataStream

Use of org.apache.flink.streaming.api.datastream.DataStream in project flink by apache.

From the class SavepointWriterTest, method testCustomStateBackend.

@Test(expected = CustomStateBackendFactory.ExpectedException.class)
public void testCustomStateBackend() throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // The custom factory is wired in via StateBackendOptions.STATE_BACKEND; the test passes
    // only if CustomStateBackendFactory.ExpectedException is thrown (see the @Test annotation).
    Configuration configuration = new Configuration();
    configuration.set(StateBackendOptions.STATE_BACKEND, CustomStateBackendFactory.class.getCanonicalName());
    configuration.set(ExecutionOptions.RUNTIME_MODE, RuntimeExecutionMode.BATCH);
    env.configure(configuration);
    DataStream<String> input = env.fromElements("");
    StateBootstrapTransformation<String> transformation =
            OperatorTransformation.bootstrapWith(input)
                    .keyBy(element -> element)
                    .transform(new Bootstrapper());
    SavepointWriter.newSavepoint(128)
            .withOperator("uid", transformation)
            .write("file:///tmp/path");
    env.execute();
}
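For comparison, a minimal sketch of the same configuration path using a built-in backend identifier instead of a custom factory class. This is not from the test above; the string value "hashmap" is assumed to be a recognized shorthand for StateBackendOptions.STATE_BACKEND in the Flink version at hand.

// Hedged sketch: selecting a built-in state backend by name rather than by factory class.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Configuration configuration = new Configuration();
// "hashmap" is assumed to be a supported shorthand; a fully qualified factory class also works.
configuration.set(StateBackendOptions.STATE_BACKEND, "hashmap");
configuration.set(ExecutionOptions.RUNTIME_MODE, RuntimeExecutionMode.BATCH);
env.configure(configuration);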

Example 13 with DataStream

Use of org.apache.flink.streaming.api.datastream.DataStream in project flink by apache.

From the class SavepointWriterWindowITCase, method testTumbleWindowWithEvictor.

@Test
public void testTumbleWindowWithEvictor() throws Exception {
    final String savepointPath = getTempDirPath(new AbstractID().toHexString());
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // stateBackend, windowBootstrap, windowStream and collector are fields of the parameterized test class.
    env.setStateBackend(stateBackend);
    env.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);
    // Bootstrap data: every word gets count 1 and a fixed event timestamp of 2 ms.
    DataStream<Tuple2<String, Integer>> bootstrapData =
            env.fromCollection(WORDS)
                    .map(word -> Tuple2.of(word, 1))
                    .returns(TUPLE_TYPE_INFO)
                    .assignTimestampsAndWatermarks(
                            WatermarkStrategy.<Tuple2<String, Integer>>noWatermarks()
                                    .withTimestampAssigner((record, ts) -> 2L));
    // Write the window state (tumbling 5 ms windows, at most one element kept per window) into a savepoint.
    WindowedStateTransformation<Tuple2<String, Integer>, String, TimeWindow> transformation =
            OperatorTransformation.bootstrapWith(bootstrapData)
                    .keyBy(tuple -> tuple.f0, Types.STRING)
                    .window(TumblingEventTimeWindows.of(Time.milliseconds(5)))
                    .evictor(CountEvictor.of(1));
    SavepointWriter.newSavepoint(stateBackend, 128)
            .withOperator(UID, windowBootstrap.bootstrap(transformation))
            .write(savepointPath);
    env.execute("write state");
    // Restore the savepoint into an equivalent windowed stream and verify the emitted results.
    WindowedStream<Tuple2<String, Integer>, String, TimeWindow> stream =
            env.addSource(new MaxWatermarkSource<Tuple2<String, Integer>>())
                    .returns(TUPLE_TYPE_INFO)
                    .keyBy(tuple -> tuple.f0)
                    .window(TumblingEventTimeWindows.of(Time.milliseconds(5)))
                    .evictor(CountEvictor.of(1));
    DataStream<Tuple2<String, Integer>> windowed = windowStream.window(stream).uid(UID);
    CompletableFuture<Collection<Tuple2<String, Integer>>> future = collector.collect(windowed);
    submitJob(savepointPath, env);
    Collection<Tuple2<String, Integer>> results = future.get();
    Assert.assertThat("Incorrect results from bootstrapped windows", results, EVICTOR_MATCHER);
}

Example 14 with DataStream

Use of org.apache.flink.streaming.api.datastream.DataStream in project flink by apache.

From the class SavepointWriterWindowITCase, method testSlideWindowWithEvictor.

@Test
public void testSlideWindowWithEvictor() throws Exception {
    final String savepointPath = getTempDirPath(new AbstractID().toHexString());
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setStateBackend(stateBackend);
    env.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);
    DataStream<Tuple2<String, Integer>> bootstrapData =
            env.fromCollection(WORDS)
                    .map(word -> Tuple2.of(word, 1))
                    .returns(TUPLE_TYPE_INFO)
                    .assignTimestampsAndWatermarks(
                            WatermarkStrategy.<Tuple2<String, Integer>>noWatermarks()
                                    .withTimestampAssigner((record, ts) -> 2L));
    WindowedStateTransformation<Tuple2<String, Integer>, String, TimeWindow> transformation =
            OperatorTransformation.bootstrapWith(bootstrapData)
                    .keyBy(tuple -> tuple.f0, Types.STRING)
                    .window(SlidingEventTimeWindows.of(Time.milliseconds(5), Time.milliseconds(1)))
                    .evictor(CountEvictor.of(1));
    SavepointWriter.newSavepoint(stateBackend, 128)
            .withOperator(UID, windowBootstrap.bootstrap(transformation))
            .write(savepointPath);
    env.execute("write state");
    WindowedStream<Tuple2<String, Integer>, String, TimeWindow> stream =
            env.addSource(new MaxWatermarkSource<Tuple2<String, Integer>>())
                    .returns(TUPLE_TYPE_INFO)
                    .keyBy(tuple -> tuple.f0)
                    .window(SlidingEventTimeWindows.of(Time.milliseconds(5), Time.milliseconds(1)))
                    .evictor(CountEvictor.of(1));
    DataStream<Tuple2<String, Integer>> windowed = windowStream.window(stream).uid(UID);
    CompletableFuture<Collection<Tuple2<String, Integer>>> future = collector.collect(windowed);
    submitJob(savepointPath, env);
    // Sliding windows place each element in several windows, so deduplicate before comparing.
    Collection<Tuple2<String, Integer>> results =
            future.get().stream().distinct().collect(Collectors.toList());
    Assert.assertThat("Incorrect results from bootstrapped windows", results, EVICTOR_MATCHER);
}

Example 15 with DataStream

Use of org.apache.flink.streaming.api.datastream.DataStream in project flink by apache.

From the class SavepointWriterWindowITCase, method testTumbleWindow.

@Test
public void testTumbleWindow() throws Exception {
    final String savepointPath = getTempDirPath(new AbstractID().toHexString());
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setStateBackend(stateBackend);
    env.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);
    DataStream<Tuple2<String, Integer>> bootstrapData =
            env.fromCollection(WORDS)
                    .map(word -> Tuple2.of(word, 1))
                    .returns(TUPLE_TYPE_INFO)
                    .assignTimestampsAndWatermarks(
                            WatermarkStrategy.<Tuple2<String, Integer>>noWatermarks()
                                    .withTimestampAssigner((record, ts) -> 2L));
    WindowedStateTransformation<Tuple2<String, Integer>, String, TimeWindow> transformation =
            OperatorTransformation.bootstrapWith(bootstrapData)
                    .keyBy(tuple -> tuple.f0, Types.STRING)
                    .window(TumblingEventTimeWindows.of(Time.milliseconds(5)));
    SavepointWriter.newSavepoint(stateBackend, 128)
            .withOperator(UID, windowBootstrap.bootstrap(transformation))
            .write(savepointPath);
    env.execute("write state");
    WindowedStream<Tuple2<String, Integer>, String, TimeWindow> stream =
            env.addSource(new MaxWatermarkSource<Tuple2<String, Integer>>())
                    .returns(TUPLE_TYPE_INFO)
                    .keyBy(tuple -> tuple.f0)
                    .window(TumblingEventTimeWindows.of(Time.milliseconds(5)));
    DataStream<Tuple2<String, Integer>> windowed = windowStream.window(stream).uid(UID);
    CompletableFuture<Collection<Tuple2<String, Integer>>> future = collector.collect(windowed);
    submitJob(savepointPath, env);
    Collection<Tuple2<String, Integer>> results = future.get();
    Assert.assertThat("Incorrect results from bootstrapped windows", results, STANDARD_MATCHER);
}
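The three window tests above delegate to the parameterized windowBootstrap and windowStream fields. As a rough illustration of one such pairing, the sketch below bootstraps the window state with a ReduceFunction and reads it back with a matching reduce on the restored stream. It is not taken from the test; it reuses bootstrapData, stream and UID from the preceding example, assumes WindowedStateTransformation exposes a reduce(ReduceFunction) method, and the summing lambda is illustrative only.

// Hedged sketch of one possible windowBootstrap / windowStream pairing.
StateBootstrapTransformation<Tuple2<String, Integer>> reduceBootstrap =
        OperatorTransformation.bootstrapWith(bootstrapData)
                .keyBy(tuple -> tuple.f0, Types.STRING)
                .window(TumblingEventTimeWindows.of(Time.milliseconds(5)))
                .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1));

// On the read side, the restored operator applies the same kind of window function under the same uid.
DataStream<Tuple2<String, Integer>> restored =
        stream.reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1)).uid(UID);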

Aggregations

DataStream (org.apache.flink.streaming.api.datastream.DataStream): 87
StreamExecutionEnvironment (org.apache.flink.streaming.api.environment.StreamExecutionEnvironment): 78
Test (org.junit.Test): 70
List (java.util.List): 62
Collector (org.apache.flink.util.Collector): 60
Tuple2 (org.apache.flink.api.java.tuple.Tuple2): 50
SingleOutputStreamOperator (org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator): 48
Arrays (java.util.Arrays): 46
ArrayList (java.util.ArrayList): 40
TypeInformation (org.apache.flink.api.common.typeinfo.TypeInformation): 40
Assert.assertEquals (org.junit.Assert.assertEquals): 38
WatermarkStrategy (org.apache.flink.api.common.eventtime.WatermarkStrategy): 36
Configuration (org.apache.flink.configuration.Configuration): 36
Assert.assertTrue (org.junit.Assert.assertTrue): 33
BasicTypeInfo (org.apache.flink.api.common.typeinfo.BasicTypeInfo): 32
StreamOperator (org.apache.flink.streaming.api.operators.StreamOperator): 32
Types (org.apache.flink.api.common.typeinfo.Types): 31
Assert (org.junit.Assert): 31
ReduceFunction (org.apache.flink.api.common.functions.ReduceFunction): 29
JobGraph (org.apache.flink.runtime.jobgraph.JobGraph): 29