
Example 21 with ReduceFunction

Use of org.apache.flink.api.common.functions.ReduceFunction in the Apache Flink project.

From class StateDescriptorPassingTest, method testReduceWindowAllState.

@Test
public void testReduceWindowAllState() {
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.registerTypeWithKryoSerializer(File.class, JavaSerializer.class);
    // simulate ingestion time
    DataStream<File> src = env.fromElements(new File("/"))
            .assignTimestampsAndWatermarks(
                    WatermarkStrategy.<File>forMonotonousTimestamps()
                            .withTimestampAssigner((file, ts) -> System.currentTimeMillis()));
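    SingleOutputStreamOperator<?> result = src
            .windowAll(TumblingEventTimeWindows.of(Time.milliseconds(1000)))
            .reduce(new ReduceFunction<File>() {

                @Override
                public File reduce(File value1, File value2) {
                    // never invoked: the job is not executed, the test only
                    // inspects the state descriptor of the resulting operator
                    return null;
                }
            });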
    validateStateDescriptorConfigured(result);
}
Also used: Kryo(com.esotericsoftware.kryo.Kryo) Collector(org.apache.flink.util.Collector) TimeWindow(org.apache.flink.streaming.api.windowing.windows.TimeWindow) ProcessAllWindowFunction(org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction) ListStateDescriptor(org.apache.flink.api.common.state.ListStateDescriptor) ReduceFunction(org.apache.flink.api.common.functions.ReduceFunction) JavaSerializer(com.esotericsoftware.kryo.serializers.JavaSerializer) Time(org.apache.flink.streaming.api.windowing.time.Time) TypeSerializer(org.apache.flink.api.common.typeutils.TypeSerializer) KeySelector(org.apache.flink.api.java.functions.KeySelector) StateDescriptor(org.apache.flink.api.common.state.StateDescriptor) KryoSerializer(org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer) SingleOutputStreamOperator(org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator) WindowOperator(org.apache.flink.streaming.runtime.operators.windowing.WindowOperator) Assert.assertTrue(org.junit.Assert.assertTrue) WatermarkStrategy(org.apache.flink.api.common.eventtime.WatermarkStrategy) Test(org.junit.Test) ProcessWindowFunction(org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction) OneInputTransformation(org.apache.flink.streaming.api.transformations.OneInputTransformation) File(java.io.File) DataStream(org.apache.flink.streaming.api.datastream.DataStream) WindowFunction(org.apache.flink.streaming.api.functions.windowing.WindowFunction) TumblingEventTimeWindows(org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows) AllWindowFunction(org.apache.flink.streaming.api.functions.windowing.AllWindowFunction) ListSerializer(org.apache.flink.api.common.typeutils.base.ListSerializer) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment)
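
In Example 21, TumblingEventTimeWindows.of(Time.milliseconds(1000)) places each element into exactly one 1-second window aligned to the epoch (offset 0 by default), and windowAll(...).reduce(...) keeps one accumulated value per window. The single-threaded sketch below emulates that behaviour in plain Java; the class TumblingReduceSketch and its methods are hypothetical illustrations, not Flink APIs, and the sketch ignores watermarks, lateness, and parallelism.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

// Hypothetical helper, not part of Flink: emulates a non-keyed tumbling
// event-time window followed by reduce(), on a single thread.
final class TumblingReduceSketch {

    // Start of the epoch-aligned tumbling window (offset 0) containing `timestamp`.
    static long windowStart(long timestamp, long windowSize) {
        return timestamp - Math.floorMod(timestamp, windowSize);
    }

    // Groups (timestamp, value) pairs by window start and folds each group with
    // the reducer, so at most one accumulated value is kept per window.
    static <T> Map<Long, T> reduceByWindow(
            long[] timestamps, List<T> values, long windowSize, BinaryOperator<T> reducer) {
        Map<Long, T> result = new TreeMap<>();
        for (int i = 0; i < values.size(); i++) {
            long start = windowStart(timestamps[i], windowSize);
            // merge() stores the value if the window is empty, otherwise it
            // applies reducer(accumulated, newValue), like ReduceFunction.reduce.
            result.merge(start, values.get(i), reducer);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] ts = {100L, 900L, 1000L, 2500L};
        List<Integer> values = List.of(1, 2, 3, 4);
        System.out.println(reduceByWindow(ts, values, 1000L, Integer::sum)); // prints {0=3, 1000=3, 2000=4}
    }
}
```

Using Map.merge keeps the sketch incremental: each new element is combined immediately with the current window value instead of buffering the whole window, which is the property that makes reduce cheaper than a full window function.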

Example 22 with ReduceFunction

Use of org.apache.flink.api.common.functions.ReduceFunction in the Apache Flink project.

From class StreamingRuntimeContextTest, method testReducingStateInstantiation.

@Test
public void testReducingStateInstantiation() throws Exception {
    final ExecutionConfig config = new ExecutionConfig();
    config.registerKryoType(Path.class);
    final AtomicReference<Object> descriptorCapture = new AtomicReference<>();
    StreamingRuntimeContext context = createRuntimeContext(descriptorCapture, config);
    @SuppressWarnings("unchecked") ReduceFunction<TaskInfo> reducer = (ReduceFunction<TaskInfo>) mock(ReduceFunction.class);
    ReducingStateDescriptor<TaskInfo> descr = new ReducingStateDescriptor<>("name", reducer, TaskInfo.class);
    context.getReducingState(descr);
    StateDescriptor<?, ?> descrIntercepted = (StateDescriptor<?, ?>) descriptorCapture.get();
    TypeSerializer<?> serializer = descrIntercepted.getSerializer();
    // check that the Path class is really registered, i.e., the execution config was applied
    assertTrue(serializer instanceof KryoSerializer);
    assertTrue(((KryoSerializer<?>) serializer).getKryo().getRegistration(Path.class).getId() > 0);
}
Also used: Path(org.apache.flink.core.fs.Path) ReducingStateDescriptor(org.apache.flink.api.common.state.ReducingStateDescriptor) ReduceFunction(org.apache.flink.api.common.functions.ReduceFunction) AtomicReference(java.util.concurrent.atomic.AtomicReference) ExecutionConfig(org.apache.flink.api.common.ExecutionConfig) KryoSerializer(org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer) TaskInfo(org.apache.flink.api.common.TaskInfo) ListStateDescriptor(org.apache.flink.api.common.state.ListStateDescriptor) MapStateDescriptor(org.apache.flink.api.common.state.MapStateDescriptor) AggregatingStateDescriptor(org.apache.flink.api.common.state.AggregatingStateDescriptor) StateDescriptor(org.apache.flink.api.common.state.StateDescriptor) ValueStateDescriptor(org.apache.flink.api.common.state.ValueStateDescriptor) Test(org.junit.Test)
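
Example 22 only checks that the ReducingStateDescriptor picks up the ExecutionConfig's Kryo registrations; it never adds anything to the state. Conceptually, a reducing state holds a single value (per key and namespace in Flink) and folds every added element into it with the descriptor's ReduceFunction. The hypothetical class below is a plain-Java, single-key stand-in for that behaviour, not Flink's implementation.

```java
import java.util.function.BinaryOperator;

// Hypothetical single-key stand-in for Flink's ReducingState<T>: it keeps one
// accumulated value and folds every add() into it with the reduce function.
final class ReducingStateSketch<T> {
    private final BinaryOperator<T> reducer;
    private T current; // null until the first add(), i.e. an empty state

    ReducingStateSketch(BinaryOperator<T> reducer) {
        this.reducer = reducer;
    }

    void add(T value) {
        // The first value is stored as-is; later values are reduced into it.
        current = (current == null) ? value : reducer.apply(current, value);
    }

    T get() {
        return current;
    }

    void clear() {
        current = null;
    }

    public static void main(String[] args) {
        ReducingStateSketch<Integer> state = new ReducingStateSketch<>(Integer::sum);
        state.add(3);
        state.add(4);
        System.out.println(state.get()); // prints 7
    }
}
```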

Example 23 with ReduceFunction

Use of org.apache.flink.api.common.functions.ReduceFunction in the Apache Flink project.

From class ReinterpretDataStreamAsKeyedStreamITCase, method testReinterpretAsKeyedStream.

/**
 * This test checks that reinterpreting a data stream as a keyed stream works as expected. The
 * test consists of two jobs. The first job materializes a keyBy into files, one file per
 * partition. The second job opens the files created by the first job as sources (assigning
 * each file to the matching partition) and reinterprets the sources as keyed, because we
 * know they were partitioned by a keyBy in the first job.
 */
@Test
public void testReinterpretAsKeyedStream() throws Exception {
    final int maxParallelism = 8;
    final int numEventsPerInstance = 100;
    final int parallelism = 3;
    final int numTotalEvents = numEventsPerInstance * parallelism;
    final int numUniqueKeys = 100;
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setMaxParallelism(maxParallelism);
    env.setParallelism(parallelism);
    env.enableCheckpointing(100);
    env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 0L));
    final List<File> partitionFiles = new ArrayList<>(parallelism);
    for (int i = 0; i < parallelism; ++i) {
        File partitionFile = temporaryFolder.newFile();
        partitionFiles.add(i, partitionFile);
    }
    env.addSource(new RandomTupleSource(numEventsPerInstance, numUniqueKeys)).keyBy(0).addSink(new ToPartitionFileSink(partitionFiles));
    env.execute();
    DataStream<Tuple2<Integer, Integer>> source = env.addSource(new FromPartitionFileSource(partitionFiles)).assignTimestampsAndWatermarks(IngestionTimeWatermarkStrategy.create());
    // test that timers and aggregated state also work as expected
    DataStreamUtils.reinterpretAsKeyedStream(
                    source,
                    (KeySelector<Tuple2<Integer, Integer>, Integer>) value -> value.f0,
                    TypeInformation.of(Integer.class))
            .window(TumblingEventTimeWindows.of(Time.seconds(1)))
            .reduce((ReduceFunction<Tuple2<Integer, Integer>>) (value1, value2) -> new Tuple2<>(value1.f0, value1.f1 + value2.f1))
            .addSink(new ValidatingSink(numTotalEvents))
            .setParallelism(1);
    env.execute();
}
Also used: DataInputStream(java.io.DataInputStream) BufferedInputStream(java.io.BufferedInputStream) WatermarkGenerator(org.apache.flink.api.common.eventtime.WatermarkGenerator) Tuple2(org.apache.flink.api.java.tuple.Tuple2) Random(java.util.Random) TimestampAssigner(org.apache.flink.api.common.eventtime.TimestampAssigner) RestartStrategies(org.apache.flink.api.common.restartstrategy.RestartStrategies) FunctionSnapshotContext(org.apache.flink.runtime.state.FunctionSnapshotContext) BufferedOutputStream(java.io.BufferedOutputStream) ArrayList(java.util.ArrayList) ListState(org.apache.flink.api.common.state.ListState) DataOutputStream(java.io.DataOutputStream) CheckpointListener(org.apache.flink.api.common.state.CheckpointListener) RichParallelSourceFunction(org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction) ListStateDescriptor(org.apache.flink.api.common.state.ListStateDescriptor) TypeInformation(org.apache.flink.api.common.typeinfo.TypeInformation) ReduceFunction(org.apache.flink.api.common.functions.ReduceFunction) Time(org.apache.flink.streaming.api.windowing.time.Time) KeySelector(org.apache.flink.api.java.functions.KeySelector) CheckpointedFunction(org.apache.flink.streaming.api.checkpoint.CheckpointedFunction) FunctionInitializationContext(org.apache.flink.runtime.state.FunctionInitializationContext) Configuration(org.apache.flink.configuration.Configuration) DataStreamUtils(org.apache.flink.streaming.api.datastream.DataStreamUtils) AscendingTimestampsWatermarks(org.apache.flink.api.common.eventtime.AscendingTimestampsWatermarks) FileOutputStream(java.io.FileOutputStream) WatermarkStrategy(org.apache.flink.api.common.eventtime.WatermarkStrategy) Test(org.junit.Test) FileInputStream(java.io.FileInputStream) Preconditions(org.apache.flink.util.Preconditions) File(java.io.File) WatermarkGeneratorSupplier(org.apache.flink.api.common.eventtime.WatermarkGeneratorSupplier) RichSinkFunction(org.apache.flink.streaming.api.functions.sink.RichSinkFunction) DataStream(org.apache.flink.streaming.api.datastream.DataStream) List(java.util.List) Rule(org.junit.Rule) TumblingEventTimeWindows(org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows) ParallelSourceFunction(org.apache.flink.streaming.api.functions.source.ParallelSourceFunction) Assert(org.junit.Assert) TimestampAssignerSupplier(org.apache.flink.api.common.eventtime.TimestampAssignerSupplier) TemporaryFolder(org.junit.rules.TemporaryFolder) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment)
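
DataStreamUtils.reinterpretAsKeyedStream does not repartition anything: it assumes every record already sits on the subtask that a keyBy with the same key and max parallelism would have chosen. That is why Example 23 writes one file per partition in the first job and reads each file back on the matching subtask in the second. The sketch below illustrates the two-step assignment, key to key group and key group to subtask, using the test's settings (maxParallelism = 8, parallelism = 3). The class is hypothetical; Flink's own logic lives in KeyGroupRangeAssignment and applies a murmur hash to key.hashCode(), for which Math.floorMod is only a simplified stand-in here.

```java
import java.util.StringJoiner;

// Hypothetical sketch of how a keyBy() decides which parallel subtask owns a key.
final class KeyGroupSketch {

    // Simplified stand-in: Flink murmur-hashes key.hashCode() before the modulo.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Contiguous ranges of key groups are assigned to each subtask.
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    static int subtaskFor(Object key, int maxParallelism, int parallelism) {
        return operatorIndexFor(keyGroupFor(key, maxParallelism), maxParallelism, parallelism);
    }

    public static void main(String[] args) {
        // Same settings as the test: maxParallelism = 8, parallelism = 3.
        StringJoiner out = new StringJoiner(" ");
        for (int keyGroup = 0; keyGroup < 8; keyGroup++) {
            out.add(String.valueOf(operatorIndexFor(keyGroup, 8, 3)));
        }
        System.out.println(out); // prints 0 0 0 1 1 1 2 2
    }
}
```

Because the mapping depends only on the key, maxParallelism, and parallelism, both jobs in the test agree on the partitioning as long as those settings stay the same, which the test ensures by configuring a single environment once.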

Example 24 with ReduceFunction

Use of org.apache.flink.api.common.functions.ReduceFunction in the Apache Flink project.

From class IterationWithAllReducerITCase, method testProgram.

@Override
protected void testProgram() throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(4);
    DataSet<String> initialInput = env.fromElements("1", "1", "1", "1", "1", "1", "1", "1");
    IterativeDataSet<String> iteration = initialInput.iterate(5).name("Loop");
    DataSet<String> sumReduce = iteration.reduce(new ReduceFunction<String>() {

        @Override
        public String reduce(String value1, String value2) throws Exception {
            return value1;
        }
    }).name("Compute sum (Reduce)");
    List<String> result = iteration.closeWith(sumReduce).collect();
    compareResultAsText(result, EXPECTED);
}
Also used: ExecutionEnvironment(org.apache.flink.api.java.ExecutionEnvironment) ReduceFunction(org.apache.flink.api.common.functions.ReduceFunction)
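
In Example 24, iterate(5) opens a bulk iteration with at most five supersteps, and the step function is a non-grouped (all) reduce, so each superstep collapses the whole data set to a single element that becomes the next superstep's input. Despite the operator name "Compute sum (Reduce)", the function returns value1, so it keeps one element rather than summing. The hypothetical single-threaded sketch below mimics those semantics in plain Java; it is not how Flink executes iterations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical emulation of a bulk iteration whose step function is an all-reduce.
final class BulkIterationSketch {

    static <T> List<T> iterateAllReduce(List<T> initial, int maxIterations, BinaryOperator<T> reducer) {
        List<T> workingSet = new ArrayList<>(initial);
        for (int i = 0; i < maxIterations && !workingSet.isEmpty(); i++) {
            // Step function: reduce the whole working set into one element.
            T acc = workingSet.get(0);
            for (int j = 1; j < workingSet.size(); j++) {
                acc = reducer.apply(acc, workingSet.get(j));
            }
            // The reduced element is fed back as the next superstep's input.
            List<T> next = new ArrayList<>();
            next.add(acc);
            workingSet = next;
        }
        return workingSet;
    }

    public static void main(String[] args) {
        List<String> input = List.of("1", "1", "1", "1", "1", "1", "1", "1");
        // Same reducer as the test: keep the first argument.
        System.out.println(iterateAllReduce(input, 5, (a, b) -> a)); // prints [1]
    }
}
```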

Example 25 with ReduceFunction

Use of org.apache.flink.api.common.functions.ReduceFunction in the Apache Flink project.

From class WordCountSubclassPOJOITCase, method testProgram.

@Override
protected void testProgram() throws Exception {
    final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<String> text = env.readTextFile(textPath);
    DataSet<WCBase> counts = text.flatMap(new Tokenizer()).groupBy("word").reduce(new ReduceFunction<WCBase>() {

        private static final long serialVersionUID = 1L;

        @Override
        public WCBase reduce(WCBase value1, WCBase value2) {
            WC wc1 = (WC) value1;
            WC wc2 = (WC) value2;
            return new WC(value1.word, wc1.secretCount + wc2.secretCount);
        }
    }).map(new MapFunction<WCBase, WCBase>() {

        @Override
        public WCBase map(WCBase value) throws Exception {
            WC wc = (WC) value;
            wc.count = wc.secretCount;
            return wc;
        }
    });
    counts.writeAsText(resultPath);
    env.execute("WordCount with custom data types example");
}
Also used: ExecutionEnvironment(org.apache.flink.api.java.ExecutionEnvironment) ReduceFunction(org.apache.flink.api.common.functions.ReduceFunction)
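
Example 25 groups the tokenized records by the POJO field "word", reduces each group pairwise by summing secretCount, and then copies secretCount into count in a MapFunction. The sketch below reproduces that pipeline shape in plain Java on a single thread. The WC class here is a hypothetical stand-in for the test's POJO, and the tokenization (lower-casing and splitting on non-word characters) is an assumption, since the test's Tokenizer is not shown.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical emulation of groupBy("word").reduce(...).map(...) from Example 25.
final class GroupedReduceSketch {

    // Hypothetical stand-in for the test's WC POJO.
    static final class WC {
        final String word;
        int count;
        final int secretCount;

        WC(String word, int secretCount) {
            this.word = word;
            this.secretCount = secretCount;
        }
    }

    static Map<String, WC> countWords(String text) {
        Map<String, WC> groups = new TreeMap<>();
        // Assumed tokenization: lower-case, split on runs of non-word characters.
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            // Same shape as the test's ReduceFunction: a new WC whose
            // secretCount is the sum of both inputs.
            groups.merge(token, new WC(token, 1),
                    (a, b) -> new WC(a.word, a.secretCount + b.secretCount));
        }
        // Equivalent of the MapFunction step: expose the accumulated count.
        for (WC wc : groups.values()) {
            wc.count = wc.secretCount;
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, WC> counts = countWords("To be, or not to be");
        counts.values().forEach(wc -> System.out.println(wc.word + " " + wc.count));
    }
}
```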
