Use of org.apache.flink.api.common.functions.MapFunction in project flink by apache.
From the class CheckpointingCustomKvStateProgram, method main:
public static void main(String[] args) throws Exception {
    final String jarFile = args[0];
    final String host = args[1];
    final int port = Integer.parseInt(args[2]);
    final String checkpointPath = args[3];
    final String outputPath = args[4];
    final int parallelism = 1;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(host, port, jarFile);
    env.setParallelism(parallelism);
    env.getConfig().disableSysoutLogging();
    // checkpoint every 100 ms; on failure, restart once after a 1 s delay
    env.enableCheckpointing(100);
    env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1000));
    env.setStateBackend(new FsStateBackend(checkpointPath));

    DataStream<Integer> source = env.addSource(new InfiniteIntegerSource());
    source.map(new MapFunction<Integer, Tuple2<Integer, Integer>>() {

        private static final long serialVersionUID = 1L;

        @Override
        public Tuple2<Integer, Integer> map(Integer value) throws Exception {
            // pair each value with a random key in [0, parallelism)
            return new Tuple2<>(ThreadLocalRandom.current().nextInt(parallelism), value);
        }
    }).keyBy(new KeySelector<Tuple2<Integer, Integer>, Integer>() {

        private static final long serialVersionUID = 1L;

        @Override
        public Integer getKey(Tuple2<Integer, Integer> value) throws Exception {
            return value.f0;
        }
    }).flatMap(new ReducingStateFlatMap()).writeAsText(outputPath, FileSystem.WriteMode.OVERWRITE);

    env.execute();
}
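The ReducingStateFlatMap and InfiniteIntegerSource classes are referenced but not included in this snippet. Given the program's name, the real flatMap presumably registers a custom KV state implementation; the following is only a minimal sketch using the standard keyed-state API, with the summing behavior assumed for illustration:

// Hypothetical sketch of the ReducingStateFlatMap referenced above: keeps a
// per-key ReducingState that sums the second tuple field and emits the
// running sum for every incoming element.
// Uses org.apache.flink.api.common.functions.{ReduceFunction, RichFlatMapFunction},
// org.apache.flink.api.common.state.{ReducingState, ReducingStateDescriptor},
// org.apache.flink.configuration.Configuration and org.apache.flink.util.Collector.
public static class ReducingStateFlatMap extends RichFlatMapFunction<Tuple2<Integer, Integer>, Integer> {

    private static final long serialVersionUID = 1L;

    private transient ReducingState<Integer> sum;

    @Override
    public void open(Configuration parameters) {
        sum = getRuntimeContext().getReducingState(
                new ReducingStateDescriptor<>("sum", new ReduceFunction<Integer>() {
                    @Override
                    public Integer reduce(Integer a, Integer b) {
                        return a + b;
                    }
                }, Integer.class));
    }

    @Override
    public void flatMap(Tuple2<Integer, Integer> value, Collector<Integer> out) throws Exception {
        sum.add(value.f1);
        out.collect(sum.get());
    }
}

Because the state is keyed, the FsStateBackend configured above would snapshot one running sum per key at every checkpoint, which is exactly the behavior the program exercises.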
Use of org.apache.flink.api.common.functions.MapFunction in project flink by apache.
From the class CustomInputSplitProgram, method main:
public static void main(String[] args) throws Exception {
    final String[] jarFile = (args[0].equals("")) ? null : new String[] { args[0] };
    final URL[] classpath = (args[1].equals("")) ? null : new URL[] { new URL(args[1]) };
    final String host = args[2];
    final int port = Integer.parseInt(args[3]);
    final int parallelism = Integer.parseInt(args[4]);

    RemoteEnvironment env = new RemoteEnvironment(host, port, null, jarFile, classpath);
    env.setParallelism(parallelism);
    env.getConfig().disableSysoutLogging();

    DataSet<Integer> data = env.createInput(new CustomInputFormat());
    data.map(new MapFunction<Integer, Tuple2<Integer, Double>>() {

        @Override
        public Tuple2<Integer, Double> map(Integer value) {
            return new Tuple2<Integer, Double>(value, value * 0.5);
        }
    }).output(new DiscardingOutputFormat<Tuple2<Integer, Double>>());

    env.execute();
}
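The CustomInputFormat that feeds this program is not shown either. The split type such a format hands out only has to implement Flink's InputSplit interface (org.apache.flink.core.io.InputSplit), which extends Serializable and requires a single method. A minimal, assumed version could look like this:

// Hypothetical minimal split for a format like the CustomInputFormat above;
// the format's getInputSplits() would hand out one instance per split.
public class CustomInputSplit implements InputSplit {

    private static final long serialVersionUID = 1L;

    private final int splitNumber;

    public CustomInputSplit(int splitNumber) {
        this.splitNumber = splitNumber;
    }

    @Override
    public int getSplitNumber() {
        return splitNumber;
    }
}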
Use of org.apache.flink.api.common.functions.MapFunction in project flink by apache.
From the class StreamingCustomInputSplitProgram, method main:
public static void main(String[] args) throws Exception {
    final String jarFile = args[0];
    final String host = args[1];
    final int port = Integer.parseInt(args[2]);
    final int parallelism = Integer.parseInt(args[3]);

    Configuration config = new Configuration();
    // use a 5 second Akka ask timeout for client-cluster communication
    config.setString(ConfigConstants.AKKA_ASK_TIMEOUT, "5 s");

    StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(host, port, config, jarFile);
    env.getConfig().disableSysoutLogging();
    env.setParallelism(parallelism);

    DataStream<Integer> data = env.createInput(new CustomInputFormat());
    data.map(new MapFunction<Integer, Tuple2<Integer, Double>>() {

        @Override
        public Tuple2<Integer, Double> map(Integer value) throws Exception {
            return new Tuple2<Integer, Double>(value, value * 0.5);
        }
    }).addSink(new NoOpSink());

    env.execute();
}
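The NoOpSink used as the terminal operator is likewise not part of the snippet. Assuming it only needs to discard its input, a minimal version could be:

// Hypothetical stand-in for the NoOpSink above: a sink that drops every
// record. Uses org.apache.flink.streaming.api.functions.sink.SinkFunction.
public class NoOpSink implements SinkFunction<Tuple2<Integer, Double>> {

    private static final long serialVersionUID = 1L;

    @Override
    public void invoke(Tuple2<Integer, Double> value) {
        // intentionally empty: the program only checks that the custom
        // input splits are read, so the sink merely terminates the pipeline
    }
}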
Use of org.apache.flink.api.common.functions.MapFunction in project flink by apache.
From the class PregelCompilerTest, method testPregelCompilerWithBroadcastVariable:
@SuppressWarnings("serial")
@Test
public void testPregelCompilerWithBroadcastVariable() {
    try {
        final String BC_VAR_NAME = "borat variable";
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(DEFAULT_PARALLELISM);
        // compose test program
        {
            DataSet<Long> bcVar = env.fromElements(1L);
            DataSet<Vertex<Long, Long>> initialVertices = env.fromElements(new Tuple2<>(1L, 1L), new Tuple2<>(2L, 2L))
                    .map(new Tuple2ToVertexMap<Long, Long>());
            DataSet<Edge<Long, NullValue>> edges = env.fromElements(new Tuple2<>(1L, 2L))
                    .map(new MapFunction<Tuple2<Long, Long>, Edge<Long, NullValue>>() {

                        public Edge<Long, NullValue> map(Tuple2<Long, Long> edge) {
                            return new Edge<>(edge.f0, edge.f1, NullValue.getInstance());
                        }
                    });
            Graph<Long, Long, NullValue> graph = Graph.fromDataSet(initialVertices, edges, env);
            VertexCentricConfiguration parameters = new VertexCentricConfiguration();
            parameters.addBroadcastSet(BC_VAR_NAME, bcVar);
            DataSet<Vertex<Long, Long>> result = graph
                    .runVertexCentricIteration(new CCCompute(), null, 100, parameters)
                    .getVertices();
            result.output(new DiscardingOutputFormat<Vertex<Long, Long>>());
        }
        Plan p = env.createProgramPlan("Pregel Connected Components");
        OptimizedPlan op = compileNoStats(p);
        // check the sink
        SinkPlanNode sink = op.getDataSinks().iterator().next();
        assertEquals(ShipStrategyType.FORWARD, sink.getInput().getShipStrategy());
        assertEquals(DEFAULT_PARALLELISM, sink.getParallelism());
        // check the iteration
        WorksetIterationPlanNode iteration = (WorksetIterationPlanNode) sink.getInput().getSource();
        assertEquals(DEFAULT_PARALLELISM, iteration.getParallelism());
        // check the solution set delta
        PlanNode ssDelta = iteration.getSolutionSetDeltaPlanNode();
        assertTrue(ssDelta instanceof SingleInputPlanNode);
        SingleInputPlanNode ssFlatMap = (SingleInputPlanNode) ((SingleInputPlanNode) ssDelta).getInput().getSource();
        assertEquals(DEFAULT_PARALLELISM, ssFlatMap.getParallelism());
        assertEquals(ShipStrategyType.FORWARD, ssFlatMap.getInput().getShipStrategy());
        // check the computation coGroup
        DualInputPlanNode computationCoGroup = (DualInputPlanNode) ssFlatMap.getInput().getSource();
        assertEquals(DEFAULT_PARALLELISM, computationCoGroup.getParallelism());
        assertEquals(ShipStrategyType.FORWARD, computationCoGroup.getInput1().getShipStrategy());
        assertEquals(ShipStrategyType.PARTITION_HASH, computationCoGroup.getInput2().getShipStrategy());
        assertTrue(computationCoGroup.getInput2().getTempMode().isCached());
        assertEquals(new FieldList(0), computationCoGroup.getInput2().getShipStrategyKeys());
        // check that the initial partitioning is pushed out of the loop
        assertEquals(ShipStrategyType.PARTITION_HASH, iteration.getInput1().getShipStrategy());
        assertEquals(new FieldList(0), iteration.getInput1().getShipStrategyKeys());
    } catch (Exception e) {
        System.err.println(e.getMessage());
        e.printStackTrace();
        fail(e.getMessage());
    }
}
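The CCCompute function passed to runVertexCentricIteration is not shown in the snippet. A connected-components compute function for Gelly's vertex-centric (Pregel) API, sketched here as an assumption rather than the test's actual implementation, propagates the minimum component id to all neighbors:

// Hypothetical sketch of the CCCompute used above: each vertex adopts the
// smallest component id it has seen and forwards it to its neighbors.
// Uses org.apache.flink.graph.pregel.{ComputeFunction, MessageIterator}.
private static final class CCCompute extends ComputeFunction<Long, Long, NullValue, Long> {

    @Override
    public void compute(Vertex<Long, Long> vertex, MessageIterator<Long> messages) {
        long currentComponent = vertex.getValue();
        for (Long msg : messages) {
            currentComponent = Math.min(currentComponent, msg);
        }
        // in the first superstep there are no messages yet, so every vertex
        // still announces its initial value
        if (getSuperstepNumber() == 1 || currentComponent < vertex.getValue()) {
            setNewVertexValue(currentComponent);
            sendMessageToAllNeighbors(currentComponent);
        }
    }
}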
Use of org.apache.flink.api.common.functions.MapFunction in project flink by apache.
From the class SpargelCompilerTest, method testSpargelCompilerWithBroadcastVariable:
@SuppressWarnings("serial")
@Test
public void testSpargelCompilerWithBroadcastVariable() {
    try {
        final String BC_VAR_NAME = "borat variable";
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(DEFAULT_PARALLELISM);
        // compose test program
        {
            DataSet<Long> bcVar = env.fromElements(1L);
            DataSet<Vertex<Long, Long>> initialVertices = env.fromElements(new Tuple2<>(1L, 1L), new Tuple2<>(2L, 2L))
                    .map(new Tuple2ToVertexMap<Long, Long>());
            DataSet<Edge<Long, NullValue>> edges = env.fromElements(new Tuple2<>(1L, 2L))
                    .map(new MapFunction<Tuple2<Long, Long>, Edge<Long, NullValue>>() {

                        public Edge<Long, NullValue> map(Tuple2<Long, Long> edge) {
                            return new Edge<>(edge.f0, edge.f1, NullValue.getInstance());
                        }
                    });
            Graph<Long, Long, NullValue> graph = Graph.fromDataSet(initialVertices, edges, env);
            ScatterGatherConfiguration parameters = new ScatterGatherConfiguration();
            parameters.addBroadcastSetForScatterFunction(BC_VAR_NAME, bcVar);
            parameters.addBroadcastSetForGatherFunction(BC_VAR_NAME, bcVar);
            // pass the configuration so the broadcast sets are actually registered
            DataSet<Vertex<Long, Long>> result = graph
                    .runScatterGatherIteration(
                            new ConnectedComponents.CCMessenger<Long, Long>(BasicTypeInfo.LONG_TYPE_INFO),
                            new ConnectedComponents.CCUpdater<Long, Long>(), 100, parameters)
                    .getVertices();
            result.output(new DiscardingOutputFormat<Vertex<Long, Long>>());
        }
        Plan p = env.createProgramPlan("Spargel Connected Components");
        OptimizedPlan op = compileNoStats(p);
        // check the sink
        SinkPlanNode sink = op.getDataSinks().iterator().next();
        assertEquals(ShipStrategyType.FORWARD, sink.getInput().getShipStrategy());
        assertEquals(DEFAULT_PARALLELISM, sink.getParallelism());
        // check the iteration
        WorksetIterationPlanNode iteration = (WorksetIterationPlanNode) sink.getInput().getSource();
        assertEquals(DEFAULT_PARALLELISM, iteration.getParallelism());
        // check the solution set join and the delta
        PlanNode ssDelta = iteration.getSolutionSetDeltaPlanNode();
        // this is only true if the update function preserves the partitioning
        assertTrue(ssDelta instanceof DualInputPlanNode);
        DualInputPlanNode ssJoin = (DualInputPlanNode) ssDelta;
        assertEquals(DEFAULT_PARALLELISM, ssJoin.getParallelism());
        assertEquals(ShipStrategyType.PARTITION_HASH, ssJoin.getInput1().getShipStrategy());
        assertEquals(new FieldList(0), ssJoin.getInput1().getShipStrategyKeys());
        // check the workset join
        DualInputPlanNode edgeJoin = (DualInputPlanNode) ssJoin.getInput1().getSource();
        assertEquals(DEFAULT_PARALLELISM, edgeJoin.getParallelism());
        assertEquals(ShipStrategyType.PARTITION_HASH, edgeJoin.getInput1().getShipStrategy());
        assertEquals(ShipStrategyType.FORWARD, edgeJoin.getInput2().getShipStrategy());
        assertTrue(edgeJoin.getInput1().getTempMode().isCached());
        assertEquals(new FieldList(0), edgeJoin.getInput1().getShipStrategyKeys());
        // check that the initial partitioning is pushed out of the loop
        assertEquals(ShipStrategyType.PARTITION_HASH, iteration.getInput1().getShipStrategy());
        assertEquals(ShipStrategyType.PARTITION_HASH, iteration.getInput2().getShipStrategy());
        assertEquals(new FieldList(0), iteration.getInput1().getShipStrategyKeys());
        assertEquals(new FieldList(0), iteration.getInput2().getShipStrategyKeys());
    } catch (Exception e) {
        System.err.println(e.getMessage());
        e.printStackTrace();
        fail(e.getMessage());
    }
}
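The test only inspects the optimizer plan, so it never shows how the broadcast set is consumed inside the iteration. Assuming the getBroadcastSet accessor available on Gelly's scatter and gather functions, a hypothetical gather function (not the ConnectedComponents.CCUpdater used above) could read it like this:

// Hypothetical gather function showing how the broadcast set registered via
// addBroadcastSetForGatherFunction could be read; the set is fetched here
// only to demonstrate the accessor. Uses
// org.apache.flink.graph.spargel.{GatherFunction, MessageIterator} and java.util.Collection.
public static final class BroadcastAwareUpdater extends GatherFunction<Long, Long, Long> {

    @Override
    public void updateVertex(Vertex<Long, Long> vertex, MessageIterator<Long> inMessages) {
        // the broadcast data set registered under BC_VAR_NAME above
        Collection<Long> bcVar = getBroadcastSet("borat variable");

        long min = vertex.getValue();
        for (Long msg : inMessages) {
            min = Math.min(min, msg);
        }
        if (min < vertex.getValue()) {
            setNewVertexValue(min);
        }
    }
}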