Search in sources :

Example 1 with HadoopInputFormat

use of org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat in project flink by apache.

the class HadoopMapredCompatWordCount method main.

public static void main(String[] args) throws Exception {
    if (args.length < 2) {
        System.err.println("Usage: WordCount <input path> <result path>");
        return;
    }
    final String inputPath = args[0];
    final String outputPath = args[1];
    final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // Set up the Hadoop Input Format
    HadoopInputFormat<LongWritable, Text> hadoopInputFormat = new HadoopInputFormat<LongWritable, Text>(new TextInputFormat(), LongWritable.class, Text.class, new JobConf());
    TextInputFormat.addInputPath(hadoopInputFormat.getJobConf(), new Path(inputPath));
    // Create a Flink job with it
    DataSet<Tuple2<LongWritable, Text>> text = env.createInput(hadoopInputFormat);
    DataSet<Tuple2<Text, LongWritable>> words = text.flatMap(new HadoopMapFunction<LongWritable, Text, Text, LongWritable>(new Tokenizer())).groupBy(0).reduceGroup(new HadoopReduceCombineFunction<Text, LongWritable, Text, LongWritable>(new Counter(), new Counter()));
    // Set up Hadoop Output Format
    HadoopOutputFormat<Text, LongWritable> hadoopOutputFormat = new HadoopOutputFormat<Text, LongWritable>(new TextOutputFormat<Text, LongWritable>(), new JobConf());
    hadoopOutputFormat.getJobConf().set("mapred.textoutputformat.separator", " ");
    TextOutputFormat.setOutputPath(hadoopOutputFormat.getJobConf(), new Path(outputPath));
    // Output & Execute
    words.output(hadoopOutputFormat).setParallelism(1);
    env.execute("Hadoop Compat WordCount");
}
Also used : Path(org.apache.hadoop.fs.Path) ExecutionEnvironment(org.apache.flink.api.java.ExecutionEnvironment) Text(org.apache.hadoop.io.Text) HadoopOutputFormat(org.apache.flink.api.java.hadoop.mapred.HadoopOutputFormat) TextInputFormat(org.apache.hadoop.mapred.TextInputFormat) HadoopInputFormat(org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat) Tuple2(org.apache.flink.api.java.tuple.Tuple2) LongWritable(org.apache.hadoop.io.LongWritable) JobConf(org.apache.hadoop.mapred.JobConf)

Aggregations

ExecutionEnvironment (org.apache.flink.api.java.ExecutionEnvironment)1 HadoopInputFormat (org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat)1 HadoopOutputFormat (org.apache.flink.api.java.hadoop.mapred.HadoopOutputFormat)1 Tuple2 (org.apache.flink.api.java.tuple.Tuple2)1 Path (org.apache.hadoop.fs.Path)1 LongWritable (org.apache.hadoop.io.LongWritable)1 Text (org.apache.hadoop.io.Text)1 JobConf (org.apache.hadoop.mapred.JobConf)1 TextInputFormat (org.apache.hadoop.mapred.TextInputFormat)1