
Example 1 with WordCountFileSpout

Use of org.apache.flink.storm.wordcount.operators.WordCountFileSpout in the Apache Flink project, in class SpoutSourceWordCount, method getTextDataStream.

private static DataStream<String> getTextDataStream(final StreamExecutionEnvironment env) {
    if (fileOutput) {
        // read the text file from given input path
        final String[] tokens = textPath.split(":");
        final String localFile = tokens[tokens.length - 1];
        return env.addSource(
                new SpoutWrapper<String>(new WordCountFileSpout(localFile), new String[] { Utils.DEFAULT_STREAM_ID }, -1),
                TypeExtractor.getForClass(String.class)).setParallelism(1);
    }
    // otherwise fall back to the in-memory spout
    return env.addSource(
            new SpoutWrapper<String>(new WordCountInMemorySpout(), new String[] { Utils.DEFAULT_STREAM_ID }, -1),
            TypeExtractor.getForClass(String.class)).setParallelism(1);
}
Also used: WordCountFileSpout (org.apache.flink.storm.wordcount.operators.WordCountFileSpout), SpoutWrapper (org.apache.flink.storm.wrappers.SpoutWrapper), WordCountInMemorySpout (org.apache.flink.storm.wordcount.operators.WordCountInMemorySpout)
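
Both examples on this page derive the file name by splitting the configured path on ":" and keeping the last token. The following is a minimal stand-alone sketch of that idiom, useful for seeing what it returns for different inputs. The class name PathTokenDemo, the helper lastToken, and the sample paths are hypothetical and not part of the Flink sources.

```java
public class PathTokenDemo {

    // Mirrors the split-and-take-last-token idiom used in getTextDataStream.
    static String lastToken(String path) {
        final String[] tokens = path.split(":");
        return tokens[tokens.length - 1];
    }

    public static void main(String[] args) {
        // Scheme prefix is dropped: prints ///tmp/words.txt
        System.out.println(lastToken("file:///tmp/words.txt"));
        // No colon at all: the path is returned unchanged, prints /tmp/words.txt
        System.out.println(lastToken("/tmp/words.txt"));
        // A port number adds a second colon: prints 9000/words.txt
        System.out.println(lastToken("hdfs://host:9000/words.txt"));
    }
}
```

As the last call shows, the idiom only suits paths with at most a simple scheme prefix; any further colon (a port, a Windows drive letter) truncates the path.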

Example 2 with WordCountFileSpout

Use of org.apache.flink.storm.wordcount.operators.WordCountFileSpout in the Apache Flink project, in class WordCountTopology, method buildTopology.

public static TopologyBuilder buildTopology(boolean indexOrName) {
    final TopologyBuilder builder = new TopologyBuilder();
    // get input data
    if (fileInputOutput) {
        // read the text file from given input path
        final String[] tokens = textPath.split(":");
        final String inputFile = tokens[tokens.length - 1];
        // inserting NullTerminatingSpout only required to stabilize integration test
        builder.setSpout(spoutId, new NullTerminatingSpout(new WordCountFileSpout(inputFile)));
    } else {
        builder.setSpout(spoutId, new WordCountInMemorySpout());
    }
    if (indexOrName) {
        // split up the lines in pairs (2-tuples) containing: (word,1)
        builder.setBolt(tokenierzerId, new BoltTokenizer(), 4).shuffleGrouping(spoutId);
        // group by the tuple field "0" and sum up tuple field "1"
        builder.setBolt(counterId, new BoltCounter(), 4)
                .fieldsGrouping(tokenierzerId, new Fields(BoltTokenizer.ATTRIBUTE_WORD));
    } else {
        // split up the lines in pairs (2-tuples) containing: (word,1)
        builder.setBolt(tokenierzerId, new BoltTokenizerByName(), 4).shuffleGrouping(spoutId);
        // group by the tuple field "0" and sum up tuple field "1"
        builder.setBolt(counterId, new BoltCounterByName(), 4)
                .fieldsGrouping(tokenierzerId, new Fields(BoltTokenizerByName.ATTRIBUTE_WORD));
    }
    // emit result
    if (fileInputOutput) {
        // write the result to the given output path
        final String[] tokens = outputPath.split(":");
        final String outputFile = tokens[tokens.length - 1];
        builder.setBolt(sinkId, new BoltFileSink(outputFile, formatter)).shuffleGrouping(counterId);
    } else {
        builder.setBolt(sinkId, new BoltPrintSink(formatter), 4).shuffleGrouping(counterId);
    }
    return builder;
}
Also used: WordCountFileSpout (org.apache.flink.storm.wordcount.operators.WordCountFileSpout), BoltCounter (org.apache.flink.storm.wordcount.operators.BoltCounter), BoltCounterByName (org.apache.flink.storm.wordcount.operators.BoltCounterByName), TopologyBuilder (org.apache.storm.topology.TopologyBuilder), WordCountInMemorySpout (org.apache.flink.storm.wordcount.operators.WordCountInMemorySpout), BoltPrintSink (org.apache.flink.storm.util.BoltPrintSink), NullTerminatingSpout (org.apache.flink.storm.util.NullTerminatingSpout), BoltFileSink (org.apache.flink.storm.util.BoltFileSink), Fields (org.apache.storm.tuple.Fields), BoltTokenizer (org.apache.flink.storm.wordcount.operators.BoltTokenizer), BoltTokenizerByName (org.apache.flink.storm.wordcount.operators.BoltTokenizerByName)
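
The topology above connects the tokenizer with shuffleGrouping but the counter with fieldsGrouping on the word field, so that every occurrence of a word reaches the same counter task. The plain-Java sketch below illustrates why that matters, without any Storm dependency. The class FieldsGroupingSketch, its methods, the hash-modulo routing, and the lower-casing and "\\W+" tokenization are all assumptions made for illustration; this is not a description of Storm's internals or of BoltTokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldsGroupingSketch {

    // Hypothetical stand-in for fields grouping: pick a task from the key's hash,
    // so equal words always map to the same task index.
    static int taskFor(String word, int parallelism) {
        return Math.floorMod(word.hashCode(), parallelism);
    }

    // Tokenize each line, route every word to a task by key, and count per task.
    static List<Map<String, Integer>> countByTask(List<String> lines, int parallelism) {
        List<Map<String, Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            tasks.add(new HashMap<>());
        }
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (word.isEmpty()) {
                    continue; // split can yield an empty leading token
                }
                tasks.get(taskFor(word, parallelism)).merge(word, 1, Integer::sum);
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "that is the question");
        // 4 matches the parallelism hint passed to setBolt in the example above
        List<Map<String, Integer>> tasks = countByTask(lines, 4);
        for (int i = 0; i < tasks.size(); i++) {
            System.out.println("task " + i + " -> " + tasks.get(i));
        }
    }
}
```

Because routing depends only on the word, each word appears in exactly one task's map with its full count. With random (shuffle) routing, the same word could be split across several tasks and each would hold only a partial count.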

Aggregations

WordCountFileSpout (org.apache.flink.storm.wordcount.operators.WordCountFileSpout): 2
WordCountInMemorySpout (org.apache.flink.storm.wordcount.operators.WordCountInMemorySpout): 2
BoltFileSink (org.apache.flink.storm.util.BoltFileSink): 1
BoltPrintSink (org.apache.flink.storm.util.BoltPrintSink): 1
NullTerminatingSpout (org.apache.flink.storm.util.NullTerminatingSpout): 1
BoltCounter (org.apache.flink.storm.wordcount.operators.BoltCounter): 1
BoltCounterByName (org.apache.flink.storm.wordcount.operators.BoltCounterByName): 1
BoltTokenizer (org.apache.flink.storm.wordcount.operators.BoltTokenizer): 1
BoltTokenizerByName (org.apache.flink.storm.wordcount.operators.BoltTokenizerByName): 1
SpoutWrapper (org.apache.flink.storm.wrappers.SpoutWrapper): 1
TopologyBuilder (org.apache.storm.topology.TopologyBuilder): 1
Fields (org.apache.storm.tuple.Fields): 1