
Example 1 with HashingPartitioner

Use of edu.iu.dsc.tws.tset.fn.HashingPartitioner in project twister2 by DSC-SPIDAL.

From the class BTKeyedGatherExample, method buildTaskGraph.

@Override
public ComputeGraphBuilder buildTaskGraph() {
    // taskStages holds the parallelism of each stage: [source, sink]
    List<Integer> taskStages = jobParameters.getTaskStages();
    int sourceParallelism = taskStages.get(0);
    int sinkParallelism = taskStages.get(1);
    // keys are integers and each value is an integer array
    MessageType keyType = MessageTypes.INTEGER;
    MessageType dataType = MessageTypes.INTEGER_ARRAY;
    String edge = "edge";
    BaseSource sourceTask = new SourceTask(edge, true);
    ICompute sinkTask = new KeyedGatherGroupedSinkTask();
    computeGraphBuilder.addSource(SOURCE, sourceTask, sourceParallelism);
    computeConnection = computeGraphBuilder.addCompute(SINK, sinkTask, sinkParallelism);
    // keyed gather from SOURCE to SINK; HashingPartitioner chooses the sink task for each key
    computeConnection.keyedGather(SOURCE)
        .viaEdge(edge)
        .withKeyType(keyType)
        .withTaskPartitioner(new HashingPartitioner())
        .withDataType(dataType);
    return computeGraphBuilder;
}
Also used : BaseSource(edu.iu.dsc.tws.api.compute.nodes.BaseSource) HashingPartitioner(edu.iu.dsc.tws.tset.fn.HashingPartitioner) ICompute(edu.iu.dsc.tws.api.compute.nodes.ICompute) MessageType(edu.iu.dsc.tws.api.comms.messaging.types.MessageType)
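
To make the routing in the keyed gather above concrete, the sketch below shows the general idea behind hash partitioning in plain Java, without twister2: a key's hashCode() is reduced modulo the number of destination tasks, so equal keys always reach the same sink task. The class HashPartitionSketch and its partition method are hypothetical illustrations, not the source of edu.iu.dsc.tws.tset.fn.HashingPartitioner, whose exact hashing rule may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; this is NOT twister2's HashingPartitioner source.
public class HashPartitionSketch {

    // Pick a destination index in [0, numDestinations) from the key's hash code.
    // Math.floorMod keeps the result non-negative even when hashCode() is negative.
    static int partition(Object key, int numDestinations) {
        return Math.floorMod(key.hashCode(), numDestinations);
    }

    public static void main(String[] args) {
        int sinkParallelism = 4; // plays the role of taskStages.get(1)
        int[] keys = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 17, 42};

        // Group keys by the sink task they would be routed to (a local "keyed gather").
        Map<Integer, List<Integer>> bySink = new TreeMap<>();
        for (int key : keys) {
            bySink.computeIfAbsent(partition(key, sinkParallelism), k -> new ArrayList<>())
                  .add(key);
        }
        bySink.forEach((sink, ks) -> System.out.println("sink " + sink + " <- keys " + ks));
    }
}
```

Because Integer.hashCode() returns the value itself, integer keys are spread round-robin here: running main prints sink 0 <- keys [0, 4, 8], sink 1 <- keys [1, 5, 9, 17], sink 2 <- keys [2, 6, 42] and sink 3 <- keys [3, 7].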

Example 2 with HashingPartitioner

Use of edu.iu.dsc.tws.tset.fn.HashingPartitioner in project twister2 by DSC-SPIDAL.

From the class WordCount, method execute.

@Override
public void execute(WorkerEnvironment workerEnvironment) {
    StreamingEnvironment cEnv = TSetEnvironment.initStreaming(workerEnvironment);
    // create a source of random words, hash-partition them, and count them in a sink
    cEnv.createSource(new SourceFunc<String>() {

        // sample words
        private List<String> sampleWords = new ArrayList<>();

        // random number generator used to pick the words
        private Random random;

        @Override
        public void prepare(TSetContext context) {
            this.random = new Random();
            RandomString randomString = new RandomString(MAX_CHARS, random, RandomString.ALPHANUM);
            for (int i = 0; i < NO_OF_SAMPLE_WORDS; i++) {
                sampleWords.add(randomString.nextRandomSizeString());
            }
        }

        @Override
        public boolean hasNext() {
            return true;
        }

        @Override
        public String next() {
            return sampleWords.get(random.nextInt(sampleWords.size()));
        }
    }, 4).partition(new HashingPartitioner<>()).sink(new SinkFunc<String>() {

        // keep track of the counts
        private Map<String, Integer> counts = new HashMap<>();

        private TSetContext context;

        @Override
        public void prepare(TSetContext context) {
            this.context = context;
        }

        @Override
        public boolean add(String word) {
            // merge() stores 1 for a new word or adds 1 to its existing count
            int count = counts.merge(word, 1, Integer::sum);
            LOG.log(Level.INFO, String.format("%d Word %s count %d", context.getIndex(), word, count));
            return true;
        }
    });
    // start executing the streaming graph
    cEnv.run();
}
Also used : HashMap(java.util.HashMap) ArrayList(java.util.ArrayList) RandomString(edu.iu.dsc.tws.examples.utils.RandomString) StreamingEnvironment(edu.iu.dsc.tws.tset.env.StreamingEnvironment) TSetContext(edu.iu.dsc.tws.api.tset.TSetContext) Random(java.util.Random) HashingPartitioner(edu.iu.dsc.tws.tset.fn.HashingPartitioner)
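
The streaming word count only produces correct totals because every occurrence of a word reaches the same sink instance; otherwise each sink's counts map would hold a partial count. The local simulation below makes that visible without twister2. It assumes a floorMod-of-hashCode routing rule as a stand-in for HashingPartitioner, and the class LocalWordCountSim with its route and count methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Local simulation (no twister2): words are routed to one of several "sink" counters
// by hash, mirroring the source -> partition(HashingPartitioner) -> sink pipeline.
public class LocalWordCountSim {

    // Assumed routing rule: hash code modulo the number of sinks.
    static int route(String word, int sinks) {
        return Math.floorMod(word.hashCode(), sinks);
    }

    // Returns one word -> count map per sink after processing the given words.
    static List<Map<String, Integer>> count(List<String> words, int sinks) {
        List<Map<String, Integer>> perSink = new ArrayList<>();
        for (int i = 0; i < sinks; i++) {
            perSink.add(new HashMap<>());
        }
        for (String w : words) {
            // merge() inserts 1 for a new word or adds 1 to the existing count
            perSink.get(route(w, sinks)).merge(w, 1, Integer::sum);
        }
        return perSink;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("alpha", "beta", "gamma", "delta");
        Random random = new Random(7); // fixed seed so runs are repeatable
        List<String> stream = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            stream.add(sample.get(random.nextInt(sample.size())));
        }
        List<Map<String, Integer>> perSink = count(stream, 4);
        for (int i = 0; i < perSink.size(); i++) {
            System.out.println("sink " + i + ": " + perSink.get(i));
        }
    }
}
```

Each word appears in exactly one sink's map, so that map's count is the word's full total; with a round-robin or random partitioner instead of a hash, the same word could be split across several sinks.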

Example 3 with HashingPartitioner

Use of edu.iu.dsc.tws.tset.fn.HashingPartitioner in project twister2 by DSC-SPIDAL.

From the class FileBasedWordCount, method execute.

@Override
public void execute(WorkerEnvironment workerEnv) {
    BatchEnvironment env = TSetEnvironment.initBatch(workerEnv);
    int sourcePar = (int) env.getConfig().get("PAR");
    // read the file line by line using a single worker
    SourceTSet<String> lines = env.createSource(new WordCountFileSource(), 1);
    // distribute the lines among the workers and perform a flatmap operation to extract words
    ComputeTSet<String> words = lines
        .partition(new HashingPartitioner<>(), sourcePar)
        .flatmap((FlatMapFunc<String, String>) (l, collector) -> {
            StringTokenizer itr = new StringTokenizer(l);
            while (itr.hasMoreTokens()) {
                collector.collect(itr.nextToken());
            }
        });
    // attach count as 1 for each word
    KeyedTSet<String, Integer> groupedWords = words.mapToTuple(w -> new Tuple<>(w, 1));
    // perform a reduce by key (summing the counts) at each worker
    KeyedReduceTLink<String, Integer> keyedReduce = groupedWords.keyedReduce(Integer::sum);
    // gather the results at worker 0 (a dummy map op is used here to pass the values to the edges)
    // and write them to a file
    keyedReduce.map(i -> i).gather().forEach(new WordcountFileWriter());
}
Also used : Twister2Job(edu.iu.dsc.tws.api.Twister2Job) URL(java.net.URL) ResourceAllocator(edu.iu.dsc.tws.rsched.core.ResourceAllocator) Options(org.apache.commons.cli.Options) LocalTextInputPartitioner(edu.iu.dsc.tws.data.api.formatters.LocalTextInputPartitioner) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) FlatMapFunc(edu.iu.dsc.tws.api.tset.fn.FlatMapFunc) KeyedTSet(edu.iu.dsc.tws.tset.sets.batch.KeyedTSet) JobConfig(edu.iu.dsc.tws.api.JobConfig) StandardCopyOption(java.nio.file.StandardCopyOption) Level(java.util.logging.Level) DefaultParser(org.apache.commons.cli.DefaultParser) FileInputSplit(edu.iu.dsc.tws.data.api.splits.FileInputSplit) HashingPartitioner(edu.iu.dsc.tws.tset.fn.HashingPartitioner) InputSplit(edu.iu.dsc.tws.data.fs.io.InputSplit) StringTokenizer(java.util.StringTokenizer) Map(java.util.Map) CommandLine(org.apache.commons.cli.CommandLine) DataSource(edu.iu.dsc.tws.dataset.DataSource) TSetContext(edu.iu.dsc.tws.api.tset.TSetContext) BaseApplyFunc(edu.iu.dsc.tws.api.tset.fn.BaseApplyFunc) Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) ComputeTSet(edu.iu.dsc.tws.tset.sets.batch.ComputeTSet) SourceTSet(edu.iu.dsc.tws.tset.sets.batch.SourceTSet) Files(java.nio.file.Files) CommandLineParser(org.apache.commons.cli.CommandLineParser) BufferedWriter(java.io.BufferedWriter) BaseSourceFunc(edu.iu.dsc.tws.api.tset.fn.BaseSourceFunc) FileWriter(java.io.FileWriter) IOException(java.io.IOException) Logger(java.util.logging.Logger) KeyedReduceTLink(edu.iu.dsc.tws.tset.links.batch.KeyedReduceTLink) File(java.io.File) Serializable(java.io.Serializable) Twister2Submitter(edu.iu.dsc.tws.rsched.job.Twister2Submitter) WorkerEnvironment(edu.iu.dsc.tws.api.resource.WorkerEnvironment) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment) TreeMap(java.util.TreeMap) Paths(java.nio.file.Paths) Path(edu.iu.dsc.tws.api.data.Path) BufferedReader(java.io.BufferedReader) FileReader(java.io.FileReader) Twister2Worker(edu.iu.dsc.tws.api.resource.Twister2Worker) InputStream(java.io.InputStream)
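
The batch pipeline can be mimicked on a single JVM to check what output to expect. In this sketch (hypothetical class LocalFileWordCountSim, plain java.util only), lines are hash-partitioned across par buckets, split with StringTokenizer, counted with Map.merge, and combined into a TreeMap at the end as a stand-in for the gather to worker 0. Unlike twister2's keyedReduce, which routes keys between workers, this version sums within each bucket first and combines at the gather step; the final totals are the same, but the communication pattern is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Local, single-JVM sketch of: partition -> flatmap -> mapToTuple -> keyedReduce -> gather.
public class LocalFileWordCountSim {

    static Map<String, Integer> wordCount(List<String> lines, int par) {
        // partition: route each line to one of `par` buckets by hash (assumed rule)
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < par; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String line : lines) {
            partitions.get(Math.floorMod(line.hashCode(), par)).add(line);
        }
        // per bucket: split lines into words, pair each word with 1, and sum by key
        List<Map<String, Integer>> reduced = new ArrayList<>();
        for (List<String> part : partitions) {
            Map<String, Integer> local = new HashMap<>();
            for (String line : part) {
                StringTokenizer itr = new StringTokenizer(line);
                while (itr.hasMoreTokens()) {
                    local.merge(itr.nextToken(), 1, Integer::sum);
                }
            }
            reduced.add(local);
        }
        // gather: combine the buckets into one map sorted by word
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, Integer> m : reduced) {
            m.forEach((w, c) -> result.merge(w, c, Integer::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be", "that is the question");
        // prints {be=2, is=1, not=1, or=1, question=1, that=1, the=1, to=2}
        System.out.println(wordCount(lines, 3));
    }
}
```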

Aggregations

HashingPartitioner (edu.iu.dsc.tws.tset.fn.HashingPartitioner): 3
TSetContext (edu.iu.dsc.tws.api.tset.TSetContext): 2
JobConfig (edu.iu.dsc.tws.api.JobConfig): 1
Twister2Job (edu.iu.dsc.tws.api.Twister2Job): 1
MessageType (edu.iu.dsc.tws.api.comms.messaging.types.MessageType): 1
Tuple (edu.iu.dsc.tws.api.comms.structs.Tuple): 1
BaseSource (edu.iu.dsc.tws.api.compute.nodes.BaseSource): 1
ICompute (edu.iu.dsc.tws.api.compute.nodes.ICompute): 1
Path (edu.iu.dsc.tws.api.data.Path): 1
Twister2Worker (edu.iu.dsc.tws.api.resource.Twister2Worker): 1
WorkerEnvironment (edu.iu.dsc.tws.api.resource.WorkerEnvironment): 1
BaseApplyFunc (edu.iu.dsc.tws.api.tset.fn.BaseApplyFunc): 1
BaseSourceFunc (edu.iu.dsc.tws.api.tset.fn.BaseSourceFunc): 1
FlatMapFunc (edu.iu.dsc.tws.api.tset.fn.FlatMapFunc): 1
LocalTextInputPartitioner (edu.iu.dsc.tws.data.api.formatters.LocalTextInputPartitioner): 1
FileInputSplit (edu.iu.dsc.tws.data.api.splits.FileInputSplit): 1
InputSplit (edu.iu.dsc.tws.data.fs.io.InputSplit): 1
DataSource (edu.iu.dsc.tws.dataset.DataSource): 1
RandomString (edu.iu.dsc.tws.examples.utils.RandomString): 1
ResourceAllocator (edu.iu.dsc.tws.rsched.core.ResourceAllocator): 1