Search in sources :

Example 1 with FlatMapFunc

use of edu.iu.dsc.tws.api.tset.fn.FlatMapFunc in project twister2 by DSC-SPIDAL.

the class FullGraphRunExample method execute.

@Override
public void execute(WorkerEnvironment workerEnv) {
    BatchEnvironment env = TSetEnvironment.initBatch(workerEnv);
    SourceTSet<Integer> src = dummySource(env, COUNT, PARALLELISM);
    src.direct().flatmap((FlatMapFunc<Integer, Object>) (integer, collector) -> LOG.info("dir= " + integer));
    src.reduce(Integer::sum).flatmap((FlatMapFunc<Integer, Object>) (integer, collector) -> LOG.info("red= " + integer));
// env.run();
}
Also used : WorkerEnvironment(edu.iu.dsc.tws.api.resource.WorkerEnvironment) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment) SourceTSet(edu.iu.dsc.tws.tset.sets.batch.SourceTSet) ResourceAllocator(edu.iu.dsc.tws.rsched.core.ResourceAllocator) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) HashMap(java.util.HashMap) Config(edu.iu.dsc.tws.api.config.Config) FlatMapFunc(edu.iu.dsc.tws.api.tset.fn.FlatMapFunc) Logger(java.util.logging.Logger) JobConfig(edu.iu.dsc.tws.api.JobConfig) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment)

Example 2 with FlatMapFunc

use of edu.iu.dsc.tws.api.tset.fn.FlatMapFunc in project twister2 by DSC-SPIDAL.

the class FileBasedWordCount method execute.

@Override
public void execute(WorkerEnvironment workerEnv) {
    BatchEnvironment env = TSetEnvironment.initBatch(workerEnv);
    int sourcePar = (int) env.getConfig().get("PAR");
    // read the file line by line by using a single worker
    SourceTSet<String> lines = env.createSource(new WordCountFileSource(), 1);
    // distribute the lines among the workers and performs a flatmap operation to extract words
    ComputeTSet<String> words = lines.partition(new HashingPartitioner<>(), sourcePar).flatmap((FlatMapFunc<String, String>) (l, collector) -> {
        StringTokenizer itr = new StringTokenizer(l);
        while (itr.hasMoreTokens()) {
            collector.collect(itr.nextToken());
        }
    });
    // attach count as 1 for each word
    KeyedTSet<String, Integer> groupedWords = words.mapToTuple(w -> new Tuple<>(w, 1));
    // performs reduce by key at each worker
    KeyedReduceTLink<String, Integer> keyedReduce = groupedWords.keyedReduce(Integer::sum);
    // gather the results to worker0 (there is a dummy map op here to pass the values to edges)
    // and write to a file
    keyedReduce.map(i -> i).gather().forEach(new WordcountFileWriter());
}
Also used : Twister2Job(edu.iu.dsc.tws.api.Twister2Job) URL(java.net.URL) ResourceAllocator(edu.iu.dsc.tws.rsched.core.ResourceAllocator) Options(org.apache.commons.cli.Options) LocalTextInputPartitioner(edu.iu.dsc.tws.data.api.formatters.LocalTextInputPartitioner) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) FlatMapFunc(edu.iu.dsc.tws.api.tset.fn.FlatMapFunc) KeyedTSet(edu.iu.dsc.tws.tset.sets.batch.KeyedTSet) JobConfig(edu.iu.dsc.tws.api.JobConfig) StandardCopyOption(java.nio.file.StandardCopyOption) Level(java.util.logging.Level) DefaultParser(org.apache.commons.cli.DefaultParser) FileInputSplit(edu.iu.dsc.tws.data.api.splits.FileInputSplit) HashingPartitioner(edu.iu.dsc.tws.tset.fn.HashingPartitioner) InputSplit(edu.iu.dsc.tws.data.fs.io.InputSplit) StringTokenizer(java.util.StringTokenizer) Map(java.util.Map) CommandLine(org.apache.commons.cli.CommandLine) DataSource(edu.iu.dsc.tws.dataset.DataSource) TSetContext(edu.iu.dsc.tws.api.tset.TSetContext) BaseApplyFunc(edu.iu.dsc.tws.api.tset.fn.BaseApplyFunc) Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) ComputeTSet(edu.iu.dsc.tws.tset.sets.batch.ComputeTSet) SourceTSet(edu.iu.dsc.tws.tset.sets.batch.SourceTSet) Files(java.nio.file.Files) CommandLineParser(org.apache.commons.cli.CommandLineParser) BufferedWriter(java.io.BufferedWriter) BaseSourceFunc(edu.iu.dsc.tws.api.tset.fn.BaseSourceFunc) FileWriter(java.io.FileWriter) IOException(java.io.IOException) Logger(java.util.logging.Logger) KeyedReduceTLink(edu.iu.dsc.tws.tset.links.batch.KeyedReduceTLink) File(java.io.File) Serializable(java.io.Serializable) Twister2Submitter(edu.iu.dsc.tws.rsched.job.Twister2Submitter) WorkerEnvironment(edu.iu.dsc.tws.api.resource.WorkerEnvironment) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment) TreeMap(java.util.TreeMap) Paths(java.nio.file.Paths) Path(edu.iu.dsc.tws.api.data.Path) BufferedReader(java.io.BufferedReader) FileReader(java.io.FileReader) Twister2Worker(edu.iu.dsc.tws.api.resource.Twister2Worker) InputStream(java.io.InputStream) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) StringTokenizer(java.util.StringTokenizer) HashingPartitioner(edu.iu.dsc.tws.tset.fn.HashingPartitioner)

Aggregations

JobConfig (edu.iu.dsc.tws.api.JobConfig)2 WorkerEnvironment (edu.iu.dsc.tws.api.resource.WorkerEnvironment)2 FlatMapFunc (edu.iu.dsc.tws.api.tset.fn.FlatMapFunc)2 ResourceAllocator (edu.iu.dsc.tws.rsched.core.ResourceAllocator)2 BatchEnvironment (edu.iu.dsc.tws.tset.env.BatchEnvironment)2 TSetEnvironment (edu.iu.dsc.tws.tset.env.TSetEnvironment)2 SourceTSet (edu.iu.dsc.tws.tset.sets.batch.SourceTSet)2 Logger (java.util.logging.Logger)2 Twister2Job (edu.iu.dsc.tws.api.Twister2Job)1 Tuple (edu.iu.dsc.tws.api.comms.structs.Tuple)1 Config (edu.iu.dsc.tws.api.config.Config)1 Path (edu.iu.dsc.tws.api.data.Path)1 Twister2Worker (edu.iu.dsc.tws.api.resource.Twister2Worker)1 TSetContext (edu.iu.dsc.tws.api.tset.TSetContext)1 BaseApplyFunc (edu.iu.dsc.tws.api.tset.fn.BaseApplyFunc)1 BaseSourceFunc (edu.iu.dsc.tws.api.tset.fn.BaseSourceFunc)1 LocalTextInputPartitioner (edu.iu.dsc.tws.data.api.formatters.LocalTextInputPartitioner)1 FileInputSplit (edu.iu.dsc.tws.data.api.splits.FileInputSplit)1 InputSplit (edu.iu.dsc.tws.data.fs.io.InputSplit)1 DataSource (edu.iu.dsc.tws.dataset.DataSource)1