Examples with SourceTSet - edu.iu.dsc.tws.tset.sets.batch.SourceTSet

Example 21 with SourceTSet

Use of edu.iu.dsc.tws.tset.sets.batch.SourceTSet in the twister2 project by DSC-SPIDAL.

From the class AllReduceExample, method execute.

@Override
public void execute(WorkerEnvironment workerEnv) {
    BatchEnvironment env = TSetEnvironment.initBatch(workerEnv);
    int start = env.getWorkerID() * 100;
    SourceTSet<Integer> src = dummySource(env, start, COUNT, PARALLELISM);
    LOG.info("test foreach");
    src.allReduce(Integer::sum).forEach(i -> LOG.info("foreach: " + i));
    LOG.info("test map");
    src.allReduce(Integer::sum).map(i -> i.toString() + "$$").direct()
        .forEach(s -> LOG.info("map: " + s));
    LOG.info("test flat map");
    src.allReduce(Integer::sum).flatmap((i, c) -> c.collect(i.toString() + "$$")).direct()
        .forEach(s -> LOG.info("flat:" + s));
    LOG.info("test compute");
    src.allReduce(Integer::sum).compute(i -> i * 2).direct()
        .forEach(i -> LOG.info("comp: " + i));
    LOG.info("test computec");
    src.allReduce(Integer::sum)
        .compute((ComputeCollectorFunc<Integer, String>) (input, output) -> output.collect("sum=" + input))
        .direct().forEach(s -> LOG.info("computec: " + s));
}

Also used : WorkerEnvironment(edu.iu.dsc.tws.api.resource.WorkerEnvironment) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment) ComputeCollectorFunc(edu.iu.dsc.tws.api.tset.fn.ComputeCollectorFunc) SourceTSet(edu.iu.dsc.tws.tset.sets.batch.SourceTSet) ResourceAllocator(edu.iu.dsc.tws.rsched.core.ResourceAllocator) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) HashMap(java.util.HashMap) Config(edu.iu.dsc.tws.api.config.Config) Logger(java.util.logging.Logger) JobConfig(edu.iu.dsc.tws.api.JobConfig)
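
The all-reduce step in the example above can be illustrated without Twister2. The following is a minimal plain-Java sketch, not the library's implementation: it assumes each worker holds a small list of integers (the data produced by dummySource is not shown in the source, so the inputs here are hypothetical) and that an all-reduce folds all values with the given operator and gives every worker the same result. The class name AllReduceSketch and the method allReduce are names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

public class AllReduceSketch {

    // Fold every worker's local values into one result and hand a copy of that
    // result to each worker, which is the defining property of an all-reduce.
    public static <T> List<T> allReduce(List<List<T>> perWorker, BinaryOperator<T> op) {
        T acc = null;
        for (List<T> local : perWorker) {
            for (T value : local) {
                acc = (acc == null) ? value : op.apply(acc, value);
            }
        }
        List<T> results = new ArrayList<>();
        for (int w = 0; w < perWorker.size(); w++) {
            results.add(acc);
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: worker w holds the integers w*100 .. w*100+2.
        List<List<Integer>> perWorker = new ArrayList<>();
        for (int w = 0; w < 2; w++) {
            List<Integer> local = new ArrayList<>();
            for (int i = 0; i < 3; i++) {
                local.add(w * 100 + i);
            }
            perWorker.add(local);
        }
        // Both workers see the same total: [306, 306]
        System.out.println(allReduce(perWorker, Integer::sum));
    }
}
```

In this model, the map, flatmap and compute variants in the example would each run on the single reduced value that every worker receives.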

Example 22 with SourceTSet

Use of edu.iu.dsc.tws.tset.sets.batch.SourceTSet in the twister2 project by DSC-SPIDAL.

From the class ComputeExample, method execute.

@Override
public void execute(WorkerEnvironment workerEnv) {
    BatchEnvironment env = TSetEnvironment.initBatch(workerEnv);
    int start = env.getWorkerID() * 100;
    SourceTSet<Integer> src = dummySource(env, start, COUNT, PARALLELISM).setName("src").withSchema(PrimitiveSchemas.INTEGER);
    ComputeTSet<Integer> sum = src.direct().compute((ComputeFunc<Iterator<Integer>, Integer>) input -> {
        int s = 0;
        while (input.hasNext()) {
            s += input.next();
        }
        return s;
    }).withSchema(PrimitiveSchemas.INTEGER).setName("sum");
    sum.direct().forEach(data -> LOG.info("val: " + data));
    sum.reduce(Integer::sum).forEach(i -> LOG.info("red: " + i));
}

Also used : ComputeTSet(edu.iu.dsc.tws.tset.sets.batch.ComputeTSet) Iterator(java.util.Iterator) SourceTSet(edu.iu.dsc.tws.tset.sets.batch.SourceTSet) ResourceAllocator(edu.iu.dsc.tws.rsched.core.ResourceAllocator) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) HashMap(java.util.HashMap) Config(edu.iu.dsc.tws.api.config.Config) Logger(java.util.logging.Logger) JobConfig(edu.iu.dsc.tws.api.JobConfig) WorkerEnvironment(edu.iu.dsc.tws.api.resource.WorkerEnvironment) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment) ComputeFunc(edu.iu.dsc.tws.api.tset.fn.ComputeFunc) PrimitiveSchemas(edu.iu.dsc.tws.api.tset.schema.PrimitiveSchemas)
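
The two stages of the example above (a per-partition sum over an iterator, followed by a reduce of those sums) can be sketched in plain Java. This is only an illustration under stated assumptions: the partitions and their contents are hypothetical, and PartitionSumSketch, sumPartition and reduceSums are names made up here rather than parts of the Twister2 API.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PartitionSumSketch {

    // Same shape as the compute function in the example: drain an iterator
    // over one partition and return the sum of its elements.
    public static int sumPartition(Iterator<Integer> input) {
        int s = 0;
        while (input.hasNext()) {
            s += input.next();
        }
        return s;
    }

    // Combine the per-partition sums into one total, standing in for reduce(Integer::sum).
    public static int reduceSums(List<List<Integer>> partitions) {
        return partitions.stream()
                .map(p -> sumPartition(p.iterator()))
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        // Hypothetical partitions, one per source instance.
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(0, 1, 2),
                Arrays.asList(100, 101, 102));
        for (List<Integer> p : partitions) {
            System.out.println("val: " + sumPartition(p.iterator())); // val: 3, then val: 303
        }
        System.out.println("red: " + reduceSums(partitions)); // red: 306
    }
}
```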

Example 23 with SourceTSet

Use of edu.iu.dsc.tws.tset.sets.batch.SourceTSet in the twister2 project by DSC-SPIDAL.

From the class FileBasedWordCount, method execute.

@Override
public void execute(WorkerEnvironment workerEnv) {
    BatchEnvironment env = TSetEnvironment.initBatch(workerEnv);
    int sourcePar = (int) env.getConfig().get("PAR");
    // read the file line by line using a single worker
    SourceTSet<String> lines = env.createSource(new WordCountFileSource(), 1);
    // distribute the lines among the workers and perform a flatmap operation to extract words
    ComputeTSet<String> words = lines.partition(new HashingPartitioner<>(), sourcePar)
        .flatmap((FlatMapFunc<String, String>) (l, collector) -> {
        StringTokenizer itr = new StringTokenizer(l);
        while (itr.hasMoreTokens()) {
            collector.collect(itr.nextToken());
        }
    });
    // attach count as 1 for each word
    KeyedTSet<String, Integer> groupedWords = words.mapToTuple(w -> new Tuple<>(w, 1));
    // perform a reduce by key at each worker
    KeyedReduceTLink<String, Integer> keyedReduce = groupedWords.keyedReduce(Integer::sum);
    // gather the results to worker0 (there is a dummy map op here to pass the values to edges)
    // and write to a file
    keyedReduce.map(i -> i).gather().forEach(new WordcountFileWriter());
}

Also used : Twister2Job(edu.iu.dsc.tws.api.Twister2Job) URL(java.net.URL) ResourceAllocator(edu.iu.dsc.tws.rsched.core.ResourceAllocator) Options(org.apache.commons.cli.Options) LocalTextInputPartitioner(edu.iu.dsc.tws.data.api.formatters.LocalTextInputPartitioner) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) FlatMapFunc(edu.iu.dsc.tws.api.tset.fn.FlatMapFunc) KeyedTSet(edu.iu.dsc.tws.tset.sets.batch.KeyedTSet) JobConfig(edu.iu.dsc.tws.api.JobConfig) StandardCopyOption(java.nio.file.StandardCopyOption) Level(java.util.logging.Level) DefaultParser(org.apache.commons.cli.DefaultParser) FileInputSplit(edu.iu.dsc.tws.data.api.splits.FileInputSplit) HashingPartitioner(edu.iu.dsc.tws.tset.fn.HashingPartitioner) InputSplit(edu.iu.dsc.tws.data.fs.io.InputSplit) StringTokenizer(java.util.StringTokenizer) Map(java.util.Map) CommandLine(org.apache.commons.cli.CommandLine) DataSource(edu.iu.dsc.tws.dataset.DataSource) TSetContext(edu.iu.dsc.tws.api.tset.TSetContext) BaseApplyFunc(edu.iu.dsc.tws.api.tset.fn.BaseApplyFunc) Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) ComputeTSet(edu.iu.dsc.tws.tset.sets.batch.ComputeTSet) SourceTSet(edu.iu.dsc.tws.tset.sets.batch.SourceTSet) Files(java.nio.file.Files) CommandLineParser(org.apache.commons.cli.CommandLineParser) BufferedWriter(java.io.BufferedWriter) BaseSourceFunc(edu.iu.dsc.tws.api.tset.fn.BaseSourceFunc) FileWriter(java.io.FileWriter) IOException(java.io.IOException) Logger(java.util.logging.Logger) KeyedReduceTLink(edu.iu.dsc.tws.tset.links.batch.KeyedReduceTLink) File(java.io.File) Serializable(java.io.Serializable) Twister2Submitter(edu.iu.dsc.tws.rsched.job.Twister2Submitter) WorkerEnvironment(edu.iu.dsc.tws.api.resource.WorkerEnvironment) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment) TreeMap(java.util.TreeMap) Paths(java.nio.file.Paths) Path(edu.iu.dsc.tws.api.data.Path) BufferedReader(java.io.BufferedReader) FileReader(java.io.FileReader) Twister2Worker(edu.iu.dsc.tws.api.resource.Twister2Worker) InputStream(java.io.InputStream)
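
The steps described in the comments of the word count example (split lines into words, pair each word with 1, reduce by key) can be sketched on a single machine in plain Java. This is a simplified stand-in under assumptions, not the Twister2 pipeline: there is no partitioning or gather step, the input lines are hypothetical, and WordCountSketch and countWords are names introduced only for this illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {

    // Tokenize each line as the flatmap step does, then merge a count of 1
    // per word with Integer::sum, mirroring mapToTuple plus keyedReduce.
    public static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input lines; prints {be=2, not=1, or=1, to=2}
        System.out.println(countWords(List.of("to be or", "not to be")));
    }
}
```

A TreeMap is used here only so that the printed order is deterministic; any Map would serve for the counting itself.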

Aggregations

WorkerEnvironment (edu.iu.dsc.tws.api.resource.WorkerEnvironment): 23
TSetEnvironment (edu.iu.dsc.tws.tset.env.TSetEnvironment): 23
SourceTSet (edu.iu.dsc.tws.tset.sets.batch.SourceTSet): 23
Logger (java.util.logging.Logger): 23
JobConfig (edu.iu.dsc.tws.api.JobConfig): 22
BatchEnvironment (edu.iu.dsc.tws.tset.env.BatchEnvironment): 22
ResourceAllocator (edu.iu.dsc.tws.rsched.core.ResourceAllocator): 20
Config (edu.iu.dsc.tws.api.config.Config): 19
HashMap (java.util.HashMap): 19
Tuple (edu.iu.dsc.tws.api.comms.structs.Tuple): 13
Iterator (java.util.Iterator): 13
ComputeFunc (edu.iu.dsc.tws.api.tset.fn.ComputeFunc): 12
ComputeCollectorFunc (edu.iu.dsc.tws.api.tset.fn.ComputeCollectorFunc): 11
Twister2Job (edu.iu.dsc.tws.api.Twister2Job): 6
Twister2Submitter (edu.iu.dsc.tws.rsched.job.Twister2Submitter): 6
ComputeTSet (edu.iu.dsc.tws.tset.sets.batch.ComputeTSet): 6
SinkTSet (edu.iu.dsc.tws.tset.sets.batch.SinkTSet): 6
Twister2Worker (edu.iu.dsc.tws.api.resource.Twister2Worker): 5
MapFunc (edu.iu.dsc.tws.api.tset.fn.MapFunc): 5
SinkFunc (edu.iu.dsc.tws.api.tset.fn.SinkFunc): 5