
Example 1 with KeyedTSet

Use of edu.iu.dsc.tws.tset.sets.batch.KeyedTSet in project twister2 by DSC-SPIDAL, in the execute method of the class BranchingExample.

@Override
public void execute(WorkerEnvironment workerEnv) {
    BatchEnvironment env = TSetEnvironment.initBatch(workerEnv);
    int para = 2;
    SourceTSet<Integer> src = dummySource(env, COUNT, para).setName("src0");
    // branch the same source into two keyed sets, both keyed by i % 2
    KeyedTSet<Integer, Integer> left = src.mapToTuple(i -> new Tuple<>(i % 2, i)).setName("left");
    KeyedTSet<Integer, Integer> right = src.mapToTuple(i -> new Tuple<>(i % 2, i + 1)).setName("right");
    // inner join of the two branches on the key
    JoinTLink<Integer, Integer, Integer> join = left.join(right, CommunicationContext.JoinType.INNER, Integer::compareTo).setName("join");
    // format each joined record as "(key leftValue rightValue)"
    ComputeTSet<String> map = join.map(t -> "(" + t.getKey() + " " + t.getLeftValue() + " " + t.getRightValue() + ")").setName("map***");
    // second branch: the same records prefixed with "###"
    ComputeTSet<String> map1 = map.direct().map(s -> "###" + s).setName("map@@");
    // union both branches and log every record
    ComputeTSet<String> union = map.union(map1).setName("union");
    union.direct().forEach(s -> LOG.info(s));
}
Also used : CommunicationContext(edu.iu.dsc.tws.api.comms.CommunicationContext) Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) ComputeTSet(edu.iu.dsc.tws.tset.sets.batch.ComputeTSet) SourceTSet(edu.iu.dsc.tws.tset.sets.batch.SourceTSet) ResourceAllocator(edu.iu.dsc.tws.rsched.core.ResourceAllocator) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) HashMap(java.util.HashMap) Config(edu.iu.dsc.tws.api.config.Config) Logger(java.util.logging.Logger) KeyedTSet(edu.iu.dsc.tws.tset.sets.batch.KeyedTSet) JobConfig(edu.iu.dsc.tws.api.JobConfig) WorkerEnvironment(edu.iu.dsc.tws.api.resource.WorkerEnvironment) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment) JoinTLink(edu.iu.dsc.tws.tset.links.batch.JoinTLink)
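
The branch-and-union step at the end of the example above can be pictured with plain Java collections. This is only a rough sketch of the data flow, not the Twister2 API: the class BranchUnionSketch, the method branchAndUnion, and the sample input strings are hypothetical names and values chosen for illustration, and a distributed union would not necessarily preserve the ordering shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class BranchUnionSketch {

    // Combine the original records with a second branch that prefixes "###".
    // (Hypothetical helper; mirrors map -> map1 -> union from the example.)
    static List<String> branchAndUnion(List<String> mapped) {
        List<String> union = new ArrayList<>(mapped); // first branch: records unchanged
        for (String s : mapped) {
            union.add("###" + s);                     // second branch: prefixed copies
        }
        return union;
    }

    public static void main(String[] args) {
        // Assumed join output, formatted as "(key leftValue rightValue)".
        List<String> mapped = List.of("(0 0 1)", "(1 1 2)");
        branchAndUnion(mapped).forEach(System.out::println);
    }
}
```

Every input record appears twice in the result, once as-is and once with the prefix, which is what logging the union in the example would be expected to show.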

Example 2 with KeyedTSet

Use of edu.iu.dsc.tws.tset.sets.batch.KeyedTSet in project twister2 by DSC-SPIDAL, in the execute method of the class FileBasedWordCount.

@Override
public void execute(WorkerEnvironment workerEnv) {
    BatchEnvironment env = TSetEnvironment.initBatch(workerEnv);
    int sourcePar = (int) env.getConfig().get("PAR");
    // read the file line by line by using a single worker
    SourceTSet<String> lines = env.createSource(new WordCountFileSource(), 1);
    // distribute the lines among the workers and perform a flatmap operation to extract words
    ComputeTSet<String> words = lines.partition(new HashingPartitioner<>(), sourcePar).flatmap((FlatMapFunc<String, String>) (l, collector) -> {
        StringTokenizer itr = new StringTokenizer(l);
        while (itr.hasMoreTokens()) {
            collector.collect(itr.nextToken());
        }
    });
    // attach count as 1 for each word
    KeyedTSet<String, Integer> groupedWords = words.mapToTuple(w -> new Tuple<>(w, 1));
    // perform a reduce by key at each worker
    KeyedReduceTLink<String, Integer> keyedReduce = groupedWords.keyedReduce(Integer::sum);
    // gather the results to worker0 (there is a dummy map op here to pass the values to edges)
    // and write to a file
    keyedReduce.map(i -> i).gather().forEach(new WordcountFileWriter());
}
Also used : Twister2Job(edu.iu.dsc.tws.api.Twister2Job) URL(java.net.URL) ResourceAllocator(edu.iu.dsc.tws.rsched.core.ResourceAllocator) Options(org.apache.commons.cli.Options) LocalTextInputPartitioner(edu.iu.dsc.tws.data.api.formatters.LocalTextInputPartitioner) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) FlatMapFunc(edu.iu.dsc.tws.api.tset.fn.FlatMapFunc) KeyedTSet(edu.iu.dsc.tws.tset.sets.batch.KeyedTSet) JobConfig(edu.iu.dsc.tws.api.JobConfig) StandardCopyOption(java.nio.file.StandardCopyOption) Level(java.util.logging.Level) DefaultParser(org.apache.commons.cli.DefaultParser) FileInputSplit(edu.iu.dsc.tws.data.api.splits.FileInputSplit) HashingPartitioner(edu.iu.dsc.tws.tset.fn.HashingPartitioner) InputSplit(edu.iu.dsc.tws.data.fs.io.InputSplit) StringTokenizer(java.util.StringTokenizer) Map(java.util.Map) CommandLine(org.apache.commons.cli.CommandLine) DataSource(edu.iu.dsc.tws.dataset.DataSource) TSetContext(edu.iu.dsc.tws.api.tset.TSetContext) BaseApplyFunc(edu.iu.dsc.tws.api.tset.fn.BaseApplyFunc) Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) ComputeTSet(edu.iu.dsc.tws.tset.sets.batch.ComputeTSet) SourceTSet(edu.iu.dsc.tws.tset.sets.batch.SourceTSet) Files(java.nio.file.Files) CommandLineParser(org.apache.commons.cli.CommandLineParser) BufferedWriter(java.io.BufferedWriter) BaseSourceFunc(edu.iu.dsc.tws.api.tset.fn.BaseSourceFunc) FileWriter(java.io.FileWriter) IOException(java.io.IOException) Logger(java.util.logging.Logger) KeyedReduceTLink(edu.iu.dsc.tws.tset.links.batch.KeyedReduceTLink) File(java.io.File) Serializable(java.io.Serializable) Twister2Submitter(edu.iu.dsc.tws.rsched.job.Twister2Submitter) WorkerEnvironment(edu.iu.dsc.tws.api.resource.WorkerEnvironment) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment) TreeMap(java.util.TreeMap) Paths(java.nio.file.Paths) Path(edu.iu.dsc.tws.api.data.Path) BufferedReader(java.io.BufferedReader) FileReader(java.io.FileReader) Twister2Worker(edu.iu.dsc.tws.api.resource.Twister2Worker) InputStream(java.io.InputStream)
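
The pipeline above (tokenize, attach a count of 1 to each word, reduce by key with Integer::sum) can be approximated on a single machine with the standard library alone. The following is a minimal sketch under that assumption; it does not use Twister2, and the class WordCountSketch, the method countWords, and the sample lines are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {

    // Tokenize each line and sum a count of 1 per occurrence of each word.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // Equivalent of mapping to (word, 1) and reducing with Integer::sum.
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample input chosen for illustration only.
        List<String> lines = List.of("to be or not to be", "to see");
        countWords(lines).forEach((word, n) -> System.out.println(word + " " + n));
    }
}
```

Map.merge plays the role of the keyed reduce here: the first occurrence of a word stores 1, and each later occurrence combines the stored value with 1 through Integer::sum.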

Example 3 with KeyedTSet

Use of edu.iu.dsc.tws.tset.sets.batch.KeyedTSet in project twister2 by DSC-SPIDAL, in the execute method of the class JoinExample.

@Override
public void execute(WorkerEnvironment workerEnv) {
    BatchEnvironment env = TSetEnvironment.initBatch(workerEnv);
    int para = 2;
    int workerID = env.getWorkerID();
    // left side: tuples keyed by i % 2, logged with the worker ID
    SourceTSet<Integer> src0 = dummySource(env, COUNT, para).setName("src0");
    KeyedTSet<Integer, Integer> left = src0.mapToTuple(i -> new Tuple<>(i % 2, i)).setName("left");
    left.keyedDirect().forEach(i -> LOG.info(workerID + "L " + i.toString()));
    // right side: built the same way from a second source
    SourceTSet<Integer> src1 = dummySource(env, COUNT, para).setName("src1");
    KeyedTSet<Integer, Integer> right = src1.mapToTuple(i -> new Tuple<>(i % 2, i)).setName("right");
    right.keyedDirect().forEach(i -> LOG.info(workerID + "R " + i.toString()));
    // inner join on the key, then log every joined record
    JoinTLink<Integer, Integer, Integer> join = left.join(right, CommunicationContext.JoinType.INNER, Integer::compareTo).setName("join");
    join.forEach(t -> LOG.info(workerID + "out: " + t.toString()));
}
Also used : CommunicationContext(edu.iu.dsc.tws.api.comms.CommunicationContext) Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) SourceTSet(edu.iu.dsc.tws.tset.sets.batch.SourceTSet) ResourceAllocator(edu.iu.dsc.tws.rsched.core.ResourceAllocator) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) HashMap(java.util.HashMap) Config(edu.iu.dsc.tws.api.config.Config) Logger(java.util.logging.Logger) KeyedTSet(edu.iu.dsc.tws.tset.sets.batch.KeyedTSet) JobConfig(edu.iu.dsc.tws.api.JobConfig) WorkerEnvironment(edu.iu.dsc.tws.api.resource.WorkerEnvironment) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment) JoinTLink(edu.iu.dsc.tws.tset.links.batch.JoinTLink)
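
To make the semantics of an inner join on keys of the form i % 2 concrete, here is a small plain-Java sketch. It is not the Twister2 implementation and makes several assumptions: that a source emits the integers 0 to count - 1 (what dummySource actually emits is not shown above), that a nested-loop comparison is acceptable for illustration, and that the names InnerJoinSketch, keyByParity, and innerJoin are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class InnerJoinSketch {

    // Build (i % 2, i) pairs for i = 0 .. count - 1 (assumed source contents).
    static List<Map.Entry<Integer, Integer>> keyByParity(int count) {
        List<Map.Entry<Integer, Integer>> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            out.add(Map.entry(i % 2, i));
        }
        return out;
    }

    // Inner join: emit one record for every left/right pair with equal keys.
    static List<String> innerJoin(List<Map.Entry<Integer, Integer>> left,
                                  List<Map.Entry<Integer, Integer>> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, Integer> l : left) {
            for (Map.Entry<Integer, Integer> r : right) {
                // Same comparison the example passes as Integer::compareTo.
                if (l.getKey().compareTo(r.getKey()) == 0) {
                    out.add("(" + l.getKey() + " " + l.getValue() + " " + r.getValue() + ")");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        innerJoin(keyByParity(3), keyByParity(3)).forEach(System.out::println);
    }
}
```

Because only two distinct keys exist, each left record pairs with every right record of the same parity, so the number of output records grows with the product of the per-key group sizes.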

Aggregations

JobConfig (edu.iu.dsc.tws.api.JobConfig): 3
Tuple (edu.iu.dsc.tws.api.comms.structs.Tuple): 3
WorkerEnvironment (edu.iu.dsc.tws.api.resource.WorkerEnvironment): 3
ResourceAllocator (edu.iu.dsc.tws.rsched.core.ResourceAllocator): 3
BatchEnvironment (edu.iu.dsc.tws.tset.env.BatchEnvironment): 3
TSetEnvironment (edu.iu.dsc.tws.tset.env.TSetEnvironment): 3
KeyedTSet (edu.iu.dsc.tws.tset.sets.batch.KeyedTSet): 3
SourceTSet (edu.iu.dsc.tws.tset.sets.batch.SourceTSet): 3
Logger (java.util.logging.Logger): 3
CommunicationContext (edu.iu.dsc.tws.api.comms.CommunicationContext): 2
Config (edu.iu.dsc.tws.api.config.Config): 2
JoinTLink (edu.iu.dsc.tws.tset.links.batch.JoinTLink): 2
ComputeTSet (edu.iu.dsc.tws.tset.sets.batch.ComputeTSet): 2
HashMap (java.util.HashMap): 2
Twister2Job (edu.iu.dsc.tws.api.Twister2Job): 1
Path (edu.iu.dsc.tws.api.data.Path): 1
Twister2Worker (edu.iu.dsc.tws.api.resource.Twister2Worker): 1
TSetContext (edu.iu.dsc.tws.api.tset.TSetContext): 1
BaseApplyFunc (edu.iu.dsc.tws.api.tset.fn.BaseApplyFunc): 1
BaseSourceFunc (edu.iu.dsc.tws.api.tset.fn.BaseSourceFunc): 1