
Example 1 with SinkTSet

Use of edu.iu.dsc.tws.tset.sets.batch.SinkTSet in project twister2 by DSC-SPIDAL.

From the class ArrowTSetSourceExample, method execute:

@Override
public void execute(WorkerEnvironment workerEnv) {
    BatchEnvironment env = TSetEnvironment.initBatch(workerEnv);
    Config config = env.getConfig();
    String csvInputDirectory = config.getStringValue(DataObjectConstants.DINPUT_DIRECTORY);
    String arrowInputDirectory = config.getStringValue(DataObjectConstants.ARROW_DIRECTORY);
    String arrowFileName = config.getStringValue(DataObjectConstants.FILE_NAME);
    int workers = config.getIntegerValue(DataObjectConstants.WORKERS);
    int parallel = config.getIntegerValue(DataObjectConstants.PARALLELISM_VALUE);
    int dsize = config.getIntegerValue(DataObjectConstants.DSIZE);
    LOG.info("arrow input file:" + arrowFileName + "\t" + arrowInputDirectory + "\t" + csvInputDirectory + "\t" + workers + "\t" + parallel);
    Schema schema = makeSchema();
    SourceTSet<String[]> csvSource = env.createCSVSource(csvInputDirectory, dsize, parallel, "split");
    // map each CSV row's first column to an integer and write the result through the Arrow sink
    SinkTSet<Iterator<Integer>> sinkTSet = csvSource.direct()
        .map((MapFunc<String[], Integer>) input -> Integer.parseInt(input[0]))
        .direct()
        .sink(new ArrowBasedSinkFunction<>(arrowInputDirectory, arrowFileName, schema.toJson()));
    env.run(sinkTSet);
    // read the Arrow file back through the Arrow source and collect the integers
    env.createArrowSource(arrowInputDirectory, arrowFileName, parallel, schema.toJson())
        .direct()
        .compute((ComputeFunc<Iterator<Object>, List<Integer>>) input -> {
            List<Integer> integers = new ArrayList<>();
            input.forEachRemaining(i -> integers.add((Integer) i));
            return integers;
        })
        .direct()
        .forEach(s -> LOG.info("Integer Array Size:" + s.size() + "\tvalues:" + s));
}
Also used : Twister2Job(edu.iu.dsc.tws.api.Twister2Job) ArrowBasedSinkFunction(edu.iu.dsc.tws.tset.fn.impl.ArrowBasedSinkFunction) Schema(org.apache.arrow.vector.types.pojo.Schema) ArrowType(org.apache.arrow.vector.types.pojo.ArrowType) ResourceAllocator(edu.iu.dsc.tws.rsched.core.ResourceAllocator) Options(org.apache.commons.cli.Options) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) HashMap(java.util.HashMap) Config(edu.iu.dsc.tws.api.config.Config) MapFunc(edu.iu.dsc.tws.api.tset.fn.MapFunc) JobConfig(edu.iu.dsc.tws.api.JobConfig) ArrayList(java.util.ArrayList) Level(java.util.logging.Level) DefaultParser(org.apache.commons.cli.DefaultParser) ImmutableList(com.google.common.collect.ImmutableList) CommandLine(org.apache.commons.cli.CommandLine) Iterator(java.util.Iterator) SourceTSet(edu.iu.dsc.tws.tset.sets.batch.SourceTSet) CommandLineParser(org.apache.commons.cli.CommandLineParser) FieldType(org.apache.arrow.vector.types.pojo.FieldType) SinkTSet(edu.iu.dsc.tws.tset.sets.batch.SinkTSet) Field(org.apache.arrow.vector.types.pojo.Field) Logger(java.util.logging.Logger) Utils(edu.iu.dsc.tws.examples.Utils) DataObjectConstants(edu.iu.dsc.tws.data.utils.DataObjectConstants) Serializable(java.io.Serializable) Twister2Submitter(edu.iu.dsc.tws.rsched.job.Twister2Submitter) List(java.util.List) WorkerEnvironment(edu.iu.dsc.tws.api.resource.WorkerEnvironment) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment) ComputeFunc(edu.iu.dsc.tws.api.tset.fn.ComputeFunc) Twister2Worker(edu.iu.dsc.tws.api.resource.Twister2Worker)
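
The example above calls a makeSchema() helper that is not shown. A minimal sketch of such a helper, assuming a single nullable 32-bit signed integer column (the actual field layout used by ArrowTSetSourceExample is not given here) and using the Arrow and Guava types listed above:

private Schema makeSchema() {
    // assumption: one nullable 32-bit signed int column; the real example may define different fields
    ImmutableList.Builder<Field> fields = ImmutableList.builder();
    fields.add(new Field("int-field", FieldType.nullable(new ArrowType.Int(32, true)), null));
    return new Schema(fields.build(), null);
}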

Example 2 with SinkTSet

Use of edu.iu.dsc.tws.tset.sets.batch.SinkTSet in project twister2 by DSC-SPIDAL.

From the class ReduceExample, method execute:

@Override
public void execute(WorkerEnvironment workerEnv) {
    BatchEnvironment env = TSetEnvironment.initBatch(workerEnv);
    int start = env.getWorkerID() * 100;
    SourceTSet<Integer> src = dummySource(env, start, COUNT, PARALLELISM);
    ReduceTLink<Integer> reduce = src.reduce(Integer::sum);
    LOG.info("test foreach");
    reduce.forEach(i -> LOG.info("foreach: " + i));
    LOG.info("test map");
    reduce.map(i -> i.toString() + "$$")
        .withSchema(PrimitiveSchemas.STRING)
        .direct()
        .forEach(s -> LOG.info("map: " + s));
    LOG.info("test flat map");
    reduce.flatmap((i, c) -> c.collect(i.toString() + "##"))
        .withSchema(PrimitiveSchemas.STRING)
        .direct()
        .forEach(s -> LOG.info("flat:" + s));
    LOG.info("test compute");
    reduce.compute((ComputeFunc<Integer, String>) input -> "sum=" + input)
        .withSchema(PrimitiveSchemas.STRING)
        .direct()
        .forEach(s -> LOG.info("compute: " + s));
    LOG.info("test computec");
    reduce.compute((ComputeCollectorFunc<Integer, String>) (input, output) -> output.collect("sum=" + input))
        .withSchema(PrimitiveSchemas.STRING)
        .direct()
        .forEach(s -> LOG.info("computec: " + s));
    LOG.info("test map2tup");
    reduce.mapToTuple(i -> new Tuple<>(i, i.toString()))
        .keyedDirect()
        .forEach(i -> LOG.info("mapToTuple: " + i.toString()));
    LOG.info("test sink");
    SinkTSet<Integer> sink = reduce.sink((SinkFunc<Integer>) value -> {
        LOG.info("val =" + value);
        return true;
    });
    env.run(sink);
}
Also used : Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) ComputeCollectorFunc(edu.iu.dsc.tws.api.tset.fn.ComputeCollectorFunc) SourceTSet(edu.iu.dsc.tws.tset.sets.batch.SourceTSet) ResourceAllocator(edu.iu.dsc.tws.rsched.core.ResourceAllocator) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) SinkTSet(edu.iu.dsc.tws.tset.sets.batch.SinkTSet) HashMap(java.util.HashMap) Config(edu.iu.dsc.tws.api.config.Config) Logger(java.util.logging.Logger) JobConfig(edu.iu.dsc.tws.api.JobConfig) SinkFunc(edu.iu.dsc.tws.api.tset.fn.SinkFunc) ReduceTLink(edu.iu.dsc.tws.tset.links.batch.ReduceTLink) WorkerEnvironment(edu.iu.dsc.tws.api.resource.WorkerEnvironment) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment) ComputeFunc(edu.iu.dsc.tws.api.tset.fn.ComputeFunc) PrimitiveSchemas(edu.iu.dsc.tws.api.tset.schema.PrimitiveSchemas)
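
The dummySource helper used above comes from the example's shared base class and is not shown. A minimal sketch, assuming Twister2's SourceFunc interface (edu.iu.dsc.tws.api.tset.fn.SourceFunc with hasNext/next) and that COUNT and PARALLELISM are integer constants of the class:

private SourceTSet<Integer> dummySource(BatchEnvironment env, int start, int count, int parallelism) {
    // each parallel instance emits the integers [start, start + count)
    return env.createSource(new SourceFunc<Integer>() {

        private int current = start;

        @Override
        public boolean hasNext() {
            return current < start + count;
        }

        @Override
        public Integer next() {
            return current++;
        }
    }, parallelism);
}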

Example 3 with SinkTSet

Use of edu.iu.dsc.tws.tset.sets.batch.SinkTSet in project twister2 by DSC-SPIDAL.

From the class TSetGatherExample, method execute:

@Override
public void execute(WorkerEnvironment workerEnv) {
    super.execute(workerEnv);
    BatchEnvironment env = TSetEnvironment.initBatch(workerEnv);
    // set the parallelism of source to task stage 0
    int srcPara = jobParameters.getTaskStages().get(0);
    int sinkPara = jobParameters.getTaskStages().get(1);
    SourceTSet<int[]> source = env.createSource(new TestBaseSource(), srcPara).setName("Source");
    GatherTLink<int[]> gather = source.gather();
    SinkTSet<Iterator<Tuple<Integer, int[]>>> sink = gather.sink((SinkFunc<Iterator<Tuple<Integer, int[]>>>) val -> {
        int[] value = null;
        while (val.hasNext()) {
            value = val.next().getValue();
        }
        experimentData.setOutput(value);
        LOG.info("Results " + Arrays.toString(value));
        try {
            verify(OperationNames.GATHER);
        } catch (VerificationException e) {
            LOG.info("Exception Message : " + e.getMessage());
        }
        return true;
    });
    env.run(sink);
}
Also used : Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) Arrays(java.util.Arrays) Iterator(java.util.Iterator) SourceTSet(edu.iu.dsc.tws.tset.sets.batch.SourceTSet) VerificationException(edu.iu.dsc.tws.examples.verification.VerificationException) GatherTLink(edu.iu.dsc.tws.tset.links.batch.GatherTLink) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) SinkTSet(edu.iu.dsc.tws.tset.sets.batch.SinkTSet) OperationNames(edu.iu.dsc.tws.api.compute.OperationNames) BaseTSetBatchWorker(edu.iu.dsc.tws.examples.tset.BaseTSetBatchWorker) Logger(java.util.logging.Logger) SinkFunc(edu.iu.dsc.tws.api.tset.fn.SinkFunc) WorkerEnvironment(edu.iu.dsc.tws.api.resource.WorkerEnvironment) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment)
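
The sink above keeps only the last gathered partition before verification. A hedged variant (not part of the original example; it assumes the gather link and LOG from TSetGatherExample plus java.util.List/ArrayList imports) that collects every partition instead:

private SinkTSet<Iterator<Tuple<Integer, int[]>>> collectAllPartitions(GatherTLink<int[]> gather) {
    return gather.sink((SinkFunc<Iterator<Tuple<Integer, int[]>>>) val -> {
        List<int[]> collected = new ArrayList<>();
        while (val.hasNext()) {
            // each tuple pairs a task index (key) with that task's data (value)
            collected.add(val.next().getValue());
        }
        LOG.info("gathered partitions: " + collected.size());
        return true;
    });
}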

Example 4 with SinkTSet

Use of edu.iu.dsc.tws.tset.sets.batch.SinkTSet in project twister2 by DSC-SPIDAL.

From the class Twister2TranslationContext, method execute:

public void execute() {
    Map<String, CachedTSet> sideInputTSets = new HashMap<>();
    // materialize each side input once so it can be attached to every sink below
    for (Map.Entry<String, BatchTSet<?>> sides : sideInputDataSets.entrySet()) {
        CachedTSet tempCache = (CachedTSet) sides.getValue().cache();
        sideInputTSets.put(sides.getKey(), tempCache);
    }
    // terminate every leaf TSet with a sink, wire in the side inputs, and evaluate it
    for (TSet leaf : leaves) {
        SinkTSet sinkTSet = (SinkTSet) leaf.direct().sink(new Twister2SinkFunction());
        addInputs(sinkTSet, sideInputTSets);
        eval(sinkTSet);
    }
}
Also used : SinkTSet(edu.iu.dsc.tws.tset.sets.batch.SinkTSet) BatchTSet(edu.iu.dsc.tws.api.tset.sets.batch.BatchTSet) CachedTSet(edu.iu.dsc.tws.tset.sets.batch.CachedTSet) ComputeTSet(edu.iu.dsc.tws.tset.sets.batch.ComputeTSet) TSet(edu.iu.dsc.tws.api.tset.sets.TSet) Twister2SinkFunction(org.apache.beam.runners.twister2.translators.functions.Twister2SinkFunction) HashMap(java.util.HashMap) LinkedHashMap(java.util.LinkedHashMap) Map(java.util.Map)
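
The same cache-then-sink pattern can be written directly against the TSet API outside the Beam runner. A standalone sketch, assuming an env (BatchEnvironment), a src of type SourceTSet<Integer>, and a LOG as in the earlier examples; the cast mirrors the one in the runner code above:

// materialize the data once so it can be reused as an input
CachedTSet<Integer> cached = (CachedTSet<Integer>) src.cache();
SinkTSet<Iterator<Integer>> sink = cached.direct().sink(
    (SinkFunc<Iterator<Integer>>) it -> {
        it.forEachRemaining(v -> LOG.info("value: " + v));
        return true;
    });
env.run(sink);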

Example 5 with SinkTSet

Use of edu.iu.dsc.tws.tset.sets.batch.SinkTSet in project twister2 by DSC-SPIDAL.

From the class HadoopTSet, method execute:

@Override
public void execute(Config config, JobAPI.Job job, IWorkerController workerController,
                    IPersistentVolume persistentVolume, IVolatileVolume volatileVolume) {
    int workerId = workerController.getWorkerInfo().getWorkerID();
    WorkerEnvironment workerEnv = WorkerEnvironment.init(config, job, workerController, persistentVolume, volatileVolume);
    BatchEnvironment tSetEnv = TSetEnvironment.initBatch(workerEnv);
    Configuration configuration = new Configuration();
    configuration.addResource(new Path(HdfsDataContext.getHdfsConfigDirectory(config)));
    configuration.set(TextInputFormat.INPUT_DIR, "/input4");
    SourceTSet<String> source = tSetEnv.createHadoopSource(configuration, TextInputFormat.class, 4,
        new MapFunc<Tuple<LongWritable, Text>, String>() {

        @Override
        public String map(Tuple<LongWritable, Text> input) {
            return input.getKey().toString() + " : " + input.getValue().toString();
        }
    });
    SinkTSet<Iterator<String>> sink = source.direct().sink((SinkFunc<Iterator<String>>) value -> {
        while (value.hasNext()) {
            String next = value.next();
            LOG.info("Received value: " + next);
        }
        return true;
    });
    tSetEnv.run(sink);
}
Also used : Path(org.apache.hadoop.fs.Path) Twister2Job(edu.iu.dsc.tws.api.Twister2Job) HdfsDataContext(edu.iu.dsc.tws.data.utils.HdfsDataContext) ResourceAllocator(edu.iu.dsc.tws.rsched.core.ResourceAllocator) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) Text(org.apache.hadoop.io.Text) IPersistentVolume(edu.iu.dsc.tws.api.resource.IPersistentVolume) HashMap(java.util.HashMap) Config(edu.iu.dsc.tws.api.config.Config) MapFunc(edu.iu.dsc.tws.api.tset.fn.MapFunc) LongWritable(org.apache.hadoop.io.LongWritable) JobConfig(edu.iu.dsc.tws.api.JobConfig) TextInputFormat(org.apache.hadoop.mapreduce.lib.input.TextInputFormat) Configuration(org.apache.hadoop.conf.Configuration) Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) Iterator(java.util.Iterator) IVolatileVolume(edu.iu.dsc.tws.api.resource.IVolatileVolume) SourceTSet(edu.iu.dsc.tws.tset.sets.batch.SourceTSet) SinkTSet(edu.iu.dsc.tws.tset.sets.batch.SinkTSet) JobAPI(edu.iu.dsc.tws.proto.system.job.JobAPI) Logger(java.util.logging.Logger) SinkFunc(edu.iu.dsc.tws.api.tset.fn.SinkFunc) Serializable(java.io.Serializable) Twister2Submitter(edu.iu.dsc.tws.rsched.job.Twister2Submitter) IWorker(edu.iu.dsc.tws.api.resource.IWorker) WorkerEnvironment(edu.iu.dsc.tws.api.resource.WorkerEnvironment) IWorkerController(edu.iu.dsc.tws.api.resource.IWorkerController) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment)
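
Workers such as HadoopTSet are normally launched through the Twister2 job API (the Twister2Job, Twister2Submitter, and ResourceAllocator types listed above). A minimal submission sketch; the job name and resource sizes are illustrative assumptions, not values from the example:

public static void main(String[] args) {
    // load the default cluster configuration
    Config config = ResourceAllocator.loadConfig(new HashMap<>());
    JobConfig jobConfig = new JobConfig();
    Twister2Job job = Twister2Job.newBuilder()
        .setJobName("hadoop-tset-job")
        .setWorkerClass(HadoopTSet.class.getName())
        // cpus per worker, memory (MB), number of workers: illustrative values
        .addComputeResource(1, 512, 4)
        .setConfig(jobConfig)
        .build();
    Twister2Submitter.submitJob(job, config);
}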

Aggregations

SinkTSet (edu.iu.dsc.tws.tset.sets.batch.SinkTSet) 9
HashMap (java.util.HashMap) 8
WorkerEnvironment (edu.iu.dsc.tws.api.resource.WorkerEnvironment) 7
BatchEnvironment (edu.iu.dsc.tws.tset.env.BatchEnvironment) 7
TSetEnvironment (edu.iu.dsc.tws.tset.env.TSetEnvironment) 7
SourceTSet (edu.iu.dsc.tws.tset.sets.batch.SourceTSet) 7
Logger (java.util.logging.Logger) 7
JobConfig (edu.iu.dsc.tws.api.JobConfig) 6
Config (edu.iu.dsc.tws.api.config.Config) 6
ResourceAllocator (edu.iu.dsc.tws.rsched.core.ResourceAllocator) 6
SinkFunc (edu.iu.dsc.tws.api.tset.fn.SinkFunc) 5
Iterator (java.util.Iterator) 5
Tuple (edu.iu.dsc.tws.api.comms.structs.Tuple) 4
ComputeFunc (edu.iu.dsc.tws.api.tset.fn.ComputeFunc) 4
Twister2Job (edu.iu.dsc.tws.api.Twister2Job) 3
ComputeCollectorFunc (edu.iu.dsc.tws.api.tset.fn.ComputeCollectorFunc) 3
MapFunc (edu.iu.dsc.tws.api.tset.fn.MapFunc) 3
Twister2Submitter (edu.iu.dsc.tws.rsched.job.Twister2Submitter) 3
ComputeTSet (edu.iu.dsc.tws.tset.sets.batch.ComputeTSet) 3
Serializable (java.io.Serializable) 3