Example 1 with BatchEnvironment

use of edu.iu.dsc.tws.tset.env.BatchEnvironment in project twister2 by DSC-SPIDAL.

the class ReadSourceTranslatorBatch method translateNode.

@Override
public void translateNode(Read.Bounded<T> transform, Twister2BatchTranslationContext context) {
    BoundedSource<T> boundedSource = transform.getSource();
    Twister2BoundedSource<T> twister2BoundedSource = new Twister2BoundedSource<T>(boundedSource, context, context.getOptions());
    final TSetEnvironment tsetEnv = context.getEnvironment();
    // TODO: need to set parallelism value
    SourceTSet<WindowedValue<T>> sourceTSet = ((BatchEnvironment) tsetEnv).createSource(twister2BoundedSource, 1);
    PCollection<T> output = context.getOutput(transform);
    context.setOutputDataSet(output, sourceTSet);
}
Also used : Twister2BoundedSource(org.apache.beam.runners.twister2.translation.wrappers.Twister2BoundedSource) WindowedValue(org.apache.beam.sdk.util.WindowedValue) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment)

Example 2 with BatchEnvironment

use of edu.iu.dsc.tws.tset.env.BatchEnvironment in project twister2 by DSC-SPIDAL.

the class PythonWorker method execute.

public void execute(WorkerEnvironment workerEnvironment) {
    BatchEnvironment env = TSetEnvironment.initBatch(workerEnvironment);
    execute(env);
}
Also used : BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment)

Example 3 with BatchEnvironment

use of edu.iu.dsc.tws.tset.env.BatchEnvironment in project twister2 by DSC-SPIDAL.

the class ArrowTSetSourceExample method execute.

@Override
public void execute(WorkerEnvironment workerEnv) {
    BatchEnvironment env = TSetEnvironment.initBatch(workerEnv);
    Config config = env.getConfig();
    String csvInputDirectory = config.getStringValue(DataObjectConstants.DINPUT_DIRECTORY);
    String arrowInputDirectory = config.getStringValue(DataObjectConstants.ARROW_DIRECTORY);
    String arrowFileName = config.getStringValue(DataObjectConstants.FILE_NAME);
    int workers = config.getIntegerValue(DataObjectConstants.WORKERS);
    int parallel = config.getIntegerValue(DataObjectConstants.PARALLELISM_VALUE);
    int dsize = config.getIntegerValue(DataObjectConstants.DSIZE);
    LOG.info("arrow input file:" + arrowFileName + "\t" + arrowInputDirectory + "\t" + csvInputDirectory + "\t" + workers + "\t" + parallel);
    Schema schema = makeSchema();
    SourceTSet<String[]> csvSource = env.createCSVSource(csvInputDirectory, dsize, parallel, "split");
    SinkTSet<Iterator<Integer>> sinkTSet = csvSource.direct()
        .map((MapFunc<String[], Integer>) input -> Integer.parseInt(input[0]))
        .direct()
        .sink(new ArrowBasedSinkFunction<>(arrowInputDirectory, arrowFileName, schema.toJson()));
    env.run(sinkTSet);
    // Source Function Call
    env.createArrowSource(arrowInputDirectory, arrowFileName, parallel, schema.toJson())
        .direct()
        .compute((ComputeFunc<Iterator<Object>, List<Integer>>) input -> {
            List<Integer> integers = new ArrayList<>();
            input.forEachRemaining(i -> integers.add((Integer) i));
            return integers;
        })
        .direct()
        .forEach(s -> LOG.info("Integer Array Size:" + s.size() + "\tvalues:" + s));
}
Also used : Twister2Job(edu.iu.dsc.tws.api.Twister2Job) ArrowBasedSinkFunction(edu.iu.dsc.tws.tset.fn.impl.ArrowBasedSinkFunction) Schema(org.apache.arrow.vector.types.pojo.Schema) ArrowType(org.apache.arrow.vector.types.pojo.ArrowType) ResourceAllocator(edu.iu.dsc.tws.rsched.core.ResourceAllocator) Options(org.apache.commons.cli.Options) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) HashMap(java.util.HashMap) Config(edu.iu.dsc.tws.api.config.Config) MapFunc(edu.iu.dsc.tws.api.tset.fn.MapFunc) JobConfig(edu.iu.dsc.tws.api.JobConfig) ArrayList(java.util.ArrayList) Level(java.util.logging.Level) DefaultParser(org.apache.commons.cli.DefaultParser) ImmutableList(com.google.common.collect.ImmutableList) CommandLine(org.apache.commons.cli.CommandLine) Iterator(java.util.Iterator) SourceTSet(edu.iu.dsc.tws.tset.sets.batch.SourceTSet) CommandLineParser(org.apache.commons.cli.CommandLineParser) FieldType(org.apache.arrow.vector.types.pojo.FieldType) SinkTSet(edu.iu.dsc.tws.tset.sets.batch.SinkTSet) Field(org.apache.arrow.vector.types.pojo.Field) Logger(java.util.logging.Logger) Utils(edu.iu.dsc.tws.examples.Utils) DataObjectConstants(edu.iu.dsc.tws.data.utils.DataObjectConstants) Serializable(java.io.Serializable) Twister2Submitter(edu.iu.dsc.tws.rsched.job.Twister2Submitter) List(java.util.List) WorkerEnvironment(edu.iu.dsc.tws.api.resource.WorkerEnvironment) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment) ComputeFunc(edu.iu.dsc.tws.api.tset.fn.ComputeFunc) Twister2Worker(edu.iu.dsc.tws.api.resource.Twister2Worker)
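
The map step in Example 3 turns each CSV row (a String[]) into an Integer by parsing its first field. A minimal plain-Java sketch of that one step is given here, without any Twister2 dependency. The class name FirstColumnSketch, the method firstColumn, and the comma delimiter are assumptions made for illustration; they are not part of the twister2 project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FirstColumnSketch {

    // Hypothetical helper: splits each line on commas (assumed delimiter)
    // and parses the first field, mirroring Integer.parseInt(input[0]).
    static List<Integer> firstColumn(List<String> csvLines) {
        List<Integer> out = new ArrayList<>();
        for (String line : csvLines) {
            String[] fields = line.split(",");
            out.add(Integer.parseInt(fields[0].trim()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("1,a", "22,b", "333,c");
        System.out.println(firstColumn(rows));
    }
}
```

As in the original lambda, a row whose first field is not a valid integer raises NumberFormatException; the sketch does not add any handling for that case.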

Example 4 with BatchEnvironment

use of edu.iu.dsc.tws.tset.env.BatchEnvironment in project twister2 by DSC-SPIDAL.

the class ReduceExample method execute.

@Override
public void execute(WorkerEnvironment workerEnv) {
    BatchEnvironment env = TSetEnvironment.initBatch(workerEnv);
    int start = env.getWorkerID() * 100;
    SourceTSet<Integer> src = dummySource(env, start, COUNT, PARALLELISM);
    ReduceTLink<Integer> reduce = src.reduce(Integer::sum);
    LOG.info("test foreach");
    reduce.forEach(i -> LOG.info("foreach: " + i));
    LOG.info("test map");
    reduce.map(i -> i.toString() + "$$").withSchema(PrimitiveSchemas.STRING).direct().forEach(s -> LOG.info("map: " + s));
    LOG.info("test flat map");
    reduce.flatmap((i, c) -> c.collect(i.toString() + "##")).withSchema(PrimitiveSchemas.STRING).direct().forEach(s -> LOG.info("flat:" + s));
    LOG.info("test compute");
    reduce.compute((ComputeFunc<Integer, String>) input -> "sum=" + input).withSchema(PrimitiveSchemas.STRING).direct().forEach(s -> LOG.info("compute: " + s));
    LOG.info("test computec");
    reduce.compute((ComputeCollectorFunc<Integer, String>) (input, output) -> output.collect("sum=" + input)).withSchema(PrimitiveSchemas.STRING).direct().forEach(s -> LOG.info("computec: " + s));
    LOG.info("test map2tup");
    reduce.mapToTuple(i -> new Tuple<>(i, i.toString())).keyedDirect().forEach(i -> LOG.info("mapToTuple: " + i.toString()));
    LOG.info("test sink");
    SinkTSet<Integer> sink = reduce.sink((SinkFunc<Integer>) value -> {
        LOG.info("val =" + value);
        return true;
    });
    env.run(sink);
}
Also used : Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) ComputeCollectorFunc(edu.iu.dsc.tws.api.tset.fn.ComputeCollectorFunc) SourceTSet(edu.iu.dsc.tws.tset.sets.batch.SourceTSet) ResourceAllocator(edu.iu.dsc.tws.rsched.core.ResourceAllocator) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) SinkTSet(edu.iu.dsc.tws.tset.sets.batch.SinkTSet) HashMap(java.util.HashMap) Config(edu.iu.dsc.tws.api.config.Config) Logger(java.util.logging.Logger) JobConfig(edu.iu.dsc.tws.api.JobConfig) SinkFunc(edu.iu.dsc.tws.api.tset.fn.SinkFunc) ReduceTLink(edu.iu.dsc.tws.tset.links.batch.ReduceTLink) WorkerEnvironment(edu.iu.dsc.tws.api.resource.WorkerEnvironment) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment) ComputeFunc(edu.iu.dsc.tws.api.tset.fn.ComputeFunc) PrimitiveSchemas(edu.iu.dsc.tws.api.tset.schema.PrimitiveSchemas)
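
Example 4 folds the source values with Integer::sum and then formats the single result. The following plain-Java sketch shows the same fold on one machine using java.util.stream, as a rough illustration only. The class ReduceSketch and the helper dummyValues are hypothetical, and the assumption that the source emits consecutive integers from a start offset is not confirmed by the snippet, since dummySource is not shown.

```java
import java.util.stream.IntStream;

public class ReduceSketch {

    // Hypothetical stand-in for the source: `count` consecutive integers
    // beginning at `start` (an assumption, dummySource is not shown above).
    static int[] dummyValues(int start, int count) {
        return IntStream.range(start, start + count).toArray();
    }

    // Folds all values with Integer::sum, the same binary operator
    // that Example 4 passes to reduce. Returns 0 for an empty input.
    static int reduceSum(int[] values) {
        return IntStream.of(values).boxed().reduce(Integer::sum).orElse(0);
    }

    public static void main(String[] args) {
        int[] values = dummyValues(100, 5); // 100, 101, 102, 103, 104
        int sum = reduceSum(values);
        System.out.println("sum=" + sum);   // same format as the compute step
        System.out.println(sum + "$$");     // same format as the map step
    }
}
```

The sketch leaves out the distributed part entirely: in the original, the reduction combines values across parallel source instances, whereas here it runs over a single array.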

Example 5 with BatchEnvironment

use of edu.iu.dsc.tws.tset.env.BatchEnvironment in project twister2 by DSC-SPIDAL.

the class KGatherUngroupedExample method execute.

@Override
public void execute(WorkerEnvironment workerEnv) {
    BatchEnvironment env = TSetEnvironment.initBatch(workerEnv);
    SourceTSet<Integer> src = dummySource(env, COUNT, PARALLELISM);
    KeyedGatherUngroupedTLink<Integer, Integer> klink = src.mapToTuple(i -> new Tuple<>(i % 4, i)).keyedGatherUngrouped();
    LOG.info("test foreach");
    klink.forEach((ApplyFunc<Tuple<Integer, Integer>>) data -> LOG.info(data.getKey() + " -> " + data.getValue()));
    LOG.info("test map");
    klink.map((MapFunc<Tuple<Integer, Integer>, String>) input -> input.getKey() + " -> " + input.getValue()).direct().forEach(s -> LOG.info("map: " + s));
    LOG.info("test compute");
    klink.compute((ComputeFunc<Iterator<Tuple<Integer, Integer>>, String>) input -> {
        StringBuilder sb = new StringBuilder();
        while (input.hasNext()) {
            Tuple<Integer, Integer> next = input.next();
            sb.append("[").append(next.getKey()).append("->").append(next.getValue()).append("]");
        }
        return sb.toString();
    }).direct().forEach(s -> LOG.info("compute: " + s));
    LOG.info("test computec");
    klink.compute((ComputeCollectorFunc<Iterator<Tuple<Integer, Integer>>, String>) (input, output) -> {
        while (input.hasNext()) {
            Tuple<Integer, Integer> next = input.next();
            output.collect(next.getKey() + " -> " + next.getValue() * 2);
        }
    }).direct().forEach(s -> LOG.info("computec: " + s));
}
Also used : Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) Iterator(java.util.Iterator) ComputeCollectorFunc(edu.iu.dsc.tws.api.tset.fn.ComputeCollectorFunc) SourceTSet(edu.iu.dsc.tws.tset.sets.batch.SourceTSet) ResourceAllocator(edu.iu.dsc.tws.rsched.core.ResourceAllocator) BatchEnvironment(edu.iu.dsc.tws.tset.env.BatchEnvironment) HashMap(java.util.HashMap) Config(edu.iu.dsc.tws.api.config.Config) MapFunc(edu.iu.dsc.tws.api.tset.fn.MapFunc) Logger(java.util.logging.Logger) JobConfig(edu.iu.dsc.tws.api.JobConfig) WorkerEnvironment(edu.iu.dsc.tws.api.resource.WorkerEnvironment) TSetEnvironment(edu.iu.dsc.tws.tset.env.TSetEnvironment) ComputeFunc(edu.iu.dsc.tws.api.tset.fn.ComputeFunc) ApplyFunc(edu.iu.dsc.tws.api.tset.fn.ApplyFunc) KeyedGatherUngroupedTLink(edu.iu.dsc.tws.tset.links.batch.KeyedGatherUngroupedTLink)
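
In Example 5, each integer is keyed by i % 4 and the downstream functions receive individual key-value tuples rather than one group per key. The plain-Java sketch below reproduces only the keying and the string rendering from the compute step, using Map.Entry in place of Twister2's Tuple. The class KeyedPairsSketch and its methods are hypothetical, and the sketch does not model how tuples are routed between workers.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class KeyedPairsSketch {

    // Builds (i % 4, i) pairs for i in [0, count), mirroring the
    // mapToTuple lambda in Example 5.
    static List<Map.Entry<Integer, Integer>> toPairs(int count) {
        List<Map.Entry<Integer, Integer>> pairs = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            pairs.add(new SimpleEntry<>(i % 4, i));
        }
        return pairs;
    }

    // Renders the pairs as "[key->value]" items, the same format the
    // compute step builds with its StringBuilder.
    static String render(List<Map.Entry<Integer, Integer>> pairs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Integer> p : pairs) {
            sb.append("[").append(p.getKey()).append("->").append(p.getValue()).append("]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(toPairs(6)));
    }
}
```

The pairs stay flat, one entry per input value, which matches the Iterator<Tuple<Integer, Integer>> input type used by the compute calls above.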

Aggregations

BatchEnvironment (edu.iu.dsc.tws.tset.env.BatchEnvironment): 59
Config (edu.iu.dsc.tws.api.config.Config): 24
TSetEnvironment (edu.iu.dsc.tws.tset.env.TSetEnvironment): 24
JobConfig (edu.iu.dsc.tws.api.JobConfig): 23
WorkerEnvironment (edu.iu.dsc.tws.api.resource.WorkerEnvironment): 23
Logger (java.util.logging.Logger): 23
SourceTSet (edu.iu.dsc.tws.tset.sets.batch.SourceTSet): 22
HashMap (java.util.HashMap): 22
ResourceAllocator (edu.iu.dsc.tws.rsched.core.ResourceAllocator): 21
Iterator (java.util.Iterator): 21
Tuple (edu.iu.dsc.tws.api.comms.structs.Tuple): 18
ComputeCollectorFunc (edu.iu.dsc.tws.api.tset.fn.ComputeCollectorFunc): 12
ComputeFunc (edu.iu.dsc.tws.api.tset.fn.ComputeFunc): 12
TSetContext (edu.iu.dsc.tws.api.tset.TSetContext): 7
SinkTSet (edu.iu.dsc.tws.tset.sets.batch.SinkTSet): 6
Twister2Job (edu.iu.dsc.tws.api.Twister2Job): 5
MapFunc (edu.iu.dsc.tws.api.tset.fn.MapFunc): 5
SinkFunc (edu.iu.dsc.tws.api.tset.fn.SinkFunc): 5
Twister2Submitter (edu.iu.dsc.tws.rsched.job.Twister2Submitter): 5
ComputeTSet (edu.iu.dsc.tws.tset.sets.batch.ComputeTSet): 5