
Example 1 with DataSource

Use of edu.iu.dsc.tws.dataset.DataSource in project twister2 by DSC-SPIDAL.

From the class DataLoadingTask, method prepare.

@Override
public void prepare(TSetContext context) {
    super.prepare(context);
    this.config = context.getConfig();
    this.parallelism = context.getParallelism();
    // Log the task index, the parallelism from the SVM job parameters and the runtime parallelism
    LOG.info(String.format("%d, %d, %d", context.getIndex(), this.svmJobParameters.getParallelism(), context.getParallelism()));
    // The dimension is the number of features + 1, because each input row also carries the label
    this.dimension = this.binaryBatchModel.getFeatures() + 1;
    if ("train".equalsIgnoreCase(this.dataType)) {
        this.dataSize = this.binaryBatchModel.getSamples();
        // Each task allocates dataSize / parallelism rows (integer division, so any remainder gets no row)
        this.localPoints = new double[this.dataSize / parallelism][this.dimension];
        LOG.info(String.format("Data Size : %d, Array Shape [%d,%d]", this.dataSize, this.localPoints.length, this.dimension));
        this.source = new DataSource(config,
            new LocalFixedInputPartitioner(new Path(this.svmJobParameters.getTrainingDataDir()),
                this.parallelism, config, dataSize),
            this.parallelism);
    } else if ("test".equalsIgnoreCase(this.dataType)) {
        this.dataSize = this.svmJobParameters.getTestingSamples();
        this.localPoints = new double[this.dataSize / parallelism][this.dimension];
        this.source = new DataSource(config,
            new LocalFixedInputPartitioner(new Path(this.svmJobParameters.getTestingDataDir()),
                this.parallelism, config, dataSize),
            this.parallelism);
    }
    // For any other dataType, this.source is not assigned here
}
Also used: Path (edu.iu.dsc.tws.api.data.Path), LocalFixedInputPartitioner (edu.iu.dsc.tws.data.api.formatters.LocalFixedInputPartitioner), DataSource (edu.iu.dsc.tws.dataset.DataSource)
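The array allocation in prepare() depends on plain integer arithmetic. The sketch below reproduces only that arithmetic outside twister2. The names LocalShapeSketch, rowsPerWorker, dimension and droppedSamples are hypothetical. Following the comment in the code, the sketch assumes each input row carries the label alongside the features. When dataSize is not a multiple of the parallelism, the remainder rows have no slot in any task's localPoints array.

```java
/**
 * Hypothetical helper (not part of twister2) mirroring the shape arithmetic
 * in DataLoadingTask.prepare(): each task allocates
 * (dataSize / parallelism) rows of (features + 1) columns.
 */
class LocalShapeSketch {

    /** Rows per task, computed with integer division as in prepare(). */
    static int rowsPerWorker(int dataSize, int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        return dataSize / parallelism;
    }

    /** Columns per row: the features plus one slot for the label (assumed layout). */
    static int dimension(int features) {
        return features + 1;
    }

    /** Samples that fit in no task's array when dataSize is not divisible by parallelism. */
    static int droppedSamples(int dataSize, int parallelism) {
        return dataSize - rowsPerWorker(dataSize, parallelism) * parallelism;
    }

    public static void main(String[] args) {
        int dataSize = 10;
        int parallelism = 3;
        int features = 4;
        double[][] localPoints =
            new double[rowsPerWorker(dataSize, parallelism)][dimension(features)];
        System.out.println(localPoints.length + " x " + localPoints[0].length);
        System.out.println("dropped: " + droppedSamples(dataSize, parallelism));
    }
}
```

With dataSize = 10, parallelism = 3 and 4 features, each task allocates a 3 x 5 array and one sample has no slot. Choosing a sample count that divides evenly by the parallelism avoids this.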

Example 2 with DataSource

Use of edu.iu.dsc.tws.dataset.DataSource in project twister2 by DSC-SPIDAL.

From the class KMeansDataGeneratorTest, method testUniqueSchedules1.

@Test
public void testUniqueSchedules1() throws IOException {
    Config config = getConfig();
    String dinputDirectory = "/tmp/testdinput";
    int numFiles = 1;
    int dsize = 20;
    int dimension = 2;
    int parallelismValue = 2;
    // Generate the k-means input data as text file(s) in dinputDirectory
    KMeansDataGenerator.generateData("txt", new Path(dinputDirectory), numFiles, dsize, 100, dimension, config);
    // Build a batch source -> sink task graph joined by a direct edge (the graph is not executed in this test)
    ComputeGraphBuilder computeGraphBuilder = ComputeGraphBuilder.newBuilder(config);
    computeGraphBuilder.setTaskGraphName("kmeans");
    DataObjectSource sourceTask = new DataObjectSource("direct", dinputDirectory);
    DataObjectSink sinkTask = new DataObjectSink();
    computeGraphBuilder.addSource("source", sourceTask, parallelismValue);
    ComputeConnection computeConnection1 = computeGraphBuilder.addCompute("sink", sinkTask, parallelismValue);
    computeConnection1.direct("source").viaEdge("direct").withDataType(MessageTypes.OBJECT);
    computeGraphBuilder.setMode(OperationMode.BATCH);
    // Partition the generated text input and check that every task index receives a split
    LocalTextInputPartitioner localTextInputPartitioner = new LocalTextInputPartitioner(new Path(dinputDirectory), parallelismValue, config);
    DataSource<String, ?> source = new DataSource<>(config, localTextInputPartitioner, parallelismValue);
    for (int i = 0; i < parallelismValue; i++) {
        InputSplit<String> inputSplit = source.getNextSplit(i);
        Assert.assertNotNull(inputSplit);
    }
}
Also used: Path (edu.iu.dsc.tws.api.data.Path), LocalTextInputPartitioner (edu.iu.dsc.tws.data.api.formatters.LocalTextInputPartitioner), DataObjectSink (edu.iu.dsc.tws.task.dataobjects.DataObjectSink), Config (edu.iu.dsc.tws.api.config.Config), ComputeGraphBuilder (edu.iu.dsc.tws.task.impl.ComputeGraphBuilder), DataObjectSource (edu.iu.dsc.tws.task.dataobjects.DataObjectSource), ComputeConnection (edu.iu.dsc.tws.task.impl.ComputeConnection), DataSource (edu.iu.dsc.tws.dataset.DataSource), Test (org.junit.Test)
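The test only checks that every task index receives a non-null split. The in-memory LineSplitter below is a hypothetical illustration of one way such an assignment can work. It is not a twister2 class, and splitFor is an invented name. It cuts a list of lines into contiguous, near-equal chunks, one per task, and runs the same kind of per-index check. The real LocalTextInputPartitioner works on files and may divide the input differently.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory splitter (not the twister2 API): divides lines into
 * `parallelism` contiguous chunks and hands chunk i to task i.
 */
class LineSplitter {
    private final List<List<String>> splits = new ArrayList<>();

    LineSplitter(List<String> lines, int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        int n = lines.size();
        int base = n / parallelism;
        int extra = n % parallelism; // the first `extra` chunks get one more line
        for (int i = 0; i < parallelism; i++) {
            int start = i * base + Math.min(i, extra);
            int end = start + base + (i < extra ? 1 : 0);
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
    }

    /** Returns the chunk for a task index, or null if the index is out of range. */
    List<String> splitFor(int index) {
        return (index >= 0 && index < splits.size()) ? splits.get(index) : null;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            lines.add(i + "," + (i * 2)); // 20 two-dimensional points, like dsize and dimension in the test
        }
        LineSplitter source = new LineSplitter(lines, 2);
        for (int i = 0; i < 2; i++) {
            List<String> split = source.splitFor(i);
            if (split == null) {
                throw new AssertionError("missing split for task " + i);
            }
            System.out.println("task " + i + ": " + split.size() + " lines");
        }
    }
}
```

Spreading the remainder over the first chunks keeps chunk sizes within one line of each other. Every line lands in exactly one chunk.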

Example 3 with DataSource

Use of edu.iu.dsc.tws.dataset.DataSource in project twister2 by DSC-SPIDAL.

From the class TextBasedSourceFunction, method prepare.

@Override
public void prepare(TSetContext context) {
    super.prepare(context);
    this.ctx = context;
    Config cfg = ctx.getConfig();
    if ("complete".equals(partitionerType)) {
        // This branch passes the runtime parallelism to the partitioner but the `parallel`
        // field to DataSource; the two values are presumably equal
        this.dataSource = new DataSource(cfg, new LocalCompleteCSVInputPartitioner(
            new Path(datainputDirectory), context.getParallelism(), dataSize, cfg), parallel);
    } else {
        this.dataSource = new DataSource(cfg, new LocalCSVInputPartitioner(
            new Path(datainputDirectory), parallel, dataSize, cfg), parallel);
    }
    // Each task fetches the split that matches its own index
    this.dataSplit = this.dataSource.getNextSplit(context.getIndex());
}
Also used: Path (edu.iu.dsc.tws.api.data.Path), LocalCompleteCSVInputPartitioner (edu.iu.dsc.tws.data.api.formatters.LocalCompleteCSVInputPartitioner), Config (edu.iu.dsc.tws.api.config.Config), LocalCSVInputPartitioner (edu.iu.dsc.tws.data.api.formatters.LocalCSVInputPartitioner), DataSource (edu.iu.dsc.tws.dataset.DataSource)
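The prepare() method picks a partitioner from the partitionerType string. The class names suggest that LocalCompleteCSVInputPartitioner gives every task the complete input while LocalCSVInputPartitioner divides it among the tasks. Under that assumption, the hypothetical sketch below shows the row range each task would read in the two modes. PartitionRangeSketch and rangeFor are invented names, and the ranges are [start, end) row indices rather than twister2 splits.

```java
/**
 * Hypothetical sketch (not the twister2 implementation) of the two modes chosen
 * in TextBasedSourceFunction.prepare(): "complete" is assumed to give every task
 * all rows, any other type gives each task a contiguous, near-equal share.
 */
class PartitionRangeSketch {

    /** Returns {start, end} (end exclusive) of the rows read by task `index`. */
    static int[] rangeFor(String partitionerType, int index, int parallelism, int dataSize) {
        if ("complete".equals(partitionerType)) {
            return new int[] {0, dataSize}; // every task reads all rows
        }
        int base = dataSize / parallelism;
        int extra = dataSize % parallelism; // the first `extra` tasks get one more row
        int start = index * base + Math.min(index, extra);
        int end = start + base + (index < extra ? 1 : 0);
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        for (String type : new String[] {"complete", "csv"}) {
            for (int i = 0; i < 3; i++) {
                int[] r = rangeFor(type, i, 3, 10);
                System.out.println(type + " task " + i + ": [" + r[0] + ", " + r[1] + ")");
            }
        }
    }
}
```

For 10 rows and 3 tasks, every task gets [0, 10) in complete mode. The partitioned mode yields [0, 4), [4, 7) and [7, 10), so the first task absorbs the remainder row.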

Aggregations

Number of examples above that use each class:

Path (edu.iu.dsc.tws.api.data.Path): 3
DataSource (edu.iu.dsc.tws.dataset.DataSource): 3
Config (edu.iu.dsc.tws.api.config.Config): 2
LocalCSVInputPartitioner (edu.iu.dsc.tws.data.api.formatters.LocalCSVInputPartitioner): 1
LocalCompleteCSVInputPartitioner (edu.iu.dsc.tws.data.api.formatters.LocalCompleteCSVInputPartitioner): 1
LocalFixedInputPartitioner (edu.iu.dsc.tws.data.api.formatters.LocalFixedInputPartitioner): 1
LocalTextInputPartitioner (edu.iu.dsc.tws.data.api.formatters.LocalTextInputPartitioner): 1
DataObjectSink (edu.iu.dsc.tws.task.dataobjects.DataObjectSink): 1
DataObjectSource (edu.iu.dsc.tws.task.dataobjects.DataObjectSource): 1
ComputeConnection (edu.iu.dsc.tws.task.impl.ComputeConnection): 1
ComputeGraphBuilder (edu.iu.dsc.tws.task.impl.ComputeGraphBuilder): 1
Test (org.junit.Test): 1