Example 6 with DataObject

use of edu.iu.dsc.tws.api.dataset.DataObject in project twister2 by DSC-SPIDAL.

the class PredictionSourceTask method getWeightVectorByWeightVectorObject.

public Object getWeightVectorByWeightVectorObject(int taskIndex, DataObject<?> datapointsDataObject) {
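    // The first value from this task's partition is expected to be an iterator over the loaded elements (unchecked cast).
    Iterator<ArrayList> arrayListIterator = (Iterator<ArrayList>) datapointsDataObject.getPartition(taskIndex).getConsumer().next();
    // Copy the elements into a list so the caller gets a materialized collection.
    List<Object> items = new ArrayList<>();
    while (arrayListIterator.hasNext()) {
        items.add(arrayListIterator.next());
    }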
    return items;
}
Also used : ArrayList(java.util.ArrayList) Iterator(java.util.Iterator) DataObject(edu.iu.dsc.tws.api.dataset.DataObject)

Example 7 with DataObject

use of edu.iu.dsc.tws.api.dataset.DataObject in project twister2 by DSC-SPIDAL.

the class PredictionSourceTask method getDataPointsByDataObject.

public Object getDataPointsByDataObject(int taskIndex, DataObject<?> datapointsDataObject) {
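    // The first value from this task's partition is expected to be an iterator over the loaded elements (unchecked cast).
    Iterator<ArrayList> arrayListIterator = (Iterator<ArrayList>) datapointsDataObject.getPartition(taskIndex).getConsumer().next();
    // Copy the elements into a list so the caller gets a materialized collection.
    List<Object> items = new ArrayList<>();
    while (arrayListIterator.hasNext()) {
        items.add(arrayListIterator.next());
    }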
    return items;
}
Also used : ArrayList(java.util.ArrayList) Iterator(java.util.Iterator) DataObject(edu.iu.dsc.tws.api.dataset.DataObject)
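
Examples 6 and 7 share the same body: each drains an iterator into a list. A minimal sketch of that step as a generic helper is shown here. It uses only java.util, and the class name IteratorDrain, the method drain, and the sample values are hypothetical and not part of twister2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorDrain {

    /** Copies every remaining element of the iterator into a new list, preserving order. */
    public static <T> List<T> drain(Iterator<? extends T> source) {
        List<T> items = new ArrayList<>();
        while (source.hasNext()) {
            items.add(source.next());
        }
        return items;
    }

    public static void main(String[] args) {
        // Stand-in for the iterator a partition would hand back; the values are made up.
        Iterator<double[]> rows =
            Arrays.asList(new double[] {1.0, 2.0}, new double[] {3.0, 4.0}).iterator();
        List<double[]> points = drain(rows);
        System.out.println(points.size()); // prints 2
    }
}
```

With a typed helper like this, the two methods could return a List of a concrete element type instead of Object, assuming the element type stored in the partition is known at the call site.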

Example 8 with DataObject

use of edu.iu.dsc.tws.api.dataset.DataObject in project twister2 by DSC-SPIDAL.

the class SvmSgdAdvancedRunner method executeTrainingDataLoadingTaskGraph.

/**
 * Loads the training data in distributed mode.
 * dataStreamerParallelism is the degree of parallelism used
 * when loading the data.
 *
 * @return twister2 DataObject containing the training data
 */
public DataObject<Object> executeTrainingDataLoadingTaskGraph() {
    DataObject<Object> data = null;
    DataObjectSource sourceTask = new DataObjectSource(Context.TWISTER2_DIRECT_EDGE, this.svmJobParameters.getTrainingDataDir());
    DataObjectSink sinkTask = new DataObjectSink();
    trainingBuilder.addSource(Constants.SimpleGraphConfig.DATA_OBJECT_SOURCE, sourceTask, dataStreamerParallelism);
    ComputeConnection firstGraphComputeConnection = trainingBuilder.addCompute(Constants.SimpleGraphConfig.DATA_OBJECT_SINK, sinkTask, dataStreamerParallelism);
    firstGraphComputeConnection.direct(Constants.SimpleGraphConfig.DATA_OBJECT_SOURCE).viaEdge(Context.TWISTER2_DIRECT_EDGE).withDataType(MessageTypes.OBJECT);
    trainingBuilder.setMode(OperationMode.BATCH);
    ComputeGraph datapointsTaskGraph = trainingBuilder.build();
    datapointsTaskGraph.setGraphName("training-data-loading-graph");
    ExecutionPlan firstGraphExecutionPlan = taskExecutor.plan(datapointsTaskGraph);
    taskExecutor.execute(datapointsTaskGraph, firstGraphExecutionPlan);
    data = taskExecutor.getOutput(datapointsTaskGraph, firstGraphExecutionPlan, Constants.SimpleGraphConfig.DATA_OBJECT_SINK);
    if (data == null) {
        throw new NullPointerException("Something Went Wrong in Loading Training Data");
    } else {
        LOG.info("Training Data Total Partitions : " + data.getPartitions().length);
    }
    return data;
}
Also used : DataObjectSink(edu.iu.dsc.tws.task.dataobjects.DataObjectSink) ExecutionPlan(edu.iu.dsc.tws.api.compute.executor.ExecutionPlan) ComputeGraph(edu.iu.dsc.tws.api.compute.graph.ComputeGraph) DataObject(edu.iu.dsc.tws.api.dataset.DataObject) DataObjectSource(edu.iu.dsc.tws.task.dataobjects.DataObjectSource) ComputeConnection(edu.iu.dsc.tws.task.impl.ComputeConnection)

Example 9 with DataObject

use of edu.iu.dsc.tws.api.dataset.DataObject in project twister2 by DSC-SPIDAL.

the class SvmSgdAdvancedRunner method executeWeightVectorLoadingTaskGraph.

/**
 * Loads the weight vector in distributed mode.
 * dataStreamerParallelism is the degree of parallelism used
 * when loading the data.
 *
 * @return twister2 DataObject containing the weight vector
 */
public DataObject<Object> executeWeightVectorLoadingTaskGraph() {
    DataObject<Object> data = null;
    DataObjectSource sourceTask = new DataObjectSource(Context.TWISTER2_DIRECT_EDGE, this.svmJobParameters.getWeightVectorDataDir());
    DataObjectSink sinkTask = new DataObjectSink();
    trainingBuilder.addSource(Constants.SimpleGraphConfig.DATA_OBJECT_SOURCE, sourceTask, dataStreamerParallelism);
    ComputeConnection firstGraphComputeConnection = trainingBuilder.addCompute(Constants.SimpleGraphConfig.DATA_OBJECT_SINK, sinkTask, dataStreamerParallelism);
    firstGraphComputeConnection.direct(Constants.SimpleGraphConfig.DATA_OBJECT_SOURCE).viaEdge(Context.TWISTER2_DIRECT_EDGE).withDataType(MessageTypes.OBJECT);
    trainingBuilder.setMode(OperationMode.BATCH);
    ComputeGraph datapointsTaskGraph = trainingBuilder.build();
    datapointsTaskGraph.setGraphName("weight-vector-loading-graph");
    ExecutionPlan firstGraphExecutionPlan = taskExecutor.plan(datapointsTaskGraph);
    taskExecutor.execute(datapointsTaskGraph, firstGraphExecutionPlan);
    data = taskExecutor.getOutput(datapointsTaskGraph, firstGraphExecutionPlan, Constants.SimpleGraphConfig.DATA_OBJECT_SINK);
    if (data == null) {
        throw new NullPointerException("Something Went Wrong in Loading Weight Vector");
    } else {
        LOG.info("Weight Vector Total Partitions : " + data.getPartitions().length);
    }
    return data;
}
Also used : DataObjectSink(edu.iu.dsc.tws.task.dataobjects.DataObjectSink) ExecutionPlan(edu.iu.dsc.tws.api.compute.executor.ExecutionPlan) ComputeGraph(edu.iu.dsc.tws.api.compute.graph.ComputeGraph) DataObject(edu.iu.dsc.tws.api.dataset.DataObject) DataObjectSource(edu.iu.dsc.tws.task.dataobjects.DataObjectSource) ComputeConnection(edu.iu.dsc.tws.task.impl.ComputeConnection)
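
Examples 8 and 9 both end with the same null check on the loaded data. One way to factor it out is sketched below. The class LoadCheck and the method requireLoaded are hypothetical, and the sketch deliberately swaps the hand-thrown NullPointerException for an IllegalStateException, which is a design choice and not what the twister2 code does.

```java
public class LoadCheck {

    /** Returns the value unchanged, or throws IllegalStateException naming what failed to load. */
    public static <T> T requireLoaded(T data, String what) {
        if (data == null) {
            throw new IllegalStateException("Loading failed: " + what);
        }
        return data;
    }

    public static void main(String[] args) {
        // A non-null value passes straight through.
        String ok = requireLoaded("partition-0", "training data");
        System.out.println(ok); // prints partition-0

        // A null value produces an exception whose message names the missing input.
        try {
            requireLoaded(null, "weight vector");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints Loading failed: weight vector
        }
    }
}
```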

Example 10 with DataObject

use of edu.iu.dsc.tws.api.dataset.DataObject in project twister2 by DSC-SPIDAL.

the class SvmSgdAdvancedRunner method retriveFinalTestingAccuracy.

/**
 * Calculates the final accuracy, taking dataStreamerParallelism into account.
 * The parallelism matters because the result is the average of the accuracies
 * produced by each testing data set.
 *
 * @param finalRes DataObject which contains the final accuracy
 * @return the average testing accuracy, or 0 if the result is not a Double
 */
public double retriveFinalTestingAccuracy(DataObject<Object> finalRes) {
    double avgAcc = 0;
    Object o = finalRes.getPartitions()[0].getConsumer().next();
    if (o instanceof Double) {
        avgAcc = ((double) o) / dataStreamerParallelism;
        LOG.info(String.format("Testing Accuracy  : %f ", avgAcc));
    } else {
        LOG.severe("Something Went Wrong In Calculating Testing Accuracy");
    }
    return avgAcc;
}
Also used : DataObject(edu.iu.dsc.tws.api.dataset.DataObject)
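
The averaging step in Example 10 can be isolated from twister2 as a small function. The sketch below assumes the value read from the first partition is the sum of the per-task accuracies, which the division by dataStreamerParallelism suggests but the source does not state. AccuracyAverager and average are hypothetical names, and the input values are made up.

```java
import java.util.OptionalDouble;

public class AccuracyAverager {

    /**
     * Divides a summed accuracy by the number of parallel tasks.
     * Returns an empty result when the value is not a Double or parallelism is not positive.
     */
    public static OptionalDouble average(Object summed, int parallelism) {
        if (!(summed instanceof Double) || parallelism <= 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of((Double) summed / parallelism);
    }

    public static void main(String[] args) {
        // Hypothetical input: four tasks whose accuracies were summed to 360.0.
        OptionalDouble avg = average(360.0, 4);
        System.out.println(avg.getAsDouble()); // prints 90.0

        // A value of the wrong type yields an empty result instead of a silent 0.
        System.out.println(average("not a number", 4).isPresent()); // prints false
    }
}
```

Returning an OptionalDouble lets the caller tell a failed read apart from a genuine accuracy of 0, which the original method cannot do because it returns 0 in both cases.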

Aggregations

DataObject (edu.iu.dsc.tws.api.dataset.DataObject): 16
ExecutionPlan (edu.iu.dsc.tws.api.compute.executor.ExecutionPlan): 10
ComputeGraph (edu.iu.dsc.tws.api.compute.graph.ComputeGraph): 10
ComputeConnection (edu.iu.dsc.tws.task.impl.ComputeConnection): 6
Iterator (java.util.Iterator): 5
IExecutor (edu.iu.dsc.tws.api.compute.executor.IExecutor): 4
Config (edu.iu.dsc.tws.api.config.Config): 4
DataObjectSink (edu.iu.dsc.tws.task.dataobjects.DataObjectSink): 4
DataObjectSource (edu.iu.dsc.tws.task.dataobjects.DataObjectSource): 4
ArrayList (java.util.ArrayList): 4
JobConfig (edu.iu.dsc.tws.api.JobConfig): 3
DataPartition (edu.iu.dsc.tws.api.dataset.DataPartition): 3
ComputeEnvironment (edu.iu.dsc.tws.task.ComputeEnvironment): 3
TaskExecutor (edu.iu.dsc.tws.task.impl.TaskExecutor): 3
INodeInstance (edu.iu.dsc.tws.api.compute.executor.INodeInstance): 2
Receptor (edu.iu.dsc.tws.api.compute.modifiers.Receptor): 2
INode (edu.iu.dsc.tws.api.compute.nodes.INode): 2
EmptyDataObject (edu.iu.dsc.tws.api.dataset.EmptyDataObject): 2
Twister2RuntimeException (edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException): 2
DataObjectImpl (edu.iu.dsc.tws.dataset.DataObjectImpl): 2