Search in sources :

Example 31 with ComputeConnection

use of edu.iu.dsc.tws.task.impl.ComputeConnection in project twister2 by DSC-SPIDAL.

the class TaskGraphBuildTest method createGraphWithEdgeName.

private ComputeGraph createGraphWithEdgeName(String edgeName) {
    TestSource testSource = new TestSource();
    TestSink1 testCompute = new TestSink1();
    TestSink2 testSink = new TestSink2();
    ComputeGraphBuilder computeGraphBuilder = ComputeGraphBuilder.newBuilder(getConfig());
    computeGraphBuilder.addSource("source", testSource, 4);
    ComputeConnection computeConnection = computeGraphBuilder.addCompute("compute", testCompute, 4);
    computeConnection.partition("source").viaEdge(edgeName).withDataType(MessageTypes.OBJECT);
    ComputeConnection rc = computeGraphBuilder.addCompute("sink", testSink, 1);
    rc.allreduce("compute").viaEdge(edgeName).withReductionFunction(new Aggregator()).withDataType(MessageTypes.OBJECT);
    ComputeGraph graph = computeGraphBuilder.build();
    return graph;
}
Also used : ComputeGraph(edu.iu.dsc.tws.api.compute.graph.ComputeGraph) ComputeGraphBuilder(edu.iu.dsc.tws.task.impl.ComputeGraphBuilder) ComputeConnection(edu.iu.dsc.tws.task.impl.ComputeConnection)

Example 32 with ComputeConnection

use of edu.iu.dsc.tws.task.impl.ComputeConnection in project twister2 by DSC-SPIDAL.

the class MultiStageGraph method execute.

@Override
public void execute() {
    GeneratorTask g = new GeneratorTask();
    ReduceTask rt = new ReduceTask();
    PartitionTask r = new PartitionTask();
    ComputeGraphBuilder builder = ComputeGraphBuilder.newBuilder(config);
    builder.addSource("source", g, 4);
    ComputeConnection pc = builder.addCompute("compute", r, 4);
    pc.partition("source").viaEdge("partition-edge").withDataType(MessageTypes.OBJECT);
    ComputeConnection rc = builder.addCompute("sink", rt, 1);
    rc.reduce("compute").viaEdge("compute-edge").withReductionFunction((object1, object2) -> object1);
    builder.setMode(OperationMode.BATCH);
    ComputeGraph graph = builder.build();
    graph.setGraphName("MultiTaskGraph");
    ExecutionPlan plan = taskExecutor.plan(graph);
    taskExecutor.execute(graph, plan);
}
Also used : ExecutionPlan(edu.iu.dsc.tws.api.compute.executor.ExecutionPlan) ComputeGraph(edu.iu.dsc.tws.api.compute.graph.ComputeGraph) ComputeGraphBuilder(edu.iu.dsc.tws.task.impl.ComputeGraphBuilder) ComputeConnection(edu.iu.dsc.tws.task.impl.ComputeConnection)

Example 33 with ComputeConnection

use of edu.iu.dsc.tws.task.impl.ComputeConnection in project twister2 by DSC-SPIDAL.

the class TGUtils method generateGenericDataPointLoader.

public static ComputeGraph generateGenericDataPointLoader(int samples, int parallelism, int numOfFeatures, String dataSourcePathStr, String dataObjectSourceStr, String dataObjectComputeStr, String dataObjectSinkStr, String graphName, Config config, OperationMode opMode) {
    SVMDataObjectSource<String, TextInputSplit> sourceTask = new SVMDataObjectSource(Context.TWISTER2_DIRECT_EDGE, dataSourcePathStr, samples);
    IterativeSVMDataObjectCompute dataObjectCompute = new IterativeSVMDataObjectCompute(Context.TWISTER2_DIRECT_EDGE, parallelism, samples, numOfFeatures, DELIMITER);
    IterativeSVMDataObjectDirectSink iterativeSVMPrimaryDataObjectDirectSink = new IterativeSVMDataObjectDirectSink();
    ComputeGraphBuilder datapointsComputeGraphBuilder = ComputeGraphBuilder.newBuilder(config);
    datapointsComputeGraphBuilder.addSource(dataObjectSourceStr, sourceTask, parallelism);
    ComputeConnection datapointComputeConnection = datapointsComputeGraphBuilder.addCompute(dataObjectComputeStr, dataObjectCompute, parallelism);
    ComputeConnection computeConnectionSink = datapointsComputeGraphBuilder.addCompute(dataObjectSinkStr, iterativeSVMPrimaryDataObjectDirectSink, parallelism);
    datapointComputeConnection.direct(dataObjectSourceStr).viaEdge(Context.TWISTER2_DIRECT_EDGE).withDataType(MessageTypes.OBJECT);
    computeConnectionSink.direct(dataObjectComputeStr).viaEdge(Context.TWISTER2_DIRECT_EDGE).withDataType(MessageTypes.OBJECT);
    datapointsComputeGraphBuilder.setMode(opMode);
    datapointsComputeGraphBuilder.setTaskGraphName(graphName);
    // Build the first taskgraph
    return datapointsComputeGraphBuilder.build();
}
Also used : TextInputSplit(edu.iu.dsc.tws.data.api.splits.TextInputSplit) IterativeSVMDataObjectDirectSink(edu.iu.dsc.tws.examples.ml.svm.data.IterativeSVMDataObjectDirectSink) IterativeSVMDataObjectCompute(edu.iu.dsc.tws.examples.ml.svm.data.IterativeSVMDataObjectCompute) ComputeGraphBuilder(edu.iu.dsc.tws.task.impl.ComputeGraphBuilder) SVMDataObjectSource(edu.iu.dsc.tws.examples.ml.svm.data.SVMDataObjectSource) ComputeConnection(edu.iu.dsc.tws.task.impl.ComputeConnection)

Example 34 with ComputeConnection

use of edu.iu.dsc.tws.task.impl.ComputeConnection in project twister2 by DSC-SPIDAL.

the class SvmSgdAdvancedRunner method executeTrainingDataLoadingTaskGraph.

/**
 * This method loads the training data in a distributed mode
 * dataStreamerParallelism is the amount of parallelism used
 * in loaded the data in parallel.
 *
 * @return twister2 DataObject containing the training data
 */
public DataObject<Object> executeTrainingDataLoadingTaskGraph() {
    DataObject<Object> data = null;
    DataObjectSource sourceTask = new DataObjectSource(Context.TWISTER2_DIRECT_EDGE, this.svmJobParameters.getTrainingDataDir());
    DataObjectSink sinkTask = new DataObjectSink();
    trainingBuilder.addSource(Constants.SimpleGraphConfig.DATA_OBJECT_SOURCE, sourceTask, dataStreamerParallelism);
    ComputeConnection firstGraphComputeConnection = trainingBuilder.addCompute(Constants.SimpleGraphConfig.DATA_OBJECT_SINK, sinkTask, dataStreamerParallelism);
    firstGraphComputeConnection.direct(Constants.SimpleGraphConfig.DATA_OBJECT_SOURCE).viaEdge(Context.TWISTER2_DIRECT_EDGE).withDataType(MessageTypes.OBJECT);
    trainingBuilder.setMode(OperationMode.BATCH);
    ComputeGraph datapointsTaskGraph = trainingBuilder.build();
    datapointsTaskGraph.setGraphName("training-data-loading-graph");
    ExecutionPlan firstGraphExecutionPlan = taskExecutor.plan(datapointsTaskGraph);
    taskExecutor.execute(datapointsTaskGraph, firstGraphExecutionPlan);
    data = taskExecutor.getOutput(datapointsTaskGraph, firstGraphExecutionPlan, Constants.SimpleGraphConfig.DATA_OBJECT_SINK);
    if (data == null) {
        throw new NullPointerException("Something Went Wrong in Loading Training Data");
    } else {
        LOG.info("Training Data Total Partitions : " + data.getPartitions().length);
    }
    return data;
}
Also used : DataObjectSink(edu.iu.dsc.tws.task.dataobjects.DataObjectSink) ExecutionPlan(edu.iu.dsc.tws.api.compute.executor.ExecutionPlan) ComputeGraph(edu.iu.dsc.tws.api.compute.graph.ComputeGraph) DataObject(edu.iu.dsc.tws.api.dataset.DataObject) DataObjectSource(edu.iu.dsc.tws.task.dataobjects.DataObjectSource) ComputeConnection(edu.iu.dsc.tws.task.impl.ComputeConnection)

Example 35 with ComputeConnection

use of edu.iu.dsc.tws.task.impl.ComputeConnection in project twister2 by DSC-SPIDAL.

the class SvmSgdAdvancedRunner method executeIterativeTrainingGraph.

/**
 * This method executes the iterative training graph
 * Training is done in parallel depending on the parallelism factor given
 * In this implementation the data loading parallelism and data computing or
 * training parallelism is same. It is the general model to keep them equal. But
 * you can increase the parallelism the way you want. But it is adviced to keep these
 * values equal. Dynamic parallelism in training is not yet tested fully in Twister2 Framework.
 *
 * @return Twister2 DataObject{@literal <double[]>} containing the reduced weight vector
 */
public DataObject<double[]> executeIterativeTrainingGraph() {
    DataObject<double[]> trainedWeight = null;
    dataStreamer = new InputDataStreamer(this.operationMode, svmJobParameters.isDummy(), this.binaryBatchModel);
    iterativeSVMCompute = new IterativeSVMCompute(this.binaryBatchModel, this.operationMode);
    svmReduce = new SVMReduce(this.operationMode);
    trainingBuilder.addSource(Constants.SimpleGraphConfig.DATASTREAMER_SOURCE, dataStreamer, dataStreamerParallelism);
    ComputeConnection svmComputeConnection = trainingBuilder.addCompute(Constants.SimpleGraphConfig.SVM_COMPUTE, iterativeSVMCompute, svmComputeParallelism);
    ComputeConnection svmReduceConnection = trainingBuilder.addCompute(Constants.SimpleGraphConfig.SVM_REDUCE, svmReduce, reduceParallelism);
    svmComputeConnection.direct(Constants.SimpleGraphConfig.DATASTREAMER_SOURCE).viaEdge(Constants.SimpleGraphConfig.DATA_EDGE).withDataType(MessageTypes.OBJECT);
    // svmReduceConnection
    // .reduce(Constants.SimpleGraphConfig.SVM_COMPUTE, Constants.SimpleGraphConfig.REDUCE_EDGE,
    // new ReduceAggregator(), DataType.OBJECT);
    svmReduceConnection.allreduce(Constants.SimpleGraphConfig.SVM_COMPUTE).viaEdge(Constants.SimpleGraphConfig.REDUCE_EDGE).withReductionFunction(new ReduceAggregator()).withDataType(MessageTypes.OBJECT);
    trainingBuilder.setMode(operationMode);
    ComputeGraph graph = trainingBuilder.build();
    graph.setGraphName("training-graph");
    ExecutionPlan plan = taskExecutor.plan(graph);
    IExecutor ex = taskExecutor.createExecution(graph, plan);
    // iteration is being decoupled from the computation task
    for (int i = 0; i < this.binaryBatchModel.getIterations(); i++) {
        taskExecutor.addInput(graph, plan, Constants.SimpleGraphConfig.DATASTREAMER_SOURCE, Constants.SimpleGraphConfig.INPUT_DATA, trainingData);
        taskExecutor.addInput(graph, plan, Constants.SimpleGraphConfig.DATASTREAMER_SOURCE, Constants.SimpleGraphConfig.INPUT_WEIGHT_VECTOR, inputWeightVector);
        inputWeightVector = taskExecutor.getOutput(graph, plan, Constants.SimpleGraphConfig.SVM_REDUCE);
        ex.execute();
    }
    ex.closeExecution();
    LOG.info("Task Graph Executed !!! ");
    if (workerId == 0) {
        trainedWeight = retrieveWeightVectorFromTaskGraph(graph, plan);
        this.trainedWeightVector = trainedWeight;
    }
    return trainedWeight;
}
Also used : SVMReduce(edu.iu.dsc.tws.examples.ml.svm.aggregate.SVMReduce) ReduceAggregator(edu.iu.dsc.tws.examples.ml.svm.aggregate.ReduceAggregator) ExecutionPlan(edu.iu.dsc.tws.api.compute.executor.ExecutionPlan) IterativeSVMCompute(edu.iu.dsc.tws.examples.ml.svm.compute.IterativeSVMCompute) ComputeGraph(edu.iu.dsc.tws.api.compute.graph.ComputeGraph) IExecutor(edu.iu.dsc.tws.api.compute.executor.IExecutor) InputDataStreamer(edu.iu.dsc.tws.examples.ml.svm.streamer.InputDataStreamer) ComputeConnection(edu.iu.dsc.tws.task.impl.ComputeConnection)

Aggregations

ComputeConnection (edu.iu.dsc.tws.task.impl.ComputeConnection)65 ComputeGraphBuilder (edu.iu.dsc.tws.task.impl.ComputeGraphBuilder)55 ComputeGraph (edu.iu.dsc.tws.api.compute.graph.ComputeGraph)40 TaskSchedulerClassTest (edu.iu.dsc.tws.tsched.utils.TaskSchedulerClassTest)16 ExecutionPlan (edu.iu.dsc.tws.api.compute.executor.ExecutionPlan)13 DataFlowGraph (edu.iu.dsc.tws.task.cdfw.DataFlowGraph)8 DataObject (edu.iu.dsc.tws.api.dataset.DataObject)6 GraphDataSource (edu.iu.dsc.tws.graphapi.partition.GraphDataSource)6 DataObjectSource (edu.iu.dsc.tws.task.dataobjects.DataObjectSource)6 DataObjectSink (edu.iu.dsc.tws.task.dataobjects.DataObjectSink)5 ReduceAggregator (edu.iu.dsc.tws.examples.ml.svm.aggregate.ReduceAggregator)4 ConnectedSink (edu.iu.dsc.tws.task.cdfw.task.ConnectedSink)4 SVMReduce (edu.iu.dsc.tws.examples.ml.svm.aggregate.SVMReduce)3 DataFileReplicatedReadSource (edu.iu.dsc.tws.task.dataobjects.DataFileReplicatedReadSource)3 IExecutor (edu.iu.dsc.tws.api.compute.executor.IExecutor)2 Config (edu.iu.dsc.tws.api.config.Config)2 TextInputSplit (edu.iu.dsc.tws.data.api.splits.TextInputSplit)2 IterativeAccuracyReduceFunction (edu.iu.dsc.tws.examples.ml.svm.aggregate.IterativeAccuracyReduceFunction)2 IterativeSVMCompute (edu.iu.dsc.tws.examples.ml.svm.compute.IterativeSVMCompute)2 SVMCompute (edu.iu.dsc.tws.examples.ml.svm.compute.SVMCompute)2