Example 31 with Twister2RuntimeException

use of edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException in project twister2 by DSC-SPIDAL.

the class WorkerManager method execute.

/**
 * Execute IWorker
 * return false if IWorker fails fully after retries
 * return true if execution successful
 * throw an exception if execution fails and the worker needs to be restarted from jvm
 */
public boolean execute() {
    while (JobProgress.getWorkerExecuteCount() < maxRetries) {
        LOG.info("Waiting on the init barrier before starting IWorker: " + workerID + " with restartCount: " + workerController.workerRestartCount() + " and with re-executionCount: " + JobProgress.getWorkerExecuteCount());
        try {
            workerController.waitOnInitBarrier();
            firstInitBarrierProceeded = true;
        } catch (TimeoutException e) {
            throw new Twister2RuntimeException("Could not pass through the init barrier", e);
        }
        LOG.fine("Proceeded through INIT barrier. Starting Worker: " + workerID);
        JobProgressImpl.setJobStatus(JobProgress.JobStatus.EXECUTING);
        JobProgressImpl.increaseWorkerExecuteCount();
        JobProgressImpl.setRestartedWorkers(restartedWorkers.values());
        try {
            managedWorker.execute(config, job, workerController, persistentVolume, volatileVolume);
        } catch (JobFaultyException cue) {
            // a worker in the cluster should have failed
            // we will try to re-execute this worker
            JobProgressImpl.setJobStatus(JobProgress.JobStatus.FAULTY);
            LOG.warning("thrown JobFaultyException. Some workers should have failed.");
        }
        // also make sure that all other workers finished successfully
        if (JobProgress.isJobHealthy()) {
            try {
                // wait on the barrier indefinitely until all workers arrive
                // or the barrier is broken by a job fault
                LOG.info("Worker completed, waiting for other workers to finish at the final barrier.");
                workerController.waitOnBarrier(Long.MAX_VALUE);
                LOG.info("Worker finished successfully");
                return true;
            } catch (TimeoutException e) {
                // this should never happen
                throw new Twister2RuntimeException("Could not pass through the final barrier", e);
            } catch (JobFaultyException e) {
                JobProgressImpl.setJobStatus(JobProgress.JobStatus.FAULTY);
                LOG.warning("thrown JobFaultyException. Some workers failed before finishing.");
            }
        }
    }
    LOG.info(String.format("Re-executed IWorker %d times and failed, we are exiting", maxRetries));
    return false;
}
Also used : Twister2RuntimeException(edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException) JobFaultyException(edu.iu.dsc.tws.api.exceptions.JobFaultyException) TimeoutException(edu.iu.dsc.tws.api.exceptions.TimeoutException)
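
The control flow of the execute method above (retry on a recoverable fault, return true on the first success, return false once the retry budget is spent) can be sketched in isolation. This is a minimal illustration, not the project's code: SketchFaultException, SketchTask and RetrySketch are hypothetical names standing in for JobFaultyException, IWorker and WorkerManager, and the barriers and job status updates are left out.

```java
// Hypothetical stand-in for a recoverable fault such as JobFaultyException.
class SketchFaultException extends Exception {
    SketchFaultException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for the unit of work that may be re-executed.
interface SketchTask {
    void run(int attempt) throws SketchFaultException;
}

class RetrySketch {
    /**
     * Runs the task until it succeeds or maxRetries attempts are used up.
     * Returns true on success, false if every attempt ended in a recoverable fault.
     * Unchecked exceptions are not caught here and propagate to the caller.
     */
    static boolean executeWithRetries(SketchTask task, int maxRetries) {
        int attempts = 0;
        while (attempts < maxRetries) {
            attempts++;
            try {
                task.run(attempts);
                return true;
            } catch (SketchFaultException e) {
                // recoverable fault: fall through and try again
            }
        }
        return false;
    }
}
```

Keeping the recoverable fault as a checked exception and letting unchecked exceptions escape mirrors the split in the example, where a JobFaultyException leads to another iteration while a Twister2RuntimeException ends the method.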

Example 32 with Twister2RuntimeException

use of edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException in project twister2 by DSC-SPIDAL.

the class CDFWExecutor method submitGraph.

private void submitGraph(DataFlowGraph dataFlowgraph, Set<Integer> workerIDs) {
    if (driverState == DriverState.INITIALIZE || driverState == DriverState.JOB_FINISHED) {
        try {
            // build the schedule plan for the dataflow graph
            DataFlowGraph dataFlowGraph = buildCDFWSchedulePlan(dataFlowgraph, workerIDs);
            CDFWJobAPI.SubGraph job = buildCDFWJob(dataFlowGraph);
            // now submit the job
            submitJob(job);
            driverState = DriverState.JOB_SUBMITTED;
            // let's wait for another event
            waitForEvent(DriveEventType.FINISHED_JOB);
            driverState = DriverState.JOB_FINISHED;
        } catch (Exception e) {
            throw new Twister2RuntimeException("Driver is not initialized", e);
        }
    } else {
        throw new Twister2RuntimeException("Failed to submit job in this state: " + driverState);
    }
}
Also used : Twister2RuntimeException(edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException) CDFWJobAPI(edu.iu.dsc.tws.proto.system.job.CDFWJobAPI)

Example 33 with Twister2RuntimeException

use of edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException in project twister2 by DSC-SPIDAL.

the class CDFWExecutor method executeCDFW.

/**
 * The executeCDFW method first calls the schedule method to get the schedule list of the CDFW.
 * Then, it invokes the buildCDFWJob method to build the job object for the scheduled graphs.
 */
public void executeCDFW(DataFlowGraph... graph) {
    if (!(driverState == DriverState.JOB_FINISHED || driverState == DriverState.INITIALIZE)) {
        // now we need to send messages
        throw new RuntimeException("Invalid state to execute a job: " + driverState);
    }
    CDFWScheduler cdfwScheduler = new CDFWScheduler(this.executionEnv.getWorkerInfoList());
    Map<DataFlowGraph, Set<Integer>> scheduleGraphMap = cdfwScheduler.schedule(graph);
    ScheduledExecutorService executor = Executors.newScheduledThreadPool(scheduleGraphMap.size());
    for (Map.Entry<DataFlowGraph, Set<Integer>> entry : scheduleGraphMap.entrySet()) {
        CDFWExecutorTask cdfwSchedulerTask = new CDFWExecutorTask(entry.getKey(), entry.getValue());
        executor.submit(cdfwSchedulerTask);
    }
    try {
        executor.awaitTermination(1, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
        throw new Twister2RuntimeException(e);
    } finally {
        executor.shutdown();
    }
}
Also used : ScheduledExecutorService(java.util.concurrent.ScheduledExecutorService) Twister2RuntimeException(edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException) Set(java.util.Set) Map(java.util.Map)
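
In the example above, awaitTermination is called before shutdown. According to the JDK documentation, awaitTermination blocks until all tasks have completed after a shutdown request, or the timeout elapses, so without a prior shutdown the call would normally just wait out the timeout. Whether that is intended in the project is not clear from the snippet. The sketch below shows the more common JDK idiom of shutting down first and then waiting. ExecutorShutdownSketch and runAll are hypothetical names, and IllegalStateException stands in for the framework exception.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ExecutorShutdownSketch {
    /**
     * Submits all tasks, then waits up to the given timeout for them to finish.
     * Returns true if every task completed in time, false otherwise.
     */
    static boolean runAll(List<Runnable> tasks, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        for (Runnable task : tasks) {
            executor.submit(task);
        }
        // stop accepting new work; already submitted tasks keep running
        executor.shutdown();
        try {
            boolean finished = executor.awaitTermination(timeout, unit);
            if (!finished) {
                // timed out: ask the remaining tasks to stop
                executor.shutdownNow();
            }
            return finished;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            // restore the interrupt flag before translating the exception
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        }
    }
}
```

Restoring the interrupt flag before rethrowing lets code further up the stack still observe that the thread was interrupted.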

Example 34 with Twister2RuntimeException

use of edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException in project twister2 by DSC-SPIDAL.

the class TaskExecutor method distributeData.

/**
 * This method distributes collected {@link DataPartition}s to the
 * intended {@link Receptor}s
 */
public static void distributeData(ExecutionPlan executionPlan, Map<String, DataObject> dataMap) {
    Map<Integer, INodeInstance> nodes = executionPlan.getNodes();
    if (nodes != null) {
        nodes.forEach((id, node) -> {
            INode task = node.getNode();
            if (task instanceof Receptor) {
                Set<String> receivableNames = ((Receptor) task).getReceivableNames();
                for (String receivableName : receivableNames) {
                    DataObject dataObject = dataMap.get(receivableName);
                    if (dataObject == null) {
                        throw new Twister2RuntimeException("Couldn't find input data " + receivableName + " for task " + node.getId());
                    }
                    DataPartition partition = dataObject.getPartition(node.getIndex());
                    if (partition == null) {
                        throw new Twister2RuntimeException("Couldn't find input data " + receivableName + " for task index " + node.getIndex() + " of task " + node.getId());
                    }
                    ((Receptor) task).add(receivableName, dataObject);
                    ((Receptor) task).add(receivableName, partition);
                }
            }
        });
    }
}
Also used : INode(edu.iu.dsc.tws.api.compute.nodes.INode) Twister2RuntimeException(edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException) DataObject(edu.iu.dsc.tws.api.dataset.DataObject) EmptyDataObject(edu.iu.dsc.tws.api.dataset.EmptyDataObject) Receptor(edu.iu.dsc.tws.api.compute.modifiers.Receptor) INodeInstance(edu.iu.dsc.tws.api.compute.executor.INodeInstance) DataPartition(edu.iu.dsc.tws.api.dataset.DataPartition)
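
The lookup-and-validate pattern in distributeData can be shown without the twister2 types. In this sketch, which is an illustration under stated assumptions rather than the project's API, a plain Map from input name to a list of partitions stands in for the DataObject map, the hypothetical Receiver class stands in for a Receptor task instance, and IllegalStateException replaces Twister2RuntimeException.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DistributeSketch {
    // Hypothetical receiver: wants the partition at its own index of each named input.
    static class Receiver {
        final int index;
        final List<String> wantedNames;
        final Map<String, String> received = new HashMap<>();

        Receiver(int index, List<String> wantedNames) {
            this.index = index;
            this.wantedNames = wantedNames;
        }
    }

    /** Hands each receiver its partition of every input it asked for, failing fast on gaps. */
    static void distribute(List<Receiver> receivers, Map<String, List<String>> dataMap) {
        for (Receiver receiver : receivers) {
            for (String name : receiver.wantedNames) {
                List<String> partitions = dataMap.get(name);
                if (partitions == null) {
                    throw new IllegalStateException("Couldn't find input data " + name);
                }
                if (receiver.index >= partitions.size()) {
                    throw new IllegalStateException("Couldn't find input data " + name
                        + " for index " + receiver.index);
                }
                receiver.received.put(name, partitions.get(receiver.index));
            }
        }
    }
}
```

Failing on the first missing input, with the input name and index in the message, makes a misconfigured graph visible at distribution time instead of as a null reference inside a task.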

Example 35 with Twister2RuntimeException

use of edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException in project twister2 by DSC-SPIDAL.

the class RowItrComputeCollectorOp method prepare.

@Override
public void prepare(Config cfg, TaskContext ctx) {
    super.prepare(cfg, ctx);
    runtime = WorkerEnvironment.getSharedValue(TableRuntime.TABLE_RUNTIME_CONF, TableRuntime.class);
    if (runtime == null) {
        throw new Twister2RuntimeException("Table runtime must be set");
    }
    schema = (RowSchema) ctx.getConfig(TSetConstants.OUTPUT_SCHEMA_KEY);
    tableMaxSize = cfg.getLongValue("twister2.table.max.size", tableMaxSize);
    builder = new ArrowTableBuilder(schema.toArrowSchema(), runtime.getRootAllocator());
    collectorImp = new CollectorImp();
}
Also used : Twister2RuntimeException(edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException) TableRuntime(edu.iu.dsc.tws.common.table.arrow.TableRuntime) ArrowTableBuilder(edu.iu.dsc.tws.common.table.ArrowTableBuilder)

Aggregations

Twister2RuntimeException (edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException) 65
Twister2Exception (edu.iu.dsc.tws.api.exceptions.Twister2Exception) 17
IOException (java.io.IOException) 16
ArrayList (java.util.ArrayList) 10
JobMasterAPI (edu.iu.dsc.tws.proto.jobmaster.JobMasterAPI) 8
Path (edu.iu.dsc.tws.api.data.Path) 7
Config (edu.iu.dsc.tws.api.config.Config) 6
File (java.io.File) 5
TimeoutException (edu.iu.dsc.tws.api.exceptions.TimeoutException) 4
FileInputStream (java.io.FileInputStream) 4
InvocationTargetException (java.lang.reflect.InvocationTargetException) 4
InvalidProtocolBufferException (com.google.protobuf.InvalidProtocolBufferException) 3
Twister2JobState (edu.iu.dsc.tws.api.scheduler.Twister2JobState) 3
ArrowTableBuilder (edu.iu.dsc.tws.common.table.ArrowTableBuilder) 3
TableRuntime (edu.iu.dsc.tws.common.table.arrow.TableRuntime) 3
List (java.util.List) 3
Map (java.util.Map) 3
Logger (java.util.logging.Logger) 3
MessageType (edu.iu.dsc.tws.api.comms.messaging.types.MessageType) 2
TaskSchedulerException (edu.iu.dsc.tws.api.compute.exceptions.TaskSchedulerException) 2