Search in sources :

Example 36 with Twister2Exception

use of edu.iu.dsc.tws.api.exceptions.Twister2Exception in project twister2 by DSC-SPIDAL.

the class NomadMasterStarter method launch.

/**
 * launch the job master
 *
 * @return false if setup fails
 */
public boolean launch() {
    // get the job working directory
    String jobWorkingDirectory = NomadContext.workingDirectory(config);
    LOG.log(Level.INFO, "job working directory ....." + jobWorkingDirectory);
    if (NomadContext.sharedFileSystem(config)) {
        if (!setupWorkingDirectory(job, jobWorkingDirectory)) {
            throw new RuntimeException("Failed to setup the directory");
        }
    }
    Config newConfig = Config.newBuilder().putAll(config).put(SchedulerContext.WORKING_DIRECTORY, jobWorkingDirectory).build();
    // now start the controller, which will get the resources from
    // slurm and start the job
    IController controller = new NomadController(true);
    controller.initialize(newConfig);
    // start the Job Master locally
    JobMaster jobMaster = null;
    Thread jmThread = null;
    if (JobMasterContext.jobMasterRunsInClient(config)) {
        try {
            int port = JobMasterContext.jobMasterPort(config);
            String hostAddress = JobMasterContext.jobMasterIP(config);
            if (hostAddress == null) {
                hostAddress = InetAddress.getLocalHost().getHostAddress();
            }
            LOG.log(Level.INFO, String.format("Starting the job manager: %s:%d", hostAddress, port));
            JobMasterAPI.NodeInfo jobMasterNodeInfo = NomadContext.getNodeInfo(config, hostAddress);
            IScalerPerCluster clusterScaler = new NullScaler();
            JobMasterAPI.JobMasterState initialState = JobMasterAPI.JobMasterState.JM_STARTED;
            NullTerminator nt = new NullTerminator();
            jobMaster = new JobMaster(config, hostAddress, nt, job, jobMasterNodeInfo, clusterScaler, initialState);
            jobMaster.addShutdownHook(true);
            jmThread = jobMaster.startJobMasterThreaded();
        } catch (UnknownHostException e) {
            LOG.log(Level.SEVERE, "Exception when getting local host address: ", e);
            throw new RuntimeException(e);
        } catch (Twister2Exception e) {
            LOG.log(Level.SEVERE, "Exception when starting Job master: ", e);
            throw new RuntimeException(e);
        }
    }
    boolean start = controller.start(job);
    // now lets wait on client
    if (JobMasterContext.jobMasterRunsInClient(config)) {
        try {
            if (jmThread != null) {
                jmThread.join();
            }
        } catch (InterruptedException ignore) {
        }
    }
    return start;
}
Also used : JobMaster(edu.iu.dsc.tws.master.server.JobMaster) Twister2Exception(edu.iu.dsc.tws.api.exceptions.Twister2Exception) IController(edu.iu.dsc.tws.api.scheduler.IController) UnknownHostException(java.net.UnknownHostException) NullScaler(edu.iu.dsc.tws.api.driver.NullScaler) Config(edu.iu.dsc.tws.api.config.Config) IScalerPerCluster(edu.iu.dsc.tws.api.driver.IScalerPerCluster) NomadController(edu.iu.dsc.tws.rsched.schedulers.nomad.NomadController) JobMasterAPI(edu.iu.dsc.tws.proto.jobmaster.JobMasterAPI) NullTerminator(edu.iu.dsc.tws.rsched.schedulers.NullTerminator)

Aggregations

Twister2Exception (edu.iu.dsc.tws.api.exceptions.Twister2Exception)36 Twister2RuntimeException (edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException)24 JobMasterAPI (edu.iu.dsc.tws.proto.jobmaster.JobMasterAPI)14 JobMaster (edu.iu.dsc.tws.master.server.JobMaster)7 NullTerminator (edu.iu.dsc.tws.rsched.schedulers.NullTerminator)5 IScalerPerCluster (edu.iu.dsc.tws.api.driver.IScalerPerCluster)4 NullScaler (edu.iu.dsc.tws.api.driver.NullScaler)4 UnknownHostException (java.net.UnknownHostException)4 Config (edu.iu.dsc.tws.api.config.Config)3 K8sScaler (edu.iu.dsc.tws.rsched.schedulers.k8s.driver.K8sScaler)3 InvalidProtocolBufferException (com.google.protobuf.InvalidProtocolBufferException)2 JobFaultyException (edu.iu.dsc.tws.api.exceptions.JobFaultyException)2 TimeoutException (edu.iu.dsc.tws.api.exceptions.TimeoutException)2 IController (edu.iu.dsc.tws.api.scheduler.IController)2 KubernetesController (edu.iu.dsc.tws.rsched.schedulers.k8s.KubernetesController)2 LinkedList (java.util.LinkedList)2 ChildData (org.apache.curator.framework.recipes.cache.ChildData)2 PathChildrenCache (org.apache.curator.framework.recipes.cache.PathChildrenCache)2 Twister2Job (edu.iu.dsc.tws.api.Twister2Job)1 StateStore (edu.iu.dsc.tws.api.checkpointing.StateStore)1