Example 41 with Task

use of com.netflix.titus.api.jobmanager.model.job.Task in project titus-control-plane by Netflix.

the class MoveTaskBetweenJobsAction method apply.

@Override
public Observable<Map<String, List<ModelActionHolder>>> apply() {
    return Observable.defer(() -> {
        // Validate data
        Job<ServiceJobExt> jobFrom = engineFrom.getReferenceView().getEntity();
        Job<ServiceJobExt> jobTo = engineTo.getReferenceView().getEntity();
        EntityHolder taskFromReferenceHolder = engineFrom.getReferenceView().findChildById(taskId)
                .orElseThrow(() -> JobManagerException.taskJobMismatch(taskId, jobFrom.getId()));
        if (jobFrom.getStatus().getState() != JobState.Accepted) {
            // Report the source job here: it is the one whose state was just checked
            throw JobManagerException.unexpectedJobState(jobFrom, JobState.Accepted);
        }
        Capacity capacityFrom = jobFrom.getJobDescriptor().getExtensions().getCapacity();
        if (capacityFrom.getMin() >= capacityFrom.getDesired()) {
            throw JobManagerException.belowMinCapacity(jobFrom, 1);
        }
        if (jobTo.getStatus().getState() != JobState.Accepted) {
            throw JobManagerException.unexpectedJobState(jobTo, JobState.Accepted);
        }
        Capacity capacityTo = jobTo.getJobDescriptor().getExtensions().getCapacity();
        if (capacityTo.getDesired() >= capacityTo.getMax()) {
            throw JobManagerException.aboveMaxCapacity(jobTo, 1);
        }
        Task taskFromReference = taskFromReferenceHolder.getEntity();
        Optional<EntityHolder> taskFromRunningHolder = engineFrom.getRunningView().findChildById(taskId);
        // Compute new model entities
        // Decrement the source job size by 1 and increment the target job size by 1
        Job<ServiceJobExt> updatedJobFrom = nextVersion(JobFunctions.incrementJobSize(jobFrom, -1), versionSupplier);
        Job<ServiceJobExt> updatedJobTo = nextVersion(JobFunctions.incrementJobSize(jobTo, 1), versionSupplier);
        Task updatedReferenceTaskTo = VersionSuppliers.nextVersion(JobFunctions.moveTask(jobFrom.getId(), jobTo.getId(), taskFromReference), versionSupplier);
        // Move the task
        return titusStore.moveTask(updatedJobFrom, updatedJobTo, updatedReferenceTaskTo)
                .andThen(Observable.fromCallable(() -> ImmutableMap.of(
                        jobFrom.getId(), createModelUpdateActionsFrom(updatedJobFrom, updatedJobTo, taskFromReference, callMetadata),
                        jobTo.getId(), createModelUpdateActionsTo(updatedJobFrom, updatedJobTo, updatedReferenceTaskTo, taskFromRunningHolder, callMetadata)
                )));
    });
}
Also used : Task(com.netflix.titus.api.jobmanager.model.job.Task) Capacity(com.netflix.titus.api.jobmanager.model.job.Capacity) ServiceJobExt(com.netflix.titus.api.jobmanager.model.job.ext.ServiceJobExt) EntityHolder(com.netflix.titus.common.framework.reconciler.EntityHolder)
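The two capacity guards in apply() can be looked at in isolation. The following is a minimal sketch, not the Titus implementation: CapacitySketch and MoveTaskCapacityCheck are hypothetical stand-ins that keep only the min, desired and max fields used by the checks above, and the method returns a reason string where the original throws a JobManagerException.

```java
import java.util.Optional;

// Hypothetical stand-in for Titus's Capacity; only the fields used by the guards.
final class CapacitySketch {
    final int min;
    final int desired;
    final int max;

    CapacitySketch(int min, int desired, int max) {
        this.min = min;
        this.desired = desired;
        this.max = max;
    }
}

// Hypothetical helper mirroring the capacity checks in the example above.
final class MoveTaskCapacityCheck {

    // Returns an empty Optional when one task may move, otherwise a rejection reason.
    static Optional<String> validateMove(CapacitySketch from, CapacitySketch to) {
        // The source job loses one task, so its desired size must be above its minimum.
        if (from.min >= from.desired) {
            return Optional.of("source job would drop below its minimum capacity");
        }
        // The target job gains one task, so its desired size must be below its maximum.
        if (to.desired >= to.max) {
            return Optional.of("target job would exceed its maximum capacity");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        CapacitySketch from = new CapacitySketch(1, 2, 5);
        CapacitySketch to = new CapacitySketch(0, 3, 4);
        System.out.println(validateMove(from, to).orElse("move allowed"));
    }
}
```

Both comparisons use >= rather than >, because the move changes each job's desired size by exactly one.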

Example 42 with Task

use of com.netflix.titus.api.jobmanager.model.job.Task in project titus-control-plane by Netflix.

the class BatchDifferenceResolver method createNewTaskAction.

private Optional<TitusChangeAction> createNewTaskAction(BatchJobView refJobView, int taskIndex, Optional<EntityHolder> previousTask, List<String> unassignedIpAllocations, List<String> ebsVolumeIds) {
    // Safety check
    long numberOfNotFinishedTasks = refJobView.getJobHolder().getChildren().stream()
            .filter(holder -> TaskState.isRunning(((Task) holder.getEntity()).getStatus().getState()))
            .count();
    if (numberOfNotFinishedTasks >= refJobView.getRequiredSize()) {
        titusRuntime.getCodeInvariants().inconsistent(
                "Batch job reconciler attempts to create too many tasks: jobId=%s, requiredSize=%s, current=%s",
                refJobView.getJob().getId(), refJobView.getRequiredSize(), numberOfNotFinishedTasks);
        return Optional.empty();
    }
    Map<String, String> taskContext = getTaskContext(previousTask, unassignedIpAllocations, ebsVolumeIds);
    JobDescriptor jobDescriptor = refJobView.getJob().getJobDescriptor();
    ApplicationSLA capacityGroupDescriptor = JobManagerUtil.getCapacityGroupDescriptor(jobDescriptor, capacityGroupService);
    String resourcePool = capacityGroupDescriptor.getResourcePool();
    taskContext = CollectionsExt.copyAndAdd(taskContext, ImmutableMap.of(
            TaskAttributes.TASK_ATTRIBUTES_RESOURCE_POOL, resourcePool,
            TaskAttributes.TASK_ATTRIBUTES_TIER, capacityGroupDescriptor.getTier().name()));
    TitusChangeAction storeAction = storeWriteRetryInterceptor.apply(
            createOrReplaceTaskAction(runtime, jobStore, refJobView.getJobHolder(), taskIndex, versionSupplier, clock, taskContext));
    return Optional.of(storeAction);
}
Also used : JobServiceRuntime(com.netflix.titus.master.jobmanager.service.JobServiceRuntime) TitusChangeAction(com.netflix.titus.master.jobmanager.service.common.action.TitusChangeAction) Task(com.netflix.titus.api.jobmanager.model.job.Task) CollectionsExt(com.netflix.titus.common.util.CollectionsExt) LoggerFactory(org.slf4j.LoggerFactory) RetryActionInterceptor(com.netflix.titus.master.jobmanager.service.common.interceptor.RetryActionInterceptor) RECONCILER_CALLMETADATA(com.netflix.titus.api.jobmanager.service.JobManagerConstants.RECONCILER_CALLMETADATA) FeatureActivationConfiguration(com.netflix.titus.api.FeatureActivationConfiguration) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) Map(java.util.Map) JobState(com.netflix.titus.api.jobmanager.model.job.JobState) BasicJobActions(com.netflix.titus.master.jobmanager.service.common.action.task.BasicJobActions) JobManagerConfiguration(com.netflix.titus.master.jobmanager.service.JobManagerConfiguration) Schedulers(rx.schedulers.Schedulers) JobStore(com.netflix.titus.api.jobmanager.store.JobStore) CallMetadata(com.netflix.titus.api.model.callmetadata.CallMetadata) JobManagerUtil(com.netflix.titus.master.jobmanager.service.JobManagerUtil) TaskRetryers(com.netflix.titus.master.jobmanager.service.common.action.TaskRetryers) Job(com.netflix.titus.api.jobmanager.model.job.Job) ImmutableMap(com.google.common.collect.ImmutableMap) TaskStatus(com.netflix.titus.api.jobmanager.model.job.TaskStatus) Set(java.util.Set) Scheduler(rx.Scheduler) DifferenceResolverUtils.getUnassignedIpAllocations(com.netflix.titus.master.jobmanager.service.common.DifferenceResolverUtils.getUnassignedIpAllocations) TaskState(com.netflix.titus.api.jobmanager.model.job.TaskState) List(java.util.List) VersionSupplier(com.netflix.titus.master.jobmanager.service.VersionSupplier) ReconciliationEngine(com.netflix.titus.common.framework.reconciler.ReconciliationEngine) Optional(java.util.Optional) 
JobManagerReconcilerEvent(com.netflix.titus.master.jobmanager.service.event.JobManagerReconcilerEvent) Clock(com.netflix.titus.common.util.time.Clock) KillInitiatedActions(com.netflix.titus.master.jobmanager.service.common.action.task.KillInitiatedActions) BatchJobTask(com.netflix.titus.api.jobmanager.model.job.BatchJobTask) ApplicationSlaManagementService(com.netflix.titus.master.service.management.ApplicationSlaManagementService) CreateOrReplaceBatchTaskActions.createOrReplaceTaskAction(com.netflix.titus.master.jobmanager.service.batch.action.CreateOrReplaceBatchTaskActions.createOrReplaceTaskAction) DifferenceResolverUtils(com.netflix.titus.master.jobmanager.service.common.DifferenceResolverUtils) Singleton(javax.inject.Singleton) ArrayList(java.util.ArrayList) HashSet(java.util.HashSet) Inject(javax.inject.Inject) BatchJobExt(com.netflix.titus.api.jobmanager.model.job.ext.BatchJobExt) ChangeAction(com.netflix.titus.common.framework.reconciler.ChangeAction) ApplicationSLA(com.netflix.titus.api.model.ApplicationSLA) DifferenceResolverUtils.getUnassignedEbsVolumes(com.netflix.titus.master.jobmanager.service.common.DifferenceResolverUtils.getUnassignedEbsVolumes) Named(javax.inject.Named) JobDescriptor(com.netflix.titus.api.jobmanager.model.job.JobDescriptor) Logger(org.slf4j.Logger) DifferenceResolverUtils.getTaskContext(com.netflix.titus.master.jobmanager.service.common.DifferenceResolverUtils.getTaskContext) Retryers(com.netflix.titus.common.util.retry.Retryers) EntityHolder(com.netflix.titus.common.framework.reconciler.EntityHolder) TimeUnit(java.util.concurrent.TimeUnit) TaskAttributes(com.netflix.titus.api.jobmanager.TaskAttributes) BasicTaskActions(com.netflix.titus.master.jobmanager.service.common.action.task.BasicTaskActions) TitusRuntime(com.netflix.titus.common.runtime.TitusRuntime) TokenBucket(com.netflix.titus.common.util.limiter.tokenbucket.TokenBucket) Collections(java.util.Collections)
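The safety check at the top of createNewTaskAction reduces to a simple rule: a new task may be created only while the number of running tasks is below the required job size. A minimal sketch of that rule follows. BatchTaskGuard and its boolean-list input are hypothetical simplifications. The original inspects Task entities through TaskState.isRunning and also records an invariant violation, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; each list element says whether one existing task is still running.
final class BatchTaskGuard {

    // True when creating one more task would not exceed the required job size.
    static boolean mayCreateTask(List<Boolean> taskRunningFlags, int requiredSize) {
        long running = taskRunningFlags.stream().filter(Boolean::booleanValue).count();
        return running < requiredSize;
    }

    public static void main(String[] args) {
        // Two of three tasks are running and the job requires three: one more is allowed.
        System.out.println(mayCreateTask(Arrays.asList(true, false, true), 3));
    }
}
```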

Example 43 with Task

use of com.netflix.titus.api.jobmanager.model.job.Task in project titus-control-plane by Netflix.

the class DifferenceResolverUtils method countActiveNotStartedTasks.

public static int countActiveNotStartedTasks(EntityHolder refJobHolder, EntityHolder runningJobHolder) {
    Set<String> pendingTaskIds = new HashSet<>();
    Consumer<EntityHolder> countingFun = jobHolder -> jobHolder.getChildren().forEach(taskHolder -> {
        TaskState state = ((Task) taskHolder.getEntity()).getStatus().getState();
        if (state != TaskState.Started && state != TaskState.Finished) {
            pendingTaskIds.add(taskHolder.getId());
        }
    });
    countingFun.accept(refJobHolder);
    countingFun.accept(runningJobHolder);
    return pendingTaskIds.size();
}
Also used : JobManagerConstants(com.netflix.titus.api.jobmanager.service.JobManagerConstants) JobServiceRuntime(com.netflix.titus.master.jobmanager.service.JobServiceRuntime) Task(com.netflix.titus.api.jobmanager.model.job.Task) HashMap(java.util.HashMap) Function(java.util.function.Function) TaskTimeoutChangeActions(com.netflix.titus.master.jobmanager.service.common.action.task.TaskTimeoutChangeActions) ArrayList(java.util.ArrayList) EbsVolume(com.netflix.titus.api.jobmanager.model.job.ebs.EbsVolume) TASK_ATTRIBUTES_EBS_VOLUME_ID(com.netflix.titus.api.jobmanager.TaskAttributes.TASK_ATTRIBUTES_EBS_VOLUME_ID) HashSet(java.util.HashSet) Map(java.util.Map) JobState(com.netflix.titus.api.jobmanager.model.job.JobState) BatchJobExt(com.netflix.titus.api.jobmanager.model.job.ext.BatchJobExt) ChangeAction(com.netflix.titus.common.framework.reconciler.ChangeAction) JobManagerConfiguration(com.netflix.titus.master.jobmanager.service.JobManagerConfiguration) JobStore(com.netflix.titus.api.jobmanager.store.JobStore) JobDescriptor(com.netflix.titus.api.jobmanager.model.job.JobDescriptor) Job(com.netflix.titus.api.jobmanager.model.job.Job) ServiceJobExt(com.netflix.titus.api.jobmanager.model.job.ext.ServiceJobExt) TaskStatus(com.netflix.titus.api.jobmanager.model.job.TaskStatus) Set(java.util.Set) JobFunctions(com.netflix.titus.api.jobmanager.model.job.JobFunctions) Collectors(java.util.stream.Collectors) TaskState(com.netflix.titus.api.jobmanager.model.job.TaskState) EntityHolder(com.netflix.titus.common.framework.reconciler.EntityHolder) Consumer(java.util.function.Consumer) List(java.util.List) ExecutableStatus(com.netflix.titus.api.jobmanager.model.job.ExecutableStatus) V3JobOperations(com.netflix.titus.api.jobmanager.service.V3JobOperations) VersionSupplier(com.netflix.titus.master.jobmanager.service.VersionSupplier) ReconciliationEngine(com.netflix.titus.common.framework.reconciler.ReconciliationEngine) Optional(java.util.Optional) 
BasicTaskActions(com.netflix.titus.master.jobmanager.service.common.action.task.BasicTaskActions) JobManagerReconcilerEvent(com.netflix.titus.master.jobmanager.service.event.JobManagerReconcilerEvent) TitusRuntime(com.netflix.titus.common.runtime.TitusRuntime) TokenBucket(com.netflix.titus.common.util.limiter.tokenbucket.TokenBucket) Clock(com.netflix.titus.common.util.time.Clock) KillInitiatedActions(com.netflix.titus.master.jobmanager.service.common.action.task.KillInitiatedActions) TASK_ATTRIBUTES_IP_ALLOCATION_ID(com.netflix.titus.api.jobmanager.TaskAttributes.TASK_ATTRIBUTES_IP_ALLOCATION_ID)
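countActiveNotStartedTasks collects task IDs into a Set, so a task that appears in both the reference view and the running view is counted once. The sketch below illustrates that behaviour under simplifying assumptions: PendingTaskCount, its State enum and the map-based views are hypothetical stand-ins for EntityHolder and TaskState, not Titus types.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical simplification: each view maps a task ID to its state.
final class PendingTaskCount {

    // Reduced state set, assumed for illustration only.
    enum State { Accepted, Launched, Started, Finished }

    static int countActiveNotStarted(Map<String, State> referenceView, Map<String, State> runningView) {
        // A set de-duplicates IDs that are present in both views.
        Set<String> pendingTaskIds = new HashSet<>();
        for (Map<String, State> view : Arrays.asList(referenceView, runningView)) {
            view.forEach((taskId, state) -> {
                if (state != State.Started && state != State.Finished) {
                    pendingTaskIds.add(taskId);
                }
            });
        }
        return pendingTaskIds.size();
    }

    public static void main(String[] args) {
        Map<String, State> reference = new HashMap<>();
        reference.put("t1", State.Accepted);
        reference.put("t2", State.Started);
        Map<String, State> running = new HashMap<>();
        running.put("t1", State.Accepted);
        running.put("t3", State.Launched);
        // t1 is pending in both views but counted once; t3 is pending; t2 is started.
        System.out.println(countActiveNotStarted(reference, running));
    }
}
```

Note that the result is a union: a task counts as pending if either view shows it in a state other than Started or Finished.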

Example 44 with Task

use of com.netflix.titus.api.jobmanager.model.job.Task in project titus-control-plane by Netflix.

the class DefaultDirectKubeApiServerIntegrator method launchTask.

@Override
public Mono<Void> launchTask(Job job, Task task) {
    return Mono.fromCallable(() -> {
        try {
            V1Pod v1Pod = podFactory.buildV1Pod(job, task);
            logger.info("creating pod: {}", formatPodEssentials(v1Pod));
            logger.debug("complete pod data: {}", v1Pod);
            return v1Pod;
        } catch (Exception e) {
            logger.error("Unable to convert job {} and task {} to pod: {}", job, task, KubeUtil.toErrorDetails(e), e);
            throw new IllegalStateException("Unable to convert task to pod " + task.getId(), e);
        }
    })
            .flatMap(v1Pod -> launchPod(task, v1Pod))
            .subscribeOn(apiClientScheduler)
            .timeout(Duration.ofMillis(configuration.getKubeApiClientTimeoutMs()))
            .doOnError(TimeoutException.class, e -> metrics.launchTimeout(configuration.getKubeApiClientTimeoutMs()))
            .ignoreElement()
            .cast(Void.class);
}
Also used : Stopwatch(com.google.common.base.Stopwatch) Task(com.netflix.titus.api.jobmanager.model.job.Task) PodDeletedEvent(com.netflix.titus.master.kubernetes.client.model.PodDeletedEvent) FluxSink(reactor.core.publisher.FluxSink) StdKubeApiFacade(com.netflix.titus.runtime.connector.kubernetes.std.StdKubeApiFacade) LoggerFactory(org.slf4j.LoggerFactory) TimeoutException(java.util.concurrent.TimeoutException) HashMap(java.util.HashMap) StringExt(com.netflix.titus.common.util.StringExt) V1Node(io.kubernetes.client.openapi.models.V1Node) Singleton(javax.inject.Singleton) ReactorExt(com.netflix.titus.common.util.rx.ReactorExt) KubeUtil(com.netflix.titus.master.kubernetes.KubeUtil) Scheduler(reactor.core.scheduler.Scheduler) ConcurrentMap(java.util.concurrent.ConcurrentMap) Inject(javax.inject.Inject) PreDestroy(javax.annotation.PreDestroy) KubeObjectFormatter.formatPodEssentials(com.netflix.titus.master.kubernetes.KubeObjectFormatter.formatPodEssentials) Duration(java.time.Duration) Map(java.util.Map) ResourceEventHandler(io.kubernetes.client.informer.ResourceEventHandler) Schedulers(reactor.core.scheduler.Schedulers) ExecutorService(java.util.concurrent.ExecutorService) FitFramework(com.netflix.titus.common.framework.fit.FitFramework) PodEvent(com.netflix.titus.master.kubernetes.client.model.PodEvent) ExecutorsExt(com.netflix.titus.common.util.ExecutorsExt) Job(com.netflix.titus.api.jobmanager.model.job.Job) Logger(org.slf4j.Logger) DirectProcessor(reactor.core.publisher.DirectProcessor) PodFactory(com.netflix.titus.master.kubernetes.pod.PodFactory) JsonSyntaxException(com.google.gson.JsonSyntaxException) FitInjection(com.netflix.titus.common.framework.fit.FitInjection) PodUpdatedEvent(com.netflix.titus.master.kubernetes.client.model.PodUpdatedEvent) ConcurrentHashMap(java.util.concurrent.ConcurrentHashMap) TaskStatus(com.netflix.titus.api.jobmanager.model.job.TaskStatus) Mono(reactor.core.publisher.Mono) 
TaskState(com.netflix.titus.api.jobmanager.model.job.TaskState) TimeUnit(java.util.concurrent.TimeUnit) Flux(reactor.core.publisher.Flux) KubeApiException(com.netflix.titus.runtime.connector.kubernetes.KubeApiException) Optional(java.util.Optional) TitusRuntime(com.netflix.titus.common.runtime.TitusRuntime) V1Pod(io.kubernetes.client.openapi.models.V1Pod)

Example 45 with Task

use of com.netflix.titus.api.jobmanager.model.job.Task in project titus-control-plane by Netflix.

the class BasicTaskActions method updateTaskAndWriteItToStore.

/**
 * Updates a task and writes it to the store before updating the reference and store models.
 * This action is used when handling user-initiated updates.
 */
public static TitusChangeAction updateTaskAndWriteItToStore(String taskId,
                                                            ReconciliationEngine<JobManagerReconcilerEvent> engine,
                                                            Function<Task, Task> changeFunction,
                                                            JobStore jobStore,
                                                            Trigger trigger,
                                                            String reason,
                                                            VersionSupplier versionSupplier,
                                                            TitusRuntime titusRuntime,
                                                            CallMetadata callMetadata) {
    return TitusChangeAction.newAction("updateTaskAndWriteItToStore")
            .id(taskId)
            .trigger(trigger)
            .summary(reason)
            .callMetadata(callMetadata)
            .changeWithModelUpdates(self -> JobEntityHolders.expectTask(engine, taskId, titusRuntime)
                    .map(task -> {
                        Task newTask = VersionSuppliers.nextVersion(changeFunction.apply(task), versionSupplier);
                        TitusModelAction modelUpdate = TitusModelAction.newModelUpdate(self).taskUpdate(newTask);
                        return jobStore.updateTask(newTask).andThen(Observable.just(ModelActionHolder.referenceAndStore(modelUpdate)));
                    })
                    .orElseGet(() -> Observable.error(JobManagerException.taskNotFound(taskId))));
}
Also used : TitusModelAction(com.netflix.titus.master.jobmanager.service.common.action.TitusModelAction) Task(com.netflix.titus.api.jobmanager.model.job.Task)

Aggregations

Task (com.netflix.titus.api.jobmanager.model.job.Task) 222
Test (org.junit.Test) 98
ArrayList (java.util.ArrayList) 63
List (java.util.List) 62
Job (com.netflix.titus.api.jobmanager.model.job.Job) 58
BatchJobTask (com.netflix.titus.api.jobmanager.model.job.BatchJobTask) 45
TaskStatus (com.netflix.titus.api.jobmanager.model.job.TaskStatus) 45
TaskState (com.netflix.titus.api.jobmanager.model.job.TaskState) 42
TitusRuntime (com.netflix.titus.common.runtime.TitusRuntime) 38
BatchJobExt (com.netflix.titus.api.jobmanager.model.job.ext.BatchJobExt) 34
Pair (com.netflix.titus.common.util.tuple.Pair) 32
V1Pod (io.kubernetes.client.openapi.models.V1Pod) 32
V3JobOperations (com.netflix.titus.api.jobmanager.service.V3JobOperations) 31
ServiceJobTask (com.netflix.titus.api.jobmanager.model.job.ServiceJobTask) 29
Optional (java.util.Optional) 27
Collections (java.util.Collections) 26
Collectors (java.util.stream.Collectors) 25
CallMetadata (com.netflix.titus.api.model.callmetadata.CallMetadata) 24
HashMap (java.util.HashMap) 24
TaskUpdateEvent (com.netflix.titus.api.jobmanager.model.job.event.TaskUpdateEvent) 23