Example 11 with JobStore

use of com.netflix.titus.api.jobmanager.store.JobStore in project titus-control-plane by Netflix.

the class CassandraJobStoreTest method testDeleteJob.

@Test
public void testDeleteJob() {
    JobStore store = getJobStore();
    Job<BatchJobExt> job = createBatchJobObject();
    store.init().await();
    store.storeJob(job).await();
    Pair<List<Job<?>>, Integer> jobsAndErrors = store.retrieveJobs().toBlocking().first();
    checkRetrievedJob(job, jobsAndErrors.getLeft().get(0));
    store.deleteJob(job).await();
    jobsAndErrors = store.retrieveJobs().toBlocking().first();
    assertThat(jobsAndErrors.getLeft()).isEmpty();
}
Also used : BatchJobExt(com.netflix.titus.api.jobmanager.model.job.ext.BatchJobExt) JobStore(com.netflix.titus.api.jobmanager.store.JobStore) ArrayList(java.util.ArrayList) List(java.util.List) Test(org.junit.Test) IntegrationNotParallelizableTest(com.netflix.titus.testkit.junit.category.IntegrationNotParallelizableTest)

Example 12 with JobStore

use of com.netflix.titus.api.jobmanager.store.JobStore in project titus-control-plane by Netflix.

the class TestStoreLoadCommand method execute.

@Override
public void execute(CommandContext commandContext) {
    CommandLine commandLine = commandContext.getCommandLine();
    String keyspace = commandContext.getTargetKeySpace();
    Integer jobs = Integer.valueOf(commandLine.getOptionValue("jobs"));
    Integer tasks = Integer.valueOf(commandLine.getOptionValue("tasks"));
    Integer concurrency = Integer.valueOf(commandLine.getOptionValue("concurrency"));
    Integer iterations = Integer.valueOf(commandLine.getOptionValue("iterations"));
    Session session = commandContext.getTargetSession();
    boolean keyspaceExists = session.getCluster().getMetadata().getKeyspace(keyspace) != null;
    if (!keyspaceExists) {
        throw new IllegalStateException("Keyspace: " + keyspace + " does not exist. You must create it first.");
    }
    session.execute("USE " + keyspace);
    JobStore titusStore = new CassandraJobStore(CONFIGURATION, session, TitusRuntimes.internal());
    // Create jobs and tasks
    long jobStartTime = System.currentTimeMillis();
    List<Observable<Void>> createJobAndTasksObservables = new ArrayList<>();
    for (int i = 0; i < jobs; i++) {
        createJobAndTasksObservables.add(createJobAndTasksObservable(tasks, titusStore));
    }
    Observable.merge(createJobAndTasksObservables, concurrency).toBlocking().subscribe(none -> {
    }, e -> logger.error("Error creating jobs: ", e), () -> {
        logger.info("Created {} jobs with {} tasks in {}[ms]", jobs, tasks, System.currentTimeMillis() - jobStartTime);
    });
    // Load all jobs and their tasks once per iteration, timing each pass
    long loadTotalTime = 0L;
    for (int i = 0; i < iterations; i++) {
        long loadStartTime = System.currentTimeMillis();
        List<Pair<Job, List<Task>>> pairs = new ArrayList<>();
        titusStore.init().andThen(titusStore.retrieveJobs().flatMap(retrievedJobsAndErrors -> {
            List<Job<?>> retrievedJobs = retrievedJobsAndErrors.getLeft();
            List<Observable<Pair<Job, List<Task>>>> retrieveTasksObservables = new ArrayList<>();
            for (Job job : retrievedJobs) {
                Observable<Pair<Job, List<Task>>> retrieveTasksObservable = titusStore.retrieveTasksForJob(job.getId()).map(taskList -> new Pair<>(job, taskList.getLeft()));
                retrieveTasksObservables.add(retrieveTasksObservable);
            }
            return Observable.merge(retrieveTasksObservables, MAX_RETRIEVE_TASK_CONCURRENCY);
        })).map(p -> {
            pairs.add(p);
            return null;
        }).toBlocking().subscribe(none -> {
        }, e -> logger.error("Failed to load jobs from cassandra with error: ", e), () -> {
        });
        long loadTime = System.currentTimeMillis() - loadStartTime;
        logger.info("Loaded {} jobs from cassandra in {}[ms]", pairs.size(), loadTime);
        loadTotalTime += loadTime;
    }
    logger.info("Average load time: {}[ms]", loadTotalTime / iterations);
}
Also used : CassandraJobStore(com.netflix.titus.ext.cassandra.store.CassandraJobStore) BatchJobTask(com.netflix.titus.api.jobmanager.model.job.BatchJobTask) Task(com.netflix.titus.api.jobmanager.model.job.Task) ArrayList(java.util.ArrayList) JobStore(com.netflix.titus.api.jobmanager.store.JobStore) Observable(rx.Observable) CommandLine(org.apache.commons.cli.CommandLine) List(java.util.List) Job(com.netflix.titus.api.jobmanager.model.job.Job) Session(com.datastax.driver.core.Session) Pair(com.netflix.titus.common.util.tuple.Pair)
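
The load loop above times each pass and logs `loadTotalTime / iterations`, which would throw an ArithmeticException if the "iterations" option were 0. A minimal sketch of the same timing-and-averaging step in plain Java, with an explicit guard. The class `LoadTimingSketch`, the method `averageMillis` and the injectable clock are hypothetical names for illustration and are not part of titus-control-plane.

```java
import java.util.function.LongSupplier;

// Illustrative sketch only: hypothetical names, not code from titus-control-plane.
final class LoadTimingSketch {

    /**
     * Runs the action the given number of times and returns the mean
     * elapsed time per run, in the clock's units (milliseconds here).
     */
    static long averageMillis(int iterations, Runnable action, LongSupplier clockMillis) {
        if (iterations <= 0) {
            // Guard added in this sketch; the listing divides without checking.
            throw new IllegalArgumentException("iterations must be positive");
        }
        long total = 0L;
        for (int i = 0; i < iterations; i++) {
            long start = clockMillis.getAsLong();
            action.run();
            total += clockMillis.getAsLong() - start;
        }
        // Integer division, as in the listing.
        return total / iterations;
    }

    public static void main(String[] args) {
        long avg = averageMillis(5, () -> { /* load jobs here */ }, System::currentTimeMillis);
        System.out.println("Average load time: " + avg + "[ms]");
    }
}
```

Passing the clock as a parameter is a choice made here so the averaging can be checked with a fake clock; the listing calls System.currentTimeMillis() directly.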

Example 13 with JobStore

use of com.netflix.titus.api.jobmanager.store.JobStore in project titus-control-plane by Netflix.

the class BatchDifferenceResolver method createNewTaskAction.

private Optional<TitusChangeAction> createNewTaskAction(BatchJobView refJobView, int taskIndex, Optional<EntityHolder> previousTask, List<String> unassignedIpAllocations, List<String> ebsVolumeIds) {
    // Safety check
    long numberOfNotFinishedTasks = refJobView.getJobHolder().getChildren().stream().filter(holder -> TaskState.isRunning(((Task) holder.getEntity()).getStatus().getState())).count();
    if (numberOfNotFinishedTasks >= refJobView.getRequiredSize()) {
        titusRuntime.getCodeInvariants().inconsistent("Batch job reconciler attempts to create too many tasks: jobId=%s, requiredSize=%s, current=%s", refJobView.getJob().getId(), refJobView.getRequiredSize(), numberOfNotFinishedTasks);
        return Optional.empty();
    }
    Map<String, String> taskContext = getTaskContext(previousTask, unassignedIpAllocations, ebsVolumeIds);
    JobDescriptor jobDescriptor = refJobView.getJob().getJobDescriptor();
    ApplicationSLA capacityGroupDescriptor = JobManagerUtil.getCapacityGroupDescriptor(jobDescriptor, capacityGroupService);
    String resourcePool = capacityGroupDescriptor.getResourcePool();
    taskContext = CollectionsExt.copyAndAdd(taskContext, ImmutableMap.of(TaskAttributes.TASK_ATTRIBUTES_RESOURCE_POOL, resourcePool, TaskAttributes.TASK_ATTRIBUTES_TIER, capacityGroupDescriptor.getTier().name()));
    TitusChangeAction storeAction = storeWriteRetryInterceptor.apply(createOrReplaceTaskAction(runtime, jobStore, refJobView.getJobHolder(), taskIndex, versionSupplier, clock, taskContext));
    return Optional.of(storeAction);
}
Also used : JobServiceRuntime(com.netflix.titus.master.jobmanager.service.JobServiceRuntime) TitusChangeAction(com.netflix.titus.master.jobmanager.service.common.action.TitusChangeAction) Task(com.netflix.titus.api.jobmanager.model.job.Task) CollectionsExt(com.netflix.titus.common.util.CollectionsExt) LoggerFactory(org.slf4j.LoggerFactory) RetryActionInterceptor(com.netflix.titus.master.jobmanager.service.common.interceptor.RetryActionInterceptor) RECONCILER_CALLMETADATA(com.netflix.titus.api.jobmanager.service.JobManagerConstants.RECONCILER_CALLMETADATA) FeatureActivationConfiguration(com.netflix.titus.api.FeatureActivationConfiguration) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) Map(java.util.Map) JobState(com.netflix.titus.api.jobmanager.model.job.JobState) BasicJobActions(com.netflix.titus.master.jobmanager.service.common.action.task.BasicJobActions) JobManagerConfiguration(com.netflix.titus.master.jobmanager.service.JobManagerConfiguration) Schedulers(rx.schedulers.Schedulers) JobStore(com.netflix.titus.api.jobmanager.store.JobStore) CallMetadata(com.netflix.titus.api.model.callmetadata.CallMetadata) JobManagerUtil(com.netflix.titus.master.jobmanager.service.JobManagerUtil) TaskRetryers(com.netflix.titus.master.jobmanager.service.common.action.TaskRetryers) Job(com.netflix.titus.api.jobmanager.model.job.Job) ImmutableMap(com.google.common.collect.ImmutableMap) TaskStatus(com.netflix.titus.api.jobmanager.model.job.TaskStatus) Set(java.util.Set) Scheduler(rx.Scheduler) DifferenceResolverUtils.getUnassignedIpAllocations(com.netflix.titus.master.jobmanager.service.common.DifferenceResolverUtils.getUnassignedIpAllocations) TaskState(com.netflix.titus.api.jobmanager.model.job.TaskState) List(java.util.List) VersionSupplier(com.netflix.titus.master.jobmanager.service.VersionSupplier) ReconciliationEngine(com.netflix.titus.common.framework.reconciler.ReconciliationEngine) Optional(java.util.Optional) 
JobManagerReconcilerEvent(com.netflix.titus.master.jobmanager.service.event.JobManagerReconcilerEvent) Clock(com.netflix.titus.common.util.time.Clock) KillInitiatedActions(com.netflix.titus.master.jobmanager.service.common.action.task.KillInitiatedActions) BatchJobTask(com.netflix.titus.api.jobmanager.model.job.BatchJobTask) ApplicationSlaManagementService(com.netflix.titus.master.service.management.ApplicationSlaManagementService) CreateOrReplaceBatchTaskActions.createOrReplaceTaskAction(com.netflix.titus.master.jobmanager.service.batch.action.CreateOrReplaceBatchTaskActions.createOrReplaceTaskAction) DifferenceResolverUtils(com.netflix.titus.master.jobmanager.service.common.DifferenceResolverUtils) Singleton(javax.inject.Singleton) ArrayList(java.util.ArrayList) HashSet(java.util.HashSet) Inject(javax.inject.Inject) BatchJobExt(com.netflix.titus.api.jobmanager.model.job.ext.BatchJobExt) ChangeAction(com.netflix.titus.common.framework.reconciler.ChangeAction) ApplicationSLA(com.netflix.titus.api.model.ApplicationSLA) DifferenceResolverUtils.getUnassignedEbsVolumes(com.netflix.titus.master.jobmanager.service.common.DifferenceResolverUtils.getUnassignedEbsVolumes) Named(javax.inject.Named) JobDescriptor(com.netflix.titus.api.jobmanager.model.job.JobDescriptor) Logger(org.slf4j.Logger) DifferenceResolverUtils.getTaskContext(com.netflix.titus.master.jobmanager.service.common.DifferenceResolverUtils.getTaskContext) Retryers(com.netflix.titus.common.util.retry.Retryers) EntityHolder(com.netflix.titus.common.framework.reconciler.EntityHolder) TimeUnit(java.util.concurrent.TimeUnit) TaskAttributes(com.netflix.titus.api.jobmanager.TaskAttributes) BasicTaskActions(com.netflix.titus.master.jobmanager.service.common.action.task.BasicTaskActions) TitusRuntime(com.netflix.titus.common.runtime.TitusRuntime) TokenBucket(com.netflix.titus.common.util.limiter.tokenbucket.TokenBucket) Collections(java.util.Collections)
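
The safety check at the top of createNewTaskAction counts the tasks that are still active and refuses to create another one once that count reaches the required job size. A minimal sketch of that guard in isolation, under stated assumptions: `TaskCreationGuardSketch`, its simplified `State` enum and `planNewTask` are hypothetical, and treating every state other than FINISHED as active is an assumption of this sketch that may not match the real `TaskState.isRunning`.

```java
import java.util.List;
import java.util.Optional;

// Illustrative sketch only: hypothetical names, not code from titus-control-plane.
final class TaskCreationGuardSketch {

    // Simplified stand-in for the task state model.
    enum State { ACCEPTED, STARTED, FINISHED }

    /**
     * Returns a description of the task to create, or Optional.empty()
     * when the job already has at least requiredSize active tasks.
     */
    static Optional<String> planNewTask(List<State> taskStates, int requiredSize, int taskIndex) {
        // Assumption: anything not FINISHED counts as active.
        long active = taskStates.stream().filter(s -> s != State.FINISHED).count();
        if (active >= requiredSize) {
            // The listing also reports a code-invariant violation at this point.
            return Optional.empty();
        }
        return Optional.of("create task at index " + taskIndex);
    }
}
```

Returning Optional.empty() mirrors the listing, where the caller receives no change action when the guard trips.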

Example 14 with JobStore

use of com.netflix.titus.api.jobmanager.store.JobStore in project titus-control-plane by Netflix.

the class BasicTaskActions method writeReferenceTaskToStore.

/**
 * Write updated task record to a store. If a task is completed, remove it from the scheduling service.
 * This command calls {@link JobStore#updateTask(Task)}, which assumes that the task record was created already.
 */
public static TitusChangeAction writeReferenceTaskToStore(JobStore titusStore, ReconciliationEngine<JobManagerReconcilerEvent> engine, String taskId, CallMetadata callMetadata, TitusRuntime titusRuntime) {
    return TitusChangeAction.newAction("writeReferenceTaskToStore").trigger(V3JobOperations.Trigger.Reconciler).id(taskId).summary("Persisting task to the store").callMetadata(callMetadata).changeWithModelUpdate(self -> {
        Optional<EntityHolder> taskHolder = engine.getReferenceView().findById(taskId);
        if (!taskHolder.isPresent()) {
            // Should never happen
            titusRuntime.getCodeInvariants().inconsistent("Reference task with id %s not found.", taskId);
            return Observable.empty();
        }
        Task referenceTask = taskHolder.get().getEntity();
        return titusStore.updateTask(referenceTask).andThen(Observable.fromCallable(() -> {
            TitusModelAction modelUpdateAction = TitusModelAction.newModelUpdate(self).taskUpdate(storeRoot -> {
                EntityHolder storedHolder = EntityHolder.newRoot(referenceTask.getId(), referenceTask);
                return Pair.of(storeRoot.addChild(storedHolder), storedHolder);
            });
            return ModelActionHolder.store(modelUpdateAction);
        }));
    });
}
Also used : Trigger(com.netflix.titus.api.jobmanager.service.V3JobOperations.Trigger) DateTimeExt(com.netflix.titus.common.util.DateTimeExt) JobModel(com.netflix.titus.api.jobmanager.model.job.JobModel) JobServiceRuntime(com.netflix.titus.master.jobmanager.service.JobServiceRuntime) TitusChangeAction(com.netflix.titus.master.jobmanager.service.common.action.TitusChangeAction) Task(com.netflix.titus.api.jobmanager.model.job.Task) CollectionsExt(com.netflix.titus.common.util.CollectionsExt) ReactorExt(com.netflix.titus.common.util.rx.ReactorExt) Function(java.util.function.Function) ArrayList(java.util.ArrayList) Observable(rx.Observable) Pair(com.netflix.titus.common.util.tuple.Pair) JobManagerConfiguration(com.netflix.titus.master.jobmanager.service.JobManagerConfiguration) JobManagerException(com.netflix.titus.api.jobmanager.service.JobManagerException) ExceptionExt(com.netflix.titus.common.util.ExceptionExt) JobEntityHolders(com.netflix.titus.master.jobmanager.service.common.action.JobEntityHolders) JobStore(com.netflix.titus.api.jobmanager.store.JobStore) CallMetadata(com.netflix.titus.api.model.callmetadata.CallMetadata) TaskRetryers(com.netflix.titus.master.jobmanager.service.common.action.TaskRetryers) Job(com.netflix.titus.api.jobmanager.model.job.Job) TaskStatus(com.netflix.titus.api.jobmanager.model.job.TaskStatus) JobFunctions(com.netflix.titus.api.jobmanager.model.job.JobFunctions) TitusModelAction(com.netflix.titus.master.jobmanager.service.common.action.TitusModelAction) TaskState(com.netflix.titus.api.jobmanager.model.job.TaskState) EntityHolder(com.netflix.titus.common.framework.reconciler.EntityHolder) ModelActionHolder(com.netflix.titus.common.framework.reconciler.ModelActionHolder) List(java.util.List) V3JobOperations(com.netflix.titus.api.jobmanager.service.V3JobOperations) VersionSupplier(com.netflix.titus.master.jobmanager.service.VersionSupplier) ReconciliationEngine(com.netflix.titus.common.framework.reconciler.ReconciliationEngine) 
VersionSuppliers(com.netflix.titus.master.jobmanager.service.VersionSuppliers) TaskAttributes(com.netflix.titus.api.jobmanager.TaskAttributes) Optional(java.util.Optional) JobManagerReconcilerEvent(com.netflix.titus.master.jobmanager.service.event.JobManagerReconcilerEvent) TitusRuntime(com.netflix.titus.common.runtime.TitusRuntime) Collections(java.util.Collections)
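
writeReferenceTaskToStore looks the task up in the reference view, reports an inconsistency if it is missing, and otherwise calls an update that, per its Javadoc, assumes the record was created already. A minimal sketch of that control flow using in-memory maps. All names (`WriteReferenceSketch`, `Outcome`, the two maps) are hypothetical, and signalling a missing store record with `NOT_CREATED` is an assumption of this sketch; the listing does not show what the real store does in that case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: hypothetical names, not code from titus-control-plane.
final class WriteReferenceSketch {

    enum Outcome { MISSING_REFERENCE, NOT_CREATED, UPDATED }

    final Map<String, String> referenceView = new HashMap<>(); // task id -> in-memory record
    final Map<String, String> store = new HashMap<>();         // task id -> persisted record
    final List<String> violations = new ArrayList<>();         // recorded inconsistencies

    Outcome writeReferenceTask(String taskId) {
        String record = referenceView.get(taskId);
        if (record == null) {
            // "Should never happen" branch: record the inconsistency and do nothing.
            violations.add("Reference task with id " + taskId + " not found.");
            return Outcome.MISSING_REFERENCE;
        }
        // Map.replace only writes when a mapping already exists (update, not create).
        String previous = store.replace(taskId, record);
        return previous == null ? Outcome.NOT_CREATED : Outcome.UPDATED;
    }
}
```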

Example 15 with JobStore

use of com.netflix.titus.api.jobmanager.store.JobStore in project titus-control-plane by Netflix.

the class KillInitiatedActions method reconcilerInitiatedAllTasksKillInitiated.

/**
 * For all active tasks, send terminate command to the compute provider, and change their state to {@link TaskState#KillInitiated}.
 * This method is used for internal state reconciliation.
 */
public static List<ChangeAction> reconcilerInitiatedAllTasksKillInitiated(ReconciliationEngine<JobManagerReconcilerEvent> engine, JobServiceRuntime runtime, JobStore jobStore, String reasonCode, String reason, int concurrencyLimit, VersionSupplier versionSupplier, TitusRuntime titusRuntime) {
    List<ChangeAction> result = new ArrayList<>();
    EntityHolder runningView = engine.getRunningView();
    Set<String> runningTaskIds = new HashSet<>();
    runningView.getChildren().forEach(taskHolder -> runningTaskIds.add(taskHolder.<Task>getEntity().getId()));
    // Immediately finish Accepted tasks, which are not yet in the running model.
    for (EntityHolder entityHolder : engine.getReferenceView().getChildren()) {
        if (result.size() >= concurrencyLimit) {
            return result;
        }
        Task task = entityHolder.getEntity();
        TaskState state = task.getStatus().getState();
        if (state == TaskState.Accepted && !runningTaskIds.contains(task.getId())) {
            result.add(BasicTaskActions.updateTaskAndWriteItToStore(task.getId(), engine, taskRef -> JobFunctions.changeTaskStatus(taskRef, TaskState.Finished, reasonCode, reason, titusRuntime.getClock()), jobStore, V3JobOperations.Trigger.Reconciler, reason, versionSupplier, titusRuntime, JobManagerConstants.RECONCILER_CALLMETADATA.toBuilder().withCallReason(reason).build()));
        }
    }
    // Move running tasks to KillInitiated state
    for (EntityHolder taskHolder : runningView.getChildren()) {
        if (result.size() >= concurrencyLimit) {
            return result;
        }
        Task task = taskHolder.getEntity();
        TaskState state = task.getStatus().getState();
        if (state != TaskState.KillInitiated && state != TaskState.Finished) {
            result.add(reconcilerInitiatedTaskKillInitiated(engine, task, runtime, jobStore, versionSupplier, reasonCode, reason, titusRuntime));
        }
    }
    return result;
}
Also used : Completable(rx.Completable) JobManagerConstants(com.netflix.titus.api.jobmanager.service.JobManagerConstants) JobServiceRuntime(com.netflix.titus.master.jobmanager.service.JobServiceRuntime) TitusChangeAction(com.netflix.titus.master.jobmanager.service.common.action.TitusChangeAction) Task(com.netflix.titus.api.jobmanager.model.job.Task) Callable(java.util.concurrent.Callable) ReactorExt(com.netflix.titus.common.util.rx.ReactorExt) ArrayList(java.util.ArrayList) Observable(rx.Observable) HashSet(java.util.HashSet) JobStatus(com.netflix.titus.api.jobmanager.model.job.JobStatus) JobState(com.netflix.titus.api.jobmanager.model.job.JobState) ChangeAction(com.netflix.titus.common.framework.reconciler.ChangeAction) JobManagerException(com.netflix.titus.api.jobmanager.service.JobManagerException) JobEntityHolders(com.netflix.titus.master.jobmanager.service.common.action.JobEntityHolders) JobStore(com.netflix.titus.api.jobmanager.store.JobStore) CallMetadata(com.netflix.titus.api.model.callmetadata.CallMetadata) Job(com.netflix.titus.api.jobmanager.model.job.Job) ServiceJobExt(com.netflix.titus.api.jobmanager.model.job.ext.ServiceJobExt) TaskStatus(com.netflix.titus.api.jobmanager.model.job.TaskStatus) Set(java.util.Set) JobFunctions(com.netflix.titus.api.jobmanager.model.job.JobFunctions) TitusModelAction(com.netflix.titus.master.jobmanager.service.common.action.TitusModelAction) TaskState(com.netflix.titus.api.jobmanager.model.job.TaskState) Capacity(com.netflix.titus.api.jobmanager.model.job.Capacity) EntityHolder(com.netflix.titus.common.framework.reconciler.EntityHolder) ModelActionHolder(com.netflix.titus.common.framework.reconciler.ModelActionHolder) List(java.util.List) V3JobOperations(com.netflix.titus.api.jobmanager.service.V3JobOperations) VersionSupplier(com.netflix.titus.master.jobmanager.service.VersionSupplier) ReconciliationEngine(com.netflix.titus.common.framework.reconciler.ReconciliationEngine) 
VersionSuppliers(com.netflix.titus.master.jobmanager.service.VersionSuppliers) Optional(java.util.Optional) JobManagerReconcilerEvent(com.netflix.titus.master.jobmanager.service.event.JobManagerReconcilerEvent) TitusRuntime(com.netflix.titus.common.runtime.TitusRuntime) Collections(java.util.Collections)
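
reconcilerInitiatedAllTasksKillInitiated makes two passes, each capped by concurrencyLimit: Accepted tasks that are not yet in the running view are finished immediately, then running tasks that are neither KillInitiated nor Finished are moved to KillInitiated. A minimal sketch of that two-pass selection, with action strings standing in for the real change actions. `KillPlanSketch`, its `State` enum and the "finish:"/"kill:" labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: hypothetical names, not code from titus-control-plane.
final class KillPlanSketch {

    // Simplified stand-in for the task state model.
    enum State { ACCEPTED, STARTED, KILL_INITIATED, FINISHED }

    /** Builds at most limit actions from the reference and running views (task id -> state). */
    static List<String> plan(Map<String, State> referenceTasks, Map<String, State> runningTasks, int limit) {
        List<String> actions = new ArrayList<>();
        // Pass 1: finish Accepted tasks that never reached the running view.
        for (Map.Entry<String, State> e : referenceTasks.entrySet()) {
            if (actions.size() >= limit) {
                return actions;
            }
            if (e.getValue() == State.ACCEPTED && !runningTasks.containsKey(e.getKey())) {
                actions.add("finish:" + e.getKey());
            }
        }
        // Pass 2: start killing running tasks that are not already terminating or done.
        for (Map.Entry<String, State> e : runningTasks.entrySet()) {
            if (actions.size() >= limit) {
                return actions;
            }
            if (e.getValue() != State.KILL_INITIATED && e.getValue() != State.FINISHED) {
                actions.add("kill:" + e.getKey());
            }
        }
        return actions;
    }
}
```

Because the limit is checked before each candidate, the result never exceeds the limit, and any remaining tasks are left for a later call.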

Aggregations

JobStore (com.netflix.titus.api.jobmanager.store.JobStore): 25
List (java.util.List): 24
ArrayList (java.util.ArrayList): 23
BatchJobExt (com.netflix.titus.api.jobmanager.model.job.ext.BatchJobExt): 18
Task (com.netflix.titus.api.jobmanager.model.job.Task): 17
IntegrationNotParallelizableTest (com.netflix.titus.testkit.junit.category.IntegrationNotParallelizableTest): 14
Test (org.junit.Test): 14
BatchJobTask (com.netflix.titus.api.jobmanager.model.job.BatchJobTask): 13
ServiceJobTask (com.netflix.titus.api.jobmanager.model.job.ServiceJobTask): 11
Job (com.netflix.titus.api.jobmanager.model.job.Job): 8
TitusRuntime (com.netflix.titus.common.runtime.TitusRuntime): 6
TaskState (com.netflix.titus.api.jobmanager.model.job.TaskState): 5
TaskStatus (com.netflix.titus.api.jobmanager.model.job.TaskStatus): 5
EntityHolder (com.netflix.titus.common.framework.reconciler.EntityHolder): 5
ReconciliationEngine (com.netflix.titus.common.framework.reconciler.ReconciliationEngine): 5
JobServiceRuntime (com.netflix.titus.master.jobmanager.service.JobServiceRuntime): 5
VersionSupplier (com.netflix.titus.master.jobmanager.service.VersionSupplier): 5
JobFunctions (com.netflix.titus.api.jobmanager.model.job.JobFunctions): 4
JobState (com.netflix.titus.api.jobmanager.model.job.JobState): 4
ServiceJobExt (com.netflix.titus.api.jobmanager.model.job.ext.ServiceJobExt): 4