Example 1 with TaskRelocationPlan

use of com.netflix.titus.api.relocation.model.TaskRelocationPlan in project titus-control-plane by Netflix.

the class JooqTaskRelocationStoreTest method testRelocationPlanStoreCrud.

@Test
public void testRelocationPlanStoreCrud() {
    List<TaskRelocationPlan> plans = newRelocationPlans(1);
    TaskRelocationPlan plan = plans.get(0);
    // Create
    Map<String, Optional<Throwable>> result = store.createOrUpdateTaskRelocationPlans(plans).block();
    assertThat(result).hasSize(1);
    assertThat(result.get(plan.getTaskId())).isEmpty();
    // Reboot (to force reload from the database).
    this.store = newStore();
    // Read
    assertThat(store.getAllTaskRelocationPlans().block()).hasSize(1);
    assertThat(store.getAllTaskRelocationPlans().block().get(plan.getTaskId())).isEqualTo(plan);
    // Update
    TaskRelocationPlan updatedPlan = plan.toBuilder().withReasonMessage("Updated...").build();
    Map<String, Optional<Throwable>> updatedPlanResult = store.createOrUpdateTaskRelocationPlans(Collections.singletonList(updatedPlan)).block();
    assertThat(updatedPlanResult).hasSize(1);
    assertThat(store.getAllTaskRelocationPlans().block().get(plan.getTaskId())).isEqualTo(updatedPlan);
    // Delete
    Map<String, Optional<Throwable>> deleteResult = store.removeTaskRelocationPlans(Collections.singleton(plan.getTaskId())).block();
    assertThat(deleteResult).hasSize(1);
    // Reboot
    this.store = newStore();
    assertThat(store.getAllTaskRelocationPlans().block()).hasSize(0);
}
Also used : Optional(java.util.Optional) TaskRelocationPlan(com.netflix.titus.api.relocation.model.TaskRelocationPlan) Test(org.junit.Test) SpringBootTest(org.springframework.boot.test.context.SpringBootTest)
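
The assertions in this test rely on a result convention in which each task ID maps to an Optional<Throwable>, and an empty value means the operation succeeded for that task. A minimal sketch of that convention follows, using a plain in-memory map in place of the real store. ResultMapSketch, its validation rule, and failureCount are hypothetical names introduced for illustration and are not part of Titus.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a plan store; not the Titus implementation.
class ResultMapSketch {

    // Writes each plan into `storage` and reports a per-task outcome:
    // Optional.empty() for success, Optional.of(error) for failure.
    static Map<String, Optional<Throwable>> createOrUpdate(Map<String, String> storage,
                                                           Map<String, String> plans) {
        Map<String, Optional<Throwable>> result = new LinkedHashMap<>();
        plans.forEach((taskId, reasonMessage) -> {
            if (reasonMessage == null || reasonMessage.isEmpty()) {
                // Validation rule invented for this sketch only.
                result.put(taskId, Optional.of(new IllegalArgumentException("Empty reason message: " + taskId)));
            } else {
                storage.put(taskId, reasonMessage);
                result.put(taskId, Optional.empty());
            }
        });
        return result;
    }

    // Counts the entries that carry an error.
    static long failureCount(Map<String, Optional<Throwable>> result) {
        return result.values().stream().filter(Optional::isPresent).count();
    }

    public static void main(String[] args) {
        Map<String, String> storage = new LinkedHashMap<>();
        Map<String, String> plans = new LinkedHashMap<>();
        plans.put("task1", "Updated...");
        plans.put("task2", "");
        Map<String, Optional<Throwable>> result = createOrUpdate(storage, plans);
        System.out.println("failures=" + failureCount(result) + ", stored=" + storage.keySet());
    }
}
```

Reporting one entry per task lets a caller handle partial failures without losing the successful writes, which is why the test checks both the map size and the emptiness of the entry.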

Example 2 with TaskRelocationPlan

use of com.netflix.titus.api.relocation.model.TaskRelocationPlan in project titus-control-plane by Netflix.

the class DefaultDeschedulerServiceTest method verifyRelocationPlan.

private void verifyRelocationPlan(long relocationDelay, String reasonMessage) {
    ReadOnlyJobOperations jobOperations = mock(ReadOnlyJobOperations.class);
    DefaultDeschedulerService dds = new DefaultDeschedulerService(jobOperations, mock(ReadOnlyEvictionOperations.class), new KubernetesNodeDataResolver(configuration, TestDataFactory.mockFabric8IOConnector(), node -> true), () -> "foo|bar", titusRuntime);
    Job<ServiceJobExt> job = JobGenerator.serviceJobs(oneTaskServiceJobDescriptor().but(ofServiceSize(2), withDisruptionBudget(budget(selfManagedPolicy(relocationDelay), unlimitedRate(), Collections.emptyList())))).getValue();
    ServiceJobTask task = JobGenerator.serviceTasks(job).getValue();
    when(jobOperations.getJob(job.getId())).thenReturn(Optional.of(job));
    TitusNode node = TitusNode.newBuilder().withId("node1").withServerGroupId("asg1").withRelocationRequired(true).withBadCondition(false).build();
    // Advance test clock
    long clockAdvancedMs = 5_000;
    TestClock testClock = (TestClock) titusRuntime.getClock();
    testClock.advanceTime(Duration.ofMillis(clockAdvancedMs));
    Optional<TaskRelocationPlan> relocationPlanForTask = dds.getRelocationPlanForTask(node, task, Collections.emptyMap());
    assertThat(relocationPlanForTask).isPresent();
    assertThat(relocationPlanForTask.get().getTaskId()).isEqualTo(task.getId());
    // Relocation time is expected to be the decision time (the advanced clock value) plus relocationDelay
    assertThat(relocationPlanForTask.get().getRelocationTime()).isEqualTo(relocationDelay + clockAdvancedMs);
    assertThat(relocationPlanForTask.get().getDecisionTime()).isEqualTo(clockAdvancedMs);
    assertThat(relocationPlanForTask.get().getReasonMessage()).isEqualTo(reasonMessage);
}
Also used : TestDataFactory(com.netflix.titus.supplementary.relocation.TestDataFactory) Archaius2Ext(com.netflix.titus.common.util.archaius2.Archaius2Ext) DisruptionBudgetGenerator.unlimitedRate(com.netflix.titus.testkit.model.eviction.DisruptionBudgetGenerator.unlimitedRate) RelocationAttributes(com.netflix.titus.runtime.RelocationAttributes) Task(com.netflix.titus.api.jobmanager.model.job.Task) DisruptionBudgetGenerator.budget(com.netflix.titus.testkit.model.eviction.DisruptionBudgetGenerator.budget) Assertions.assertThat(org.assertj.core.api.Assertions.assertThat) KubernetesNodeDataResolver(com.netflix.titus.supplementary.relocation.connector.KubernetesNodeDataResolver) ServiceJobTask(com.netflix.titus.api.jobmanager.model.job.ServiceJobTask) TitusRuntimes(com.netflix.titus.common.runtime.TitusRuntimes) RelocationConfiguration(com.netflix.titus.supplementary.relocation.RelocationConfiguration) Duration(java.time.Duration) RelocationConnectorStubs(com.netflix.titus.supplementary.relocation.RelocationConnectorStubs) JobFunctions.ofServiceSize(com.netflix.titus.api.jobmanager.model.job.JobFunctions.ofServiceSize) ReadOnlyJobOperations(com.netflix.titus.api.jobmanager.service.ReadOnlyJobOperations) TaskRelocationReason(com.netflix.titus.api.relocation.model.TaskRelocationPlan.TaskRelocationReason) TaskRelocationPlan(com.netflix.titus.api.relocation.model.TaskRelocationPlan) MutableDataGenerator(com.netflix.titus.common.data.generator.MutableDataGenerator) JobDescriptorGenerator.oneTaskServiceJobDescriptor(com.netflix.titus.testkit.model.job.JobDescriptorGenerator.oneTaskServiceJobDescriptor) DeschedulingResult(com.netflix.titus.supplementary.relocation.model.DeschedulingResult) DisruptionBudgetGenerator.selfManagedPolicy(com.netflix.titus.testkit.model.eviction.DisruptionBudgetGenerator.selfManagedPolicy) Job(com.netflix.titus.api.jobmanager.model.job.Job) JobFunctions.withJobId(com.netflix.titus.api.jobmanager.model.job.JobFunctions.withJobId) ServiceJobExt(com.netflix.titus.api.jobmanager.model.job.ext.ServiceJobExt) JobFunctions.withDisruptionBudget(com.netflix.titus.api.jobmanager.model.job.JobFunctions.withDisruptionBudget) JobGenerator(com.netflix.titus.testkit.model.job.JobGenerator) Test(org.junit.Test) Mockito.when(org.mockito.Mockito.when) List(java.util.List) ReadOnlyEvictionOperations(com.netflix.titus.api.eviction.service.ReadOnlyEvictionOperations) Optional(java.util.Optional) TitusRuntime(com.netflix.titus.common.runtime.TitusRuntime) TestClock(com.netflix.titus.common.util.time.TestClock) Collections(java.util.Collections) TitusNode(com.netflix.titus.supplementary.relocation.connector.TitusNode) Mockito.mock(org.mockito.Mockito.mock)
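
The timing assertions in this helper reduce to simple arithmetic: the decision time is the clock value when the plan is computed, and the relocation time is that value plus the relocation delay. The sketch below illustrates this with a hypothetical manually advanced clock that starts at 0 ms, an assumption inferred from the test expecting the decision time to equal the advanced amount. ManualClock and relocationTime are illustrative names, not Titus classes.

```java
import java.time.Duration;

class RelocationTimeSketch {

    // Hypothetical clock that only moves when told to; starts at 0 ms (assumption).
    static final class ManualClock {
        private long nowMs;

        void advance(Duration duration) {
            nowMs += duration.toMillis();
        }

        long wallTime() {
            return nowMs;
        }
    }

    // Relocation time = decision time + relocation delay, all in milliseconds.
    static long relocationTime(long decisionTimeMs, long relocationDelayMs) {
        return decisionTimeMs + relocationDelayMs;
    }

    public static void main(String[] args) {
        ManualClock clock = new ManualClock();
        clock.advance(Duration.ofMillis(5_000));
        long decisionTime = clock.wallTime();
        System.out.println("decisionTime=" + decisionTime
                + ", relocationTime=" + relocationTime(decisionTime, 30_000));
    }
}
```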

Example 3 with TaskRelocationPlan

use of com.netflix.titus.api.relocation.model.TaskRelocationPlan in project titus-control-plane by Netflix.

the class TaskEvictionStepTest method testFailedEviction.

@Test
public void testFailedEviction() {
    TaskRelocationPlan taskRelocationPlan = oneMigrationPlan().toBuilder().withTaskId("nonExistingTaskId").build();
    Map<String, TaskRelocationStatus> result = step.evict(Collections.singletonMap("nonExistingTaskId", taskRelocationPlan));
    assertThat(result).hasSize(1);
    TaskRelocationStatus relocationStatus = result.get("nonExistingTaskId");
    assertThat(relocationStatus.getTaskId()).isEqualTo("nonExistingTaskId");
    assertThat(relocationStatus.getStatusCode()).isEqualTo(TaskRelocationStatus.STATUS_EVICTION_ERROR);
    assertThat(relocationStatus.getTaskRelocationPlan()).isEqualTo(taskRelocationPlan);
}
Also used : TaskRelocationStatus(com.netflix.titus.api.relocation.model.TaskRelocationStatus) TaskRelocationPlan(com.netflix.titus.api.relocation.model.TaskRelocationPlan) AbstractTaskRelocationTest(com.netflix.titus.supplementary.relocation.AbstractTaskRelocationTest) Test(org.junit.Test)

Example 4 with TaskRelocationPlan

use of com.netflix.titus.api.relocation.model.TaskRelocationPlan in project titus-control-plane by Netflix.

the class MustBeRelocatedSelfManagedTaskCollectorStep method buildRelocationPlans.

private Map<String, TaskRelocationPlan> buildRelocationPlans() {
    Map<String, TitusNode> nodes = nodeDataResolver.resolve();
    List<Triple<Job<?>, Task, TitusNode>> allItems = findAllJobTaskAgentTriples(nodes);
    Map<String, TaskRelocationPlan> result = new HashMap<>();
    logger.debug("Number of triplets to check: {}", allItems.size());
    allItems.forEach(triple -> {
        Job<?> job = triple.getFirst();
        Task task = triple.getSecond();
        TitusNode instance = triple.getThird();
        checkIfNeedsRelocationPlan(job, task, instance).ifPresent(reason -> result.put(task.getId(), buildSelfManagedRelocationPlan(job, task, reason)));
    });
    this.lastResult = result;
    return result;
}
Also used : Triple(com.netflix.titus.common.util.tuple.Triple) Task(com.netflix.titus.api.jobmanager.model.job.Task) HashMap(java.util.HashMap) TitusNode(com.netflix.titus.supplementary.relocation.connector.TitusNode) TaskRelocationPlan(com.netflix.titus.api.relocation.model.TaskRelocationPlan)

Example 5 with TaskRelocationPlan

use of com.netflix.titus.api.relocation.model.TaskRelocationPlan in project titus-control-plane by Netflix.

the class TaskEvictionStep method execute.

private Map<String, TaskRelocationStatus> execute(Map<String, TaskRelocationPlan> taskToEvict) {
    Map<String, Mono<Void>> actions = taskToEvict.values().stream().collect(Collectors.toMap(TaskRelocationPlan::getTaskId, p -> {
        String message;
        switch(p.getReason()) {
            case AgentEvacuation:
                message = String.format("Agent evacuation: %s", p.getReasonMessage());
                break;
            case SelfManagedMigration:
                message = String.format("Self managed migration requested on %s: %s", DateTimeExt.toUtcDateTimeString(p.getDecisionTime()), p.getReasonMessage());
                break;
            case TaskMigration:
                message = p.getReasonMessage();
                break;
            default:
                message = String.format("[unrecognized relocation reason %s]: %s", p.getReason(), p.getReasonMessage());
        }
        return evictionServiceClient.terminateTask(p.getTaskId(), message).timeout(EVICTION_TIMEOUT);
    }));
    Map<String, Optional<Throwable>> evictionResults;
    try {
        evictionResults = ReactorExt.merge(actions, CONCURRENCY_LIMIT, scheduler).block();
    } catch (Exception e) {
        logger.warn("Unexpected error when calling the eviction service", e);
        // Set the task id and plan on each status so that it can be keyed by task id below.
        return taskToEvict.values().stream().map(p -> TaskRelocationStatus.newBuilder().withTaskId(p.getTaskId()).withState(TaskRelocationState.Failure).withStatusCode(TaskRelocationStatus.STATUS_SYSTEM_ERROR).withStatusMessage("Unexpected error: " + ExceptionExt.toMessageChain(e)).withTaskRelocationPlan(p).withTimestamp(clock.wallTime()).build()).collect(Collectors.toMap(TaskRelocationStatus::getTaskId, s -> s));
    }
    Map<String, TaskRelocationStatus> results = new HashMap<>();
    taskToEvict.forEach((taskId, plan) -> {
        Optional<Throwable> evictionResult = evictionResults.get(plan.getTaskId());
        TaskRelocationStatus status;
        if (evictionResult != null) {
            if (!evictionResult.isPresent()) {
                status = TaskRelocationStatus.newBuilder().withTaskId(taskId).withState(TaskRelocationState.Success).withStatusCode(TaskRelocationStatus.STATUS_CODE_TERMINATED).withStatusMessage("Task terminated successfully").withTaskRelocationPlan(plan).withTimestamp(clock.wallTime()).build();
            } else {
                status = TaskRelocationStatus.newBuilder().withTaskId(taskId).withState(TaskRelocationState.Failure).withStatusCode(TaskRelocationStatus.STATUS_EVICTION_ERROR).withStatusMessage(evictionResult.get().getMessage()).withTaskRelocationPlan(plan).withTimestamp(clock.wallTime()).build();
            }
        } else {
            // This should never happen
            invariants.inconsistent("Eviction result missing: taskId=%s", plan.getTaskId());
            status = TaskRelocationStatus.newBuilder().withTaskId(taskId).withState(TaskRelocationState.Failure).withStatusCode(TaskRelocationStatus.STATUS_SYSTEM_ERROR).withStatusMessage("Eviction result missing").withTaskRelocationPlan(plan).withTimestamp(clock.wallTime()).build();
        }
        results.put(taskId, status);
        transactionLog.logTaskRelocationStatus(STEP_NAME, "eviction", status);
    });
    return results;
}
Also used : DateTimeExt(com.netflix.titus.common.util.DateTimeExt) Logger(org.slf4j.Logger) EvictionServiceClient(com.netflix.titus.runtime.connector.eviction.EvictionServiceClient) Stopwatch(com.google.common.base.Stopwatch) TaskRelocationState(com.netflix.titus.api.relocation.model.TaskRelocationStatus.TaskRelocationState) LoggerFactory(org.slf4j.LoggerFactory) HashMap(java.util.HashMap) Mono(reactor.core.publisher.Mono) ReactorExt(com.netflix.titus.common.util.rx.ReactorExt) Scheduler(reactor.core.scheduler.Scheduler) Collectors(java.util.stream.Collectors) TimeUnit(java.util.concurrent.TimeUnit) CodeInvariants(com.netflix.titus.common.util.code.CodeInvariants) Duration(java.time.Duration) Map(java.util.Map) Optional(java.util.Optional) ExceptionExt(com.netflix.titus.common.util.ExceptionExt) TitusRuntime(com.netflix.titus.common.runtime.TitusRuntime) TaskRelocationStatus(com.netflix.titus.api.relocation.model.TaskRelocationStatus) Clock(com.netflix.titus.common.util.time.Clock) TaskRelocationPlan(com.netflix.titus.api.relocation.model.TaskRelocationPlan)
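
The switch in this method builds a termination message with String.format, where every %s placeholder needs its own argument. A value appended to the format string with + is not an argument, so a placeholder stays unfilled and the call throws at runtime. The sketch below shows both forms. The Reason enum and the method names are hypothetical and only mirror the shape of the code above.

```java
import java.util.MissingFormatArgumentException;

class ReasonMessageSketch {

    // Hypothetical enum; only its shape matters for this illustration.
    enum Reason { AgentEvacuation, TaskMigration, Unknown }

    // Builds a message per reason, passing one argument per %s placeholder.
    static String message(Reason reason, String reasonMessage) {
        switch (reason) {
            case AgentEvacuation:
                return String.format("Agent evacuation: %s", reasonMessage);
            case TaskMigration:
                return reasonMessage;
            default:
                return String.format("[unrecognized relocation reason %s]: %s", reason, reasonMessage);
        }
    }

    // Returns true if concatenating the reason onto the format string makes
    // String.format throw: two placeholders remain but only one argument is supplied.
    static boolean concatenationFails(Reason reason, String reasonMessage) {
        try {
            String.format("[unrecognized relocation reason %s]: %s" + reason, reasonMessage);
            return false;
        } catch (MissingFormatArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(message(Reason.Unknown, "node drained"));
        System.out.println("concatenation fails: " + concatenationFails(Reason.Unknown, "node drained"));
    }
}
```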

Aggregations

TaskRelocationPlan (com.netflix.titus.api.relocation.model.TaskRelocationPlan) 18
Task (com.netflix.titus.api.jobmanager.model.job.Task) 10
Test (org.junit.Test) 10
Optional (java.util.Optional) 5
Job (com.netflix.titus.api.jobmanager.model.job.Job) 4
TitusRuntime (com.netflix.titus.common.runtime.TitusRuntime) 4
Pair (com.netflix.titus.common.util.tuple.Pair) 4
AbstractTaskRelocationTest (com.netflix.titus.supplementary.relocation.AbstractTaskRelocationTest) 4
TitusNode (com.netflix.titus.supplementary.relocation.connector.TitusNode) 4
DeschedulingResult (com.netflix.titus.supplementary.relocation.model.DeschedulingResult) 4
HashMap (java.util.HashMap) 4
ReadOnlyEvictionOperations (com.netflix.titus.api.eviction.service.ReadOnlyEvictionOperations) 3
BatchJobExt (com.netflix.titus.api.jobmanager.model.job.ext.BatchJobExt) 3
ReadOnlyJobOperations (com.netflix.titus.api.jobmanager.service.ReadOnlyJobOperations) 3
TaskRelocationReason (com.netflix.titus.api.relocation.model.TaskRelocationPlan.TaskRelocationReason) 3
TaskRelocationStatus (com.netflix.titus.api.relocation.model.TaskRelocationStatus) 3
Clock (com.netflix.titus.common.util.time.Clock) 3
List (java.util.List) 3
Map (java.util.Map) 3
Collectors (java.util.stream.Collectors) 3