Example 1 with TitusNode

use of com.netflix.titus.supplementary.relocation.connector.TitusNode in project titus-control-plane by Netflix.

the class RelocationConnectorStubs method place.

public RelocationConnectorStubs place(String instanceGroupId, Task... tasks) {
    List<TitusNode> nodes = new ArrayList<>(nodeDataResolver.getNodes(instanceGroupId).values());
    int counter = 0;
    for (Task task : tasks) {
        TitusNode node = nodes.get(counter++ % nodes.size());
        jobComponentStub.place(task.getId(), node.getId(), node.getIpAddress());
    }
    return this;
}
Also used : Task(com.netflix.titus.api.jobmanager.model.job.Task) ArrayList(java.util.ArrayList) TitusNode(com.netflix.titus.supplementary.relocation.connector.TitusNode)
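The `counter++ % nodes.size()` expression in the method above spreads tasks across nodes in round-robin order. A minimal, self-contained sketch of that idea follows. The class `RoundRobinPlacement`, its `place` helper, and the plain string ids are hypothetical stand-ins and not part of titus-control-plane. The empty-list guard is an addition for the sketch; the stub above does not include one.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RoundRobinPlacement {

    // Hypothetical helper: maps each task id to a node id, cycling through the nodes in order.
    public static Map<String, String> place(List<String> nodeIds, List<String> taskIds) {
        if (nodeIds.isEmpty()) {
            // Without this guard, "counter % 0" would throw an ArithmeticException.
            throw new IllegalArgumentException("no nodes available for placement");
        }
        // LinkedHashMap keeps insertion order, so the result is easy to read and compare.
        Map<String, String> placement = new LinkedHashMap<>();
        int counter = 0;
        for (String taskId : taskIds) {
            placement.put(taskId, nodeIds.get(counter++ % nodeIds.size()));
        }
        return placement;
    }

    public static void main(String[] args) {
        Map<String, String> placement = place(
                Arrays.asList("node1", "node2"),
                Arrays.asList("task1", "task2", "task3"));
        // prints {task1=node1, task2=node2, task3=node1}
        System.out.println(placement);
    }
}
```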

Example 2 with TitusNode

use of com.netflix.titus.supplementary.relocation.connector.TitusNode in project titus-control-plane by Netflix.

the class RelocationConnectorStubs method markNodeRelocationRequired.

public void markNodeRelocationRequired(String nodeId) {
    TitusNode node = Preconditions.checkNotNull(nodeDataResolver.getNode(nodeId));
    nodeDataResolver.addNode(node.toBuilder().withRelocationRequired(true).build());
}
Also used : TitusNode(com.netflix.titus.supplementary.relocation.connector.TitusNode)
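The method above does not mutate the node. It builds a modified copy via `toBuilder()` and stores it under the same id. The sketch below illustrates that copy-and-replace pattern with a hypothetical `Node` class and an in-memory map. None of these types are the real `TitusNode` or node data resolver, and `Objects.requireNonNull` stands in for Guava's `Preconditions.checkNotNull`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class NodeUpdateSketch {

    // Hypothetical immutable node; a stand-in, not the real TitusNode.
    public static final class Node {
        private final String id;
        private final boolean relocationRequired;

        public Node(String id, boolean relocationRequired) {
            this.id = id;
            this.relocationRequired = relocationRequired;
        }

        public String getId() { return id; }

        public boolean isRelocationRequired() { return relocationRequired; }

        // Starts a builder pre-filled with this node's current values.
        public Builder toBuilder() { return new Builder(id, relocationRequired); }
    }

    public static final class Builder {
        private final String id;
        private boolean relocationRequired;

        Builder(String id, boolean relocationRequired) {
            this.id = id;
            this.relocationRequired = relocationRequired;
        }

        public Builder withRelocationRequired(boolean value) {
            this.relocationRequired = value;
            return this;
        }

        public Node build() { return new Node(id, relocationRequired); }
    }

    private final Map<String, Node> nodesById = new HashMap<>();

    public void addNode(Node node) { nodesById.put(node.getId(), node); }

    public Node getNode(String id) { return nodesById.get(id); }

    // Replaces the stored node with a copy that has the flag set; the old instance is untouched.
    public void markRelocationRequired(String nodeId) {
        Node node = Objects.requireNonNull(getNode(nodeId), "unknown node: " + nodeId);
        addNode(node.toBuilder().withRelocationRequired(true).build());
    }

    public static void main(String[] args) {
        NodeUpdateSketch store = new NodeUpdateSketch();
        Node original = new Node("node1", false);
        store.addNode(original);
        store.markRelocationRequired("node1");
        System.out.println(store.getNode("node1").isRelocationRequired()); // prints true
        System.out.println(original.isRelocationRequired());               // prints false
    }
}
```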

Example 3 with TitusNode

use of com.netflix.titus.supplementary.relocation.connector.TitusNode in project titus-control-plane by Netflix.

the class DefaultDeschedulerServiceTest method verifyRelocationPlan.

private void verifyRelocationPlan(long relocationDelay, String reasonMessage) {
    ReadOnlyJobOperations jobOperations = mock(ReadOnlyJobOperations.class);
    DefaultDeschedulerService dds = new DefaultDeschedulerService(jobOperations, mock(ReadOnlyEvictionOperations.class), new KubernetesNodeDataResolver(configuration, TestDataFactory.mockFabric8IOConnector(), node -> true), () -> "foo|bar", titusRuntime);
    Job<ServiceJobExt> job = JobGenerator.serviceJobs(oneTaskServiceJobDescriptor().but(ofServiceSize(2), withDisruptionBudget(budget(selfManagedPolicy(relocationDelay), unlimitedRate(), Collections.emptyList())))).getValue();
    ServiceJobTask task = JobGenerator.serviceTasks(job).getValue();
    when(jobOperations.getJob(job.getId())).thenReturn(Optional.of(job));
    TitusNode node = TitusNode.newBuilder().withId("node1").withServerGroupId("asg1").withRelocationRequired(true).withBadCondition(false).build();
    // Advance test clock
    long clockAdvancedMs = 5_000;
    TestClock testClock = (TestClock) titusRuntime.getClock();
    testClock.advanceTime(Duration.ofMillis(clockAdvancedMs));
    Optional<TaskRelocationPlan> relocationPlanForTask = dds.getRelocationPlanForTask(node, task, Collections.emptyMap());
    assertThat(relocationPlanForTask).isPresent();
    assertThat(relocationPlanForTask.get().getTaskId()).isEqualTo(task.getId());
    // relocation time is expected to be the decision clock time + relocationDelay
    assertThat(relocationPlanForTask.get().getRelocationTime()).isEqualTo(relocationDelay + clockAdvancedMs);
    assertThat(relocationPlanForTask.get().getDecisionTime()).isEqualTo(clockAdvancedMs);
    assertThat(relocationPlanForTask.get().getReasonMessage()).isEqualTo(reasonMessage);
}
Also used : TestDataFactory(com.netflix.titus.supplementary.relocation.TestDataFactory) Archaius2Ext(com.netflix.titus.common.util.archaius2.Archaius2Ext) DisruptionBudgetGenerator.unlimitedRate(com.netflix.titus.testkit.model.eviction.DisruptionBudgetGenerator.unlimitedRate) RelocationAttributes(com.netflix.titus.runtime.RelocationAttributes) Task(com.netflix.titus.api.jobmanager.model.job.Task) DisruptionBudgetGenerator.budget(com.netflix.titus.testkit.model.eviction.DisruptionBudgetGenerator.budget) Assertions.assertThat(org.assertj.core.api.Assertions.assertThat) KubernetesNodeDataResolver(com.netflix.titus.supplementary.relocation.connector.KubernetesNodeDataResolver) ServiceJobTask(com.netflix.titus.api.jobmanager.model.job.ServiceJobTask) TitusRuntimes(com.netflix.titus.common.runtime.TitusRuntimes) RelocationConfiguration(com.netflix.titus.supplementary.relocation.RelocationConfiguration) Duration(java.time.Duration) RelocationConnectorStubs(com.netflix.titus.supplementary.relocation.RelocationConnectorStubs) JobFunctions.ofServiceSize(com.netflix.titus.api.jobmanager.model.job.JobFunctions.ofServiceSize) ReadOnlyJobOperations(com.netflix.titus.api.jobmanager.service.ReadOnlyJobOperations) TaskRelocationReason(com.netflix.titus.api.relocation.model.TaskRelocationPlan.TaskRelocationReason) TaskRelocationPlan(com.netflix.titus.api.relocation.model.TaskRelocationPlan) MutableDataGenerator(com.netflix.titus.common.data.generator.MutableDataGenerator) JobDescriptorGenerator.oneTaskServiceJobDescriptor(com.netflix.titus.testkit.model.job.JobDescriptorGenerator.oneTaskServiceJobDescriptor) DeschedulingResult(com.netflix.titus.supplementary.relocation.model.DeschedulingResult) DisruptionBudgetGenerator.selfManagedPolicy(com.netflix.titus.testkit.model.eviction.DisruptionBudgetGenerator.selfManagedPolicy) Job(com.netflix.titus.api.jobmanager.model.job.Job) JobFunctions.withJobId(com.netflix.titus.api.jobmanager.model.job.JobFunctions.withJobId) ServiceJobExt(com.netflix.titus.api.jobmanager.model.job.ext.ServiceJobExt) JobFunctions.withDisruptionBudget(com.netflix.titus.api.jobmanager.model.job.JobFunctions.withDisruptionBudget) JobGenerator(com.netflix.titus.testkit.model.job.JobGenerator) Test(org.junit.Test) Mockito.when(org.mockito.Mockito.when) List(java.util.List) ReadOnlyEvictionOperations(com.netflix.titus.api.eviction.service.ReadOnlyEvictionOperations) Optional(java.util.Optional) TitusRuntime(com.netflix.titus.common.runtime.TitusRuntime) TestClock(com.netflix.titus.common.util.time.TestClock) Collections(java.util.Collections) TitusNode(com.netflix.titus.supplementary.relocation.connector.TitusNode) Mockito.mock(org.mockito.Mockito.mock)

Example 4 with TitusNode

use of com.netflix.titus.supplementary.relocation.connector.TitusNode in project titus-control-plane by Netflix.

the class RelocationUtilTest method buildTasksFromNodesAndJobsFilter.

@Test
public void buildTasksFromNodesAndJobsFilter() {
    String node1 = "node1";
    String node2 = "node2";
    String node3 = "node3";
    Job<BatchJobExt> job1 = JobGenerator.oneBatchJob();
    Job<BatchJobExt> job2 = JobGenerator.oneBatchJob();
    Job<BatchJobExt> job3 = JobGenerator.oneBatchJob();
    BatchJobTask task1 = JobGenerator.batchTasks(job1).getValue().toBuilder().addToTaskContext(TaskAttributes.TASK_ATTRIBUTES_AGENT_INSTANCE_ID, node1).build();
    BatchJobTask task2 = JobGenerator.batchTasks(job2).getValue().toBuilder().addToTaskContext(TaskAttributes.TASK_ATTRIBUTES_AGENT_INSTANCE_ID, node2).build();
    BatchJobTask task3 = JobGenerator.batchTasks(job3).getValue().toBuilder().addToTaskContext(TaskAttributes.TASK_ATTRIBUTES_AGENT_INSTANCE_ID, node3).build();
    ReadOnlyJobOperations jobOperations = mock(ReadOnlyJobOperations.class);
    when(jobOperations.getJobs()).thenReturn(Arrays.asList(job1, job2, job3));
    when(jobOperations.getTasks(job1.getId())).thenReturn(Collections.singletonList(task1));
    when(jobOperations.getTasks(job2.getId())).thenReturn(Collections.singletonList(task2));
    when(jobOperations.getTasks(job3.getId())).thenReturn(Collections.singletonList(task3));
    Map<String, TitusNode> nodes = new HashMap<>(3);
    nodes.put(node1, buildNode(node1));
    nodes.put(node2, buildNode(node2));
    nodes.put(node3, buildNode(node3));
    Set<String> jobIds = new HashSet<>(2);
    jobIds.addAll(Arrays.asList(job1.getId(), job3.getId()));
    List<String> taskIdsOnBadNodes = RelocationUtil.buildTasksFromNodesAndJobsFilter(nodes, jobIds, jobOperations);
    assertThat(taskIdsOnBadNodes).hasSize(2);
}
Also used : ReadOnlyJobOperations(com.netflix.titus.api.jobmanager.service.ReadOnlyJobOperations) HashMap(java.util.HashMap) BatchJobExt(com.netflix.titus.api.jobmanager.model.job.ext.BatchJobExt) BatchJobTask(com.netflix.titus.api.jobmanager.model.job.BatchJobTask) TitusNode(com.netflix.titus.supplementary.relocation.connector.TitusNode) HashSet(java.util.HashSet) Test(org.junit.Test)
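The test above sets up three tasks on three nodes, restricts the job filter to two of the three jobs, and expects two task ids back. One plausible reading is that the utility keeps only tasks that run on a known node and belong to a job in the filter. The sketch below implements that reading with hypothetical types (`TaskFilterSketch`, `TaskRef`, `tasksOnNodesForJobs`). It is an assumption about the behavior the test exercises, not the actual `RelocationUtil` implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TaskFilterSketch {

    // Hypothetical task view holding only the fields the filter needs.
    public static final class TaskRef {
        final String taskId;
        final String jobId;
        final String nodeId;

        public TaskRef(String taskId, String jobId, String nodeId) {
            this.taskId = taskId;
            this.jobId = jobId;
            this.nodeId = nodeId;
        }
    }

    // Returns ids of tasks that run on one of the given nodes AND belong to one of the given jobs.
    public static List<String> tasksOnNodesForJobs(Set<String> nodeIds, Set<String> jobIds, List<TaskRef> tasks) {
        List<String> result = new ArrayList<>();
        for (TaskRef task : tasks) {
            if (nodeIds.contains(task.nodeId) && jobIds.contains(task.jobId)) {
                result.add(task.taskId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> nodeIds = new HashSet<>(Arrays.asList("node1", "node2", "node3"));
        Set<String> jobIds = new HashSet<>(Arrays.asList("job1", "job3"));
        List<TaskRef> tasks = Arrays.asList(
                new TaskRef("task1", "job1", "node1"),
                new TaskRef("task2", "job2", "node2"),
                new TaskRef("task3", "job3", "node3"));
        // prints [task1, task3]
        System.out.println(tasksOnNodesForJobs(nodeIds, jobIds, tasks));
    }
}
```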

Example 5 with TitusNode

use of com.netflix.titus.supplementary.relocation.connector.TitusNode in project titus-control-plane by Netflix.

the class DefaultNodeConditionControllerTest method checkTasksTerminatedDueToBadNodeConditions.

@Test
public void checkTasksTerminatedDueToBadNodeConditions() {
    // Mock jobs, tasks & nodes
    Map<String, TitusNode> nodeMap = buildNodes();
    List<Job<BatchJobExt>> jobs = getJobs(true);
    Map<String, List<Task>> tasksByJobIdMap = buildTasksForJobAndNodeAssignment(new ArrayList<>(nodeMap.values()), jobs);
    TitusRuntime titusRuntime = mock(TitusRuntime.class);
    when(titusRuntime.getRegistry()).thenReturn(new DefaultRegistry());
    RelocationConfiguration configuration = mock(RelocationConfiguration.class);
    when(configuration.getBadNodeConditionPattern()).thenReturn(".*Failure");
    when(configuration.isTaskTerminationOnBadNodeConditionEnabled()).thenReturn(true);
    NodeDataResolver nodeDataResolver = mock(NodeDataResolver.class);
    when(nodeDataResolver.resolve()).thenReturn(nodeMap);
    JobDataReplicator jobDataReplicator = mock(JobDataReplicator.class);
    when(jobDataReplicator.getStalenessMs()).thenReturn(0L);
    ReadOnlyJobOperations readOnlyJobOperations = mock(ReadOnlyJobOperations.class);
    when(readOnlyJobOperations.getJobs()).thenReturn(new ArrayList<>(jobs));
    tasksByJobIdMap.forEach((key, value) -> when(readOnlyJobOperations.getTasks(key)).thenReturn(value));
    JobManagementClient jobManagementClient = mock(JobManagementClient.class);
    Set<String> terminatedTaskIds = new HashSet<>();
    when(jobManagementClient.killTask(anyString(), anyBoolean(), any())).thenAnswer(invocation -> {
        String taskIdToBeTerminated = invocation.getArgument(0);
        terminatedTaskIds.add(taskIdToBeTerminated);
        return Mono.empty();
    });
    DefaultNodeConditionController nodeConditionCtrl = new DefaultNodeConditionController(configuration, nodeDataResolver, jobDataReplicator, readOnlyJobOperations, jobManagementClient, titusRuntime);
    ExecutionContext executionContext = ExecutionContext.newBuilder().withIteration(ExecutionId.initial()).build();
    StepVerifier.create(nodeConditionCtrl.handleNodesWithBadCondition(executionContext)).verifyComplete();
    assertThat(terminatedTaskIds).hasSize(2);
    verifyTerminatedTasksOnBadNodes(terminatedTaskIds, tasksByJobIdMap, nodeMap);
}
Also used : JobDataReplicator(com.netflix.titus.runtime.connector.jobmanager.JobDataReplicator) ReadOnlyJobOperations(com.netflix.titus.api.jobmanager.service.ReadOnlyJobOperations) JobManagementClient(com.netflix.titus.runtime.connector.jobmanager.JobManagementClient) NodeDataResolver(com.netflix.titus.supplementary.relocation.connector.NodeDataResolver) ArgumentMatchers.anyString(org.mockito.ArgumentMatchers.anyString) TitusRuntime(com.netflix.titus.common.runtime.TitusRuntime) ExecutionContext(com.netflix.titus.common.framework.scheduler.ExecutionContext) DefaultRegistry(com.netflix.spectator.api.DefaultRegistry) ArrayList(java.util.ArrayList) List(java.util.List) TitusNode(com.netflix.titus.supplementary.relocation.connector.TitusNode) Job(com.netflix.titus.api.jobmanager.model.job.Job) RelocationConfiguration(com.netflix.titus.supplementary.relocation.RelocationConfiguration) HashSet(java.util.HashSet) Test(org.junit.Test)

Aggregations

TitusNode (com.netflix.titus.supplementary.relocation.connector.TitusNode): 18
HashMap (java.util.HashMap): 9
ArrayList (java.util.ArrayList): 8
Job (com.netflix.titus.api.jobmanager.model.job.Job): 7
Task (com.netflix.titus.api.jobmanager.model.job.Task): 7
TitusRuntime (com.netflix.titus.common.runtime.TitusRuntime): 7
DeschedulingResult (com.netflix.titus.supplementary.relocation.model.DeschedulingResult): 7
List (java.util.List): 7
ReadOnlyJobOperations (com.netflix.titus.api.jobmanager.service.ReadOnlyJobOperations): 6
TaskRelocationPlan (com.netflix.titus.api.relocation.model.TaskRelocationPlan): 6
Test (org.junit.Test): 6
Optional (java.util.Optional): 5
Clock (com.netflix.titus.common.util.time.Clock): 4
Pair (com.netflix.titus.common.util.tuple.Pair): 4
EvictionConfiguration (com.netflix.titus.runtime.connector.eviction.EvictionConfiguration): 4
NodeDataResolver (com.netflix.titus.supplementary.relocation.connector.NodeDataResolver): 4
ReadOnlyEvictionOperations (com.netflix.titus.api.eviction.service.ReadOnlyEvictionOperations): 3
TaskRelocationReason (com.netflix.titus.api.relocation.model.TaskRelocationPlan.TaskRelocationReason): 3
RelocationConfiguration (com.netflix.titus.supplementary.relocation.RelocationConfiguration): 3
DeschedulingFailure (com.netflix.titus.supplementary.relocation.model.DeschedulingFailure): 3