Example 1 with PodWrapper

Use of com.netflix.titus.master.kubernetes.client.model.PodWrapper in project titus-control-plane by Netflix.

From class KubeNotificationProcessor, method handlePodUpdatedEvent.

private Mono<Void> handlePodUpdatedEvent(PodEvent event, Job job, Task task) {
    // This is a basic sanity check. If it fails, we have a major problem with pod state.
    if (event.getPod() == null || event.getPod().getStatus() == null || event.getPod().getStatus().getPhase() == null) {
        logger.warn("Pod notification with pod without status or phase set: taskId={}, pod={}", task.getId(), event.getPod());
        metricsNoChangesApplied.increment();
        return Mono.empty();
    }
    PodWrapper podWrapper = new PodWrapper(event.getPod());

    // Take the node from the event types that carry one; any other event type gets no node.
    Optional<V1Node> node;
    if (event instanceof PodUpdatedEvent) {
        node = ((PodUpdatedEvent) event).getNode();
    } else if (event instanceof PodDeletedEvent) {
        node = ((PodDeletedEvent) event).getNode();
    } else {
        node = Optional.empty();
    }

    Either<TaskStatus, String> newTaskStatusOrError = new PodToTaskMapper(
            podWrapper, node, task, event instanceof PodDeletedEvent, containerResultCodeResolver, titusRuntime
    ).getNewTaskStatus();
    if (newTaskStatusOrError.hasError()) {
        logger.info(newTaskStatusOrError.getError());
        metricsNoChangesApplied.increment();
        return Mono.empty();
    }
    TaskStatus newTaskStatus = newTaskStatusOrError.getValue();
    if (TaskStatus.areEquivalent(task.getStatus(), newTaskStatus)) {
        logger.info("Pod change notification does not change task status: taskId={}, status={}, eventSequenceNumber={}",
                task.getId(), newTaskStatus, event.getSequenceNumber());
    } else {
        logger.info("Pod notification changes task status: taskId={}, fromStatus={}, toStatus={}, eventSequenceNumber={}",
                task.getId(), task.getStatus(), newTaskStatus, event.getSequenceNumber());
    }

    // Precheck against the task snapshot passed in: if nothing would change, skip the update.
    // The update below re-applies the same change inside the job service,
    // against the most up-to-date task version.
    if (!updateTaskStatus(podWrapper, newTaskStatus, node, task, true).isPresent()) {
        return Mono.empty();
    }
    return ReactorExt.toMono(v3JobOperations.updateTask(
            task.getId(),
            current -> updateTaskStatus(podWrapper, newTaskStatus, node, current, false),
            V3JobOperations.Trigger.Kube,
            "Pod status updated from kubernetes node (k8phase='" + event.getPod().getStatus().getPhase()
                    + "', taskState=" + task.getStatus().getState() + ")",
            KUBE_CALL_METADATA
    ));
}
Also used : Retry(reactor.util.retry.Retry) Task(com.netflix.titus.api.jobmanager.model.job.Task) CollectionsExt(com.netflix.titus.common.util.CollectionsExt) LoggerFactory(org.slf4j.LoggerFactory) V1PodStatus(io.kubernetes.client.openapi.models.V1PodStatus) ReactorExt(com.netflix.titus.common.util.rx.ReactorExt) KubeUtil(com.netflix.titus.master.kubernetes.KubeUtil) TITUS_NODE_DOMAIN(com.netflix.titus.runtime.kubernetes.KubeConstants.TITUS_NODE_DOMAIN) Duration(java.time.Duration) Map(java.util.Map) DirectKubeApiServerIntegrator(com.netflix.titus.master.kubernetes.client.DirectKubeApiServerIntegrator) Either(com.netflix.titus.common.util.tuple.Either) CallMetadata(com.netflix.titus.api.model.callmetadata.CallMetadata) PodEvent(com.netflix.titus.master.kubernetes.client.model.PodEvent) Job(com.netflix.titus.api.jobmanager.model.job.Job) TaskStatus(com.netflix.titus.api.jobmanager.model.job.TaskStatus) JobFunctions(com.netflix.titus.api.jobmanager.model.job.JobFunctions) TaskState(com.netflix.titus.api.jobmanager.model.job.TaskState) PodNotFoundEvent(com.netflix.titus.master.kubernetes.client.model.PodNotFoundEvent) Timer(com.netflix.spectator.api.Timer) List(java.util.List) Optional(java.util.Optional) PodWrapper(com.netflix.titus.master.kubernetes.client.model.PodWrapper) Gauge(com.netflix.spectator.api.Gauge) Disposable(reactor.core.Disposable) Stopwatch(com.google.common.base.Stopwatch) PodDeletedEvent(com.netflix.titus.master.kubernetes.client.model.PodDeletedEvent) Counter(com.netflix.spectator.api.Counter) HashMap(java.util.HashMap) MetricConstants(com.netflix.titus.master.MetricConstants) V1Node(io.kubernetes.client.openapi.models.V1Node) Singleton(javax.inject.Singleton) Scheduler(reactor.core.scheduler.Scheduler) ArrayList(java.util.ArrayList) Inject(javax.inject.Inject) Pair(com.netflix.titus.common.util.tuple.Pair) ContainerResultCodeResolver(com.netflix.titus.master.kubernetes.ContainerResultCodeResolver) Schedulers(reactor.core.scheduler.Schedulers) 
Evaluators.acceptNotNull(com.netflix.titus.common.util.Evaluators.acceptNotNull) KubeJobManagementReconciler(com.netflix.titus.master.kubernetes.controller.KubeJobManagementReconciler) ExecutorService(java.util.concurrent.ExecutorService) ExecutorsExt(com.netflix.titus.common.util.ExecutorsExt) Logger(org.slf4j.Logger) PodUpdatedEvent(com.netflix.titus.master.kubernetes.client.model.PodUpdatedEvent) Mono(reactor.core.publisher.Mono) Activator(com.netflix.titus.common.util.guice.annotation.Activator) TimeUnit(java.util.concurrent.TimeUnit) AtomicLong(java.util.concurrent.atomic.AtomicLong) ExecutableStatus(com.netflix.titus.api.jobmanager.model.job.ExecutableStatus) V3JobOperations(com.netflix.titus.api.jobmanager.service.V3JobOperations) TaskAttributes(com.netflix.titus.api.jobmanager.TaskAttributes) PodToTaskMapper(com.netflix.titus.master.kubernetes.PodToTaskMapper) V1ContainerState(io.kubernetes.client.openapi.models.V1ContainerState) VisibleForTesting(com.google.common.annotations.VisibleForTesting) TitusRuntime(com.netflix.titus.common.runtime.TitusRuntime) Comparator(java.util.Comparator) Evaluators(com.netflix.titus.common.util.Evaluators)
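handlePodUpdatedEvent works in two phases: it runs updateTaskStatus as a cheap precheck against the task snapshot it was given, then passes v3JobOperations.updateTask a function that re-applies the same change to whatever task version is current when the update actually runs. The sketch below imitates that pattern with a hypothetical in-memory store. TaskStore, update, changeState and the String-typed task state are illustrative stand-ins, not Titus APIs; only the shape of the change function (current value in, Optional of the new value out, empty meaning "no change") follows the listing.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for the job service: the change function is applied
// to the latest stored version, and an empty result means "nothing to change".
final class TaskStore {
    private final Map<String, String> states = new ConcurrentHashMap<>();

    void put(String taskId, String state) {
        states.put(taskId, state);
    }

    String get(String taskId) {
        return states.get(taskId);
    }

    /** Applies the change atomically to the current value; returns true if a new value was stored. */
    boolean update(String taskId, Function<String, Optional<String>> change) {
        boolean[] applied = {false};
        states.computeIfPresent(taskId, (id, current) -> {
            Optional<String> next = change.apply(current);
            applied[0] = next.isPresent();
            return next.orElse(current);
        });
        return applied[0];
    }
}

final class PodEventHandlerSketch {
    // Hypothetical change function: empty when the new state equals the current one.
    static Optional<String> changeState(String current, String newState) {
        return current.equals(newState) ? Optional.empty() : Optional.of(newState);
    }

    static boolean handle(TaskStore store, String taskId, String snapshotState, String newState) {
        // Phase 1: precheck against the (possibly stale) snapshot.
        if (!changeState(snapshotState, newState).isPresent()) {
            return false;
        }
        // Phase 2: re-evaluate against the most up-to-date version held by the store.
        return store.update(taskId, current -> changeState(current, newState));
    }
}
```

The precheck can pass on a stale snapshot and the real update can still turn into a no-op, which is why the second phase recomputes the change instead of trusting the first result.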

Example 2 with PodWrapper

Use of com.netflix.titus.master.kubernetes.client.model.PodWrapper in project titus-control-plane by Netflix.

From class KubeNotificationProcessorTest, method testUpdateTaskStatusVK.

@Test
public void testUpdateTaskStatusVK() {
    V1Pod pod = newPod(TASK.getId(), andRunning());
    V1Node node = newNode(andIpAddress("2.2.2.2"), andNodeAnnotations(TITUS_NODE_DOMAIN + "ami", "ami123", TITUS_NODE_DOMAIN + "stack", "myStack"));
    Map<String, String> updatedAnnotations = new HashMap<>();
    updatedAnnotations.put(LEGACY_ANNOTATION_IP_ADDRESS, "1.2.3.4");
    pod.getMetadata().setAnnotations(updatedAnnotations);

    Task updatedTask = processor.updateTaskStatus(
            new PodWrapper(pod),
            TaskStatus.newBuilder().withState(TaskState.Started).build(),
            Optional.of(node),
            TASK,
            false
    ).orElse(null);
    assertThat(updatedTask).isNotNull();

    // Moving straight to Started still records the intermediate states in the history.
    Set<TaskState> pastStates = updatedTask.getStatusHistory().stream().map(ExecutableStatus::getState).collect(Collectors.toSet());
    assertThat(pastStates).contains(TaskState.Accepted, TaskState.Launched, TaskState.StartInitiated);
    assertThat(updatedTask.getTaskContext()).containsEntry(TaskAttributes.TASK_ATTRIBUTES_AGENT_HOST, "2.2.2.2");
    assertThat(updatedTask.getTaskContext()).containsEntry(TaskAttributes.TASK_ATTRIBUTES_CONTAINER_IP, "1.2.3.4");
    assertThat(updatedTask.getTaskContext()).containsEntry(TaskAttributes.TASK_ATTRIBUTES_AGENT_AMI, "ami123");
    assertThat(updatedTask.getTaskContext()).containsEntry(TaskAttributes.TASK_ATTRIBUTES_AGENT_STACK, "myStack");
}
Also used : Task(com.netflix.titus.api.jobmanager.model.job.Task) BatchJobTask(com.netflix.titus.api.jobmanager.model.job.BatchJobTask) V1Node(io.kubernetes.client.openapi.models.V1Node) HashMap(java.util.HashMap) V1Pod(io.kubernetes.client.openapi.models.V1Pod) PodWrapper(com.netflix.titus.master.kubernetes.client.model.PodWrapper) ArgumentMatchers.anyString(org.mockito.ArgumentMatchers.anyString) TaskState(com.netflix.titus.api.jobmanager.model.job.TaskState) Test(org.junit.Test)
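The test above expects that moving a task straight to Started still leaves Accepted, Launched and StartInitiated in its status history; updateTaskStatus delegates that to fillInMissingStates, whose body is not shown here. The following is only one possible way such back-filling could work, assuming a simplified linear startup lifecycle. SketchState, MissingStatesSketch and historyAfter are hypothetical names, and the real TaskState enum and fill-in rules may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical lifecycle; the real Titus TaskState enum may differ.
enum SketchState { Accepted, Launched, StartInitiated, Started, KillInitiated, Finished }

final class MissingStatesSketch {
    // Only the startup steps are back-filled in this sketch; KillInitiated is never invented.
    private static final List<SketchState> STARTUP_PATH = List.of(
            SketchState.Accepted, SketchState.Launched, SketchState.StartInitiated, SketchState.Started);

    /**
     * Returns the history after moving from 'current' to 'next': the existing history,
     * the state being left, and every startup step strictly between the two.
     */
    static List<SketchState> historyAfter(List<SketchState> history, SketchState current, SketchState next) {
        List<SketchState> result = new ArrayList<>(history);
        result.add(current); // the state we are leaving always goes into the history
        for (SketchState step : STARTUP_PATH) {
            boolean skipped = step.compareTo(current) > 0 && step.compareTo(next) < 0;
            if (skipped && !result.contains(step)) {
                result.add(step);
            }
        }
        return result;
    }
}
```

Restricting the fill-in to the startup path is a deliberate choice in this sketch: a task that finishes directly should not acquire a kill request it never had.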

Example 3 with PodWrapper

Use of com.netflix.titus.master.kubernetes.client.model.PodWrapper in project titus-control-plane by Netflix.

From class KubeNotificationProcessor, method updateTaskStatus.

@VisibleForTesting
Optional<Task> updateTaskStatus(PodWrapper podWrapper, TaskStatus newTaskStatus, Optional<V1Node> node, Task currentTask, boolean precheck) {
    // Never move a task back to an earlier state (for example, from 'KillInitiated' back to 'Started').
    // If such a stale state were written, the next reconciliation loop would have to move
    // the task to 'KillInitiated' again.
    if (TaskState.isBefore(newTaskStatus.getState(), currentTask.getStatus().getState())) {
        logger.info("[precheck={}] Ignoring an attempt to move the task state to the earlier one: taskId={}, attempt={}, current={}",
                precheck, currentTask.getId(), newTaskStatus.getState(), currentTask.getStatus().getState());
        metricsNoChangesApplied.increment();
        return Optional.empty();
    }

    // Push the current status into the history only when the status actually changes.
    Task updatedTask;
    if (TaskStatus.areEquivalent(currentTask.getStatus(), newTaskStatus)) {
        updatedTask = currentTask;
    } else {
        List<TaskStatus> newHistory = CollectionsExt.copyAndAdd(currentTask.getStatusHistory(), currentTask.getStatus());
        updatedTask = currentTask.toBuilder().withStatus(newTaskStatus).withStatusHistory(newHistory).build();
    }

    // Back-fill skipped states, then attach pod network data and node metadata.
    Task fixedTask = fillInMissingStates(podWrapper, updatedTask);
    Task taskWithExecutorData = JobManagerUtil.attachNetworkDataFromPod(fixedTask, podWrapper);
    Task taskWithNodeMetadata = node.map(n -> attachNodeMetadata(taskWithExecutorData, n)).orElse(taskWithExecutorData);

    Optional<String> difference = areTasksEquivalent(currentTask, taskWithNodeMetadata);
    if (!difference.isPresent()) {
        logger.debug("[precheck={}] Ignoring the pod event as the update results in the identical task object as the current one: taskId={}",
                precheck, currentTask.getId());
        metricsNoChangesApplied.increment();
        return Optional.empty();
    }
    if (!precheck) {
        logger.info("[precheck={}] Tasks are different: difference='{}', current={}, updated={}",
                precheck, difference.get(), currentTask, taskWithNodeMetadata);
        metricsChangesApplied.increment();
    }
    return Optional.of(taskWithNodeMetadata);
}
Also used : Retry(reactor.util.retry.Retry) Task(com.netflix.titus.api.jobmanager.model.job.Task) CollectionsExt(com.netflix.titus.common.util.CollectionsExt) LoggerFactory(org.slf4j.LoggerFactory) V1PodStatus(io.kubernetes.client.openapi.models.V1PodStatus) ReactorExt(com.netflix.titus.common.util.rx.ReactorExt) KubeUtil(com.netflix.titus.master.kubernetes.KubeUtil) TITUS_NODE_DOMAIN(com.netflix.titus.runtime.kubernetes.KubeConstants.TITUS_NODE_DOMAIN) Duration(java.time.Duration) Map(java.util.Map) DirectKubeApiServerIntegrator(com.netflix.titus.master.kubernetes.client.DirectKubeApiServerIntegrator) Either(com.netflix.titus.common.util.tuple.Either) CallMetadata(com.netflix.titus.api.model.callmetadata.CallMetadata) PodEvent(com.netflix.titus.master.kubernetes.client.model.PodEvent) Job(com.netflix.titus.api.jobmanager.model.job.Job) TaskStatus(com.netflix.titus.api.jobmanager.model.job.TaskStatus) JobFunctions(com.netflix.titus.api.jobmanager.model.job.JobFunctions) TaskState(com.netflix.titus.api.jobmanager.model.job.TaskState) PodNotFoundEvent(com.netflix.titus.master.kubernetes.client.model.PodNotFoundEvent) Timer(com.netflix.spectator.api.Timer) List(java.util.List) Optional(java.util.Optional) PodWrapper(com.netflix.titus.master.kubernetes.client.model.PodWrapper) Gauge(com.netflix.spectator.api.Gauge) Disposable(reactor.core.Disposable) Stopwatch(com.google.common.base.Stopwatch) PodDeletedEvent(com.netflix.titus.master.kubernetes.client.model.PodDeletedEvent) Counter(com.netflix.spectator.api.Counter) HashMap(java.util.HashMap) MetricConstants(com.netflix.titus.master.MetricConstants) V1Node(io.kubernetes.client.openapi.models.V1Node) Singleton(javax.inject.Singleton) Scheduler(reactor.core.scheduler.Scheduler) ArrayList(java.util.ArrayList) Inject(javax.inject.Inject) Pair(com.netflix.titus.common.util.tuple.Pair) ContainerResultCodeResolver(com.netflix.titus.master.kubernetes.ContainerResultCodeResolver) Schedulers(reactor.core.scheduler.Schedulers) 
Evaluators.acceptNotNull(com.netflix.titus.common.util.Evaluators.acceptNotNull) KubeJobManagementReconciler(com.netflix.titus.master.kubernetes.controller.KubeJobManagementReconciler) ExecutorService(java.util.concurrent.ExecutorService) ExecutorsExt(com.netflix.titus.common.util.ExecutorsExt) Logger(org.slf4j.Logger) PodUpdatedEvent(com.netflix.titus.master.kubernetes.client.model.PodUpdatedEvent) Mono(reactor.core.publisher.Mono) Activator(com.netflix.titus.common.util.guice.annotation.Activator) TimeUnit(java.util.concurrent.TimeUnit) AtomicLong(java.util.concurrent.atomic.AtomicLong) ExecutableStatus(com.netflix.titus.api.jobmanager.model.job.ExecutableStatus) V3JobOperations(com.netflix.titus.api.jobmanager.service.V3JobOperations) TaskAttributes(com.netflix.titus.api.jobmanager.TaskAttributes) PodToTaskMapper(com.netflix.titus.master.kubernetes.PodToTaskMapper) V1ContainerState(io.kubernetes.client.openapi.models.V1ContainerState) VisibleForTesting(com.google.common.annotations.VisibleForTesting) TitusRuntime(com.netflix.titus.common.runtime.TitusRuntime) Comparator(java.util.Comparator) Evaluators(com.netflix.titus.common.util.Evaluators)
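The first half of updateTaskStatus enforces two rules: a state that is earlier than the current one is rejected, and the current status is pushed into the history only when the status really changes. The sketch below reduces those rules to a small immutable model, assuming that enum declaration order defines "earlier" (in the listing, TaskState.isBefore decides that). LifecycleState, SketchTask and StateGuardSketch are hypothetical, and the real method additionally compares pod network data and node metadata before deciding whether anything changed; the sketch collapses that to a plain state comparison.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model: enum declaration order defines "earlier" and "later".
enum LifecycleState { Accepted, Launched, StartInitiated, Started, KillInitiated, Finished }

final class SketchTask {
    final LifecycleState state;
    final List<LifecycleState> history;

    SketchTask(LifecycleState state, List<LifecycleState> history) {
        this.state = state;
        this.history = Collections.unmodifiableList(new ArrayList<>(history));
    }
}

final class StateGuardSketch {
    /** Mirrors the shape of updateTaskStatus: an empty result means "ignore this pod event". */
    static Optional<SketchTask> updateState(SketchTask current, LifecycleState newState) {
        if (newState.compareTo(current.state) < 0) {
            return Optional.empty(); // never move a task back to an earlier state
        }
        if (newState == current.state) {
            return Optional.empty(); // identical result, nothing to write
        }
        List<LifecycleState> newHistory = new ArrayList<>(current.history);
        newHistory.add(current.state); // similar in spirit to CollectionsExt.copyAndAdd(history, status)
        return Optional.of(new SketchTask(newState, newHistory));
    }
}
```

Returning Optional.empty() for both "rejected" and "unchanged" matches how the listing uses the result: the caller only needs to know whether there is something to write.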

Example 4 with PodWrapper

Use of com.netflix.titus.master.kubernetes.client.model.PodWrapper in project titus-control-plane by Netflix.

From class KubeNotificationProcessorTest, method testTaskStateDoesNotMoveBack.

@Test
public void testTaskStateDoesNotMoveBack() {
    V1Pod pod = newPod(TASK.getId(), andRunning());
    // The task is already in KillInitiated, so an update back to Started must be rejected.
    Task killInitiatedTask = JobFunctions.changeTaskStatus(TASK, TaskStatus.newBuilder().withState(TaskState.KillInitiated).build());
    Task updatedTask = processor.updateTaskStatus(
            new PodWrapper(pod),
            TaskStatus.newBuilder().withState(TaskState.Started).build(),
            Optional.of(newNode()),
            killInitiatedTask,
            false
    ).orElse(null);
    assertThat(updatedTask).isNull();
}
Also used : Task(com.netflix.titus.api.jobmanager.model.job.Task) BatchJobTask(com.netflix.titus.api.jobmanager.model.job.BatchJobTask) V1Pod(io.kubernetes.client.openapi.models.V1Pod) PodWrapper(com.netflix.titus.master.kubernetes.client.model.PodWrapper) Test(org.junit.Test)

Example 5 with PodWrapper

Use of com.netflix.titus.master.kubernetes.client.model.PodWrapper in project titus-control-plane by Netflix.

From class KubeNotificationProcessorTest, method testUpdateTaskStatusVKWithTransitionNetworkMode.

@Test
public void testUpdateTaskStatusVKWithTransitionNetworkMode() {
    V1Pod pod = newPod(TASK.getId(), andRunning());
    V1Node node = newNode(andIpAddress("2.2.2.2"), andNodeAnnotations(TITUS_NODE_DOMAIN + "ami", "ami123", TITUS_NODE_DOMAIN + "stack", "myStack"));
    Map<String, String> updatedAnnotations = new HashMap<>();
    updatedAnnotations.put(LEGACY_ANNOTATION_IP_ADDRESS, "2001:db8:0:1234:0:567:8:1");
    updatedAnnotations.put(LEGACY_ANNOTATION_ENI_IP_ADDRESS, "192.0.2.1");
    updatedAnnotations.put(LEGACY_ANNOTATION_ENI_IPV6_ADDRESS, "2001:db8:0:1234:0:567:8:1");
    updatedAnnotations.put(LEGACY_ANNOTATION_NETWORK_MODE, NetworkConfiguration.NetworkMode.Ipv6AndIpv4Fallback.toString());
    pod.getMetadata().setAnnotations(updatedAnnotations);

    Task updatedTask = processor.updateTaskStatus(
            new PodWrapper(pod),
            TaskStatus.newBuilder().withState(TaskState.Started).build(),
            Optional.of(node),
            TASK,
            false
    ).orElse(null);
    assertThat(updatedTask).isNotNull();

    Set<TaskState> pastStates = updatedTask.getStatusHistory().stream().map(ExecutableStatus::getState).collect(Collectors.toSet());
    assertThat(pastStates).contains(TaskState.Accepted, TaskState.Launched, TaskState.StartInitiated);
    assertThat(updatedTask.getTaskContext()).containsEntry(TaskAttributes.TASK_ATTRIBUTES_AGENT_HOST, "2.2.2.2");
    assertThat(updatedTask.getTaskContext()).containsEntry(TaskAttributes.TASK_ATTRIBUTES_CONTAINER_IP, "2001:db8:0:1234:0:567:8:1");
    assertThat(updatedTask.getTaskContext()).containsEntry(TaskAttributes.TASK_ATTRIBUTES_CONTAINER_IPV6, "2001:db8:0:1234:0:567:8:1");
    // In IPv6 + transition mode, there should *not* be an IPv4 address. It would be confusing, because such an
    // address would not be unique to the task, yet tools would try to use it and people would try to ssh to it.
    assertThat(updatedTask.getTaskContext()).doesNotContainKey(TaskAttributes.TASK_ATTRIBUTES_CONTAINER_IPV4);
    assertThat(updatedTask.getTaskContext()).containsEntry(TaskAttributes.TASK_ATTRIBUTES_TRANSITION_IPV4, "192.0.2.1");
}
Also used : Task(com.netflix.titus.api.jobmanager.model.job.Task) BatchJobTask(com.netflix.titus.api.jobmanager.model.job.BatchJobTask) V1Node(io.kubernetes.client.openapi.models.V1Node) HashMap(java.util.HashMap) V1Pod(io.kubernetes.client.openapi.models.V1Pod) PodWrapper(com.netflix.titus.master.kubernetes.client.model.PodWrapper) ArgumentMatchers.anyString(org.mockito.ArgumentMatchers.anyString) TaskState(com.netflix.titus.api.jobmanager.model.job.TaskState) Test(org.junit.Test)
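The transition-mode test pins down a mapping rule: with IPv6 and IPv4 fallback, the shared IPv4 address must not appear as the container's IPv4 and is exposed only as a transition address. The sketch below expresses that rule as a plain function from addresses to task-context entries. NetworkContextSketch, networkContext and the string keys are hypothetical placeholders; the real TaskAttributes constants, annotation names and JobManagerUtil.attachNetworkDataFromPod logic are not shown in the listing and may differ.

```java
import java.util.HashMap;
import java.util.Map;

final class NetworkContextSketch {
    // Hypothetical keys; the real TaskAttributes constants have different values.
    static final String CONTAINER_IP = "containerIp";
    static final String CONTAINER_IPV4 = "containerIpv4";
    static final String CONTAINER_IPV6 = "containerIpv6";
    static final String TRANSITION_IPV4 = "transitionIpv4";

    /**
     * Builds task-context entries from pod-level addresses. In transition mode the IPv4
     * address is shared rather than unique to the task, so it is stored only under the
     * transition key and never as the container's own IPv4.
     */
    static Map<String, String> networkContext(String primaryIp, String eniIpv4, String eniIpv6, boolean transitionMode) {
        Map<String, String> context = new HashMap<>();
        if (primaryIp != null) {
            context.put(CONTAINER_IP, primaryIp);
        }
        if (eniIpv6 != null) {
            context.put(CONTAINER_IPV6, eniIpv6);
        }
        if (eniIpv4 != null) {
            context.put(transitionMode ? TRANSITION_IPV4 : CONTAINER_IPV4, eniIpv4);
        }
        return context;
    }
}
```

Keeping the IPv4 under a separate key, rather than dropping it, still lets tooling that understands transition mode find the fallback address.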

Aggregations

Task (com.netflix.titus.api.jobmanager.model.job.Task): 5
PodWrapper (com.netflix.titus.master.kubernetes.client.model.PodWrapper): 5
TaskState (com.netflix.titus.api.jobmanager.model.job.TaskState): 4
V1Node (io.kubernetes.client.openapi.models.V1Node): 4
HashMap (java.util.HashMap): 4
BatchJobTask (com.netflix.titus.api.jobmanager.model.job.BatchJobTask): 3
V1Pod (io.kubernetes.client.openapi.models.V1Pod): 3
VisibleForTesting (com.google.common.annotations.VisibleForTesting): 2
Stopwatch (com.google.common.base.Stopwatch): 2
Counter (com.netflix.spectator.api.Counter): 2
Gauge (com.netflix.spectator.api.Gauge): 2
Timer (com.netflix.spectator.api.Timer): 2
TaskAttributes (com.netflix.titus.api.jobmanager.TaskAttributes): 2
ExecutableStatus (com.netflix.titus.api.jobmanager.model.job.ExecutableStatus): 2
Job (com.netflix.titus.api.jobmanager.model.job.Job): 2
JobFunctions (com.netflix.titus.api.jobmanager.model.job.JobFunctions): 2
TaskStatus (com.netflix.titus.api.jobmanager.model.job.TaskStatus): 2
V3JobOperations (com.netflix.titus.api.jobmanager.service.V3JobOperations): 2
CallMetadata (com.netflix.titus.api.model.callmetadata.CallMetadata): 2
TitusRuntime (com.netflix.titus.common.runtime.TitusRuntime): 2