Example 1 with ContainerHealthStatus

use of com.netflix.titus.api.containerhealth.model.ContainerHealthStatus in project titus-control-plane by Netflix.

the class ContainerHealthAsserts method assertContainerHealthSnapshot.

@SafeVarargs
public static void assertContainerHealthSnapshot(ContainerHealthEvent event, Predicate<ContainerHealthStatus>... predicates) {
    assertThat(event).isInstanceOf(ContainerHealthSnapshotEvent.class);
    ContainerHealthSnapshotEvent snapshotEvent = (ContainerHealthSnapshotEvent) event;
    List<ContainerHealthStatus> snapshotEvents = snapshotEvent.getSnapshot();
    assertThat(snapshotEvents).describedAs("Expecting %s events, but got %s", predicates.length, snapshotEvents.size()).hasSize(predicates.length);
    for (int i = 0; i < snapshotEvents.size(); i++) {
        if (!predicates[i].test(snapshotEvents.get(i))) {
            fail("Event %s does not match its predicate: event=%s", i, snapshotEvents.get(i));
        }
    }
}
Also used : ContainerHealthStatus(com.netflix.titus.api.containerhealth.model.ContainerHealthStatus) ContainerHealthSnapshotEvent(com.netflix.titus.api.containerhealth.model.event.ContainerHealthSnapshotEvent)
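
The helper above checks a snapshot positionally: entry i must satisfy predicate i, and the sizes must agree. A minimal standalone sketch of the same pattern without AssertJ follows. `PositionalAsserts` and `assertEachMatches` are hypothetical names introduced for illustration and are not part of titus-control-plane.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper illustrating positional predicate matching; not part of titus-control-plane.
final class PositionalAsserts {

    private PositionalAsserts() {
    }

    // Checks that the sizes agree and that items.get(i) satisfies predicates[i] for every i.
    @SafeVarargs
    static <T> void assertEachMatches(List<T> items, Predicate<T>... predicates) {
        if (items.size() != predicates.length) {
            throw new AssertionError(String.format(
                    "Expecting %s items, but got %s", predicates.length, items.size()));
        }
        for (int i = 0; i < items.size(); i++) {
            if (!predicates[i].test(items.get(i))) {
                throw new AssertionError(String.format(
                        "Item %s does not match its predicate: item=%s", i, items.get(i)));
            }
        }
    }
}
```

A call such as `PositionalAsserts.assertEachMatches(statuses, s -> s.isEmpty(), s -> s.length() == 2)` passes only when the list has exactly two elements and each one satisfies the predicate in its own position.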

Example 2 with ContainerHealthStatus

use of com.netflix.titus.api.containerhealth.model.ContainerHealthStatus in project titus-control-plane by Netflix.

the class UnhealthyTasksLimitTracker method countHealthy.

private Pair<Integer, String> countHealthy() {
    List<Task> tasks;
    try {
        tasks = jobOperations.getTasks(job.getId());
    } catch (JobManagerException e) {
        return Pair.of(0, "job not found");
    }
    int healthy = 0;
    Map<String, String> notStartedOrUnhealthyTasks = new HashMap<>();
    for (Task task : tasks) {
        if (task.getStatus().getState() == TaskState.Started) {
            Optional<ContainerHealthStatus> statusOpt = containerHealthService.findHealthStatus(task.getId());
            if (statusOpt.isPresent() && statusOpt.get().getState() == ContainerHealthState.Healthy) {
                healthy++;
            } else {
                String report = statusOpt.map(status -> startWithLowercase(status.getState().name()) + '(' + status.getReason() + ')').orElse("health not found");
                notStartedOrUnhealthyTasks.put(task.getId(), report);
            }
        } else {
            notStartedOrUnhealthyTasks.put(task.getId(), String.format("Not started (current task state=%s)", task.getStatus().getState()));
        }
    }
    if (!notStartedOrUnhealthyTasks.isEmpty()) {
        StringBuilder builder = new StringBuilder("not started and healthy: ");
        builder.append("total=").append(notStartedOrUnhealthyTasks.size());
        builder.append(", tasks=[");
        int counter = 0;
        for (Map.Entry<String, String> entry : notStartedOrUnhealthyTasks.entrySet()) {
            builder.append(entry.getKey()).append('=').append(entry.getValue());
            counter++;
            if (counter >= notStartedOrUnhealthyTasks.size()) {
                // Last entry: close the list.
                builder.append("]");
            } else if (counter >= TASK_ID_REPORT_LIMIT) {
                // Report limit reached: summarize the remaining tasks and stop iterating.
                builder.append(",... dropped ").append(notStartedOrUnhealthyTasks.size() - counter).append(" tasks]");
                break;
            } else {
                builder.append(", ");
            }
        }
        return Pair.of(healthy, builder.toString());
    }
    return Pair.of(healthy, healthy > minimumHealthyCount ? "" : String.format("not enough healthy containers: healthy=%s, minimum=%s", healthy, minimumHealthyCount));
}
Also used : Job(com.netflix.titus.api.jobmanager.model.job.Job) Task(com.netflix.titus.api.jobmanager.model.job.Task) EvictionQuota(com.netflix.titus.api.eviction.model.EvictionQuota) HashMap(java.util.HashMap) JobFunctions(com.netflix.titus.api.jobmanager.model.job.JobFunctions) Reference(com.netflix.titus.api.model.reference.Reference) UnhealthyTasksLimitDisruptionBudgetPolicy(com.netflix.titus.api.jobmanager.model.job.disruptionbudget.UnhealthyTasksLimitDisruptionBudgetPolicy) TaskState(com.netflix.titus.api.jobmanager.model.job.TaskState) ContainerHealthState(com.netflix.titus.api.containerhealth.model.ContainerHealthState) List(java.util.List) ContainerHealthStatus(com.netflix.titus.api.containerhealth.model.ContainerHealthStatus) AvailabilityPercentageLimitDisruptionBudgetPolicy(com.netflix.titus.api.jobmanager.model.job.disruptionbudget.AvailabilityPercentageLimitDisruptionBudgetPolicy) V3JobOperations(com.netflix.titus.api.jobmanager.service.V3JobOperations) Pair(com.netflix.titus.common.util.tuple.Pair) QuotaTracker(com.netflix.titus.master.eviction.service.quota.QuotaTracker) Map(java.util.Map) Optional(java.util.Optional) JobManagerException(com.netflix.titus.api.jobmanager.service.JobManagerException) StringExt.startWithLowercase(com.netflix.titus.common.util.StringExt.startWithLowercase) VisibleForTesting(com.google.common.annotations.VisibleForTesting) ContainerHealthService(com.netflix.titus.api.containerhealth.service.ContainerHealthService)
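
The report-building part of countHealthy lists task entries up to a limit and then summarizes how many were dropped. A minimal standalone sketch of that idea follows. `BoundedReport`, `format`, and the `limit` parameter (standing in for TASK_ID_REPORT_LIMIT) are hypothetical names, and the exact output layout is an assumption rather than the project's format.

```java
import java.util.Map;

// Hypothetical helper; the limit parameter stands in for TASK_ID_REPORT_LIMIT.
final class BoundedReport {

    private BoundedReport() {
    }

    // Renders at most `limit` entries as key=value pairs and summarizes the remainder.
    static String format(Map<String, String> entries, int limit) {
        StringBuilder builder = new StringBuilder("total=").append(entries.size()).append(", tasks=[");
        int counter = 0;
        for (Map.Entry<String, String> entry : entries.entrySet()) {
            if (counter >= limit) {
                // Limit reached: report how many entries were left out and stop.
                builder.append(",... dropped ").append(entries.size() - counter).append(" tasks");
                break;
            }
            if (counter > 0) {
                builder.append(", ");
            }
            builder.append(entry.getKey()).append('=').append(entry.getValue());
            counter++;
        }
        return builder.append(']').toString();
    }
}
```

Checking the limit before appending each entry keeps the separator and the closing bracket in one place each, so the output stays well formed whether or not the limit is hit. Passing a map with a stable iteration order, such as a `LinkedHashMap`, keeps the report deterministic.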

Example 3 with ContainerHealthStatus

use of com.netflix.titus.api.containerhealth.model.ContainerHealthStatus in project titus-control-plane by Netflix.

the class AggregatingContainerHealthService method takeStatusOfTaskWithHealthProviders.

private ContainerHealthStatus takeStatusOfTaskWithHealthProviders(Task task, Set<String> enabledServices) {
    ContainerHealthStatus current = null;
    for (String name : enabledServices) {
        ContainerHealthService healthService = healthServices.get(name);
        ContainerHealthStatus newStatus;
        if (healthService != null) {
            newStatus = healthService.findHealthStatus(task.getId()).orElseGet(() -> ContainerHealthStatus.newBuilder().withTaskId(task.getId()).withState(ContainerHealthState.Unknown).withReason("not known to: " + name).withTimestamp(clock.wallTime()).build());
        } else {
            newStatus = ContainerHealthStatus.newBuilder().withTaskId(task.getId()).withState(ContainerHealthState.Unknown).withReason("unknown container health provider set: " + name).withTimestamp(clock.wallTime()).build();
        }
        current = current == null ? newStatus : ContainerHealthFunctions.merge(current, newStatus);
    }
    return current;
}
Also used : ContainerHealthStatus(com.netflix.titus.api.containerhealth.model.ContainerHealthStatus) ContainerHealthService(com.netflix.titus.api.containerhealth.service.ContainerHealthService)
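
The method above folds the answers of several health providers into one status, substituting an Unknown status when a provider has no answer. A minimal standalone sketch of that fold follows. All names here (`HealthMergeSketch`, `State`, `merge`, `aggregate`) are hypothetical, and the merge rule, in which the worse state wins, is an assumption: the body of ContainerHealthFunctions.merge is not shown in this example.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of folding several provider answers into one state.
final class HealthMergeSketch {

    private HealthMergeSketch() {
    }

    // Hypothetical states, declared from best to worst so that ordinal() can rank them.
    enum State { HEALTHY, UNKNOWN, UNHEALTHY }

    // Assumed merge rule: the worse of the two states wins.
    static State merge(State first, State second) {
        return first.ordinal() >= second.ordinal() ? first : second;
    }

    // Folds all provider answers into one state; a provider with no answer counts as UNKNOWN.
    static Optional<State> aggregate(List<Function<String, Optional<State>>> providers, String taskId) {
        State current = null;
        for (Function<String, Optional<State>> provider : providers) {
            State next = provider.apply(taskId).orElse(State.UNKNOWN);
            current = current == null ? next : merge(current, next);
        }
        // Empty when no providers are configured, instead of returning null.
        return Optional.ofNullable(current);
    }
}
```

Unlike the method above, which returns null when the set of enabled services is empty, the sketch returns an empty `Optional` so callers must handle that case explicitly.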

Example 4 with ContainerHealthStatus

use of com.netflix.titus.api.containerhealth.model.ContainerHealthStatus in project titus-control-plane by Netflix.

the class AggregatingContainerHealthService method handleNewState.

private Flux<ContainerHealthEvent> handleNewState(Job<?> job, Task task, ConcurrentMap<String, ContainerHealthState> emittedStates) {
    ContainerHealthStatus newStatus = takeStatusOf(job, task);
    ContainerHealthState previousState = emittedStates.get(task.getId());
    ContainerHealthState newState = newStatus.getState();
    if (newState == previousState) {
        return Flux.empty();
    }
    if (newState == ContainerHealthState.Terminated) {
        emittedStates.remove(task.getId());
    } else {
        emittedStates.put(task.getId(), newState);
    }
    return Flux.just(ContainerHealthEvent.healthChanged(newStatus));
}
Also used : ContainerHealthStatus(com.netflix.titus.api.containerhealth.model.ContainerHealthStatus) ContainerHealthState(com.netflix.titus.api.containerhealth.model.ContainerHealthState)
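
handleNewState emits an event only when the state of a task differs from the last state emitted for it, and it stops tracking a task once it terminates. A minimal standalone sketch of that change-detection pattern follows, using `Optional` in place of Reactor's `Flux`. `ChangeOnlyEmitter`, `State`, `onState`, and `trackedTasks` are hypothetical names introduced for illustration.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of emitting a state only when it changes.
final class ChangeOnlyEmitter {

    // Hypothetical states; TERMINATED is final, so its entry is evicted from the cache.
    enum State { HEALTHY, UNHEALTHY, TERMINATED }

    private final ConcurrentMap<String, State> emittedStates = new ConcurrentHashMap<>();

    // Returns the new state only when it differs from the last one emitted for this task.
    // Assumes updates for a given task arrive sequentially: get-then-put is not atomic.
    Optional<State> onState(String taskId, State newState) {
        State previous = emittedStates.get(taskId);
        if (newState == previous) {
            return Optional.empty();
        }
        if (newState == State.TERMINATED) {
            emittedStates.remove(taskId);
        } else {
            emittedStates.put(taskId, newState);
        }
        return Optional.of(newState);
    }

    // Number of tasks whose last emitted state is still cached.
    int trackedTasks() {
        return emittedStates.size();
    }
}
```

Removing terminated tasks keeps the map from growing without bound. In this sketch it also means that a second TERMINATED update for the same task would be emitted again, because no previous state remains to compare against.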

Example 5 with ContainerHealthStatus

use of com.netflix.titus.api.containerhealth.model.ContainerHealthStatus in project titus-control-plane by Netflix.

the class AggregatingContainerHealthService method buildCurrentSnapshot.

private ContainerHealthSnapshotEvent buildCurrentSnapshot() {
    List<ContainerHealthStatus> snapshot = new ArrayList<>();
    jobOperations.getJobsAndTasks().forEach(p -> {
        // Wildcard type instead of a raw Job, matching the Job<?> parameter used by handleNewState.
        Job<?> job = p.getLeft();
        p.getRight().forEach(task -> {
            if (task.getStatus().getState() == TaskState.Finished) {
                snapshot.add(ContainerHealthStatus.terminated(task.getId(), titusRuntime.getClock().wallTime()));
            } else {
                snapshot.add(takeStatusOf(job, task));
            }
        });
    });
    return new ContainerHealthSnapshotEvent(snapshot);
}
Also used : ContainerHealthStatus(com.netflix.titus.api.containerhealth.model.ContainerHealthStatus) ArrayList(java.util.ArrayList) ContainerHealthSnapshotEvent(com.netflix.titus.api.containerhealth.model.event.ContainerHealthSnapshotEvent) Job(com.netflix.titus.api.jobmanager.model.job.Job)

Aggregations

ContainerHealthStatus (com.netflix.titus.api.containerhealth.model.ContainerHealthStatus): 6
ContainerHealthState (com.netflix.titus.api.containerhealth.model.ContainerHealthState): 2
ContainerHealthSnapshotEvent (com.netflix.titus.api.containerhealth.model.event.ContainerHealthSnapshotEvent): 2
ContainerHealthService (com.netflix.titus.api.containerhealth.service.ContainerHealthService): 2
Job (com.netflix.titus.api.jobmanager.model.job.Job): 2
Task (com.netflix.titus.api.jobmanager.model.job.Task): 2
Pair (com.netflix.titus.common.util.tuple.Pair): 2
ArrayList (java.util.ArrayList): 2
VisibleForTesting (com.google.common.annotations.VisibleForTesting): 1
CacheRefreshedEvent (com.netflix.discovery.CacheRefreshedEvent): 1
ContainerHealthEvent (com.netflix.titus.api.containerhealth.model.event.ContainerHealthEvent): 1
EvictionQuota (com.netflix.titus.api.eviction.model.EvictionQuota): 1
JobFunctions (com.netflix.titus.api.jobmanager.model.job.JobFunctions): 1
TaskState (com.netflix.titus.api.jobmanager.model.job.TaskState): 1
AvailabilityPercentageLimitDisruptionBudgetPolicy (com.netflix.titus.api.jobmanager.model.job.disruptionbudget.AvailabilityPercentageLimitDisruptionBudgetPolicy): 1
UnhealthyTasksLimitDisruptionBudgetPolicy (com.netflix.titus.api.jobmanager.model.job.disruptionbudget.UnhealthyTasksLimitDisruptionBudgetPolicy): 1
JobManagerException (com.netflix.titus.api.jobmanager.service.JobManagerException): 1
V3JobOperations (com.netflix.titus.api.jobmanager.service.V3JobOperations): 1
Reference (com.netflix.titus.api.model.reference.Reference): 1
StringExt.startWithLowercase (com.netflix.titus.common.util.StringExt.startWithLowercase): 1