Example 6 with CrashLoopInfo

use of com.hubspot.singularity.CrashLoopInfo in project Singularity by HubSpot.

the class SingularityCrashLoopTest method itDetectsFastFailureLoopsForNonLongRunning.

@Test
public void itDetectsFastFailureLoopsForNonLongRunning() {
    initRequestWithType(RequestType.ON_DEMAND, false);
    initFirstDeploy();
    long now = System.currentTimeMillis();
    createTaskFailure(1, now - 1000, TaskFailureType.BAD_EXIT_CODE);
    createTaskFailure(1, now - 10000, TaskFailureType.BAD_EXIT_CODE);
    createTaskFailure(1, now - 20000, TaskFailureType.BAD_EXIT_CODE);
    createTaskFailure(1, now - 30000, TaskFailureType.BAD_EXIT_CODE);
    createTaskFailure(1, now - 45000, TaskFailureType.BAD_EXIT_CODE);
    SingularityDeployStatistics deployStatistics = deployManager.getDeployStatistics(requestId, firstDeployId).get();
    List<CrashLoopInfo> active = crashLoops.getActiveCrashLoops(deployStatistics);
    Assertions.assertEquals(1, active.size());
    Assertions.assertEquals(CrashLoopType.FAST_FAILURE_LOOP, Iterables.getOnlyElement(active).getType());
}
Also used : CrashLoopInfo(com.hubspot.singularity.CrashLoopInfo) SingularityDeployStatistics(com.hubspot.singularity.SingularityDeployStatistics) Test(org.junit.jupiter.api.Test)

Example 7 with CrashLoopInfo

use of com.hubspot.singularity.CrashLoopInfo in project Singularity by HubSpot.

the class SingularityCrashLoopTest method itDoesNotTriggerWhenFailuresAreNotRecentEnough.

@Test
public void itDoesNotTriggerWhenFailuresAreNotRecentEnough() {
    initRequestWithType(RequestType.WORKER, false);
    initFirstDeploy();
    long now = System.currentTimeMillis();
    // 3 failures meet the threshold, but the latest must be less than ~8 minutes ago for a single-instance failure loop
    createTaskFailure(1, now - TimeUnit.MINUTES.toMillis(10), TaskFailureType.BAD_EXIT_CODE);
    createTaskFailure(1, now - TimeUnit.MINUTES.toMillis(15), TaskFailureType.BAD_EXIT_CODE);
    createTaskFailure(1, now - TimeUnit.MINUTES.toMillis(20), TaskFailureType.BAD_EXIT_CODE);
    SingularityDeployStatistics deployStatistics = deployManager.getDeployStatistics(requestId, firstDeployId).get();
    List<CrashLoopInfo> active = crashLoops.getActiveCrashLoops(deployStatistics);
    Assertions.assertTrue(active.isEmpty());
}
Also used : CrashLoopInfo(com.hubspot.singularity.CrashLoopInfo) SingularityDeployStatistics(com.hubspot.singularity.SingularityDeployStatistics) Test(org.junit.jupiter.api.Test)
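
The comment in the test above describes two conditions for a single-instance failure loop: enough failures, and a sufficiently recent latest failure. The following is a minimal, hypothetical sketch of that kind of check, not Singularity's actual implementation. The class name, method name, the exact 8-minute window (the comment only says "~8mins"), and the strict comparison are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in; not a Singularity class.
class SingleInstanceRecencySketch {
  // Taken from the test comment: "3 failures meets threshold".
  static final int FAILURE_THRESHOLD = 3;
  // Assumed value; the test comment only says "~8mins".
  static final long RECENCY_MILLIS = TimeUnit.MINUTES.toMillis(8);

  /**
   * Returns true when there are at least FAILURE_THRESHOLD failures
   * and the most recent one is newer than RECENCY_MILLIS before now.
   */
  static boolean isFailureLoop(long now, List<Long> failureTimestamps) {
    if (failureTimestamps.size() < FAILURE_THRESHOLD) {
      return false;
    }
    // Safe to unwrap: the list is non-empty at this point.
    long latest = failureTimestamps.stream().mapToLong(Long::longValue).max().getAsLong();
    return now - latest < RECENCY_MILLIS;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    // Same offsets as the test above: 10, 15 and 20 minutes ago.
    List<Long> stale = Arrays.asList(
      now - TimeUnit.MINUTES.toMillis(10),
      now - TimeUnit.MINUTES.toMillis(15),
      now - TimeUnit.MINUTES.toMillis(20)
    );
    System.out.println(isFailureLoop(now, stale));
  }
}
```

With the offsets from the test, the count threshold is met but the latest failure is 10 minutes old, so the sketch reports no loop, which matches the empty result the test asserts.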

Example 8 with CrashLoopInfo

use of com.hubspot.singularity.CrashLoopInfo in project Singularity by HubSpot.

the class SingularityCrashLoopTest method itDetectsTooManyMultiInstanceFailures.

@Test
public void itDetectsTooManyMultiInstanceFailures() {
    initRequestWithType(RequestType.WORKER, false);
    initFirstDeploy();
    long now = System.currentTimeMillis();
    createTaskFailure(1, now - TimeUnit.MINUTES.toMillis(1), TaskFailureType.BAD_EXIT_CODE);
    createTaskFailure(2, now - TimeUnit.MINUTES.toMillis(4), TaskFailureType.OOM);
    createTaskFailure(6, now - TimeUnit.MINUTES.toMillis(5), TaskFailureType.OUT_OF_DISK_SPACE);
    createTaskFailure(3, now - TimeUnit.MINUTES.toMillis(7), TaskFailureType.OUT_OF_DISK_SPACE);
    createTaskFailure(4, now - TimeUnit.MINUTES.toMillis(10), TaskFailureType.OOM);
    createTaskFailure(1, now - TimeUnit.MINUTES.toMillis(12), TaskFailureType.OUT_OF_DISK_SPACE);
    createTaskFailure(5, now - TimeUnit.MINUTES.toMillis(16), TaskFailureType.BAD_EXIT_CODE);
    SingularityDeployStatistics deployStatistics = deployManager.getDeployStatistics(requestId, firstDeployId).get();
    List<CrashLoopInfo> active = crashLoops.getActiveCrashLoops(deployStatistics);
    Assertions.assertEquals(1, active.size());
    Assertions.assertEquals(CrashLoopType.MULTI_INSTANCE_FAILURE, Iterables.getOnlyElement(active).getType());
}
Also used : CrashLoopInfo(com.hubspot.singularity.CrashLoopInfo) SingularityDeployStatistics(com.hubspot.singularity.SingularityDeployStatistics) Test(org.junit.jupiter.api.Test)

Example 9 with CrashLoopInfo

use of com.hubspot.singularity.CrashLoopInfo in project Singularity by HubSpot.

the class SingularityCrashLoops method getUnexpectedExitLoop.

/*
 * Unexpected exits: too many tasks of a long-running type finished within X minutes.
 */
private Optional<CrashLoopInfo> getUnexpectedExitLoop(long now, SingularityDeployStatistics deployStatistics) {
    // TODO - configurable?
    long thresholdUnexpectedExitTime = now - TimeUnit.MINUTES.toMillis(30);
    // Collect the timestamps of UNEXPECTED_EXIT failures newer than the threshold
    List<Long> recentUnexpectedExits = deployStatistics.getTaskFailureEvents().stream()
        .filter(e -> e.getType() == TaskFailureType.UNEXPECTED_EXIT && e.getTimestamp() > thresholdUnexpectedExitTime)
        .map(TaskFailureEvent::getTimestamp)
        .collect(Collectors.toList());
    if (recentUnexpectedExits.size() > 4) {
        // TODO - configurable?
        // Pass the earliest recent exit timestamp and an empty Optional
        return Optional.of(new CrashLoopInfo(
            deployStatistics.getRequestId(),
            deployStatistics.getDeployId(),
            recentUnexpectedExits.stream().min(Comparator.comparingLong(Long::longValue)).get(),
            Optional.empty(),
            CrashLoopType.UNEXPECTED_EXITS));
    }
    return Optional.empty();
}
Also used : CrashLoopInfo(com.hubspot.singularity.CrashLoopInfo)
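
To try the windowing logic of getUnexpectedExitLoop without the Singularity classes, the sketch below reproduces it on plain timestamps. It is a simplified illustration under stated assumptions: the class and method names are hypothetical, the input is assumed to be already filtered to unexpected exits, and the result is only the earliest timestamp in the window rather than a CrashLoopInfo. The 30-minute window and the "more than 4" count are copied from the snippet above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Hypothetical stand-in; not a Singularity class.
class UnexpectedExitSketch {
  // Both values mirror the hard-coded ones in getUnexpectedExitLoop.
  static final long WINDOW_MILLIS = TimeUnit.MINUTES.toMillis(30);
  static final int MAX_EXITS = 4;

  /**
   * Returns the earliest exit timestamp inside the window when more than
   * MAX_EXITS exits fall inside it, otherwise an empty Optional.
   * Assumes exitTimestamps contains only unexpected-exit events.
   */
  static Optional<Long> detectLoopStart(long now, List<Long> exitTimestamps) {
    long threshold = now - WINDOW_MILLIS;
    List<Long> recent = exitTimestamps.stream()
      .filter(ts -> ts > threshold)
      .collect(Collectors.toList());
    if (recent.size() > MAX_EXITS) {
      return recent.stream().min(Comparator.naturalOrder());
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    // Five exits, all within the last 30 minutes.
    List<Long> exits = Arrays.asList(
      now - TimeUnit.MINUTES.toMillis(1),
      now - TimeUnit.MINUTES.toMillis(5),
      now - TimeUnit.MINUTES.toMillis(10),
      now - TimeUnit.MINUTES.toMillis(15),
      now - TimeUnit.MINUTES.toMillis(20)
    );
    System.out.println(detectLoopStart(now, exits).isPresent());
  }
}
```

Note that the comparison is strict in both places: exactly four recent exits do not count as a loop, and an exit exactly 30 minutes old falls outside the window.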

Example 10 with CrashLoopInfo

use of com.hubspot.singularity.CrashLoopInfo in project Singularity by HubSpot.

the class SingularityCrashLoopTest method itDetectsStartupFailureLoops.

@Test
public void itDetectsStartupFailureLoops() {
    initRequestWithType(RequestType.WORKER, false);
    initFirstDeploy();
    long now = System.currentTimeMillis();
    SingularityTask task = startTask(firstDeploy, 1);
    taskManager.createTaskCleanup(new SingularityTaskCleanup(Optional.empty(), TaskCleanupType.UNHEALTHY_NEW_TASK, now - 30000, task.getTaskId(), Optional.empty(), Optional.empty(), Optional.empty()));
    createTaskFailure(1, now - 10000, TaskFailureType.STARTUP_FAILURE);
    createTaskFailure(1, now - 15000, TaskFailureType.STARTUP_FAILURE);
    createTaskFailure(1, now - 20000, TaskFailureType.STARTUP_FAILURE);
    SingularityDeployStatistics deployStatistics = deployManager.getDeployStatistics(requestId, firstDeployId).get();
    List<CrashLoopInfo> active = crashLoops.getActiveCrashLoops(deployStatistics);
    Assertions.assertEquals(1, active.size());
    Assertions.assertEquals(CrashLoopType.STARTUP_FAILURE_LOOP, Iterables.getOnlyElement(active).getType());
}
Also used : SingularityTask(com.hubspot.singularity.SingularityTask) CrashLoopInfo(com.hubspot.singularity.CrashLoopInfo) SingularityTaskCleanup(com.hubspot.singularity.SingularityTaskCleanup) SingularityDeployStatistics(com.hubspot.singularity.SingularityDeployStatistics) Test(org.junit.jupiter.api.Test)

Aggregations

CrashLoopInfo (com.hubspot.singularity.CrashLoopInfo): 14
SingularityDeployStatistics (com.hubspot.singularity.SingularityDeployStatistics): 10
Test (org.junit.jupiter.api.Test): 9
Inject (com.google.inject.Inject): 3
SingularityTask (com.hubspot.singularity.SingularityTask): 3
SingularityTaskCleanup (com.hubspot.singularity.SingularityTaskCleanup): 3
List (java.util.List): 3
Optional (java.util.Optional): 3
CrashLoopType (com.hubspot.singularity.CrashLoopType): 2
SingularityRequestWithState (com.hubspot.singularity.SingularityRequestWithState): 2
TaskFailureType (com.hubspot.singularity.TaskFailureType): 2
SingularityConfiguration (com.hubspot.singularity.config.SingularityConfiguration): 2
DeployManager (com.hubspot.singularity.data.DeployManager): 2
RequestManager (com.hubspot.singularity.data.RequestManager): 2
Comparator (java.util.Comparator): 2
Map (java.util.Map): 2
Collectors (java.util.stream.Collectors): 2
Singleton (javax.inject.Singleton): 2
Logger (org.slf4j.Logger): 2
LoggerFactory (org.slf4j.LoggerFactory): 2