Example 1 with TaskStatusEvent

Use of com.spotify.helios.common.descriptors.TaskStatusEvent in project helios by spotify.

In class JobHistoryCommand, method run.

@Override
int run(final Namespace options, final HeliosClient client, final PrintStream out, final boolean json, final BufferedReader stdin) throws ExecutionException, InterruptedException {
    final String jobIdString = options.getString(jobIdArg.getDest());
    final Map<JobId, Job> jobs = client.jobs(jobIdString).get();
    if (jobs.size() == 0) {
        out.printf("Unknown job: %s%n", jobIdString);
        return 1;
    } else if (jobs.size() > 1) {
        out.printf("Ambiguous job id: %s%n", jobIdString);
        return 1;
    }
    final JobId jobId = getLast(jobs.keySet());
    final TaskStatusEvents result = client.jobHistory(jobId).get();
    if (json) {
        out.println(Json.asPrettyStringUnchecked(result));
        return 0;
    }
    final Table table = table(out);
    table.row("HOST", "TIMESTAMP", "STATE", "THROTTLED", "CONTAINERID");
    final List<TaskStatusEvent> events = result.getEvents();
    final DateTimeFormatter format = DateTimeFormat.forPattern("YYYY-MM-dd HH:mm:ss.SSS");
    for (final TaskStatusEvent event : events) {
        final String host = checkNotNull(event.getHost());
        final long timestamp = event.getTimestamp();
        final TaskStatus status = checkNotNull(event.getStatus());
        final State state = checkNotNull(status.getState());
        String containerId = status.getContainerId();
        containerId = containerId == null ? "<none>" : containerId;
        table.row(host, format.print(timestamp), state, status.getThrottled(), containerId);
    }
    table.print();
    return 0;
}
Also used : TaskStatusEvent(com.spotify.helios.common.descriptors.TaskStatusEvent) Table(com.spotify.helios.cli.Table) State(com.spotify.helios.common.descriptors.TaskStatus.State) TaskStatusEvents(com.spotify.helios.common.protocol.TaskStatusEvents) Job(com.spotify.helios.common.descriptors.Job) TaskStatus(com.spotify.helios.common.descriptors.TaskStatus) DateTimeFormatter(org.joda.time.format.DateTimeFormatter) JobId(com.spotify.helios.common.descriptors.JobId)

Example 2 with TaskStatusEvent

Use of com.spotify.helios.common.descriptors.TaskStatusEvent in project helios by spotify.

In class TaskHistoryWriterTest, method testSimpleWorkage.

@Test
public void testSimpleWorkage() throws Exception {
    writer.saveHistoryItem(TASK_STATUS, TIMESTAMP);
    final TaskStatusEvent historyItem = Iterables.getOnlyElement(awaitHistoryItems());
    assertEquals(JOB_ID, historyItem.getStatus().getJob().getId());
}
Also used : TaskStatusEvent(com.spotify.helios.common.descriptors.TaskStatusEvent) Test(org.junit.Test)

Example 3 with TaskStatusEvent

Use of com.spotify.helios.common.descriptors.TaskStatusEvent in project helios by spotify.

In class TaskHistoryWriterTest, method testWriteWithZooKeeperDown.

@Test
public void testWriteWithZooKeeperDown() throws Exception {
    zk.stop();
    writer.saveHistoryItem(TASK_STATUS, TIMESTAMP);
    zk.start();
    final TaskStatusEvent historyItem = Iterables.getOnlyElement(awaitHistoryItems());
    assertEquals(JOB_ID, historyItem.getStatus().getJob().getId());
}
Also used : TaskStatusEvent(com.spotify.helios.common.descriptors.TaskStatusEvent) Test(org.junit.Test)

Example 4 with TaskStatusEvent

Use of com.spotify.helios.common.descriptors.TaskStatusEvent in project helios by spotify.

In class ZooKeeperMasterModel, method getJobHistory.

/**
 * Given a jobId and host, returns the N most recent events in its history on that host in the
 * cluster. If host is null or empty, the history from every host recorded for the job is
 * returned instead.
 */
@Override
public List<TaskStatusEvent> getJobHistory(final JobId jobId, final String host) throws JobDoesNotExistException {
    final Job descriptor = getJob(jobId);
    if (descriptor == null) {
        throw new JobDoesNotExistException(jobId);
    }
    final ZooKeeperClient client = provider.get("getJobHistory");
    final List<String> hosts;
    try {
        hosts = (!isNullOrEmpty(host)) ? singletonList(host) : client.getChildren(Paths.historyJobHosts(jobId));
    } catch (NoNodeException e) {
        return emptyList();
    } catch (KeeperException e) {
        throw new RuntimeException(e);
    }
    final List<TaskStatusEvent> jsEvents = Lists.newArrayList();
    for (final String h : hosts) {
        final List<String> events;
        try {
            events = client.getChildren(Paths.historyJobHostEvents(jobId, h));
        } catch (NoNodeException e) {
            continue;
        } catch (KeeperException e) {
            throw new RuntimeException(e);
        }
        for (final String event : events) {
            try {
                final byte[] data = client.getData(Paths.historyJobHostEventsTimestamp(jobId, h, Long.valueOf(event)));
                final TaskStatus status = Json.read(data, TaskStatus.class);
                jsEvents.add(new TaskStatusEvent(status, Long.valueOf(event), h));
            } catch (NoNodeException e) {
                // ignore, it went away before we read it
            } catch (KeeperException | IOException e) {
                throw new RuntimeException(e);
            }
        }
    }
    return Ordering.from(EVENT_COMPARATOR).sortedCopy(jsEvents);
}
Also used : TaskStatusEvent(com.spotify.helios.common.descriptors.TaskStatusEvent) NoNodeException(org.apache.zookeeper.KeeperException.NoNodeException) IOException(java.io.IOException) TaskStatus(com.spotify.helios.common.descriptors.TaskStatus) HeliosRuntimeException(com.spotify.helios.common.HeliosRuntimeException) ZooKeeperClient(com.spotify.helios.servicescommon.coordination.ZooKeeperClient) Job(com.spotify.helios.common.descriptors.Job) KeeperException(org.apache.zookeeper.KeeperException)
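
The final line of getJobHistory returns a sorted copy of the collected events, but EVENT_COMPARATOR itself is not shown in this listing. The sketch below assumes it orders events by timestamp, oldest first, which would be consistent with Example 5 treating the last element as the most recent event. The Event class, the BY_TIMESTAMP comparator, and the sortedCopy helper are hypothetical stand-ins that use only the JDK, not the Helios or Guava types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class EventOrderingSketch {

    /** Hypothetical stand-in for TaskStatusEvent: only a host and a timestamp. */
    static final class Event {
        final String host;
        final long timestamp;

        Event(String host, long timestamp) {
            this.host = host;
            this.timestamp = timestamp;
        }
    }

    // Assumption: events are ordered by timestamp, oldest first.
    static final Comparator<Event> BY_TIMESTAMP = Comparator.comparingLong(e -> e.timestamp);

    /** Returns a new sorted list and leaves the input list untouched. */
    static List<Event> sortedCopy(List<Event> events) {
        List<Event> copy = new ArrayList<>(events);
        copy.sort(BY_TIMESTAMP);
        return copy;
    }

    public static void main(String[] args) {
        List<Event> collected = new ArrayList<>();
        collected.add(new Event("host-b", 300L));
        collected.add(new Event("host-a", 100L));
        collected.add(new Event("host-b", 200L));

        List<Event> sorted = sortedCopy(collected);
        // With oldest-first ordering, the last element is the most recent event.
        Event mostRecent = sorted.get(sorted.size() - 1);
        System.out.println(mostRecent.host + " @ " + mostRecent.timestamp);
    }
}
```

Sorting a copy rather than the collected list keeps the method free of side effects on its working list, which matches the sortedCopy call in the original.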

Example 5 with TaskStatusEvent

use of com.spotify.helios.common.descriptors.TaskStatusEvent in project helios by spotify.

the class OldJobReaper method processItem.

@Override
void processItem(final Job job) {
    final JobId jobId = job.getId();
    try {
        final JobStatus jobStatus = masterModel.getJobStatus(jobId);
        if (jobStatus == null) {
            log.warn("Couldn't find job status for {} because job has already been deleted. Skipping.", jobId);
            return;
        }
        final Map<String, Deployment> deployments = jobStatus.getDeployments();
        final List<TaskStatusEvent> events = masterModel.getJobHistory(jobId);
        boolean reap;
        if (deployments.isEmpty()) {
            if (events.isEmpty()) {
                final Long created = job.getCreated();
                if (created == null) {
                    log.info("Marked job '{}' for reaping (not deployed, no history, no creation date)", jobId);
                    reap = true;
                } else if ((clock.now().getMillis() - created) > retentionMillis) {
                    log.info("Marked job '{}' for reaping (not deployed, no history, creation date " + "of {} before retention time of {} days)", jobId, DATE_FORMATTER.print(created), retentionDays);
                    reap = true;
                } else {
                    log.info("NOT reaping job '{}' (not deployed, no history, creation date of {} after " + "retention time of {} days)", jobId, DATE_FORMATTER.print(created), retentionDays);
                    reap = false;
                }
            } else {
                // Get the last event which is the most recent
                final TaskStatusEvent event = events.get(events.size() - 1);
                final String eventDate = DATE_FORMATTER.print(event.getTimestamp());
                // Calculate the amount of time in milliseconds that has elapsed since the last event
                final long unusedDurationMillis = clock.now().getMillis() - event.getTimestamp();
                // A job not deployed, with history, and last used too long ago should BE reaped;
                // one last used recently should NOT BE reaped
                if (unusedDurationMillis > retentionMillis && !jobsInDeploymentGroups.contains(jobId)) {
                    log.info("Marked job '{}' for reaping (not deployed, has history whose last event " + "on {} was before the retention time of {} days)", jobId, eventDate, retentionDays);
                    reap = true;
                } else {
                    log.info("NOT reaping job '{}' (not deployed, has history whose last event " + "on {} was after the retention time of {} days)", jobId, eventDate, retentionDays);
                    reap = false;
                }
            }
        } else {
            // A job that's deployed should NOT BE reaped regardless of its history or creation date
            reap = false;
        }
        if (reap) {
            try {
                log.info("reaping old job '{}'", job.getId());
                masterModel.removeJob(jobId, job.getToken());
            } catch (Exception e) {
                log.warn("Failed to reap old job '{}'", jobId, e);
            }
        }
    } catch (Exception e) {
        log.warn("Failed to determine if job '{}' should be reaped", jobId, e);
    }
}
Also used : JobStatus(com.spotify.helios.common.descriptors.JobStatus) TaskStatusEvent(com.spotify.helios.common.descriptors.TaskStatusEvent) Deployment(com.spotify.helios.common.descriptors.Deployment) JobId(com.spotify.helios.common.descriptors.JobId)
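
The branching in processItem can be easier to follow as a single pure function. The sketch below restates the decision shown above under a few assumptions: the shouldReap name and its parameter list are hypothetical, a null lastEventMillis stands for an empty history, and logging and the actual removeJob call are left out.

```java
public class ReapDecisionSketch {

    /**
     * Hypothetical condensed form of the reaping decision.
     *
     * @param deployed          true if the job has at least one deployment
     * @param lastEventMillis   timestamp of the most recent history event, or null if no history
     * @param createdMillis     job creation time, or null if unknown
     * @param inDeploymentGroup true if the job belongs to a deployment group
     */
    static boolean shouldReap(boolean deployed, Long lastEventMillis, Long createdMillis,
                              boolean inDeploymentGroup, long nowMillis, long retentionMillis) {
        if (deployed) {
            // A deployed job is never reaped, whatever its history or creation date.
            return false;
        }
        if (lastEventMillis == null) {
            // No history: reap if there is no creation date or it is older than the retention time.
            return createdMillis == null || (nowMillis - createdMillis) > retentionMillis;
        }
        // Has history: reap only if the last event is older than the retention time
        // and the job is not part of a deployment group.
        return (nowMillis - lastEventMillis) > retentionMillis && !inDeploymentGroup;
    }

    public static void main(String[] args) {
        final long day = 24L * 60 * 60 * 1000;
        final long now = 100 * day;
        final long retention = 30 * day;

        // Not deployed, no history, created 50 days ago: older than retention.
        System.out.println(shouldReap(false, null, now - 50 * day, false, now, retention));
        // Not deployed, last event 5 days ago: still within retention.
        System.out.println(shouldReap(false, now - 5 * day, null, false, now, retention));
    }
}
```

As in the original, the deployment-group check applies only to jobs that have history; a job with no history is judged on its creation date alone.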

Aggregations

TaskStatusEvent (com.spotify.helios.common.descriptors.TaskStatusEvent): 15
JobId (com.spotify.helios.common.descriptors.JobId): 8
Test (org.junit.Test): 5
TaskStatus (com.spotify.helios.common.descriptors.TaskStatus): 4
TaskStatusEvents (com.spotify.helios.common.protocol.TaskStatusEvents): 4
IOException (java.io.IOException): 4
Job (com.spotify.helios.common.descriptors.Job): 3
HeliosClient (com.spotify.helios.client.HeliosClient): 2
JobStatus (com.spotify.helios.common.descriptors.JobStatus): 2
KeeperException (org.apache.zookeeper.KeeperException): 2
ExceptionMetered (com.codahale.metrics.annotation.ExceptionMetered): 1
Timed (com.codahale.metrics.annotation.Timed): 1
ImmutableList (com.google.common.collect.ImmutableList): 1
Table (com.spotify.helios.cli.Table): 1
HeliosRuntimeException (com.spotify.helios.common.HeliosRuntimeException): 1
Deployment (com.spotify.helios.common.descriptors.Deployment): 1
ExecHealthCheck (com.spotify.helios.common.descriptors.ExecHealthCheck): 1
HealthCheck (com.spotify.helios.common.descriptors.HealthCheck): 1
HttpHealthCheck (com.spotify.helios.common.descriptors.HttpHealthCheck): 1
ServiceEndpoint (com.spotify.helios.common.descriptors.ServiceEndpoint): 1