
Example 11 with TaskStatusEvent

use of com.spotify.helios.common.descriptors.TaskStatusEvent in project helios by spotify.

the class OldJobReaperTest method events.

private List<TaskStatusEvent> events(final List<Long> timestamps) {
    final ImmutableList.Builder<TaskStatusEvent> builder = ImmutableList.builder();
    // First sort by timestamps ascending
    final List<Long> copy = Lists.newArrayList(timestamps);
    Collections.sort(copy);
    // Iterate over the sorted copy, not the unsorted input
    for (final Long timestamp : copy) {
        final TaskStatus taskStatus = TaskStatus.newBuilder().setJob(DUMMY_JOB).setGoal(Goal.START).setState(State.RUNNING).build();
        builder.add(new TaskStatusEvent(taskStatus, timestamp, ""));
    }
    return builder.build();
}
Also used : TaskStatusEvent(com.spotify.helios.common.descriptors.TaskStatusEvent) ImmutableList(com.google.common.collect.ImmutableList) TaskStatus(com.spotify.helios.common.descriptors.TaskStatus)

Example 12 with TaskStatusEvent

use of com.spotify.helios.common.descriptors.TaskStatusEvent in project helios by spotify.

the class TaskHistoryWriterTest method testWriteWithZooKeeperDownAndInterveningCrash.

@Test
public void testWriteWithZooKeeperDownAndInterveningCrash() throws Exception {
    zk.stop();
    writer.saveHistoryItem(TASK_STATUS, TIMESTAMP);
    // simulate a crash by recreating the writer
    writer.stopAsync().awaitTerminated();
    makeWriter(client);
    zk.start();
    final TaskStatusEvent historyItem = Iterables.getOnlyElement(awaitHistoryItems());
    assertEquals(JOB_ID, historyItem.getStatus().getJob().getId());
}
Also used : TaskStatusEvent(com.spotify.helios.common.descriptors.TaskStatusEvent) Test(org.junit.Test)

Example 13 with TaskStatusEvent

use of com.spotify.helios.common.descriptors.TaskStatusEvent in project helios by spotify.

the class TaskHistoryWriter method run.

@Override
public void run() {
    while (true) {
        final TaskStatusEvent item = getNext();
        if (item == null) {
            return;
        }
        final JobId jobId = item.getStatus().getJob().getId();
        final String historyPath = Paths.historyJobHostEventsTimestamp(jobId, hostname, item.getTimestamp());
        try {
            log.debug("writing queued item to zookeeper {} {}", jobId, item.getTimestamp());
            client.ensurePath(historyPath, true);
            client.createAndSetData(historyPath, item.getStatus().toJsonBytes());
            // See if too many
            final List<String> events = client.getChildren(Paths.historyJobHostEvents(jobId, hostname));
            if (events.size() > MAX_NUMBER_STATUS_EVENTS_TO_RETAIN) {
                trimStatusEvents(events, jobId);
            }
        } catch (NodeExistsException e) {
            // Ahh, the two generals problem...  We handle by doing nothing since the thing
            // we wanted in, is in.
            log.debug("item we wanted in is already there");
        } catch (ConnectionLossException e) {
            log.warn("Connection lost while putting item into zookeeper, will retry");
            putBack(item);
            break;
        } catch (KeeperException e) {
            log.error("Error putting item into zookeeper, will retry", e);
            putBack(item);
            break;
        }
    }
}
Also used : TaskStatusEvent(com.spotify.helios.common.descriptors.TaskStatusEvent) NodeExistsException(org.apache.zookeeper.KeeperException.NodeExistsException) ConnectionLossException(org.apache.zookeeper.KeeperException.ConnectionLossException) JobId(com.spotify.helios.common.descriptors.JobId) KeeperException(org.apache.zookeeper.KeeperException)
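
The run method above takes items off a queue, writes each one, and on a retryable failure puts the item back and stops until the next run. The following is a minimal sketch of that put-back pattern in isolation. PutBackQueueDemo, drain, and the Predicate standing in for the ZooKeeper write are hypothetical names, and the excerpt does not show how getNext and putBack are actually implemented, so the Deque-based behavior here is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.function.Predicate;

final class PutBackQueueDemo {

    // Drains the queue, handing each item to `write`. If a write fails, the item
    // is returned to the head of the queue and draining stops until the next call.
    // Returns the number of items written successfully.
    static <T> int drain(final Deque<T> queue, final Predicate<T> write) {
        int written = 0;
        while (true) {
            final T item = queue.pollFirst();
            if (item == null) {
                return written; // queue is empty
            }
            if (!write.test(item)) {
                queue.addFirst(item); // put back at the head so order is preserved
                return written;
            }
            written++;
        }
    }

    public static void main(final String[] args) {
        final Deque<Integer> queue = new ArrayDeque<>(Arrays.asList(1, 2, 3));
        // First pass: the hypothetical write fails on item 2, so only item 1 is written.
        final int first = drain(queue, item -> item != 2);
        // Second pass: every write succeeds, so items 2 and 3 are written in order.
        final int second = drain(queue, item -> true);
        System.out.println(first + " " + second + " " + queue.isEmpty()); // prints "1 2 true"
    }
}
```

Putting the failed item back at the head rather than the tail keeps events in their original order, which matters when the items are timestamped history entries.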

Example 14 with TaskStatusEvent

use of com.spotify.helios.common.descriptors.TaskStatusEvent in project helios by spotify.

the class HealthCheckTest method testContainerDiesDuringHealthcheck.

@Test
public void testContainerDiesDuringHealthcheck() throws Exception {
    startDefaultMaster();
    final HeliosClient client = defaultClient();
    startDefaultAgent(testHost(), "--service-registry=" + registryAddress);
    awaitHostStatus(client, testHost(), UP, LONG_WAIT_SECONDS, SECONDS);
    final HealthCheck healthCheck = TcpHealthCheck.of("health");
    final Job job = pokeJob(healthCheck);
    final JobId jobId = createJob(job);
    deployJob(jobId, testHost());
    awaitTaskState(jobId, testHost(), HEALTHCHECKING);
    // kill the underlying container
    final JobStatus jobStatus = getOrNull(client.jobStatus(jobId));
    final TaskStatus taskStatus = jobStatus.getTaskStatuses().get(testHost());
    getNewDockerClient().killContainer(taskStatus.getContainerId());
    // ensure the job is marked as failed
    final int timeout = WAIT_TIMEOUT_SECONDS;
    Polling.await(timeout, SECONDS, new Callable<Object>() {

        @Override
        public Object call() throws Exception {
            final TaskStatusEvents jobHistory = getOrNull(client.jobHistory(jobId));
            for (final TaskStatusEvent event : jobHistory.getEvents()) {
                if (event.getStatus().getState() == FAILED) {
                    return true;
                }
            }
            return null;
        }
    });
    // wait for the job to come back up and start healthchecking again
    awaitTaskState(jobId, testHost(), HEALTHCHECKING);
    pokeAndVerifyRegistration(client, jobId, timeout);
}
Also used : TaskStatusEvent(com.spotify.helios.common.descriptors.TaskStatusEvent) HttpHealthCheck(com.spotify.helios.common.descriptors.HttpHealthCheck) HealthCheck(com.spotify.helios.common.descriptors.HealthCheck) ExecHealthCheck(com.spotify.helios.common.descriptors.ExecHealthCheck) TcpHealthCheck(com.spotify.helios.common.descriptors.TcpHealthCheck) TaskStatusEvents(com.spotify.helios.common.protocol.TaskStatusEvents) HeliosClient(com.spotify.helios.client.HeliosClient) TaskStatus(com.spotify.helios.common.descriptors.TaskStatus) ServiceEndpoint(com.spotify.helios.common.descriptors.ServiceEndpoint) Endpoint(com.spotify.helios.serviceregistration.ServiceRegistration.Endpoint) IOException(java.io.IOException) JobStatus(com.spotify.helios.common.descriptors.JobStatus) Job(com.spotify.helios.common.descriptors.Job) JobId(com.spotify.helios.common.descriptors.JobId) Test(org.junit.Test)
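
In the test above, the callable passed to Polling.await returns null to signal "not yet" and a non-null value once a FAILED event appears. The sketch below shows that poll-until-non-null idea under stated assumptions: PollDemo and its await method are hypothetical and are not the Helios Polling class, whose implementation is not shown here. It uses a Supplier rather than a Callable so that the example carries no checked exceptions.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class PollDemo {

    // Calls `probe` repeatedly until it returns a non-null value or the timeout
    // elapses. A null result means "condition not met yet, try again".
    static <T> T await(final long timeout, final TimeUnit unit, final Supplier<T> probe) {
        final long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (System.nanoTime() - deadline < 0) {
            final T value = probe.get();
            if (value != null) {
                return value;
            }
            try {
                Thread.sleep(10); // short pause between probes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while polling", e);
            }
        }
        throw new IllegalStateException("condition not met within timeout");
    }

    public static void main(final String[] args) {
        final AtomicInteger calls = new AtomicInteger();
        // The probe returns null on the first two calls and a value on the third.
        final String result = await(5, TimeUnit.SECONDS,
            () -> calls.incrementAndGet() < 3 ? null : "done");
        System.out.println(result + " after " + calls.get() + " calls"); // prints "done after 3 calls"
    }
}
```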

Example 15 with TaskStatusEvent

use of com.spotify.helios.common.descriptors.TaskStatusEvent in project helios by spotify.

the class JobHistoryTest method testJobHistory.

@Test
public void testJobHistory() throws Exception {
    startDefaultMaster();
    final HeliosClient client = defaultClient();
    startDefaultAgent(testHost());
    awaitHostStatus(testHost(), Status.UP, LONG_WAIT_SECONDS, SECONDS);
    final JobId jobId = createJob(testJobName, testJobVersion, BUSYBOX, IDLE_COMMAND);
    deployJob(jobId, testHost());
    awaitJobState(client, testHost(), jobId, RUNNING, LONG_WAIT_SECONDS, SECONDS);
    undeployJob(jobId, testHost());
    awaitTaskGone(client, testHost(), jobId, LONG_WAIT_SECONDS, SECONDS);
    final TaskStatusEvents events = Polling.await(WAIT_TIMEOUT_SECONDS, SECONDS, new Callable<TaskStatusEvents>() {

        @Override
        public TaskStatusEvents call() throws Exception {
            final TaskStatusEvents events = client.jobHistory(jobId).get();
            final int size = events.getEvents().size();
            if (size == 0) {
                return null;
            }
            // We sometimes get more than one PULLING_IMAGE in the history if a pull tempfails.
            int requiredEventCount = -1;
            for (int i = 0; i < size; i++) {
                if (events.getEvents().get(i).getStatus().getState() != State.PULLING_IMAGE) {
                    requiredEventCount = i + 5;
                    break;
                }
            }
            if (requiredEventCount == -1) {
                return null;
            }
            if (size < requiredEventCount) {
                return null;
            }
            return events;
        }
    });
    final ListIterator<TaskStatusEvent> it = events.getEvents().listIterator();
    while (true) {
        final TaskStatusEvent event = it.next();
        if (event.getStatus().getState() != State.PULLING_IMAGE) {
            // rewind so that this event is the one returned by the next call to it.next() below
            it.previous();
            break;
        }
        assertThat(event, not(hasContainerId()));
    }
    assertThat(it.next(), allOf(hasState(State.CREATING), not(hasContainerId())));
    assertThat(it.next(), allOf(hasState(State.STARTING), hasContainerId()));
    assertThat(it.next(), hasState(State.RUNNING));
    assertThat(it.next(), hasState(State.STOPPING));
    assertThat(it.next(), hasState(State.EXITED, State.STOPPED));
}
Also used : TaskStatusEvent(com.spotify.helios.common.descriptors.TaskStatusEvent) TaskStatusEvents(com.spotify.helios.common.protocol.TaskStatusEvents) HeliosClient(com.spotify.helios.client.HeliosClient) JobId(com.spotify.helios.common.descriptors.JobId) Test(org.junit.Test)
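
The test above skips the leading PULLING_IMAGE events and then calls it.previous() so that the first other event is returned again by the next it.next(). Here is a minimal sketch of that rewind idiom on plain strings. RewindDemo and skipLeading are hypothetical names, and the strings are stand-ins for the Helios State values. Unlike the while (true) loop in the test, this version checks hasNext(), so it does not throw if every element matches the prefix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

final class RewindDemo {

    // Skips leading elements equal to `prefix` and returns the index of the
    // element that the next call to it.next() will return.
    static int skipLeading(final ListIterator<String> it, final String prefix) {
        while (it.hasNext()) {
            final String value = it.next();
            if (!value.equals(prefix)) {
                // Step back so this element is returned by the next it.next().
                it.previous();
                break;
            }
        }
        return it.nextIndex();
    }

    public static void main(final String[] args) {
        final List<String> states =
            Arrays.asList("PULLING_IMAGE", "PULLING_IMAGE", "CREATING", "STARTING");
        final ListIterator<String> it = states.listIterator();
        final int index = skipLeading(it, "PULLING_IMAGE");
        System.out.println(index + " " + it.next()); // prints "2 CREATING"
    }
}
```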

Aggregations

TaskStatusEvent (com.spotify.helios.common.descriptors.TaskStatusEvent) 15
JobId (com.spotify.helios.common.descriptors.JobId) 8
Test (org.junit.Test) 5
TaskStatus (com.spotify.helios.common.descriptors.TaskStatus) 4
TaskStatusEvents (com.spotify.helios.common.protocol.TaskStatusEvents) 4
IOException (java.io.IOException) 4
Job (com.spotify.helios.common.descriptors.Job) 3
HeliosClient (com.spotify.helios.client.HeliosClient) 2
JobStatus (com.spotify.helios.common.descriptors.JobStatus) 2
KeeperException (org.apache.zookeeper.KeeperException) 2
ExceptionMetered (com.codahale.metrics.annotation.ExceptionMetered) 1
Timed (com.codahale.metrics.annotation.Timed) 1
ImmutableList (com.google.common.collect.ImmutableList) 1
Table (com.spotify.helios.cli.Table) 1
HeliosRuntimeException (com.spotify.helios.common.HeliosRuntimeException) 1
Deployment (com.spotify.helios.common.descriptors.Deployment) 1
ExecHealthCheck (com.spotify.helios.common.descriptors.ExecHealthCheck) 1
HealthCheck (com.spotify.helios.common.descriptors.HealthCheck) 1
HttpHealthCheck (com.spotify.helios.common.descriptors.HttpHealthCheck) 1
ServiceEndpoint (com.spotify.helios.common.descriptors.ServiceEndpoint) 1