Example 16 with ZooKeeperOperation

use of com.spotify.helios.servicescommon.coordination.ZooKeeperOperation in project helios by spotify.

the class ZooKeeperMasterModel method deployJobRetry.

private void deployJobRetry(final ZooKeeperClient client, final String host, final Deployment deployment, int count, final String token) throws JobDoesNotExistException, JobAlreadyDeployedException, HostNotFoundException, JobPortAllocationConflictException, TokenVerificationException {
    if (count == 3) {
        throw new HeliosRuntimeException("3 failures (possibly concurrent modifications) while deploying. Giving up.");
    }
    log.info("deploying {}: {} (retry={})", deployment, host, count);
    final JobId id = deployment.getJobId();
    final Job job = getJob(id);
    if (job == null) {
        throw new JobDoesNotExistException(id);
    }
    verifyToken(token, job);
    final UUID operationId = UUID.randomUUID();
    final String jobPath = Paths.configJob(id);
    try {
        Paths.configHostJob(host, id);
    } catch (IllegalArgumentException e) {
        throw new HostNotFoundException("Could not find Helios host '" + host + "'");
    }
    final String taskPath = Paths.configHostJob(host, id);
    final String taskCreationPath = Paths.configHostJobCreation(host, id, operationId);
    final List<Integer> staticPorts = staticPorts(job);
    final Map<String, byte[]> portNodes = Maps.newHashMap();
    final byte[] idJson = id.toJsonBytes();
    for (final int port : staticPorts) {
        final String path = Paths.configHostPort(host, port);
        portNodes.put(path, idJson);
    }
    final Task task = new Task(job, deployment.getGoal(), deployment.getDeployerUser(), deployment.getDeployerMaster(), deployment.getDeploymentGroupName());
    final List<ZooKeeperOperation> operations = Lists.newArrayList(check(jobPath), create(portNodes), create(Paths.configJobHost(id, host)));
    // Attempt to read a task here.
    try {
        client.getNode(taskPath);
        // if we get here the node exists already
        throw new JobAlreadyDeployedException(host, id);
    } catch (NoNodeException e) {
        operations.add(create(taskPath, task));
        operations.add(create(taskCreationPath));
    } catch (KeeperException e) {
        throw new HeliosRuntimeException("reading existing task description failed", e);
    }
    // TODO (dano): Failure handling is racy wrt agent and job modifications.
    try {
        client.transaction(operations);
        log.info("deployed {}: {} (retry={})", deployment, host, count);
    } catch (NoNodeException e) {
        // Either the job, the host or the task went away
        assertJobExists(client, id);
        assertHostExists(client, host);
        // If the job and host still exist, we likely tried to redeploy a job that had an UNDEPLOY
        // goal and lost the race with the agent removing the task before we could set it. Retry.
        deployJobRetry(client, host, deployment, count + 1, token);
    } catch (NodeExistsException e) {
        // Check for conflict due to transaction retry
        try {
            if (client.exists(taskCreationPath) != null) {
                // Our creation operation node existed, we're done here
                return;
            }
        } catch (KeeperException ex) {
            throw new HeliosRuntimeException("checking job deployment failed", ex);
        }
        try {
            // Check if the job was already deployed
            if (client.stat(taskPath) != null) {
                throw new JobAlreadyDeployedException(host, id);
            }
        } catch (KeeperException ex) {
            // Report the exception from the stat call (ex), not the outer NodeExistsException (e)
            throw new HeliosRuntimeException("checking job deployment failed", ex);
        }
        // Check for static port collisions
        for (final int port : staticPorts) {
            checkForPortConflicts(client, host, port, id);
        }
        // Catch all for logic and ephemeral issues
        throw new HeliosRuntimeException("deploying job failed", e);
    } catch (KeeperException e) {
        throw new HeliosRuntimeException("deploying job failed", e);
    }
}
Also used : Task(com.spotify.helios.common.descriptors.Task) RolloutTask(com.spotify.helios.common.descriptors.RolloutTask) NoNodeException(org.apache.zookeeper.KeeperException.NoNodeException) ZooKeeperOperation(com.spotify.helios.servicescommon.coordination.ZooKeeperOperation) NodeExistsException(org.apache.zookeeper.KeeperException.NodeExistsException) HeliosRuntimeException(com.spotify.helios.common.HeliosRuntimeException) Job(com.spotify.helios.common.descriptors.Job) UUID(java.util.UUID) JobId(com.spotify.helios.common.descriptors.JobId) KeeperException(org.apache.zookeeper.KeeperException)
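
The NodeExistsException branch above first checks whether taskCreationPath exists: that path embeds a random UUID known only to this call, so finding it means the transaction was applied and only the acknowledgement was lost. The following is a minimal in-memory sketch of that idea, not Helios code. The class CreationMarkerSketch, its NodeExists exception, the loseNextAck flag and the map standing in for ZooKeeper are all hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** In-memory sketch of the creation-marker check; hypothetical names, not Helios code. */
class CreationMarkerSketch {
    /** Thrown when a path in the transaction already exists. */
    static class NodeExists extends Exception {}

    /** Stand-in for the ZooKeeper tree: path -> data. */
    final Map<String, String> nodes = new HashMap<>();

    /** When true, the next commit is applied but still reported as failed. */
    boolean loseNextAck = false;

    /** Creates the task node and the marker node together, or neither. */
    void commit(String taskPath, String task, String markerPath) throws NodeExists {
        if (nodes.containsKey(taskPath) || nodes.containsKey(markerPath)) {
            throw new NodeExists();
        }
        nodes.put(taskPath, task);
        nodes.put(markerPath, "");
        if (loseNextAck) {
            loseNextAck = false;
            // Simulates a client-side retry of a transaction that had already been applied.
            throw new NodeExists();
        }
    }

    /** Returns true if this call deployed the task, false if it was already deployed. */
    boolean deploy(String taskPath, String task) {
        // Assumption: a random UUID makes the marker path unique to this call.
        String markerPath = taskPath + "/creation-" + UUID.randomUUID();
        try {
            commit(taskPath, task, markerPath);
            return true;
        } catch (NodeExists e) {
            // Only this call knows the marker path, so its presence means our write landed.
            return nodes.containsKey(markerPath);
        }
    }
}
```

With loseNextAck set, the first deploy call still reports success because its marker is present; a second call for the same path generates a new marker, finds only the task node, and reports that the job was already deployed.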

Example 17 with ZooKeeperOperation

use of com.spotify.helios.servicescommon.coordination.ZooKeeperOperation in project helios by spotify.

the class AgentZooKeeperRegistrarTest method oldRegistrationExists_DifferentHostId.

@Test
@SuppressWarnings("unchecked")
public void oldRegistrationExists_DifferentHostId() throws Exception {
    // the hostname is claimed by a different ID...
    when(client.getData(idPath)).thenReturn("a different host".getBytes());
    // ... but the hostInfo was last updated more than TTL minutes ago
    final Stat hostInfo = new Stat();
    hostInfo.setMtime(clock.now().minus(Duration.standardMinutes(registrationTtl * 2)).getMillis());
    when(client.stat(hostPath)).thenReturn(hostInfo);
    // expect the old host to be deregistered and this registration to succeed
    final boolean success = registrar.tryToRegister(client);
    assertTrue(success);
    // expect a transaction containing a delete of the idPath followed by a create
    // TODO (mbrown): this should really be in a test of ZooKeeperRegistrarUtil, and
    // AgentZooKeeperRegistrar should not call a static method to do this
    final ArgumentCaptor<List> opsCaptor = ArgumentCaptor.forClass(List.class);
    verify(client).transaction(opsCaptor.capture());
    // note that we are not testing full equality of the list, just that it contains
    // a few notable items
    final List<ZooKeeperOperation> actual = opsCaptor.getValue();
    assertThat(actual, hasItems(ZooKeeperOperations.delete(idPath), ZooKeeperOperations.create(idPath, hostId.getBytes())));
}
Also used : Stat(org.apache.zookeeper.data.Stat) ZooKeeperOperation(com.spotify.helios.servicescommon.coordination.ZooKeeperOperation) List(java.util.List) Test(org.junit.Test)
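
The test above relies on a staleness rule: a hostname claimed by a different ID may be taken over once its host info has not been modified for longer than the registration TTL. The following is a minimal sketch of such a check using java.time. The class StaleRegistrationSketch and the method canTakeOver are hypothetical, and the strict "older than TTL" comparison is an assumption taken from the test comment, not from the registrar's actual implementation.

```java
import java.time.Duration;
import java.time.Instant;

/** Sketch of a TTL-based takeover rule; hypothetical names, not Helios code. */
class StaleRegistrationSketch {
    /**
     * Returns true if a host with ID myId may register under a name currently
     * held by registeredId whose info was last modified at lastModified.
     */
    static boolean canTakeOver(String registeredId, String myId,
                               Instant lastModified, Instant now, Duration ttl) {
        if (registeredId.equals(myId)) {
            // The name is already ours; nothing to take over.
            return true;
        }
        // Assumption: the old registration is stale only when strictly older than the TTL.
        return Duration.between(lastModified, now).compareTo(ttl) > 0;
    }
}
```

In the test, the modification time is set to twice the TTL in the past, which corresponds to the case where this check returns true for a different host ID.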

Example 18 with ZooKeeperOperation

use of com.spotify.helios.servicescommon.coordination.ZooKeeperOperation in project helios by spotify.

the class DeploymentGroupTest method testUpdateDeploymentGroupHosts.

// A test that ensures healthy deployment groups will perform a rolling update when their hosts
// change.
@Test
public void testUpdateDeploymentGroupHosts() throws Exception {
    final ZooKeeperClient client = spy(this.client);
    final ZooKeeperMasterModel masterModel = spy(newMasterModel(client));
    // Return a job so we can add a real deployment group.
    final Job job = Job.newBuilder().setCommand(ImmutableList.of("COMMAND")).setImage("IMAGE").setName("JOB_NAME").setVersion("VERSION").build();
    doReturn(job).when(masterModel).getJob(job.getId());
    // Add a real deployment group.
    final DeploymentGroup dg = DeploymentGroup.newBuilder().setName(GROUP_NAME).setHostSelectors(ImmutableList.of(HostSelector.parse("role=melmac"))).setJobId(job.getId()).setRolloutOptions(RolloutOptions.newBuilder().build()).setRollingUpdateReason(MANUAL).build();
    masterModel.addDeploymentGroup(dg);
    // Setup some hosts
    final String oldHost = "host1";
    final String newHost = "host2";
    client.ensurePath(Paths.configHost(oldHost));
    client.ensurePath(Paths.configHost(newHost));
    client.ensurePath(Paths.statusHostUp(oldHost));
    client.ensurePath(Paths.statusHostUp(newHost));
    // Give the deployment group a host.
    client.setData(Paths.statusDeploymentGroupHosts(dg.getName()), Json.asBytes(ImmutableList.of(oldHost)));
    // And a status...
    client.setData(Paths.statusDeploymentGroup(dg.getName()), DeploymentGroupStatus.newBuilder().setState(DONE).build().toJsonBytes());
    // Switch out our host!
    // TODO(negz): Use an unchanged host, make sure ordering remains the same.
    masterModel.updateDeploymentGroupHosts(dg.getName(), ImmutableList.of(newHost));
    verify(client, times(2)).transaction(opCaptor.capture());
    final DeploymentGroup changed = dg.toBuilder().setRollingUpdateReason(HOSTS_CHANGED).build();
    // Ensure we set the DG status to HOSTS_CHANGED.
    // This means we triggered a rolling update.
    final ZooKeeperOperation setDeploymentGroupHostChanged = set(Paths.configDeploymentGroup(dg.getName()), changed);
    // Ensure ZK tasks are written to:
    // - Perform a rolling undeploy for the removed (old) host
    // - Perform a rolling update for the added (new) host and the unchanged host
    final List<RolloutTask> tasks = ImmutableList.<RolloutTask>builder().addAll(RollingUndeployPlanner.of(changed).plan(singletonList(oldHost))).addAll(RollingUpdatePlanner.of(changed).plan(singletonList(newHost))).build();
    final ZooKeeperOperation setDeploymentGroupTasks = set(Paths.statusDeploymentGroupTasks(dg.getName()), DeploymentGroupTasks.newBuilder().setRolloutTasks(tasks).setTaskIndex(0).setDeploymentGroup(changed).build());
    assertThat(opCaptor.getValue(), hasItems(setDeploymentGroupHostChanged, setDeploymentGroupTasks));
}
Also used : ZooKeeperClient(com.spotify.helios.servicescommon.coordination.ZooKeeperClient) DefaultZooKeeperClient(com.spotify.helios.servicescommon.coordination.DefaultZooKeeperClient) ZooKeeperOperation(com.spotify.helios.servicescommon.coordination.ZooKeeperOperation) RolloutTask(com.spotify.helios.common.descriptors.RolloutTask) Job(com.spotify.helios.common.descriptors.Job) DeploymentGroup(com.spotify.helios.common.descriptors.DeploymentGroup) Test(org.junit.Test)

Example 19 with ZooKeeperOperation

use of com.spotify.helios.servicescommon.coordination.ZooKeeperOperation in project helios by spotify.

the class RollingUpdateOpFactoryTest method testNextTaskWithOps.

@Test
public void testNextTaskWithOps() {
    final DeploymentGroupTasks deploymentGroupTasks = DeploymentGroupTasks.newBuilder().setTaskIndex(0).setRolloutTasks(Lists.newArrayList(RolloutTask.of(RolloutTask.Action.UNDEPLOY_OLD_JOBS, "host1"), RolloutTask.of(RolloutTask.Action.AWAIT_RUNNING, "host1"), RolloutTask.of(RolloutTask.Action.DEPLOY_NEW_JOB, "host1"))).setDeploymentGroup(MANUAL_DEPLOYMENT_GROUP).build();
    final RollingUpdateOpFactory opFactory = new RollingUpdateOpFactory(deploymentGroupTasks, eventFactory);
    final ZooKeeperOperation mockOp = mock(ZooKeeperOperation.class);
    final RollingUpdateOp op = opFactory.nextTask(Lists.newArrayList(mockOp));
    // A nextTask op with ZK operations should result in advancing the task index
    // and also contain the specified ZK operations
    assertEquals(ImmutableSet.of(mockOp, new SetData("/status/deployment-group-tasks/my_group", deploymentGroupTasks.toBuilder().setTaskIndex(1).build().toJsonBytes())), ImmutableSet.copyOf(op.operations()));
    // This is not a no-op -> an event should be emitted
    assertEquals(1, op.events().size());
    verify(eventFactory).rollingUpdateTaskSucceeded(MANUAL_DEPLOYMENT_GROUP, deploymentGroupTasks.getRolloutTasks().get(deploymentGroupTasks.getTaskIndex()));
}
Also used : ZooKeeperOperation(com.spotify.helios.servicescommon.coordination.ZooKeeperOperation) DeploymentGroupTasks(com.spotify.helios.common.descriptors.DeploymentGroupTasks) SetData(com.spotify.helios.servicescommon.coordination.SetData) Test(org.junit.Test)

Aggregations

ZooKeeperOperation (com.spotify.helios.servicescommon.coordination.ZooKeeperOperation) 19
HeliosRuntimeException (com.spotify.helios.common.HeliosRuntimeException) 10
KeeperException (org.apache.zookeeper.KeeperException) 10
NoNodeException (org.apache.zookeeper.KeeperException.NoNodeException) 10
ZooKeeperClient (com.spotify.helios.servicescommon.coordination.ZooKeeperClient) 9
RolloutTask (com.spotify.helios.common.descriptors.RolloutTask) 7
Job (com.spotify.helios.common.descriptors.Job) 6
DeploymentGroup (com.spotify.helios.common.descriptors.DeploymentGroup) 5
DeploymentGroupStatus (com.spotify.helios.common.descriptors.DeploymentGroupStatus) 5
Map (java.util.Map) 5
Test (org.junit.Test) 5
DeploymentGroupTasks (com.spotify.helios.common.descriptors.DeploymentGroupTasks) 3
JobId (com.spotify.helios.common.descriptors.JobId) 3
RollingUpdateOp (com.spotify.helios.rollingupdate.RollingUpdateOp) 3
DefaultZooKeeperClient (com.spotify.helios.servicescommon.coordination.DefaultZooKeeperClient) 3
UUID (java.util.UUID) 3
Stat (org.apache.zookeeper.data.Stat) 3
ImmutableMap (com.google.common.collect.ImmutableMap) 2
HostStatus (com.spotify.helios.common.descriptors.HostStatus) 2
Task (com.spotify.helios.common.descriptors.Task) 2