
Example 11 with ZooKeeperOperation

use of com.spotify.helios.servicescommon.coordination.ZooKeeperOperation in project helios by spotify.

the class RollingUpdateOpFactory method nextTask.

public RollingUpdateOp nextTask(final List<ZooKeeperOperation> operations) {
    final List<ZooKeeperOperation> ops = Lists.newArrayList(operations);
    final List<Map<String, Object>> events = Lists.newArrayList();
    final RolloutTask task = tasks.getRolloutTasks().get(tasks.getTaskIndex());
    // Update the task index, delete tasks if done
    if (tasks.getTaskIndex() + 1 == tasks.getRolloutTasks().size()) {
        final DeploymentGroupStatus status = DeploymentGroupStatus.newBuilder().setState(DONE).build();
        // We are done -> delete tasks & update status
        ops.add(delete(Paths.statusDeploymentGroupTasks(deploymentGroup.getName())));
        ops.add(set(Paths.statusDeploymentGroup(deploymentGroup.getName()), status));
        // Emit an event signalling that we're DONE!
        events.add(eventFactory.rollingUpdateDone(deploymentGroup));
    } else {
        ops.add(set(Paths.statusDeploymentGroupTasks(deploymentGroup.getName()), tasks.toBuilder().setTaskIndex(tasks.getTaskIndex() + 1).build()));
        // Only emit a task-succeeded event if there were operations to run; otherwise
        // the task was effectively a no-op.
        if (!operations.isEmpty()) {
            events.add(eventFactory.rollingUpdateTaskSucceeded(deploymentGroup, task));
        }
    }
    return new RollingUpdateOp(ImmutableList.copyOf(ops), ImmutableList.copyOf(events));
}
Also used : ZooKeeperOperation(com.spotify.helios.servicescommon.coordination.ZooKeeperOperation) RolloutTask(com.spotify.helios.common.descriptors.RolloutTask) DeploymentGroupStatus(com.spotify.helios.common.descriptors.DeploymentGroupStatus) Map(java.util.Map)
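
The branch in nextTask can be read in isolation from ZooKeeper. The following is a minimal sketch of that decision only, assuming a hypothetical NextTaskSketch class whose event names are plain strings and whose "finished" state is encoded as -1; none of these names come from helios.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for the decision made in nextTask. */
final class NextTaskSketch {

    /** Outcome of one step: the next task index (-1 when finished) and the emitted event names. */
    static final class Step {
        final int nextIndex;
        final List<String> events;

        Step(final int nextIndex, final List<String> events) {
            this.nextIndex = nextIndex;
            this.events = events;
        }
    }

    static Step advance(final int taskIndex, final int taskCount, final boolean taskHadOperations) {
        final List<String> events = new ArrayList<>();
        if (taskIndex + 1 == taskCount) {
            // Last task: the rollout is done whether or not this task did any work.
            events.add("rollingUpdateDone");
            return new Step(-1, events);
        }
        if (taskHadOperations) {
            // Only report success for a task that actually produced operations.
            events.add("rollingUpdateTaskSucceeded");
        }
        // More tasks remain: move the index forward by one.
        return new Step(taskIndex + 1, events);
    }
}
```

In this sketch a no-op task in the middle of a rollout advances the index without emitting anything, which mirrors the `!operations.isEmpty()` check above.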

Example 12 with ZooKeeperOperation

use of com.spotify.helios.servicescommon.coordination.ZooKeeperOperation in project helios by spotify.

the class ZooKeeperMasterModel method removeDeploymentGroup.

/**
   * Remove a deployment group.
   *
   * <p>If successful, all ZK nodes associated with the DG will be deleted. Specifically these
   * nodes are guaranteed to be non-existent after a successful remove (not all of them might exist
   * before, though):
   * <ul>
   *   <li>/config/deployment-groups/[group-name]</li>
   *   <li>/status/deployment-groups/[group-name]</li>
   *   <li>/status/deployment-groups/[group-name]/hosts</li>
   *   <li>/status/deployment-groups/[group-name]/removed</li>
   *   <li>/status/deployment-group-tasks/[group-name]</li>
   * </ul>
   * If the operation fails no ZK nodes will be removed.
   *
   * @throws DeploymentGroupDoesNotExistException If the DG does not exist.
   */
@Override
public void removeDeploymentGroup(final String name) throws DeploymentGroupDoesNotExistException {
    log.info("removing deployment-group: name={}", name);
    final ZooKeeperClient client = provider.get("removeDeploymentGroup");
    try {
        client.ensurePath(Paths.configDeploymentGroups());
        client.ensurePath(Paths.statusDeploymentGroups());
        client.ensurePath(Paths.statusDeploymentGroupTasks());
        final List<ZooKeeperOperation> operations = Lists.newArrayList();
        final List<String> paths = ImmutableList.of(Paths.configDeploymentGroup(name), Paths.statusDeploymentGroup(name), Paths.statusDeploymentGroupHosts(name), Paths.statusDeploymentGroupRemovedHosts(name), Paths.statusDeploymentGroupTasks(name));
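        // For each path that does not exist yet, add a create so that the delete below succeeds
        // and the transaction fails if the node is created concurrently. Leftover nodes would cause
        // DGs to become slower and spam logs with errors so we want to avoid it.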
        for (final String path : paths) {
            if (client.exists(path) == null) {
                operations.add(create(path));
            }
        }
        for (final String path : Lists.reverse(paths)) {
            operations.add(delete(path));
        }
        client.transaction(operations);
    } catch (final NoNodeException e) {
        throw new DeploymentGroupDoesNotExistException(name);
    } catch (final KeeperException e) {
        throw new HeliosRuntimeException("removing deployment-group " + name + " failed", e);
    }
}
Also used : NoNodeException(org.apache.zookeeper.KeeperException.NoNodeException) ZooKeeperClient(com.spotify.helios.servicescommon.coordination.ZooKeeperClient) ZooKeeperOperation(com.spotify.helios.servicescommon.coordination.ZooKeeperOperation) HeliosRuntimeException(com.spotify.helios.common.HeliosRuntimeException) KeeperException(org.apache.zookeeper.KeeperException)
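
The create-then-delete pattern in removeDeploymentGroup is easier to see against a toy model. The sketch below is an assumption-laden illustration, not the ZooKeeper or helios API: AtomicRemoveSketch, its string-array operations, and the flat (non-hierarchical) node set are all hypothetical. It only shows why a plan built from a stale snapshot fails as a whole when a node appears in the meantime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory model of an all-or-nothing multi-op; not the ZooKeeper API. */
final class AtomicRemoveSketch {

    /** Applies ops to a copy of the node set; returns false and changes nothing if any op fails. */
    static boolean transaction(final Set<String> nodes, final List<String[]> ops) {
        final Set<String> working = new TreeSet<>(nodes);
        for (final String[] op : ops) {
            final String kind = op[0];
            final String path = op[1];
            if (kind.equals("create")) {
                if (!working.add(path)) {
                    return false; // node already exists
                }
            } else if (kind.equals("delete")) {
                if (!working.remove(path)) {
                    return false; // node does not exist
                }
            } else {
                throw new IllegalArgumentException("unknown op: " + kind);
            }
        }
        // Every op succeeded: commit the working copy.
        nodes.clear();
        nodes.addAll(working);
        return true;
    }

    /** Plans a removal from a snapshot: create the missing paths, then delete all in reverse order. */
    static List<String[]> planRemoval(final Set<String> snapshot, final List<String> paths) {
        final List<String[]> ops = new ArrayList<>();
        for (final String path : paths) {
            if (!snapshot.contains(path)) {
                ops.add(new String[] {"create", path});
            }
        }
        for (int i = paths.size() - 1; i >= 0; i--) {
            ops.add(new String[] {"delete", paths.get(i)});
        }
        return ops;
    }
}
```

If a path that was missing at planning time exists when the transaction runs, the planned create fails and nothing is deleted, so the caller never ends up with a partially removed group.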

Example 13 with ZooKeeperOperation

use of com.spotify.helios.servicescommon.coordination.ZooKeeperOperation in project helios by spotify.

the class ZooKeeperMasterModel method stopDeploymentGroup.

@Override
public void stopDeploymentGroup(final String deploymentGroupName) throws DeploymentGroupDoesNotExistException {
    checkNotNull(deploymentGroupName, "name");
    log.info("stop deployment-group: name={}", deploymentGroupName);
    final ZooKeeperClient client = provider.get("stopDeploymentGroup");
    // Delete deployment group tasks (if any) and set DG state to FAILED
    final DeploymentGroupStatus status = DeploymentGroupStatus.newBuilder().setState(FAILED).setError("Stopped by user").build();
    final String statusPath = Paths.statusDeploymentGroup(deploymentGroupName);
    final String tasksPath = Paths.statusDeploymentGroupTasks(deploymentGroupName);
    try {
        client.ensurePath(Paths.statusDeploymentGroupTasks());
        final List<ZooKeeperOperation> operations = Lists.newArrayList();
        // NOTE: This remove operation is racy. If tasks exist and the rollout finishes before the
        // delete() is executed then this will fail. Conversely, if it doesn't exist but is created
        // before the transaction is executed it will also fail. This is annoying for users, but at
        // least means we won't have inconsistent state.
        //
        // That the set() is first in the list of operations is important because of the
        // kludgy error checking we do below to disambiguate "doesn't exist" failures from the race
        // condition mentioned above.
        operations.add(set(statusPath, status));
        final Stat tasksStat = client.exists(tasksPath);
        if (tasksStat != null) {
            operations.add(delete(tasksPath));
        } else {
            // There doesn't seem to be a "check that node doesn't exist" operation so we
            // do a create and a delete on the same path to emulate it.
            operations.add(create(tasksPath));
            operations.add(delete(tasksPath));
        }
        client.transaction(operations);
    } catch (final NoNodeException e) {
        // The only way to tell which operation in a transaction failed is to inspect the
        // per-operation results.
        if (((OpResult.ErrorResult) e.getResults().get(0)).getErr() == KeeperException.Code.NONODE.intValue()) {
            throw new DeploymentGroupDoesNotExistException(deploymentGroupName);
        } else {
            throw new HeliosRuntimeException("stop deployment-group " + deploymentGroupName + " failed due to a race condition, please retry", e);
        }
    } catch (final KeeperException e) {
        throw new HeliosRuntimeException("stop deployment-group " + deploymentGroupName + " failed", e);
    }
}
Also used : Stat(org.apache.zookeeper.data.Stat) NoNodeException(org.apache.zookeeper.KeeperException.NoNodeException) ZooKeeperClient(com.spotify.helios.servicescommon.coordination.ZooKeeperClient) ZooKeeperOperation(com.spotify.helios.servicescommon.coordination.ZooKeeperOperation) HeliosRuntimeException(com.spotify.helios.common.HeliosRuntimeException) DeploymentGroupStatus(com.spotify.helios.common.descriptors.DeploymentGroupStatus) KeeperException(org.apache.zookeeper.KeeperException)
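
The error handling in stopDeploymentGroup depends on the set() being the first operation, so that a "no node" result at index 0 means the group itself is missing. Below is a minimal sketch of that disambiguation under a hypothetical in-memory model; StopFailureSketch, its Result enum, and the NOT_RUN marker for operations after the first failure are assumptions made for illustration and do not reproduce ZooKeeper's actual result codes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of per-operation results in a failed transaction. */
final class StopFailureSketch {

    enum Result { OK, NO_NODE, NODE_EXISTS, NOT_RUN }

    /** Evaluates ops in order against a copy of the nodes, stopping at the first failure. */
    static List<Result> evaluate(final Set<String> nodes, final List<String[]> ops) {
        final Set<String> working = new HashSet<>(nodes);
        final List<Result> results = new ArrayList<>();
        boolean failed = false;
        for (final String[] op : ops) {
            if (failed) {
                results.add(Result.NOT_RUN);
                continue;
            }
            final String kind = op[0];
            final String path = op[1];
            Result result = Result.OK;
            if (kind.equals("set")) {
                if (!working.contains(path)) {
                    result = Result.NO_NODE;
                }
            } else if (kind.equals("delete")) {
                if (!working.remove(path)) {
                    result = Result.NO_NODE;
                }
            } else if (kind.equals("create")) {
                if (!working.add(path)) {
                    result = Result.NODE_EXISTS;
                }
            } else {
                throw new IllegalArgumentException("unknown op: " + kind);
            }
            failed = result != Result.OK;
            results.add(result);
        }
        return results;
    }

    /** Interprets the results, relying on the set() having been added first. */
    static String classify(final List<Result> results) {
        for (final Result result : results) {
            if (result != Result.OK) {
                // A missing node at index 0 means the status node (the group) does not exist;
                // any later failure is treated as the race described in the comment above.
                return results.get(0) == Result.NO_NODE ? "does-not-exist" : "race";
            }
        }
        return "stopped";
    }
}
```

In this model any failure after the first operation is reported as a race, which corresponds to the "please retry" branch in the method above.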

Example 14 with ZooKeeperOperation

use of com.spotify.helios.servicescommon.coordination.ZooKeeperOperation in project helios by spotify.

the class RollingUpdateOpFactory method error.

public RollingUpdateOp error(final String msg, final String host, final RollingUpdateError errorCode, final Map<String, Object> metadata) {
    final List<ZooKeeperOperation> operations = Lists.newArrayList();
    final String errMsg = isNullOrEmpty(host) ? msg : host + ": " + msg;
    final DeploymentGroupStatus status = DeploymentGroupStatus.newBuilder().setState(FAILED).setError(errMsg).build();
    // Delete tasks, set state to FAILED
    operations.add(delete(Paths.statusDeploymentGroupTasks(deploymentGroup.getName())));
    operations.add(set(Paths.statusDeploymentGroup(deploymentGroup.getName()), status));
    final RolloutTask task = tasks.getRolloutTasks().get(tasks.getTaskIndex());
    // Emit a FAILED event and a failed task event
    final List<Map<String, Object>> events = Lists.newArrayList();
    final Map<String, Object> taskEv = eventFactory.rollingUpdateTaskFailed(deploymentGroup, task, errMsg, errorCode, metadata);
    events.add(taskEv);
    events.add(eventFactory.rollingUpdateFailed(deploymentGroup, taskEv));
    return new RollingUpdateOp(ImmutableList.copyOf(operations), ImmutableList.copyOf(events));
}
Also used : ZooKeeperOperation(com.spotify.helios.servicescommon.coordination.ZooKeeperOperation) DeploymentGroupStatus(com.spotify.helios.common.descriptors.DeploymentGroupStatus) RolloutTask(com.spotify.helios.common.descriptors.RolloutTask) Map(java.util.Map)

Example 15 with ZooKeeperOperation

use of com.spotify.helios.servicescommon.coordination.ZooKeeperOperation in project helios by spotify.

the class ZooKeeperMasterModel method rollingUpdateUndeploy.

private RollingUpdateOp rollingUpdateUndeploy(final ZooKeeperClient client, final RollingUpdateOpFactory opFactory, final DeploymentGroup deploymentGroup, final String host, final boolean skipRedundantUndeploys) {
    final List<ZooKeeperOperation> operations = Lists.newArrayList();
    for (final Deployment deployment : getTasks(client, host).values()) {
        if (!ownedByDeploymentGroup(deployment, deploymentGroup) && !isMigration(deployment, deploymentGroup)) {
            continue;
        }
        if (skipRedundantUndeploys && redundantUndeployment(deployment, deploymentGroup)) {
            continue;
        }
        try {
            final String token = MoreObjects.firstNonNull(deploymentGroup.getRolloutOptions().getToken(), Job.EMPTY_TOKEN);
            operations.addAll(getUndeployOperations(client, host, deployment.getJobId(), token));
            log.debug("planned undeploy operations for job={}", deployment.getJobId());
        } catch (TokenVerificationException e) {
            return opFactory.error(e, host, RollingUpdateError.TOKEN_VERIFICATION_ERROR);
        } catch (HostNotFoundException e) {
            return opFactory.error(e, host, RollingUpdateError.HOST_NOT_FOUND);
        } catch (JobNotDeployedException e) {
            // Probably somebody beat us to the punch of undeploying. That's fine.
        }
    }
    return opFactory.nextTask(operations);
}
Also used : ZooKeeperOperation(com.spotify.helios.servicescommon.coordination.ZooKeeperOperation) Deployment(com.spotify.helios.common.descriptors.Deployment)

Aggregations

ZooKeeperOperation (com.spotify.helios.servicescommon.coordination.ZooKeeperOperation) 19
HeliosRuntimeException (com.spotify.helios.common.HeliosRuntimeException) 10
KeeperException (org.apache.zookeeper.KeeperException) 10
NoNodeException (org.apache.zookeeper.KeeperException.NoNodeException) 10
ZooKeeperClient (com.spotify.helios.servicescommon.coordination.ZooKeeperClient) 9
RolloutTask (com.spotify.helios.common.descriptors.RolloutTask) 7
Job (com.spotify.helios.common.descriptors.Job) 6
DeploymentGroup (com.spotify.helios.common.descriptors.DeploymentGroup) 5
DeploymentGroupStatus (com.spotify.helios.common.descriptors.DeploymentGroupStatus) 5
Map (java.util.Map) 5
Test (org.junit.Test) 5
DeploymentGroupTasks (com.spotify.helios.common.descriptors.DeploymentGroupTasks) 3
JobId (com.spotify.helios.common.descriptors.JobId) 3
RollingUpdateOp (com.spotify.helios.rollingupdate.RollingUpdateOp) 3
DefaultZooKeeperClient (com.spotify.helios.servicescommon.coordination.DefaultZooKeeperClient) 3
UUID (java.util.UUID) 3
Stat (org.apache.zookeeper.data.Stat) 3
ImmutableMap (com.google.common.collect.ImmutableMap) 2
HostStatus (com.spotify.helios.common.descriptors.HostStatus) 2
Task (com.spotify.helios.common.descriptors.Task) 2