Example 16 with Status

use of com.cloud.host.Status in project cosmic by MissionCriticalCloud.

the class AgentManagerImpl method handleDisconnectWithoutInvestigation.

protected boolean handleDisconnectWithoutInvestigation(final AgentAttache attache, final Status.Event event, final boolean transitState, final boolean removeAgent) {
    final long hostId = attache.getId();
    s_logger.info("Host " + hostId + " is disconnecting with event " + event);
    Status nextStatus = null;
    final HostVO host = _hostDao.findById(hostId);
    if (host == null) {
        s_logger.warn("Can't find host with " + hostId);
        nextStatus = Status.Removed;
    } else {
        final Status currentStatus = host.getStatus();
        if (currentStatus == Status.Down || currentStatus == Status.Alert || currentStatus == Status.Removed) {
            if (s_logger.isDebugEnabled()) {
                s_logger.debug("Host " + hostId + " is already " + currentStatus);
            }
            nextStatus = currentStatus;
        } else {
            try {
                nextStatus = currentStatus.getNextStatus(event);
            } catch (final NoTransitionException e) {
                final String err = "Cannot find next status for " + event + " as current status is " + currentStatus + " for agent " + hostId;
                s_logger.debug(err);
                throw new CloudRuntimeException(err);
            }
            if (s_logger.isDebugEnabled()) {
                s_logger.debug("The next status of agent " + hostId + " is " + nextStatus + ", current status is " + currentStatus);
            }
        }
    }
    if (s_logger.isDebugEnabled()) {
        s_logger.debug("Deregistering link for " + hostId + " with state " + nextStatus);
    }
    removeAgent(attache, nextStatus);
    // update the DB
    if (host != null && transitState) {
        disconnectAgent(host, event, _nodeId);
    }
    return true;
}
Also used : Status(com.cloud.host.Status) CloudRuntimeException(com.cloud.utils.exception.CloudRuntimeException) NoTransitionException(com.cloud.utils.fsm.NoTransitionException) HostVO(com.cloud.host.HostVO)
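
The branch order in the method above (missing host, already-terminal status, otherwise ask the state machine) can be sketched in isolation. This is a minimal, self-contained illustration and not the project's code: `HostFsmSketch`, its `State` and `Event` enums, the `NoTransition` exception and the transition table are all hypothetical stand-ins, and the table entries are assumptions rather than the real `com.cloud.host.Status` transitions.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for a host status FSM. The states, events and
// transition table are assumptions for illustration only; they are NOT
// copied from com.cloud.host.Status.
class HostFsmSketch {
    enum State { UP, DISCONNECTED, ALERT, DOWN, REMOVED }
    enum Event { PING, AGENT_DISCONNECTED, WAITED_TOO_LONG }

    // Checked exception playing the role of NoTransitionException.
    static class NoTransition extends Exception {
        NoTransition(String message) { super(message); }
    }

    private static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);

    private static void add(State from, Event on, State to) {
        TABLE.computeIfAbsent(from, k -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    static {
        // Assumed transitions, chosen only to make the sketch executable.
        add(State.UP, Event.PING, State.UP);
        add(State.UP, Event.AGENT_DISCONNECTED, State.ALERT);
        add(State.DISCONNECTED, Event.PING, State.UP);
        add(State.DISCONNECTED, Event.WAITED_TOO_LONG, State.ALERT);
    }

    // Looks up the target state or fails loudly when the table has no entry.
    static State next(State current, Event event) throws NoTransition {
        Map<Event, State> row = TABLE.get(current);
        State target = (row == null) ? null : row.get(event);
        if (target == null) {
            throw new NoTransition("No transition from " + current + " on " + event);
        }
        return target;
    }

    // Same branch order as the snippet above: a missing host (null) maps to
    // REMOVED, Down/Alert/Removed are kept as-is, anything else asks the FSM
    // and converts a missing transition into an unchecked exception.
    static State resolveOnDisconnect(State current, Event event) {
        if (current == null) {
            return State.REMOVED;
        }
        if (current == State.DOWN || current == State.ALERT || current == State.REMOVED) {
            return current;
        }
        try {
            return next(current, event);
        } catch (NoTransition e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveOnDisconnect(State.UP, Event.AGENT_DISCONNECTED)); // ALERT
        System.out.println(resolveOnDisconnect(State.DOWN, Event.PING));             // DOWN
        System.out.println(resolveOnDisconnect(null, Event.PING));                   // REMOVED
    }
}
```

Keeping the lookup (`next`) separate from the policy (`resolveOnDisconnect`) makes the "no transition" case testable without a database or an agent connection.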

Example 17 with Status

use of com.cloud.host.Status in project cosmic by MissionCriticalCloud.

the class AgentManagerImpl method easySend.

@Override
public Answer easySend(final Long hostId, final Command cmd) {
    try {
        final Host h = _hostDao.findById(hostId);
        if (h == null || h.getRemoved() != null) {
            s_logger.debug("Host with id " + hostId + " doesn't exist");
            return null;
        }
        final Status status = h.getStatus();
        if (!status.equals(Status.Up) && !status.equals(Status.Connecting)) {
            s_logger.debug("Can not send command " + cmd + " due to Host " + hostId + " is not up");
            return null;
        }
        final Answer answer = send(hostId, cmd);
        if (answer == null) {
            s_logger.warn("send returns null answer");
            return null;
        }
        if (s_logger.isDebugEnabled() && answer.getDetails() != null) {
            s_logger.debug("Details from executing " + cmd.getClass() + ": " + answer.getDetails());
        }
        return answer;
    } catch (final AgentUnavailableException e) {
        s_logger.warn(e.getMessage());
        return null;
    } catch (final OperationTimedoutException e) {
        s_logger.warn("Operation timed out: " + e.getMessage());
        return null;
    } catch (final Exception e) {
        s_logger.warn("Exception while sending", e);
        return null;
    }
}
Also used : Status(com.cloud.host.Status) UnsupportedAnswer(com.cloud.agent.api.UnsupportedAnswer) AgentControlAnswer(com.cloud.agent.api.AgentControlAnswer) Answer(com.cloud.agent.api.Answer) PingAnswer(com.cloud.agent.api.PingAnswer) ReadyAnswer(com.cloud.agent.api.ReadyAnswer) StartupAnswer(com.cloud.agent.api.StartupAnswer) OperationTimedoutException(com.cloud.exception.OperationTimedoutException) AgentUnavailableException(com.cloud.exception.AgentUnavailableException) Host(com.cloud.host.Host) ConnectionException(com.cloud.exception.ConnectionException) NoTransitionException(com.cloud.utils.fsm.NoTransitionException) TaskExecutionException(com.cloud.utils.exception.TaskExecutionException) InvocationTargetException(java.lang.reflect.InvocationTargetException) ConfigurationException(javax.naming.ConfigurationException) CloudRuntimeException(com.cloud.utils.exception.CloudRuntimeException) ClosedChannelException(java.nio.channels.ClosedChannelException) HypervisorVersionChangedException(com.cloud.utils.exception.HypervisorVersionChangedException) NioConnectionException(com.cloud.utils.exception.NioConnectionException) UnsupportedVersionException(com.cloud.exception.UnsupportedVersionException)

Example 18 with Status

use of com.cloud.host.Status in project cosmic by MissionCriticalCloud.

the class KvmInvestigator method isAgentAlive.

@Override
public Status isAgentAlive(final Host agent) {
    if (agent.getHypervisorType() != Hypervisor.HypervisorType.KVM) {
        return null;
    }
    Status hostStatus = null;
    Status neighbourStatus = null;
    final CheckOnHostCommand cmd = new CheckOnHostCommand(agent);
    try {
        final Answer answer = agentMgr.easySend(agent.getId(), cmd);
        if (answer != null) {
            hostStatus = answer.getResult() ? Status.Down : Status.Up;
        }
    } catch (final Exception e) {
        logger.debug("Failed to send command to host: " + agent.getId());
    }
    if (hostStatus == null) {
        hostStatus = Status.Disconnected;
    }
    final List<HostVO> neighbors = resourceMgr.listHostsInClusterByStatus(agent.getClusterId(), Status.Up);
    for (final HostVO neighbor : neighbors) {
        if (neighbor.getId() == agent.getId() || neighbor.getHypervisorType() != Hypervisor.HypervisorType.KVM) {
            continue;
        }
        logger.debug("Investigating host:" + agent.getId() + " via neighbouring host:" + neighbor.getId());
        try {
            final Answer answer = agentMgr.easySend(neighbor.getId(), cmd);
            if (answer != null) {
                neighbourStatus = answer.getResult() ? Status.Down : Status.Up;
                logger.debug("Neighbouring host:" + neighbor.getId() + " returned status:" + neighbourStatus + " for the investigated host:" + agent.getId());
                if (neighbourStatus == Status.Up) {
                    break;
                }
            }
        } catch (final Exception e) {
            logger.debug("Failed to send command to host: " + neighbor.getId());
        }
    }
    if (neighbourStatus == Status.Up && (hostStatus == Status.Disconnected || hostStatus == Status.Down)) {
        hostStatus = Status.Disconnected;
    }
    if (neighbourStatus == Status.Down && (hostStatus == Status.Disconnected || hostStatus == Status.Down)) {
        hostStatus = Status.Down;
    }
    return hostStatus;
}
Also used : Status(com.cloud.host.Status) CheckOnHostCommand(com.cloud.agent.api.CheckOnHostCommand) Answer(com.cloud.agent.api.Answer) HostVO(com.cloud.host.HostVO)
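
The way isAgentAlive merges the direct check with the neighbour's verdict can be pulled out into a pure function, which makes the decision table easier to read and test. The sketch below reproduces only the final combination step of the method above; `InvestigationSketch`, the enum `S` and `combine` are hypothetical names, and `null` stands for "no answer received", as in the original.

```java
// Hypothetical, dependency-free restatement of the final decision step in
// isAgentAlive. Names are illustrative; only the branching mirrors the snippet.
class InvestigationSketch {
    enum S { UP, DOWN, DISCONNECTED }

    // direct:    result of asking the host itself (null = no answer)
    // neighbour: last verdict from a neighbouring host (null = nobody answered)
    static S combine(S direct, S neighbour) {
        // No direct answer is treated as Disconnected, as in the original.
        S host = (direct == null) ? S.DISCONNECTED : direct;
        boolean suspect = host == S.DISCONNECTED || host == S.DOWN;
        if (suspect && neighbour == S.UP) {
            return S.DISCONNECTED; // a neighbour still sees the host: only the agent is unreachable
        }
        if (suspect && neighbour == S.DOWN) {
            return S.DOWN;         // a neighbour confirms the host is down
        }
        return host;               // no neighbour verdict, or the host answered Up
    }

    public static void main(String[] args) {
        System.out.println(combine(null, S.UP));   // DISCONNECTED
        System.out.println(combine(null, S.DOWN)); // DOWN
        System.out.println(combine(S.UP, S.DOWN)); // UP
    }
}
```

Note that a direct `UP` answer always wins: the neighbour's verdict only refines the result when the host itself looks down or unreachable.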

Example 19 with Status

use of com.cloud.host.Status in project cosmic by MissionCriticalCloud.

the class CheckOnHostCommandTest method testGetStatus.

@Test
public void testGetStatus() {
    final Status s = host.getStatus();
    assertTrue(s == Status.Up);
}
Also used : Status(com.cloud.host.Status) Test(org.junit.Test)

Example 20 with Status

use of com.cloud.host.Status in project cloudstack by apache.

the class AgentManagerImpl method handleDisconnectWithInvestigation.

protected boolean handleDisconnectWithInvestigation(final AgentAttache attache, Status.Event event) {
    final long hostId = attache.getId();
    HostVO host = _hostDao.findById(hostId);
    if (host != null) {
        Status nextStatus = null;
        try {
            nextStatus = host.getStatus().getNextStatus(event);
        } catch (final NoTransitionException ne) {
            /*
                 * Agent may be currently in status of Down, Alert, Removed, namely there is no next status for some events. Why this can happen? Ask God not me. I hate there was
                 * no piece of comment for code handling race condition. God knew what race condition the code dealt with!
                 */
            s_logger.debug("Caught exception while getting agent's next status", ne);
        }
        if (nextStatus == Status.Alert) {
            /* OK, we are going to the bad status, let's see what happened */
            s_logger.info("Investigating why host " + hostId + " has disconnected with event " + event);
            Status determinedState = investigate(attache);
            // if state cannot be determined do nothing and bail out
            if (determinedState == null) {
                if ((System.currentTimeMillis() >> 10) - host.getLastPinged() > AlertWait.value()) {
                    s_logger.warn("Agent " + hostId + " state cannot be determined for more than " + AlertWait + "(" + AlertWait.value() + ") seconds, will go to Alert state");
                    determinedState = Status.Alert;
                } else {
                    s_logger.warn("Agent " + hostId + " state cannot be determined, do nothing");
                    return false;
                }
            }
            final Status currentStatus = host.getStatus();
            s_logger.info("The agent from host " + hostId + " state determined is " + determinedState);
            if (determinedState == Status.Down) {
                final String message = "Host is down: " + host.getId() + "-" + host.getName() + ". Starting HA on the VMs";
                s_logger.error(message);
                if (host.getType() != Host.Type.SecondaryStorage && host.getType() != Host.Type.ConsoleProxy) {
                    _alertMgr.sendAlert(AlertManager.AlertType.ALERT_TYPE_HOST, host.getDataCenterId(), host.getPodId(), "Host down, " + host.getId(), message);
                }
                event = Status.Event.HostDown;
            } else if (determinedState == Status.Up) {
                /* Got ping response from host, bring it back */
                s_logger.info("Agent is determined to be up and running");
                agentStatusTransitTo(host, Status.Event.Ping, _nodeId);
                return false;
            } else if (determinedState == Status.Disconnected) {
                s_logger.warn("Agent is disconnected but the host is still up: " + host.getId() + "-" + host.getName());
                if (currentStatus == Status.Disconnected) {
                    if ((System.currentTimeMillis() >> 10) - host.getLastPinged() > AlertWait.value()) {
                        s_logger.warn("Host " + host.getId() + " has been disconnected past the wait time it should be disconnected.");
                        event = Status.Event.WaitedTooLong;
                    } else {
                        s_logger.debug("Host " + host.getId() + " has been determined to be disconnected but it hasn't passed the wait time yet.");
                        return false;
                    }
                } else if (currentStatus == Status.Up) {
                    final DataCenterVO dcVO = _dcDao.findById(host.getDataCenterId());
                    final HostPodVO podVO = _podDao.findById(host.getPodId());
                    final String hostDesc = "name: " + host.getName() + " (id:" + host.getId() + "), availability zone: " + dcVO.getName() + ", pod: " + podVO.getName();
                    if (host.getType() != Host.Type.SecondaryStorage && host.getType() != Host.Type.ConsoleProxy) {
                        _alertMgr.sendAlert(AlertManager.AlertType.ALERT_TYPE_HOST, host.getDataCenterId(), host.getPodId(), "Host disconnected, " + hostDesc, "If the agent for host [" + hostDesc + "] is not restarted within " + AlertWait + " seconds, host will go to Alert state");
                    }
                    event = Status.Event.AgentDisconnected;
                }
            } else {
                // if we end up here we are in alert state, send an alert
                final DataCenterVO dcVO = _dcDao.findById(host.getDataCenterId());
                final HostPodVO podVO = _podDao.findById(host.getPodId());
                final String podName = podVO != null ? podVO.getName() : "NO POD";
                final String hostDesc = "name: " + host.getName() + " (id:" + host.getId() + "), availability zone: " + dcVO.getName() + ", pod: " + podName;
                _alertMgr.sendAlert(AlertManager.AlertType.ALERT_TYPE_HOST, host.getDataCenterId(), host.getPodId(), "Host in ALERT state, " + hostDesc, "In availability zone " + host.getDataCenterId() + ", host is in alert state: " + host.getId() + "-" + host.getName());
            }
        } else {
            s_logger.debug("The next status of agent " + host.getId() + " is not Alert, no need to investigate what happened");
        }
    }
    handleDisconnectWithoutInvestigation(attache, event, true, true);
    // Maybe the host magically reappeared?
    host = _hostDao.findById(hostId);
    if (host != null && host.getStatus() == Status.Down) {
        _haMgr.scheduleRestartForVmsOnHost(host, true);
    }
    return true;
}
Also used : Status(com.cloud.host.Status) DataCenterVO(com.cloud.dc.DataCenterVO) NoTransitionException(com.cloud.utils.fsm.NoTransitionException) HostPodVO(com.cloud.dc.HostPodVO) HostVO(com.cloud.host.HostVO)
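
The timeout checks above compare `System.currentTimeMillis() >> 10` with `host.getLastPinged()`. Shifting right by 10 divides by 1024, so the result is a cheap approximation of seconds in which one unit is 1.024 s. The sketch below works through that arithmetic; `CoarseClockSketch` and its method names are hypothetical, and it assumes the last-pinged value is stored in the same shifted unit, which the snippet does not show.

```java
// Hypothetical helper illustrating the ">> 10" coarse-seconds idiom.
class CoarseClockSketch {
    // Shifting right by 10 divides by 1024, so one tick is 1.024 s, not 1 s.
    static long coarseSeconds(long millis) {
        return millis >> 10;
    }

    // Mirrors the shape of the check in the snippet: elapsed ticks since the
    // last ping, compared against a wait threshold expressed in the same ticks.
    // Assumption: lastPingedTicks was itself produced by coarseSeconds(...).
    static boolean waitedTooLong(long nowMillis, long lastPingedTicks, long alertWaitTicks) {
        return coarseSeconds(nowMillis) - lastPingedTicks > alertWaitTicks;
    }

    public static void main(String[] args) {
        long thirtyMinutes = 30L * 60 * 1000;             // 1,800,000 ms
        System.out.println(coarseSeconds(thirtyMinutes)); // 1757
        System.out.println(thirtyMinutes / 1000);         // 1800
    }
}
```

Under that assumption, a threshold of N units corresponds to roughly 1.024 × N wall-clock seconds, so the effective wait is about 2.4% longer than the nominal value.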

Aggregations

Status (com.cloud.host.Status) 23
HostVO (com.cloud.host.HostVO) 14
Answer (com.cloud.agent.api.Answer) 9
NoTransitionException (com.cloud.utils.fsm.NoTransitionException) 7
CloudRuntimeException (com.cloud.utils.exception.CloudRuntimeException) 6
AgentControlAnswer (com.cloud.agent.api.AgentControlAnswer) 5
PingAnswer (com.cloud.agent.api.PingAnswer) 4
ReadyAnswer (com.cloud.agent.api.ReadyAnswer) 4
StartupAnswer (com.cloud.agent.api.StartupAnswer) 4
UnsupportedAnswer (com.cloud.agent.api.UnsupportedAnswer) 4
ConnectionException (com.cloud.exception.ConnectionException) 4
Host (com.cloud.host.Host) 4
CheckOnHostCommand (com.cloud.agent.api.CheckOnHostCommand) 3
AgentUnavailableException (com.cloud.exception.AgentUnavailableException) 3
OperationTimedoutException (com.cloud.exception.OperationTimedoutException) 3
Nic (com.cloud.vm.Nic) 3
Test (org.junit.Test) 3
AgentManager (com.cloud.agent.AgentManager) 2
CheckHealthCommand (com.cloud.agent.api.CheckHealthCommand) 2
Command (com.cloud.agent.api.Command) 2