Example 6 with Status

use of com.cloud.host.Status in project cloudstack by apache.

the class AgentManagerImpl method handleDisconnectWithoutInvestigation.

protected boolean handleDisconnectWithoutInvestigation(final AgentAttache attache, final Status.Event event, final boolean transitState, final boolean removeAgent) {
    final long hostId = attache.getId();
    s_logger.info("Host " + hostId + " is disconnecting with event " + event);
    Status nextStatus = null;
    final HostVO host = _hostDao.findById(hostId);
    if (host == null) {
        s_logger.warn("Can't find host with id " + hostId);
        nextStatus = Status.Removed;
    } else {
        final Status currentStatus = host.getStatus();
        if (currentStatus == Status.Down || currentStatus == Status.Alert || currentStatus == Status.Removed) {
            if (s_logger.isDebugEnabled()) {
                s_logger.debug("Host " + hostId + " is already " + currentStatus);
            }
            nextStatus = currentStatus;
        } else {
            try {
                nextStatus = currentStatus.getNextStatus(event);
            } catch (final NoTransitionException e) {
                final String err = "Cannot find next status for " + event + " as current status is " + currentStatus + " for agent " + hostId;
                s_logger.debug(err);
                throw new CloudRuntimeException(err);
            }
            if (s_logger.isDebugEnabled()) {
                s_logger.debug("The next status of agent " + hostId + " is " + nextStatus + ", current status is " + currentStatus);
            }
        }
    }
    if (s_logger.isDebugEnabled()) {
        s_logger.debug("Deregistering link for " + hostId + " with state " + nextStatus);
    }
    removeAgent(attache, nextStatus);
    // update the DB
    if (host != null && transitState) {
        disconnectAgent(host, event, _nodeId);
    }
    return true;
}
Also used : Status(com.cloud.host.Status) CloudRuntimeException(com.cloud.utils.exception.CloudRuntimeException) NoTransitionException(com.cloud.utils.fsm.NoTransitionException) HostVO(com.cloud.host.HostVO)
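The method above relies on a state machine lookup (getNextStatus) that throws a checked exception when no transition is defined, and it skips the lookup for states that are treated as final. A minimal, self-contained sketch of that pattern follows. The class HostStatusSketch, its states, its events, and its transition table are hypothetical and chosen only for illustration; they are not the transition table of com.cloud.host.Status.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a host status state machine.
// States, events, and transitions are illustrative assumptions only.
final class HostStatusSketch {

    enum Event { AGENT_DISCONNECTED, PING, HOST_DOWN }

    enum State { UP, DISCONNECTED, DOWN }

    /** Checked exception raised when the table has no entry for (state, event). */
    static final class NoTransitionException extends Exception {
        NoTransitionException(String message) {
            super(message);
        }
    }

    private static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);

    static {
        for (State s : State.values()) {
            TRANSITIONS.put(s, new EnumMap<>(Event.class));
        }
        TRANSITIONS.get(State.UP).put(Event.PING, State.UP);
        TRANSITIONS.get(State.UP).put(Event.AGENT_DISCONNECTED, State.DISCONNECTED);
        TRANSITIONS.get(State.UP).put(Event.HOST_DOWN, State.DOWN);
        TRANSITIONS.get(State.DISCONNECTED).put(Event.PING, State.UP);
        TRANSITIONS.get(State.DISCONNECTED).put(Event.HOST_DOWN, State.DOWN);
        // DOWN deliberately has no outgoing transitions in this sketch.
    }

    /** Looks up the next state, or throws if the event is not valid in the current state. */
    static State nextState(State current, Event event) throws NoTransitionException {
        State next = TRANSITIONS.get(current).get(event);
        if (next == null) {
            throw new NoTransitionException("No transition from " + current + " on " + event);
        }
        return next;
    }

    /**
     * Same shape as the disconnect handler: a final state is kept as is,
     * otherwise the table is consulted and a missing transition becomes
     * an unchecked exception.
     */
    static State resolveOnDisconnect(State current, Event event) {
        if (current == State.DOWN) {
            return current;
        }
        try {
            return nextState(current, event);
        } catch (NoTransitionException e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveOnDisconnect(State.UP, Event.AGENT_DISCONNECTED)); // DISCONNECTED
        System.out.println(resolveOnDisconnect(State.DOWN, Event.PING));             // DOWN
    }
}
```

Checking for final states before the lookup keeps an already-down host from raising an exception on a late disconnect event, which appears to be the intent of the early branch in the original method.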

Example 7 with Status

use of com.cloud.host.Status in project cloudstack by apache.

the class AgentManagerImpl method handleDisconnectWithInvestigation.

protected boolean handleDisconnectWithInvestigation(final AgentAttache attache, Status.Event event) {
    final long hostId = attache.getId();
    HostVO host = _hostDao.findById(hostId);
    if (host != null) {
        Status nextStatus = null;
        try {
            nextStatus = host.getStatus().getNextStatus(event);
        } catch (final NoTransitionException ne) {
            /*
                 * The agent may currently be in status Down, Alert, or Removed, in which case some events have no next status.
                 * Why this can happen is not documented: the race condition this code handles was never explained in a comment.
                 */
            s_logger.debug("Caught exception while getting agent's next status", ne);
        }
        if (nextStatus == Status.Alert) {
            /* OK, we are going to the bad status, let's see what happened */
            s_logger.info("Investigating why host " + hostId + " has disconnected with event " + event);
            Status determinedState = investigate(attache);
            // if state cannot be determined do nothing and bail out
            if (determinedState == null) {
                if ((System.currentTimeMillis() >> 10) - host.getLastPinged() > AlertWait.value()) {
                    s_logger.warn("Agent " + hostId + " state cannot be determined for more than " + AlertWait + "(" + AlertWait.value() + ") seconds, will go to Alert state");
                    determinedState = Status.Alert;
                } else {
                    s_logger.warn("Agent " + hostId + " state cannot be determined, do nothing");
                    return false;
                }
            }
            final Status currentStatus = host.getStatus();
            s_logger.info("The agent from host " + hostId + " state determined is " + determinedState);
            if (determinedState == Status.Down) {
                final String message = "Host is down: " + host.getId() + "-" + host.getName() + ". Starting HA on the VMs";
                s_logger.error(message);
                if (host.getType() != Host.Type.SecondaryStorage && host.getType() != Host.Type.ConsoleProxy) {
                    _alertMgr.sendAlert(AlertManager.AlertType.ALERT_TYPE_HOST, host.getDataCenterId(), host.getPodId(), "Host down, " + host.getId(), message);
                }
                event = Status.Event.HostDown;
            } else if (determinedState == Status.Up) {
                /* Got ping response from host, bring it back */
                s_logger.info("Agent is determined to be up and running");
                agentStatusTransitTo(host, Status.Event.Ping, _nodeId);
                return false;
            } else if (determinedState == Status.Disconnected) {
                s_logger.warn("Agent is disconnected but the host is still up: " + host.getId() + "-" + host.getName());
                if (currentStatus == Status.Disconnected) {
                    if ((System.currentTimeMillis() >> 10) - host.getLastPinged() > AlertWait.value()) {
                        s_logger.warn("Host " + host.getId() + " has been disconnected past the wait time it should be disconnected.");
                        event = Status.Event.WaitedTooLong;
                    } else {
                        s_logger.debug("Host " + host.getId() + " has been determined to be disconnected but it hasn't passed the wait time yet.");
                        return false;
                    }
                } else if (currentStatus == Status.Up) {
                    final DataCenterVO dcVO = _dcDao.findById(host.getDataCenterId());
                    final HostPodVO podVO = _podDao.findById(host.getPodId());
                    final String hostDesc = "name: " + host.getName() + " (id:" + host.getId() + "), availability zone: " + dcVO.getName() + ", pod: " + podVO.getName();
                    if (host.getType() != Host.Type.SecondaryStorage && host.getType() != Host.Type.ConsoleProxy) {
                        _alertMgr.sendAlert(AlertManager.AlertType.ALERT_TYPE_HOST, host.getDataCenterId(), host.getPodId(), "Host disconnected, " + hostDesc, "If the agent for host [" + hostDesc + "] is not restarted within " + AlertWait + " seconds, host will go to Alert state");
                    }
                    event = Status.Event.AgentDisconnected;
                }
            } else {
                // if we end up here we are in alert state, send an alert
                final DataCenterVO dcVO = _dcDao.findById(host.getDataCenterId());
                final HostPodVO podVO = _podDao.findById(host.getPodId());
                final String podName = podVO != null ? podVO.getName() : "NO POD";
                final String hostDesc = "name: " + host.getName() + " (id:" + host.getId() + "), availability zone: " + dcVO.getName() + ", pod: " + podName;
                _alertMgr.sendAlert(AlertManager.AlertType.ALERT_TYPE_HOST, host.getDataCenterId(), host.getPodId(), "Host in ALERT state, " + hostDesc, "In availability zone " + host.getDataCenterId() + ", host is in alert state: " + host.getId() + "-" + host.getName());
            }
        } else {
            s_logger.debug("The next status of agent " + host.getId() + " is not Alert, no need to investigate what happened");
        }
    }
    handleDisconnectWithoutInvestigation(attache, event, true, true);
    // Maybe the host magically reappeared?
    host = _hostDao.findById(hostId);
    if (host != null && host.getStatus() == Status.Down) {
        _haMgr.scheduleRestartForVmsOnHost(host, true);
    }
    return true;
}
Also used : Status(com.cloud.host.Status) DataCenterVO(com.cloud.dc.DataCenterVO) NoTransitionException(com.cloud.utils.fsm.NoTransitionException) HostPodVO(com.cloud.dc.HostPodVO) HostVO(com.cloud.host.HostVO)
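The timeout checks above compare (System.currentTimeMillis() >> 10) against host.getLastPinged(). A right shift by 10 divides by 1024, so the result is only an approximation of seconds (about 2.3% smaller than the true value). The sketch below illustrates the arithmetic. It assumes the last-ping value is stored in the same shifted unit, which the snippet does not confirm; the class ShiftedSecondsSketch and its method names are hypothetical.

```java
// Hypothetical helper illustrating the ">> 10" approximation of seconds.
final class ShiftedSecondsSketch {

    /** Milliseconds divided by 1024: cheap, but slightly undercounts seconds. */
    static long shiftedSeconds(long millis) {
        return millis >> 10;
    }

    /** Milliseconds divided by 1000: exact whole seconds. */
    static long exactSeconds(long millis) {
        return millis / 1000;
    }

    /**
     * Same shape as the check in the snippet. Assumes lastPingedShifted was
     * recorded with shiftedSeconds(), so both sides use the same unit.
     */
    static boolean waitedTooLong(long nowMillis, long lastPingedShifted, long waitThreshold) {
        return shiftedSeconds(nowMillis) - lastPingedShifted > waitThreshold;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println("shifted: " + shiftedSeconds(now) + ", exact: " + exactSeconds(now));
    }
}
```

As long as both timestamps use the shifted unit, the difference is consistent; each unit is 1.024 seconds, so a threshold of 1800 corresponds to roughly 1843 real seconds.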

Example 8 with Status

use of com.cloud.host.Status in project cloudstack by apache.

the class CheckOnHostCommandTest method testGetStatus.

@Test
public void testGetStatus() {
    Status s = host.getStatus();
    assertTrue(s == Status.Up);
}
Also used : Status(com.cloud.host.Status) Test(org.junit.Test)

Example 9 with Status

use of com.cloud.host.Status in project cloudstack by apache.

the class CheckOnHostCommandTest method testGetState.

@Test
public void testGetState() {
    Status s = host.getState();
    assertTrue(s == Status.Up);
}
Also used : Status(com.cloud.host.Status) Test(org.junit.Test)

Example 10 with Status

use of com.cloud.host.Status in project cloudstack by apache.

the class HighAvailabilityManagerImpl method investigate.

@Override
public Status investigate(final long hostId) {
    final HostVO host = _hostDao.findById(hostId);
    if (host == null) {
        return Status.Alert;
    }
    Status hostState = null;
    for (Investigator investigator : investigators) {
        hostState = investigator.isAgentAlive(host);
        if (hostState != null) {
            if (s_logger.isDebugEnabled()) {
                s_logger.debug(investigator.getName() + " was able to determine host " + hostId + " is in " + hostState.toString());
            }
            return hostState;
        }
        if (s_logger.isDebugEnabled()) {
            s_logger.debug(investigator.getName() + " unable to determine the state of the host.  Moving on.");
        }
    }
    return hostState;
}
Also used : Status(com.cloud.host.Status) HostVO(com.cloud.host.HostVO)
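The investigate method above walks a list of investigators and returns the first non-null verdict, falling through to null when none can decide. A minimal sketch of that first-answer-wins chain is shown below. The types InvestigatorChainSketch, Investigator, and Verdict are hypothetical stand-ins and do not reproduce the CloudStack Investigator interface.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a "first non-null answer wins" investigator chain.
final class InvestigatorChainSketch {

    enum Verdict { UP, DOWN }

    /** Returns a verdict, or null when this investigator cannot decide. */
    @FunctionalInterface
    interface Investigator {
        Verdict isAgentAlive(long hostId);
    }

    /**
     * Asks each investigator in order and stops at the first definite verdict.
     * Returns null if no investigator can determine the host state.
     */
    static Verdict investigate(long hostId, List<Investigator> investigators) {
        for (Investigator investigator : investigators) {
            Verdict verdict = investigator.isAgentAlive(hostId);
            if (verdict != null) {
                return verdict;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Investigator> chain = Arrays.<Investigator>asList(
                id -> null,          // cannot decide, move on
                id -> Verdict.DOWN,  // first definite answer
                id -> Verdict.UP);   // never consulted
        System.out.println(investigate(42L, chain)); // DOWN
    }
}
```

Ordering matters in this design: cheaper or more reliable investigators would normally go first, since later ones are never consulted once a verdict is returned.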

Aggregations

Status (com.cloud.host.Status) 10
HostVO (com.cloud.host.HostVO) 6
Answer (com.cloud.agent.api.Answer) 3
NoTransitionException (com.cloud.utils.fsm.NoTransitionException) 3
AgentControlAnswer (com.cloud.agent.api.AgentControlAnswer) 2
PingAnswer (com.cloud.agent.api.PingAnswer) 2
ReadyAnswer (com.cloud.agent.api.ReadyAnswer) 2
StartupAnswer (com.cloud.agent.api.StartupAnswer) 2
UnsupportedAnswer (com.cloud.agent.api.UnsupportedAnswer) 2
CloudRuntimeException (com.cloud.utils.exception.CloudRuntimeException) 2
Test (org.junit.Test) 2
CheckHealthCommand (com.cloud.agent.api.CheckHealthCommand) 1
CheckOnHostCommand (com.cloud.agent.api.CheckOnHostCommand) 1
DataCenterVO (com.cloud.dc.DataCenterVO) 1
HostPodVO (com.cloud.dc.HostPodVO) 1
AgentUnavailableException (com.cloud.exception.AgentUnavailableException) 1
ConnectionException (com.cloud.exception.ConnectionException) 1
OperationTimedoutException (com.cloud.exception.OperationTimedoutException) 1
UnsupportedVersionException (com.cloud.exception.UnsupportedVersionException) 1
Host (com.cloud.host.Host) 1