Example 1 with WaitForState

Use of org.apache.solr.client.solrj.request.CoreAdminRequest.WaitForState in the lucene-solr project by Apache.

Class RecoveryStrategy, method sendPrepRecoveryCmd.

private final void sendPrepRecoveryCmd(String leaderBaseUrl, String leaderCoreName, Slice slice) throws SolrServerException, IOException, InterruptedException, ExecutionException {
    WaitForState prepCmd = new WaitForState();
    prepCmd.setCoreName(leaderCoreName);
    prepCmd.setNodeName(zkController.getNodeName());
    prepCmd.setCoreNodeName(coreZkNodeName);
    prepCmd.setState(Replica.State.RECOVERING);
    prepCmd.setCheckLive(true);
    prepCmd.setOnlyIfLeader(true);
    final Slice.State state = slice.getState();
    if (state != Slice.State.CONSTRUCTION && state != Slice.State.RECOVERY && state != Slice.State.RECOVERY_FAILED) {
        prepCmd.setOnlyIfLeaderActive(true);
    }
    final int maxTries = 30;
    for (int numTries = 0; numTries < maxTries; numTries++) {
        try {
            sendPrepRecoveryCmd(leaderBaseUrl, prepCmd);
            break;
        } catch (ExecutionException e) {
            if (e.getCause() instanceof SolrServerException) {
                SolrServerException solrException = (SolrServerException) e.getCause();
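                // Note: numTries < maxTries always holds inside this loop, so a timeout on
                // the final attempt also takes the continue path, and the loop then ends
                // without rethrowing.
                if (solrException.getRootCause() instanceof SocketTimeoutException && numTries < maxTries) {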
                    LOG.warn("Socket timeout on send prep recovery cmd, retrying.. ");
                    continue;
                }
            }
            throw e;
        }
    }
}
Also used: WaitForState (org.apache.solr.client.solrj.request.CoreAdminRequest.WaitForState), SocketTimeoutException (java.net.SocketTimeoutException), Slice (org.apache.solr.common.cloud.Slice), SolrServerException (org.apache.solr.client.solrj.SolrServerException), ExecutionException (java.util.concurrent.ExecutionException)
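
The retry loop in sendPrepRecoveryCmd above can be read as a general pattern: call an action a bounded number of times and retry only when the root cause is a socket timeout. The following is a minimal, self-contained sketch of that pattern, not the Solr implementation. The class name RetryOnTimeoutSketch and the helpers callWithRetry and rootCause are hypothetical names introduced here for illustration. Unlike the loop above, this sketch rethrows the exception when the final attempt also times out.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;

// Hypothetical sketch: none of these names come from Solr.
public class RetryOnTimeoutSketch {

    /**
     * Calls the action up to maxTries times. A failure is retried only when its
     * root cause is a SocketTimeoutException and attempts remain; any other
     * failure, or a timeout on the last attempt, is rethrown to the caller.
     */
    public static <T> T callWithRetry(Callable<T> action, int maxTries) throws Exception {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be >= 1");
        }
        for (int numTries = 0; ; numTries++) {
            try {
                return action.call();
            } catch (Exception e) {
                boolean lastAttempt = numTries >= maxTries - 1;
                if (lastAttempt || !(rootCause(e) instanceof SocketTimeoutException)) {
                    throw e;
                }
                // Timeout with attempts remaining: loop around and try again.
            }
        }
    }

    /** Walks the cause chain to the innermost throwable. */
    public static Throwable rootCause(Throwable t) {
        Throwable cur = t;
        while (cur.getCause() != null && cur.getCause() != cur) {
            cur = cur.getCause();
        }
        return cur;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated action: times out twice, then succeeds on the third call.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new ExecutionException(new SocketTimeoutException("simulated timeout"));
            }
            return "ok";
        }, 30);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Running main prints "ok after 3 calls". Returning from inside the try block removes the need for a separate break, and computing lastAttempt explicitly keeps the exhaustion case from being swallowed.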

Example 2 with WaitForState

Use of org.apache.solr.client.solrj.request.CoreAdminRequest.WaitForState in the lucene-solr project by Apache.

Class ZkController, method waitForLeaderToSeeDownState.

private ZkCoreNodeProps waitForLeaderToSeeDownState(CoreDescriptor descriptor, final String coreZkNodeName) {
    // try not to wait too long here - if we are waiting too long, we should probably
    // move along and join the election
    CloudDescriptor cloudDesc = descriptor.getCloudDescriptor();
    String collection = cloudDesc.getCollectionName();
    String shard = cloudDesc.getShardId();
    ZkCoreNodeProps leaderProps = null;
    int retries = 2;
    for (int i = 0; i < retries; i++) {
        try {
            if (isClosed) {
                throw new SolrException(ErrorCode.SERVICE_UNAVAILABLE, "We have been closed");
            }
            // go straight to zk, not the cloud state - we want current info
            leaderProps = getLeaderProps(collection, shard, 5000);
            break;
        } catch (Exception e) {
            SolrException.log(log, "There was a problem finding the leader in zk", e);
            try {
                Thread.sleep(2000);
            } catch (InterruptedException e1) {
                Thread.currentThread().interrupt();
            }
            if (i == retries - 1) {
                throw new SolrException(ErrorCode.SERVER_ERROR, "There was a problem finding the leader in zk");
            }
        }
    }
    String leaderBaseUrl = leaderProps.getBaseUrl();
    String leaderCoreName = leaderProps.getCoreName();
    String myCoreNodeName = cloudDesc.getCoreNodeName();
    String myCoreName = descriptor.getName();
    String ourUrl = ZkCoreNodeProps.getCoreUrl(getBaseUrl(), myCoreName);
    boolean isLeader = leaderProps.getCoreUrl().equals(ourUrl);
    if (!isLeader && !SKIP_AUTO_RECOVERY) {
        // detect if this core is in leader-initiated recovery and if so, 
        // then we don't need the leader to wait on seeing the down state
        Replica.State lirState = null;
        try {
            lirState = getLeaderInitiatedRecoveryState(collection, shard, myCoreNodeName);
        } catch (Exception exc) {
            log.error("Failed to determine if replica " + myCoreNodeName + " is in leader-initiated recovery due to: " + exc, exc);
        }
        if (lirState != null) {
            log.debug("Replica " + myCoreNodeName + " is already in leader-initiated recovery, so not waiting for leader to see down state.");
        } else {
            log.info("Replica " + myCoreNodeName + " NOT in leader-initiated recovery, need to wait for leader to see down state.");
            try (HttpSolrClient client = new Builder(leaderBaseUrl).build()) {
                client.setConnectionTimeout(15000);
                client.setSoTimeout(120000);
                WaitForState prepCmd = new WaitForState();
                prepCmd.setCoreName(leaderCoreName);
                prepCmd.setNodeName(getNodeName());
                prepCmd.setCoreNodeName(coreZkNodeName);
                prepCmd.setState(Replica.State.DOWN);
                // let's retry a couple times - perhaps the leader just went down,
                // or perhaps he is just not quite ready for us yet
                retries = 2;
                for (int i = 0; i < retries; i++) {
                    if (isClosed) {
                        throw new SolrException(ErrorCode.SERVICE_UNAVAILABLE, "We have been closed");
                    }
                    try {
                        client.request(prepCmd);
                        break;
                    } catch (Exception e) {
                        // if the core container is shutdown, don't wait
                        if (cc.isShutDown()) {
                            throw new SolrException(ErrorCode.SERVICE_UNAVAILABLE, "Core container is shutdown.");
                        }
                        Throwable rootCause = SolrException.getRootCause(e);
                        if (rootCause instanceof IOException) {
                            // if there was a communication error talking to the leader, see if the leader is even alive
                            if (!zkStateReader.getClusterState().liveNodesContain(leaderProps.getNodeName())) {
                                throw new SolrException(ErrorCode.SERVICE_UNAVAILABLE, "Node " + leaderProps.getNodeName() + " hosting leader for " + shard + " in " + collection + " is not live!");
                            }
                        }
                        SolrException.log(log, "There was a problem making a request to the leader", e);
                        try {
                            Thread.sleep(2000);
                        } catch (InterruptedException e1) {
                            Thread.currentThread().interrupt();
                        }
                        if (i == retries - 1) {
                            throw new SolrException(ErrorCode.SERVER_ERROR, "There was a problem making a request to the leader");
                        }
                    }
                }
            } catch (IOException e) {
                SolrException.log(log, "Error closing HttpSolrClient", e);
            }
        }
    }
    return leaderProps;
}
Also used: WaitForState (org.apache.solr.client.solrj.request.CoreAdminRequest.WaitForState), ZkCoreNodeProps (org.apache.solr.common.cloud.ZkCoreNodeProps), Builder (org.apache.solr.client.solrj.impl.HttpSolrClient.Builder), IOException (java.io.IOException), Replica (org.apache.solr.common.cloud.Replica), TimeoutException (java.util.concurrent.TimeoutException), SolrException (org.apache.solr.common.SolrException), ZooKeeperException (org.apache.solr.common.cloud.ZooKeeperException), UnsupportedEncodingException (java.io.UnsupportedEncodingException), SessionExpiredException (org.apache.zookeeper.KeeperException.SessionExpiredException), ConnectionLossException (org.apache.zookeeper.KeeperException.ConnectionLossException), KeeperException (org.apache.zookeeper.KeeperException), UnknownHostException (java.net.UnknownHostException), NoNodeException (org.apache.zookeeper.KeeperException.NoNodeException), SolrCoreInitializationException (org.apache.solr.core.SolrCoreInitializationException), HttpSolrClient (org.apache.solr.client.solrj.impl.HttpSolrClient)
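
Both loops in waitForLeaderToSeeDownState above follow the same shape: check a closed flag, try the call, sleep for a fixed delay on failure, and give up after a small number of attempts. The following is a minimal sketch of that shape under stated assumptions, not the ZkController code. The class BoundedRetrySketch, the method retryWithDelay, and its parameters are hypothetical. It also deviates from the original in two ways: it stops retrying when the thread is interrupted, and it does not sleep after the final failed attempt.

```java
import java.util.concurrent.Callable;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: none of these names come from Solr.
public class BoundedRetrySketch {

    /**
     * Tries the action up to `retries` times, sleeping delayMs between failed
     * attempts. Aborts early if `closed` reports true or the thread is interrupted.
     */
    public static <T> T retryWithDelay(Callable<T> action, int retries, long delayMs,
                                       BooleanSupplier closed) throws Exception {
        Exception last = null;
        for (int i = 0; i < retries; i++) {
            if (closed.getAsBoolean()) {
                throw new IllegalStateException("We have been closed");
            }
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (i == retries - 1) {
                    break; // out of attempts: no point in sleeping again
                }
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    break; // stop retrying once interrupted
                }
            }
        }
        // Keep the last failure as the cause so the caller can inspect it.
        throw new IllegalStateException("Retries exhausted or interrupted", last);
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated lookup: fails once, then succeeds.
        String leader = retryWithDelay(() -> {
            if (++calls[0] < 2) {
                throw new java.io.IOException("simulated failure");
            }
            return "leader";
        }, 2, 10L, () -> false);
        System.out.println(leader + " found after " + calls[0] + " calls");
    }
}
```

Passing the delay and the closed check as parameters keeps the sketch testable without long sleeps. The original methods hard-code a 2000 ms sleep and read the isClosed field directly.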

Aggregations

WaitForState (org.apache.solr.client.solrj.request.CoreAdminRequest.WaitForState): 2
IOException (java.io.IOException): 1
UnsupportedEncodingException (java.io.UnsupportedEncodingException): 1
SocketTimeoutException (java.net.SocketTimeoutException): 1
UnknownHostException (java.net.UnknownHostException): 1
ExecutionException (java.util.concurrent.ExecutionException): 1
TimeoutException (java.util.concurrent.TimeoutException): 1
SolrServerException (org.apache.solr.client.solrj.SolrServerException): 1
HttpSolrClient (org.apache.solr.client.solrj.impl.HttpSolrClient): 1
Builder (org.apache.solr.client.solrj.impl.HttpSolrClient.Builder): 1
SolrException (org.apache.solr.common.SolrException): 1
Replica (org.apache.solr.common.cloud.Replica): 1
Slice (org.apache.solr.common.cloud.Slice): 1
ZkCoreNodeProps (org.apache.solr.common.cloud.ZkCoreNodeProps): 1
ZooKeeperException (org.apache.solr.common.cloud.ZooKeeperException): 1
SolrCoreInitializationException (org.apache.solr.core.SolrCoreInitializationException): 1
KeeperException (org.apache.zookeeper.KeeperException): 1
ConnectionLossException (org.apache.zookeeper.KeeperException.ConnectionLossException): 1
NoNodeException (org.apache.zookeeper.KeeperException.NoNodeException): 1
SessionExpiredException (org.apache.zookeeper.KeeperException.SessionExpiredException): 1