Example 1 with AsyncOperationTimeoutException

Use of voldemort.client.protocol.admin.AsyncOperationTimeoutException in the voldemort project by voldemort.

Class AdminStoreSwapper, method invokeFetch.

public Map<Node, Response> invokeFetch(final String storeName, final String basePath, final long pushVersion) {
    // do fetch
    final Map<Integer, Future<String>> fetchDirs = new HashMap<Integer, Future<String>>();
    for (final Node node : cluster.getNodes()) {
        fetchDirs.put(node.getId(), executor.submit(new Callable<String>() {

            public String call() throws Exception {
                String response = null;
                if (buildPrimaryReplicasOnly) {
                    // Then we give the root directory to the server and let it decide what to fetch
                    response = fetch(basePath);
                } else {
                    // Old behavior: fetch the node directory only
                    String storeDir = basePath + "/" + ReadOnlyUtils.NODE_DIRECTORY_PREFIX + node.getId();
                    response = fetch(storeDir);
                }
                if (response == null)
                    throw new VoldemortException("Fetch request on " + node.briefToString() + " failed");
                logger.info("Fetch succeeded on " + node.briefToString());
                return response.trim();
            }

            private String fetch(String hadoopStoreDirToFetch) {
                // We need to keep the AdminClient instance separate in each Callable, so that a refresh of
                // the client in one callable does not refresh the AdminClient used by another callable.
                AdminClient currentAdminClient = AdminStoreSwapper.this.adminClient;
                int attempt = 1;
                while (attempt <= MAX_FETCH_ATTEMPTS) {
                    if (attempt > 1) {
                        logger.info("Fetch attempt " + attempt + "/" + MAX_FETCH_ATTEMPTS + " for " + node.briefToString() + ". Will wait " + WAIT_TIME_BETWEEN_FETCH_ATTEMPTS + " ms before going ahead.");
                        try {
                            Thread.sleep(WAIT_TIME_BETWEEN_FETCH_ATTEMPTS);
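                        } catch (InterruptedException e) {
                            // Restore the interrupt flag so code further up the stack can still observe it
                            Thread.currentThread().interrupt();
                            throw new VoldemortException(e);
                        }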
                    }
                    logger.info("Invoking fetch for " + node.briefToString() + " for " + hadoopStoreDirToFetch);
                    try {
                        return currentAdminClient.readonlyOps.fetchStore(node.getId(), storeName, hadoopStoreDirToFetch, pushVersion, timeoutMs);
                    } catch (AsyncOperationTimeoutException e) {
                        // A timeout is rethrown immediately and bypasses the retry logic below
                        throw e;
                    } catch (VoldemortException ve) {
                        if (attempt >= MAX_FETCH_ATTEMPTS) {
                            throw ve;
                        }
                        if (ExceptionUtils.recursiveClassEquals(ve, ExceptionUtils.BNP_SOFT_ERRORS)) {
                            String logMessage = "Got a " + ve.getClass().getSimpleName() + " from " + node.briefToString() + " while trying to fetch store '" + storeName + "'" + " (attempt " + attempt + "/" + MAX_FETCH_ATTEMPTS + ").";
                            if (currentAdminClient.isClusterModified()) {
                                logMessage += " It seems like the cluster.xml state has changed since this" + " AdminClient was constructed. Therefore, we will attempt constructing" + " a fresh AdminClient and retrying the fetch operation.";
                                currentAdminClient = currentAdminClient.getFreshClient();
                            } else {
                                logMessage += " The cluster.xml is up to date. We will retry with the same AdminClient.";
                            }
                            logger.info(logMessage);
                            attempt++;
                        } else {
                            throw ve;
                        }
                    }
                }
                // Defensive coding
                throw new IllegalStateException("Code should never reach here!");
            }
        }));
    }
    Map<Node, Response> fetchResponseMap = Maps.newTreeMap();
    boolean fetchErrors = false;
    /*
     * We wait for all fetches to either complete successfully or throw an
     * Exception. We don't handle QuotaExceededException in a special way
     * here. The idea is to protect the disk. It is okay to let the BnP job
     * run to completion. We still want to delete the data of a failed fetch
     * on all nodes that successfully fetched the data. After deleting the
     * failed fetch data, we bubble up the QuotaExceededException as needed.
     *
     * The alternative is to cancel all future tasks as soon as we detect a
     * QuotaExceededException. This would save time (fail faster) and
     * protect disk usage, but it does not guarantee a clean state on all
     * nodes with respect to data from the failed fetch: someone would need
     * to clean up that data manually. Instead, we try to clean up as much
     * data as we can before we fail the job.
     *
     * In iteration 2 we can try to improve this to fail faster, by adding
     * either or both of:
     *
     * 1. Client-side checks.
     * 2. Server-side fail-fast handling as soon as a
     *    QuotaExceededException is detected on one of the servers. Note
     *    that this needs a careful decision on how to handle fetches that
     *    have already started on other nodes, and how and when to clean
     *    them up.
     */
    ArrayList<Node> failedNodes = new ArrayList<Node>();
    for (final Node node : cluster.getNodes()) {
        Future<String> val = fetchDirs.get(node.getId());
        try {
            String response = val.get();
            fetchResponseMap.put(node, new Response(response));
        } catch (Exception e) {
            if (e.getCause() instanceof UnauthorizedStoreException) {
                throw (UnauthorizedStoreException) e.getCause();
            } else {
                fetchErrors = true;
                fetchResponseMap.put(node, new Response(e));
                failedNodes.add(node);
            }
        }
    }
    if (fetchErrors) {
        // Log all the errors for the user
        for (Map.Entry<Node, Response> entry : fetchResponseMap.entrySet()) {
            if (!entry.getValue().isSuccessful()) {
                logger.error("Error on " + entry.getKey().briefToString() + " during push : ", entry.getValue().getException());
            }
        }
        Iterator<FailedFetchStrategy> strategyIterator = failedFetchStrategyList.iterator();
        boolean swapIsPossible = false;
        FailedFetchStrategy strategy = null;
        while (strategyIterator.hasNext() && !swapIsPossible) {
            strategy = strategyIterator.next();
            try {
                logger.info("About to attempt: " + strategy.toString());
                swapIsPossible = strategy.dealWithIt(storeName, pushVersion, fetchResponseMap);
                logger.info("Finished executing: " + strategy.toString() + "; swapIsPossible: " + swapIsPossible);
            } catch (Exception e) {
                if (strategyIterator.hasNext()) {
                    logger.error("Got an exception while trying to execute: " + strategy.toString() + ". Continuing with next strategy.", e);
                } else {
                    logger.error("Got an exception while trying to execute the last remaining strategy: " + strategy.toString() + ". Swap will be aborted.", e);
                }
            }
        }
        if (!swapIsPossible) {
            throw new VoldemortException("Exception during push. Swap will be aborted", fetchResponseMap.get(failedNodes.get(0)).getException());
        }
    }
    return fetchResponseMap;
}
Also used: UnauthorizedStoreException (voldemort.store.readonly.UnauthorizedStoreException), HashMap (java.util.HashMap), Node (voldemort.cluster.Node), ArrayList (java.util.ArrayList), VoldemortException (voldemort.VoldemortException), Callable (java.util.concurrent.Callable), AsyncOperationTimeoutException (voldemort.client.protocol.admin.AsyncOperationTimeoutException), Future (java.util.concurrent.Future), Map (java.util.Map), AdminClient (voldemort.client.protocol.admin.AdminClient)
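
The retry logic inside fetch can be easier to follow in isolation. The following is a minimal sketch of the same pattern, not the project's code: a timeout is rethrown immediately, a retryable "soft" error is retried up to a maximum number of attempts with a fixed wait in between, and any other exception propagates untouched. The names FetchRetrySketch, SoftErrorException, OperationTimeoutException and callWithRetry are hypothetical stand-ins introduced for illustration; the AdminClient refresh step from the listing is left out.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class FetchRetrySketch {

    /** Hypothetical stand-in for an error that is worth retrying. */
    static class SoftErrorException extends RuntimeException {
        SoftErrorException(String message) { super(message); }
    }

    /** Hypothetical stand-in for a timeout, which is never retried. */
    static class OperationTimeoutException extends RuntimeException {
        OperationTimeoutException(String message) { super(message); }
    }

    /**
     * Calls the operation up to maxAttempts times, sleeping waitMs
     * milliseconds before every attempt after the first one.
     */
    static <T> T callWithRetry(Callable<T> operation, int maxAttempts, long waitMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            if (attempt > 1) {
                Thread.sleep(waitMs);
            }
            try {
                return operation.call();
            } catch (OperationTimeoutException e) {
                // Timeouts skip the retry logic entirely.
                throw e;
            } catch (SoftErrorException e) {
                // Give up once the attempt budget is exhausted.
                if (attempt >= maxAttempts) {
                    throw e;
                }
                // Otherwise fall through and loop for another attempt.
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // The operation fails twice with a soft error, then succeeds.
        String result = callWithRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new SoftErrorException("transient failure");
            }
            return "fetched";
        }, 3, 10L);
        System.out.println(result + " after " + calls.get() + " calls");
        // prints "fetched after 3 calls"
    }
}
```

Unlike the listing, this sketch lets InterruptedException from Thread.sleep propagate to the caller instead of wrapping it, which keeps the example short.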

Aggregations

ArrayList (java.util.ArrayList): 1, HashMap (java.util.HashMap): 1, Map (java.util.Map): 1, Callable (java.util.concurrent.Callable): 1, Future (java.util.concurrent.Future): 1, VoldemortException (voldemort.VoldemortException): 1, AdminClient (voldemort.client.protocol.admin.AdminClient): 1, AsyncOperationTimeoutException (voldemort.client.protocol.admin.AsyncOperationTimeoutException): 1, Node (voldemort.cluster.Node): 1, UnauthorizedStoreException (voldemort.store.readonly.UnauthorizedStoreException): 1