
Example 46 with TimeValue

use of io.crate.common.unit.TimeValue in project crate by crate.

the class RecoveryState method toXContent.

@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
    builder.field(Fields.ID, shardId.id());
    builder.field(Fields.TYPE, recoverySource.getType());
    builder.field(Fields.STAGE, stage.toString());
    builder.field(Fields.PRIMARY, primary);
    builder.timeField(Fields.START_TIME_IN_MILLIS, Fields.START_TIME, timer.startTime);
    if (timer.stopTime > 0) {
        builder.timeField(Fields.STOP_TIME_IN_MILLIS, Fields.STOP_TIME, timer.stopTime);
    }
    builder.humanReadableField(Fields.TOTAL_TIME_IN_MILLIS, Fields.TOTAL_TIME, new TimeValue(timer.time()));
    if (recoverySource.getType() == RecoverySource.Type.PEER) {
        builder.startObject(Fields.SOURCE);
        builder.field(Fields.ID, sourceNode.getId());
        builder.field(Fields.HOST, sourceNode.getHostName());
        builder.field(Fields.TRANSPORT_ADDRESS, sourceNode.getAddress().toString());
        builder.field(Fields.IP, sourceNode.getHostAddress());
        builder.field(Fields.NAME, sourceNode.getName());
        builder.endObject();
    } else {
        builder.startObject(Fields.SOURCE);
        recoverySource.addAdditionalFields(builder, params);
        builder.endObject();
    }
    builder.startObject(Fields.TARGET);
    builder.field(Fields.ID, targetNode.getId());
    builder.field(Fields.HOST, targetNode.getHostName());
    builder.field(Fields.TRANSPORT_ADDRESS, targetNode.getAddress().toString());
    builder.field(Fields.IP, targetNode.getHostAddress());
    builder.field(Fields.NAME, targetNode.getName());
    builder.endObject();
    builder.startObject(Fields.INDEX);
    index.toXContent(builder, params);
    builder.endObject();
    builder.startObject(Fields.TRANSLOG);
    translog.toXContent(builder, params);
    builder.endObject();
    builder.startObject(Fields.VERIFY_INDEX);
    verifyIndex.toXContent(builder, params);
    builder.endObject();
    return builder;
}
Also used : TimeValue(io.crate.common.unit.TimeValue)

Example 47 with TimeValue

use of io.crate.common.unit.TimeValue in project crate by crate.

the class RemoteRecoveryTargetHandler method executeRetryableAction.

private <T extends TransportResponse> void executeRetryableAction(String action, TransportRequest request, TransportRequestOptions options, ActionListener<T> actionListener, Writeable.Reader<T> reader) {
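    // Unique key under which this action is tracked while it is in flight.
    final Object key = new Object();
    // Deregister the action before the caller's listener is notified.
    final ActionListener<T> removeListener = ActionListener.runBefore(actionListener, () -> onGoingRetryableActions.remove(key));
    // Initial retry delay and overall retry timeout passed to RetryableAction.
    final TimeValue initialDelay = TimeValue.timeValueMillis(200);
    final TimeValue timeout = recoverySettings.internalActionRetryTimeout();
    final RetryableAction<T> retryableAction = new RetryableAction<T>(LOGGER, threadPool, initialDelay, timeout, removeListener) {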

        @Override
        public void tryAction(ActionListener<T> listener) {
            transportService.sendRequest(targetNode, action, request, options, new ActionListenerResponseHandler<>(listener, reader, ThreadPool.Names.GENERIC));
        }

        @Override
        public boolean shouldRetry(Exception e) {
            return retryableException(e);
        }
    };
    onGoingRetryableActions.put(key, retryableAction);
    retryableAction.run();
    if (isCancelled) {
        retryableAction.cancel(new CancellableThreads.ExecutionCancelledException("recovery was cancelled"));
    }
}
Also used : RetryableAction(org.elasticsearch.action.support.RetryableAction) CancellableThreads(org.elasticsearch.common.util.CancellableThreads) ActionListener(org.elasticsearch.action.ActionListener) TimeValue(io.crate.common.unit.TimeValue) ElasticsearchException(org.elasticsearch.ElasticsearchException) IOException(java.io.IOException) RemoteTransportException(org.elasticsearch.transport.RemoteTransportException) CircuitBreakingException(org.elasticsearch.common.breaker.CircuitBreakingException) EsRejectedExecutionException(org.elasticsearch.common.util.concurrent.EsRejectedExecutionException)
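The retry pattern in Example 47 (an initial delay, an overall timeout, and a shouldRetry check) can be illustrated without the crate classes. The following is a minimal sketch in plain Java and is not crate's RetryableAction: the class RetrySketch, its retry method, and the delay-doubling policy are assumptions made for illustration only.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

class RetrySketch {

    /**
     * Calls the action until it succeeds, the exception is not retryable,
     * or the time budget is used up. The delay doubles after each failure
     * (an assumed policy for this sketch, not taken from crate).
     */
    static <T> T retry(Callable<T> action,
                       Predicate<Exception> shouldRetry,
                       long initialDelayMillis,
                       long timeoutMillis) throws Exception {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long delay = initialDelayMillis;
        while (true) {
            try {
                return action.call();
            } catch (Exception e) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                // Give up on non-retryable errors or when the budget is exhausted.
                if (!shouldRetry.test(e) || remainingMillis <= 0) {
                    throw e;
                }
                // Never sleep past the deadline.
                Thread.sleep(Math.min(delay, remainingMillis));
                delay *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        String result = retry(() -> {
            if (++attempts[0] < 3) {
                throw new IllegalStateException("not ready");
            }
            return "ok";
        }, e -> e instanceof IllegalStateException, 10, 1_000);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

In this sketch the action fails twice and succeeds on the third call, so main prints "ok after 3 attempts". Unlike the original, the sketch blocks the calling thread while waiting; the snippet above schedules its retries through a thread pool instead.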

Example 48 with TimeValue

use of io.crate.common.unit.TimeValue in project crate by crate.

the class PeerRecoveryTargetService method doRecovery.

private void doRecovery(final long recoveryId) {
    final StartRecoveryRequest request;
    final RecoveryState.Timer timer;
    CancellableThreads cancellableThreads;
    try (RecoveryRef recoveryRef = onGoingRecoveries.getRecovery(recoveryId)) {
        if (recoveryRef == null) {
            LOGGER.trace("not running recovery with id [{}] - can not find it (probably finished)", recoveryId);
            return;
        }
        final RecoveryTarget recoveryTarget = recoveryRef.target();
        timer = recoveryTarget.state().getTimer();
        cancellableThreads = recoveryTarget.cancellableThreads();
        try {
            assert recoveryTarget.sourceNode() != null : "can not do a recovery without a source node";
            LOGGER.trace("{} preparing shard for peer recovery", recoveryTarget.shardId());
            recoveryTarget.indexShard().prepareForIndexRecovery();
            final long startingSeqNo = recoveryTarget.indexShard().recoverLocallyUpToGlobalCheckpoint();
            assert startingSeqNo == UNASSIGNED_SEQ_NO || recoveryTarget.state().getStage() == RecoveryState.Stage.TRANSLOG : "unexpected recovery stage [" + recoveryTarget.state().getStage() + "] starting seqno [ " + startingSeqNo + "]";
            request = getStartRecoveryRequest(LOGGER, clusterService.localNode(), recoveryTarget, startingSeqNo);
        } catch (final Exception e) {
            // this will be logged as warning later on...
            LOGGER.trace("unexpected error while preparing shard for peer recovery, failing recovery", e);
            onGoingRecoveries.failRecovery(recoveryId, new RecoveryFailedException(recoveryTarget.state(), "failed to prepare shard for recovery", e), true);
            return;
        }
    }
    Consumer<Exception> handleException = e -> {
        if (LOGGER.isTraceEnabled()) {
            LOGGER.trace(() -> new ParameterizedMessage("[{}][{}] Got exception on recovery", request.shardId().getIndex().getName(), request.shardId().id()), e);
        }
        Throwable cause = SQLExceptions.unwrap(e);
        if (cause instanceof CancellableThreads.ExecutionCancelledException) {
            // this can also come from the source wrapped in a RemoteTransportException
            onGoingRecoveries.failRecovery(recoveryId, new RecoveryFailedException(request, "source has canceled the recovery", cause), false);
            return;
        }
        if (cause instanceof RecoveryEngineException) {
            // unwrap an exception that was thrown as part of the recovery
            cause = cause.getCause();
        }
        // do it twice, in case we have double transport exception
        cause = SQLExceptions.unwrap(cause);
        if (cause instanceof RecoveryEngineException) {
            // unwrap an exception that was thrown as part of the recovery
            cause = cause.getCause();
        }
        if (cause instanceof IllegalIndexShardStateException || cause instanceof IndexNotFoundException || cause instanceof ShardNotFoundException) {
            // if the target is not ready yet, retry
            retryRecovery(recoveryId, "remote shard not ready", recoverySettings.retryDelayStateSync(), recoverySettings.activityTimeout());
            return;
        }
        if (cause instanceof DelayRecoveryException) {
            retryRecovery(recoveryId, cause, recoverySettings.retryDelayStateSync(), recoverySettings.activityTimeout());
            return;
        }
        if (cause instanceof ConnectTransportException) {
            LOGGER.debug("delaying recovery of {} for [{}] due to networking error [{}]", request.shardId(), recoverySettings.retryDelayNetwork(), cause.getMessage());
            retryRecovery(recoveryId, cause.getMessage(), recoverySettings.retryDelayNetwork(), recoverySettings.activityTimeout());
            return;
        }
        if (cause instanceof AlreadyClosedException) {
            onGoingRecoveries.failRecovery(recoveryId, new RecoveryFailedException(request, "source shard is closed", cause), false);
            return;
        }
        onGoingRecoveries.failRecovery(recoveryId, new RecoveryFailedException(request, e), true);
    };
    try {
        LOGGER.trace("{} starting recovery from {}", request.shardId(), request.sourceNode());
        cancellableThreads.executeIO(() -> transportService.sendRequest(request.sourceNode(), PeerRecoverySourceService.Actions.START_RECOVERY, request, new TransportResponseHandler<RecoveryResponse>() {

            @Override
            public void handleResponse(RecoveryResponse recoveryResponse) {
                final TimeValue recoveryTime = new TimeValue(timer.time());
                // do this through ongoing recoveries to remove it from the collection
                onGoingRecoveries.markRecoveryAsDone(recoveryId);
                if (LOGGER.isTraceEnabled()) {
                    StringBuilder sb = new StringBuilder();
                    sb.append('[').append(request.shardId().getIndex().getName()).append(']').append('[').append(request.shardId().id()).append("] ");
                    sb.append("recovery completed from ").append(request.sourceNode()).append(", took[").append(recoveryTime).append("]\n");
                    sb.append("   phase1: recovered_files [").append(recoveryResponse.phase1FileNames.size()).append("]").append(" with total_size of [").append(new ByteSizeValue(recoveryResponse.phase1TotalSize)).append("]").append(", took [").append(timeValueMillis(recoveryResponse.phase1Time)).append("], throttling_wait [").append(timeValueMillis(recoveryResponse.phase1ThrottlingWaitTime)).append(']').append("\n");
                    sb.append("         : reusing_files   [").append(recoveryResponse.phase1ExistingFileNames.size()).append("] with total_size of [").append(new ByteSizeValue(recoveryResponse.phase1ExistingTotalSize)).append("]\n");
                    sb.append("   phase2: start took [").append(timeValueMillis(recoveryResponse.startTime)).append("]\n");
                    sb.append("         : recovered [").append(recoveryResponse.phase2Operations).append("]").append(" transaction log operations").append(", took [").append(timeValueMillis(recoveryResponse.phase2Time)).append("]").append("\n");
                    LOGGER.trace("{}", sb);
                } else {
                    LOGGER.debug("{} recovery done from [{}], took [{}]", request.shardId(), request.sourceNode(), recoveryTime);
                }
            }

            @Override
            public void handleException(TransportException e) {
                handleException.accept(e);
            }

            @Override
            public String executor() {
                // we do some heavy work like refreshes in the response so fork off to the generic threadpool
                return ThreadPool.Names.GENERIC;
            }

            @Override
            public RecoveryResponse read(StreamInput in) throws IOException {
                return new RecoveryResponse(in);
            }
        }));
    } catch (CancellableThreads.ExecutionCancelledException e) {
        LOGGER.trace("recovery cancelled", e);
    } catch (Exception e) {
        handleException.accept(e);
    }
}
Also used : ElasticsearchException(org.elasticsearch.ElasticsearchException) CancellableThreads(org.elasticsearch.common.util.CancellableThreads) ShardId(org.elasticsearch.index.shard.ShardId) TransportChannel(org.elasticsearch.transport.TransportChannel) IndexMetadata(org.elasticsearch.cluster.metadata.IndexMetadata) ClusterService(org.elasticsearch.cluster.service.ClusterService) AlreadyClosedException(org.apache.lucene.store.AlreadyClosedException) RecoveryEngineException(org.elasticsearch.index.engine.RecoveryEngineException) ParameterizedMessage(org.apache.logging.log4j.message.ParameterizedMessage) ShardNotFoundException(org.elasticsearch.index.shard.ShardNotFoundException) TranslogCorruptedException(org.elasticsearch.index.translog.TranslogCorruptedException) ElasticsearchTimeoutException(org.elasticsearch.ElasticsearchTimeoutException) ClusterState(org.elasticsearch.cluster.ClusterState) DiscoveryNode(org.elasticsearch.cluster.node.DiscoveryNode) ConnectTransportException(org.elasticsearch.transport.ConnectTransportException) Settings(org.elasticsearch.common.settings.Settings) IndexNotFoundException(org.elasticsearch.index.IndexNotFoundException) Store(org.elasticsearch.index.store.Store) ChannelActionListener(org.elasticsearch.action.support.ChannelActionListener) ThreadPool(org.elasticsearch.threadpool.ThreadPool) TransportResponse(org.elasticsearch.transport.TransportResponse) TransportService(org.elasticsearch.transport.TransportService) ClusterStateObserver(org.elasticsearch.cluster.ClusterStateObserver) Nullable(javax.annotation.Nullable) ByteSizeValue(org.elasticsearch.common.unit.ByteSizeValue) RecoveryRef(org.elasticsearch.indices.recovery.RecoveriesCollection.RecoveryRef) IndexEventListener(org.elasticsearch.index.shard.IndexEventListener) IndexShard(org.elasticsearch.index.shard.IndexShard) UNASSIGNED_SEQ_NO(org.elasticsearch.index.seqno.SequenceNumbers.UNASSIGNED_SEQ_NO) IOException(java.io.IOException) IllegalIndexShardStateException(org.elasticsearch.index.shard.IllegalIndexShardStateException) TransportRequestHandler(org.elasticsearch.transport.TransportRequestHandler) Consumer(java.util.function.Consumer) AtomicLong(java.util.concurrent.atomic.AtomicLong) AbstractRunnable(org.elasticsearch.common.util.concurrent.AbstractRunnable) Logger(org.apache.logging.log4j.Logger) TimeValue.timeValueMillis(io.crate.common.unit.TimeValue.timeValueMillis) StreamInput(org.elasticsearch.common.io.stream.StreamInput) TimeValue(io.crate.common.unit.TimeValue) Translog(org.elasticsearch.index.translog.Translog) TransportResponseHandler(org.elasticsearch.transport.TransportResponseHandler) SQLExceptions(io.crate.exceptions.SQLExceptions) LogManager(org.apache.logging.log4j.LogManager) TransportException(org.elasticsearch.transport.TransportException) RateLimiter(org.apache.lucene.store.RateLimiter) ActionListener(org.elasticsearch.action.ActionListener) MapperException(org.elasticsearch.index.mapper.MapperException)
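The exception handler in Example 48 first strips wrapper exceptions and then decides, by type, whether to retry or fail the recovery. A minimal sketch of that unwrap-then-classify idea in plain Java follows. It does not reproduce SQLExceptions.unwrap: the class UnwrapSketch, the choice of wrapper types, and the decision strings are all hypothetical.

```java
import java.net.ConnectException;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

class UnwrapSketch {

    /** Strips wrapper exceptions until a non-wrapper exception (or a missing cause) is reached. */
    static Throwable unwrap(Throwable t) {
        Throwable current = t;
        while ((current instanceof ExecutionException || current instanceof CompletionException)
                && current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /** Maps the unwrapped cause to a decision, mirroring the shape of the if-chain in the handler. */
    static String classify(Throwable t) {
        Throwable cause = unwrap(t);
        if (cause instanceof ConnectException) {
            // Networking problem: worth retrying after a delay.
            return "retry-after-network-delay";
        }
        if (cause instanceof IllegalStateException) {
            // Stand-in for "remote side not ready yet".
            return "retry";
        }
        // Anything else fails the operation.
        return "fail";
    }

    public static void main(String[] args) {
        Throwable wrapped = new CompletionException(
            new ExecutionException(new ConnectException("connection refused")));
        System.out.println(classify(wrapped));
        System.out.println(classify(new RuntimeException("boom")));
    }
}
```

Unwrapping before the instanceof checks matters because an error raised on a remote node typically arrives wrapped in one or more transport-level exceptions, which is why the original handler unwraps twice.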

Example 49 with TimeValue

use of io.crate.common.unit.TimeValue in project crate by crate.

the class TransportNodesListShardStoreMetadata method listStoreMetadata.

private StoreFilesMetadata listStoreMetadata(NodeRequest request) throws IOException {
    final ShardId shardId = request.getShardId();
    logger.trace("listing store meta data for {}", shardId);
    long startTimeNS = System.nanoTime();
    boolean exists = false;
    try {
        IndexService indexService = indicesService.indexService(shardId.getIndex());
        if (indexService != null) {
            IndexShard indexShard = indexService.getShardOrNull(shardId.id());
            if (indexShard != null) {
                try {
                    final StoreFilesMetadata storeFilesMetadata = new StoreFilesMetadata(shardId, indexShard.snapshotStoreMetadata(), indexShard.getPeerRecoveryRetentionLeases());
                    exists = true;
                    return storeFilesMetadata;
                } catch (org.apache.lucene.index.IndexNotFoundException e) {
                    logger.trace(new ParameterizedMessage("[{}] node is missing index, responding with empty", shardId), e);
                    return new StoreFilesMetadata(shardId, Store.MetadataSnapshot.EMPTY, Collections.emptyList());
                } catch (IOException e) {
                    logger.warn(new ParameterizedMessage("[{}] can't read metadata from store, responding with empty", shardId), e);
                    return new StoreFilesMetadata(shardId, Store.MetadataSnapshot.EMPTY, Collections.emptyList());
                }
            }
        }
        final String customDataPath;
        if (request.getCustomDataPath() != null) {
            customDataPath = request.getCustomDataPath();
        } else {
            // TODO: Fallback for BWC with older ES versions. Remove this once request.getCustomDataPath() always returns non-null
            if (indexService != null) {
                customDataPath = indexService.getIndexSettings().customDataPath();
            } else {
                IndexMetadata metadata = clusterService.state().metadata().index(shardId.getIndex());
                if (metadata != null) {
                    customDataPath = new IndexSettings(metadata, settings).customDataPath();
                } else {
                    logger.trace("{} node doesn't have meta data for the requests index", shardId);
                    throw new ElasticsearchException("node doesn't have meta data for index " + shardId.getIndex());
                }
            }
        }
        final ShardPath shardPath = ShardPath.loadShardPath(logger, nodeEnv, shardId, customDataPath);
        if (shardPath == null) {
            return new StoreFilesMetadata(shardId, Store.MetadataSnapshot.EMPTY, Collections.emptyList());
        }
        // note that this may fail if it can't get access to the shard lock. Since we check above there is an active shard, this means:
        // 1) a shard is being constructed, which means the master will not use a copy of this replica
        // 2) A shard is shutting down and has not cleared its content within lock timeout. In this case the master may not
        // reuse local resources.
        final Store.MetadataSnapshot metadataSnapshot = Store.readMetadataSnapshot(shardPath.resolveIndex(), shardId, nodeEnv::shardLock, logger);
        // we refresh shard info after the primary has started. Hence, we can ignore retention leases if there is no active shard.
        return new StoreFilesMetadata(shardId, metadataSnapshot, Collections.emptyList());
    } finally {
        TimeValue took = new TimeValue(System.nanoTime() - startTimeNS, TimeUnit.NANOSECONDS);
        if (exists) {
            logger.debug("{} loaded store meta data (took [{}])", shardId, took);
        } else {
            logger.trace("{} didn't find any store meta data to load (took [{}])", shardId, took);
        }
    }
}
Also used : IndexService(org.elasticsearch.index.IndexService) IndexShard(org.elasticsearch.index.shard.IndexShard) IndexSettings(org.elasticsearch.index.IndexSettings) Store(org.elasticsearch.index.store.Store) IOException(java.io.IOException) ElasticsearchException(org.elasticsearch.ElasticsearchException) ShardId(org.elasticsearch.index.shard.ShardId) ShardPath(org.elasticsearch.index.shard.ShardPath) ParameterizedMessage(org.apache.logging.log4j.message.ParameterizedMessage) IndexMetadata(org.elasticsearch.cluster.metadata.IndexMetadata) TimeValue(io.crate.common.unit.TimeValue)
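Example 49 measures elapsed time with System.nanoTime() and reports it in a finally block, so the duration is logged on every exit path (normal return, early return, or exception). The sketch below shows the same pattern with only the JDK; the class TimedSketch and the method loadWithTiming are hypothetical, and TimeUnit stands in for TimeValue.

```java
import java.util.concurrent.TimeUnit;

class TimedSketch {

    /** Returns a value or throws; either way the elapsed time is reported. */
    static int loadWithTiming(boolean fail) {
        long startNanos = System.nanoTime();
        boolean loaded = false;
        try {
            if (fail) {
                throw new IllegalStateException("nothing to load");
            }
            loaded = true;
            return 42;
        } finally {
            // Runs after the return value is computed and also when an exception propagates.
            long tookMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
            System.out.println((loaded ? "loaded" : "did not load") + " (took [" + tookMillis + "ms])");
        }
    }

    public static void main(String[] args) {
        System.out.println(loadWithTiming(false));
        try {
            loadWithTiming(true);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

System.nanoTime() is used rather than System.currentTimeMillis() because it is intended for measuring elapsed time and is not affected by wall-clock adjustments.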

Example 50 with TimeValue

use of io.crate.common.unit.TimeValue in project crate by crate.

the class LimitedBackoffPolicyTest method testNoNext.

@Test
public void testNoNext() throws Exception {
    BackoffPolicy policy = new LimitedExponentialBackoff(0, 1, Integer.MAX_VALUE);
    Iterator<TimeValue> it = policy.iterator();
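    // The first call succeeds; per the expected message below, only one iteration is allowed.
    it.next();
    expectedException.expect(NoSuchElementException.class);
    expectedException.expectMessage("Reached maximum amount of backoff iterations. Only 1 iterations allowed.");
    // The second call exceeds the limit and must throw.
    it.next();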
}
Also used : BackoffPolicy(org.elasticsearch.action.bulk.BackoffPolicy) TimeValue(io.crate.common.unit.TimeValue) Test(org.junit.Test)
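The test in Example 50 relies on a backoff iterator that throws NoSuchElementException once its iteration limit is reached. A minimal sketch of such an iterator in plain Java is shown below. It is not crate's LimitedExponentialBackoff: the class LimitedBackoffSketch, its parameter order, the doubling rule, and the exception message are assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class LimitedBackoffSketch implements Iterable<Long> {
    private final long firstDelayMillis;
    private final int maxIterations;
    private final long maxDelayMillis;

    LimitedBackoffSketch(long firstDelayMillis, int maxIterations, long maxDelayMillis) {
        this.firstDelayMillis = firstDelayMillis;
        this.maxIterations = maxIterations;
        this.maxDelayMillis = maxDelayMillis;
    }

    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private int used = 0;
            private long delay = firstDelayMillis;

            @Override
            public boolean hasNext() {
                return used < maxIterations;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException(
                        "Only " + maxIterations + " iterations allowed");
                }
                used++;
                long current = Math.min(delay, maxDelayMillis);
                // Double the delay, capped at the maximum; the comparison avoids overflow.
                delay = delay >= maxDelayMillis / 2 ? maxDelayMillis : Math.max(delay, 1) * 2;
                return current;
            }
        };
    }

    public static void main(String[] args) {
        for (long d : new LimitedBackoffSketch(10, 5, 100)) {
            System.out.print(d + " ");
        }
        System.out.println();
    }
}
```

With a first delay of 10, a limit of 5 iterations, and a cap of 100, the sketch yields 10, 20, 40, 80, 100; a sixth call to next() would throw, which is the behaviour the test above checks for a limit of 1.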

Aggregations

TimeValue (io.crate.common.unit.TimeValue): 75
Test (org.junit.Test): 23
ClusterState (org.elasticsearch.cluster.ClusterState): 20
IOException (java.io.IOException): 17
ParameterizedMessage (org.apache.logging.log4j.message.ParameterizedMessage): 12
ActionListener (org.elasticsearch.action.ActionListener): 12
IndexMetadata (org.elasticsearch.cluster.metadata.IndexMetadata): 11
ArrayList (java.util.ArrayList): 10
ThreadPool (org.elasticsearch.threadpool.ThreadPool): 10
ElasticsearchException (org.elasticsearch.ElasticsearchException): 9
Settings (org.elasticsearch.common.settings.Settings): 9
Logger (org.apache.logging.log4j.Logger): 8
ClusterStateUpdateTask (org.elasticsearch.cluster.ClusterStateUpdateTask): 8
ClusterService (org.elasticsearch.cluster.service.ClusterService): 8
List (java.util.List): 7
LogManager (org.apache.logging.log4j.LogManager): 7
Version (org.elasticsearch.Version): 7
ElasticsearchTimeoutException (org.elasticsearch.ElasticsearchTimeoutException): 6
ClusterStateObserver (org.elasticsearch.cluster.ClusterStateObserver): 6
StreamInput (org.elasticsearch.common.io.stream.StreamInput): 6