
Example 6 with StopWatch

use of org.elasticsearch.common.StopWatch in project crate by crate.

the class BlobRecoveryHandler method phase1.

public void phase1() throws Exception {
    logger.debug("[{}][{}] recovery [phase1] to {}: start", request.shardId().index().name(), request.shardId().id(), request.targetNode().getName());
    StopWatch stopWatch = new StopWatch().start();
    blobTransferTarget.startRecovery();
    blobTransferTarget.createActiveTransfersSnapshot();
    sendStartRecoveryRequest();
    final AtomicReference<Exception> lastException = new AtomicReference<>();
    try {
        syncVarFiles(lastException);
    } catch (InterruptedException ex) {
        throw new ElasticsearchException("blob recovery phase1 failed", ex);
    }
    Exception exception = lastException.get();
    if (exception != null) {
        throw exception;
    }
    /*
     * As soon as the recovery starts, the target node will receive PutChunkReplicaRequests.
     * The target node will then request the bytes it is missing from the source node
     * (it is missing bytes from PutChunk/StartBlob requests that happened before the recovery).
     * We need to block here so that the target node has enough time to request the head chunks.
     *
     * e.g.
     *      Target Node receives Chunk X with bytes 10-19
     *      Target Node requests bytes 0-9 from Source Node
     *      Source Node sends bytes 0-9
     *      Source Node sets transferTakenOver
     */
    blobTransferTarget.waitForGetHeadRequests(GET_HEAD_TIMEOUT, TimeUnit.SECONDS);
    blobTransferTarget.createActivePutHeadChunkTransfersSnapshot();
    /*
     * After receiving a getHeadRequest, the source node starts to send HeadChunks to the target.
     * Wait for all PutHeadChunk-Runnables to finish before ending the recovery.
     */
    blobTransferTarget.waitUntilPutHeadChunksAreFinished();
    sendFinalizeRecoveryRequest();
    blobTransferTarget.stopRecovery();
    stopWatch.stop();
    logger.debug("[{}][{}] recovery [phase1] to {}: took [{}]", request.shardId().index().name(), request.shardId().id(), request.targetNode().getName(), stopWatch.totalTime());
}
Also used : AtomicReference(java.util.concurrent.atomic.AtomicReference) ElasticsearchException(org.elasticsearch.ElasticsearchException) IndexShardClosedException(org.elasticsearch.index.shard.IndexShardClosedException) IOException(java.io.IOException) StopWatch(org.elasticsearch.common.StopWatch)
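
The lastException hand-off used in phase1 can be looked at in isolation. The following is a minimal sketch, not crate's implementation: LastExceptionDemo and runAll are hypothetical names, and a plain thread pool stands in for whatever syncVarFiles does internally, which the snippet above does not show.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration: workers record a failure in a shared
// AtomicReference, and the caller inspects it after all workers finish.
final class LastExceptionDemo {

    // Runs every task on a small pool and returns the last recorded failure, or null.
    static Exception runAll(List<Callable<Void>> tasks) throws InterruptedException {
        AtomicReference<Exception> lastException = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(tasks.size());
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            for (Callable<Void> task : tasks) {
                pool.execute(() -> {
                    try {
                        task.call();
                    } catch (Exception e) {
                        lastException.set(e); // remember the failure for the caller
                    } finally {
                        done.countDown();
                    }
                });
            }
            done.await(); // block until every task has reported
        } finally {
            pool.shutdown();
        }
        return lastException.get();
    }

    public static void main(String[] args) throws Exception {
        Callable<Void> ok = () -> null;
        Callable<Void> failing = () -> {
            throw new IllegalStateException("sync failed");
        };
        Exception e = runAll(List.of(ok, failing));
        if (e != null) {
            System.out.println("would rethrow: " + e.getMessage());
        }
    }
}
```

If several workers fail, only one exception survives in the reference, which matches the "last exception" naming; a caller that needs all failures would have to collect them in a list instead.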

Example 7 with StopWatch

use of org.elasticsearch.common.StopWatch in project crate by crate.

the class JobCollectContext method measureCollectTime.

private void measureCollectTime() {
    final StopWatch stopWatch = new StopWatch(collectPhase.phaseId() + ": " + collectPhase.name());
    stopWatch.start("starting collectors");
    consumer.completionFuture().whenComplete((result, ex) -> {
        stopWatch.stop();
        logger.trace("Collectors finished: {}", stopWatch.shortSummary());
    });
}
Also used : StopWatch(org.elasticsearch.common.StopWatch)
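
The pattern in measureCollectTime, starting a timer and stopping it inside a completion callback, can be sketched with the JDK alone. This is an illustration under assumptions: AsyncTimingDemo and timed are hypothetical names, and System.nanoTime stands in for org.elasticsearch.common.StopWatch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a stopwatch; not org.elasticsearch.common.StopWatch.
final class AsyncTimingDemo {

    // Records the elapsed nanoseconds into `sink` when `future` completes,
    // whether it completed normally or exceptionally.
    static <T> CompletableFuture<T> timed(CompletableFuture<T> future, AtomicLong sink) {
        final long startNanos = System.nanoTime();
        return future.whenComplete((result, ex) -> sink.set(System.nanoTime() - startNanos));
    }

    public static void main(String[] args) {
        AtomicLong elapsedNanos = new AtomicLong(-1);
        CompletableFuture<String> work = new CompletableFuture<>();
        CompletableFuture<String> timedWork = timed(work, elapsedNanos);
        work.complete("done"); // the callback runs on the completing thread
        System.out.println(timedWork.join() + " after "
            + TimeUnit.NANOSECONDS.toMicros(elapsedNanos.get()) + " us");
    }
}
```

Because the callback also fires on exceptional completion, the elapsed time is recorded for failed work as well, which is the same behaviour the whenComplete call in the example above gives.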

Example 8 with StopWatch

use of org.elasticsearch.common.StopWatch in project crate by crate.

the class BlobRecoveryHandler method blobRecoveryHook.

@Override
protected void blobRecoveryHook() throws Exception {
    LOGGER.debug("[{}][{}] recovery [phase1] to {}: start", request.shardId().getIndexName(), request.shardId().id(), request.targetNode().getName());
    final StopWatch stopWatch = new StopWatch().start();
    blobTransferTarget.startRecovery();
    blobTransferTarget.createActiveTransfersSnapshot();
    sendStartRecoveryRequest();
    final AtomicReference<Exception> lastException = new AtomicReference<>();
    try {
        syncVarFiles(lastException);
    } catch (InterruptedException ex) {
        throw new ElasticsearchException("blob recovery phase1 failed", ex);
    }
    Exception exception = lastException.get();
    if (exception != null) {
        throw exception;
    }
    /*
     * As soon as the recovery starts, the target node will receive PutChunkReplicaRequests.
     * The target node will then request the bytes it is missing from the source node
     * (it is missing bytes from PutChunk/StartBlob requests that happened before the recovery).
     * We need to block here so that the target node has enough time to request the head chunks.
     *
     * e.g.
     *      Target Node receives Chunk X with bytes 10-19
     *      Target Node requests bytes 0-9 from Source Node
     *      Source Node sends bytes 0-9
     *      Source Node sets transferTakenOver
     */
    blobTransferTarget.waitForGetHeadRequests(GET_HEAD_TIMEOUT, TimeUnit.SECONDS);
    blobTransferTarget.createActivePutHeadChunkTransfersSnapshot();
    /*
     * After receiving a getHeadRequest, the source node starts to send HeadChunks to the target.
     * Wait for all PutHeadChunk-Runnables to finish before ending the recovery.
     */
    blobTransferTarget.waitUntilPutHeadChunksAreFinished();
    sendFinalizeRecoveryRequest();
    blobTransferTarget.stopRecovery();
    stopWatch.stop();
    LOGGER.debug("[{}][{}] recovery [phase1] to {}: took [{}]", request.shardId().getIndexName(), request.shardId().id(), request.targetNode().getName(), stopWatch.totalTime());
}
Also used : AtomicReference(java.util.concurrent.atomic.AtomicReference) ElasticsearchException(org.elasticsearch.ElasticsearchException) IndexShardClosedException(org.elasticsearch.index.shard.IndexShardClosedException) IOException(java.io.IOException) StopWatch(org.elasticsearch.common.StopWatch)
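
The comment in blobRecoveryHook describes blocking, with a timeout, until the target has had a chance to request its head chunks. How waitForGetHeadRequests does this is not shown in the snippet; the sketch below is one possible model using a CountDownLatch, and HeadRequestGate with its methods is a hypothetical name, not part of BlobTransferTarget.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of "block until the expected head requests have arrived,
// but give up after a timeout". Not the BlobTransferTarget implementation.
final class HeadRequestGate {
    private final CountDownLatch pending;

    HeadRequestGate(int expectedRequests) {
        this.pending = new CountDownLatch(expectedRequests);
    }

    // Called once for each head request that arrives.
    void onHeadRequest() {
        pending.countDown();
    }

    // Returns true if all expected requests arrived before the timeout elapsed.
    boolean await(long timeout, TimeUnit unit) throws InterruptedException {
        return pending.await(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        HeadRequestGate gate = new HeadRequestGate(2);
        new Thread(gate::onHeadRequest).start();
        new Thread(gate::onHeadRequest).start();
        System.out.println("all arrived: " + gate.await(5, TimeUnit.SECONDS));
    }
}
```

Returning a boolean lets the caller decide whether a timeout is an error or simply means that no head chunks were outstanding.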

Example 9 with StopWatch

use of org.elasticsearch.common.StopWatch in project crate by crate.

the class RecoverySourceHandler method phase2.

/**
 * Perform phase two of the recovery process.
 * <p>
 * Phase two uses a snapshot of the current translog *without* acquiring the write lock (however, the translog snapshot is
 * a point-in-time view of the translog). It then sends each translog operation to the target node so it can be replayed into
 * the new shard.
 *
 * @param startingSeqNo              the sequence number to start recovery from, or {@link SequenceNumbers#UNASSIGNED_SEQ_NO} if all
 *                                   ops should be sent
 * @param endingSeqNo                the highest sequence number that should be sent
 * @param snapshot                   a snapshot of the translog
 * @param maxSeenAutoIdTimestamp     the max auto_id_timestamp of append-only requests on the primary
 * @param maxSeqNoOfUpdatesOrDeletes the max seq_no of updates or deletes on the primary after these operations were executed on it.
 * @param retentionLeases            the retention leases forwarded to the operation sender
 * @param mappingVersion             the mapping version forwarded to the operation sender
 * @param listener                   a listener which will be notified with the local checkpoint on the target.
 */
void phase2(long startingSeqNo, long endingSeqNo, Translog.Snapshot snapshot, long maxSeenAutoIdTimestamp, long maxSeqNoOfUpdatesOrDeletes, RetentionLeases retentionLeases, long mappingVersion, ActionListener<SendSnapshotResult> listener) throws IOException {
    if (shard.state() == IndexShardState.CLOSED) {
        throw new IndexShardClosedException(request.shardId());
    }
    logger.trace("recovery [phase2]: sending transaction log operations (from [{}] to [{}])", startingSeqNo, endingSeqNo);
    final StopWatch stopWatch = new StopWatch().start();
    final StepListener<Void> sendListener = new StepListener<>();
    final OperationBatchSender sender = new OperationBatchSender(startingSeqNo, endingSeqNo, snapshot, maxSeenAutoIdTimestamp, maxSeqNoOfUpdatesOrDeletes, retentionLeases, mappingVersion, sendListener);
    sendListener.whenComplete(ignored -> {
        final long skippedOps = sender.skippedOps.get();
        final int totalSentOps = sender.sentOps.get();
        final long targetLocalCheckpoint = sender.targetLocalCheckpoint.get();
        assert snapshot.totalOperations() == snapshot.skippedOperations() + skippedOps + totalSentOps : String.format(Locale.ROOT, "expected total [%d], overridden [%d], skipped [%d], total sent [%d]", snapshot.totalOperations(), snapshot.skippedOperations(), skippedOps, totalSentOps);
        stopWatch.stop();
        final TimeValue tookTime = stopWatch.totalTime();
        logger.trace("recovery [phase2]: took [{}]", tookTime);
        listener.onResponse(new SendSnapshotResult(targetLocalCheckpoint, totalSentOps, tookTime));
    }, listener::onFailure);
    sender.start();
}
Also used : IndexShardClosedException(org.elasticsearch.index.shard.IndexShardClosedException) StepListener(org.elasticsearch.action.StepListener) TimeValue(io.crate.common.unit.TimeValue) StopWatch(org.elasticsearch.common.StopWatch)
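
The assertion in phase2 checks that every operation in the snapshot is accounted for as either sent or skipped. The sketch below shows that bookkeeping on plain sequence numbers. It assumes that operations outside [startingSeqNo, endingSeqNo] are the ones skipped, which the snippet does not show; OperationRangeDemo, Result, and replay are hypothetical names.

```java
import java.util.List;

// Hypothetical sketch of range-filtered replay with sent/skipped accounting.
final class OperationRangeDemo {

    // Counters for operations forwarded and operations left out.
    static final class Result {
        final int sent;
        final int skipped;

        Result(int sent, int skipped) {
            this.sent = sent;
            this.skipped = skipped;
        }
    }

    // "Sends" (here: only counts) every sequence number within
    // [startingSeqNo, endingSeqNo] and counts the rest as skipped.
    static Result replay(List<Long> seqNos, long startingSeqNo, long endingSeqNo) {
        int sent = 0;
        int skipped = 0;
        for (long seqNo : seqNos) {
            if (seqNo < startingSeqNo || seqNo > endingSeqNo) {
                skipped++;
            } else {
                sent++;
            }
        }
        // Every operation is either sent or skipped; nothing is lost.
        if (sent + skipped != seqNos.size()) {
            throw new IllegalStateException("operation accounting mismatch");
        }
        return new Result(sent, skipped);
    }

    public static void main(String[] args) {
        Result r = replay(List.of(0L, 1L, 2L, 3L, 4L, 5L), 2L, 4L);
        System.out.println("sent=" + r.sent + " skipped=" + r.skipped);
    }
}
```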

Example 10 with StopWatch

use of org.elasticsearch.common.StopWatch in project crate by crate.

the class RecoverySourceHandler method finalizeRecovery.

/**
 * Finalizes the recovery process.
 */
void finalizeRecovery(long targetLocalCheckpoint, long trimAboveSeqNo, final ActionListener<Void> listener) throws IOException {
    if (shard.state() == IndexShardState.CLOSED) {
        throw new IndexShardClosedException(request.shardId());
    }
    cancellableThreads.checkForCancel();
    final StopWatch stopWatch = new StopWatch().start();
    logger.trace("finalizing recovery");
    /*
     * Before marking the shard as in-sync we acquire an operation permit. We do this so that there is a barrier between marking a
     * shard as in-sync and relocating a shard. If we acquire the permit then no relocation handoff can complete before we are done
     * marking the shard as in-sync. If the relocation handoff holds all the permits then after the handoff completes and we acquire
     * the permit then the state of the shard will be relocated and this recovery will fail.
     */
    runUnderPrimaryPermit(() -> shard.markAllocationIdAsInSync(request.targetAllocationId(), targetLocalCheckpoint), shardId + " marking " + request.targetAllocationId() + " as in sync", shard, cancellableThreads, logger);
    // this global checkpoint is persisted in finalizeRecovery
    final long globalCheckpoint = shard.getLastKnownGlobalCheckpoint();
    final StepListener<Void> finalizeListener = new StepListener<>();
    cancellableThreads.checkForCancel();
    recoveryTarget.finalizeRecovery(globalCheckpoint, trimAboveSeqNo, finalizeListener);
    finalizeListener.whenComplete(r -> {
        runUnderPrimaryPermit(() -> shard.updateGlobalCheckpointForShard(request.targetAllocationId(), globalCheckpoint), shardId + " updating " + request.targetAllocationId() + "'s global checkpoint", shard, cancellableThreads, logger);
        if (request.isPrimaryRelocation()) {
            logger.trace("performing relocation hand-off");
            // TODO: make relocated async
            // this acquires all IndexShard operation permits and will thus delay new recoveries until it is done
            cancellableThreads.execute(() -> shard.relocated(request.targetAllocationId(), recoveryTarget::handoffPrimaryContext));
            /*
             * If the recovery process fails after disabling primary mode on the source shard, both relocation source and
             * target are failed (see {@link IndexShard#updateRoutingEntry}).
             */
        }
        stopWatch.stop();
        logger.trace("finalizing recovery took [{}]", stopWatch.totalTime());
        listener.onResponse(null);
    }, listener::onFailure);
}
Also used : IndexShardClosedException(org.elasticsearch.index.shard.IndexShardClosedException) StepListener(org.elasticsearch.action.StepListener) StopWatch(org.elasticsearch.common.StopWatch)
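
The permit barrier described in the finalizeRecovery comment, where ordinary work takes one permit and a relocation hand-off takes all of them, can be modelled with a java.util.concurrent.Semaphore. This is a simplified sketch, not the IndexShard implementation: PermitBarrierDemo and its methods are hypothetical, and it uses non-blocking tryAcquire so the outcome is deterministic, whereas real code would typically block or register a listener.

```java
import java.util.concurrent.Semaphore;

// Hypothetical model of operation permits: ordinary operations take one permit,
// a hand-off takes all of them. Not the IndexShard implementation.
final class PermitBarrierDemo {
    static final int TOTAL_PERMITS = 4;
    private final Semaphore permits = new Semaphore(TOTAL_PERMITS);

    // Runs `action` while holding a single permit; returns false if none is free.
    boolean runUnderPermit(Runnable action) {
        if (!permits.tryAcquire()) {
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            permits.release();
        }
    }

    // Runs `action` while holding every permit; returns false if any permit is in use.
    boolean runExclusively(Runnable action) {
        if (!permits.tryAcquire(TOTAL_PERMITS)) {
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            permits.release(TOTAL_PERMITS);
        }
    }

    public static void main(String[] args) {
        PermitBarrierDemo shard = new PermitBarrierDemo();
        // While one permit is held, the exclusive hand-off cannot start.
        shard.runUnderPermit(() ->
            System.out.println("hand-off started: " + shard.runExclusively(() -> { })));
        // Once the permit is released, the hand-off can take all permits.
        System.out.println("hand-off started: " + shard.runExclusively(() -> { }));
    }
}
```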

Aggregations

StopWatch (org.elasticsearch.common.StopWatch): 21
IndexShardClosedException (org.elasticsearch.index.shard.IndexShardClosedException): 11
TimeValue (io.crate.common.unit.TimeValue): 4
IOException (java.io.IOException): 4
ArrayList (java.util.ArrayList): 4
AtomicReference (java.util.concurrent.atomic.AtomicReference): 3
Interruptable (org.elasticsearch.common.util.CancellableThreads.Interruptable): 3
Closeable (java.io.Closeable): 2
CorruptIndexException (org.apache.lucene.index.CorruptIndexException): 2
IndexFormatTooNewException (org.apache.lucene.index.IndexFormatTooNewException): 2
IndexFormatTooOldException (org.apache.lucene.index.IndexFormatTooOldException): 2
IndexInput (org.apache.lucene.store.IndexInput): 2
RateLimiter (org.apache.lucene.store.RateLimiter): 2
ElasticsearchException (org.elasticsearch.ElasticsearchException): 2
StepListener (org.elasticsearch.action.StepListener): 2
Client (org.elasticsearch.client.Client): 2
NodeClient (org.elasticsearch.client.node.NodeClient): 2
NodeConnectionsService (org.elasticsearch.cluster.NodeConnectionsService): 2
ClusterService (org.elasticsearch.cluster.service.ClusterService): 2
BytesArray (org.elasticsearch.common.bytes.BytesArray): 2