
Example 6 with RetentionLease

use of org.elasticsearch.index.seqno.RetentionLease in project crate by crate.

the class SoftDeletesPolicyTests method testWhenGlobalCheckpointDictatesThePolicy.

@Test
public void testWhenGlobalCheckpointDictatesThePolicy() {
    final int retentionOperations = randomIntBetween(0, 1024);
    final AtomicLong globalCheckpoint = new AtomicLong(randomLongBetween(0, Long.MAX_VALUE - 2));
    final Collection<RetentionLease> leases = new ArrayList<>();
    final int numberOfLeases = randomIntBetween(0, 16);
    for (int i = 0; i < numberOfLeases; i++) {
        // setup leases where the minimum retained sequence number is more than the policy dictated by the global checkpoint
        leases.add(new RetentionLease(Integer.toString(i), randomLongBetween(1 + globalCheckpoint.get() - retentionOperations + 1, Long.MAX_VALUE), randomNonNegativeLong(), "test"));
    }
    final long primaryTerm = randomNonNegativeLong();
    final long version = randomNonNegativeLong();
    final Supplier<RetentionLeases> leasesSupplier = () -> new RetentionLeases(primaryTerm, version, Collections.unmodifiableCollection(new ArrayList<>(leases)));
    final SoftDeletesPolicy policy = new SoftDeletesPolicy(globalCheckpoint::get, 0, retentionOperations, leasesSupplier);
    // set the local checkpoint of the safe commit to more than the policy dictated by the global checkpoint
    final long localCheckpointOfSafeCommit = randomLongBetween(1 + globalCheckpoint.get() - retentionOperations + 1, Long.MAX_VALUE);
    policy.setLocalCheckpointOfSafeCommit(localCheckpointOfSafeCommit);
    assertThat(policy.getMinRetainedSeqNo(), equalTo(1 + globalCheckpoint.get() - retentionOperations));
}
Also used : AtomicLong(java.util.concurrent.atomic.AtomicLong) RetentionLease(org.elasticsearch.index.seqno.RetentionLease) ArrayList(java.util.ArrayList) RetentionLeases(org.elasticsearch.index.seqno.RetentionLeases) Test(org.junit.Test)
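
The assertion in this test can be read as a small formula. The block below is a minimal sketch of that formula, not the SoftDeletesPolicy source: the class and method names are hypothetical, and it assumes the retained floor is the smallest of three bounds (the global-checkpoint bound, the lowest lease, and one past the safe commit's local checkpoint). It also ignores overflow at Long.MAX_VALUE.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the value the test above asserts; not the SoftDeletesPolicy implementation.
final class MinRetainedSeqNoSketch {

    // Assumption: the minimum retained sequence number is the smallest of the three bounds below.
    static long minRetainedSeqNo(long globalCheckpoint,
                                 long retentionOperations,
                                 List<Long> leaseRetainingSeqNos,
                                 long localCheckpointOfSafeCommit) {
        // Bound dictated by the global checkpoint and the number of operations to retain.
        long fromGlobalCheckpoint = 1 + globalCheckpoint - retentionOperations;
        // Lowest sequence number any lease asks to retain; no leases means no constraint.
        long fromLeases = leaseRetainingSeqNos.stream()
            .mapToLong(Long::longValue)
            .min()
            .orElse(Long.MAX_VALUE);
        // Everything after the safe commit's local checkpoint must stay available.
        long fromSafeCommit = 1 + localCheckpointOfSafeCommit;
        return Math.min(Math.min(fromGlobalCheckpoint, fromLeases), fromSafeCommit);
    }

    public static void main(String[] args) {
        // Leases and safe commit are both above the global-checkpoint bound, as in the test.
        System.out.println(minRetainedSeqNo(100, 10, Arrays.asList(95L, 200L), 150));
    }
}
```

With every lease and the safe commit above the global-checkpoint bound, as the test arranges, the first bound is the one that wins.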

Example 7 with RetentionLease

use of org.elasticsearch.index.seqno.RetentionLease in project crate by crate.

the class IndexShardRetentionLeaseTests method runExpirationTest.

private void runExpirationTest(final boolean primary) throws IOException {
    final long retentionLeaseMillis = randomLongBetween(1, TimeValue.timeValueHours(12).millis());
    final Settings settings = Settings.builder().put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true).put(IndexSettings.INDEX_SOFT_DELETES_RETENTION_LEASE_PERIOD_SETTING.getKey(), TimeValue.timeValueMillis(retentionLeaseMillis).getStringRep()).build();
    // current time is mocked through the thread pool
    final IndexShard indexShard = newStartedShard(primary, settings, new InternalEngineFactory());
    final long primaryTerm = indexShard.getOperationPrimaryTerm();
    try {
        final long[] retainingSequenceNumbers = new long[1];
        retainingSequenceNumbers[0] = randomLongBetween(0, Long.MAX_VALUE);
        final long initialVersion;
        if (primary) {
            initialVersion = 2;
            indexShard.addRetentionLease("0", retainingSequenceNumbers[0], "test-0", ActionListener.wrap(() -> {
            }));
        } else {
            initialVersion = 3;
            final RetentionLeases retentionLeases = new RetentionLeases(primaryTerm, initialVersion, Arrays.asList(peerRecoveryRetentionLease(indexShard), new RetentionLease("0", retainingSequenceNumbers[0], currentTimeMillis.get(), "test-0")));
            indexShard.updateRetentionLeasesOnReplica(retentionLeases);
        }
        {
            final RetentionLeases retentionLeases = indexShard.getEngine().config().retentionLeasesSupplier().get();
            assertThat(retentionLeases.version(), equalTo(initialVersion));
            assertThat(retentionLeases.leases(), hasSize(2));
            final RetentionLease retentionLease = retentionLeases.get("0");
            assertThat(retentionLease.timestamp(), equalTo(currentTimeMillis.get()));
            assertRetentionLeases(indexShard, 1, retainingSequenceNumbers, primaryTerm, initialVersion, primary, false);
        }
        // renew the lease
        currentTimeMillis.set(currentTimeMillis.get() + randomLongBetween(0, 1024));
        retainingSequenceNumbers[0] = randomLongBetween(retainingSequenceNumbers[0], Long.MAX_VALUE);
        if (primary) {
            indexShard.renewRetentionLease("0", retainingSequenceNumbers[0], "test-0");
        } else {
            final RetentionLeases retentionLeases = new RetentionLeases(primaryTerm, initialVersion + 1, Arrays.asList(peerRecoveryRetentionLease(indexShard), new RetentionLease("0", retainingSequenceNumbers[0], currentTimeMillis.get(), "test-0")));
            indexShard.updateRetentionLeasesOnReplica(retentionLeases);
        }
        {
            final RetentionLeases retentionLeases = indexShard.getEngine().config().retentionLeasesSupplier().get();
            assertThat(retentionLeases.version(), equalTo(initialVersion + 1));
            assertThat(retentionLeases.leases(), hasSize(2));
            final RetentionLease retentionLease = retentionLeases.get("0");
            assertThat(retentionLease.timestamp(), equalTo(currentTimeMillis.get()));
            assertRetentionLeases(indexShard, 1, retainingSequenceNumbers, primaryTerm, initialVersion + 1, primary, false);
        }
        // now force the lease to expire
        currentTimeMillis.set(currentTimeMillis.get() + randomLongBetween(retentionLeaseMillis, Long.MAX_VALUE - currentTimeMillis.get()));
        if (primary) {
            assertRetentionLeases(indexShard, 1, retainingSequenceNumbers, primaryTerm, initialVersion + 1, true, false);
            assertRetentionLeases(indexShard, 0, new long[0], primaryTerm, initialVersion + 2, true, true);
        } else {
            assertRetentionLeases(indexShard, 1, retainingSequenceNumbers, primaryTerm, initialVersion + 1, false, false);
        }
    } finally {
        closeShards(indexShard);
    }
}
Also used : RetentionLease(org.elasticsearch.index.seqno.RetentionLease) InternalEngineFactory(org.elasticsearch.index.engine.InternalEngineFactory) Settings(org.elasticsearch.common.settings.Settings) IndexSettings(org.elasticsearch.index.IndexSettings) RetentionLeases(org.elasticsearch.index.seqno.RetentionLeases)
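
The test above renews a lease and then advances the mocked clock until the lease expires. A minimal sketch of such a time-based expiry check follows. The Lease class here is a hypothetical, simplified stand-in, and the strict "greater than" comparison is an assumption: the test does not pin down what happens exactly at the boundary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified lease model; the real RetentionLease carries more fields.
final class LeaseExpirySketch {

    static final class Lease {
        final String id;
        final long timestampMillis;

        Lease(String id, long timestampMillis) {
            this.id = id;
            this.timestampMillis = timestampMillis;
        }

        // Renewing produces a lease with the same id and a fresh timestamp.
        Lease renewedAt(long nowMillis) {
            return new Lease(id, nowMillis);
        }
    }

    // Assumption: a lease expires once strictly more than retentionMillis has passed since its timestamp.
    static boolean isExpired(Lease lease, long nowMillis, long retentionMillis) {
        return nowMillis - lease.timestampMillis > retentionMillis;
    }

    // Keeps only the leases that have not expired at nowMillis.
    static List<Lease> unexpired(List<Lease> leases, long nowMillis, long retentionMillis) {
        List<Lease> kept = new ArrayList<>();
        for (Lease lease : leases) {
            if (!isExpired(lease, nowMillis, retentionMillis)) {
                kept.add(lease);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Lease lease = new Lease("0", 1_000L);
        System.out.println(isExpired(lease, 1_400L, 500L));                   // within the period
        System.out.println(isExpired(lease, 1_600L, 500L));                   // past the period
        System.out.println(isExpired(lease.renewedAt(1_600L), 1_700L, 500L)); // renewed in time
    }
}
```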

Example 8 with RetentionLease

use of org.elasticsearch.index.seqno.RetentionLease in project crate by crate.

the class RecoverySourceHandler method recoverToTarget.

/**
 * performs the recovery from the local engine to the target
 */
public void recoverToTarget(ActionListener<RecoveryResponse> listener) {
    final Closeable releaseResources = () -> IOUtils.close(resources);
    final ActionListener<RecoveryResponse> wrappedListener = ActionListener.notifyOnce(listener);
    try {
        cancellableThreads.setOnCancel((reason, beforeCancelEx) -> {
            final RuntimeException e;
            if (shard.state() == IndexShardState.CLOSED) {
                // check if the shard got closed on us
                e = new IndexShardClosedException(shard.shardId(), "shard is closed and recovery was canceled reason [" + reason + "]");
            } else {
                e = new CancellableThreads.ExecutionCancelledException("recovery was canceled reason [" + reason + "]");
            }
            if (beforeCancelEx != null) {
                e.addSuppressed(beforeCancelEx);
            }
            IOUtils.closeWhileHandlingException(releaseResources, () -> wrappedListener.onFailure(e));
            throw e;
        });
        final Consumer<Exception> onFailure = e -> {
            assert Transports.assertNotTransportThread(RecoverySourceHandler.this + "[onFailure]");
            IOUtils.closeWhileHandlingException(releaseResources, () -> wrappedListener.onFailure(e));
        };
        final boolean softDeletesEnabled = shard.indexSettings().isSoftDeleteEnabled();
        final SetOnce<RetentionLease> retentionLeaseRef = new SetOnce<>();
        runUnderPrimaryPermit(() -> {
            final IndexShardRoutingTable routingTable = shard.getReplicationGroup().getRoutingTable();
            ShardRouting targetShardRouting = routingTable.getByAllocationId(request.targetAllocationId());
            if (targetShardRouting == null) {
                logger.debug("delaying recovery of {} as it is not listed as assigned to target node {}", request.shardId(), request.targetNode());
                throw new DelayRecoveryException("source node does not have the shard listed in its state as allocated on the node");
            }
            assert targetShardRouting.initializing() : "expected recovery target to be initializing but was " + targetShardRouting;
            retentionLeaseRef.set(shard.getRetentionLeases().get(ReplicationTracker.getPeerRecoveryRetentionLeaseId(targetShardRouting)));
        }, shardId + " validating recovery target [" + request.targetAllocationId() + "] registered ", shard, cancellableThreads, logger);
        final Engine.HistorySource historySource;
        if (softDeletesEnabled && (shard.useRetentionLeasesInPeerRecovery() || retentionLeaseRef.get() != null)) {
            historySource = Engine.HistorySource.INDEX;
        } else {
            historySource = Engine.HistorySource.TRANSLOG;
        }
        final Closeable retentionLock = shard.acquireHistoryRetentionLock(historySource);
        resources.add(retentionLock);
        final long startingSeqNo;
        final boolean isSequenceNumberBasedRecovery = request.startingSeqNo() != SequenceNumbers.UNASSIGNED_SEQ_NO && isTargetSameHistory() && shard.hasCompleteHistoryOperations("peer-recovery", historySource, request.startingSeqNo()) && (historySource == Engine.HistorySource.TRANSLOG || (retentionLeaseRef.get() != null && retentionLeaseRef.get().retainingSequenceNumber() <= request.startingSeqNo()));
        if (isSequenceNumberBasedRecovery && softDeletesEnabled && retentionLeaseRef.get() != null) {
            // all the history we need is retained by an existing retention lease, so we do not need a separate retention lock
            retentionLock.close();
            logger.trace("history is retained by {}", retentionLeaseRef.get());
        } else {
            // all the history we need is retained by the retention lock, obtained before calling shard.hasCompleteHistoryOperations()
            // and before acquiring the safe commit we'll be using, so we can be certain that all operations after the safe commit's
            // local checkpoint will be retained for the duration of this recovery.
            logger.trace("history is retained by retention lock");
        }
        final StepListener<SendFileResult> sendFileStep = new StepListener<>();
        final StepListener<TimeValue> prepareEngineStep = new StepListener<>();
        final StepListener<SendSnapshotResult> sendSnapshotStep = new StepListener<>();
        final StepListener<Void> finalizeStep = new StepListener<>();
        if (isSequenceNumberBasedRecovery) {
            logger.trace("performing sequence numbers based recovery. starting at [{}]", request.startingSeqNo());
            startingSeqNo = request.startingSeqNo();
            if (retentionLeaseRef.get() == null) {
                createRetentionLease(startingSeqNo, ActionListener.map(sendFileStep, ignored -> SendFileResult.EMPTY));
            } else {
                sendFileStep.onResponse(SendFileResult.EMPTY);
            }
        } else {
            final Engine.IndexCommitRef safeCommitRef;
            try {
                safeCommitRef = shard.acquireSafeIndexCommit();
                resources.add(safeCommitRef);
            } catch (final Exception e) {
                throw new RecoveryEngineException(shard.shardId(), 1, "snapshot failed", e);
            }
            // Try and copy enough operations to the recovering peer so that if it is promoted to primary then it has a chance of being
            // able to recover other replicas using operations-based recoveries. If we are not using retention leases then we
            // conservatively copy all available operations. If we are using retention leases then "enough operations" is just the
            // operations from the local checkpoint of the safe commit onwards, because when using soft deletes the safe commit retains
            // at least as much history as anything else. The safe commit will often contain all the history retained by the current set
            // of retention leases, but this is not guaranteed: an earlier peer recovery from a different primary might have created a
            // retention lease for some history that this primary already discarded, since we discard history when the global checkpoint
            // advances and not when creating a new safe commit. In any case this is a best-effort thing since future recoveries can
            // always fall back to file-based ones, and only really presents a problem if this primary fails before things have settled
            // down.
            startingSeqNo = softDeletesEnabled ? Long.parseLong(safeCommitRef.getIndexCommit().getUserData().get(SequenceNumbers.LOCAL_CHECKPOINT_KEY)) + 1L : 0;
            logger.trace("performing file-based recovery followed by history replay starting at [{}]", startingSeqNo);
            try {
                final int estimateNumOps = shard.estimateNumberOfHistoryOperations("peer-recovery", historySource, startingSeqNo);
                final Releasable releaseStore = acquireStore(shard.store());
                resources.add(releaseStore);
                sendFileStep.whenComplete(r -> IOUtils.close(safeCommitRef, releaseStore), e -> {
                    try {
                        IOUtils.close(safeCommitRef, releaseStore);
                    } catch (final IOException ex) {
                        logger.warn("releasing snapshot caused exception", ex);
                    }
                });
                final StepListener<ReplicationResponse> deleteRetentionLeaseStep = new StepListener<>();
                runUnderPrimaryPermit(() -> {
                    try {
                        // If the target previously had a copy of this shard then a file-based recovery might move its global
                        // checkpoint backwards. We must therefore remove any existing retention lease so that we can create a
                        // new one later on in the recovery.
                        shard.removePeerRecoveryRetentionLease(request.targetNode().getId(), new ThreadedActionListener<>(logger, shard.getThreadPool(), ThreadPool.Names.GENERIC, deleteRetentionLeaseStep, false));
                    } catch (RetentionLeaseNotFoundException e) {
                        logger.debug("no peer-recovery retention lease for " + request.targetAllocationId());
                        deleteRetentionLeaseStep.onResponse(null);
                    }
                }, shardId + " removing retention lease for [" + request.targetAllocationId() + "]", shard, cancellableThreads, logger);
                deleteRetentionLeaseStep.whenComplete(ignored -> {
                    assert Transports.assertNotTransportThread(RecoverySourceHandler.this + "[phase1]");
                    phase1(safeCommitRef.getIndexCommit(), startingSeqNo, () -> estimateNumOps, sendFileStep);
                }, onFailure);
            } catch (final Exception e) {
                throw new RecoveryEngineException(shard.shardId(), 1, "sendFileStep failed", e);
            }
        }
        assert startingSeqNo >= 0 : "startingSeqNo must be non negative. got: " + startingSeqNo;
        sendFileStep.whenComplete(r -> {
            assert Transports.assertNotTransportThread(RecoverySourceHandler.this + "[prepareTargetForTranslog]");
            // For a sequence based recovery, the target can keep its local translog
            prepareTargetForTranslog(shard.estimateNumberOfHistoryOperations("peer-recovery", historySource, startingSeqNo), prepareEngineStep);
        }, onFailure);
        prepareEngineStep.whenComplete(prepareEngineTime -> {
            assert Transports.assertNotTransportThread(RecoverySourceHandler.this + "[phase2]");
            /*
             * Add the shard to the replication group now that the engine is open; from this point on the shard
             * will receive replication requests, so any document indexed into the primary after this will be
             * replicated to this replica as well. Make sure to do this before sampling the max sequence number
             * in the next step, to ensure that we send all documents up to maxSeqNo in phase2.
             */
            runUnderPrimaryPermit(() -> shard.initiateTracking(request.targetAllocationId()), shardId + " initiating tracking of " + request.targetAllocationId(), shard, cancellableThreads, logger);
            final long endingSeqNo = shard.seqNoStats().getMaxSeqNo();
            // CRATE_PATCH
            try {
                blobRecoveryHook();
            } catch (Exception e) {
                throw new RecoveryEngineException(shard.shardId(), 1, "blobRecoveryHook failed", e);
            }
            if (logger.isTraceEnabled()) {
                logger.trace("snapshot translog for recovery; current size is [{}]", shard.estimateNumberOfHistoryOperations("peer-recovery", historySource, startingSeqNo));
            }
            final Translog.Snapshot phase2Snapshot = shard.getHistoryOperations("peer-recovery", historySource, startingSeqNo);
            resources.add(phase2Snapshot);
            retentionLock.close();
            // we have to capture the max_seen_auto_id_timestamp and the max_seq_no_of_updates to make sure that these values
            // are at least as high as the corresponding values on the primary when any of these operations were executed on it.
            final long maxSeenAutoIdTimestamp = shard.getMaxSeenAutoIdTimestamp();
            final long maxSeqNoOfUpdatesOrDeletes = shard.getMaxSeqNoOfUpdatesOrDeletes();
            final RetentionLeases retentionLeases = shard.getRetentionLeases();
            final long mappingVersionOnPrimary = shard.indexSettings().getIndexMetadata().getMappingVersion();
            phase2(startingSeqNo, endingSeqNo, phase2Snapshot, maxSeenAutoIdTimestamp, maxSeqNoOfUpdatesOrDeletes, retentionLeases, mappingVersionOnPrimary, sendSnapshotStep);
        }, onFailure);
        // The recovery target can trim all operations >= startingSeqNo, as we have sent all of these operations in phase 2
        final long trimAboveSeqNo = startingSeqNo - 1;
        sendSnapshotStep.whenComplete(r -> finalizeRecovery(r.targetLocalCheckpoint, trimAboveSeqNo, finalizeStep), onFailure);
        finalizeStep.whenComplete(r -> {
            // TODO: return the actual throttle time
            final long phase1ThrottlingWaitTime = 0L;
            final SendSnapshotResult sendSnapshotResult = sendSnapshotStep.result();
            final SendFileResult sendFileResult = sendFileStep.result();
            final RecoveryResponse response = new RecoveryResponse(sendFileResult.phase1FileNames, sendFileResult.phase1FileSizes, sendFileResult.phase1ExistingFileNames, sendFileResult.phase1ExistingFileSizes, sendFileResult.totalSize, sendFileResult.existingTotalSize, sendFileResult.took.millis(), phase1ThrottlingWaitTime, prepareEngineStep.result().millis(), sendSnapshotResult.sentOperations, sendSnapshotResult.tookTime.millis());
            try {
                wrappedListener.onResponse(response);
            } finally {
                IOUtils.close(resources);
            }
        }, onFailure);
    } catch (Exception e) {
        IOUtils.closeWhileHandlingException(releaseResources, () -> wrappedListener.onFailure(e));
    }
}
Also used : CancellableThreads(org.elasticsearch.common.util.CancellableThreads) Arrays(java.util.Arrays) IndexFormatTooNewException(org.apache.lucene.index.IndexFormatTooNewException) Releasables(org.elasticsearch.common.lease.Releasables) RecoveryEngineException(org.elasticsearch.index.engine.RecoveryEngineException) RetentionLeaseNotFoundException(org.elasticsearch.index.seqno.RetentionLeaseNotFoundException) CorruptIndexException(org.apache.lucene.index.CorruptIndexException) StoreFileMetadata(org.elasticsearch.index.store.StoreFileMetadata) Transports(org.elasticsearch.transport.Transports) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) Locale(java.util.Locale) ThreadPool(org.elasticsearch.threadpool.ThreadPool) ActionRunnable(org.elasticsearch.action.ActionRunnable) IOContext(org.apache.lucene.store.IOContext) StepListener(org.elasticsearch.action.StepListener) Releasable(org.elasticsearch.common.lease.Releasable) ByteSizeValue(org.elasticsearch.common.unit.ByteSizeValue) PlainActionFuture(org.elasticsearch.action.support.PlainActionFuture) IndexShardRoutingTable(org.elasticsearch.cluster.routing.IndexShardRoutingTable) BytesReference(org.elasticsearch.common.bytes.BytesReference) Engine(org.elasticsearch.index.engine.Engine) RemoteTransportException(org.elasticsearch.transport.RemoteTransportException) List(java.util.List) Logger(org.apache.logging.log4j.Logger) Version(org.elasticsearch.Version) InputStreamIndexInput(org.elasticsearch.common.lucene.store.InputStreamIndexInput) TimeValue(io.crate.common.unit.TimeValue) ReplicationTracker(org.elasticsearch.index.seqno.ReplicationTracker) CopyOnWriteArrayList(java.util.concurrent.CopyOnWriteArrayList) IndexCommit(org.apache.lucene.index.IndexCommit) ShardRouting(org.elasticsearch.cluster.routing.ShardRouting) IndexShardClosedException(org.elasticsearch.index.shard.IndexShardClosedException) IndexShardRelocatedException(org.elasticsearch.index.shard.IndexShardRelocatedException) 
CompletableFuture(java.util.concurrent.CompletableFuture) Deque(java.util.Deque) ParameterizedMessage(org.apache.logging.log4j.message.ParameterizedMessage) ArrayList(java.util.ArrayList) BytesArray(org.elasticsearch.common.bytes.BytesArray) RetentionLease(org.elasticsearch.index.seqno.RetentionLease) RetentionLeases(org.elasticsearch.index.seqno.RetentionLeases) Store(org.elasticsearch.index.store.Store) StreamSupport(java.util.stream.StreamSupport) IntSupplier(java.util.function.IntSupplier) Loggers(org.elasticsearch.common.logging.Loggers) ArrayUtil(org.apache.lucene.util.ArrayUtil) FutureUtils(org.elasticsearch.common.util.concurrent.FutureUtils) SequenceNumbers(org.elasticsearch.index.seqno.SequenceNumbers) IndexShardState(org.elasticsearch.index.shard.IndexShardState) IndexInput(org.apache.lucene.store.IndexInput) SetOnce(org.apache.lucene.util.SetOnce) IOUtils(io.crate.common.io.IOUtils) IndexShard(org.elasticsearch.index.shard.IndexShard) IOException(java.io.IOException) StopWatch(org.elasticsearch.common.StopWatch) IndexFormatTooOldException(org.apache.lucene.index.IndexFormatTooOldException) ConcurrentLinkedDeque(java.util.concurrent.ConcurrentLinkedDeque) Consumer(java.util.function.Consumer) ExceptionsHelper(org.elasticsearch.ExceptionsHelper) AtomicLong(java.util.concurrent.atomic.AtomicLong) ReplicationResponse(org.elasticsearch.action.support.replication.ReplicationResponse) Closeable(java.io.Closeable) Translog(org.elasticsearch.index.translog.Translog) ThreadedActionListener(org.elasticsearch.action.support.ThreadedActionListener) Comparator(java.util.Comparator) Collections(java.util.Collections) RateLimiter(org.apache.lucene.store.RateLimiter) ActionListener(org.elasticsearch.action.ActionListener)
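
The branching in recoverToTarget is easier to follow once the shard calls are replaced by plain values. The sketch below distills three decisions from the method: which history source is used, whether the recovery is sequence-number based, and where a file-based recovery starts replaying. The class and method names are hypothetical, the boolean parameters stand in for the corresponding shard and request queries, and UNASSIGNED_SEQ_NO is set to -2 to match SequenceNumbers.UNASSIGNED_SEQ_NO.

```java
// Hypothetical distillation of the branch conditions in recoverToTarget; all inputs are plain values.
final class RecoveryPlanSketch {

    enum HistorySource { INDEX, TRANSLOG }

    // Mirrors SequenceNumbers.UNASSIGNED_SEQ_NO.
    static final long UNASSIGNED_SEQ_NO = -2L;

    // History comes from the index only with soft deletes and either lease-based recovery or an existing lease.
    static HistorySource historySource(boolean softDeletesEnabled,
                                       boolean useRetentionLeasesInPeerRecovery,
                                       boolean targetHasLease) {
        return softDeletesEnabled && (useRetentionLeasesInPeerRecovery || targetHasLease)
            ? HistorySource.INDEX
            : HistorySource.TRANSLOG;
    }

    // leaseRetainingSeqNo is null when the target has no peer-recovery retention lease.
    static boolean isSequenceNumberBased(long requestedStartingSeqNo,
                                         boolean targetSameHistory,
                                         boolean hasCompleteHistory,
                                         HistorySource source,
                                         Long leaseRetainingSeqNo) {
        return requestedStartingSeqNo != UNASSIGNED_SEQ_NO
            && targetSameHistory
            && hasCompleteHistory
            && (source == HistorySource.TRANSLOG
                || (leaseRetainingSeqNo != null && leaseRetainingSeqNo <= requestedStartingSeqNo));
    }

    // File-based recovery replays from just after the safe commit's local checkpoint, or from 0 without soft deletes.
    static long fileBasedStartingSeqNo(boolean softDeletesEnabled, long safeCommitLocalCheckpoint) {
        return softDeletesEnabled ? safeCommitLocalCheckpoint + 1L : 0L;
    }

    public static void main(String[] args) {
        HistorySource source = historySource(true, true, true);
        System.out.println(source);
        // The lease retains from 40 and the target asks to start at 50, so operations-based recovery is possible.
        System.out.println(isSequenceNumberBased(50L, true, true, source, 40L));
        // The lease retains only from 60, so the requested history is not guaranteed.
        System.out.println(isSequenceNumberBased(50L, true, true, source, 60L));
        System.out.println(fileBasedStartingSeqNo(true, 41L));
    }
}
```

When the sequence-number check fails, the method falls back to the file-based path (phase1) and then replays history from the computed starting point.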

Example 9 with RetentionLease

use of org.elasticsearch.index.seqno.RetentionLease in project crate by crate.

the class RecoverySourceHandler method createRetentionLease.

void createRetentionLease(final long startingSeqNo, ActionListener<RetentionLease> listener) {
    runUnderPrimaryPermit(() -> {
        // Clone the peer recovery retention lease belonging to the source shard. We are retaining history between the local
        // checkpoint of the safe commit we're creating and this lease's retained seqno with the retention lock, and by cloning an
        // existing lease we (approximately) know that all our peers are also retaining history as requested by the cloned lease. If
        // the recovery now fails before copying enough history over then a subsequent attempt will find this lease, determine it is
        // not enough, and fall back to a file-based recovery.
        // 
        // (approximately) because we do not guarantee to be able to satisfy every lease on every peer.
        logger.trace("cloning primary's retention lease");
        try {
            final StepListener<ReplicationResponse> cloneRetentionLeaseStep = new StepListener<>();
            final RetentionLease clonedLease = shard.cloneLocalPeerRecoveryRetentionLease(request.targetNode().getId(), new ThreadedActionListener<>(logger, shard.getThreadPool(), ThreadPool.Names.GENERIC, cloneRetentionLeaseStep, false));
            logger.trace("cloned primary's retention lease as [{}]", clonedLease);
            cloneRetentionLeaseStep.whenComplete(rr -> listener.onResponse(clonedLease), listener::onFailure);
        } catch (RetentionLeaseNotFoundException e) {
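            // No existing lease to clone, so create a new one, using the sequence number just before the start of this
            // recovery as a conservative estimate for the global checkpoint.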
            assert shard.indexSettings().getIndexVersionCreated().before(Version.V_4_3_0) || shard.indexSettings().isSoftDeleteEnabled() == false;
            final StepListener<ReplicationResponse> addRetentionLeaseStep = new StepListener<>();
            final long estimatedGlobalCheckpoint = startingSeqNo - 1;
            final RetentionLease newLease = shard.addPeerRecoveryRetentionLease(request.targetNode().getId(), estimatedGlobalCheckpoint, new ThreadedActionListener<>(logger, shard.getThreadPool(), ThreadPool.Names.GENERIC, addRetentionLeaseStep, false));
            addRetentionLeaseStep.whenComplete(rr -> listener.onResponse(newLease), listener::onFailure);
            logger.trace("created retention lease with estimated checkpoint of [{}]", estimatedGlobalCheckpoint);
        }
    }, shardId + " establishing retention lease for [" + request.targetAllocationId() + "]", shard, cancellableThreads, logger);
}
Also used : CancellableThreads(org.elasticsearch.common.util.CancellableThreads) Arrays(java.util.Arrays) IndexFormatTooNewException(org.apache.lucene.index.IndexFormatTooNewException) Releasables(org.elasticsearch.common.lease.Releasables) RecoveryEngineException(org.elasticsearch.index.engine.RecoveryEngineException) RetentionLeaseNotFoundException(org.elasticsearch.index.seqno.RetentionLeaseNotFoundException) CorruptIndexException(org.apache.lucene.index.CorruptIndexException) StoreFileMetadata(org.elasticsearch.index.store.StoreFileMetadata) Transports(org.elasticsearch.transport.Transports) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) Locale(java.util.Locale) ThreadPool(org.elasticsearch.threadpool.ThreadPool) ActionRunnable(org.elasticsearch.action.ActionRunnable) IOContext(org.apache.lucene.store.IOContext) StepListener(org.elasticsearch.action.StepListener) Releasable(org.elasticsearch.common.lease.Releasable) ByteSizeValue(org.elasticsearch.common.unit.ByteSizeValue) PlainActionFuture(org.elasticsearch.action.support.PlainActionFuture) IndexShardRoutingTable(org.elasticsearch.cluster.routing.IndexShardRoutingTable) BytesReference(org.elasticsearch.common.bytes.BytesReference) Engine(org.elasticsearch.index.engine.Engine) RemoteTransportException(org.elasticsearch.transport.RemoteTransportException) List(java.util.List) Logger(org.apache.logging.log4j.Logger) Version(org.elasticsearch.Version) InputStreamIndexInput(org.elasticsearch.common.lucene.store.InputStreamIndexInput) TimeValue(io.crate.common.unit.TimeValue) ReplicationTracker(org.elasticsearch.index.seqno.ReplicationTracker) CopyOnWriteArrayList(java.util.concurrent.CopyOnWriteArrayList) IndexCommit(org.apache.lucene.index.IndexCommit) ShardRouting(org.elasticsearch.cluster.routing.ShardRouting) IndexShardClosedException(org.elasticsearch.index.shard.IndexShardClosedException) IndexShardRelocatedException(org.elasticsearch.index.shard.IndexShardRelocatedException) 
CompletableFuture(java.util.concurrent.CompletableFuture) Deque(java.util.Deque) ParameterizedMessage(org.apache.logging.log4j.message.ParameterizedMessage) ArrayList(java.util.ArrayList) BytesArray(org.elasticsearch.common.bytes.BytesArray) RetentionLease(org.elasticsearch.index.seqno.RetentionLease) RetentionLeases(org.elasticsearch.index.seqno.RetentionLeases) Store(org.elasticsearch.index.store.Store) StreamSupport(java.util.stream.StreamSupport) IntSupplier(java.util.function.IntSupplier) Loggers(org.elasticsearch.common.logging.Loggers) ArrayUtil(org.apache.lucene.util.ArrayUtil) FutureUtils(org.elasticsearch.common.util.concurrent.FutureUtils) SequenceNumbers(org.elasticsearch.index.seqno.SequenceNumbers) IndexShardState(org.elasticsearch.index.shard.IndexShardState) IndexInput(org.apache.lucene.store.IndexInput) SetOnce(org.apache.lucene.util.SetOnce) IOUtils(io.crate.common.io.IOUtils) IndexShard(org.elasticsearch.index.shard.IndexShard) IOException(java.io.IOException) StopWatch(org.elasticsearch.common.StopWatch) IndexFormatTooOldException(org.apache.lucene.index.IndexFormatTooOldException) ConcurrentLinkedDeque(java.util.concurrent.ConcurrentLinkedDeque) Consumer(java.util.function.Consumer) ExceptionsHelper(org.elasticsearch.ExceptionsHelper) AtomicLong(java.util.concurrent.atomic.AtomicLong) ReplicationResponse(org.elasticsearch.action.support.replication.ReplicationResponse) Closeable(java.io.Closeable) Translog(org.elasticsearch.index.translog.Translog) ThreadedActionListener(org.elasticsearch.action.support.ThreadedActionListener) Comparator(java.util.Comparator) Collections(java.util.Collections) RateLimiter(org.apache.lucene.store.RateLimiter) ActionListener(org.elasticsearch.action.ActionListener)
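
createRetentionLease follows a clone-or-create pattern: reuse what the primary already retains, otherwise fall back to an estimate. The sketch below models that pattern with an in-memory map instead of a shard. Everything in it is hypothetical, including the class name and the map-based lease table, and it assumes that a freshly created lease retains from one past the estimated global checkpoint.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a shard's lease table: lease id -> retaining sequence number.
final class CloneOrCreateLeaseSketch {

    private final Map<String, Long> leases = new HashMap<>();

    CloneOrCreateLeaseSketch withLease(String id, long retainingSeqNo) {
        leases.put(id, retainingSeqNo);
        return this;
    }

    // Establishes a lease for targetId and returns the sequence number it retains from.
    long establishLease(String primaryId, String targetId, long startingSeqNo) {
        Long primaryLease = leases.get(primaryId);
        final long retainingSeqNo;
        if (primaryLease != null) {
            // Clone: the target retains exactly what the primary's own lease retains.
            retainingSeqNo = primaryLease;
        } else {
            // Fallback: estimate the global checkpoint as the operation just before the recovery start.
            long estimatedGlobalCheckpoint = startingSeqNo - 1;
            // Assumption: a new lease retains from one past the estimated global checkpoint.
            retainingSeqNo = estimatedGlobalCheckpoint + 1;
        }
        leases.put(targetId, retainingSeqNo);
        return retainingSeqNo;
    }

    public static void main(String[] args) {
        // The primary has a lease, so it is cloned.
        System.out.println(new CloneOrCreateLeaseSketch().withLease("primary", 40L)
            .establishLease("primary", "target", 100L));
        // No primary lease, so one is created from the estimate.
        System.out.println(new CloneOrCreateLeaseSketch()
            .establishLease("primary", "target", 100L));
    }
}
```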

Example 10 with RetentionLease

use of org.elasticsearch.index.seqno.RetentionLease in project crate by crate.

the class RecoverySourceHandler method phase1.

/**
 * Perform phase1 of the recovery operations. Once this {@link IndexCommit}
 * snapshot has been performed no commit operations (files being fsync'd)
 * are effectively allowed on this index until all recovery phases are done
 * <p>
 * Phase1 examines the segment files on the target node and copies over the
 * segments that are missing. Only segments that have the same size and
 * checksum can be reused
 */
void phase1(IndexCommit snapshot, long startingSeqNo, IntSupplier translogOps, ActionListener<SendFileResult> listener) {
    cancellableThreads.checkForCancel();
    final Store store = shard.store();
    try {
        final StopWatch stopWatch = new StopWatch().start();
        final Store.MetadataSnapshot recoverySourceMetadata;
        try {
            recoverySourceMetadata = store.getMetadata(snapshot);
        } catch (CorruptIndexException | IndexFormatTooOldException | IndexFormatTooNewException ex) {
            shard.failShard("recovery", ex);
            throw ex;
        }
        for (String name : snapshot.getFileNames()) {
            final StoreFileMetadata md = recoverySourceMetadata.get(name);
            if (md == null) {
                logger.info("Snapshot differs from actual index for file: {} meta: {}", name, recoverySourceMetadata.asMap());
                throw new CorruptIndexException("Snapshot differs from actual index - maybe index was removed metadata has " + recoverySourceMetadata.asMap().size() + " files", name);
            }
        }
        if (canSkipPhase1(recoverySourceMetadata, request.metadataSnapshot()) == false) {
            final List<String> phase1FileNames = new ArrayList<>();
            final List<Long> phase1FileSizes = new ArrayList<>();
            final List<String> phase1ExistingFileNames = new ArrayList<>();
            final List<Long> phase1ExistingFileSizes = new ArrayList<>();
            // Total size of segment files that are recovered
            long totalSizeInBytes = 0;
            // Total size of segment files that were able to be re-used
            long existingTotalSizeInBytes = 0;
            // Generate a "diff" of all the identical, different, and missing
            // segment files on the target node, using the existing files on
            // the source node
            final Store.RecoveryDiff diff = recoverySourceMetadata.recoveryDiff(request.metadataSnapshot());
            for (StoreFileMetadata md : diff.identical) {
                phase1ExistingFileNames.add(md.name());
                phase1ExistingFileSizes.add(md.length());
                existingTotalSizeInBytes += md.length();
                if (logger.isTraceEnabled()) {
                    logger.trace("recovery [phase1]: not recovering [{}], exist in local store and has checksum [{}]," + " size [{}]", md.name(), md.checksum(), md.length());
                }
                totalSizeInBytes += md.length();
            }
            List<StoreFileMetadata> phase1Files = new ArrayList<>(diff.different.size() + diff.missing.size());
            phase1Files.addAll(diff.different);
            phase1Files.addAll(diff.missing);
            for (StoreFileMetadata md : phase1Files) {
                if (request.metadataSnapshot().asMap().containsKey(md.name())) {
                    logger.trace("recovery [phase1]: recovering [{}], exists in local store, but is different: remote [{}], local [{}]", md.name(), request.metadataSnapshot().asMap().get(md.name()), md);
                } else {
                    logger.trace("recovery [phase1]: recovering [{}], does not exist in remote", md.name());
                }
                phase1FileNames.add(md.name());
                phase1FileSizes.add(md.length());
                totalSizeInBytes += md.length();
            }
            logger.trace("recovery [phase1]: recovering_files [{}] with total_size [{}], reusing_files [{}] with total_size [{}]", phase1FileNames.size(), new ByteSizeValue(totalSizeInBytes), phase1ExistingFileNames.size(), new ByteSizeValue(existingTotalSizeInBytes));
            final StepListener<Void> sendFileInfoStep = new StepListener<>();
            final StepListener<Void> sendFilesStep = new StepListener<>();
            final StepListener<RetentionLease> createRetentionLeaseStep = new StepListener<>();
            final StepListener<Void> cleanFilesStep = new StepListener<>();
            cancellableThreads.checkForCancel();
            recoveryTarget.receiveFileInfo(phase1FileNames, phase1FileSizes, phase1ExistingFileNames, phase1ExistingFileSizes, translogOps.getAsInt(), sendFileInfoStep);
            sendFileInfoStep.whenComplete(r -> sendFiles(store, phase1Files.toArray(new StoreFileMetadata[0]), translogOps, sendFilesStep), listener::onFailure);
            sendFilesStep.whenComplete(r -> createRetentionLease(startingSeqNo, createRetentionLeaseStep), listener::onFailure);
            createRetentionLeaseStep.whenComplete(retentionLease -> {
                final long lastKnownGlobalCheckpoint = shard.getLastKnownGlobalCheckpoint();
                assert retentionLease == null || retentionLease.retainingSequenceNumber() - 1 <= lastKnownGlobalCheckpoint : retentionLease + " vs " + lastKnownGlobalCheckpoint;
                // Establishes new empty translog on the replica with global checkpoint set to lastKnownGlobalCheckpoint. We want
                // the commit we just copied to be a safe commit on the replica, so why not set the global checkpoint on the replica
                // to the max seqno of this commit? Because (in rare corner cases) this commit might not be a safe commit here on
                // the primary, and in these cases the max seqno would be too high to be valid as a global checkpoint.
                cleanFiles(store, recoverySourceMetadata, translogOps, lastKnownGlobalCheckpoint, cleanFilesStep);
            }, listener::onFailure);
            final long totalSize = totalSizeInBytes;
            final long existingTotalSize = existingTotalSizeInBytes;
            cleanFilesStep.whenComplete(r -> {
                final TimeValue took = stopWatch.totalTime();
                logger.trace("recovery [phase1]: took [{}]", took);
                listener.onResponse(new SendFileResult(phase1FileNames, phase1FileSizes, totalSize, phase1ExistingFileNames, phase1ExistingFileSizes, existingTotalSize, took));
            }, listener::onFailure);
        } else {
            logger.trace("skipping [phase1] since source and target have identical sync id [{}]", recoverySourceMetadata.getSyncId());
            // but we must still create a retention lease
            final StepListener<RetentionLease> createRetentionLeaseStep = new StepListener<>();
            createRetentionLease(startingSeqNo, createRetentionLeaseStep);
            createRetentionLeaseStep.whenComplete(retentionLease -> {
                final TimeValue took = stopWatch.totalTime();
                logger.trace("recovery [phase1]: took [{}]", took);
                listener.onResponse(new SendFileResult(Collections.emptyList(), Collections.emptyList(), 0L, Collections.emptyList(), Collections.emptyList(), 0L, took));
            }, listener::onFailure);
        }
    } catch (Exception e) {
        throw new RecoverFilesRecoveryException(request.shardId(), 0, new ByteSizeValue(0L), e);
    }
}
Also used : CopyOnWriteArrayList(java.util.concurrent.CopyOnWriteArrayList) ArrayList(java.util.ArrayList) ByteSizeValue(org.elasticsearch.common.unit.ByteSizeValue) Store(org.elasticsearch.index.store.Store) StoreFileMetadata(org.elasticsearch.index.store.StoreFileMetadata) IndexFormatTooOldException(org.apache.lucene.index.IndexFormatTooOldException) TimeValue(io.crate.common.unit.TimeValue) CorruptIndexException(org.apache.lucene.index.CorruptIndexException) IndexFormatTooNewException(org.apache.lucene.index.IndexFormatTooNewException) RecoveryEngineException(org.elasticsearch.index.engine.RecoveryEngineException) RetentionLeaseNotFoundException(org.elasticsearch.index.seqno.RetentionLeaseNotFoundException) RemoteTransportException(org.elasticsearch.transport.RemoteTransportException) IndexShardClosedException(org.elasticsearch.index.shard.IndexShardClosedException) IndexShardRelocatedException(org.elasticsearch.index.shard.IndexShardRelocatedException) IOException(java.io.IOException) StopWatch(org.elasticsearch.common.StopWatch) RetentionLease(org.elasticsearch.index.seqno.RetentionLease) AtomicLong(java.util.concurrent.atomic.AtomicLong) StepListener(org.elasticsearch.action.StepListener)
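
The file classification that phase1 relies on (identical files are reused, different and missing files are copied) can be illustrated with a minimal, self-contained sketch. It is not the Store.RecoveryDiff implementation: FileMeta, Diff, snapshot and RecoveryDiffSketch are hypothetical stand-ins introduced only for this illustration, and the comparison is simplified to name, length and checksum per file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecoveryDiffSketch {

    // Hypothetical stand-in for StoreFileMetadata: only name, length and checksum.
    static final class FileMeta {
        final String name;
        final long length;
        final String checksum;

        FileMeta(String name, long length, String checksum) {
            this.name = name;
            this.length = length;
            this.checksum = checksum;
        }
    }

    // Hypothetical stand-in for Store.RecoveryDiff.
    static final class Diff {
        final List<FileMeta> identical = new ArrayList<>();
        final List<FileMeta> different = new ArrayList<>();
        final List<FileMeta> missing = new ArrayList<>();

        // Bytes already present on the target that do not need to be copied.
        long reusableBytes() {
            return identical.stream().mapToLong(f -> f.length).sum();
        }

        // Bytes that still have to be sent: changed files plus absent files.
        long bytesToSend() {
            return different.stream().mapToLong(f -> f.length).sum()
                + missing.stream().mapToLong(f -> f.length).sum();
        }
    }

    // Classifies every source file by comparing it with the target's copy, if any.
    static Diff diff(Map<String, FileMeta> source, Map<String, FileMeta> target) {
        Diff result = new Diff();
        for (FileMeta md : source.values()) {
            FileMeta onTarget = target.get(md.name);
            if (onTarget == null) {
                result.missing.add(md);
            } else if (onTarget.length == md.length && onTarget.checksum.equals(md.checksum)) {
                result.identical.add(md);
            } else {
                result.different.add(md);
            }
        }
        return result;
    }

    // Builds a name -> metadata map, loosely mirroring a metadata snapshot.
    static Map<String, FileMeta> snapshot(FileMeta... files) {
        Map<String, FileMeta> map = new LinkedHashMap<>();
        for (FileMeta f : files) {
            map.put(f.name, f);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, FileMeta> source = snapshot(
            new FileMeta("_0.cfs", 100, "aaa"),
            new FileMeta("_1.cfs", 200, "bbb"),
            new FileMeta("segments_2", 50, "ccc"));
        Map<String, FileMeta> target = snapshot(
            new FileMeta("_0.cfs", 100, "aaa"),
            new FileMeta("_1.cfs", 200, "zzz"));

        Diff d = diff(source, target);
        System.out.println("identical=" + d.identical.size()
            + " different=" + d.different.size()
            + " missing=" + d.missing.size());
        System.out.println("bytesToSend=" + d.bytesToSend()
            + " reusableBytes=" + d.reusableBytes());
    }
}
```

In this sketch only the different and missing files count toward the bytes to send. The phase1 method above additionally adds the identical files' lengths into totalSizeInBytes, so its reported total covers reused and copied files together.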

Aggregations

RetentionLease (org.elasticsearch.index.seqno.RetentionLease) 17
ArrayList (java.util.ArrayList) 12
StoreFileMetadata (org.elasticsearch.index.store.StoreFileMetadata) 9
RetentionLeases (org.elasticsearch.index.seqno.RetentionLeases) 8
AtomicLong (java.util.concurrent.atomic.AtomicLong) 7
IOException (java.io.IOException) 5
RoutingAllocation (org.elasticsearch.cluster.routing.allocation.RoutingAllocation) 5
Store (org.elasticsearch.index.store.Store) 5
TimeValue (io.crate.common.unit.TimeValue) 4
List (java.util.List) 4
CopyOnWriteArrayList (java.util.concurrent.CopyOnWriteArrayList) 4
CorruptIndexException (org.apache.lucene.index.CorruptIndexException) 4
ActionListener (org.elasticsearch.action.ActionListener) 4
StepListener (org.elasticsearch.action.StepListener) 4
BytesReference (org.elasticsearch.common.bytes.BytesReference) 4
RecoveryEngineException (org.elasticsearch.index.engine.RecoveryEngineException) 4
IndexShardRelocatedException (org.elasticsearch.index.shard.IndexShardRelocatedException) 4
IOUtils (io.crate.common.io.IOUtils) 3
Closeable (java.io.Closeable) 3
Arrays (java.util.Arrays) 3