Example 46 with AtomicReference

Use of java.util.concurrent.atomic.AtomicReference in project crate by crate.

The class BulkShardProcessorTest, method testNonEsRejectedExceptionDoesNotResultInRetryButAborts.

@Test
public void testNonEsRejectedExceptionDoesNotResultInRetryButAborts() throws Throwable {
    expectedException.expect(RuntimeException.class);
    expectedException.expectMessage("a random exception");
    final AtomicReference<ActionListener<ShardResponse>> ref = new AtomicReference<>();
    BulkRequestExecutor<ShardUpsertRequest> transportShardBulkAction = (request, listener) -> ref.set(listener);
    BulkRetryCoordinator bulkRetryCoordinator = new BulkRetryCoordinator(threadPool);
    BulkRetryCoordinatorPool coordinatorPool = mock(BulkRetryCoordinatorPool.class);
    when(coordinatorPool.coordinator(any(ShardId.class))).thenReturn(bulkRetryCoordinator);
    ShardUpsertRequest.Builder builder = new ShardUpsertRequest.Builder(TimeValue.timeValueMillis(10), false, false, null, new Reference[] { fooRef }, UUID.randomUUID());
    final BulkShardProcessor<ShardUpsertRequest> bulkShardProcessor = new BulkShardProcessor<>(clusterService, mock(TransportBulkCreateIndicesAction.class), new IndexNameExpressionResolver(Settings.EMPTY), Settings.EMPTY, coordinatorPool, false, 1, builder, transportShardBulkAction, UUID.randomUUID());
    bulkShardProcessor.add("foo", new ShardUpsertRequest.Item("1", null, new Object[] { "bar1" }, null), null);
    ActionListener<ShardResponse> listener = ref.get();
    listener.onFailure(new RuntimeException("a random exception"));
    assertFalse(bulkShardProcessor.add("foo", new ShardUpsertRequest.Item("2", null, new Object[] { "bar2" }, null), null));
    try {
        bulkShardProcessor.result().get();
    } catch (ExecutionException e) {
        throw e.getCause();
    } finally {
        bulkShardProcessor.close();
    }
}
Also used : TransportBulkCreateIndicesAction(org.elasticsearch.action.admin.indices.create.TransportBulkCreateIndicesAction) ShardId(org.elasticsearch.index.shard.ShardId) CoreMatchers.is(org.hamcrest.CoreMatchers.is) Matchers.isA(org.hamcrest.Matchers.isA) ShardIterator(org.elasticsearch.cluster.routing.ShardIterator) Matchers(org.mockito.Matchers) Mock(org.mockito.Mock) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) AtomicReference(java.util.concurrent.atomic.AtomicReference) Matchers.anyString(org.mockito.Matchers.anyString) ClusterState(org.elasticsearch.cluster.ClusterState) Settings(org.elasticsearch.common.settings.Settings) ClusterService(org.elasticsearch.cluster.ClusterService) TimeValue(org.elasticsearch.common.unit.TimeValue) ShardResponse(io.crate.executor.transport.ShardResponse) ThreadPool(org.elasticsearch.threadpool.ThreadPool) OperationRouting(org.elasticsearch.cluster.routing.OperationRouting) Answers(org.mockito.Answers) java.util.concurrent(java.util.concurrent) Reference(io.crate.metadata.Reference) Test(org.junit.Test) TableIdent(io.crate.metadata.TableIdent) UUID(java.util.UUID) Mockito.when(org.mockito.Mockito.when) CrateUnitTest(io.crate.test.integration.CrateUnitTest) Matchers.any(org.mockito.Matchers.any) ShardUpsertRequest(io.crate.executor.transport.ShardUpsertRequest) RowGranularity(io.crate.metadata.RowGranularity) EsRejectedExecutionException(org.elasticsearch.common.util.concurrent.EsRejectedExecutionException) DataTypes(io.crate.types.DataTypes) ReferenceIdent(io.crate.metadata.ReferenceIdent) IndexNameExpressionResolver(org.elasticsearch.cluster.metadata.IndexNameExpressionResolver) ActionListener(org.elasticsearch.action.ActionListener) Mockito.mock(org.mockito.Mockito.mock)
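
The AtomicReference here does the heavy lifting for the test: the fake BulkRequestExecutor never completes the request, it only publishes the listener so the test can fail it deliberately. A minimal, self-contained sketch of that capture pattern, with a hypothetical Listener interface standing in for Crate's ActionListener<ShardResponse>:

import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BiConsumer;

public class CapturedCallbackSketch {

    // hypothetical stand-in for the ActionListener used in the test
    interface Listener {
        void onFailure(Throwable t);
    }

    public static void main(String[] args) {
        AtomicReference<Listener> ref = new AtomicReference<>();
        // the fake executor never completes the request; it only publishes the listener
        BiConsumer<String, Listener> executor = (request, listener) -> ref.set(listener);
        executor.accept("request-1", t -> System.out.println("failed: " + t.getMessage()));
        // the test now drives the outcome itself, at a moment of its choosing
        ref.get().onFailure(new RuntimeException("a random exception"));
    }
}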

Example 47 with AtomicReference

Use of java.util.concurrent.atomic.AtomicReference in project crate by crate.

The class BulkShardProcessorTest, method testThatAddAfterFailureBlocksDueToRetry.

@Test
public void testThatAddAfterFailureBlocksDueToRetry() throws Exception {
    ClusterService clusterService = mock(ClusterService.class);
    OperationRouting operationRouting = mock(OperationRouting.class);
    mockShard(operationRouting, 1);
    mockShard(operationRouting, 2);
    mockShard(operationRouting, 3);
    when(clusterService.operationRouting()).thenReturn(operationRouting);
    // the listener will be executed twice: once for the successfully added row and once for the failure
    final CountDownLatch listenerLatch = new CountDownLatch(2);
    final AtomicReference<ActionListener<ShardResponse>> ref = new AtomicReference<>();
    BulkRequestExecutor<ShardUpsertRequest> transportShardBulkAction = (request, listener) -> {
        ref.set(listener);
        listenerLatch.countDown();
    };
    BulkRetryCoordinator bulkRetryCoordinator = new BulkRetryCoordinator(threadPool);
    BulkRetryCoordinatorPool coordinatorPool = mock(BulkRetryCoordinatorPool.class);
    when(coordinatorPool.coordinator(any(ShardId.class))).thenReturn(bulkRetryCoordinator);
    ShardUpsertRequest.Builder builder = new ShardUpsertRequest.Builder(TimeValue.timeValueMillis(10), false, false, null, new Reference[] { fooRef }, UUID.randomUUID());
    final BulkShardProcessor<ShardUpsertRequest> bulkShardProcessor = new BulkShardProcessor<>(clusterService, mock(TransportBulkCreateIndicesAction.class), new IndexNameExpressionResolver(Settings.EMPTY), Settings.EMPTY, coordinatorPool, false, 1, builder, transportShardBulkAction, UUID.randomUUID());
    bulkShardProcessor.add("foo", new ShardUpsertRequest.Item("1", null, new Object[] { "bar1" }, null), null);
    final ActionListener<ShardResponse> listener = ref.get();
    listener.onFailure(new EsRejectedExecutionException());
    // wait: the failure-retry locking is done in a decoupled thread
    listenerLatch.await(10, TimeUnit.SECONDS);
    final ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(2);
    try {
        final AtomicBoolean hadBlocked = new AtomicBoolean(false); // snapshot taken by the probe
        final AtomicBoolean hasBlocked = new AtomicBoolean(true); // cleared once add() returns
        final CountDownLatch latch = new CountDownLatch(1);
        scheduledExecutorService.execute(() -> {
            // probe: after 10ms, record whether the add() below is still blocked
            scheduledExecutorService.schedule(() -> {
                hadBlocked.set(hasBlocked.get());
                latch.countDown();
            }, 10, TimeUnit.MILLISECONDS);
            bulkShardProcessor.add("foo", new ShardUpsertRequest.Item("2", null, new Object[] { "bar2" }, null), null);
            // only reached once add() has returned
            hasBlocked.set(false);
        });
        latch.await();
        assertTrue(hadBlocked.get());
    } finally {
        scheduledExecutorService.shutdownNow();
    }
}
Also used : TransportBulkCreateIndicesAction(org.elasticsearch.action.admin.indices.create.TransportBulkCreateIndicesAction) ShardId(org.elasticsearch.index.shard.ShardId) CoreMatchers.is(org.hamcrest.CoreMatchers.is) Matchers.isA(org.hamcrest.Matchers.isA) ShardIterator(org.elasticsearch.cluster.routing.ShardIterator) Matchers(org.mockito.Matchers) Mock(org.mockito.Mock) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) AtomicReference(java.util.concurrent.atomic.AtomicReference) Matchers.anyString(org.mockito.Matchers.anyString) ClusterState(org.elasticsearch.cluster.ClusterState) Settings(org.elasticsearch.common.settings.Settings) ClusterService(org.elasticsearch.cluster.ClusterService) TimeValue(org.elasticsearch.common.unit.TimeValue) ShardResponse(io.crate.executor.transport.ShardResponse) ThreadPool(org.elasticsearch.threadpool.ThreadPool) OperationRouting(org.elasticsearch.cluster.routing.OperationRouting) Answers(org.mockito.Answers) java.util.concurrent(java.util.concurrent) Reference(io.crate.metadata.Reference) Test(org.junit.Test) TableIdent(io.crate.metadata.TableIdent) UUID(java.util.UUID) Mockito.when(org.mockito.Mockito.when) CrateUnitTest(io.crate.test.integration.CrateUnitTest) Matchers.any(org.mockito.Matchers.any) ShardUpsertRequest(io.crate.executor.transport.ShardUpsertRequest) RowGranularity(io.crate.metadata.RowGranularity) EsRejectedExecutionException(org.elasticsearch.common.util.concurrent.EsRejectedExecutionException) DataTypes(io.crate.types.DataTypes) ReferenceIdent(io.crate.metadata.ReferenceIdent) IndexNameExpressionResolver(org.elasticsearch.cluster.metadata.IndexNameExpressionResolver) ActionListener(org.elasticsearch.action.ActionListener) Mockito.mock(org.mockito.Mockito.mock)
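
The timing trick in this test, verifying that a call really blocks, is reusable on its own: run the suspect call on a worker, schedule a probe that samples a completion flag while the call should still be in flight, and assert on the probe's snapshot. A minimal sketch under that assumption, with the blocking bulkShardProcessor.add(...) replaced by a hypothetical sleeping call:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class BlockDetectionSketch {

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        final AtomicBoolean hadBlocked = new AtomicBoolean(false);
        final AtomicBoolean hasBlocked = new AtomicBoolean(true);
        final CountDownLatch latch = new CountDownLatch(1);
        try {
            pool.execute(() -> {
                // probe fires while the call below should still be blocked
                pool.schedule(() -> {
                    hadBlocked.set(hasBlocked.get());
                    latch.countDown();
                }, 10, TimeUnit.MILLISECONDS);
                blockingCall(); // stand-in for bulkShardProcessor.add(...)
                hasBlocked.set(false); // only reached once the call returns
            });
            latch.await();
            System.out.println("call blocked past the probe: " + hadBlocked.get());
        } finally {
            pool.shutdownNow();
        }
    }

    // hypothetical blocking call: sleeps well past the probe delay
    static void blockingCall() {
        try {
            Thread.sleep(100);
        } catch (InterruptedException ignored) {
        }
    }
}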

Example 48 with AtomicReference

Use of java.util.concurrent.atomic.AtomicReference in project crate by crate.

The class BulkShardProcessorTest, method testKill.

@Test
public void testKill() throws Exception {
    ClusterService clusterService = mock(ClusterService.class);
    OperationRouting operationRouting = mock(OperationRouting.class);
    mockShard(operationRouting, 1);
    mockShard(operationRouting, 2);
    mockShard(operationRouting, 3);
    when(clusterService.operationRouting()).thenReturn(operationRouting);
    final AtomicReference<ActionListener<ShardResponse>> ref = new AtomicReference<>();
    BulkRequestExecutor<ShardUpsertRequest> transportShardBulkAction = (request, listener) -> ref.set(listener);
    BulkRetryCoordinator bulkRetryCoordinator = new BulkRetryCoordinator(threadPool);
    BulkRetryCoordinatorPool coordinatorPool = mock(BulkRetryCoordinatorPool.class);
    when(coordinatorPool.coordinator(any(ShardId.class))).thenReturn(bulkRetryCoordinator);
    ShardUpsertRequest.Builder builder = new ShardUpsertRequest.Builder(TimeValue.timeValueMillis(10), false, false, null, new Reference[] { fooRef }, UUID.randomUUID());
    final BulkShardProcessor<ShardUpsertRequest> bulkShardProcessor = new BulkShardProcessor<>(clusterService, mock(TransportBulkCreateIndicesAction.class), new IndexNameExpressionResolver(Settings.EMPTY), Settings.EMPTY, coordinatorPool, false, 1, builder, transportShardBulkAction, UUID.randomUUID());
    assertThat(bulkShardProcessor.add("foo", new ShardUpsertRequest.Item("1", null, new Object[] { "bar1" }, null), null), is(true));
    bulkShardProcessor.kill(new InterruptedException());
    // an InterruptedException is expected as the cause of the failure
    expectedException.expect(ExecutionException.class);
    expectedException.expectCause(isA(InterruptedException.class));
    bulkShardProcessor.result().get();
    // after the kill it is no longer possible to add more items
    assertThat(bulkShardProcessor.add("foo", new ShardUpsertRequest.Item("1", null, new Object[] { "bar1" }, null), null), is(false));
}
Also used : TransportBulkCreateIndicesAction(org.elasticsearch.action.admin.indices.create.TransportBulkCreateIndicesAction) ShardId(org.elasticsearch.index.shard.ShardId) CoreMatchers.is(org.hamcrest.CoreMatchers.is) Matchers.isA(org.hamcrest.Matchers.isA) ShardIterator(org.elasticsearch.cluster.routing.ShardIterator) Matchers(org.mockito.Matchers) Mock(org.mockito.Mock) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) AtomicReference(java.util.concurrent.atomic.AtomicReference) Matchers.anyString(org.mockito.Matchers.anyString) ClusterState(org.elasticsearch.cluster.ClusterState) Settings(org.elasticsearch.common.settings.Settings) ClusterService(org.elasticsearch.cluster.ClusterService) TimeValue(org.elasticsearch.common.unit.TimeValue) ShardResponse(io.crate.executor.transport.ShardResponse) ThreadPool(org.elasticsearch.threadpool.ThreadPool) OperationRouting(org.elasticsearch.cluster.routing.OperationRouting) Answers(org.mockito.Answers) java.util.concurrent(java.util.concurrent) Reference(io.crate.metadata.Reference) Test(org.junit.Test) TableIdent(io.crate.metadata.TableIdent) UUID(java.util.UUID) Mockito.when(org.mockito.Mockito.when) CrateUnitTest(io.crate.test.integration.CrateUnitTest) Matchers.any(org.mockito.Matchers.any) ShardUpsertRequest(io.crate.executor.transport.ShardUpsertRequest) RowGranularity(io.crate.metadata.RowGranularity) EsRejectedExecutionException(org.elasticsearch.common.util.concurrent.EsRejectedExecutionException) DataTypes(io.crate.types.DataTypes) ReferenceIdent(io.crate.metadata.ReferenceIdent) IndexNameExpressionResolver(org.elasticsearch.cluster.metadata.IndexNameExpressionResolver) ActionListener(org.elasticsearch.action.ActionListener) Mockito.mock(org.mockito.Mockito.mock)
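
What testKill asserts, that after kill(cause) the result future fails with that cause and further add() calls return false, can be modeled in a few lines with a CompletableFuture and an AtomicBoolean. This is a simplified sketch of the contract, not Crate's actual BulkShardProcessor implementation:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.atomic.AtomicBoolean;

public class KillableProcessorSketch {

    private final CompletableFuture<Void> result = new CompletableFuture<>();
    private final AtomicBoolean killed = new AtomicBoolean(false);

    boolean add(String item) {
        // reject new work once the processor has been killed
        return !killed.get();
    }

    void kill(Throwable cause) {
        // first kill wins; the result future carries the cause
        if (killed.compareAndSet(false, true)) {
            result.completeExceptionally(cause);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        KillableProcessorSketch p = new KillableProcessorSketch();
        System.out.println(p.add("1")); // true
        p.kill(new InterruptedException());
        System.out.println(p.add("1")); // false
        try {
            p.result.get();
        } catch (ExecutionException e) {
            System.out.println("cause: " + e.getCause()); // the InterruptedException
        }
    }
}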

Example 49 with AtomicReference

Use of java.util.concurrent.atomic.AtomicReference in project crate by crate.

The class BlobRecoverySourceHandler, method phase1.

/**
 * Perform phase1 of the recovery operations. Once this {@link SnapshotIndexCommit}
 * snapshot has been performed, no commit operations (files being fsync'd)
 * are effectively allowed on this index until all recovery phases are done.
 * <p/>
 * Phase1 examines the segment files on the target node and copies over the
 * segments that are missing. Only segments that have the same size and
 * checksum can be reused.
 */
public void phase1(final SnapshotIndexCommit snapshot, final Translog.View translogView) {
    cancellableThreads.checkForCancel();
    // Total size of segment files that are recovered
    long totalSize = 0;
    // Total size of segment files that were able to be re-used
    long existingTotalSize = 0;
    final Store store = shard.store();
    store.incRef();
    try {
        // CRATE CHANGE
        if (blobRecoveryHandler != null) {
            blobRecoveryHandler.phase1();
        }
        StopWatch stopWatch = new StopWatch().start();
        final Store.MetadataSnapshot recoverySourceMetadata;
        try {
            recoverySourceMetadata = store.getMetadata(snapshot);
        } catch (CorruptIndexException | IndexFormatTooOldException | IndexFormatTooNewException ex) {
            shard.engine().failEngine("recovery", ex);
            throw ex;
        }
        for (String name : snapshot.getFiles()) {
            final StoreFileMetaData md = recoverySourceMetadata.get(name);
            if (md == null) {
                logger.info("Snapshot differs from actual index for file: {} meta: {}", name, recoverySourceMetadata.asMap());
                throw new CorruptIndexException("Snapshot differs from actual index - maybe index was removed. Metadata has " + recoverySourceMetadata.asMap().size() + " files", name);
            }
        }
        // Generate a "diff" of all the identical, different, and missing
        // segment files on the target node, using the existing files on
        // the source node
        String recoverySourceSyncId = recoverySourceMetadata.getSyncId();
        String recoveryTargetSyncId = request.metadataSnapshot().getSyncId();
        final boolean recoverWithSyncId = recoverySourceSyncId != null && recoverySourceSyncId.equals(recoveryTargetSyncId);
        if (recoverWithSyncId) {
            final long numDocsTarget = request.metadataSnapshot().getNumDocs();
            final long numDocsSource = recoverySourceMetadata.getNumDocs();
            if (numDocsTarget != numDocsSource) {
                throw new IllegalStateException("try to recover " + request.shardId() + " from primary shard with sync id but number of docs differ: " + numDocsTarget + " (" + request.sourceNode().getName() + ", primary) vs " + numDocsSource + " (" + request.targetNode().getName() + ")");
            }
            // we shortcut recovery here because we have nothing to copy. but we must still start the engine on the target.
            // so we don't return here
            logger.trace("[{}][{}] skipping [phase1] to {} - identical sync id [{}] found on both source and target", indexName, shardId, request.targetNode(), recoverySourceSyncId);
        } else {
            final Store.RecoveryDiff diff = recoverySourceMetadata.recoveryDiff(request.metadataSnapshot());
            for (StoreFileMetaData md : diff.identical) {
                response.phase1ExistingFileNames.add(md.name());
                response.phase1ExistingFileSizes.add(md.length());
                existingTotalSize += md.length();
                if (logger.isTraceEnabled()) {
                    logger.trace("[{}][{}] recovery [phase1] to {}: not recovering [{}], exists in local store and has checksum [{}], size [{}]", indexName, shardId, request.targetNode(), md.name(), md.checksum(), md.length());
                }
                totalSize += md.length();
            }
            for (StoreFileMetaData md : Iterables.concat(diff.different, diff.missing)) {
                if (request.metadataSnapshot().asMap().containsKey(md.name())) {
                    logger.trace("[{}][{}] recovery [phase1] to {}: recovering [{}], exists in local store, but is different: remote [{}], local [{}]", indexName, shardId, request.targetNode(), md.name(), request.metadataSnapshot().asMap().get(md.name()), md);
                } else {
                    logger.trace("[{}][{}] recovery [phase1] to {}: recovering [{}], does not exists in remote", indexName, shardId, request.targetNode(), md.name());
                }
                response.phase1FileNames.add(md.name());
                response.phase1FileSizes.add(md.length());
                totalSize += md.length();
            }
            response.phase1TotalSize = totalSize;
            response.phase1ExistingTotalSize = existingTotalSize;
            logger.trace("[{}][{}] recovery [phase1] to {}: recovering_files [{}] with total_size [{}], reusing_files [{}] with total_size [{}]", indexName, shardId, request.targetNode(), response.phase1FileNames.size(), new ByteSizeValue(totalSize), response.phase1ExistingFileNames.size(), new ByteSizeValue(existingTotalSize));
            cancellableThreads.execute(new Interruptable() {

                @Override
                public void run() throws InterruptedException {
                    RecoveryFilesInfoRequest recoveryInfoFilesRequest = new RecoveryFilesInfoRequest(request.recoveryId(), request.shardId(), response.phase1FileNames, response.phase1FileSizes, response.phase1ExistingFileNames, response.phase1ExistingFileSizes, translogView.totalOperations());
                    transportService.submitRequest(request.targetNode(), RecoveryTarget.Actions.FILES_INFO, recoveryInfoFilesRequest, TransportRequestOptions.builder().withTimeout(recoverySettings.internalActionTimeout()).build(), EmptyTransportResponseHandler.INSTANCE_SAME).txGet();
                }
            });
            // This latch will be used to wait until all files have been transferred to the target node
            final CountDownLatch latch = new CountDownLatch(response.phase1FileNames.size());
            final CopyOnWriteArrayList<Throwable> exceptions = new CopyOnWriteArrayList<>();
            final AtomicReference<Throwable> corruptedEngine = new AtomicReference<>();
            int fileIndex = 0;
            ThreadPoolExecutor pool;
            // How many bytes we've copied since we last called RateLimiter.pause
            final AtomicLong bytesSinceLastPause = new AtomicLong();
            for (final String name : response.phase1FileNames) {
                long fileSize = response.phase1FileSizes.get(fileIndex);
                // Small files are transferred via a dedicated thread pool so that
                // tiny segments are not stuck behind large file transfers, and so
                // the two pools can be sized separately.
                if (fileSize > RecoverySettings.SMALL_FILE_CUTOFF_BYTES) {
                    pool = recoverySettings.concurrentStreamPool();
                } else {
                    pool = recoverySettings.concurrentSmallFileStreamPool();
                }
                pool.execute(new AbstractRunnable() {

                    @Override
                    public void onFailure(Throwable t) {
                        // we either got rejected or the store can't be incremented / we are canceled
                        logger.debug("Failed to transfer file [" + name + "] on recovery", t);
                    }

                    @Override
                    public void onAfter() {
                        // Signify this file has completed by decrementing the latch
                        latch.countDown();
                    }

                    @Override
                    protected void doRun() {
                        cancellableThreads.checkForCancel();
                        store.incRef();
                        final StoreFileMetaData md = recoverySourceMetadata.get(name);
                        try (final IndexInput indexInput = store.directory().openInput(name, IOContext.READONCE)) {
                            // read in chunks of the configured file chunk size, but at least one byte
                            final int BUFFER_SIZE = (int) Math.max(1, recoverySettings.fileChunkSize().getBytes());
                            final byte[] buf = new byte[BUFFER_SIZE];
                            boolean shouldCompressRequest = recoverySettings.compress();
                            if (CompressorFactory.isCompressed(indexInput)) {
                                shouldCompressRequest = false;
                            }
                            final long len = indexInput.length();
                            long readCount = 0;
                            final TransportRequestOptions requestOptions = TransportRequestOptions.builder().withCompress(shouldCompressRequest).withType(TransportRequestOptions.Type.RECOVERY).withTimeout(recoverySettings.internalActionTimeout()).build();
                            while (readCount < len) {
                                if (shard.state() == IndexShardState.CLOSED) {
                                    // check if the shard got closed on us
                                    throw new IndexShardClosedException(shard.shardId());
                                }
                                int toRead = readCount + BUFFER_SIZE > len ? (int) (len - readCount) : BUFFER_SIZE;
                                final long position = indexInput.getFilePointer();
                                // Pause using the rate limiter, if desired, to throttle the recovery
                                RateLimiter rl = recoverySettings.rateLimiter();
                                long throttleTimeInNanos = 0;
                                if (rl != null) {
                                    long bytes = bytesSinceLastPause.addAndGet(toRead);
                                    if (bytes > rl.getMinPauseCheckBytes()) {
                                        // Time to pause
                                        bytesSinceLastPause.addAndGet(-bytes);
                                        throttleTimeInNanos = rl.pause(bytes);
                                        shard.recoveryStats().addThrottleTime(throttleTimeInNanos);
                                    }
                                }
                                indexInput.readBytes(buf, 0, toRead, false);
                                final BytesArray content = new BytesArray(buf, 0, toRead);
                                readCount += toRead;
                                final boolean lastChunk = readCount == len;
                                final RecoveryFileChunkRequest fileChunkRequest = new RecoveryFileChunkRequest(request.recoveryId(), request.shardId(), md, position, content, lastChunk, translogView.totalOperations(), throttleTimeInNanos);
                                cancellableThreads.execute(new Interruptable() {

                                    @Override
                                    public void run() throws InterruptedException {
                                        // Actually send the file chunk to the target node, waiting for it to complete
                                        transportService.submitRequest(request.targetNode(), RecoveryTarget.Actions.FILE_CHUNK, fileChunkRequest, requestOptions, EmptyTransportResponseHandler.INSTANCE_SAME).txGet();
                                    }
                                });
                            }
                        } catch (Throwable e) {
                            final Throwable corruptIndexException;
                            if ((corruptIndexException = ExceptionsHelper.unwrapCorruption(e)) != null) {
                                if (store.checkIntegrityNoException(md) == false) {
                                    // we are corrupted on the primary -- fail!
                                    logger.warn("{} Corrupted file detected {} checksum mismatch", shard.shardId(), md);
                                    if (corruptedEngine.compareAndSet(null, corruptIndexException) == false) {
                                        // if we are not the first exception, add ourselves as suppressed to the main one:
                                        corruptedEngine.get().addSuppressed(e);
                                    }
                                } else {
                                    // corruption has happened on the way to replica
                                    RemoteTransportException exception = new RemoteTransportException("File corruption occurred on recovery but checksums are ok", null);
                                    exception.addSuppressed(e);
                                    // last exception first
                                    exceptions.add(0, exception);
                                    logger.warn("{} Remote file corruption on node {}, recovering {}. local checksum OK", corruptIndexException, shard.shardId(), request.targetNode(), md);
                                }
                            } else {
                                // last exceptions first
                                exceptions.add(0, e);
                            }
                        } finally {
                            store.decRef();
                        }
                    }
                });
                fileIndex++;
            }
            cancellableThreads.execute(new Interruptable() {

                @Override
                public void run() throws InterruptedException {
                    // Wait for all files that need to be transferred to finish transferring
                    latch.await();
                }
            });
            if (corruptedEngine.get() != null) {
                shard.engine().failEngine("recovery", corruptedEngine.get());
                throw corruptedEngine.get();
            } else {
                ExceptionsHelper.rethrowAndSuppress(exceptions);
            }
            cancellableThreads.execute(new Interruptable() {

                @Override
                public void run() throws InterruptedException {
                    // Send the CLEAN_FILES request: the target renames the transferred
                    // temporary files to their final names, and any files that are not
                    // part of this recovery are deleted
                    try {
                        transportService.submitRequest(request.targetNode(), RecoveryTarget.Actions.CLEAN_FILES, new RecoveryCleanFilesRequest(request.recoveryId(), shard.shardId(), recoverySourceMetadata, translogView.totalOperations()), TransportRequestOptions.builder().withTimeout(recoverySettings.internalActionTimeout()).build(), EmptyTransportResponseHandler.INSTANCE_SAME).txGet();
                    } catch (RemoteTransportException remoteException) {
                        final IOException corruptIndexException;
                        // the index may turn out to be corrupted once we try to finalize the recovery:
                        //   - maybe due to a broken segments file on an empty index (transferred with no checksum)
                        //   - maybe due to old segments without checksums or length only checks
                        if ((corruptIndexException = ExceptionsHelper.unwrapCorruption(remoteException)) != null) {
                            try {
                                final Store.MetadataSnapshot recoverySourceMetadata = store.getMetadata(snapshot);
                                StoreFileMetaData[] metadata = Iterables.toArray(recoverySourceMetadata, StoreFileMetaData.class);
                                ArrayUtil.timSort(metadata, new Comparator<StoreFileMetaData>() {

                                    @Override
                                    public int compare(StoreFileMetaData o1, StoreFileMetaData o2) {
                                        // check small files first
                                        return Long.compare(o1.length(), o2.length());
                                    }
                                });
                                for (StoreFileMetaData md : metadata) {
                                    logger.debug("{} checking integrity for file {} after remove corruption exception", shard.shardId(), md);
                                    if (store.checkIntegrityNoException(md) == false) {
                                        // we are corrupted on the primary -- fail!
                                        shard.engine().failEngine("recovery", corruptIndexException);
                                        logger.warn("{} Corrupted file detected {} checksum mismatch", shard.shardId(), md);
                                        throw corruptIndexException;
                                    }
                                }
                            } catch (IOException ex) {
                                remoteException.addSuppressed(ex);
                                throw remoteException;
                            }
                            // corruption has happened on the way to replica
                            RemoteTransportException exception = new RemoteTransportException("File corruption occurred on recovery but checksums are ok", null);
                            exception.addSuppressed(remoteException);
                            logger.warn("{} Remote file corruption during finalization on node {}, recovering {}. local checksum OK", corruptIndexException, shard.shardId(), request.targetNode());
                            throw exception;
                        } else {
                            throw remoteException;
                        }
                    }
                }
            });
        }
        prepareTargetForTranslog(translogView);
        logger.trace("[{}][{}] recovery [phase1] to {}: took [{}]", indexName, shardId, request.targetNode(), stopWatch.totalTime());
        response.phase1Time = stopWatch.totalTime().millis();
    } catch (Throwable e) {
        throw new RecoverFilesRecoveryException(request.shardId(), response.phase1FileNames.size(), new ByteSizeValue(totalSize), e);
    } finally {
        store.decRef();
    }
}
Also used : AbstractRunnable(org.elasticsearch.common.util.concurrent.AbstractRunnable) ByteSizeValue(org.elasticsearch.common.unit.ByteSizeValue) Store(org.elasticsearch.index.store.Store) IndexFormatTooOldException(org.apache.lucene.index.IndexFormatTooOldException) StoreFileMetaData(org.elasticsearch.index.store.StoreFileMetaData) IndexInput(org.apache.lucene.store.IndexInput) TransportRequestOptions(org.elasticsearch.transport.TransportRequestOptions) RemoteTransportException(org.elasticsearch.transport.RemoteTransportException) BytesArray(org.elasticsearch.common.bytes.BytesArray) CorruptIndexException(org.apache.lucene.index.CorruptIndexException) Interruptable(org.elasticsearch.common.util.CancellableThreads.Interruptable) AtomicReference(java.util.concurrent.atomic.AtomicReference) IOException(java.io.IOException) CountDownLatch(java.util.concurrent.CountDownLatch) RateLimiter(org.apache.lucene.store.RateLimiter) StopWatch(org.elasticsearch.common.StopWatch) AtomicLong(java.util.concurrent.atomic.AtomicLong) IndexShardClosedException(org.elasticsearch.index.shard.IndexShardClosedException) IndexFormatTooNewException(org.apache.lucene.index.IndexFormatTooNewException) ThreadPoolExecutor(java.util.concurrent.ThreadPoolExecutor) CopyOnWriteArrayList(java.util.concurrent.CopyOnWriteArrayList)
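
Buried in phase1 is a classic concurrent idiom: corruptedEngine.compareAndSet(null, ex) lets many transfer threads race to publish the first corruption, and every later failure attaches itself as a suppressed exception on the winner. A minimal sketch of that first-error-wins pattern, detached from the recovery code:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class FirstErrorWinsSketch {

    public static void main(String[] args) throws InterruptedException {
        final AtomicReference<Throwable> firstError = new AtomicReference<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        final CountDownLatch done = new CountDownLatch(4);
        for (int i = 0; i < 4; i++) {
            final int worker = i;
            pool.execute(() -> {
                try {
                    throw new RuntimeException("failure in worker " + worker);
                } catch (RuntimeException e) {
                    // only the first failure becomes the primary exception; the CAS
                    // fails for everyone else, who attach themselves as suppressed
                    if (!firstError.compareAndSet(null, e)) {
                        firstError.get().addSuppressed(e);
                    }
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
        pool.shutdown();
        Throwable t = firstError.get();
        System.out.println("primary: " + t.getMessage() + ", suppressed: " + t.getSuppressed().length);
    }
}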

Example 50 with AtomicReference

Use of java.util.concurrent.atomic.AtomicReference in project crate by crate.

The class BlobRecoveryHandler, method phase1.

public void phase1() throws Exception {
    logger.debug("[{}][{}] recovery [phase1] to {}: start", request.shardId().index().name(), request.shardId().id(), request.targetNode().getName());
    StopWatch stopWatch = new StopWatch().start();
    blobTransferTarget.startRecovery();
    blobTransferTarget.createActiveTransfersSnapshot();
    sendStartRecoveryRequest();
    final AtomicReference<Exception> lastException = new AtomicReference<Exception>();
    try {
        syncVarFiles(lastException);
    } catch (InterruptedException ex) {
        throw new ElasticsearchException("blob recovery phase1 failed", ex);
    }
    Exception exception = lastException.get();
    if (exception != null) {
        throw exception;
    }
    /*
     * As soon as the recovery starts, the target node will receive PutChunkReplicaRequests;
     * the target node will then request the bytes it is missing from the source node
     * (it is missing bytes from PutChunk/StartBlob requests that happened before the recovery).
     * Here we need to block so that the target node has enough time to request the head chunks.
     *
     * e.g.
     *      Target Node receives Chunk X with bytes 10-19
     *      Target Node requests bytes 0-9 from Source Node
     *      Source Node sends bytes 0-9
     *      Source Node sets transferTakenOver
     */
    blobTransferTarget.waitForGetHeadRequests(GET_HEAD_TIMEOUT, TimeUnit.SECONDS);
    blobTransferTarget.createActivePutHeadChunkTransfersSnapshot();
    /*
     * After receiving a getHeadRequest the source node starts to send HeadChunks to the target;
     * wait for all PutHeadChunk-Runnables to finish before ending the recovery.
     */
    blobTransferTarget.waitUntilPutHeadChunksAreFinished();
    sendFinalizeRecoveryRequest();
    blobTransferTarget.stopRecovery();
    stopWatch.stop();
    logger.debug("[{}][{}] recovery [phase1] to {}: took [{}]", request.shardId().index().name(), request.shardId().id(), request.targetNode().getName(), stopWatch.totalTime());
}
Also used : AtomicReference(java.util.concurrent.atomic.AtomicReference) ElasticsearchException(org.elasticsearch.ElasticsearchException) IndexShardClosedException(org.elasticsearch.index.shard.IndexShardClosedException) IOException(java.io.IOException) StopWatch(org.elasticsearch.common.StopWatch)
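
The AtomicReference<Exception> in this phase1 acts as a thread-safe out-parameter: the transfer threads inside syncVarFiles record their failure instead of throwing, and the calling thread rethrows it once the work is done. A small sketch of that hand-off, where runAsyncWork is a hypothetical stand-in for syncVarFiles:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class AsyncErrorHandoffSketch {

    public static void main(String[] args) throws Exception {
        final AtomicReference<Exception> lastException = new AtomicReference<>();
        runAsyncWork(lastException);
        Exception e = lastException.get();
        if (e != null) {
            throw e; // surface the worker's failure on the calling thread
        }
    }

    // hypothetical async work that records its failure instead of throwing
    static void runAsyncWork(AtomicReference<Exception> lastException) throws InterruptedException {
        final CountDownLatch done = new CountDownLatch(1);
        new Thread(() -> {
            try {
                throw new IllegalStateException("transfer failed");
            } catch (Exception e) {
                lastException.set(e);
            } finally {
                done.countDown();
            }
        }).start();
        done.await();
    }
}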

Aggregations

AtomicReference (java.util.concurrent.atomic.AtomicReference): 1331 usages
Test (org.junit.Test): 668
CountDownLatch (java.util.concurrent.CountDownLatch): 437
IOException (java.io.IOException): 263
AtomicBoolean (java.util.concurrent.atomic.AtomicBoolean): 205
AtomicInteger (java.util.concurrent.atomic.AtomicInteger): 159
ArrayList (java.util.ArrayList): 108
HashMap (java.util.HashMap): 105
List (java.util.List): 95
Map (java.util.Map): 77
Test (org.testng.annotations.Test): 76
File (java.io.File): 64
ExecutionException (java.util.concurrent.ExecutionException): 60
HashSet (java.util.HashSet): 54
URI (java.net.URI): 48
TimeoutException (java.util.concurrent.TimeoutException): 48
HttpServletRequest (javax.servlet.http.HttpServletRequest): 48
HttpServletResponse (javax.servlet.http.HttpServletResponse): 46
MockResponse (okhttp3.mockwebserver.MockResponse): 46
ByteBuffer (java.nio.ByteBuffer): 44