Example 1 with FailedShardEntry

Use of org.opensearch.cluster.action.shard.ShardStateAction.FailedShardEntry in project OpenSearch by opensearch-project.

From the class ShardStateActionTests, method testFailedShardEntrySerialization.

public void testFailedShardEntrySerialization() throws Exception {
    final ShardId shardId = new ShardId(randomRealisticUnicodeOfLengthBetween(10, 100), UUID.randomUUID().toString(), between(0, 1000));
    final String allocationId = randomRealisticUnicodeOfCodepointLengthBetween(10, 100);
    final long primaryTerm = randomIntBetween(0, 100);
    final String message = randomRealisticUnicodeOfCodepointLengthBetween(10, 100);
    final Exception failure = randomBoolean() ? null : getSimulatedFailure();
    final boolean markAsStale = randomBoolean();
    final Version version = randomFrom(randomCompatibleVersion(random(), Version.CURRENT));
    final FailedShardEntry failedShardEntry = new FailedShardEntry(shardId, allocationId, primaryTerm, message, failure, markAsStale);
    try (StreamInput in = serialize(failedShardEntry, version).streamInput()) {
        in.setVersion(version);
        final FailedShardEntry deserialized = new FailedShardEntry(in);
        assertThat(deserialized.shardId, equalTo(shardId));
        assertThat(deserialized.allocationId, equalTo(allocationId));
        assertThat(deserialized.primaryTerm, equalTo(primaryTerm));
        assertThat(deserialized.message, equalTo(message));
        if (failure != null) {
            assertThat(deserialized.failure, notNullValue());
            assertThat(deserialized.failure.getClass(), equalTo(failure.getClass()));
            assertThat(deserialized.failure.getMessage(), equalTo(failure.getMessage()));
        } else {
            assertThat(deserialized.failure, nullValue());
        }
        assertThat(deserialized.markAsStale, equalTo(markAsStale));
        assertEquals(failedShardEntry, deserialized);
    }
}
Also used : ShardId(org.opensearch.index.shard.ShardId) VersionUtils.randomCompatibleVersion(org.opensearch.test.VersionUtils.randomCompatibleVersion) Version(org.opensearch.Version) StreamInput(org.opensearch.common.io.stream.StreamInput) FailedShardEntry(org.opensearch.cluster.action.shard.ShardStateAction.FailedShardEntry) CorruptIndexException(org.apache.lucene.index.CorruptIndexException) NotMasterException(org.opensearch.cluster.NotMasterException) NodeNotConnectedException(org.opensearch.transport.NodeNotConnectedException) FailedToCommitClusterStateException(org.opensearch.cluster.coordination.FailedToCommitClusterStateException) NodeDisconnectedException(org.opensearch.transport.NodeDisconnectedException) TransportException(org.opensearch.transport.TransportException) IOException(java.io.IOException)
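
The test above round-trips an entry through a stream and compares every field, including the optional failure. The same pattern can be sketched with only the JDK. This is a minimal illustration under stated assumptions: the `Entry` class, its `failureMessage` field, and the wire format (a boolean presence flag before the optional value) are hypothetical and are not OpenSearch's `StreamInput`/`StreamOutput` format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

final class RoundTripSketch {
    /** Hypothetical entry type: field names mirror the test above, the wire format is invented. */
    static final class Entry {
        final String allocationId;
        final long primaryTerm;
        final String message;
        final String failureMessage; // nullable; stands in for the optional Exception
        final boolean markAsStale;

        Entry(String allocationId, long primaryTerm, String message, String failureMessage, boolean markAsStale) {
            this.allocationId = allocationId;
            this.primaryTerm = primaryTerm;
            this.message = message;
            this.failureMessage = failureMessage;
            this.markAsStale = markAsStale;
        }

        /** Reads the fields in exactly the order writeTo emits them. */
        Entry(DataInputStream in) throws IOException {
            allocationId = in.readUTF();
            primaryTerm = in.readLong();
            message = in.readUTF();
            failureMessage = in.readBoolean() ? in.readUTF() : null;
            markAsStale = in.readBoolean();
        }

        void writeTo(DataOutputStream out) throws IOException {
            out.writeUTF(allocationId);
            out.writeLong(primaryTerm);
            out.writeUTF(message);
            out.writeBoolean(failureMessage != null); // presence flag for the optional field
            if (failureMessage != null) {
                out.writeUTF(failureMessage);
            }
            out.writeBoolean(markAsStale);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Entry)) {
                return false;
            }
            Entry other = (Entry) o;
            return primaryTerm == other.primaryTerm
                && markAsStale == other.markAsStale
                && allocationId.equals(other.allocationId)
                && message.equals(other.message)
                && Objects.equals(failureMessage, other.failureMessage);
        }

        @Override
        public int hashCode() {
            return Objects.hash(allocationId, primaryTerm, message, failureMessage, markAsStale);
        }
    }

    /** Serializes the entry to a byte array and reads it back. */
    static Entry roundTrip(Entry entry) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                entry.writeTo(out);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return new Entry(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Entry original = new Entry("alloc-1", 3L, "dummy", null, true);
        System.out.println(roundTrip(original).equals(original));
    }
}
```

Checking both the null and non-null case of the optional field, as the original test does with `randomBoolean()`, is what catches a reader and writer that disagree about the presence flag.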

Example 2 with FailedShardEntry

Use of org.opensearch.cluster.action.shard.ShardStateAction.FailedShardEntry in project OpenSearch by opensearch-project.

From the class InSyncAllocationIdTests, method testPrimaryFailureBatchedWithReplicaFailure.

/**
 * Assume the following scenario: an indexing request is written to the primary, but fails to be replicated to the active replica.
 * The primary instructs the master to fail the replica before acknowledging the write to the client. In the meantime, the primary fails
 * for an unrelated reason. The master now batches both requests, to fail the primary and the replica. We have to make sure that only the
 * allocation id of the primary is kept in the in-sync allocation set before we acknowledge the request to the client. Otherwise we would
 * acknowledge a write that made it into the primary but not the replica, while the replica is still considered non-stale.
 */
public void testPrimaryFailureBatchedWithReplicaFailure() throws Exception {
    ClusterState clusterState = createOnePrimaryOneReplicaClusterState(allocation);
    IndexShardRoutingTable shardRoutingTable = clusterState.routingTable().index("test").shard(0);
    ShardRouting primaryShard = shardRoutingTable.primaryShard();
    ShardRouting replicaShard = shardRoutingTable.replicaShards().get(0);
    long primaryTerm = clusterState.metadata().index("test").primaryTerm(0);
    List<FailedShardEntry> failureEntries = new ArrayList<>();
    failureEntries.add(new FailedShardEntry(shardRoutingTable.shardId(), primaryShard.allocationId().getId(), 0L, "dummy", null, true));
    failureEntries.add(new FailedShardEntry(shardRoutingTable.shardId(), replicaShard.allocationId().getId(), primaryTerm, "dummy", null, true));
    Collections.shuffle(failureEntries, random());
    logger.info("Failing {}", failureEntries);
    clusterState = failedClusterStateTaskExecutor.execute(clusterState, failureEntries).resultingState;
    assertThat(clusterState.metadata().index("test").inSyncAllocationIds(0), equalTo(Collections.singleton(primaryShard.allocationId().getId())));
    // resend shard failures to check if they are ignored
    clusterState = failedClusterStateTaskExecutor.execute(clusterState, failureEntries).resultingState;
    assertThat(clusterState.metadata().index("test").inSyncAllocationIds(0), equalTo(Collections.singleton(primaryShard.allocationId().getId())));
}
Also used : ClusterState(org.opensearch.cluster.ClusterState) IndexShardRoutingTable(org.opensearch.cluster.routing.IndexShardRoutingTable) ArrayList(java.util.ArrayList) FailedShardEntry(org.opensearch.cluster.action.shard.ShardStateAction.FailedShardEntry) ShardRouting(org.opensearch.cluster.routing.ShardRouting)
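
The assertions in the test above imply a rule that can be modeled in a few lines: failing a copy with markAsStale removes its allocation id from the in-sync set, but when a batch would empty the set, the failed primary's id is retained. The sketch below is a toy model of that rule only, inferred from the test's expectations. The `Failure` class, the `primary` flag, and `applyFailures` are hypothetical names and do not reproduce the actual OpenSearch allocation code.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class InSyncSetSketch {
    /** Hypothetical stand-in for a shard failure request; not the OpenSearch FailedShardEntry. */
    static final class Failure {
        final String allocationId;
        final boolean primary;
        final boolean markAsStale;

        Failure(String allocationId, boolean primary, boolean markAsStale) {
            this.allocationId = allocationId;
            this.primary = primary;
            this.markAsStale = markAsStale;
        }
    }

    /** Applies a whole batch of failures and returns the new in-sync set (the input is not modified). */
    static Set<String> applyFailures(Set<String> inSync, List<Failure> batch) {
        Set<String> result = new LinkedHashSet<>(inSync);
        String firstFailedPrimary = null;
        for (Failure failure : batch) {
            if (failure.primary && firstFailedPrimary == null && inSync.contains(failure.allocationId)) {
                firstFailedPrimary = failure.allocationId;
            }
            if (failure.markAsStale) {
                result.remove(failure.allocationId);
            }
        }
        // Assumed rule: never leave the set empty; keep the failed primary so that
        // a stale replica cannot be treated as holding the acknowledged data.
        if (result.isEmpty() && firstFailedPrimary != null) {
            result.add(firstFailedPrimary);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> inSync = new LinkedHashSet<>(Arrays.asList("primary-id", "replica-id"));
        List<Failure> batch = Arrays.asList(
            new Failure("primary-id", true, true),
            new Failure("replica-id", false, true));
        System.out.println(applyFailures(inSync, batch)); // prints [primary-id]
    }
}
```

Because the rule is evaluated once per batch rather than per entry, the result does not depend on the order of the entries, and applying the same batch a second time leaves the set unchanged, which matches the shuffle and the resend step in the test.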

Example 3 with FailedShardEntry

Use of org.opensearch.cluster.action.shard.ShardStateAction.FailedShardEntry in project OpenSearch by opensearch-project.

From the class InSyncAllocationIdTests, method testDeadNodesBeforeReplicaFailed.

/**
 * Assume the following scenario: an indexing request is written to the primary, but fails to be replicated to the active replica.
 * The primary instructs the master to fail the replica before acknowledging the write to the client. In the meantime, the node of the
 * replica was removed from the cluster (disassociateDeadNodes). This means that the ShardRouting of the replica was failed, but its
 * allocation id is still part of the in-sync set. We have to make sure that the failShard request from the primary removes the
 * allocation id from the in-sync set.
 */
public void testDeadNodesBeforeReplicaFailed() throws Exception {
    ClusterState clusterState = createOnePrimaryOneReplicaClusterState(allocation);
    logger.info("remove replica node");
    IndexShardRoutingTable shardRoutingTable = clusterState.routingTable().index("test").shard(0);
    ShardRouting replicaShard = shardRoutingTable.replicaShards().get(0);
    clusterState = ClusterState.builder(clusterState).nodes(DiscoveryNodes.builder(clusterState.nodes()).remove(replicaShard.currentNodeId())).build();
    clusterState = allocation.disassociateDeadNodes(clusterState, true, "reroute");
    assertThat(clusterState.metadata().index("test").inSyncAllocationIds(0).size(), equalTo(2));
    logger.info("fail replica (for which there is no shard routing in the CS anymore)");
    assertNull(clusterState.getRoutingNodes().getByAllocationId(replicaShard.shardId(), replicaShard.allocationId().getId()));
    ShardStateAction.ShardFailedClusterStateTaskExecutor failedClusterStateTaskExecutor = new ShardStateAction.ShardFailedClusterStateTaskExecutor(allocation, null, () -> Priority.NORMAL, logger);
    long primaryTerm = clusterState.metadata().index("test").primaryTerm(0);
    clusterState = failedClusterStateTaskExecutor.execute(clusterState, Arrays.asList(new FailedShardEntry(shardRoutingTable.shardId(), replicaShard.allocationId().getId(), primaryTerm, "dummy", null, true))).resultingState;
    assertThat(clusterState.metadata().index("test").inSyncAllocationIds(0).size(), equalTo(1));
}
Also used : ClusterState(org.opensearch.cluster.ClusterState) IndexShardRoutingTable(org.opensearch.cluster.routing.IndexShardRoutingTable) ShardStateAction(org.opensearch.cluster.action.shard.ShardStateAction) FailedShardEntry(org.opensearch.cluster.action.shard.ShardStateAction.FailedShardEntry) ShardRouting(org.opensearch.cluster.routing.ShardRouting)

Example 4 with FailedShardEntry

Use of org.opensearch.cluster.action.shard.ShardStateAction.FailedShardEntry in project OpenSearch by opensearch-project.

From the class ShardFailedClusterStateTaskExecutorTests, method testNonExistentShardsAreMarkedAsSuccessful.

public void testNonExistentShardsAreMarkedAsSuccessful() throws Exception {
    String reason = "test non existent shards are marked as successful";
    ClusterState currentState = createClusterStateWithStartedShards(reason);
    List<FailedShardEntry> tasks = createNonExistentShards(currentState, reason);
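    // Note: clusterState (as opposed to the local currentState) is not declared in this snippet;
    // it is presumably a field of the test class initialized outside this method.
    ClusterStateTaskExecutor.ClusterTasksResult<FailedShardEntry> result = executor.execute(clusterState, tasks);
    assertTasksSuccessful(tasks, result, clusterState, false);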
}
Also used : ClusterState(org.opensearch.cluster.ClusterState) ClusterStateTaskExecutor(org.opensearch.cluster.ClusterStateTaskExecutor) FailedShardEntry(org.opensearch.cluster.action.shard.ShardStateAction.FailedShardEntry)

Example 5 with FailedShardEntry

Use of org.opensearch.cluster.action.shard.ShardStateAction.FailedShardEntry in project OpenSearch by opensearch-project.

From the class ShardFailedClusterStateTaskExecutorTests, method testIllegalShardFailureRequests.

public void testIllegalShardFailureRequests() throws Exception {
    String reason = "test illegal shard failure requests";
    ClusterState currentState = createClusterStateWithStartedShards(reason);
    List<ShardStateAction.FailedShardEntry> failingTasks = createExistingShards(currentState, reason);
    List<ShardStateAction.FailedShardEntry> tasks = new ArrayList<>();
    for (ShardStateAction.FailedShardEntry failingTask : failingTasks) {
        long primaryTerm = currentState.metadata().index(failingTask.shardId.getIndex()).primaryTerm(failingTask.shardId.id());
        tasks.add(new FailedShardEntry(failingTask.shardId, failingTask.allocationId, randomIntBetween(1, (int) primaryTerm - 1), failingTask.message, failingTask.failure, randomBoolean()));
    }
    List<Tuple<FailedShardEntry, ClusterStateTaskExecutor.TaskResult>> taskResultList = tasks.stream()
        .map(task -> Tuple.tuple(task, ClusterStateTaskExecutor.TaskResult.failure(
            new ShardStateAction.NoLongerPrimaryShardException(task.shardId,
                "primary term [" + task.primaryTerm + "] did not match current primary term ["
                    + currentState.metadata().index(task.shardId.getIndex()).primaryTerm(task.shardId.id()) + "]"))))
        .collect(Collectors.toList());
    ClusterStateTaskExecutor.ClusterTasksResult<FailedShardEntry> result = executor.execute(currentState, tasks);
    assertTaskResults(taskResultList, result, currentState, false);
}
Also used : IntStream(java.util.stream.IntStream) DiscoveryNodes(org.opensearch.cluster.node.DiscoveryNodes) FailedShardEntry(org.opensearch.cluster.action.shard.ShardStateAction.FailedShardEntry) Metadata(org.opensearch.cluster.metadata.Metadata) IndexMetadata(org.opensearch.cluster.metadata.IndexMetadata) ShardIterator(org.opensearch.cluster.routing.ShardIterator) CoreMatchers.equalTo(org.hamcrest.CoreMatchers.equalTo) AllocationService(org.opensearch.cluster.routing.allocation.AllocationService) ClusterRebalanceAllocationDecider(org.opensearch.cluster.routing.allocation.decider.ClusterRebalanceAllocationDecider) Version(org.opensearch.Version) Priority(org.opensearch.common.Priority) CorruptIndexException(org.apache.lucene.index.CorruptIndexException) ArrayList(java.util.ArrayList) CoreMatchers.instanceOf(org.hamcrest.CoreMatchers.instanceOf) ClusterState(org.opensearch.cluster.ClusterState) OpenSearchAllocationTestCase(org.opensearch.cluster.OpenSearchAllocationTestCase) ShardRoutingState(org.opensearch.cluster.routing.ShardRoutingState) UUIDs(org.opensearch.common.UUIDs) IndexShardRoutingTable(org.opensearch.cluster.routing.IndexShardRoutingTable) Index(org.opensearch.index.Index) Set(java.util.Set) ClusterStateTaskExecutor(org.opensearch.cluster.ClusterStateTaskExecutor) Settings(org.opensearch.common.settings.Settings) ObjectCursor(com.carrotsearch.hppc.cursors.ObjectCursor) Collectors(java.util.stream.Collectors) Tuple(org.opensearch.common.collect.Tuple) ShardRouting(org.opensearch.cluster.routing.ShardRouting) ShardId(org.opensearch.index.shard.ShardId) GroupShardsIterator(org.opensearch.cluster.routing.GroupShardsIterator) TestShardRouting(org.opensearch.cluster.routing.TestShardRouting) Sets(org.opensearch.common.util.set.Sets) List(java.util.List) FailedShard(org.opensearch.cluster.routing.allocation.FailedShard) StaleShard(org.opensearch.cluster.routing.allocation.StaleShard) Matchers.contains(org.hamcrest.Matchers.contains) 
ClusterName(org.opensearch.cluster.ClusterName) RoutingTable(org.opensearch.cluster.routing.RoutingTable) Collections(java.util.Collections)
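
The test above builds failure requests whose primary term is lower than the current one and expects each to be rejected with the message shown in the expected results. A minimal sketch of such a check follows. It assumes that a term of 0 means "no term supplied, skip the check", which is suggested by Example 2 passing 0L for the primary but is not confirmed by these snippets. The class and method names are hypothetical.

```java
import java.util.Optional;

final class PrimaryTermCheckSketch {
    /**
     * Returns a rejection message when the request's primary term is stale,
     * or an empty Optional when the request may proceed.
     * Assumption: a term of 0 (or less) disables the check.
     */
    static Optional<String> validate(long taskPrimaryTerm, long currentPrimaryTerm) {
        if (taskPrimaryTerm > 0 && taskPrimaryTerm != currentPrimaryTerm) {
            return Optional.of("primary term [" + taskPrimaryTerm
                + "] did not match current primary term [" + currentPrimaryTerm + "]");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validate(2L, 5L).orElse("accepted"));
        System.out.println(validate(5L, 5L).orElse("accepted"));
    }
}
```

Rejecting on a term mismatch keeps a former primary, one that has since been replaced, from failing copies that the current primary still considers valid.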

Aggregations

FailedShardEntry (org.opensearch.cluster.action.shard.ShardStateAction.FailedShardEntry): 9
ClusterState (org.opensearch.cluster.ClusterState): 7
ClusterStateTaskExecutor (org.opensearch.cluster.ClusterStateTaskExecutor): 5
ShardRouting (org.opensearch.cluster.routing.ShardRouting): 5
IndexShardRoutingTable (org.opensearch.cluster.routing.IndexShardRoutingTable): 4
ArrayList (java.util.ArrayList): 3
List (java.util.List): 2
CorruptIndexException (org.apache.lucene.index.CorruptIndexException): 2
Version (org.opensearch.Version): 2
TestShardRouting (org.opensearch.cluster.routing.TestShardRouting): 2
Tuple (org.opensearch.common.collect.Tuple): 2
ShardId (org.opensearch.index.shard.ShardId): 2
ObjectCursor (com.carrotsearch.hppc.cursors.ObjectCursor): 1
IOException (java.io.IOException): 1
Collections (java.util.Collections): 1
Set (java.util.Set): 1
Collectors (java.util.stream.Collectors): 1
IntStream (java.util.stream.IntStream): 1
CoreMatchers.equalTo (org.hamcrest.CoreMatchers.equalTo): 1
CoreMatchers.instanceOf (org.hamcrest.CoreMatchers.instanceOf): 1