Example 1 with ReplicaPlan

use of org.apache.cassandra.locator.ReplicaPlan in project cassandra by apache.

the class ShortReadPartitionsProtection method makeAndExecuteFetchAdditionalPartitionReadCommand.

private UnfilteredPartitionIterator makeAndExecuteFetchAdditionalPartitionReadCommand(int toQuery) {
    PartitionRangeReadCommand cmd = (PartitionRangeReadCommand) command;
    DataLimits newLimits = cmd.limits().forShortReadRetry(toQuery);
    AbstractBounds<PartitionPosition> bounds = cmd.dataRange().keyRange();
    AbstractBounds<PartitionPosition> newBounds = bounds.inclusiveRight() ? new Range<>(lastPartitionKey, bounds.right) : new ExcludingBounds<>(lastPartitionKey, bounds.right);
    DataRange newDataRange = cmd.dataRange().forSubRange(newBounds);
    ReplicaPlan.ForRangeRead replicaPlan = ReplicaPlans.forSingleReplicaRead(Keyspace.open(command.metadata().keyspace), cmd.dataRange().keyRange(), source, 1);
    return executeReadCommand(cmd.withUpdatedLimitsAndDataRange(newLimits, newDataRange), ReplicaPlan.shared(replicaPlan));
}
Also used : ReplicaPlan(org.apache.cassandra.locator.ReplicaPlan) PartitionRangeReadCommand(org.apache.cassandra.db.PartitionRangeReadCommand) PartitionPosition(org.apache.cassandra.db.PartitionPosition) DataRange(org.apache.cassandra.db.DataRange) DataLimits(org.apache.cassandra.db.filter.DataLimits)
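The bounds selection in the method above can be illustrated with a minimal sketch. It assumes plain integer positions in place of Cassandra's PartitionPosition and ignores wrap-around ranges; the class and method names (ShortReadBoundsSketch, inRetryBounds) are hypothetical and not part of Cassandra. The retry interval always excludes the last partition key already returned, and the right end stays inclusive only if the original bounds were right-inclusive, which mirrors the choice between Range (left-exclusive, right-inclusive) and ExcludingBounds (neither endpoint included).

```java
// Hypothetical sketch: integer positions stand in for PartitionPosition, no wrap-around.
final class ShortReadBoundsSketch {

    /**
     * Returns true if {@code position} falls inside the retry interval that starts
     * just after {@code lastKey} and ends at {@code right}.
     */
    static boolean inRetryBounds(int position, int lastKey, int right, boolean inclusiveRight) {
        // The left end is always exclusive: lastKey was already returned to the caller.
        boolean afterLast = position > lastKey;
        // The right end keeps the inclusiveness of the original bounds.
        boolean beforeRight = inclusiveRight ? position <= right : position < right;
        return afterLast && beforeRight;
    }

    public static void main(String[] args) {
        System.out.println(inRetryBounds(5, 5, 10, true));   // last key itself
        System.out.println(inRetryBounds(10, 5, 10, true));  // right end, inclusive case
        System.out.println(inRetryBounds(10, 5, 10, false)); // right end, exclusive case
    }
}
```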

Example 2 with ReplicaPlan

use of org.apache.cassandra.locator.ReplicaPlan in project cassandra by apache.

the class StorageProxy method doPaxos.

/**
 * Performs the Paxos rounds for a given proposal, retrying when preempted until the timeout.
 *
 * <p>The main 'configurable' of this method is the {@code createUpdateProposal} method: it is called by the method
 * once a ballot has been successfully 'prepared' to generate the update to 'propose' (and commit if the proposal is
 * successful). That method also generates the result that the whole method will return. Note that due to retrying,
 * {@code createUpdateProposal} may be called multiple times and does not have to return the same result each time.
 *
 * @param metadata the table to update with Paxos.
 * @param key the partition updated.
 * @param consistencyForPaxos the serial consistency of the operation (either {@link ConsistencyLevel#SERIAL} or
 *     {@link ConsistencyLevel#LOCAL_SERIAL}).
 * @param consistencyForReplayCommits the consistency for the commit phase of "replayed" in-progress operations.
 * @param consistencyForCommit the consistency for the commit phase of _this_ operation update.
 * @param queryStartNanoTime the nano time for the start of the query this is part of. This is the base time for
 *     timeouts.
 * @param casMetrics the metrics to update for this operation.
 * @param createUpdateProposal method called after a successful 'prepare' phase to obtain 1) the actual update of
 *     this operation and 2) the result that the whole method should return. This can return {@code null} in the
 *     special case where, after having "prepared" (and thus potentially replayed in-progress updates), we don't want
 *     to propose anything (the whole method then returns {@code null}).
 * @return the second element of the pair returned by {@code createUpdateProposal} (for the last call of that method
 *     if that method is called multiple times due to retries).
 */
private static RowIterator doPaxos(TableMetadata metadata, DecoratedKey key, ConsistencyLevel consistencyForPaxos, ConsistencyLevel consistencyForReplayCommits, ConsistencyLevel consistencyForCommit, long queryStartNanoTime, CASClientRequestMetrics casMetrics, Supplier<Pair<PartitionUpdate, RowIterator>> createUpdateProposal) throws UnavailableException, IsBootstrappingException, RequestFailureException, RequestTimeoutException, InvalidRequestException {
    int contentions = 0;
    Keyspace keyspace = Keyspace.open(metadata.keyspace);
    AbstractReplicationStrategy latestRs = keyspace.getReplicationStrategy();
    try {
        consistencyForPaxos.validateForCas();
        consistencyForReplayCommits.validateForCasCommit(latestRs);
        consistencyForCommit.validateForCasCommit(latestRs);
        long timeoutNanos = DatabaseDescriptor.getCasContentionTimeout(NANOSECONDS);
        while (nanoTime() - queryStartNanoTime < timeoutNanos) {
            // for simplicity, we'll do a single liveness check at the start of each attempt
            ReplicaPlan.ForPaxosWrite replicaPlan = ReplicaPlans.forPaxos(keyspace, key, consistencyForPaxos);
            latestRs = replicaPlan.replicationStrategy();
            PaxosBallotAndContention pair = beginAndRepairPaxos(queryStartNanoTime, key, metadata, replicaPlan, consistencyForPaxos, consistencyForReplayCommits, casMetrics);
            final UUID ballot = pair.ballot;
            contentions += pair.contentions;
            Pair<PartitionUpdate, RowIterator> proposalPair = createUpdateProposal.get();
            // See method javadoc: null here is code for "stop here and return null".
            if (proposalPair == null)
                return null;
            Commit proposal = Commit.newProposal(ballot, proposalPair.left);
            Tracing.trace("CAS precondition is met; proposing client-requested updates for {}", ballot);
            if (proposePaxos(proposal, replicaPlan, true, queryStartNanoTime)) {
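                // Skip the commit when the accepted update is empty; empty updates are fairly common (serial reads
                // and non-applying CAS propose them), so the shortcut is worth having.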
                if (!proposal.update.isEmpty())
                    commitPaxos(proposal, consistencyForCommit, true, queryStartNanoTime);
                RowIterator result = proposalPair.right;
                if (result != null)
                    Tracing.trace("CAS did not apply");
                else
                    Tracing.trace("CAS applied successfully");
                return result;
            }
            Tracing.trace("Paxos proposal not accepted (pre-empted by a higher ballot)");
            contentions++;
            Uninterruptibles.sleepUninterruptibly(ThreadLocalRandom.current().nextInt(100), TimeUnit.MILLISECONDS);
            // continue to retry
        }
    } catch (CasWriteTimeoutException e) {
        // Might be thrown by beginAndRepairPaxos. In that case, any contention that happened within the method and
        // led up to the timeout was not accounted for in our local 'contentions' variable, so we add it now so that
        // the contention recorded in the finally block is correct.
        contentions += e.contentions;
        throw e;
    } catch (WriteTimeoutException e) {
        // Might be thrown by proposePaxos or commitPaxos
        throw new CasWriteTimeoutException(e.writeType, e.consistency, e.received, e.blockFor, contentions);
    } finally {
        recordCasContention(metadata, key, casMetrics, contentions);
    }
    throw new CasWriteTimeoutException(WriteType.CAS, consistencyForPaxos, 0, consistencyForPaxos.blockFor(latestRs), contentions);
}
Also used : ReplicaPlan(org.apache.cassandra.locator.ReplicaPlan) Hint(org.apache.cassandra.hints.Hint) CasWriteTimeoutException(org.apache.cassandra.exceptions.CasWriteTimeoutException) WriteTimeoutException(org.apache.cassandra.exceptions.WriteTimeoutException) Keyspace(org.apache.cassandra.db.Keyspace) RowIterator(org.apache.cassandra.db.rows.RowIterator) AbstractReplicationStrategy(org.apache.cassandra.locator.AbstractReplicationStrategy) UUID(java.util.UUID) PartitionUpdate(org.apache.cassandra.db.partitions.PartitionUpdate)
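The control flow of doPaxos (retry until a deadline, count each preemption as a contention, sleep a random interval before the next attempt) can be sketched in isolation. This is a simplified illustration, not Cassandra code: ContentionRetrySketch, Outcome and retryUntilTimeout are hypothetical names, the attempt is reduced to a BooleanSupplier, and the sketch returns an outcome object where the real method throws CasWriteTimeoutException on timeout.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a deadline-bounded retry loop with random backoff.
final class ContentionRetrySketch {

    /** Result of the loop: whether an attempt succeeded and how many attempts were preempted. */
    static final class Outcome {
        final boolean succeeded;
        final int contentions;

        Outcome(boolean succeeded, int contentions) {
            this.succeeded = succeeded;
            this.contentions = contentions;
        }
    }

    /**
     * Runs {@code attempt} until it returns true or the timeout, measured from
     * {@code startNanos}, has elapsed. {@code maxBackoffMillis} must be positive.
     */
    static Outcome retryUntilTimeout(BooleanSupplier attempt, long startNanos, long timeoutNanos, int maxBackoffMillis) {
        int contentions = 0;
        while (System.nanoTime() - startNanos < timeoutNanos) {
            if (attempt.getAsBoolean())
                return new Outcome(true, contentions);
            // The attempt was preempted: record it and back off for a random interval.
            contentions++;
            try {
                Thread.sleep(ThreadLocalRandom.current().nextInt(maxBackoffMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new Outcome(false, contentions);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // The attempt succeeds on its third call.
        Outcome o = retryUntilTimeout(() -> ++calls[0] >= 3, System.nanoTime(), 2_000_000_000L, 10);
        System.out.println("succeeded=" + o.succeeded + ", contentions=" + o.contentions);
    }
}
```

The random sleep reduces the chance that two competing proposers keep preempting each other in lockstep.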

Example 3 with ReplicaPlan

use of org.apache.cassandra.locator.ReplicaPlan in project cassandra by apache.

the class StorageProxy method wrapBatchResponseHandler.

// same as performWrites except does not initiate writes (but does perform availability checks).
private static WriteResponseHandlerWrapper wrapBatchResponseHandler(Mutation mutation, ConsistencyLevel consistencyLevel, ConsistencyLevel batchConsistencyLevel, WriteType writeType, BatchlogResponseHandler.BatchlogCleanup cleanup, long queryStartNanoTime) {
    Keyspace keyspace = Keyspace.open(mutation.getKeyspaceName());
    Token tk = mutation.key().getToken();
    ReplicaPlan.ForTokenWrite replicaPlan = ReplicaPlans.forWrite(keyspace, consistencyLevel, tk, ReplicaPlans.writeNormal);
    AbstractReplicationStrategy rs = replicaPlan.replicationStrategy();
    AbstractWriteResponseHandler<IMutation> writeHandler = rs.getWriteResponseHandler(replicaPlan, null, writeType, queryStartNanoTime);
    BatchlogResponseHandler<IMutation> batchHandler = new BatchlogResponseHandler<>(writeHandler, batchConsistencyLevel.blockFor(rs), cleanup, queryStartNanoTime);
    return new WriteResponseHandlerWrapper(batchHandler, mutation);
}
Also used : ReplicaPlan(org.apache.cassandra.locator.ReplicaPlan) IMutation(org.apache.cassandra.db.IMutation) Keyspace(org.apache.cassandra.db.Keyspace) EndpointsForToken(org.apache.cassandra.locator.EndpointsForToken) Token(org.apache.cassandra.dht.Token) AbstractReplicationStrategy(org.apache.cassandra.locator.AbstractReplicationStrategy)
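The call batchConsistencyLevel.blockFor(rs) above yields the number of replica acknowledgements the handler waits for. A simplified model of that calculation, assuming a single datacenter and covering only a few consistency levels, is sketched below. BlockForSketch and its Level enum are hypothetical stand-ins and do not reproduce Cassandra's ConsistencyLevel; datacenter-aware levels are deliberately left out.

```java
// Hypothetical single-datacenter model of "how many acknowledgements to block for".
final class BlockForSketch {

    enum Level { ONE, TWO, THREE, QUORUM, ALL }

    /** Returns the number of replica responses required for the given level. */
    static int blockFor(Level level, int replicationFactor) {
        switch (level) {
            case ONE:
                return 1;
            case TWO:
                return 2;
            case THREE:
                return 3;
            case QUORUM:
                // A majority of the replicas.
                return replicationFactor / 2 + 1;
            case ALL:
                return replicationFactor;
            default:
                throw new IllegalArgumentException("Unsupported level: " + level);
        }
    }

    public static void main(String[] args) {
        System.out.println(blockFor(Level.QUORUM, 3));
        System.out.println(blockFor(Level.ALL, 3));
    }
}
```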

Example 4 with ReplicaPlan

use of org.apache.cassandra.locator.ReplicaPlan in project cassandra by apache.

the class StorageProxy method wrapViewBatchResponseHandler.

/**
 * Same as performWrites except does not initiate writes (but does perform availability checks).
 * Keeps track of ViewWriteMetrics
 */
private static WriteResponseHandlerWrapper wrapViewBatchResponseHandler(Mutation mutation, ConsistencyLevel consistencyLevel, ConsistencyLevel batchConsistencyLevel, ReplicaLayout.ForTokenWrite liveAndDown, AtomicLong baseComplete, WriteType writeType, BatchlogResponseHandler.BatchlogCleanup cleanup, long queryStartNanoTime) {
    Keyspace keyspace = Keyspace.open(mutation.getKeyspaceName());
    ReplicaPlan.ForTokenWrite replicaPlan = ReplicaPlans.forWrite(keyspace, consistencyLevel, liveAndDown, ReplicaPlans.writeAll);
    AbstractReplicationStrategy replicationStrategy = replicaPlan.replicationStrategy();
    AbstractWriteResponseHandler<IMutation> writeHandler = replicationStrategy.getWriteResponseHandler(replicaPlan, () -> {
        long delay = Math.max(0, currentTimeMillis() - baseComplete.get());
        viewWriteMetrics.viewWriteLatency.update(delay, MILLISECONDS);
    }, writeType, queryStartNanoTime);
    BatchlogResponseHandler<IMutation> batchHandler = new ViewWriteMetricsWrapped(writeHandler, batchConsistencyLevel.blockFor(replicationStrategy), cleanup, queryStartNanoTime);
    return new WriteResponseHandlerWrapper(batchHandler, mutation);
}
Also used : ReplicaPlan(org.apache.cassandra.locator.ReplicaPlan) IMutation(org.apache.cassandra.db.IMutation) Keyspace(org.apache.cassandra.db.Keyspace) AbstractReplicationStrategy(org.apache.cassandra.locator.AbstractReplicationStrategy)

Example 5 with ReplicaPlan

use of org.apache.cassandra.locator.ReplicaPlan in project cassandra by apache.

the class StorageProxy method mutateAtomically.

/**
 * See mutate. Adds additional steps before and after writing a batch.
 * Before writing the batch (but after doing availability check against the FD for the row replicas):
 *      write the entire batch to a batchlog elsewhere in the cluster.
 * After: remove the batchlog entry (after writing hints for the batch rows, if necessary).
 *
 * @param mutations the Mutations to be applied across the replicas
 * @param consistency_level the consistency level for the operation
 * @param requireQuorumForRemove at least a quorum of nodes will see update before deleting batchlog
 * @param queryStartNanoTime the value of nanoTime() when the query started to be processed
 */
public static void mutateAtomically(Collection<Mutation> mutations, ConsistencyLevel consistency_level, boolean requireQuorumForRemove, long queryStartNanoTime) throws UnavailableException, OverloadedException, WriteTimeoutException {
    Tracing.trace("Determining replicas for atomic batch");
    long startTime = nanoTime();
    List<WriteResponseHandlerWrapper> wrappers = new ArrayList<>(mutations.size());
    if (mutations.stream().anyMatch(mutation -> Keyspace.open(mutation.getKeyspaceName()).getReplicationStrategy().hasTransientReplicas()))
        throw new AssertionError("Logged batches are unsupported with transient replication");
    try {
        // If we are requiring quorum nodes for removal, we upgrade consistency level to QUORUM unless we already
        // require ALL, or EACH_QUORUM. This is so that *at least* QUORUM nodes see the update.
        ConsistencyLevel batchConsistencyLevel = requireQuorumForRemove ? ConsistencyLevel.QUORUM : consistency_level;
        switch(consistency_level) {
            case ALL:
            case EACH_QUORUM:
                batchConsistencyLevel = consistency_level;
        }
        ReplicaPlan.ForTokenWrite replicaPlan = ReplicaPlans.forBatchlogWrite(batchConsistencyLevel == ConsistencyLevel.ANY);
        final UUID batchUUID = UUIDGen.getTimeUUID();
        BatchlogCleanup cleanup = new BatchlogCleanup(mutations.size(), () -> asyncRemoveFromBatchlog(replicaPlan, batchUUID));
        // add a handler for each mutation - includes checking availability, but doesn't initiate any writes, yet
        for (Mutation mutation : mutations) {
            if (hasLocalMutation(mutation))
                writeMetrics.localRequests.mark();
            else
                writeMetrics.remoteRequests.mark();
            WriteResponseHandlerWrapper wrapper = wrapBatchResponseHandler(mutation, consistency_level, batchConsistencyLevel, WriteType.BATCH, cleanup, queryStartNanoTime);
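            // wrapBatchResponseHandler performs the availability check, so if the CL cannot be fulfilled at this
            // time we exit here, before any write is initiated.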
            wrappers.add(wrapper);
        }
        // write to the batchlog
        syncWriteToBatchlog(mutations, replicaPlan, batchUUID, queryStartNanoTime);
        // now actually perform the writes and wait for them to complete
        syncWriteBatchedMutations(wrappers, Stage.MUTATION);
    } catch (UnavailableException e) {
        writeMetrics.unavailables.mark();
        writeMetricsForLevel(consistency_level).unavailables.mark();
        Tracing.trace("Unavailable");
        throw e;
    } catch (WriteTimeoutException e) {
        writeMetrics.timeouts.mark();
        writeMetricsForLevel(consistency_level).timeouts.mark();
        Tracing.trace("Write timeout; received {} of {} required replies", e.received, e.blockFor);
        throw e;
    } catch (WriteFailureException e) {
        writeMetrics.failures.mark();
        writeMetricsForLevel(consistency_level).failures.mark();
        Tracing.trace("Write failure; received {} of {} required replies", e.received, e.blockFor);
        throw e;
    } finally {
        long latency = nanoTime() - startTime;
        writeMetrics.addNano(latency);
        writeMetricsForLevel(consistency_level).addNano(latency);
        updateCoordinatorWriteLatencyTableMetric(mutations, latency);
    }
}
Also used : ReplicaPlan(org.apache.cassandra.locator.ReplicaPlan) ArrayList(java.util.ArrayList) UnavailableException(org.apache.cassandra.exceptions.UnavailableException) ConsistencyLevel(org.apache.cassandra.db.ConsistencyLevel) CasWriteTimeoutException(org.apache.cassandra.exceptions.CasWriteTimeoutException) WriteTimeoutException(org.apache.cassandra.exceptions.WriteTimeoutException) WriteFailureException(org.apache.cassandra.exceptions.WriteFailureException) BatchlogCleanup(org.apache.cassandra.service.BatchlogResponseHandler.BatchlogCleanup) Mutation(org.apache.cassandra.db.Mutation) CounterMutation(org.apache.cassandra.db.CounterMutation) IMutation(org.apache.cassandra.db.IMutation) UUID(java.util.UUID)
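The batchlog consistency selection at the top of the try block in mutateAtomically (a ternary followed by a fall-through switch) can be restated as a small pure function. The sketch below is an equivalent reformulation under stated assumptions: BatchConsistencySketch, Level and batchLevel are hypothetical names, and the enum lists only a subset of levels for illustration.

```java
// Hypothetical restatement of the batch consistency level selection.
final class BatchConsistencySketch {

    enum Level { ANY, ONE, QUORUM, EACH_QUORUM, ALL }

    /**
     * Returns the level used for the batch: the requested level if it is already
     * ALL or EACH_QUORUM, otherwise QUORUM when a quorum is required for removal,
     * otherwise the requested level unchanged.
     */
    static Level batchLevel(Level requested, boolean requireQuorumForRemove) {
        if (requested == Level.ALL || requested == Level.EACH_QUORUM)
            return requested;
        return requireQuorumForRemove ? Level.QUORUM : requested;
    }

    public static void main(String[] args) {
        System.out.println(batchLevel(Level.ONE, true));
        System.out.println(batchLevel(Level.ALL, true));
        System.out.println(batchLevel(Level.ONE, false));
    }
}
```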

Aggregations

ReplicaPlan (org.apache.cassandra.locator.ReplicaPlan): 14
Keyspace (org.apache.cassandra.db.Keyspace): 7
AbstractReplicationStrategy (org.apache.cassandra.locator.AbstractReplicationStrategy): 6
EndpointsForToken (org.apache.cassandra.locator.EndpointsForToken): 6
IMutation (org.apache.cassandra.db.IMutation): 5
Replica (org.apache.cassandra.locator.Replica): 5
Token (org.apache.cassandra.dht.Token): 4
ArrayList (java.util.ArrayList): 3
UUID (java.util.UUID): 3
ColumnFamilyStore (org.apache.cassandra.db.ColumnFamilyStore): 3
ConsistencyLevel (org.apache.cassandra.db.ConsistencyLevel): 3
Mutation (org.apache.cassandra.db.Mutation): 3
ReadCommand (org.apache.cassandra.db.ReadCommand): 3
UnavailableException (org.apache.cassandra.exceptions.UnavailableException): 3
List (java.util.List): 2
CounterMutation (org.apache.cassandra.db.CounterMutation): 2
DecoratedKey (org.apache.cassandra.db.DecoratedKey): 2
PartitionRangeReadCommand (org.apache.cassandra.db.PartitionRangeReadCommand): 2
PartitionIterator (org.apache.cassandra.db.partitions.PartitionIterator): 2
CasWriteTimeoutException (org.apache.cassandra.exceptions.CasWriteTimeoutException): 2