
Example 1 with CasWriteTimeoutException

use of org.apache.cassandra.exceptions.CasWriteTimeoutException in project cassandra by apache.

the class StorageProxy method doPaxos.

/**
 * Performs the Paxos rounds for a given proposal, retrying when preempted until the timeout.
 *
 * <p>The main 'configurable' of this method is the {@code createUpdateProposal} method: it is called by the method
 * once a ballot has been successfully 'prepared' to generate the update to 'propose' (and commit if the proposal is
 * successful). That method also generates the result that the whole method will return. Note that due to retrying,
 * this method may be called multiple times and does not have to return the same results.
 *
 * @param metadata the table to update with Paxos.
 * @param key the partition updated.
 * @param consistencyForPaxos the serial consistency of the operation (either {@link ConsistencyLevel#SERIAL} or
 *     {@link ConsistencyLevel#LOCAL_SERIAL}).
 * @param consistencyForReplayCommits the consistency for the commit phase of "replayed" in-progress operations.
 * @param consistencyForCommit the consistency for the commit phase of _this_ operation update.
 * @param queryStartNanoTime the nano time for the start of the query this is part of. This is the base time for
 *     timeouts.
 * @param casMetrics the metrics to update for this operation.
 * @param createUpdateProposal method called after a successful 'prepare' phase to obtain 1) the actual update of
 *     this operation and 2) the result that the whole method should return. This can return {@code null} in the
 *     special case where, after having "prepared" (and thus potentially replayed in-progress updates), we don't want
 *     to propose anything (the whole method then returns {@code null}).
 * @return the second element of the pair returned by {@code createUpdateProposal} (for the last call of that method
 *     if that method is called multiple times due to retries).
 */
private static RowIterator doPaxos(TableMetadata metadata, DecoratedKey key, ConsistencyLevel consistencyForPaxos, ConsistencyLevel consistencyForReplayCommits, ConsistencyLevel consistencyForCommit, long queryStartNanoTime, CASClientRequestMetrics casMetrics, Supplier<Pair<PartitionUpdate, RowIterator>> createUpdateProposal) throws UnavailableException, IsBootstrappingException, RequestFailureException, RequestTimeoutException, InvalidRequestException {
    int contentions = 0;
    Keyspace keyspace = Keyspace.open(metadata.keyspace);
    AbstractReplicationStrategy latestRs = keyspace.getReplicationStrategy();
    try {
        consistencyForPaxos.validateForCas();
        consistencyForReplayCommits.validateForCasCommit(latestRs);
        consistencyForCommit.validateForCasCommit(latestRs);
        long timeoutNanos = DatabaseDescriptor.getCasContentionTimeout(NANOSECONDS);
        while (nanoTime() - queryStartNanoTime < timeoutNanos) {
            // for simplicity, we'll do a single liveness check at the start of each attempt
            ReplicaPlan.ForPaxosWrite replicaPlan = ReplicaPlans.forPaxos(keyspace, key, consistencyForPaxos);
            latestRs = replicaPlan.replicationStrategy();
            PaxosBallotAndContention pair = beginAndRepairPaxos(queryStartNanoTime, key, metadata, replicaPlan, consistencyForPaxos, consistencyForReplayCommits, casMetrics);
            final UUID ballot = pair.ballot;
            contentions += pair.contentions;
            Pair<PartitionUpdate, RowIterator> proposalPair = createUpdateProposal.get();
            // See method javadoc: null here is code for "stop here and return null".
            if (proposalPair == null)
                return null;
            Commit proposal = Commit.newProposal(ballot, proposalPair.left);
            Tracing.trace("CAS precondition is met; proposing client-requested updates for {}", ballot);
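            if (proposePaxos(proposal, replicaPlan, true, queryStartNanoTime)) {
                // Skip committing accepted updates when they are empty. This works because empty in-progress
                // updates are not replayed by beginAndRepairPaxos either. Empty updates are fairly common (serial
                // reads and CAS operations whose condition does not apply propose them), so this is worth bothering.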
                if (!proposal.update.isEmpty())
                    commitPaxos(proposal, consistencyForCommit, true, queryStartNanoTime);
                RowIterator result = proposalPair.right;
                if (result != null)
                    Tracing.trace("CAS did not apply");
                else
                    Tracing.trace("CAS applied successfully");
                return result;
            }
            Tracing.trace("Paxos proposal not accepted (pre-empted by a higher ballot)");
            contentions++;
            Uninterruptibles.sleepUninterruptibly(ThreadLocalRandom.current().nextInt(100), TimeUnit.MILLISECONDS);
            // continue to retry
        }
    } catch (CasWriteTimeoutException e) {
        // Might be thrown by beginAndRepairPaxos. In that case, any contention that happened within that method and
        // led up to the timeout was not accounted for in our local 'contentions' variable, so we add it now so that
        // the contention recorded in the finally block is correct.
        contentions += e.contentions;
        throw e;
    } catch (WriteTimeoutException e) {
        // Might be thrown by proposePaxos or commitPaxos
        throw new CasWriteTimeoutException(e.writeType, e.consistency, e.received, e.blockFor, contentions);
    } finally {
        recordCasContention(metadata, key, casMetrics, contentions);
    }
    throw new CasWriteTimeoutException(WriteType.CAS, consistencyForPaxos, 0, consistencyForPaxos.blockFor(latestRs), contentions);
}
Also used : ReplicaPlan(org.apache.cassandra.locator.ReplicaPlan) Hint(org.apache.cassandra.hints.Hint) CasWriteTimeoutException(org.apache.cassandra.exceptions.CasWriteTimeoutException) WriteTimeoutException(org.apache.cassandra.exceptions.WriteTimeoutException) Keyspace(org.apache.cassandra.db.Keyspace) RowIterator(org.apache.cassandra.db.rows.RowIterator) AbstractReplicationStrategy(org.apache.cassandra.locator.AbstractReplicationStrategy) UUID(java.util.UUID) PartitionUpdate(org.apache.cassandra.db.partitions.PartitionUpdate)
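The retry structure of doPaxos can be modelled outside Cassandra: prepare, ask the supplier for a proposal (null means stop), propose, and back off randomly when pre-empted, all bounded by the contention timeout, with the contention total always recorded in the finally block. The following is a minimal sketch under stated assumptions. PaxosRetrySketch, Replicas, ContentionTimeout and the recorder callback are hypothetical stand-ins, not Cassandra classes, and contentions counted inside a successful prepare phase are only modelled on the timeout path.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;
import java.util.function.Supplier;

// Hypothetical, simplified model of the doPaxos retry loop. None of these types are Cassandra APIs.
class PaxosRetrySketch {

    /** Timeout that carries the number of contentions observed before it was raised. */
    static class ContentionTimeout extends RuntimeException {
        final int contentions;

        ContentionTimeout(int contentions) {
            super("CAS timed out after " + contentions + " contention(s)");
            this.contentions = contentions;
        }
    }

    /** Stand-in for the replica round trips. */
    interface Replicas {
        /** Prepare phase; may throw ContentionTimeout carrying its own contention count. */
        void prepare();

        /** Propose phase; returns false when pre-empted by a higher ballot. */
        boolean propose();
    }

    /**
     * Retries prepare/propose until a proposal is accepted or timeoutNanos have elapsed since
     * startNanos. A null from the proposer means "stop and return null". The total contention
     * count is always passed to the recorder, mirroring the finally block of doPaxos.
     */
    static <R> R run(long startNanos, long timeoutNanos, Replicas replicas, Supplier<R> proposer,
                     int maxBackoffMillis, IntConsumer recorder) {
        int contentions = 0;
        try {
            while (System.nanoTime() - startNanos < timeoutNanos) {
                replicas.prepare();
                R result = proposer.get();
                if (result == null)
                    return null;
                if (replicas.propose())
                    return result;
                contentions++; // pre-empted by a higher ballot: back off, then retry
                if (maxBackoffMillis > 0)
                    sleepQuietly(ThreadLocalRandom.current().nextInt(maxBackoffMillis));
            }
        } catch (ContentionTimeout e) {
            // Contentions counted inside the prepare phase were not seen by this loop; add them now.
            contentions += e.contentions;
            throw e;
        } finally {
            recorder.accept(contentions);
        }
        throw new ContentionTimeout(contentions);
    }

    /** Sleeps; on interruption it restores the interrupt flag and returns early. */
    private static void sleepQuietly(long millis) {
        try {
            TimeUnit.MILLISECONDS.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Passing 0 as maxBackoffMillis disables the random sleep so a quick test stays fast; doPaxos itself sleeps up to 100 ms between attempts.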

Example 2 with CasWriteTimeoutException

use of org.apache.cassandra.exceptions.CasWriteTimeoutException in project cassandra by apache.

the class StorageProxy method cas.

/**
 * Apply the updates of {@code request} if and only if the current values in the row for {@code key}
 * match the conditions of {@code request}.  The algorithm is "raw" Paxos: that is, Paxos
 * minus leader election -- any node in the cluster may propose changes for any row,
 * and the row (not single columns) is the unit of values being proposed.
 *
 * The Paxos cohort is only the replicas for the given key, not the entire cluster.
 * So we expect performance to be reasonable, but CAS is still intended to be used
 * "when you really need it," not for all your updates.
 *
 * There are three phases to Paxos:
 *  1. Prepare: the coordinator generates a ballot (timeUUID in our case) and asks replicas to (a) promise
 *     not to accept updates from older ballots and (b) tell us about the most recent update they have
 *     already accepted.
 *  2. Accept: if a majority of replicas respond, the coordinator asks replicas to accept the value of the
 *     highest proposal ballot it heard about, or a new value if no in-progress proposals were reported.
 *  3. Commit (Learn): if a majority of replicas acknowledge the accept request, we can commit the new
 *     value.
 *
 *  The commit procedure is not covered in "Paxos Made Simple," and only briefly mentioned in "Paxos Made Live,"
 *  so here is our approach:
 *   3a. The coordinator sends a commit message to all replicas with the ballot and value.
 *   3b. Because of 1-2, this will be the highest-seen commit ballot.  The replicas will note that,
 *       and send it with subsequent promise replies.  This allows us to discard acceptance records
 *       for successfully committed replicas, without allowing incomplete proposals to commit erroneously
 *       later on.
 *
 *  Note that since we are performing a CAS rather than a simple update, we perform a read (of committed
 *  values) between the prepare and accept phases.  This gives us a slightly longer window for another
 *  coordinator to come along and trump our own promise with a newer one but is otherwise safe.
 *
 * @param keyspaceName the keyspace for the CAS
 * @param cfName the column family for the CAS
 * @param key the row key for the row to CAS
 * @param request the conditions for the CAS to apply as well as the update to perform if the conditions hold.
 * @param consistencyForPaxos the consistency for the paxos prepare and propose round. This can only be either SERIAL or LOCAL_SERIAL.
 * @param consistencyForCommit the consistency for write done during the commit phase. This can be anything, except SERIAL or LOCAL_SERIAL.
 *
 * @return null if the operation succeeds in updating the row; otherwise, the current values corresponding to the
 * conditions (if the CAS does not succeed, it means the current values do not match the conditions).
 */
public static RowIterator cas(String keyspaceName, String cfName, DecoratedKey key, CASRequest request, ConsistencyLevel consistencyForPaxos, ConsistencyLevel consistencyForCommit, ClientState state, int nowInSeconds, long queryStartNanoTime) throws UnavailableException, IsBootstrappingException, RequestFailureException, RequestTimeoutException, InvalidRequestException, CasWriteUnknownResultException {
    final long startTimeForMetrics = nanoTime();
    try {
        TableMetadata metadata = Schema.instance.validateTable(keyspaceName, cfName);
        if (DatabaseDescriptor.getPartitionDenylistEnabled() && DatabaseDescriptor.getDenylistWritesEnabled() && !partitionDenylist.isKeyPermitted(keyspaceName, cfName, key.getKey())) {
            denylistMetrics.incrementWritesRejected();
            throw new InvalidRequestException(String.format("Unable to CAS write to denylisted partition [0x%s] in %s/%s", key.toString(), keyspaceName, cfName));
        }
        Supplier<Pair<PartitionUpdate, RowIterator>> updateProposer = () -> {
            // read the current values and check they validate the conditions
            Tracing.trace("Reading existing values for CAS precondition");
            SinglePartitionReadCommand readCommand = (SinglePartitionReadCommand) request.readCommand(nowInSeconds);
            ConsistencyLevel readConsistency = consistencyForPaxos == ConsistencyLevel.LOCAL_SERIAL ? ConsistencyLevel.LOCAL_QUORUM : ConsistencyLevel.QUORUM;
            FilteredPartition current;
            try (RowIterator rowIter = readOne(readCommand, readConsistency, queryStartNanoTime)) {
                current = FilteredPartition.create(rowIter);
            }
            if (!request.appliesTo(current)) {
                Tracing.trace("CAS precondition does not match current values {}", current);
                casWriteMetrics.conditionNotMet.inc();
                return Pair.create(PartitionUpdate.emptyUpdate(metadata, key), current.rowIterator());
            }
            // Create the desired updates
            PartitionUpdate updates = request.makeUpdates(current, state);
            long size = updates.dataSize();
            casWriteMetrics.mutationSize.update(size);
            writeMetricsForLevel(consistencyForPaxos).mutationSize.update(size);
            // Apply triggers to cas updates. A consideration here is that
            // triggers emit Mutations, and so a given trigger implementation
            // may generate mutations for partitions other than the one this
            // paxos round is scoped for. In this case, TriggerExecutor will
            // validate that the generated mutations are targeted at the same
            // partition as the initial updates and reject (via an
            // InvalidRequestException) any which aren't.
            updates = TriggerExecutor.instance.execute(updates);
            return Pair.create(updates, null);
        };
        return doPaxos(metadata, key, consistencyForPaxos, consistencyForCommit, consistencyForCommit, queryStartNanoTime, casWriteMetrics, updateProposer);
    } catch (CasWriteUnknownResultException e) {
        casWriteMetrics.unknownResult.mark();
        throw e;
    } catch (CasWriteTimeoutException wte) {
        casWriteMetrics.timeouts.mark();
        writeMetricsForLevel(consistencyForPaxos).timeouts.mark();
        throw new CasWriteTimeoutException(wte.writeType, wte.consistency, wte.received, wte.blockFor, wte.contentions);
    } catch (ReadTimeoutException e) {
        casWriteMetrics.timeouts.mark();
        writeMetricsForLevel(consistencyForPaxos).timeouts.mark();
        throw e;
    } catch (ReadAbortException e) {
        casWriteMetrics.markAbort(e);
        writeMetricsForLevel(consistencyForPaxos).markAbort(e);
        throw e;
    } catch (WriteFailureException | ReadFailureException e) {
        casWriteMetrics.failures.mark();
        writeMetricsForLevel(consistencyForPaxos).failures.mark();
        throw e;
    } catch (UnavailableException e) {
        casWriteMetrics.unavailables.mark();
        writeMetricsForLevel(consistencyForPaxos).unavailables.mark();
        throw e;
    } finally {
        final long latency = nanoTime() - startTimeForMetrics;
        casWriteMetrics.addNano(latency);
        writeMetricsForLevel(consistencyForPaxos).addNano(latency);
    }
}
Also used : TableMetadata(org.apache.cassandra.schema.TableMetadata) ReadFailureException(org.apache.cassandra.exceptions.ReadFailureException) ReadTimeoutException(org.apache.cassandra.exceptions.ReadTimeoutException) SinglePartitionReadCommand(org.apache.cassandra.db.SinglePartitionReadCommand) UnavailableException(org.apache.cassandra.exceptions.UnavailableException) FilteredPartition(org.apache.cassandra.db.partitions.FilteredPartition) ReadAbortException(org.apache.cassandra.exceptions.ReadAbortException) CasWriteUnknownResultException(org.apache.cassandra.exceptions.CasWriteUnknownResultException) ConsistencyLevel(org.apache.cassandra.db.ConsistencyLevel) WriteFailureException(org.apache.cassandra.exceptions.WriteFailureException) RowIterator(org.apache.cassandra.db.rows.RowIterator) InvalidRequestException(org.apache.cassandra.exceptions.InvalidRequestException) CasWriteTimeoutException(org.apache.cassandra.exceptions.CasWriteTimeoutException) PartitionUpdate(org.apache.cassandra.db.partitions.PartitionUpdate) Pair(org.apache.cassandra.utils.Pair)
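The three phases described in the cas() javadoc can be illustrated with a toy, in-memory model. This is a hypothetical sketch assuming a single partition, long ballots instead of timeUUIDs, and synchronous, reliable calls to every replica. It shows promises rejecting older ballots, a coordinator finishing an in-progress proposal it learns about during prepare, and commits clearing acceptance records. It does not model the CAS read or the condition check, and none of its names are Cassandra APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of single-decree Paxos for one partition. Ballots are plain longs
// here, whereas Cassandra uses time-based UUIDs.
class PaxosAcceptorSketch {

    /** A ballot plus the value proposed under it. */
    record Proposal(long ballot, String value) {}

    /** Reply to prepare: whether the ballot was promised, plus the replica's latest accepted proposal. */
    record Promise(boolean promised, Proposal accepted) {}

    static final class Acceptor {
        long promisedBallot = Long.MIN_VALUE;
        Proposal accepted;  // accepted but not yet known to be committed (the "in-progress" proposal)
        Proposal committed; // most recent commit learned

        /** Phase 1: promise to ignore older ballots and report what was already accepted. */
        Promise prepare(long ballot) {
            if (ballot <= promisedBallot)
                return new Promise(false, accepted);
            promisedBallot = ballot;
            return new Promise(true, accepted);
        }

        /** Phase 2: accept unless a higher ballot has been promised in the meantime. */
        boolean accept(Proposal proposal) {
            if (proposal.ballot() < promisedBallot)
                return false;
            promisedBallot = proposal.ballot();
            accepted = proposal;
            return true;
        }

        /** Phase 3: learn the committed value; the matching acceptance record can be discarded. */
        void commit(Proposal proposal) {
            if (committed == null || proposal.ballot() > committed.ballot())
                committed = proposal;
            if (accepted != null && accepted.ballot() <= proposal.ballot())
                accepted = null;
        }
    }

    /**
     * One coordinator round needing a majority at prepare and at accept. Returns the value committed
     * in this round (ours, or an in-progress value it had to finish), or null if it was pre-empted.
     */
    static String round(List<Acceptor> acceptors, long ballot, String ourValue) {
        int quorum = acceptors.size() / 2 + 1;
        List<Promise> promises = new ArrayList<>();
        for (Acceptor acceptor : acceptors) {
            Promise promise = acceptor.prepare(ballot);
            if (promise.promised())
                promises.add(promise);
        }
        if (promises.size() < quorum)
            return null;

        // If any replica reports an in-progress proposal, re-propose the newest one under our ballot.
        Proposal inProgress = null;
        for (Promise promise : promises) {
            Proposal p = promise.accepted();
            if (p != null && (inProgress == null || p.ballot() > inProgress.ballot()))
                inProgress = p;
        }
        Proposal proposal = new Proposal(ballot, inProgress != null ? inProgress.value() : ourValue);

        int accepts = 0;
        for (Acceptor acceptor : acceptors)
            if (acceptor.accept(proposal))
                accepts++;
        if (accepts < quorum)
            return null;

        for (Acceptor acceptor : acceptors)
            acceptor.commit(proposal);
        return proposal.value();
    }
}
```

As in beginAndRepairPaxos, a round that discovers an in-progress value commits that value rather than its own; the caller then needs another round, with a newer ballot, to get its own value in.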

Example 3 with CasWriteTimeoutException

use of org.apache.cassandra.exceptions.CasWriteTimeoutException in project cassandra by apache.

the class ErrorMessageTest method testV4CasWriteTimeoutSerDeser.

@Test
public void testV4CasWriteTimeoutSerDeser() {
    int contentions = 1;
    int receivedBlockFor = 3;
    ConsistencyLevel consistencyLevel = ConsistencyLevel.SERIAL;
    CasWriteTimeoutException ex = new CasWriteTimeoutException(WriteType.CAS, consistencyLevel, receivedBlockFor, receivedBlockFor, contentions);
    ErrorMessage deserialized = encodeThenDecode(ErrorMessage.fromException(ex), ProtocolVersion.V4);
    assertTrue(deserialized.error instanceof WriteTimeoutException);
    assertFalse(deserialized.error instanceof CasWriteTimeoutException);
    WriteTimeoutException deserializedEx = (WriteTimeoutException) deserialized.error;
    assertEquals(WriteType.CAS, deserializedEx.writeType);
    assertEquals(consistencyLevel, deserializedEx.consistency);
    assertEquals(receivedBlockFor, deserializedEx.received);
    assertEquals(receivedBlockFor, deserializedEx.blockFor);
}
Also used : ConsistencyLevel(org.apache.cassandra.db.ConsistencyLevel) CasWriteTimeoutException(org.apache.cassandra.exceptions.CasWriteTimeoutException) WriteTimeoutException(org.apache.cassandra.exceptions.WriteTimeoutException) ErrorMessage(org.apache.cassandra.transport.messages.ErrorMessage) Test(org.junit.Test)
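The V4 test above checks that a CasWriteTimeoutException decodes as a plain WriteTimeoutException, while the V5 test in Example 5 checks that the contention count survives. A minimal model of why this happens, assuming the contention count is simply an extra field that only V5 and later encode for CAS writes. The class, the record and the field layout are hypothetical illustrations, not the native protocol's exact wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical model of a version-dependent error encoding. Field order and sizes are illustrative.
class CasTimeoutCodecSketch {

    /** A write timeout; contentions is -1 when the encoding could not carry it. */
    record WriteTimeout(String writeType, int received, int blockFor, int contentions) {
        boolean hasContentions() {
            return contentions >= 0;
        }
    }

    /** Assumed rule: only protocol V5+ carries the contention count, and only for CAS writes. */
    static boolean carriesContentions(String writeType, int protocolVersion) {
        return protocolVersion >= 5 && writeType.equals("CAS");
    }

    static byte[] encode(WriteTimeout timeout, int protocolVersion) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(timeout.writeType());
            out.writeInt(timeout.received());
            out.writeInt(timeout.blockFor());
            if (carriesContentions(timeout.writeType(), protocolVersion))
                out.writeShort(timeout.contentions());
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static WriteTimeout decode(byte[] data, int protocolVersion) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            String writeType = in.readUTF();
            int received = in.readInt();
            int blockFor = in.readInt();
            // With older protocol versions there is no field to read, so the contention count is lost.
            int contentions = carriesContentions(writeType, protocolVersion) ? in.readShort() : -1;
            return new WriteTimeout(writeType, received, blockFor, contentions);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```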

Example 4 with CasWriteTimeoutException

use of org.apache.cassandra.exceptions.CasWriteTimeoutException in project cassandra by apache.

the class StorageProxy method beginAndRepairPaxos.

/**
 * Begin a Paxos session by sending a prepare request and completing any in-progress requests seen in the replies.
 *
 * @return the Paxos ballot promised by the replicas (together with the number of contentions encountered) once no
 * in-progress requests were seen and a quorum of nodes have seen the mostRecentCommit. If this does not happen
 * before the CAS contention timeout, a CasWriteTimeoutException is thrown instead.
 */
private static PaxosBallotAndContention beginAndRepairPaxos(long queryStartNanoTime, DecoratedKey key, TableMetadata metadata, ReplicaPlan.ForPaxosWrite paxosPlan, ConsistencyLevel consistencyForPaxos, ConsistencyLevel consistencyForCommit, CASClientRequestMetrics casMetrics) throws WriteTimeoutException, WriteFailureException {
    long timeoutNanos = DatabaseDescriptor.getCasContentionTimeout(NANOSECONDS);
    PrepareCallback summary = null;
    int contentions = 0;
    while (nanoTime() - queryStartNanoTime < timeoutNanos) {
        // We want a timestamp that is guaranteed to be unique for that node (so that the ballot is globally unique), but if we've got a prepare rejected
        // already we also want to make sure we pick a timestamp that has a chance to be promised, i.e. one that is greater than the most recently known
        // in progress (#5667). Lastly, we don't want to use a timestamp that is older than the last one assigned by ClientState or operations may appear
        // out-of-order (#7801).
        long minTimestampMicrosToUse = summary == null ? Long.MIN_VALUE : 1 + UUIDGen.microsTimestamp(summary.mostRecentInProgressCommit.ballot);
        long ballotMicros = nextBallotTimestampMicros(minTimestampMicrosToUse);
        // Note that ballotMicros is not guaranteed to be unique if two proposals are being handled concurrently by the same coordinator. But we still
        // need ballots to be unique for each proposal so we have to use getRandomTimeUUIDFromMicros.
        UUID ballot = randomBallot(ballotMicros, consistencyForPaxos == SERIAL);
        // prepare
        try {
            Tracing.trace("Preparing {}", ballot);
            Commit toPrepare = Commit.newPrepare(key, metadata, ballot);
            summary = preparePaxos(toPrepare, paxosPlan, queryStartNanoTime);
            if (!summary.promised) {
                Tracing.trace("Some replicas have already promised a higher ballot than ours; aborting");
                contentions++;
                // sleep a random amount to give the other proposer a chance to finish
                Uninterruptibles.sleepUninterruptibly(ThreadLocalRandom.current().nextInt(100), MILLISECONDS);
                continue;
            }
            Commit inProgress = summary.mostRecentInProgressCommit;
            Commit mostRecent = summary.mostRecentCommit;
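            // If we have an in-progress update newer than the most recent commit we know of, it belongs to a round
            // that still has to be completed, so complete it. Empty in-progress updates are skipped: there is nothing
            // to apply to storage for them, skipping them is safe, and doing so is more efficient, so we do so.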
            if (!inProgress.update.isEmpty() && inProgress.isAfter(mostRecent)) {
                Tracing.trace("Finishing incomplete paxos round {}", inProgress);
                casMetrics.unfinishedCommit.inc();
                Commit refreshedInProgress = Commit.newProposal(ballot, inProgress.update);
                if (proposePaxos(refreshedInProgress, paxosPlan, false, queryStartNanoTime)) {
                    commitPaxos(refreshedInProgress, consistencyForCommit, false, queryStartNanoTime);
                } else {
                    Tracing.trace("Some replicas have already promised a higher ballot than ours; aborting");
                    // sleep a random amount to give the other proposer a chance to finish
                    contentions++;
                    Uninterruptibles.sleepUninterruptibly(ThreadLocalRandom.current().nextInt(100), MILLISECONDS);
                }
                continue;
            }
            // To be able to propose our value on a new round, we need a quorum of replicas to have learned the previous one. Why is explained at:
            // https://issues.apache.org/jira/browse/CASSANDRA-5062?focusedCommentId=13619810&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13619810
            // Since we waited for quorum nodes, if some of them haven't seen the last commit (which may just be a timing issue, but may also
            // mean we lost messages), we proactively "repair" those nodes, and retry.
            int nowInSec = Ints.checkedCast(TimeUnit.MICROSECONDS.toSeconds(ballotMicros));
            Iterable<InetAddressAndPort> missingMRC = summary.replicasMissingMostRecentCommit(metadata, nowInSec);
            if (Iterables.size(missingMRC) > 0) {
                Tracing.trace("Repairing replicas that missed the most recent commit");
                sendCommit(mostRecent, missingMRC);
                // Retry for now. Once commits can block until they are acknowledged, we can pass CL.ALL to the
                // commit above and remove the 'continue'.
                continue;
            }
            return new PaxosBallotAndContention(ballot, contentions);
        } catch (WriteTimeoutException e) {
            // We're still doing preparation for the paxos rounds, so we want to use the CAS write type (see CASSANDRA-8672)
            throw new CasWriteTimeoutException(WriteType.CAS, e.consistency, e.received, e.blockFor, contentions);
        }
    }
    throw new CasWriteTimeoutException(WriteType.CAS, consistencyForPaxos, 0, consistencyForPaxos.blockFor(paxosPlan.replicationStrategy()), contentions);
}
Also used : InetAddressAndPort(org.apache.cassandra.locator.InetAddressAndPort) CasWriteTimeoutException(org.apache.cassandra.exceptions.CasWriteTimeoutException) WriteTimeoutException(org.apache.cassandra.exceptions.WriteTimeoutException) UUID(java.util.UUID) Hint(org.apache.cassandra.hints.Hint)
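The comment at the top of the beginAndRepairPaxos loop gives two constraints on the ballot timestamp: after a rejected prepare it must exceed the most recent in-progress ballot (#5667), and it must not be older than timestamps already handed out (#7801). Below is a hypothetical sketch of a clock meeting both constraints. BallotClockSketch and its methods are illustrations, not Cassandra's nextBallotTimestampMicros, and uniqueness across concurrent proposals (which the source gets from the random part of the ballot UUID) is not modelled.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch of choosing a ballot timestamp under the two constraints described above.
class BallotClockSketch {
    private final AtomicLong lastMicros = new AtomicLong(Long.MIN_VALUE);
    private final LongSupplier nowMicros;

    BallotClockSketch(LongSupplier nowMicros) {
        this.nowMicros = nowMicros;
    }

    /**
     * Returns a timestamp that is >= minMicros (so it can beat a known in-progress ballot) and
     * strictly greater than any value this clock returned before (so operations stay ordered).
     */
    long next(long minMicros) {
        while (true) {
            long last = lastMicros.get();
            long candidate = Math.max(nowMicros.getAsLong(), minMicros);
            if (candidate <= last)
                candidate = last + 1;
            if (lastMicros.compareAndSet(last, candidate)) // retry if another thread raced us
                return candidate;
        }
    }

    /** Lower bound for the next ballot: none at first, else just past the in-progress ballot. */
    static long minAfterRejection(Long inProgressBallotMicros) {
        return inProgressBallotMicros == null ? Long.MIN_VALUE : inProgressBallotMicros + 1;
    }
}
```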

Example 5 with CasWriteTimeoutException

use of org.apache.cassandra.exceptions.CasWriteTimeoutException in project cassandra by apache.

the class ErrorMessageTest method testV5CasWriteTimeoutSerDeser.

@Test
public void testV5CasWriteTimeoutSerDeser() {
    int contentions = 1;
    int receivedBlockFor = 3;
    ConsistencyLevel consistencyLevel = ConsistencyLevel.SERIAL;
    CasWriteTimeoutException ex = new CasWriteTimeoutException(WriteType.CAS, consistencyLevel, receivedBlockFor, receivedBlockFor, contentions);
    ErrorMessage deserialized = encodeThenDecode(ErrorMessage.fromException(ex), ProtocolVersion.V5);
    assertTrue(deserialized.error instanceof CasWriteTimeoutException);
    CasWriteTimeoutException deserializedEx = (CasWriteTimeoutException) deserialized.error;
    assertEquals(WriteType.CAS, deserializedEx.writeType);
    assertEquals(contentions, deserializedEx.contentions);
    assertEquals(consistencyLevel, deserializedEx.consistency);
    assertEquals(receivedBlockFor, deserializedEx.received);
    assertEquals(receivedBlockFor, deserializedEx.blockFor);
    assertEquals(ex.getMessage(), deserializedEx.getMessage());
    assertTrue(deserializedEx.getMessage().contains("CAS operation timed out - encountered contentions"));
}
Also used : ConsistencyLevel(org.apache.cassandra.db.ConsistencyLevel) ErrorMessage(org.apache.cassandra.transport.messages.ErrorMessage) CasWriteTimeoutException(org.apache.cassandra.exceptions.CasWriteTimeoutException) Test(org.junit.Test)

Aggregations

CasWriteTimeoutException (org.apache.cassandra.exceptions.CasWriteTimeoutException): 5
ConsistencyLevel (org.apache.cassandra.db.ConsistencyLevel): 3
WriteTimeoutException (org.apache.cassandra.exceptions.WriteTimeoutException): 3
UUID (java.util.UUID): 2
PartitionUpdate (org.apache.cassandra.db.partitions.PartitionUpdate): 2
RowIterator (org.apache.cassandra.db.rows.RowIterator): 2
Hint (org.apache.cassandra.hints.Hint): 2
ErrorMessage (org.apache.cassandra.transport.messages.ErrorMessage): 2
Test (org.junit.Test): 2
Keyspace (org.apache.cassandra.db.Keyspace): 1
SinglePartitionReadCommand (org.apache.cassandra.db.SinglePartitionReadCommand): 1
FilteredPartition (org.apache.cassandra.db.partitions.FilteredPartition): 1
CasWriteUnknownResultException (org.apache.cassandra.exceptions.CasWriteUnknownResultException): 1
InvalidRequestException (org.apache.cassandra.exceptions.InvalidRequestException): 1
ReadAbortException (org.apache.cassandra.exceptions.ReadAbortException): 1
ReadFailureException (org.apache.cassandra.exceptions.ReadFailureException): 1
ReadTimeoutException (org.apache.cassandra.exceptions.ReadTimeoutException): 1
UnavailableException (org.apache.cassandra.exceptions.UnavailableException): 1
WriteFailureException (org.apache.cassandra.exceptions.WriteFailureException): 1
AbstractReplicationStrategy (org.apache.cassandra.locator.AbstractReplicationStrategy): 1