
Example 1 with PrepareCallback

Use of org.apache.cassandra.service.paxos.PrepareCallback in project cassandra by apache.

Class StorageProxy, method beginAndRepairPaxos.

/**
 * Begin a Paxos session by sending a prepare request and completing any in-progress requests seen in the replies.
 *
 * @return a pair of the Paxos ballot promised by the replicas and the number of contentions encountered,
 * returned once no in-progress requests were seen and a quorum of nodes have seen the mostRecentCommit.
 * @throws WriteTimeoutException if the CAS contention timeout elapses before that happens.
 */
private static Pair<UUID, Integer> beginAndRepairPaxos(long queryStartNanoTime, DecoratedKey key, TableMetadata metadata, List<InetAddress> liveEndpoints, int requiredParticipants, ConsistencyLevel consistencyForPaxos, ConsistencyLevel consistencyForCommit, final boolean isWrite, ClientState state) throws WriteTimeoutException, WriteFailureException {
    long timeout = TimeUnit.MILLISECONDS.toNanos(DatabaseDescriptor.getCasContentionTimeout());
    PrepareCallback summary = null;
    int contentions = 0;
    while (System.nanoTime() - queryStartNanoTime < timeout) {
        // We want a timestamp that is guaranteed to be unique for that node (so that the ballot is globally unique), but if we've had a prepare rejected
        // already we also want to make sure we pick a timestamp that has a chance to be promised, i.e. one that is greater than the most recently known
        // in-progress ballot (#5667). Lastly, we don't want to use a timestamp that is older than the last one assigned by ClientState or operations may
        // appear out-of-order (#7801).
        long minTimestampMicrosToUse = summary == null ? Long.MIN_VALUE : 1 + UUIDGen.microsTimestamp(summary.mostRecentInProgressCommit.ballot);
        long ballotMicros = state.getTimestampForPaxos(minTimestampMicrosToUse);
        // Note that ballotMicros is not guaranteed to be unique if two proposals are being handled concurrently by the same coordinator. But we still
        // need ballots to be unique for each proposal, so we have to use getRandomTimeUUIDFromMicros.
        UUID ballot = UUIDGen.getRandomTimeUUIDFromMicros(ballotMicros);
        // prepare
        Tracing.trace("Preparing {}", ballot);
        Commit toPrepare = Commit.newPrepare(key, metadata, ballot);
        summary = preparePaxos(toPrepare, liveEndpoints, requiredParticipants, consistencyForPaxos, queryStartNanoTime);
        if (!summary.promised) {
            Tracing.trace("Some replicas have already promised a higher ballot than ours; aborting");
            contentions++;
            // sleep a random amount to give the other proposer a chance to finish
            Uninterruptibles.sleepUninterruptibly(ThreadLocalRandom.current().nextInt(100), TimeUnit.MILLISECONDS);
            continue;
        }
        Commit inProgress = summary.mostRecentInProgressCommitWithUpdate;
        Commit mostRecent = summary.mostRecentCommit;
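        // If the in-progress ballot carries an update and is newer than the most recent commit we know of, it is an
        // unfinished round that needs to be completed, so do it.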
        if (!inProgress.update.isEmpty() && inProgress.isAfter(mostRecent)) {
            Tracing.trace("Finishing incomplete paxos round {}", inProgress);
            if (isWrite)
                casWriteMetrics.unfinishedCommit.inc();
            else
                casReadMetrics.unfinishedCommit.inc();
            Commit refreshedInProgress = Commit.newProposal(ballot, inProgress.update);
            if (proposePaxos(refreshedInProgress, liveEndpoints, requiredParticipants, false, consistencyForPaxos, queryStartNanoTime)) {
                try {
                    commitPaxos(refreshedInProgress, consistencyForCommit, false, queryStartNanoTime);
                } catch (WriteTimeoutException e) {
                    recordCasContention(contentions);
                    // We're still doing preparation for the Paxos rounds, so we want to report the CAS write type (see CASSANDRA-8672)
                    throw new WriteTimeoutException(WriteType.CAS, e.consistency, e.received, e.blockFor);
                }
            } else {
                Tracing.trace("Some replicas have already promised a higher ballot than ours; aborting");
                // sleep a random amount to give the other proposer a chance to finish
                contentions++;
                Uninterruptibles.sleepUninterruptibly(ThreadLocalRandom.current().nextInt(100), TimeUnit.MILLISECONDS);
            }
            continue;
        }
        // To be able to propose our value on a new round, we need a quorum of replicas to have learned the previous one. Why is explained at:
        // https://issues.apache.org/jira/browse/CASSANDRA-5062?focusedCommentId=13619810&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13619810
        // Since we waited for a quorum of nodes, if some of them haven't seen the last commit (which may just be a timing issue, but may also
        // mean we lost messages), we proactively "repair" those nodes, and retry.
        int nowInSec = Ints.checkedCast(TimeUnit.MICROSECONDS.toSeconds(ballotMicros));
        Iterable<InetAddress> missingMRC = summary.replicasMissingMostRecentCommit(metadata, nowInSec);
        if (Iterables.size(missingMRC) > 0) {
            Tracing.trace("Repairing replicas that missed the most recent commit");
            sendCommit(mostRecent, missingMRC);
            // latter ticket, we can pass CL.ALL to the commit above and remove the 'continue'.
            continue;
        }
        return Pair.create(ballot, contentions);
    }
    recordCasContention(contentions);
    throw new WriteTimeoutException(WriteType.CAS, consistencyForPaxos, 0, consistencyForPaxos.blockFor(Keyspace.open(metadata.keyspace)));
}
Also used : Commit(org.apache.cassandra.service.paxos.Commit) PrepareCallback(org.apache.cassandra.service.paxos.PrepareCallback) InetAddress(java.net.InetAddress) Hint(org.apache.cassandra.hints.Hint)
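
The comments in the example above describe three constraints on the ballot timestamp: it should not repeat on this node, it should exceed the most recent in-progress ballot seen in a rejected prepare, and it should never be older than the last timestamp handed out. The following is a minimal sketch of that selection rule only. The class PaxosTimestampSketch and its method next are hypothetical names introduced for illustration; this is not the body of ClientState.getTimestampForPaxos, and the wall-clock reading is passed in as a parameter so the behaviour is deterministic.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of the timestamp constraints described in the
// comments of beginAndRepairPaxos. Not the actual Cassandra implementation.
class PaxosTimestampSketch {
    // Last timestamp (in microseconds) handed out by this instance.
    private final AtomicLong lastMicros = new AtomicLong(Long.MIN_VALUE);

    /**
     * Picks a timestamp that is strictly greater than every timestamp returned
     * before, and at least max(nowMicros, minMicros).
     *
     * @param minMicros lower bound, e.g. 1 + the timestamp of the most recent
     *                  in-progress ballot, or Long.MIN_VALUE if there is none
     * @param nowMicros current wall-clock time in microseconds (supplied by the caller)
     */
    public long next(long minMicros, long nowMicros) {
        return lastMicros.updateAndGet(last -> {
            long candidate = Math.max(nowMicros, minMicros);
            // If the clock has not advanced past the last value, bump by one
            // so successive calls never return the same timestamp.
            return candidate > last ? candidate : last + 1;
        });
    }
}
```

With a fixed clock reading of 1000, two calls with no lower bound return 1000 and then 1001; a later call with a lower bound of 5000 returns 5000, and any call after that returns at least 5001 even if the clock reading is smaller.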

Example 2 with PrepareCallback

Use of org.apache.cassandra.service.paxos.PrepareCallback in project cassandra by apache.

Class StorageProxy, method preparePaxos.

private static PrepareCallback preparePaxos(Commit toPrepare, List<InetAddress> endpoints, int requiredParticipants, ConsistencyLevel consistencyForPaxos, long queryStartNanoTime) throws WriteTimeoutException {
    PrepareCallback callback = new PrepareCallback(toPrepare.update.partitionKey(), toPrepare.update.metadata(), requiredParticipants, consistencyForPaxos, queryStartNanoTime);
    MessageOut<Commit> message = new MessageOut<Commit>(MessagingService.Verb.PAXOS_PREPARE, toPrepare, Commit.serializer);
    // Send the prepare request to every endpoint; replies are collected by the shared callback.
    for (InetAddress target : endpoints) {
        MessagingService.instance().sendRR(message, target, callback);
    }
    callback.await();
    return callback;
}
Also used : Commit(org.apache.cassandra.service.paxos.Commit) PrepareCallback(org.apache.cassandra.service.paxos.PrepareCallback) InetAddress(java.net.InetAddress)
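
The method above follows a broadcast-then-wait pattern: one callback object is registered for every outgoing message, and the caller blocks until enough replies have arrived. A minimal sketch of that pattern using only the JDK is shown below. QuorumCallbackSketch, onResponse and broadcastAndAwait are hypothetical names, the thread pool merely stands in for the messaging layer, and the sketch reports a timeout by returning false rather than throwing, which is an assumption made to keep it self-contained.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a shared callback that waits for a required
// number of replies. Not the actual PrepareCallback implementation.
class QuorumCallbackSketch {
    private final CountDownLatch latch;

    QuorumCallbackSketch(int requiredParticipants) {
        this.latch = new CountDownLatch(requiredParticipants);
    }

    // Called once per reply, from whichever thread delivers it.
    public void onResponse() {
        latch.countDown();
    }

    // Blocks until the required number of replies arrived or the timeout elapsed.
    public boolean await(long timeout, TimeUnit unit) throws InterruptedException {
        return latch.await(timeout, unit);
    }

    // Simulates sending one request per target; every target "replies" immediately.
    public static boolean broadcastAndAwait(List<String> targets, int required, long timeoutMillis) {
        QuorumCallbackSketch callback = new QuorumCallbackSketch(required);
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, targets.size()));
        try {
            for (String target : targets) {
                pool.execute(callback::onResponse);
            }
            return callback.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With three targets and two required participants the call returns true as soon as two replies are counted; with one target and two required participants it returns false once the timeout elapses.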

Aggregations

InetAddress (java.net.InetAddress): 2
Commit (org.apache.cassandra.service.paxos.Commit): 2
PrepareCallback (org.apache.cassandra.service.paxos.PrepareCallback): 2
Hint (org.apache.cassandra.hints.Hint): 1