
Example 6 with Checkpoint

Use of com.couchbase.connector.dcp.Checkpoint in project couchbase-elasticsearch-connector by couchbase.

From the class ElasticsearchWriter, method write:

/**
 * Appends the given event to the write buffer.
 * Must be followed by a call to {@link #flush}.
 * <p>
 * The writer assumes ownership of the event (and is responsible for releasing it).
 */
public void write(Event event) throws InterruptedException {
    // Regarding the order of bulk operations, Elastic Team Member Adrien Grand says:
    // "You can rely on the fact that operations on the same document
    // (same _index, _type and _id) will be in order. However you can't assume
    // anything for documents that have different indices/types/ids."
    // 
    // https://discuss.elastic.co/t/order-of--bulk-request-operations/98124/2
    // 
    // This *might* mean it's perfectly safe for a document to be modified
    // more than once in the same Elasticsearch batch, but I'm not sure, especially
    // when it comes to retrying individual actions.
    // 
    // Let's say a document is first created and then deleted in the same batch.
    // Is it possible for the creation to fail with TOO_MANY_REQUESTS due to a
    // full bulk queue, but for the deletion to succeed? If so, after the creation
    // is successfully retried, Elasticsearch will be in an inconsistent state;
    // the document will exist but it should not.
    // 
    // I do not know whether Elasticsearch guarantees that if an action in a bulk request
    // fails with TOO_MANY_REQUESTS, all subsequent actions also fail with that same
    // error code. All the documentation I've seen suggests that items in a bulk request
    // are completely independent. If you learn otherwise, feel free to banish this
    // paranoid code and change the buffer from a Map into a List.
    // 
    // Another possibility would be to use the DCP sequence number as an external version.
    // This would prevent earlier versions from overwriting later ones. The only
    // problem is that a rollback would *really* screw things up. A rolled back
    // document would be stuck in the bad state until being modified with a higher
    // seqno than before the rollback. Anyway, let's revisit this if the
    // "one action per document per batch" strategy is identified as a bottleneck.
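    //
    // For illustration only (not part of this connector): with the Elasticsearch
    // high-level REST client, an external-version index request could look like
    // the sketch below, using the DCP sequence number as the version. The
    // variable names (indexName, documentId, json, dcpSequenceNumber) are
    // hypothetical.
    //
    //   new IndexRequest(indexName)
    //       .id(documentId)
    //       .source(json, XContentType.JSON)
    //       .versionType(VersionType.EXTERNAL)
    //       .version(dcpSequenceNumber);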
    final EventDocWriteRequest request = requestFactory.newDocWriteRequest(event);
    if (request == null) {
        try {
            if (LOGGER.isTraceEnabled()) {
                LOGGER.trace("Skipping event, no matching type: {}", redactUser(event));
            }
            if (buffer.isEmpty()) {
                // can ignore immediately
                final Checkpoint checkpoint = event.getCheckpoint();
                if (isMetadata(event)) {
                    // Avoid cycle where writing the checkpoints triggers another DCP event.
                    LOGGER.debug("Ignoring metadata, not updating checkpoint for {}", event);
                    checkpointService.setWithoutMarkingDirty(event.getVbucket(), checkpoint);
                } else {
                    LOGGER.debug("Ignoring event, immediately updating checkpoint for {}", event);
                    checkpointService.set(event.getVbucket(), checkpoint);
                }
            } else {
                // ignore later after we've completed a bulk request and saved
                ignoreBuffer.put(event.getVbucket(), event.getCheckpoint());
            }
            return;
        } finally {
            event.release();
        }
    }
    bufferBytes += request.estimatedSizeInBytes();
    // Ensure every (documentID, dest index) pair is unique within a batch.
    // Do this *after* skipping unrecognized / ignored events, so that
    // an ignored deletion does not evict a previously buffered mutation.
    final EventDocWriteRequest evicted = buffer.put(event.getKey() + '\0' + request.index(), request);
    if (evicted != null) {
        String qualifiedDocId = event.getKey(true);
        String evictedQualifiedDocId = evicted.getEvent().getKey(true);
        if (!qualifiedDocId.equals(evictedQualifiedDocId)) {
        LOGGER.warn("DOCUMENT ID COLLISION DETECTED:"
                + " Documents '{}' and '{}' are from different collections"
                + " but have the same destination index '{}'.",
                qualifiedDocId, evictedQualifiedDocId, request.index());
        }
        DocumentLifecycle.logSkippedBecauseNewerVersionReceived(evicted.getEvent(), event.getTracingToken());
        bufferBytes -= evicted.estimatedSizeInBytes();
        evicted.getEvent().release();
    }
    if (bufferIsFull()) {
        flush();
    }
}
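
To see the "one action per document per batch" idea in isolation, here is a minimal, self-contained sketch of the same Map-based buffering. The BufferedAction type and all other names are hypothetical, not part of the connector; only the compound-key eviction mirrors the code above.

import java.util.LinkedHashMap;
import java.util.Map;

public class DedupBufferSketch {

    // Hypothetical stand-in for EventDocWriteRequest: a document ID, a
    // destination index, and an opaque payload.
    record BufferedAction(String docId, String index, String payload) {}

    // LinkedHashMap preserves arrival order, matching the buffer's role above.
    private final Map<String, BufferedAction> buffer = new LinkedHashMap<>();

    // Buffers an action; returns the evicted earlier action for the same
    // (docId, index) pair, or null if there was none.
    BufferedAction add(BufferedAction action) {
        // Same compound key as the connector: docId + '\0' + index.
        return buffer.put(action.docId() + '\0' + action.index(), action);
    }

    public static void main(String[] args) {
        DedupBufferSketch sketch = new DedupBufferSketch();
        sketch.add(new BufferedAction("doc-1", "widgets", "v1"));
        BufferedAction evicted = sketch.add(new BufferedAction("doc-1", "widgets", "v2"));
        // Prints the superseded v1 action; only v2 remains in the batch.
        System.out.println("evicted: " + evicted);
    }
}

Because the buffer is a Map rather than a List, at most one action per (document, index) pair ever reaches a single bulk request, which sidesteps the retry-ordering question raised in the comments above.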
Also used: Checkpoint (com.couchbase.connector.dcp.Checkpoint)

Aggregations

Checkpoint (com.couchbase.connector.dcp.Checkpoint): 6
Bucket (com.couchbase.client.java.Bucket): 3
CheckpointDao (com.couchbase.connector.dcp.CheckpointDao): 3
CouchbaseCheckpointDao (com.couchbase.connector.dcp.CouchbaseCheckpointDao): 3
ResolvedBucketConfig (com.couchbase.connector.dcp.ResolvedBucketConfig): 3
ObjectMapper (com.fasterxml.jackson.databind.ObjectMapper): 2
HashMap (java.util.HashMap): 2
LinkedHashMap (java.util.LinkedHashMap): 2
Map (java.util.Map): 2
SeedNode (com.couchbase.client.core.env.SeedNode): 1
RedactableArgument.redactUser (com.couchbase.client.core.logging.RedactableArgument.redactUser): 1
Client (com.couchbase.client.dcp.Client): 1
PartitionState (com.couchbase.client.dcp.state.PartitionState): 1
SessionState (com.couchbase.client.dcp.state.SessionState): 1
Cluster (com.couchbase.client.java.Cluster): 1
Collection (com.couchbase.client.java.Collection): 1
ClusterEnvironment (com.couchbase.client.java.env.ClusterEnvironment): 1
BulkRequestConfig (com.couchbase.connector.config.es.BulkRequestConfig): 1
CheckpointService (com.couchbase.connector.dcp.CheckpointService): 1
CouchbaseHelper.createCluster (com.couchbase.connector.dcp.CouchbaseHelper.createCluster): 1