Use of com.couchbase.connector.dcp.Checkpoint in project couchbase-elasticsearch-connector by couchbase.
The snippet below is the write method of the ElasticsearchWriter class.
/**
 * Appends the given event to the write buffer.
 * Must be followed by a call to {@link #flush}.
 * <p>
 * The writer assumes ownership of the event (is responsible for releasing it).
 */
public void write(Event event) throws InterruptedException {
  // Regarding the order of bulk operations, Elastic Team Member Adrien Grand says:
  // "You can rely on the fact that operations on the same document
  // (same _index, _type and _id) will be in order. However you can't assume
  // anything for documents that have different indices/types/ids."
  //
  // https://discuss.elastic.co/t/order-of--bulk-request-operations/98124/2
  //
  // This *might* mean it's perfectly safe for a document to be modified
  // more than once in the same Elasticsearch batch, but I'm not sure, especially
  // when it comes to retrying individual actions.
  //
  // Let's say a document is first created and then deleted in the same batch.
  // Is it possible for the creation to fail with TOO_MANY_REQUESTS due to a
  // full bulk queue, but for the deletion to succeed? If so, after the creation
  // is successfully retried, Elasticsearch will be in an inconsistent state;
  // the document will exist but it should not.
  //
  // I do not know whether Elasticsearch guarantees that if an action in a bulk request
  // fails with TOO_MANY_REQUESTS, all subsequent actions also fail with that same
  // error code. All the documentation I've seen suggests that items in a bulk request
  // are completely independent. If you learn otherwise, feel free to banish this
  // paranoid code and change the buffer from a Map into a List.
  //
  // Another possibility would be to use the DCP sequence number as an external version.
  // This would prevent earlier versions from overwriting later ones. The only
  // problem is that a rollback would *really* screw things up. A rolled back
  // document would be stuck in the bad state until being modified with a higher
  // seqno than before the rollback. Anyway, let's revisit this if the
  // "one action per-document per-batch" strategy is identified as a bottleneck.

  final EventDocWriteRequest request = requestFactory.newDocWriteRequest(event);
  if (request == null) {
    try {
      if (LOGGER.isTraceEnabled()) {
        LOGGER.trace("Skipping event, no matching type: {}", redactUser(event));
      }

      if (buffer.isEmpty()) {
        // can ignore immediately
        final Checkpoint checkpoint = event.getCheckpoint();

        if (isMetadata(event)) {
          // Avoid cycle where writing the checkpoints triggers another DCP event.
          LOGGER.debug("Ignoring metadata, not updating checkpoint for {}", event);
          checkpointService.setWithoutMarkingDirty(event.getVbucket(), event.getCheckpoint());
        } else {
          LOGGER.debug("Ignoring event, immediately updating checkpoint for {}", event);
          checkpointService.set(event.getVbucket(), checkpoint);
        }
      } else {
        // ignore later after we've completed a bulk request and saved
        ignoreBuffer.put(event.getVbucket(), event.getCheckpoint());
      }
      return;

    } finally {
      event.release();
    }
  }

  bufferBytes += request.estimatedSizeInBytes();

  // Ensure every (documentID, dest index) pair is unique within a batch.
  // Do this *after* skipping unrecognized / ignored events, so that
  // an ignored deletion does not evict a previously buffered mutation.
  final EventDocWriteRequest evicted = buffer.put(event.getKey() + '\0' + request.index(), request);
  if (evicted != null) {
    String qualifiedDocId = event.getKey(true);
    String evictedQualifiedDocId = evicted.getEvent().getKey(true);
    if (!qualifiedDocId.equals(evictedQualifiedDocId)) {
      LOGGER.warn("DOCUMENT ID COLLISION DETECTED:"
              + " Documents '{}' and '{}' are from different collections"
              + " but have the same destination index '{}'.",
          qualifiedDocId, evictedQualifiedDocId, request.index());
    }

    DocumentLifecycle.logSkippedBecauseNewerVersionReceived(evicted.getEvent(), event.getTracingToken());
    bufferBytes -= evicted.estimatedSizeInBytes();
    evicted.getEvent().release();
  }

  if (bufferIsFull()) {
    flush();
  }
}
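For context, the Javadoc contract above (every write must eventually be followed by flush, and the writer releases each event) suggests a caller pattern roughly like the sketch below. The drain method and the dcpEvents source are hypothetical placeholders, not the connector's actual driver code.

// Hypothetical caller sketch; only the ElasticsearchWriter and Event types come from the connector.
void drain(ElasticsearchWriter writer, Iterable<Event> dcpEvents) throws InterruptedException {
  for (Event event : dcpEvents) {
    writer.write(event);   // may flush internally once the buffer is full
  }
  writer.flush();          // required after the final write, per the Javadoc contract
}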
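The comment in write() about using the DCP sequence number as an external version describes a hypothetical alternative, not what the connector does. Elasticsearch does expose the building blocks for it; below is a minimal sketch assuming the 7.x high-level REST client, where the index name, document ID, JSON source, and seqno parameters are illustrative.

import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.VersionType;

class ExternalVersionSketch {
  // With VersionType.EXTERNAL, Elasticsearch rejects a write whose version is not greater
  // than the stored version, so an older mutation cannot overwrite a newer one.
  // As the comment in write() notes, a DCP rollback would still break this scheme.
  static IndexRequest indexWithSeqnoAsVersion(String index, String docId, String json, long dcpSeqno) {
    return new IndexRequest(index)
        .id(docId)
        .source(json, XContentType.JSON)
        .versionType(VersionType.EXTERNAL)
        .version(dcpSeqno);
  }
}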