Usages of org.apache.beam.sdk.transforms.DoFn.ProcessContinuation in project beam by apache.
From the class QueryChangeStreamAction, method run:
/**
 * This method dispatches a change stream query for the given partition, delegates the processing
 * of the records to the corresponding registered action classes, and keeps the state of the
 * partition up to date in the connector's metadata table.
 *
 * <p>The algorithm is as follows:
 *
 * <ol>
 *   <li>A change stream query for the partition is performed.
 *   <li>For each record, we check the type of the record and dispatch the processing to one of
 *       the registered actions.
 *   <li>If an {@link Optional} with a {@link ProcessContinuation#stop()} is returned from the
 *       actions, we stop processing and return.
 *   <li>Before returning, we register a bundle finalizer callback to update the watermark of the
 *       partition in the metadata tables to the latest processed timestamp.
 *   <li>When a change stream query finishes successfully (no more records), we update the
 *       partition state to FINISHED.
 * </ol>
 *
 * <p>There might be cases where, due to a split at the exact end timestamp of a partition's change
 * stream query, this function processes a residual with an invalid timestamp. In this case, the
 * error is ignored and no work is done for the residual.
 *
 * @param partition the current partition being processed
 * @param tracker the restriction tracker of the {@link ReadChangeStreamPartitionDoFn} SDF
 * @param receiver the output receiver of the {@link ReadChangeStreamPartitionDoFn} SDF
 * @param watermarkEstimator the watermark estimator of the {@link ReadChangeStreamPartitionDoFn}
 *     SDF
 * @param bundleFinalizer the bundle finalizer for {@link ReadChangeStreamPartitionDoFn} SDF
 *     bundles
 * @return a {@link ProcessContinuation#stop()} if a record timestamp could not be claimed or if
 *     the partition processing has finished
 */
public ProcessContinuation run(
    PartitionMetadata partition,
    RestrictionTracker<OffsetRange, Long> tracker,
    OutputReceiver<DataChangeRecord> receiver,
    ManualWatermarkEstimator<Instant> watermarkEstimator,
    BundleFinalizer bundleFinalizer) {
  final String token = partition.getPartitionToken();
  final Timestamp endTimestamp = partition.getEndTimestamp();

  /*
   * FIXME(b/202802422): Workaround until the backend is fixed.
   * The change stream API returns an invalid argument error if we try to use a child partition
   * start timestamp for a previously returned query. If we split at that exact time, we won't be
   * able to obtain the child partition on the residual restriction, since it will start at the
   * child partition start time.
   * To circumvent this, on every run after the first we start querying one microsecond before
   * the restriction start time, and ignore any records that are before the restriction start
   * time. This way the child partition should be returned within the query.
   */
  final Timestamp restrictionStartTimestamp =
      Timestamp.ofTimeMicroseconds(tracker.currentRestriction().getFrom());
  final Timestamp previousStartTimestamp =
      Timestamp.ofTimeMicroseconds(
          TimestampConverter.timestampToMicros(restrictionStartTimestamp) - 1);
  final boolean isFirstRun =
      restrictionStartTimestamp.compareTo(partition.getStartTimestamp()) == 0;
  final Timestamp startTimestamp = isFirstRun ? restrictionStartTimestamp : previousStartTimestamp;
  try (Scope scope =
      TRACER.spanBuilder("QueryChangeStreamAction").setRecordEvents(true).startScopedSpan()) {
    TRACER.getCurrentSpan()
        .putAttribute(PARTITION_ID_ATTRIBUTE_LABEL, AttributeValue.stringAttributeValue(token));
    // TODO: Potentially we can avoid this fetch, by enriching the runningAt timestamp when the
    // ReadChangeStreamPartitionDoFn#processElement is called
    final PartitionMetadata updatedPartition =
        Optional.ofNullable(partitionMetadataDao.getPartition(token))
            .map(partitionMetadataMapper::from)
            .orElseThrow(() -> new IllegalStateException(
                "Partition " + token + " not found in metadata table"));
    try (ChangeStreamResultSet resultSet = changeStreamDao.changeStreamQuery(
        token, startTimestamp, endTimestamp, partition.getHeartbeatMillis())) {
      while (resultSet.next()) {
        final List<ChangeStreamRecord> records = changeStreamRecordMapper.toChangeStreamRecords(
            updatedPartition, resultSet.getCurrentRowAsStruct(), resultSet.getMetadata());
        Optional<ProcessContinuation> maybeContinuation;
        for (final ChangeStreamRecord record : records) {
          // Skip records that precede the restriction start (see the FIXME workaround above)
          if (record.getRecordTimestamp().compareTo(restrictionStartTimestamp) < 0) {
            continue;
          }
          // Delegate to the registered action for this record type
          if (record instanceof DataChangeRecord) {
            maybeContinuation = dataChangeRecordAction.run(updatedPartition,
                (DataChangeRecord) record, tracker, receiver, watermarkEstimator);
          } else if (record instanceof HeartbeatRecord) {
            maybeContinuation = heartbeatRecordAction.run(updatedPartition,
                (HeartbeatRecord) record, tracker, watermarkEstimator);
          } else if (record instanceof ChildPartitionsRecord) {
            maybeContinuation = childPartitionsRecordAction.run(updatedPartition,
                (ChildPartitionsRecord) record, tracker, watermarkEstimator);
          } else {
            LOG.error("[" + token + "] Unknown record type " + record.getClass());
            throw new IllegalArgumentException("Unknown record type " + record.getClass());
          }
          if (maybeContinuation.isPresent()) {
            LOG.debug("[" + token + "] Continuation present, returning " + maybeContinuation);
            // BUNDLE_FINALIZER_TIMEOUT is assumed to be a Duration constant declared on this class
            bundleFinalizer.afterBundleCommit(Instant.now().plus(BUNDLE_FINALIZER_TIMEOUT),
                updateWatermarkCallback(token, watermarkEstimator));
            return maybeContinuation.get();
          }
        }
      }
      bundleFinalizer.afterBundleCommit(Instant.now().plus(BUNDLE_FINALIZER_TIMEOUT),
          updateWatermarkCallback(token, watermarkEstimator));
    } catch (SpannerException e) {
      // A residual split at the partition's exact end timestamp yields an out-of-range query
      if (isTimestampOutOfRange(e)) {
        LOG.debug("[" + token + "] query change stream is out of range for " + startTimestamp
            + " to " + endTimestamp + ", finishing stream");
      } else {
        throw e;
      }
    }
  }
  final long endMicros = TimestampConverter.timestampToMicros(endTimestamp);
  LOG.debug("[" + token + "] change stream completed successfully");
  if (tracker.tryClaim(endMicros)) {
    LOG.debug("[" + token + "] Finishing partition");
    partitionMetadataDao.updateToFinished(token);
    LOG.info("[" + token + "] Partition finished");
  }
  return ProcessContinuation.stop();
}
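The algorithm in the Javadoc above, together with the one-microsecond start-time workaround from the FIXME comment, can be sketched without any Beam or Spanner dependencies. This is a minimal sketch under stated assumptions. `ChangeStreamDispatchSketch`, its `Outcome` enum, the `ChangeRecord`/`DataChange`/`Heartbeat`/`ChildPartitions` classes, and the `claimLimitMicros` stand-in for the restriction tracker are all hypothetical simplifications, not the connector's API. Timestamps are plain microsecond `long`s.

```java
import java.util.List;
import java.util.Optional;

// Dependency-free sketch of the control flow in QueryChangeStreamAction#run.
// Every type and method here is a hypothetical stand-in, not part of Beam or Spanner.
final class ChangeStreamDispatchSketch {

  // Why processing ended; in the real method both outcomes return ProcessContinuation.stop().
  enum Outcome { CLAIM_FAILED, QUERY_FINISHED }

  // Hypothetical record hierarchy; timestamps are microseconds since the epoch.
  abstract static class ChangeRecord {
    final long micros;
    ChangeRecord(long micros) { this.micros = micros; }
  }
  static final class DataChange extends ChangeRecord { DataChange(long m) { super(m); } }
  static final class Heartbeat extends ChangeRecord { Heartbeat(long m) { super(m); } }
  static final class ChildPartitions extends ChangeRecord { ChildPartitions(long m) { super(m); } }

  // Workaround: a resumed restriction is queried from one microsecond before its start.
  static long queryStartMicros(long restrictionStartMicros, long partitionStartMicros) {
    boolean isFirstRun = restrictionStartMicros == partitionStartMicros;
    return isFirstRun ? restrictionStartMicros : restrictionStartMicros - 1;
  }

  // Stand-in for an action's tryClaim: claims succeed strictly below claimLimitMicros.
  // Empty means "keep going"; a present value means "stop and return".
  static Optional<Outcome> claim(long micros, long claimLimitMicros) {
    return micros < claimLimitMicros ? Optional.empty() : Optional.of(Outcome.CLAIM_FAILED);
  }

  // Processes records in query order; appends claimed data-change timestamps to 'emitted'.
  static Outcome run(List<ChangeRecord> records, long restrictionStartMicros,
      long claimLimitMicros, List<Long> emitted) {
    for (ChangeRecord r : records) {
      if (r.micros < restrictionStartMicros) {
        continue; // returned only because the query started one microsecond early
      }
      Optional<Outcome> maybeOutcome;
      if (r instanceof DataChange) {
        maybeOutcome = claim(r.micros, claimLimitMicros);
        if (!maybeOutcome.isPresent()) {
          emitted.add(r.micros); // stand-in for receiver.output(...)
        }
      } else if (r instanceof Heartbeat || r instanceof ChildPartitions) {
        // The real code has a separate action per type; here both only claim the timestamp.
        maybeOutcome = claim(r.micros, claimLimitMicros);
      } else {
        throw new IllegalArgumentException("Unknown record type " + r.getClass());
      }
      if (maybeOutcome.isPresent()) {
        // The real method registers the watermark-updating bundle finalizer before returning.
        return maybeOutcome.get();
      }
    }
    // Query exhausted: the real method claims the end timestamp and marks the partition FINISHED.
    return Outcome.QUERY_FINISHED;
  }
}
```

With a restriction starting at 150 on a resumed run, the query starts at 149. A record at 149 is then skipped, and processing ends either when the records run out or at the first record whose timestamp cannot be claimed.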
From the class ReadFromKafkaDoFnTest, method testProcessElementWithEmptyPoll:
@Test
public void testProcessElementWithEmptyPoll() throws Exception {
  MockOutputReceiver receiver = new MockOutputReceiver();
  // Assumes the shared mock consumer has been configured to return no records on poll.
  OffsetRangeTracker tracker = new OffsetRangeTracker(new OffsetRange(0L, Long.MAX_VALUE));
  ProcessContinuation result = dofnInstance.processElement(KafkaSourceDescriptor.of(topicPartition, null, null, null, null, null), tracker, null, (OutputReceiver) receiver);
  // An empty poll does not end the partition: the DoFn asks to be resumed later.
  assertEquals(ProcessContinuation.resume(), result);
}
From the class ReadFromKafkaDoFnTest, method testProcessElementWhenTopicPartitionIsStopped:
@Test
public void testProcessElementWhenTopicPartitionIsStopped() throws Exception {
  MockOutputReceiver receiver = new MockOutputReceiver();
  // The check-stop-reading function always returns true, so reading must stop immediately.
  ReadFromKafkaDoFn<String, String> instance =
      new ReadFromKafkaDoFn(makeReadSourceDescriptor(consumer).toBuilder()
          .setCheckStopReadingFn(new SerializableFunction<TopicPartition, Boolean>() {
            @Override
            public Boolean apply(TopicPartition input) {
              return true;
            }
          }).build());
  OffsetRangeTracker tracker = new OffsetRangeTracker(new OffsetRange(0L, Long.MAX_VALUE));
  ProcessContinuation result = instance.processElement(KafkaSourceDescriptor.of(topicPartition, null, null, null, null, null), tracker, null, (OutputReceiver) receiver);
  assertEquals(ProcessContinuation.stop(), result);
}
From the class ReadFromKafkaDoFnTest, method testProcessElementWhenTopicPartitionIsRemoved:
@Test
public void testProcessElementWhenTopicPartitionIsRemoved() throws Exception {
  MockOutputReceiver receiver = new MockOutputReceiver();
  // Assumes the shared mock consumer has been configured to report the topic partition as removed.
  OffsetRangeTracker tracker = new OffsetRangeTracker(new OffsetRange(0L, Long.MAX_VALUE));
  ProcessContinuation result = dofnInstance.processElement(KafkaSourceDescriptor.of(topicPartition, null, null, null, null, null), tracker, null, (OutputReceiver) receiver);
  assertEquals(ProcessContinuation.stop(), result);
}
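Read together, the three `ReadFromKafkaDoFn` tests pin down when `processElement` should return `ProcessContinuation.resume()` (more work remains, so the runner checkpoints and calls back later) versus `ProcessContinuation.stop()` (no more work for this element). The sketch below encodes only those outcomes. `KafkaContinuationSketch`, its `Decision` enum, and the `decide` signature are hypothetical. The `KEEP_POLLING` branch is an added assumption that no test checks, and the real `ReadFromKafkaDoFn` interleaves these checks with its polling loop rather than exposing a method like this.

```java
// Hypothetical summary of the outcomes asserted by the ReadFromKafkaDoFn tests above.
final class KafkaContinuationSketch {

  // Stand-ins for ProcessContinuation.resume()/stop(), plus an assumed "keep polling" state.
  enum Decision { KEEP_POLLING, RESUME, STOP }

  static Decision decide(boolean stopReadingRequested, boolean partitionExists, int polledRecords) {
    if (stopReadingRequested) {
      return Decision.STOP; // checkStopReadingFn returned true
    }
    if (!partitionExists) {
      return Decision.STOP; // topic partition removed: nothing more will arrive
    }
    if (polledRecords == 0) {
      return Decision.RESUME; // empty poll: checkpoint and let the runner call back later
    }
    return Decision.KEEP_POLLING; // assumption: records arrived, so output them and poll again
  }
}
```

The ordering matters. A stop request or a removed partition wins over an empty poll, which is why the removed-partition test expects `stop()` while the empty-poll test expects `resume()`.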
From the class WatchTest, method testPollingGrowthTrackerUsesElementTimestampIfNoWatermarkProvided:
@Test
public void testPollingGrowthTrackerUsesElementTimestampIfNoWatermarkProvided() throws Exception {
  Instant now = Instant.now();
  // Illustrative timestamps (assumed): "a" is earliest, followed by "b", "c", and "d".
  Watch.Growth<String, String, String> growth = Watch.growthOf(new Watch.Growth.PollFn<String, String>() {
    @Override
    public PollResult<String> apply(String element, Context c) throws Exception {
      // We specifically test an unsorted list.
      return PollResult.incomplete(Arrays.asList(
          TimestampedValue.of("d", now.plus(Duration.standardSeconds(4))),
          TimestampedValue.of("c", now.plus(Duration.standardSeconds(3))),
          TimestampedValue.of("a", now.plus(Duration.standardSeconds(1))),
          TimestampedValue.of("b", now.plus(Duration.standardSeconds(2)))));
    }
  });
  WatchGrowthFn<String, String, String, Integer> growthFn = new WatchGrowthFn(growth, StringUtf8Coder.of(), SerializableFunctions.identity(), StringUtf8Coder.of());
  GrowthTracker<String, Integer> tracker = newPollingGrowthTracker();
  DoFn.ProcessContext context = mock(DoFn.ProcessContext.class);
  ManualWatermarkEstimator<Instant> watermarkEstimator = new WatermarkEstimators.Manual(BoundedWindow.TIMESTAMP_MIN_VALUE);
  ProcessContinuation processContinuation = growthFn.process(context, tracker, watermarkEstimator);
  // With no watermark in the poll result, the watermark is expected to be the earliest element timestamp.
  assertEquals(now.plus(Duration.standardSeconds(1)), watermarkEstimator.currentWatermark());
}
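The test above feeds the poll function an unsorted list and expects the watermark to be derived from the element timestamps. One plausible reading, which this sketch assumes, is that when a poll result carries no watermark, the fallback is the earliest element timestamp, regardless of list order. `PollWatermarkSketch`, `Timestamped`, and `nextWatermark` are hypothetical, and `java.time.Instant` replaces the Joda-Time `Instant` that Beam uses. The rule that the watermark never moves backwards is an added assumption that the test does not check.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a watermark fallback for a polling source (java.time, not Joda-Time).
final class PollWatermarkSketch {

  // A value paired with its event timestamp (stand-in for Beam's TimestampedValue).
  static final class Timestamped {
    final String value;
    final Instant timestamp;

    Timestamped(String value, Instant timestamp) {
      this.value = value;
      this.timestamp = timestamp;
    }
  }

  /**
   * Returns the poll result's own watermark if provided, otherwise the earliest output timestamp
   * (independent of list order). Assumption: the result never moves behind {@code current}.
   */
  static Instant nextWatermark(Instant current, Instant providedOrNull, List<Timestamped> outputs) {
    Instant candidate = providedOrNull;
    if (candidate == null) {
      candidate = outputs.stream()
          .map(t -> t.timestamp)
          .min(Comparator.naturalOrder())
          .orElse(current); // no outputs and no watermark: keep the current one
    }
    return candidate.isAfter(current) ? candidate : current;
  }
}
```

Taking the minimum instead of the first or last list element is what makes the result independent of the order in which the poll function returns its outputs.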