Example 1 with ChangeStreamResultSet

Use of org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.ChangeStreamResultSet in project beam by apache.

From class QueryChangeStreamAction, method run.

/**
 * This method dispatches a change stream query for the given partition, delegates the
 * processing of the records to the corresponding registered action classes, and keeps the
 * state of the partition up to date in the Connector's metadata table.
 *
 * <p>The algorithm is as follows:
 *
 * <ol>
 *   <li>A change stream query for the partition is performed.
 *   <li>For each record, we check the type of the record and dispatch the processing to one of
 *       the actions registered.
 *   <li>If an {@link Optional} with a {@link ProcessContinuation#stop()} is returned from the
 *       actions, we stop processing and return.
 *   <li>Before returning, we register a bundle finalizer callback to update the watermark of the
 *       partition in the metadata table to the latest processed timestamp.
 *   <li>When a change stream query finishes successfully (no more records), we update the
 *       partition state to FINISHED.
 * </ol>
 *
 * <p>There might be cases where, due to a split at the exact end timestamp of a partition's change
 * stream query, this function processes a residual with an invalid timestamp. In this case, the
 * error is ignored and no work is done for the residual.
 *
 * @param partition the current partition being processed
 * @param tracker the restriction tracker of the {@link
 *     org.apache.beam.sdk.io.gcp.spanner.changestreams.dofn.ReadChangeStreamPartitionDoFn} SDF
 * @param receiver the output receiver of the {@link
 *     org.apache.beam.sdk.io.gcp.spanner.changestreams.dofn.ReadChangeStreamPartitionDoFn} SDF
 * @param watermarkEstimator the watermark estimator of the {@link
 *     org.apache.beam.sdk.io.gcp.spanner.changestreams.dofn.ReadChangeStreamPartitionDoFn} SDF
 * @param bundleFinalizer the bundle finalizer for {@link
 *     org.apache.beam.sdk.io.gcp.spanner.changestreams.dofn.ReadChangeStreamPartitionDoFn} SDF
 *     bundles
 * @return a {@link ProcessContinuation#stop()} if a record timestamp could not be claimed or if
 *     the partition processing has finished
 */
@SuppressWarnings("nullness")
@VisibleForTesting
public ProcessContinuation run(
        PartitionMetadata partition,
        RestrictionTracker<OffsetRange, Long> tracker,
        OutputReceiver<DataChangeRecord> receiver,
        ManualWatermarkEstimator<Instant> watermarkEstimator,
        BundleFinalizer bundleFinalizer) {
    final String token = partition.getPartitionToken();
    final Timestamp endTimestamp = partition.getEndTimestamp();
    /*
     * FIXME(b/202802422): Workaround until the backend is fixed.
     * The change stream API returns invalid argument if we try to use a child partition start
     * timestamp for a previously returned query. If we split at that exact time, we won't be able
     * to obtain the child partition on the residual restriction, since it will start at the child
     * partition start time.
     * To circumvent this, we always start querying one microsecond before the restriction start
     * time, and ignore any records that are before the restriction start time. This way the child
     * partition should be returned within the query.
     */
    final Timestamp restrictionStartTimestamp = Timestamp.ofTimeMicroseconds(tracker.currentRestriction().getFrom());
    final Timestamp previousStartTimestamp = Timestamp.ofTimeMicroseconds(TimestampConverter.timestampToMicros(restrictionStartTimestamp) - 1);
    final boolean isFirstRun = restrictionStartTimestamp.compareTo(partition.getStartTimestamp()) == 0;
    final Timestamp startTimestamp = isFirstRun ? restrictionStartTimestamp : previousStartTimestamp;
    try (Scope scope = TRACER.spanBuilder("QueryChangeStreamAction").setRecordEvents(true).startScopedSpan()) {
        TRACER.getCurrentSpan().putAttribute(PARTITION_ID_ATTRIBUTE_LABEL, AttributeValue.stringAttributeValue(token));
        // TODO: Potentially we can avoid this fetch, by enriching the runningAt timestamp when the
        // ReadChangeStreamPartitionDoFn#processElement is called
        final PartitionMetadata updatedPartition =
            Optional.ofNullable(partitionMetadataDao.getPartition(token))
                .map(partitionMetadataMapper::from)
                .orElseThrow(
                    () -> new IllegalStateException(
                        "Partition " + token + " not found in metadata table"));
        try (ChangeStreamResultSet resultSet = changeStreamDao.changeStreamQuery(token, startTimestamp, endTimestamp, partition.getHeartbeatMillis())) {
            while (resultSet.next()) {
                final List<ChangeStreamRecord> records = changeStreamRecordMapper.toChangeStreamRecords(updatedPartition, resultSet.getCurrentRowAsStruct(), resultSet.getMetadata());
                Optional<ProcessContinuation> maybeContinuation;
                for (final ChangeStreamRecord record : records) {
                    if (record.getRecordTimestamp().compareTo(restrictionStartTimestamp) < 0) {
                        continue;
                    }
                    if (record instanceof DataChangeRecord) {
                        maybeContinuation = dataChangeRecordAction.run(updatedPartition, (DataChangeRecord) record, tracker, receiver, watermarkEstimator);
                    } else if (record instanceof HeartbeatRecord) {
                        maybeContinuation = heartbeatRecordAction.run(updatedPartition, (HeartbeatRecord) record, tracker, watermarkEstimator);
                    } else if (record instanceof ChildPartitionsRecord) {
                        maybeContinuation = childPartitionsRecordAction.run(updatedPartition, (ChildPartitionsRecord) record, tracker, watermarkEstimator);
                    } else {
                        LOG.error("[" + token + "] Unknown record type " + record.getClass());
                        throw new IllegalArgumentException("Unknown record type " + record.getClass());
                    }
                    if (maybeContinuation.isPresent()) {
                        LOG.debug("[" + token + "] Continuation present, returning " + maybeContinuation);
                        bundleFinalizer.afterBundleCommit(Instant.now().plus(BUNDLE_FINALIZER_TIMEOUT), updateWatermarkCallback(token, watermarkEstimator));
                        return maybeContinuation.get();
                    }
                }
            }
            bundleFinalizer.afterBundleCommit(Instant.now().plus(BUNDLE_FINALIZER_TIMEOUT), updateWatermarkCallback(token, watermarkEstimator));
        } catch (SpannerException e) {
            if (isTimestampOutOfRange(e)) {
                LOG.debug("[" + token + "] query change stream is out of range for " + startTimestamp + " to " + endTimestamp + ", finishing stream");
            } else {
                throw e;
            }
        }
    }
    final long endMicros = TimestampConverter.timestampToMicros(endTimestamp);
    LOG.debug("[" + token + "] change stream completed successfully");
    if (tracker.tryClaim(endMicros)) {
        LOG.debug("[" + token + "] Finishing partition");
        partitionMetadataDao.updateToFinished(token);
        LOG.info("[" + token + "] Partition finished");
    }
    return ProcessContinuation.stop();
}
Also used : DataChangeRecord(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.DataChangeRecord) HeartbeatRecord(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.HeartbeatRecord) Timestamp(com.google.cloud.Timestamp) ProcessContinuation(org.apache.beam.sdk.transforms.DoFn.ProcessContinuation) ChangeStreamResultSet(org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.ChangeStreamResultSet) Scope(io.opencensus.common.Scope) PartitionMetadata(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.PartitionMetadata) ChildPartitionsRecord(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartitionsRecord) SpannerException(com.google.cloud.spanner.SpannerException) ChangeStreamRecord(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChangeStreamRecord) VisibleForTesting(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting)
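The dispatch algorithm described in the Javadoc can be modelled without Beam or Spanner. The following is a simplified, hypothetical sketch: `Rec`, `DataChange`, `Heartbeat`, `ChildPartitions`, `Continuation` and `Outcome` are illustrative stand-ins, not Beam classes, and timestamps are plain microsecond `long`s. It keeps the same shape as `run`. It skips records before the restriction start, routes each record to an action by type, and returns as soon as an action yields a continuation. Otherwise it falls through to the end-of-query path.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical model of the dispatch loop in QueryChangeStreamAction#run (not Beam code). */
public class DispatchSketch {
  static class Rec {
    final long micros; // record timestamp in microseconds
    Rec(long micros) { this.micros = micros; }
  }
  static class DataChange extends Rec { DataChange(long m) { super(m); } }
  static class Heartbeat extends Rec { Heartbeat(long m) { super(m); } }
  static class ChildPartitions extends Rec { ChildPartitions(long m) { super(m); } }

  /** Stand-in for ProcessContinuation.stop(). */
  enum Continuation { STOP }

  /** Why the loop ended. */
  enum Outcome { STOPPED_BY_ACTION, QUERY_EXHAUSTED }

  static Outcome run(
      List<Rec> records,
      long restrictionStartMicros,
      Function<Rec, Optional<Continuation>> dataAction,
      Function<Rec, Optional<Continuation>> heartbeatAction,
      Function<Rec, Optional<Continuation>> childPartitionsAction) {
    for (Rec r : records) {
      // Records before the restriction start are skipped, as in the real method.
      if (r.micros < restrictionStartMicros) {
        continue;
      }
      Optional<Continuation> maybeContinuation;
      if (r instanceof DataChange) {
        maybeContinuation = dataAction.apply(r);
      } else if (r instanceof Heartbeat) {
        maybeContinuation = heartbeatAction.apply(r);
      } else if (r instanceof ChildPartitions) {
        maybeContinuation = childPartitionsAction.apply(r);
      } else {
        throw new IllegalArgumentException("Unknown record type " + r.getClass());
      }
      if (maybeContinuation.isPresent()) {
        // The real method registers the watermark-update callback here, then returns.
        return Outcome.STOPPED_BY_ACTION;
      }
    }
    // The real method would now try to claim the end timestamp and mark the partition FINISHED.
    return Outcome.QUERY_EXHAUSTED;
  }

  public static void main(String[] args) {
    int[] dataCalls = {0};
    Outcome outcome = run(
        Arrays.<Rec>asList(new DataChange(10), new Heartbeat(11), new DataChange(12)),
        10,
        r -> { dataCalls[0]++; return Optional.empty(); },
        r -> Optional.of(Continuation.STOP),
        r -> Optional.empty());
    // The heartbeat action asks to stop, so the third record is never dispatched.
    System.out.println(outcome + " after " + dataCalls[0] + " data record(s)");
  }
}
```

Modelling each action as a function keeps the routing testable in isolation. The Mockito-based tests on this page check the same property when they assert `never()` on the actions that should not be called.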

Example 2 with ChangeStreamResultSet

Use of org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.ChangeStreamResultSet in project beam by apache.

From class QueryChangeStreamActionTest, method testQueryChangeStreamWithDataChangeRecord.

@Test
public void testQueryChangeStreamWithDataChangeRecord() {
    final Struct rowAsStruct = mock(Struct.class);
    final ChangeStreamResultSetMetadata resultSetMetadata = mock(ChangeStreamResultSetMetadata.class);
    final ChangeStreamResultSet resultSet = mock(ChangeStreamResultSet.class);
    final DataChangeRecord record1 = mock(DataChangeRecord.class);
    final DataChangeRecord record2 = mock(DataChangeRecord.class);
    when(record1.getRecordTimestamp()).thenReturn(PARTITION_START_TIMESTAMP);
    when(record2.getRecordTimestamp()).thenReturn(PARTITION_START_TIMESTAMP);
    when(changeStreamDao.changeStreamQuery(PARTITION_TOKEN, PARTITION_START_TIMESTAMP, PARTITION_END_TIMESTAMP, PARTITION_HEARTBEAT_MILLIS)).thenReturn(resultSet);
    when(resultSet.next()).thenReturn(true);
    when(resultSet.getCurrentRowAsStruct()).thenReturn(rowAsStruct);
    when(resultSet.getMetadata()).thenReturn(resultSetMetadata);
    when(changeStreamRecordMapper.toChangeStreamRecords(partition, rowAsStruct, resultSetMetadata)).thenReturn(Arrays.asList(record1, record2));
    when(dataChangeRecordAction.run(partition, record1, restrictionTracker, outputReceiver, watermarkEstimator)).thenReturn(Optional.empty());
    when(dataChangeRecordAction.run(partition, record2, restrictionTracker, outputReceiver, watermarkEstimator)).thenReturn(Optional.of(ProcessContinuation.stop()));
    when(watermarkEstimator.currentWatermark()).thenReturn(WATERMARK);
    final ProcessContinuation result = action.run(partition, restrictionTracker, outputReceiver, watermarkEstimator, bundleFinalizer);
    assertEquals(ProcessContinuation.stop(), result);
    verify(dataChangeRecordAction).run(partition, record1, restrictionTracker, outputReceiver, watermarkEstimator);
    verify(dataChangeRecordAction).run(partition, record2, restrictionTracker, outputReceiver, watermarkEstimator);
    verify(partitionMetadataDao).updateWatermark(PARTITION_TOKEN, WATERMARK_TIMESTAMP);
    verify(heartbeatRecordAction, never()).run(any(), any(), any(), any());
    verify(childPartitionsRecordAction, never()).run(any(), any(), any(), any());
    verify(restrictionTracker, never()).tryClaim(any());
}
Also used : ChangeStreamResultSet(org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.ChangeStreamResultSet) ChangeStreamResultSetMetadata(org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.ChangeStreamResultSetMetadata) DataChangeRecord(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.DataChangeRecord) Struct(com.google.cloud.spanner.Struct) ProcessContinuation(org.apache.beam.sdk.transforms.DoFn.ProcessContinuation) Test(org.junit.Test)
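This test verifies `partitionMetadataDao.updateWatermark(PARTITION_TOKEN, WATERMARK_TIMESTAMP)`, even though `run` only writes the watermark from a bundle-finalizer callback. The verification can presumably pass only because the test's `bundleFinalizer` invokes callbacks eagerly. The sketch below makes that assumption explicit. It also assumes that the callback reads the watermark estimator when it runs, not when it is registered, and that it converts milliseconds to microseconds. `FinalizerSketch`, `WatermarkStore`, `FinalizerStub` and this `updateWatermarkCallback` are hypothetical stand-ins, not the Beam implementations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical model of the finalizer/watermark interaction (not Beam code). */
public class FinalizerSketch {
  /** Stand-in for the partition metadata table: records each watermark write. */
  static class WatermarkStore {
    final List<String> writes = new ArrayList<>();
    void updateWatermark(String token, long watermarkMicros) {
      writes.add(token + "=" + watermarkMicros);
    }
  }

  /** Stand-in for a BundleFinalizer that runs callbacks as soon as they are registered. */
  static class FinalizerStub {
    void afterBundleCommit(Runnable callback) {
      callback.run(); // a real runner defers this until the bundle is committed
    }
  }

  /** Assumed behaviour: read the watermark lazily, store it as microseconds. */
  static Runnable updateWatermarkCallback(
      String token, LongSupplier watermarkMillis, WatermarkStore store) {
    return () -> store.updateWatermark(token, watermarkMillis.getAsLong() * 1_000L);
  }

  public static void main(String[] args) {
    WatermarkStore store = new WatermarkStore();
    long[] watermarkMillis = {5L};
    Runnable callback = updateWatermarkCallback("token", () -> watermarkMillis[0], store);
    watermarkMillis[0] = 10L; // the estimator advances before the callback fires
    new FinalizerStub().afterBundleCommit(callback);
    System.out.println(store.writes); // [token=10000]
  }
}
```

Deferring the write to the finalizer presumably ensures that the metadata table only advances after the runner has committed the bundle's output. The eager stub collapses that delay so that a unit test can observe the write.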

Example 3 with ChangeStreamResultSet

Use of org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.ChangeStreamResultSet in project beam by apache.

From class QueryChangeStreamActionTest, method testQueryChangeStreamWithHeartbeatRecord.

@Test
public void testQueryChangeStreamWithHeartbeatRecord() {
    final Struct rowAsStruct = mock(Struct.class);
    final ChangeStreamResultSetMetadata resultSetMetadata = mock(ChangeStreamResultSetMetadata.class);
    final ChangeStreamResultSet resultSet = mock(ChangeStreamResultSet.class);
    final HeartbeatRecord record1 = mock(HeartbeatRecord.class);
    final HeartbeatRecord record2 = mock(HeartbeatRecord.class);
    when(record1.getRecordTimestamp()).thenReturn(PARTITION_START_TIMESTAMP);
    when(record2.getRecordTimestamp()).thenReturn(PARTITION_START_TIMESTAMP);
    when(changeStreamDao.changeStreamQuery(PARTITION_TOKEN, PARTITION_START_TIMESTAMP, PARTITION_END_TIMESTAMP, PARTITION_HEARTBEAT_MILLIS)).thenReturn(resultSet);
    when(resultSet.next()).thenReturn(true);
    when(resultSet.getCurrentRowAsStruct()).thenReturn(rowAsStruct);
    when(resultSet.getMetadata()).thenReturn(resultSetMetadata);
    when(changeStreamRecordMapper.toChangeStreamRecords(partition, rowAsStruct, resultSetMetadata)).thenReturn(Arrays.asList(record1, record2));
    when(heartbeatRecordAction.run(partition, record1, restrictionTracker, watermarkEstimator)).thenReturn(Optional.empty());
    when(heartbeatRecordAction.run(partition, record2, restrictionTracker, watermarkEstimator)).thenReturn(Optional.of(ProcessContinuation.stop()));
    when(watermarkEstimator.currentWatermark()).thenReturn(WATERMARK);
    final ProcessContinuation result = action.run(partition, restrictionTracker, outputReceiver, watermarkEstimator, bundleFinalizer);
    assertEquals(ProcessContinuation.stop(), result);
    verify(heartbeatRecordAction).run(partition, record1, restrictionTracker, watermarkEstimator);
    verify(heartbeatRecordAction).run(partition, record2, restrictionTracker, watermarkEstimator);
    verify(partitionMetadataDao).updateWatermark(PARTITION_TOKEN, WATERMARK_TIMESTAMP);
    verify(dataChangeRecordAction, never()).run(any(), any(), any(), any(), any());
    verify(childPartitionsRecordAction, never()).run(any(), any(), any(), any());
    verify(restrictionTracker, never()).tryClaim(any());
}
Also used : ChangeStreamResultSet(org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.ChangeStreamResultSet) ChangeStreamResultSetMetadata(org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.ChangeStreamResultSetMetadata) HeartbeatRecord(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.HeartbeatRecord) Struct(com.google.cloud.spanner.Struct) ProcessContinuation(org.apache.beam.sdk.transforms.DoFn.ProcessContinuation) Test(org.junit.Test)

Example 4 with ChangeStreamResultSet

Use of org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.ChangeStreamResultSet in project beam by apache.

From class QueryChangeStreamActionTest, method testQueryChangeStreamWithChildPartitionsRecord.

@Test
public void testQueryChangeStreamWithChildPartitionsRecord() {
    final Struct rowAsStruct = mock(Struct.class);
    final ChangeStreamResultSetMetadata resultSetMetadata = mock(ChangeStreamResultSetMetadata.class);
    final ChangeStreamResultSet resultSet = mock(ChangeStreamResultSet.class);
    final ChildPartitionsRecord record1 = mock(ChildPartitionsRecord.class);
    final ChildPartitionsRecord record2 = mock(ChildPartitionsRecord.class);
    when(record1.getRecordTimestamp()).thenReturn(PARTITION_START_TIMESTAMP);
    when(record2.getRecordTimestamp()).thenReturn(PARTITION_START_TIMESTAMP);
    when(changeStreamDao.changeStreamQuery(PARTITION_TOKEN, PARTITION_START_TIMESTAMP, PARTITION_END_TIMESTAMP, PARTITION_HEARTBEAT_MILLIS)).thenReturn(resultSet);
    when(resultSet.next()).thenReturn(true);
    when(resultSet.getCurrentRowAsStruct()).thenReturn(rowAsStruct);
    when(resultSet.getMetadata()).thenReturn(resultSetMetadata);
    when(changeStreamRecordMapper.toChangeStreamRecords(partition, rowAsStruct, resultSetMetadata)).thenReturn(Arrays.asList(record1, record2));
    when(childPartitionsRecordAction.run(partition, record1, restrictionTracker, watermarkEstimator)).thenReturn(Optional.empty());
    when(childPartitionsRecordAction.run(partition, record2, restrictionTracker, watermarkEstimator)).thenReturn(Optional.of(ProcessContinuation.stop()));
    when(watermarkEstimator.currentWatermark()).thenReturn(WATERMARK);
    final ProcessContinuation result = action.run(partition, restrictionTracker, outputReceiver, watermarkEstimator, bundleFinalizer);
    assertEquals(ProcessContinuation.stop(), result);
    verify(childPartitionsRecordAction).run(partition, record1, restrictionTracker, watermarkEstimator);
    verify(childPartitionsRecordAction).run(partition, record2, restrictionTracker, watermarkEstimator);
    verify(partitionMetadataDao).updateWatermark(PARTITION_TOKEN, WATERMARK_TIMESTAMP);
    verify(dataChangeRecordAction, never()).run(any(), any(), any(), any(), any());
    verify(heartbeatRecordAction, never()).run(any(), any(), any(), any());
    verify(restrictionTracker, never()).tryClaim(any());
}
Also used : ChangeStreamResultSet(org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.ChangeStreamResultSet) ChangeStreamResultSetMetadata(org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.ChangeStreamResultSetMetadata) ChildPartitionsRecord(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartitionsRecord) Struct(com.google.cloud.spanner.Struct) ProcessContinuation(org.apache.beam.sdk.transforms.DoFn.ProcessContinuation) Test(org.junit.Test)

Example 5 with ChangeStreamResultSet

Use of org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.ChangeStreamResultSet in project beam by apache.

From class QueryChangeStreamActionTest, method testQueryChangeStreamWithRestrictionStartAfterPartitionStart.

@Test
public void testQueryChangeStreamWithRestrictionStartAfterPartitionStart() {
    final Struct rowAsStruct = mock(Struct.class);
    final ChangeStreamResultSetMetadata resultSetMetadata = mock(ChangeStreamResultSetMetadata.class);
    final ChangeStreamResultSet resultSet = mock(ChangeStreamResultSet.class);
    final ChildPartitionsRecord record1 = mock(ChildPartitionsRecord.class);
    final ChildPartitionsRecord record2 = mock(ChildPartitionsRecord.class);
    // One microsecond after partition start timestamp
    when(restriction.getFrom()).thenReturn(11L);
    // This record should be ignored because it is before restriction.getFrom
    when(record1.getRecordTimestamp()).thenReturn(Timestamp.ofTimeMicroseconds(10L));
    // This record should be included because it is at the restriction.getFrom
    when(record2.getRecordTimestamp()).thenReturn(Timestamp.ofTimeMicroseconds(11L));
    // We should start the query 1 microsecond before the restriction.getFrom
    when(changeStreamDao.changeStreamQuery(PARTITION_TOKEN, Timestamp.ofTimeMicroseconds(10L), PARTITION_END_TIMESTAMP, PARTITION_HEARTBEAT_MILLIS)).thenReturn(resultSet);
    when(resultSet.next()).thenReturn(true);
    when(resultSet.getCurrentRowAsStruct()).thenReturn(rowAsStruct);
    when(resultSet.getMetadata()).thenReturn(resultSetMetadata);
    when(changeStreamRecordMapper.toChangeStreamRecords(partition, rowAsStruct, resultSetMetadata)).thenReturn(Arrays.asList(record1, record2));
    when(childPartitionsRecordAction.run(partition, record2, restrictionTracker, watermarkEstimator)).thenReturn(Optional.of(ProcessContinuation.stop()));
    when(watermarkEstimator.currentWatermark()).thenReturn(WATERMARK);
    final ProcessContinuation result = action.run(partition, restrictionTracker, outputReceiver, watermarkEstimator, bundleFinalizer);
    assertEquals(ProcessContinuation.stop(), result);
    verify(childPartitionsRecordAction).run(partition, record2, restrictionTracker, watermarkEstimator);
    verify(partitionMetadataDao).updateWatermark(PARTITION_TOKEN, WATERMARK_TIMESTAMP);
    verify(childPartitionsRecordAction, never()).run(partition, record1, restrictionTracker, watermarkEstimator);
    verify(dataChangeRecordAction, never()).run(any(), any(), any(), any(), any());
    verify(heartbeatRecordAction, never()).run(any(), any(), any(), any());
    verify(restrictionTracker, never()).tryClaim(any());
}
Also used : ChangeStreamResultSet(org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.ChangeStreamResultSet) ChangeStreamResultSetMetadata(org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.ChangeStreamResultSetMetadata) ChildPartitionsRecord(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartitionsRecord) Struct(com.google.cloud.spanner.Struct) ProcessContinuation(org.apache.beam.sdk.transforms.DoFn.ProcessContinuation) Test(org.junit.Test)
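This test exercises the FIXME(b/202802422) workaround in `run`. On a non-first run, the query starts one microsecond before the restriction start, and records earlier than the restriction start are dropped. The hypothetical helper below reproduces that arithmetic on plain microsecond `long`s rather than `com.google.cloud.Timestamp`. It uses the test's values: partition start 10 and restriction start 11, where the partition start is implied by the comment "One microsecond after partition start timestamp".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper mirroring the query-start workaround in QueryChangeStreamAction#run. */
public class QueryStartSketch {
  /** First run: query from the restriction start. Otherwise: one microsecond earlier. */
  static long queryStartMicros(long partitionStartMicros, long restrictionStartMicros) {
    boolean isFirstRun = restrictionStartMicros == partitionStartMicros;
    return isFirstRun ? restrictionStartMicros : restrictionStartMicros - 1;
  }

  /** Keeps only record timestamps at or after the restriction start. */
  static List<Long> recordsToProcess(List<Long> recordMicros, long restrictionStartMicros) {
    List<Long> kept = new ArrayList<>();
    for (long t : recordMicros) {
      if (t >= restrictionStartMicros) {
        kept.add(t);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Values from the test: partition start 10 micros, restriction start 11 micros.
    long start = queryStartMicros(10L, 11L);
    List<Long> kept = recordsToProcess(Arrays.asList(10L, 11L), 11L);
    System.out.println("query from " + start + ", process " + kept); // query from 10, process [11]
  }
}
```

According to the FIXME comment, starting one microsecond early lets the residual query return a child-partitions record timestamped exactly at the restriction start. The filter then prevents the earlier record from being processed a second time.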

Aggregations

ChangeStreamResultSet (org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.ChangeStreamResultSet): 6
ProcessContinuation (org.apache.beam.sdk.transforms.DoFn.ProcessContinuation): 6
Test (org.junit.Test): 5
Struct (com.google.cloud.spanner.Struct): 4
ChangeStreamResultSetMetadata (org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.ChangeStreamResultSetMetadata): 4
ChildPartitionsRecord (org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartitionsRecord): 3
DataChangeRecord (org.apache.beam.sdk.io.gcp.spanner.changestreams.model.DataChangeRecord): 2
HeartbeatRecord (org.apache.beam.sdk.io.gcp.spanner.changestreams.model.HeartbeatRecord): 2
Timestamp (com.google.cloud.Timestamp): 1
SpannerException (com.google.cloud.spanner.SpannerException): 1
Scope (io.opencensus.common.Scope): 1
ChangeStreamRecord (org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChangeStreamRecord): 1
PartitionMetadata (org.apache.beam.sdk.io.gcp.spanner.changestreams.model.PartitionMetadata): 1
VisibleForTesting (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting): 1