
Example 1 with ChildPartition

Use of org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartition in the Apache Beam project.

From class ChildPartitionsRecordAction, method run.

/**
 * This is the main processing function for a {@link ChildPartitionsRecord}. It returns an {@link
 * Optional} of {@link ProcessContinuation} to indicate if the calling function should stop or
 * not. If the {@link Optional} returned is empty, it means that the calling function can continue
 * with the processing. If an {@link Optional} of {@link ProcessContinuation#stop()} is returned,
 * it means that this function was unable to claim the timestamp of the {@link
 * ChildPartitionsRecord}, so the caller should stop.
 *
 * <p>When processing the {@link ChildPartitionsRecord} the following procedure is applied:
 *
 * <ol>
 *   <li>We try to claim the child partition record timestamp. If it is not possible, we stop here
 *       and return.
 *   <li>We update the watermark to the child partition record timestamp.
 *   <li>For each child partition, we try to insert it into the partition metadata table if it
 *       does not already exist.
 *   <li>For each child partition, we check whether it originates from a split or a merge and
 *       increment the corresponding metric.
 * </ol>
 *
 * Dealing with partition split and merge cases is detailed below:
 *
 * <ul>
 *   <li>Partition Splits: the child partition tokens should not yet exist in the partition
 *       metadata table, so new rows are simply added to that table. In case of a bundle retry,
 *       duplicate entries are silently ignored.
 *   <li>Partition Merges: the first parent partition that receives the child token should succeed
 *       in inserting it. The remaining parents will silently skip the insertion.
 * </ul>
 *
 * @param partition the current partition being processed
 * @param record the change stream child partition record received
 * @param tracker the restriction tracker of the {@link
 *     org.apache.beam.sdk.io.gcp.spanner.changestreams.dofn.ReadChangeStreamPartitionDoFn} SDF
 * @param watermarkEstimator the watermark estimator of the {@link
 *     org.apache.beam.sdk.io.gcp.spanner.changestreams.dofn.ReadChangeStreamPartitionDoFn} SDF
 * @return {@link Optional#empty()} if the caller can continue processing more records. A
 *     non-empty {@link Optional} with {@link ProcessContinuation#stop()} if this function was
 *     unable to claim the {@link ChildPartitionsRecord} timestamp
 */
@VisibleForTesting
public Optional<ProcessContinuation> run(
        PartitionMetadata partition,
        ChildPartitionsRecord record,
        RestrictionTracker<OffsetRange, Long> tracker,
        ManualWatermarkEstimator<Instant> watermarkEstimator) {
    final String token = partition.getPartitionToken();
    try (Scope scope = TRACER.spanBuilder("ChildPartitionsRecordAction").setRecordEvents(true).startScopedSpan()) {
        TRACER.getCurrentSpan().putAttribute(PARTITION_ID_ATTRIBUTE_LABEL, AttributeValue.stringAttributeValue(token));
        LOG.debug("[" + token + "] Processing child partition record " + record);
        final Timestamp startTimestamp = record.getStartTimestamp();
        final Instant startInstant = new Instant(startTimestamp.toSqlTimestamp().getTime());
        final long startMicros = TimestampConverter.timestampToMicros(startTimestamp);
        if (!tracker.tryClaim(startMicros)) {
            LOG.debug("[" + token + "] Could not claim queryChangeStream(" + startTimestamp + "), stopping");
            return Optional.of(ProcessContinuation.stop());
        }
        watermarkEstimator.setWatermark(startInstant);
        for (ChildPartition childPartition : record.getChildPartitions()) {
            processChildPartition(partition, record, childPartition);
        }
        LOG.debug("[" + token + "] Child partitions action completed successfully");
        return Optional.empty();
    }
}
Also used : Scope(io.opencensus.common.Scope) ChildPartition(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartition) Instant(org.joda.time.Instant) Timestamp(com.google.cloud.Timestamp) VisibleForTesting(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting)
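
The Optional returned by run dictates the caller's control flow. The following is a hedged sketch, not the actual ReadChangeStreamPartitionDoFn code, of how a calling SDF could consume that Optional; the childPartitionsRecordAction field is an assumption made for illustration.

// Hedged sketch of the caller's side (a fragment as it might appear inside the SDF's
// @ProcessElement loop; childPartitionsRecordAction is an assumed field holding the action above).
final Optional<ProcessContinuation> maybeContinuation =
        childPartitionsRecordAction.run(partition, record, tracker, watermarkEstimator);
if (maybeContinuation.isPresent()) {
    // Non-empty: the record timestamp could not be claimed, so stop processing this restriction.
    return maybeContinuation.get();
}
// Empty: keep processing the remaining change stream records.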

Example 2 with ChildPartition

Use of org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartition in the Apache Beam project.

From class ChildPartitionsRecordAction, method processChildPartition.

// Unboxing of the runInTransaction result will not produce a null value, so we can ignore it
@SuppressWarnings("nullness")
private void processChildPartition(
        PartitionMetadata partition, ChildPartitionsRecord record, ChildPartition childPartition) {
    try (Scope scope = TRACER.spanBuilder("ChildPartitionsRecordAction.processChildPartition").setRecordEvents(true).startScopedSpan()) {
        TRACER.getCurrentSpan().putAttribute(PARTITION_ID_ATTRIBUTE_LABEL, AttributeValue.stringAttributeValue(partition.getPartitionToken()));
        final String partitionToken = partition.getPartitionToken();
        final String childPartitionToken = childPartition.getToken();
        final boolean isSplit = isSplit(childPartition);
        LOG.debug("[" + partitionToken + "] Processing child partition" + (isSplit ? " split" : " merge") + " event");
        final PartitionMetadata row = toPartitionMetadata(record.getStartTimestamp(), partition.getEndTimestamp(), partition.getHeartbeatMillis(), childPartition);
        LOG.debug("[" + partitionToken + "] Inserting child partition token " + childPartitionToken);
        final Boolean insertedRow = partitionMetadataDao.runInTransaction(transaction -> {
            if (transaction.getPartition(childPartitionToken) == null) {
                transaction.insert(row);
                return true;
            } else {
                return false;
            }
        }).getResult();
        if (insertedRow && isSplit) {
            metrics.incPartitionRecordSplitCount();
        } else if (insertedRow) {
            metrics.incPartitionRecordMergeCount();
        } else {
            LOG.debug("[" + partitionToken + "] Child token " + childPartitionToken + " already exists, skipping...");
        }
    }
}
Also used : Tracer(io.opencensus.trace.Tracer) Logger(org.slf4j.Logger) AttributeValue(io.opencensus.trace.AttributeValue) ChangeStreamMetrics(org.apache.beam.sdk.io.gcp.spanner.changestreams.ChangeStreamMetrics) ChildPartition(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartition) PartitionMetadataDao(org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.PartitionMetadataDao) Scope(io.opencensus.common.Scope) LoggerFactory(org.slf4j.LoggerFactory) ProcessContinuation(org.apache.beam.sdk.transforms.DoFn.ProcessContinuation) PARTITION_ID_ATTRIBUTE_LABEL(org.apache.beam.sdk.io.gcp.spanner.changestreams.ChangeStreamMetrics.PARTITION_ID_ATTRIBUTE_LABEL) Timestamp(com.google.cloud.Timestamp) TimestampConverter(org.apache.beam.sdk.io.gcp.spanner.changestreams.TimestampConverter) ManualWatermarkEstimator(org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator) PartitionMetadata(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.PartitionMetadata) ChildPartitionsRecord(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartitionsRecord) VisibleForTesting(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting) CREATED(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.PartitionMetadata.State.CREATED) Instant(org.joda.time.Instant) Optional(java.util.Optional) Tracing(io.opencensus.trace.Tracing) OffsetRange(org.apache.beam.sdk.io.range.OffsetRange) RestrictionTracker(org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker) Nullable(javax.annotation.Nullable)
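
The split versus merge distinction above comes from how many parents a child partition carries: a Cloud Spanner change stream split produces child partitions with a single parent token, while a merge produces one child whose parent token set contains every merged parent. A plausible sketch of the isSplit helper used above is shown below; it assumes the parent token set passed to the ChildPartition constructor is exposed through getParentTokens().

// Plausible sketch of the isSplit helper (getParentTokens() is assumed to be the accessor
// for the parent token set seen in the ChildPartition constructors of the tests below).
private boolean isSplit(ChildPartition childPartition) {
    // A split child has exactly one parent; a merge child carries all of the merged parents.
    return childPartition.getParentTokens().size() == 1;
}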

Example 3 with ChildPartition

Use of org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartition in the Apache Beam project.

From class ChildPartitionsRecordActionTest, method testRestrictionClaimedAnsIsSplitCaseAndChildExists.

@Test
public void testRestrictionClaimedAnsIsSplitCaseAndChildExists() {
    final String partitionToken = "partitionToken";
    final long heartbeat = 30L;
    final Timestamp startTimestamp = Timestamp.ofTimeMicroseconds(10L);
    final Timestamp endTimestamp = Timestamp.ofTimeMicroseconds(20L);
    final PartitionMetadata partition = mock(PartitionMetadata.class);
    final ChildPartitionsRecord record =
            new ChildPartitionsRecord(
                    startTimestamp,
                    "recordSequence",
                    Arrays.asList(
                            new ChildPartition("childPartition1", partitionToken),
                            new ChildPartition("childPartition2", partitionToken)),
                    null);
    when(partition.getEndTimestamp()).thenReturn(endTimestamp);
    when(partition.getHeartbeatMillis()).thenReturn(heartbeat);
    when(partition.getPartitionToken()).thenReturn(partitionToken);
    when(tracker.tryClaim(10L)).thenReturn(true);
    when(transaction.getPartition("childPartition1")).thenReturn(mock(Struct.class));
    when(transaction.getPartition("childPartition2")).thenReturn(mock(Struct.class));
    final Optional<ProcessContinuation> maybeContinuation = action.run(partition, record, tracker, watermarkEstimator);
    assertEquals(Optional.empty(), maybeContinuation);
    verify(watermarkEstimator).setWatermark(new Instant(startTimestamp.toSqlTimestamp().getTime()));
}
Also used : ChildPartition(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartition) Instant(org.joda.time.Instant) PartitionMetadata(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.PartitionMetadata) ChildPartitionsRecord(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartitionsRecord) Timestamp(com.google.cloud.Timestamp) Struct(com.google.cloud.spanner.Struct) ProcessContinuation(org.apache.beam.sdk.transforms.DoFn.ProcessContinuation) Test(org.junit.Test)

Example 4 with ChildPartition

Use of org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartition in the Apache Beam project.

From class ChildPartitionsRecordActionTest, method testRestrictionNotClaimed.

@Test
public void testRestrictionNotClaimed() {
    final String partitionToken = "partitionToken";
    final Timestamp startTimestamp = Timestamp.ofTimeMicroseconds(10L);
    final PartitionMetadata partition = mock(PartitionMetadata.class);
    final ChildPartitionsRecord record =
            new ChildPartitionsRecord(
                    startTimestamp,
                    "recordSequence",
                    Arrays.asList(
                            new ChildPartition("childPartition1", partitionToken),
                            new ChildPartition("childPartition2", partitionToken)),
                    null);
    when(partition.getPartitionToken()).thenReturn(partitionToken);
    when(tracker.tryClaim(10L)).thenReturn(false);
    final Optional<ProcessContinuation> maybeContinuation = action.run(partition, record, tracker, watermarkEstimator);
    assertEquals(Optional.of(ProcessContinuation.stop()), maybeContinuation);
    verify(watermarkEstimator, never()).setWatermark(any());
    verify(dao, never()).insert(any());
}
Also used : ChildPartition(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartition) PartitionMetadata(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.PartitionMetadata) ChildPartitionsRecord(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartitionsRecord) Timestamp(com.google.cloud.Timestamp) ProcessContinuation(org.apache.beam.sdk.transforms.DoFn.ProcessContinuation) Test(org.junit.Test)

Example 5 with ChildPartition

Use of org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartition in the Apache Beam project.

From class ChangeStreamRecordMapperTest, method testMappingStructRowFromInitialPartitionToChildPartitionRecord.

/**
 * Adds the default parent partition token as a parent of each child partition.
 */
@Test
public void testMappingStructRowFromInitialPartitionToChildPartitionRecord() {
    final Struct struct =
            recordsToStructWithStrings(
                    new ChildPartitionsRecord(
                            Timestamp.ofTimeSecondsAndNanos(10L, 20),
                            "1",
                            Arrays.asList(
                                    new ChildPartition("childToken1", Sets.newHashSet()),
                                    new ChildPartition("childToken2", Sets.newHashSet())),
                            null));
    final ChildPartitionsRecord expected =
            new ChildPartitionsRecord(
                    Timestamp.ofTimeSecondsAndNanos(10L, 20),
                    "1",
                    Arrays.asList(
                            new ChildPartition("childToken1", Sets.newHashSet(InitialPartition.PARTITION_TOKEN)),
                            new ChildPartition("childToken2", Sets.newHashSet(InitialPartition.PARTITION_TOKEN))),
                    null);
    final PartitionMetadata initialPartition = partition.toBuilder().setPartitionToken(InitialPartition.PARTITION_TOKEN).build();
    assertEquals(Collections.singletonList(expected), mapper.toChangeStreamRecords(initialPartition, struct, resultSetMetadata));
}
Also used : ChildPartition(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartition) PartitionMetadata(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.PartitionMetadata) ChildPartitionsRecord(org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartitionsRecord) Struct(com.google.cloud.spanner.Struct) Test(org.junit.Test)
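
The rule this test exercises can be stated simply: when a ChildPartitionsRecord is read from the initial (fake) partition, each child partition receives InitialPartition.PARTITION_TOKEN as its only parent. A hedged sketch of that substitution follows; the helper name is hypothetical and the real mapping lives inside ChangeStreamRecordMapper.toChangeStreamRecords.

// Hedged sketch, not ChangeStreamRecordMapper's actual code: the parent-token substitution
// verified by the test above. The helper name withInitialParentIfNeeded is hypothetical.
private ChildPartition withInitialParentIfNeeded(PartitionMetadata partition, ChildPartition child) {
    if (InitialPartition.PARTITION_TOKEN.equals(partition.getPartitionToken())) {
        // Children of the initial partition get the sentinel initial token as their single parent.
        return new ChildPartition(child.getToken(), Sets.newHashSet(InitialPartition.PARTITION_TOKEN));
    }
    return child;
}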

Aggregations

ChildPartition (org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartition) 9
ChildPartitionsRecord (org.apache.beam.sdk.io.gcp.spanner.changestreams.model.ChildPartitionsRecord) 8
Timestamp (com.google.cloud.Timestamp) 7
PartitionMetadata (org.apache.beam.sdk.io.gcp.spanner.changestreams.model.PartitionMetadata) 7
Test (org.junit.Test) 7
ProcessContinuation (org.apache.beam.sdk.transforms.DoFn.ProcessContinuation) 6
Instant (org.joda.time.Instant) 6
Struct (com.google.cloud.spanner.Struct) 4
Scope (io.opencensus.common.Scope) 2
VisibleForTesting (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting) 2
AttributeValue (io.opencensus.trace.AttributeValue) 1
Tracer (io.opencensus.trace.Tracer) 1
Tracing (io.opencensus.trace.Tracing) 1
Optional (java.util.Optional) 1
Nullable (javax.annotation.Nullable) 1
ChangeStreamMetrics (org.apache.beam.sdk.io.gcp.spanner.changestreams.ChangeStreamMetrics) 1
PARTITION_ID_ATTRIBUTE_LABEL (org.apache.beam.sdk.io.gcp.spanner.changestreams.ChangeStreamMetrics.PARTITION_ID_ATTRIBUTE_LABEL) 1
TimestampConverter (org.apache.beam.sdk.io.gcp.spanner.changestreams.TimestampConverter) 1
PartitionMetadataDao (org.apache.beam.sdk.io.gcp.spanner.changestreams.dao.PartitionMetadataDao) 1
CREATED (org.apache.beam.sdk.io.gcp.spanner.changestreams.model.PartitionMetadata.State.CREATED) 1