
Example 1 with SpannerConfig

use of org.apache.beam.sdk.io.gcp.spanner.SpannerConfig in project java-docs-samples by GoogleCloudPlatform.

the class TransactionalRead method main.

public static void main(String[] args) {
    Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
    Pipeline p = Pipeline.create(options);
    String instanceId = options.getInstanceId();
    String databaseId = options.getDatabaseId();
    // [START spanner_dataflow_txread]
    SpannerConfig spannerConfig = SpannerConfig.create().withInstanceId(instanceId).withDatabaseId(databaseId);
    PCollectionView<Transaction> tx =
        p.apply(
            SpannerIO.createTransaction()
                .withSpannerConfig(spannerConfig)
                .withTimestampBound(TimestampBound.strong()));
    PCollection<Struct> singers =
        p.apply(
            SpannerIO.read()
                .withSpannerConfig(spannerConfig)
                .withQuery("SELECT SingerID, FirstName, LastName FROM Singers")
                .withTransaction(tx));
    PCollection<Struct> albums =
        p.apply(
            SpannerIO.read()
                .withSpannerConfig(spannerConfig)
                .withQuery("SELECT SingerId, AlbumId, AlbumTitle FROM Albums")
                .withTransaction(tx));
    // [END spanner_dataflow_txread]
    singers.apply(MapElements.via(new SimpleFunction<Struct, String>() {

        @Override
        public String apply(Struct input) {
            return Joiner.on(DELIMITER).join(input.getLong(0), input.getString(1), input.getString(2));
        }
    })).apply(TextIO.write().to(options.getSingersFilename()).withoutSharding());
    albums.apply(MapElements.via(new SimpleFunction<Struct, String>() {

        @Override
        public String apply(Struct input) {
            return Joiner.on(DELIMITER).join(input.getLong(0), input.getLong(1), input.getString(2));
        }
    })).apply(TextIO.write().to(options.getAlbumsFilename()).withoutSharding());
    p.run().waitUntilFinish();
}
Also used : SpannerConfig(org.apache.beam.sdk.io.gcp.spanner.SpannerConfig) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) Transaction(org.apache.beam.sdk.io.gcp.spanner.Transaction) Pipeline(org.apache.beam.sdk.Pipeline) Struct(com.google.cloud.spanner.Struct)
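
A note on the Options type used above: the sample relies on a custom PipelineOptions subinterface that supplies the Spanner instance, database, and output file names. A minimal sketch of such an interface, using the getter names from the code above (the descriptions and everything else here are assumptions, not the sample's actual source):

// Hypothetical Options interface matching the getters used in the sample above;
// the real interface lives in the java-docs-samples project.
public interface Options extends PipelineOptions {

    @Description("Spanner instance ID to query from")
    @Validation.Required
    String getInstanceId();

    void setInstanceId(String value);

    @Description("Spanner database name to query from")
    @Validation.Required
    String getDatabaseId();

    void setDatabaseId(String value);

    @Description("Output file for the Singers query")
    @Validation.Required
    String getSingersFilename();

    void setSingersFilename(String value);

    @Description("Output file for the Albums query")
    @Validation.Required
    String getAlbumsFilename();

    void setAlbumsFilename(String value);
}

The Description and Validation annotations come from org.apache.beam.sdk.options; Beam's PipelineOptionsFactory generates the implementation at runtime from the getter/setter pairs.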

Example 2 with SpannerConfig

use of org.apache.beam.sdk.io.gcp.spanner.SpannerConfig in project java-docs-samples by GoogleCloudPlatform.

the class SpannerReadAll method main.

public static void main(String[] args) {
    Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
    Pipeline p = Pipeline.create(options);
    SpannerConfig spannerConfig = SpannerConfig.create().withInstanceId(options.getInstanceId()).withDatabaseId(options.getDatabaseId());
    // [START spanner_dataflow_readall]
    PCollection<Struct> allRecords =
        p.apply(
                SpannerIO.read()
                    .withSpannerConfig(spannerConfig)
                    .withQuery(
                        "SELECT t.table_name FROM information_schema.tables AS t WHERE t"
                            + ".table_catalog = '' AND t.table_schema = ''"))
            .apply(
                MapElements.into(TypeDescriptor.of(ReadOperation.class))
                    .via(
                        (SerializableFunction<Struct, ReadOperation>)
                            input -> {
                                String tableName = input.getString(0);
                                return ReadOperation.create().withQuery("SELECT * FROM " + tableName);
                            }))
            .apply(SpannerIO.readAll().withSpannerConfig(spannerConfig));
    // [END spanner_dataflow_readall]
    PCollection<Long> dbEstimatedSize = allRecords.apply(EstimateSize.create()).apply(Sum.longsGlobally());
    dbEstimatedSize.apply(ToString.elements()).apply(TextIO.write().to(options.getOutput()).withoutSharding());
    p.run().waitUntilFinish();
}
Also used : SpannerConfig(org.apache.beam.sdk.io.gcp.spanner.SpannerConfig) MapElements(org.apache.beam.sdk.transforms.MapElements) ToString(org.apache.beam.sdk.transforms.ToString) TypeDescriptor(org.apache.beam.sdk.values.TypeDescriptor) Sum(org.apache.beam.sdk.transforms.Sum) SerializableFunction(org.apache.beam.sdk.transforms.SerializableFunction) PipelineOptionsFactory(org.apache.beam.sdk.options.PipelineOptionsFactory) PCollection(org.apache.beam.sdk.values.PCollection) SpannerIO(org.apache.beam.sdk.io.gcp.spanner.SpannerIO) Description(org.apache.beam.sdk.options.Description) ReadOperation(org.apache.beam.sdk.io.gcp.spanner.ReadOperation) Struct(com.google.cloud.spanner.Struct) Validation(org.apache.beam.sdk.options.Validation) Pipeline(org.apache.beam.sdk.Pipeline) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) TextIO(org.apache.beam.sdk.io.TextIO)
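
ReadOperation can also be built from a table name and a column list instead of a SQL string. A hedged variant of the mapping step above, with hypothetical column names purely for illustration:

// Sketch only: read named columns from each table rather than issuing
// "SELECT *"; the column names below are assumptions, not part of the sample.
ReadOperation op =
    ReadOperation.create()
        .withTable(tableName)
        .withColumns("SingerId", "FirstName", "LastName");

Either form can be handed to SpannerIO.readAll(), which executes the operations against the same SpannerConfig.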

Example 3 with SpannerConfig

use of org.apache.beam.sdk.io.gcp.spanner.SpannerConfig in project beam by apache.

the class SpannerChangeStreamOrderedWithinKeyGloballyIT method testOrderedWithinKey.

@Test
public void testOrderedWithinKey() {
    final SpannerConfig spannerConfig = SpannerConfig.create().withProjectId(projectId).withInstanceId(instanceId).withDatabaseId(databaseId);
    // Get the time increment interval at which to flush data changes ordered by key.
    final long timeIncrementInSeconds = 70;
    // Commit an initial transaction to get the timestamp to start reading from.
    List<Mutation> mutations = new ArrayList<>();
    mutations.add(insertRecordMutation(0));
    final com.google.cloud.Timestamp startTimestamp = databaseClient.write(mutations);
    // This will be the first batch of transactions that will have strict timestamp ordering
    // per key.
    writeTransactionsToDatabase();
    // Sleep the time increment interval.
    try {
        Thread.sleep(timeIncrementInSeconds * 1000);
    } catch (InterruptedException e) {
        System.out.println(e);
    }
    // This will be the second batch of transactions that will have strict timestamp ordering
    // per key.
    writeTransactionsToDatabase();
    // Sleep the time increment interval.
    try {
        Thread.sleep(timeIncrementInSeconds * 1000);
    } catch (InterruptedException e) {
        System.out.println(e);
    }
    // This will be the final batch of transactions that will have strict timestamp ordering
    // per key.
    com.google.cloud.Timestamp endTimestamp = writeTransactionsToDatabase();
    LOG.debug("Reading change streams from {} to {}", startTimestamp.toString(), endTimestamp.toString());
    final PCollection<String> tokens =
        pipeline
            .apply(
                SpannerIO.readChangeStream()
                    .withSpannerConfig(spannerConfig)
                    .withChangeStreamName(changeStreamName)
                    .withMetadataDatabase(databaseId)
                    .withInclusiveStartAt(startTimestamp)
                    .withInclusiveEndAt(endTimestamp))
            .apply(ParDo.of(new BreakRecordByModFn()))
            .apply(ParDo.of(new KeyByIdFn()))
            .apply(ParDo.of(new KeyValueByCommitTimestampAndTransactionIdFn<>()))
            .apply(ParDo.of(new BufferKeyUntilOutputTimestamp(endTimestamp, timeIncrementInSeconds)))
            .apply(ParDo.of(new ToStringFn()));
    // Assert that the returned PCollection contains one entry per key for the committed
    // transactions, and that each entry contains the mutations in commit timestamp order.
    // Note that if inserts and updates to the same key are in the same transaction, the change
    // record for that transaction will only contain a record for the last update for that key.
    // Note that if an insert and then a delete for a key happen in the same transaction, there will
    // be change records for that key.
    PAssert.that(tokens).containsInAnyOrder(// First batch of records ordered within key.
    "{\"SingerId\":\"0\"}\n" + "{\"FirstName\":\"Inserting mutation 0\",\"LastName\":null,\"SingerInfo\":null};" + "Deleted record;", "{\"SingerId\":\"1\"}\n" + "{\"FirstName\":\"Inserting mutation 1\",\"LastName\":null,\"SingerInfo\":null};" + "{\"FirstName\":\"Updating mutation 1\"};" + "Deleted record;" + "{\"FirstName\":\"Inserting mutation 1\",\"LastName\":null,\"SingerInfo\":null};" + "Deleted record;", "{\"SingerId\":\"2\"}\n" + "{\"FirstName\":\"Inserting mutation 2\",\"LastName\":null,\"SingerInfo\":null};" + "{\"FirstName\":\"Updating mutation 2\"};" + "Deleted record;", "{\"SingerId\":\"3\"}\n" + "{\"FirstName\":\"Inserting mutation 3\",\"LastName\":null,\"SingerInfo\":null};" + "{\"FirstName\":\"Updating mutation 3\"};" + "Deleted record;", "{\"SingerId\":\"4\"}\n" + "{\"FirstName\":\"Inserting mutation 4\",\"LastName\":null,\"SingerInfo\":null};" + "Deleted record;", "{\"SingerId\":\"5\"}\n" + "{\"FirstName\":\"Updating mutation 5\",\"LastName\":null,\"SingerInfo\":null};" + "{\"FirstName\":\"Updating mutation 5\"};" + "Deleted record;", // Second batch of records ordered within key.
    "{\"SingerId\":\"1\"}\n" + "{\"FirstName\":\"Inserting mutation 1\",\"LastName\":null,\"SingerInfo\":null};" + "{\"FirstName\":\"Updating mutation 1\"};" + "Deleted record;" + "{\"FirstName\":\"Inserting mutation 1\",\"LastName\":null,\"SingerInfo\":null};" + "Deleted record;", "{\"SingerId\":\"2\"}\n" + "{\"FirstName\":\"Inserting mutation 2\",\"LastName\":null,\"SingerInfo\":null};" + "{\"FirstName\":\"Updating mutation 2\"};" + "Deleted record;", "{\"SingerId\":\"3\"}\n" + "{\"FirstName\":\"Inserting mutation 3\",\"LastName\":null,\"SingerInfo\":null};" + "{\"FirstName\":\"Updating mutation 3\"};" + "Deleted record;", "{\"SingerId\":\"4\"}\n" + "{\"FirstName\":\"Inserting mutation 4\",\"LastName\":null,\"SingerInfo\":null};" + "Deleted record;", "{\"SingerId\":\"5\"}\n" + "{\"FirstName\":\"Updating mutation 5\",\"LastName\":null,\"SingerInfo\":null};" + "{\"FirstName\":\"Updating mutation 5\"};" + "Deleted record;", // Third batch of records ordered within key.
    "{\"SingerId\":\"1\"}\n" + "{\"FirstName\":\"Inserting mutation 1\",\"LastName\":null,\"SingerInfo\":null};" + "{\"FirstName\":\"Updating mutation 1\"};" + "Deleted record;" + "{\"FirstName\":\"Inserting mutation 1\",\"LastName\":null,\"SingerInfo\":null};" + "Deleted record;", "{\"SingerId\":\"2\"}\n" + "{\"FirstName\":\"Inserting mutation 2\",\"LastName\":null,\"SingerInfo\":null};" + "{\"FirstName\":\"Updating mutation 2\"};" + "Deleted record;", "{\"SingerId\":\"3\"}\n" + "{\"FirstName\":\"Inserting mutation 3\",\"LastName\":null,\"SingerInfo\":null};" + "{\"FirstName\":\"Updating mutation 3\"};" + "Deleted record;", "{\"SingerId\":\"4\"}\n" + "{\"FirstName\":\"Inserting mutation 4\",\"LastName\":null,\"SingerInfo\":null};" + "Deleted record;", "{\"SingerId\":\"5\"}\n" + "{\"FirstName\":\"Updating mutation 5\",\"LastName\":null,\"SingerInfo\":null};" + "{\"FirstName\":\"Updating mutation 5\"};" + "Deleted record;");
    pipeline.run().waitUntilFinish();
}
Also used : SpannerConfig(org.apache.beam.sdk.io.gcp.spanner.SpannerConfig) ArrayList(java.util.ArrayList) Mutation(com.google.cloud.spanner.Mutation) Test(org.junit.Test)
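
BreakRecordByModFn, KeyByIdFn, KeyValueByCommitTimestampAndTransactionIdFn, BufferKeyUntilOutputTimestamp, and ToStringFn are DoFns defined elsewhere in this integration test. Purely as an illustration of the keying step (not the Beam test's actual code), a DoFn that keys change stream records by primary key might look roughly like this, assuming the records have already been broken up to one mod each:

// Rough sketch, not the test's real implementation: key each DataChangeRecord
// by the primary-key JSON of its (single) mod so that downstream steps can
// order and buffer mutations per key.
static class KeyByIdFn extends DoFn<DataChangeRecord, KV<String, DataChangeRecord>> {

    @ProcessElement
    public void processElement(
            @Element DataChangeRecord record,
            OutputReceiver<KV<String, DataChangeRecord>> outputReceiver) {
        outputReceiver.output(KV.of(record.getMods().get(0).getKeysJson(), record));
    }
}

DataChangeRecord and its Mod entries come from org.apache.beam.sdk.io.gcp.spanner.changestreams.model; KV is org.apache.beam.sdk.values.KV.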

Example 4 with SpannerConfig

use of org.apache.beam.sdk.io.gcp.spanner.SpannerConfig in project beam by apache.

the class SpannerChangeStreamIT method testReadSpannerChangeStream.

@Test
public void testReadSpannerChangeStream() {
    // Defines how many rows are going to be inserted / updated / deleted in the test
    final int numRows = 5;
    // Inserts numRows rows and uses the first commit timestamp as the startAt for reading the
    // change stream
    final Pair<Timestamp, Timestamp> insertTimestamps = insertRows(numRows);
    final Timestamp startAt = insertTimestamps.getLeft();
    // Updates the created rows
    updateRows(numRows);
    // Delete the created rows and uses the last commit timestamp as the endAt for reading the
    // change stream
    final Pair<Timestamp, Timestamp> deleteTimestamps = deleteRows(numRows);
    final Timestamp endAt = deleteTimestamps.getRight();
    final SpannerConfig spannerConfig = SpannerConfig.create().withProjectId(projectId).withInstanceId(instanceId).withDatabaseId(databaseId);
    final PCollection<String> tokens =
        pipeline
            .apply(
                SpannerIO.readChangeStream()
                    .withSpannerConfig(spannerConfig)
                    .withChangeStreamName(changeStreamName)
                    .withMetadataDatabase(databaseId)
                    .withInclusiveStartAt(startAt)
                    .withInclusiveEndAt(endAt))
            .apply(ParDo.of(new ModsToString()));
    // Each row is composed of the following data:
    // <mod type, singer id, old first name, old last name, new first name, new last name>
    PAssert.that(tokens).containsInAnyOrder("INSERT,1,null,null,First Name 1,Last Name 1", "INSERT,2,null,null,First Name 2,Last Name 2", "INSERT,3,null,null,First Name 3,Last Name 3", "INSERT,4,null,null,First Name 4,Last Name 4", "INSERT,5,null,null,First Name 5,Last Name 5", "UPDATE,1,First Name 1,Last Name 1,Updated First Name 1,Updated Last Name 1", "UPDATE,2,First Name 2,Last Name 2,Updated First Name 2,Updated Last Name 2", "UPDATE,3,First Name 3,Last Name 3,Updated First Name 3,Updated Last Name 3", "UPDATE,4,First Name 4,Last Name 4,Updated First Name 4,Updated Last Name 4", "UPDATE,5,First Name 5,Last Name 5,Updated First Name 5,Updated Last Name 5", "DELETE,1,Updated First Name 1,Updated Last Name 1,null,null", "DELETE,2,Updated First Name 2,Updated Last Name 2,null,null", "DELETE,3,Updated First Name 3,Updated Last Name 3,null,null", "DELETE,4,Updated First Name 4,Updated Last Name 4,null,null", "DELETE,5,Updated First Name 5,Updated Last Name 5,null,null");
    pipeline.run().waitUntilFinish();
}
Also used : SpannerConfig(org.apache.beam.sdk.io.gcp.spanner.SpannerConfig) Timestamp(com.google.cloud.Timestamp) Test(org.junit.Test)
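
ModsToString is a DoFn defined in the integration test itself. A simplified sketch (not the test's actual implementation, which parses the JSON values into the individual fields asserted above) might emit one line per mod with the raw JSON payloads:

// Simplified sketch: emits "<mod type>,<keys json>,<old values json>,<new values json>"
// per mod instead of the parsed column values used in the assertion above.
static class ModsToString extends DoFn<DataChangeRecord, String> {

    @ProcessElement
    public void processElement(@Element DataChangeRecord record, OutputReceiver<String> out) {
        for (Mod mod : record.getMods()) {
            out.output(
                String.join(
                    ",",
                    record.getModType().toString(),
                    mod.getKeysJson(),
                    String.valueOf(mod.getOldValuesJson()),
                    String.valueOf(mod.getNewValuesJson())));
        }
    }
}

DataChangeRecord and Mod come from org.apache.beam.sdk.io.gcp.spanner.changestreams.model.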

Example 5 with SpannerConfig

use of org.apache.beam.sdk.io.gcp.spanner.SpannerConfig in project beam by apache.

the class SpannerChangeStreamOrderedWithinKeyIT method testOrderedWithinKey.

@Test
public void testOrderedWithinKey() {
    final SpannerConfig spannerConfig = SpannerConfig.create().withProjectId(projectId).withInstanceId(instanceId).withDatabaseId(databaseId);
    // Commit an initial transaction to get the timestamp to start reading from.
    List<Mutation> mutations = new ArrayList<>();
    mutations.add(insertRecordMutation(0));
    final com.google.cloud.Timestamp startTimestamp = databaseClient.write(mutations);
    // Get the timestamp of the last committed transaction to get the end timestamp.
    final com.google.cloud.Timestamp endTimestamp = writeTransactionsToDatabase();
    final PCollection<String> tokens =
        pipeline
            .apply(
                SpannerIO.readChangeStream()
                    .withSpannerConfig(spannerConfig)
                    .withChangeStreamName(changeStreamName)
                    .withMetadataDatabase(databaseId)
                    .withInclusiveStartAt(startTimestamp)
                    .withInclusiveEndAt(endTimestamp))
            .apply(ParDo.of(new BreakRecordByModFn()))
            .apply(ParDo.of(new KeyByIdFn()))
            .apply(ParDo.of(new KeyValueByCommitTimestampAndRecordSequenceFn<>()))
            .apply(Window.into(FixedWindows.of(Duration.standardMinutes(2))))
            .apply(GroupByKey.create())
            .apply(ParDo.of(new ToStringFn()));
    // Assert that the returned PCollection contains one entry per key for the committed
    // transactions, and that each entry contains the mutations in commit timestamp order.
    // Note that if inserts and updates to the same key are in the same transaction, the change
    // record for that transaction will only contain a record for the last update for that key.
    // Note that if an insert and then a delete for a key happen in the same transaction, there will
    // be change records for that key.
    PAssert.that(tokens).containsInAnyOrder("{\"SingerId\":\"0\"}\n" + "{\"FirstName\":\"Inserting mutation 0\",\"LastName\":null,\"SingerInfo\":null};" + "Deleted record;", "{\"SingerId\":\"1\"}\n" + "{\"FirstName\":\"Inserting mutation 1\",\"LastName\":null,\"SingerInfo\":null};" + "{\"FirstName\":\"Updating mutation 1\"};" + "Deleted record;" + "{\"FirstName\":\"Inserting mutation 1\",\"LastName\":null,\"SingerInfo\":null};" + "Deleted record;", "{\"SingerId\":\"2\"}\n" + "{\"FirstName\":\"Inserting mutation 2\",\"LastName\":null,\"SingerInfo\":null};" + "{\"FirstName\":\"Updating mutation 2\"};" + "Deleted record;", "{\"SingerId\":\"3\"}\n" + "{\"FirstName\":\"Inserting mutation 3\",\"LastName\":null,\"SingerInfo\":null};" + "{\"FirstName\":\"Updating mutation 3\"};" + "Deleted record;", "{\"SingerId\":\"4\"}\n" + "{\"FirstName\":\"Inserting mutation 4\",\"LastName\":null,\"SingerInfo\":null};" + "Deleted record;", "{\"SingerId\":\"5\"}\n" + "{\"FirstName\":\"Updating mutation 5\",\"LastName\":null,\"SingerInfo\":null};" + "{\"FirstName\":\"Updating mutation 5\"};" + "Deleted record;");
    pipeline.run().waitUntilFinish();
}
Also used : SpannerConfig(org.apache.beam.sdk.io.gcp.spanner.SpannerConfig) ArrayList(java.util.ArrayList) Mutation(com.google.cloud.spanner.Mutation) Test(org.junit.Test)

Aggregations

SpannerConfig (org.apache.beam.sdk.io.gcp.spanner.SpannerConfig) 5
Test (org.junit.Test) 3
Mutation (com.google.cloud.spanner.Mutation) 2
Struct (com.google.cloud.spanner.Struct) 2
ArrayList (java.util.ArrayList) 2
Pipeline (org.apache.beam.sdk.Pipeline) 2
PipelineOptions (org.apache.beam.sdk.options.PipelineOptions) 2
Timestamp (com.google.cloud.Timestamp) 1
TextIO (org.apache.beam.sdk.io.TextIO) 1
ReadOperation (org.apache.beam.sdk.io.gcp.spanner.ReadOperation) 1
SpannerIO (org.apache.beam.sdk.io.gcp.spanner.SpannerIO) 1
Transaction (org.apache.beam.sdk.io.gcp.spanner.Transaction) 1
Description (org.apache.beam.sdk.options.Description) 1
PipelineOptionsFactory (org.apache.beam.sdk.options.PipelineOptionsFactory) 1
Validation (org.apache.beam.sdk.options.Validation) 1
MapElements (org.apache.beam.sdk.transforms.MapElements) 1
SerializableFunction (org.apache.beam.sdk.transforms.SerializableFunction) 1
Sum (org.apache.beam.sdk.transforms.Sum) 1
ToString (org.apache.beam.sdk.transforms.ToString) 1
PCollection (org.apache.beam.sdk.values.PCollection) 1