
Example 26 with FailsafeElement

Use of com.google.cloud.teleport.v2.values.FailsafeElement in project DataflowTemplates by GoogleCloudPlatform.

Class SpannerStreamingWriteIntegrationTest, method canUpdateExistingRecord.

@Test
public void canUpdateExistingRecord() throws Exception {
    // Two change events for the same primary key (id 1): an INSERT followed by an UPDATE.
    JSONObject json1 = getChangeEventForTable1("1", "10", "INSERT", "1");
    JSONObject json2 = getChangeEventForTable1("1", "20", "UPDATE", "3");
    // Wrap each event so that its original and current payloads start out identical.
    PCollection<FailsafeElement<String, String>> jsonRecords =
        testPipeline.apply(
            Create.of(
                    Arrays.asList(
                        FailsafeElement.of(json1.toString(), json1.toString()),
                        FailsafeElement.of(json2.toString(), json2.toString())))
                .withCoder(FailsafeElementCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())));
    constructAndRunPipeline(jsonRecords);
    // The UPDATE replaces the INSERT: one row remains, holding the updated value.
    verifyRecordCountinTable("Table1", 1);
    verifyDataInTable1(1, 20);
}
Also used : JSONObject(org.json.JSONObject) FailsafeElement(com.google.cloud.teleport.v2.values.FailsafeElement) Test(org.junit.Test) IntegrationTest(com.google.cloud.teleport.v2.spanner.IntegrationTest)
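The test expects the UPDATE to replace the INSERT for the same key, whatever order Beam processes them in. The plain-Java sketch below models that outcome with a "last applied timestamp" map standing in for a shadow table (the pipeline configures one through shadowTablePrefix). It assumes that the fourth argument of getChangeEventForTable1 is an ordering timestamp and that the shadow table records the last applied timestamp per key; ShadowTableSketch and ChangeEvent are hypothetical names, and this is an illustration of the idea, not the SpannerTransactionWriter implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: timestamp-guarded application of change events.
// Not the SpannerTransactionWriter implementation.
class ShadowTableSketch {

    // One change event: primary key, new data value, change type, ordering timestamp (assumed).
    static final class ChangeEvent {
        final String id;
        final String data;
        final String changeType;
        final long timestamp;

        ChangeEvent(String id, String data, String changeType, long timestamp) {
            this.id = id;
            this.data = data;
            this.changeType = changeType;
            this.timestamp = timestamp;
        }
    }

    // Simulated data table: primary key -> data value.
    final Map<String, String> table = new HashMap<>();
    // Simulated shadow table: primary key -> timestamp of the last applied event.
    final Map<String, Long> shadow = new HashMap<>();

    // Applies the event only if it is newer than the last one applied for its key.
    boolean apply(ChangeEvent event) {
        Long lastApplied = shadow.get(event.id);
        if (lastApplied != null && lastApplied >= event.timestamp) {
            return false; // stale or duplicate event: skip it
        }
        if ("DELETE".equals(event.changeType)) {
            table.remove(event.id);
        } else {
            table.put(event.id, event.data); // INSERT and UPDATE both behave as upserts here
        }
        shadow.put(event.id, event.timestamp);
        return true;
    }
}
```

Applying the two events in either order leaves a single row with the value "20", because the older INSERT is skipped when it arrives after the UPDATE.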

Example 27 with FailsafeElement

Use of com.google.cloud.teleport.v2.values.FailsafeElement in project DataflowTemplates by GoogleCloudPlatform.

Class SpannerStreamingWriteIntegrationTest, method constructAndRunPipeline.

private void constructAndRunPipeline(PCollection<FailsafeElement<String, String>> jsonRecords) {
    String shadowTablePrefix = "shadow";
    SpannerConfig sourceConfig = spannerServer.getSpannerConfig(testDb);
    // Read the database schema and expose it to the writer as a singleton side input.
    PCollection<Ddl> ddl =
        testPipeline.apply(
            "Process Information Schema",
            new ProcessInformationSchema(sourceConfig, true, shadowTablePrefix, "oracle"));
    PCollectionView<Ddl> ddlView = ddl.apply("Cloud Spanner DDL as view", View.asSingleton());
    // Apply the change events to Cloud Spanner using the schema view.
    jsonRecords.apply(
        "Write events to Cloud Spanner",
        new SpannerTransactionWriter(sourceConfig, ddlView, shadowTablePrefix, "oracle"));
    // Block until the pipeline finishes so the test can verify the written data.
    PipelineResult testResult = testPipeline.run();
    testResult.waitUntilFinish();
}
Also used : SpannerConfig(org.apache.beam.sdk.io.gcp.spanner.SpannerConfig) ProcessInformationSchema(com.google.cloud.teleport.v2.templates.spanner.ProcessInformationSchema) PipelineResult(org.apache.beam.sdk.PipelineResult) Ddl(com.google.cloud.teleport.v2.templates.spanner.ddl.Ddl)
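In constructAndRunPipeline the schema (Ddl) is read once and handed to SpannerTransactionWriter as a singleton side input through View.asSingleton(), so every write step consults the same schema. The plain-Java analogy below shows that shape without any Beam dependency: a schema computed once, then consulted read-only for each event. The table set and the routing of events for unknown tables to a failure list are illustrative assumptions, not the template's documented behavior.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical analogy of the side-input pattern in constructAndRunPipeline.
class SchemaSideInputSketch {

    // Computed once per run; stands in for ProcessInformationSchema + View.asSingleton().
    static Set<String> readSchemaOnce() {
        return Set.of("Table1", "Table1_interleaved");
    }

    // Every event consults the same shared, read-only schema:
    // true -> the table exists and the event can be written, false -> routed to failures.
    static Map<Boolean, List<String>> routeByTable(List<String> eventTables, Set<String> schema) {
        return eventTables.stream().collect(Collectors.partitioningBy(schema::contains));
    }
}
```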

Example 28 with FailsafeElement

Use of com.google.cloud.teleport.v2.values.FailsafeElement in project DataflowTemplates by GoogleCloudPlatform.

Class SpannerStreamingWriteIntegrationTest, method canWriteInsertChangeEvents.

@Test
public void canWriteInsertChangeEvents() throws Exception {
    // Two INSERT events with distinct primary keys (ids 1 and 2).
    JSONObject json1 = getChangeEventForTable1("1", "334", "INSERT", "1");
    JSONObject json2 = getChangeEventForTable1("2", "32", "INSERT", "3");
    PCollection<FailsafeElement<String, String>> jsonRecords =
        testPipeline.apply(
            Create.of(
                    Arrays.asList(
                        FailsafeElement.of(json1.toString(), json1.toString()),
                        FailsafeElement.of(json2.toString(), json2.toString())))
                .withCoder(FailsafeElementCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())));
    constructAndRunPipeline(jsonRecords);
    // Both rows are written, each with its own value.
    verifyRecordCountinTable("Table1", 2);
    verifyDataInTable1(1, 334);
    verifyDataInTable1(2, 32);
}
Also used : JSONObject(org.json.JSONObject) FailsafeElement(com.google.cloud.teleport.v2.values.FailsafeElement) Test(org.junit.Test) IntegrationTest(com.google.cloud.teleport.v2.spanner.IntegrationTest)
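Each test wraps its JSON change event as FailsafeElement.of(json, json), so the original and current payloads start out identical. The sketch below is a simplified, hypothetical stand-in (MiniFailsafeElement, not the real FailsafeElement class) that shows why both are kept: a transform can replace the current payload while the original stays available for error reporting. The withPayload helper and the chaining setErrorMessage are assumptions made for illustration.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for FailsafeElement: the original payload travels
// with the current (possibly transformed) payload, plus optional error details.
final class MiniFailsafeElement<O, C> {
    private final O originalPayload;
    private final C payload;
    private String errorMessage;

    private MiniFailsafeElement(O originalPayload, C payload) {
        this.originalPayload = Objects.requireNonNull(originalPayload);
        this.payload = Objects.requireNonNull(payload);
    }

    static <O, C> MiniFailsafeElement<O, C> of(O originalPayload, C payload) {
        return new MiniFailsafeElement<>(originalPayload, payload);
    }

    O getOriginalPayload() { return originalPayload; }

    C getPayload() { return payload; }

    String getErrorMessage() { return errorMessage; }

    // Returns a new element with a replaced current payload; the original payload is kept
    // and error details are not carried over.
    <N> MiniFailsafeElement<O, N> withPayload(N newPayload) {
        return new MiniFailsafeElement<>(originalPayload, newPayload);
    }

    // Records why processing failed, so the element can be routed to an error output.
    MiniFailsafeElement<O, C> setErrorMessage(String message) {
        this.errorMessage = message;
        return this;
    }
}
```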

Example 29 with FailsafeElement

Use of com.google.cloud.teleport.v2.values.FailsafeElement in project DataflowTemplates by GoogleCloudPlatform.

Class SpannerStreamingWriteIntegrationTest, method canWriteDisorderedAndInterleavedChangeEvents.

@Test
public void canWriteDisorderedAndInterleavedChangeEvents() throws Exception {
    JSONObject json1 = getChangeEventForTable1("1", "334", "INSERT", "1");
    JSONObject json2 = getChangeEvent("Table1_interleaved", "INSERT", "2");
    json2.put("id", "1");
    json2.put("id2", "1");
    json2.put("data2", "32");
    /* The order of event processing cannot be predicted or controlled in
     * test pipelines: the order in the list below does not mean the events
     * are processed in that order.
     * As long as at least one change event for the interleaved table is
     * processed after the parent table's event, this test succeeds.
     * Hence the change event for the interleaved table is repeated several times.
     * This also mimics the retry behavior used when handling interleaved tables.
     */
    FailsafeElement<String, String> parent = FailsafeElement.of(json1.toString(), json1.toString());
    FailsafeElement<String, String> child = FailsafeElement.of(json2.toString(), json2.toString());
    PCollection<FailsafeElement<String, String>> jsonRecords =
        testPipeline.apply(
            Create.of(Arrays.asList(child, child, parent, child, child, child, child, child))
                .withCoder(FailsafeElementCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())));
    constructAndRunPipeline(jsonRecords);
    verifyRecordCountinTable("Table1", 1);
    verifyRecordCountinTable("Table1_interleaved", 1);
}
Also used : JSONObject(org.json.JSONObject) FailsafeElement(com.google.cloud.teleport.v2.values.FailsafeElement) Test(org.junit.Test) IntegrationTest(com.google.cloud.teleport.v2.spanner.IntegrationTest)
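The comment in this test explains that the child event is repeated so that at least one copy is processed after the parent row exists, which also mirrors retry behavior. The hypothetical sketch below models that idea in plain Java: a child write is rejected while its parent row is missing, failed events are re-queued for another round, and repeated writes of the same row are idempotent. The event string format and the round-based retry loop are assumptions for illustration, not the template's actual retry mechanism.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: parent/child (interleaved) writes with round-based retries.
class InterleavedRetrySketch {
    final Set<String> parentRows = new HashSet<>(); // keys present in Table1
    final Set<String> childRows = new HashSet<>();  // keys present in Table1_interleaved

    // Events look like "parent:<id>" or "child:<id>:<id2>" (an assumed format).
    // Returns false when the write is rejected.
    boolean write(String event) {
        String[] parts = event.split(":");
        if (parts[0].equals("parent")) {
            parentRows.add(parts[1]);
            return true;
        }
        if (!parentRows.contains(parts[1])) {
            return false; // parent row missing: the interleaved insert is rejected
        }
        childRows.add(parts[1] + ":" + parts[2]); // a Set makes repeated writes idempotent
        return true;
    }

    // Processes events in rounds; failed events are re-queued for the next round.
    // Returns the number of events still failing after maxRounds.
    int processWithRetry(List<String> events, int maxRounds) {
        Deque<String> pending = new ArrayDeque<>(events);
        for (int round = 0; round < maxRounds && !pending.isEmpty(); round++) {
            int size = pending.size();
            for (int i = 0; i < size; i++) {
                String event = pending.poll();
                if (!write(event)) {
                    pending.add(event); // retry in the next round
                }
            }
        }
        return pending.size();
    }
}
```

With the events ordered child, child, parent, child, child, the first two child writes fail, are retried in the second round, and succeed; a child whose parent never arrives stays in the pending queue.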

Example 30 with FailsafeElement

Use of com.google.cloud.teleport.v2.values.FailsafeElement in project DataflowTemplates by GoogleCloudPlatform.

Class PubSubToElasticsearch, method run.

/**
 * Runs the pipeline with the supplied options.
 *
 * @param options The execution parameters to the pipeline.
 * @return The result of the pipeline execution.
 */
public static PipelineResult run(PubSubToElasticsearchOptions options) {
    // Create the pipeline
    Pipeline pipeline = Pipeline.create(options);
    // Register the coders for the pipeline
    CoderRegistry coderRegistry = pipeline.getCoderRegistry();
    coderRegistry.registerCoderForType(FAILSAFE_ELEMENT_CODER.getEncodedTypeDescriptor(), FAILSAFE_ELEMENT_CODER);
    coderRegistry.registerCoderForType(CODER.getEncodedTypeDescriptor(), CODER);
    /*
     * Steps: 1) Read PubsubMessage with attributes from the input Pub/Sub subscription.
     *        2) Apply the JavaScript UDF, if provided.
     *        3) Index the JSON string into the output Elasticsearch index.
     */
    LOG.info("Reading from subscription: " + options.getInputSubscription());
    PCollectionTuple convertedPubsubMessages =
        pipeline
            .apply(
                "ReadPubSubSubscription",
                PubsubIO.readMessagesWithAttributes().fromSubscription(options.getInputSubscription()))
            .apply(
                "ConvertMessageToJsonDocument",
                PubSubMessageToJsonDocument.newBuilder()
                    .setJavascriptTextTransformFunctionName(options.getJavascriptTextTransformFunctionName())
                    .setJavascriptTextTransformGcsPath(options.getJavascriptTextTransformGcsPath())
                    .build());
    /*
     * Step 3a: Write JSON documents into Elasticsearch using {@link ElasticsearchTransforms.WriteToElasticsearch}.
     */
    convertedPubsubMessages
        .get(TRANSFORM_OUT)
        .apply("GetJsonDocuments", MapElements.into(TypeDescriptors.strings()).via(FailsafeElement::getPayload))
        .apply("Insert metadata", new ProcessEventMetadata())
        .apply(
            "WriteToElasticsearch",
            WriteToElasticsearch.newBuilder().setOptions(options.as(PubSubToElasticsearchOptions.class)).build());
    /*
     * Step 3b: Write elements that failed processing to the error output Pub/Sub topic via {@link PubsubIO}.
     */
    convertedPubsubMessages
        .get(TRANSFORM_ERROROUTPUT_OUT)
        .apply(ParDo.of(new FailedPubsubMessageToPubsubTopicFn()))
        .apply("writeFailureMessages", PubsubIO.writeMessages().to(options.getErrorOutputTopic()));
    // Execute the pipeline and return the result.
    return pipeline.run();
}
Also used : FailedPubsubMessageToPubsubTopicFn(com.google.cloud.teleport.v2.elasticsearch.transforms.FailedPubsubMessageToPubsubTopicFn) CoderRegistry(org.apache.beam.sdk.coders.CoderRegistry) ProcessEventMetadata(com.google.cloud.teleport.v2.elasticsearch.transforms.ProcessEventMetadata) PCollectionTuple(org.apache.beam.sdk.values.PCollectionTuple) Pipeline(org.apache.beam.sdk.Pipeline) PubSubToElasticsearchOptions(com.google.cloud.teleport.v2.elasticsearch.options.PubSubToElasticsearchOptions)
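run() relies on PubSubMessageToJsonDocument returning a PCollectionTuple with a main output (TRANSFORM_OUT) and an error output (TRANSFORM_ERROROUTPUT_OUT), so a message that fails conversion is published to the error topic instead of failing the whole pipeline. The plain-Java sketch below mirrors that dead-letter split without Beam; DeadLetterSplitSketch, the Failure shape, and the sample transform used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a main-output / error-output (dead-letter) split.
class DeadLetterSplitSketch {

    // A failed element keeps the original message plus the error text.
    static final class Failure {
        final String originalMessage;
        final String errorMessage;

        Failure(String originalMessage, String errorMessage) {
            this.originalMessage = originalMessage;
            this.errorMessage = errorMessage;
        }
    }

    final List<String> mainOutput = new ArrayList<>();   // analogous to TRANSFORM_OUT
    final List<Failure> errorOutput = new ArrayList<>(); // analogous to TRANSFORM_ERROROUTPUT_OUT

    // Applies the transform to each message; an exception routes that message to errorOutput.
    void process(List<String> messages, UnaryOperator<String> transform) {
        for (String message : messages) {
            try {
                mainOutput.add(transform.apply(message));
            } catch (RuntimeException e) {
                errorOutput.add(new Failure(message, e.getMessage()));
            }
        }
    }
}
```

For example, a transform that throws on input not starting with "{" leaves the valid messages in mainOutput and each rejected message, together with its error text, in errorOutput.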

Aggregations

FailsafeElement (com.google.cloud.teleport.v2.values.FailsafeElement): 31
PCollectionTuple (org.apache.beam.sdk.values.PCollectionTuple): 26
CoderRegistry (org.apache.beam.sdk.coders.CoderRegistry): 21
Test (org.junit.Test): 21
Pipeline (org.apache.beam.sdk.Pipeline): 14
TableRow (com.google.api.services.bigquery.model.TableRow): 8
PubsubMessage (org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage): 6
DoFn (org.apache.beam.sdk.transforms.DoFn): 6
PubSubToElasticsearchOptions (com.google.cloud.teleport.v2.elasticsearch.options.PubSubToElasticsearchOptions): 5
IntegrationTest (com.google.cloud.teleport.v2.spanner.IntegrationTest): 5
JSONObject (org.json.JSONObject): 5
DeadLetterQueueManager (com.google.cloud.teleport.v2.cdc.dlq.DeadLetterQueueManager): 4
StringDeadLetterQueueSanitizer (com.google.cloud.teleport.v2.cdc.dlq.StringDeadLetterQueueSanitizer): 4
DataStreamIO (com.google.cloud.teleport.v2.cdc.sources.DataStreamIO): 4
FailsafeElementCoder (com.google.cloud.teleport.v2.coders.FailsafeElementCoder): 4
GCSToSplunk.flattenErrorsAndConvertToString (com.google.cloud.teleport.v2.templates.GCSToSplunk.flattenErrorsAndConvertToString): 4
PipelineResult (org.apache.beam.sdk.PipelineResult): 4
SpannerConfig (org.apache.beam.sdk.io.gcp.spanner.SpannerConfig): 4
KV (org.apache.beam.sdk.values.KV): 4
GCSToElasticsearchOptions (com.google.cloud.teleport.v2.elasticsearch.options.GCSToElasticsearchOptions): 3