Example 1 with DataplexJdbcIngestionFilter

Use of com.google.cloud.teleport.v2.utils.DataplexJdbcIngestionFilter in the project DataflowTemplates by GoogleCloudPlatform.

From the class DataplexJdbcIngestion, method applyPartitionedWriteDispositionFilter:

private static PCollection<GenericRecord> applyPartitionedWriteDispositionFilter(
    PCollection<GenericRecord> genericRecords,
    DataplexJdbcIngestionOptions options,
    String targetRootPath,
    org.apache.avro.Schema avroSchema,
    List<String> existingFiles) {

  PCollectionTuple filteredRecordsTuple =
      genericRecords.apply(
          "Filter pre-existing records",
          new DataplexJdbcIngestionFilter(
              targetRootPath,
              Schemas.serialize(avroSchema),
              options.getParitionColumn(),
              options.getPartitioningScheme(),
              options.getFileFormat().getFileSuffix(),
              options.getWriteDisposition(),
              existingFiles,
              FILTERED_RECORDS_OUT,
              EXISTING_TARGET_FILES_OUT));

  filteredRecordsTuple
      .get(EXISTING_TARGET_FILES_OUT)
      .apply(Distinct.create())
      .apply(
          "Log existing target file names",
          ParDo.of(
              // The resulting PCollection will be empty.
              new DoFn<String, String>() {
                @ProcessElement
                public void processElement(ProcessContext c) {
                  String filename = c.element();
                  LOG.info(
                      "Target File {} already exists in the output asset bucket {}. Performing"
                          + " {} writeDisposition strategy.",
                      filename,
                      targetRootPath,
                      options.getWriteDisposition());
                }
              }));

  return filteredRecordsTuple.get(FILTERED_RECORDS_OUT);
}
Also used: DataplexJdbcIngestionFilter (com.google.cloud.teleport.v2.utils.DataplexJdbcIngestionFilter), DoFn (org.apache.beam.sdk.transforms.DoFn), PCollectionTuple (org.apache.beam.sdk.values.PCollectionTuple)
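The filter above routes records whose target file already exists according to the configured write disposition. As a rough illustration of the WRITE_EMPTY behavior exercised in these examples, here is a standalone sketch; the class, enum, and method names are hypothetical and are not the template's actual API.

```java
import java.util.List;

// Hypothetical sketch only: illustrates the decision "fail the pipeline when the
// target file already exists and the disposition is WRITE_EMPTY". The real
// DataplexJdbcIngestionFilter implements this inside a Beam PTransform.
public class WriteDispositionSketch {

  // Illustrative subset of the disposition options seen in the examples.
  public enum WriteDisposition { WRITE_EMPTY, WRITE_TRUNCATE, SKIP }

  /** Returns true when the pipeline should fail for this target file. */
  public static boolean shouldFail(
      String targetFile, List<String> existingFiles, WriteDisposition disposition) {
    return disposition == WriteDisposition.WRITE_EMPTY && existingFiles.contains(targetFile);
  }
}
```

Under this sketch, WRITE_TRUNCATE and SKIP never fail on an existing file; only WRITE_EMPTY does, which matches the exception-based test shown in Example 2.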

Example 2 with DataplexJdbcIngestionFilter

Use of com.google.cloud.teleport.v2.utils.DataplexJdbcIngestionFilter in the project DataflowTemplates by GoogleCloudPlatform.

From the class DataplexJdbcIngestionFilterTest, method testFailIfTargetFileExists:

@Test
public void testFailIfTargetFileExists() {
    String targetRootPath = temporaryFolder.getRoot().getAbsolutePath();
    PCollectionTuple result =
        mainPipeline
            .apply(
                Create.<GenericRecord>of(record11, record12, record21)
                    .withCoder(AvroCoder.of(SCHEMA)))
            .apply(
                new DataplexJdbcIngestionFilter(
                    targetRootPath,
                    SERIALIZED_SCHEMA,
                    PARTITION_COLUMN_NAME,
                    PartitioningSchema.MONTHLY,
                    FileFormatOptions.AVRO.getFileSuffix(),
                    WriteDispositionOptions.WRITE_EMPTY,
                    StorageUtils.getFilesInDirectory(targetRootPath),
                    FILTERED_RECORDS_OUT,
                    EXISTING_TARGET_FILES_OUT));
    try {
        mainPipeline.run();
        fail("Expected a WriteDispositionException.");
    } catch (Exception e) {
        assertThat(e).hasCauseThat().isInstanceOf(WriteDispositionException.class);
    }
}
Also used: PCollectionTuple (org.apache.beam.sdk.values.PCollectionTuple), WriteDispositionException (com.google.cloud.teleport.v2.utils.JdbcIngestionWriteDisposition.WriteDispositionException), IOException (java.io.IOException), Test (org.junit.Test)
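The test above uses PartitioningSchema.MONTHLY, so records are matched against per-month target files under targetRootPath. The filter computes those paths internally; as a hedged illustration only, the sketch below builds one plausible monthly partition path. The year=/month= layout and the helper name are assumptions, not the template's documented behavior.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical sketch: one way a MONTHLY partitioning scheme could map a record
// timestamp to a target file path. The real DataplexJdbcIngestionFilter derives
// its paths internally; this layout is an illustrative assumption.
public class PartitionPathSketch {

  /** Builds a partition path like {root}/year=2021/month=5/output{suffix} (UTC). */
  public static String monthlyPath(String root, Instant timestamp, String suffix) {
    ZonedDateTime t = timestamp.atZone(ZoneOffset.UTC);
    return String.format("%s/year=%d/month=%d/output%s", root, t.getYear(), t.getMonthValue(), suffix);
  }
}
```

With a layout like this, checking a record against the existingFiles list reduces to computing its partition path and testing membership, which is why the test pre-populates the temporary folder before running the pipeline.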

Aggregations

PCollectionTuple (org.apache.beam.sdk.values.PCollectionTuple): 2
DataplexJdbcIngestionFilter (com.google.cloud.teleport.v2.utils.DataplexJdbcIngestionFilter): 1
WriteDispositionException (com.google.cloud.teleport.v2.utils.JdbcIngestionWriteDisposition.WriteDispositionException): 1
IOException (java.io.IOException): 1
DoFn (org.apache.beam.sdk.transforms.DoFn): 1
Test (org.junit.Test): 1