
Example 6 with Filter

Use of com.google.cloud.teleport.v2.utils.BigQueryMetadataLoader.Filter in project DataflowTemplates by GoogleCloudPlatform.

From the class DataplexBigQueryToGcsFilterTest, method test_whenPartitionedTableHasNoPartitions_filterExcludesTable.

@Test
public void test_whenPartitionedTableHasNoPartitions_filterExcludesTable() {
    options.setTables(null);
    options.setExportDataModifiedBeforeDateTime(null);
    Filter f = new DataplexBigQueryToGcsFilter(options, new ArrayList<String>());
    assertThat(f.shouldSkipPartitionedTable(table(), Collections.emptyList())).isTrue();
}
Also used: Filter(com.google.cloud.teleport.v2.utils.BigQueryMetadataLoader.Filter) Test(org.junit.Test)
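The rule tested in Example 6 can be illustrated with a small, self-contained sketch. This is not the DataplexBigQueryToGcsFilter source: the class name PartitionedTableRule and its string-based signature are hypothetical. The sketch assumes just two rules: an optional table allowlist (null meaning "no filter", as with setTables(null)), and skipping a partitioned table when it has no partitions to export.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the DataflowTemplates implementation: models only a
// table-allowlist check and the "no partitions" rule exercised in Example 6.
final class PartitionedTableRule {
    // null means no table-name filter, mirroring options.setTables(null).
    private final Set<String> allowedTables;

    PartitionedTableRule(Set<String> allowedTables) {
        this.allowedTables = allowedTables;
    }

    // Returns true when the partitioned table should be left out of the export.
    boolean shouldSkipPartitionedTable(String tableName, List<String> partitionNames) {
        if (allowedTables != null && !allowedTables.contains(tableName)) {
            return true; // excluded by the table-name filter
        }
        // A partitioned table with no partitions has nothing to export.
        return partitionNames.isEmpty();
    }
}
```

With no allowlist, `new PartitionedTableRule(null).shouldSkipPartitionedTable("t", Collections.emptyList())` returns true, matching the assertion in Example 6.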

Example 7 with Filter

Use of com.google.cloud.teleport.v2.utils.BigQueryMetadataLoader.Filter in project DataflowTemplates by GoogleCloudPlatform.

From the class DataplexBigQueryToGcsFilterTest, method test_whenNoFilterOptions_filterAcceptsAllTablesAndPartitions.

@Test
public void test_whenNoFilterOptions_filterAcceptsAllTablesAndPartitions() {
    BigQueryTable.Builder t = table();
    BigQueryTablePartition p = partition().build();
    options.setTables(null);
    options.setExportDataModifiedBeforeDateTime(null);
    Filter f = new DataplexBigQueryToGcsFilter(options, new ArrayList<String>());
    assertThat(f.shouldSkipUnpartitionedTable(t)).isFalse();
    assertThat(f.shouldSkipPartitionedTable(t, Collections.singletonList(p))).isFalse();
    assertThat(f.shouldSkipPartition(t, p)).isFalse();
}
Also used: BigQueryTablePartition(com.google.cloud.teleport.v2.values.BigQueryTablePartition) Filter(com.google.cloud.teleport.v2.utils.BigQueryMetadataLoader.Filter) BigQueryTable(com.google.cloud.teleport.v2.values.BigQueryTable) Test(org.junit.Test)
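Example 7 calls all three Filter checks. The sketch below shows one plausible way a metadata loader could consume them. The interface TableFilter, the class ExportPlanner, and the order of evaluation (prune individual partitions first, then decide on the partitioned table using the surviving partitions) are assumptions made for illustration. Plain strings stand in for BigQueryTable.Builder and BigQueryTablePartition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical string-based mirror of the three checks Example 7 calls on Filter.
interface TableFilter {
    boolean shouldSkipUnpartitionedTable(String table);
    boolean shouldSkipPartitionedTable(String table, List<String> partitions);
    boolean shouldSkipPartition(String table, String partition);
}

final class ExportPlanner {
    private ExportPlanner() {}

    // Behaves like a filter built with no options (Examples 6 and 7): nothing is
    // excluded except a partitioned table that has no partitions.
    static TableFilter noOptions() {
        return new TableFilter() {
            @Override
            public boolean shouldSkipUnpartitionedTable(String table) {
                return false;
            }

            @Override
            public boolean shouldSkipPartitionedTable(String table, List<String> partitions) {
                return partitions.isEmpty();
            }

            @Override
            public boolean shouldSkipPartition(String table, String partition) {
                return false;
            }
        };
    }

    // Assumed order: prune partitions first, then decide on the table using the survivors.
    // Returns the "table" and "table/partition" entries that would be exported.
    static List<String> plan(
            List<String> unpartitioned, Map<String, List<String>> partitioned, TableFilter f) {
        List<String> out = new ArrayList<>();
        for (String table : unpartitioned) {
            if (!f.shouldSkipUnpartitionedTable(table)) {
                out.add(table);
            }
        }
        for (Map.Entry<String, List<String>> e : partitioned.entrySet()) {
            List<String> kept = new ArrayList<>();
            for (String p : e.getValue()) {
                if (!f.shouldSkipPartition(e.getKey(), p)) {
                    kept.add(p);
                }
            }
            if (!f.shouldSkipPartitionedTable(e.getKey(), kept)) {
                for (String p : kept) {
                    out.add(e.getKey() + "/" + p);
                }
            }
        }
        return out;
    }
}
```

Checking partitions before the table means a table whose partitions are all filtered out is then rejected by the same "no partitions" rule that Example 6 tests.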

Example 8 with Filter

Use of com.google.cloud.teleport.v2.utils.BigQueryMetadataLoader.Filter in project DataflowTemplates by GoogleCloudPlatform.

From the class DataplexJdbcIngestion, method applyPartitionedWriteDispositionFilter.

private static PCollection<GenericRecord> applyPartitionedWriteDispositionFilter(
        PCollection<GenericRecord> genericRecords, DataplexJdbcIngestionOptions options,
        String targetRootPath, org.apache.avro.Schema avroSchema, List<String> existingFiles) {
    // Split the input into records to write and names of target files that already exist.
    PCollectionTuple filteredRecordsTuple = genericRecords.apply("Filter pre-existing records",
        new DataplexJdbcIngestionFilter(targetRootPath, Schemas.serialize(avroSchema),
            options.getParitionColumn(), options.getPartitioningScheme(),
            options.getFileFormat().getFileSuffix(), options.getWriteDisposition(),
            existingFiles, FILTERED_RECORDS_OUT, EXISTING_TARGET_FILES_OUT));
    // Log each pre-existing target file once. If no target file exists, this PCollection is empty.
    filteredRecordsTuple.get(EXISTING_TARGET_FILES_OUT).apply(Distinct.create())
        .apply("Log existing target file names", ParDo.of(new DoFn<String, String>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                String filename = c.element();
                LOG.info("Target File {} already exists in the output asset bucket {}. Performing"
                    + " {} writeDisposition strategy.", filename, targetRootPath, options.getWriteDisposition());
            }
        }));
    return filteredRecordsTuple.get(FILTERED_RECORDS_OUT);
}
Also used: DataplexJdbcIngestionFilter(com.google.cloud.teleport.v2.utils.DataplexJdbcIngestionFilter) DoFn(org.apache.beam.sdk.transforms.DoFn) PCollectionTuple(org.apache.beam.sdk.values.PCollectionTuple)
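DataplexJdbcIngestionFilter runs inside Beam and produces two outputs, tagged FILTERED_RECORDS_OUT and EXISTING_TARGET_FILES_OUT. The plain-Java sketch below models that split without Beam so it can run on its own. The class WriteDispositionSplit, its Disposition values, and how each value is handled are assumptions, not the template's code. A TreeSet stands in for the deduplication that Distinct.create() performs in the pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical plain-Java model of the two outputs produced by DataplexJdbcIngestionFilter.
// The Disposition values and their handling are assumptions, not the template's code.
final class WriteDispositionSplit {
    enum Disposition { OVERWRITE, SKIP, FAIL }

    static final class Result<T> {
        final List<T> toWrite = new ArrayList<>();           // analogous to FILTERED_RECORDS_OUT
        final Set<String> existingTargets = new TreeSet<>(); // analogous to EXISTING_TARGET_FILES_OUT after Distinct
    }

    private WriteDispositionSplit() {}

    static <T> Result<T> split(
            List<T> records, Function<T, String> targetFileOf, Set<String> existingFiles, Disposition d) {
        Result<T> result = new Result<>();
        for (T record : records) {
            String target = targetFileOf.apply(record);
            if (!existingFiles.contains(target)) {
                result.toWrite.add(record); // new target file: always written
                continue;
            }
            result.existingTargets.add(target);
            switch (d) {
                case OVERWRITE:
                    result.toWrite.add(record); // the existing file will be replaced
                    break;
                case SKIP:
                    break; // keep the existing file and drop the record
                case FAIL:
                    throw new IllegalStateException("Target file already exists: " + target);
            }
        }
        return result;
    }
}
```

Passing the target-file mapping as a Function keeps the sketch independent of the record type, much as the real filter derives target paths from the partition column and partitioning scheme.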

Example 9 with Filter

Use of com.google.cloud.teleport.v2.utils.BigQueryMetadataLoader.Filter in project DataflowTemplates by GoogleCloudPlatform.

From the class DataplexBigQueryToGcsFilterTest, method test_whenBeforeDateHasNoTime_dateParsedCorrectly.

@Test
public void test_whenBeforeDateHasNoTime_dateParsedCorrectly() {
    // Midnight at the start of 2021-02-15 in the JVM's default time zone, in microseconds:
    long micros = Instant.parse("2021-02-15T00:00:00").getMillis() * 1000L;
    BigQueryTable.Builder olderTable = table().setLastModificationTime(micros - 1000L);
    BigQueryTable.Builder newerTable = table().setLastModificationTime(micros + 1000L);
    options.setTables(null);
    options.setExportDataModifiedBeforeDateTime("2021-02-15");
    Filter f = new DataplexBigQueryToGcsFilter(options, new ArrayList<String>());
    // Only the table modified after the cutoff is skipped.
    assertThat(f.shouldSkipUnpartitionedTable(newerTable)).isTrue();
    assertThat(f.shouldSkipUnpartitionedTable(olderTable)).isFalse();
}
Also used: Filter(com.google.cloud.teleport.v2.utils.BigQueryMetadataLoader.Filter) BigQueryTable(com.google.cloud.teleport.v2.values.BigQueryTable) Test(org.junit.Test)
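Example 9 checks that a cutoff given as a bare date is treated as midnight of that day. The sketch below shows the same idea with java.time instead of the Joda-Time Instant used in the test. The class ModifiedBeforeCutoff is hypothetical, and so is the boundary rule (a table modified at or after the cutoff is skipped); the test itself only checks points 1000 microseconds on either side. Times are in epoch microseconds, the unit the test passes to setLastModificationTime.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch of the cutoff handling exercised in Example 9, using java.time.
final class ModifiedBeforeCutoff {
    private ModifiedBeforeCutoff() {}

    // Parses "yyyy-MM-ddTHH:mm:ss" or a bare "yyyy-MM-dd" (read as start of day) in the
    // given zone and returns epoch microseconds.
    static long toEpochMicros(String value, ZoneId zone) {
        LocalDateTime dateTime;
        try {
            dateTime = LocalDateTime.parse(value);
        } catch (DateTimeParseException e) {
            dateTime = LocalDate.parse(value).atStartOfDay();
        }
        return ChronoUnit.MICROS.between(Instant.EPOCH, dateTime.atZone(zone).toInstant());
    }

    // Assumption: a table last modified at or after the cutoff is skipped (not exported).
    static boolean shouldSkip(long lastModifiedMicros, long cutoffMicros) {
        return lastModifiedMicros >= cutoffMicros;
    }
}
```

Passing the zone explicitly makes the "default time zone" dependency in the test visible; the real test relies on the JVM default.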

Example 10 with Filter

Use of com.google.cloud.teleport.v2.utils.BigQueryMetadataLoader.Filter in project DataflowTemplates by GoogleCloudPlatform.

From the class DataplexBigQueryToGcsFilterTest, method test_whenTargetFileExistsWithWriteDispositionOverwrite_filterAcceptsTables.

@Test
public void test_whenTargetFileExistsWithWriteDispositionOverwrite_filterAcceptsTables() {
    BigQueryTable.Builder t = table().setTableName("table1").setPartitioningColumn("p2");
    BigQueryTablePartition p = partition().setPartitionName("partition1").build();
    options.setTables(null);
    options.setExportDataModifiedBeforeDateTime(null);
    options.setFileFormat(FileFormatOptions.AVRO);
    options.setWriteDisposition(WriteDispositionOptions.OVERWRITE);
    Filter f = new DataplexBigQueryToGcsFilter(options, Arrays.asList("table1/output-table1.avro", "table1/p2=partition1/output-table1-partition1.avro"));
    assertThat(f.shouldSkipUnpartitionedTable(t)).isFalse();
    assertThat(f.shouldSkipPartition(t, p)).isFalse();
}
Also used: BigQueryTablePartition(com.google.cloud.teleport.v2.values.BigQueryTablePartition) Filter(com.google.cloud.teleport.v2.utils.BigQueryMetadataLoader.Filter) BigQueryTable(com.google.cloud.teleport.v2.values.BigQueryTable) Test(org.junit.Test)
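The file names passed to the filter in Example 10 suggest the following layout: <table>/output-<table><suffix> for an unpartitioned table and <table>/<column>=<partition>/output-<table>-<partition><suffix> for a partition. The sketch below reproduces that layout and applies a write disposition. TargetFileCheck, its Disposition enum, and the SKIP and FAIL handling are hypothetical, inferred from the test strings rather than taken from the template.

```java
import java.util.Set;

// Hypothetical sketch: target paths follow the layout of the file names in Example 10.
// The Disposition values and their handling are assumptions, not the template's code.
final class TargetFileCheck {
    enum Disposition { OVERWRITE, SKIP, FAIL }

    private TargetFileCheck() {}

    // ("table1", ".avro") -> "table1/output-table1.avro"
    static String tablePath(String table, String suffix) {
        return table + "/output-" + table + suffix;
    }

    // ("table1", "p2", "partition1", ".avro") -> "table1/p2=partition1/output-table1-partition1.avro"
    static String partitionPath(String table, String column, String partition, String suffix) {
        return table + "/" + column + "=" + partition + "/output-" + table + "-" + partition + suffix;
    }

    // Decides whether an export to targetPath should be skipped.
    static boolean shouldSkip(String targetPath, Set<String> existingFiles, Disposition d) {
        if (!existingFiles.contains(targetPath)) {
            return false; // nothing to collide with
        }
        switch (d) {
            case OVERWRITE:
                return false; // export again and replace the file
            case SKIP:
                return true; // keep the existing file
            default:
                throw new IllegalStateException("Target file already exists: " + targetPath);
        }
    }
}
```

Under OVERWRITE both paths from Example 10 are accepted even though they already exist, which is what the test asserts.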

Aggregations

Filter (com.google.cloud.teleport.v2.utils.BigQueryMetadataLoader.Filter): 10
BigQueryTable (com.google.cloud.teleport.v2.values.BigQueryTable): 10
Test (org.junit.Test): 10
BigQueryTablePartition (com.google.cloud.teleport.v2.values.BigQueryTablePartition): 8
TableResult (com.google.cloud.bigquery.TableResult): 2
DataStreamIO (com.google.cloud.teleport.v2.cdc.sources.DataStreamIO): 2
CdcJdbcIO (com.google.cloud.teleport.v2.io.CdcJdbcIO): 2
DmlInfo (com.google.cloud.teleport.v2.values.DmlInfo): 2
FailsafeElement (com.google.cloud.teleport.v2.values.FailsafeElement): 2
ArrayList (java.util.ArrayList): 2
List (java.util.List): 2
Collectors (java.util.stream.Collectors): 2
Pipeline (org.apache.beam.sdk.Pipeline): 2
GoogleCloudDataplexV1Asset (com.google.api.services.dataplex.v1.model.GoogleCloudDataplexV1Asset): 1
GoogleCloudDataplexV1Entity (com.google.api.services.dataplex.v1.model.GoogleCloudDataplexV1Entity): 1
GoogleCloudDataplexV1Partition (com.google.api.services.dataplex.v1.model.GoogleCloudDataplexV1Partition): 1
TableReadOptions (com.google.cloud.bigquery.storage.v1beta1.ReadOptions.TableReadOptions): 1
ReadSession (com.google.cloud.bigquery.storage.v1beta1.Storage.ReadSession): 1
AvroSinkWithJodaDatesConversion (com.google.cloud.teleport.v2.io.AvroSinkWithJodaDatesConversion): 1
DatastreamConstants (com.google.cloud.teleport.v2.templates.datastream.DatastreamConstants): 1