Example 1 with BatchPartialColumnDlpTable

Use of com.google.cloud.solutions.autotokenize.dlp.PartialBatchAccumulator.BatchPartialColumnDlpTable in project auto-data-tokenize by GoogleCloudPlatform.

From the class PartialBatchAccumulatorTest, method batch_arrayFields_deidConfigContainsOnlyFieldReference.

@Test
public void batch_arrayFields_deidConfigContainsOnlyFieldReference() {
    // Configure transforms for one top-level field and two fields nested inside arrays.
    PartialBatchAccumulator accumulator = PartialBatchAccumulator.withConfig(DlpEncryptConfig.newBuilder()
        .addTransforms(ColumnTransform.newBuilder().setColumnId("$.multi_level_arrays.simple_field1").setTransform(CRYPTO_UNWRAPPED_TRANSFORM))
        .addTransforms(ColumnTransform.newBuilder().setColumnId("$.multi_level_arrays.level1_array.level1_array_record.level2_simple_field").setTransform(CRYPTO_UNWRAPPED_TRANSFORM))
        .addTransforms(ColumnTransform.newBuilder().setColumnId("$.multi_level_arrays.level1_array.level1_array_record.level2_array").setTransform(CRYPTO_UNWRAPPED_TRANSFORM))
        .build());
    // Load a single Avro test record and flatten it.
    FlatRecord record = RecordFlattener.forGenericRecord().flatten(
        TestResourceLoader.classPath().forAvro()
            .withSchemaFile("avro_records/records_with_two_levels_of_arrays/two_level_arrays_schema.avsc")
            .loadRecord("avro_records/records_with_two_levels_of_arrays/simple_two_level_array_record.json"));
    accumulator.addElement(record.toBuilder().setRecordId(UUID.randomUUID().toString()).build());
    BatchPartialColumnDlpTable batch = accumulator.makeBatch();
    // Collect every FieldId referenced by the field transformations of the deidentify config.
    ImmutableList<FieldId> deidConfigTokenizeFields = batch.get().getDeidentifyConfig().getRecordTransformations().getFieldTransformationsList().stream()
        .map(FieldTransformation::getFieldsList)
        .flatMap(List::stream)
        .collect(toImmutableList());
    // The expected field references carry no array indices.
    assertThat(deidConfigTokenizeFields).containsExactlyElementsIn(DeidentifyColumns.fieldIdsFor(ImmutableList.of(
        "$.simple_field1",
        "$.level1_array.[\"level1_array_record\"].level2_simple_field.string",
        "$.level1_array.[\"level1_array_record\"].level2_array.string")));
}
Also used : BatchPartialColumnDlpTable(com.google.cloud.solutions.autotokenize.dlp.PartialBatchAccumulator.BatchPartialColumnDlpTable) FieldId(com.google.privacy.dlp.v2.FieldId) FlatRecord(com.google.cloud.solutions.autotokenize.AutoTokenizeMessages.FlatRecord) FieldTransformation(com.google.privacy.dlp.v2.FieldTransformation) Test(org.junit.Test)

Example 2 with BatchPartialColumnDlpTable

Use of com.google.cloud.solutions.autotokenize.dlp.PartialBatchAccumulator.BatchPartialColumnDlpTable in project auto-data-tokenize by GoogleCloudPlatform.

From the class PartialBatchAccumulatorTest, method batch_nullableUnionField_valid.

@Test
public void batch_nullableUnionField_valid() {
    // Load the DLP encrypt config from a JSON test resource.
    PartialBatchAccumulator accumulator = PartialBatchAccumulator.withConfig(
        TestResourceLoader.classPath().forProto(DlpEncryptConfig.class).loadJson("email_cc_dlp_encrypt_config.json"));
    // Flatten both test records and assign each a random record ID.
    var flatRecords = TestResourceLoader.classPath().forAvro()
        .withSchemaFile("avro_records/userdata_records/schema.json")
        .loadAllRecords("avro_records/userdata_records/record-2.json", "avro_records/userdata_records/record-3-cc-null.json")
        .stream()
        .map(RecordFlattener.forGenericRecord()::flatten)
        .map(record -> record.toBuilder().setRecordId(UUID.randomUUID().toString()).build())
        .collect(toImmutableList());
    accumulator.addAllElements(flatRecords);
    BatchPartialColumnDlpTable batch = accumulator.makeBatch();
    // Check the batch table headers against the record ID column and the $.email and $.cc column patterns.
    FieldIdMatchesTokenizeColumns.withRecordIdColumn("__AUTOTOKENIZE__RECORD_ID__")
        .assertExpectedHeadersOnly(batch.get().getTable().getHeadersList())
        .contains(TokenizingColPatternChecker.of("$.email", "$.cc"));
}
Also used : CryptoKey(com.google.privacy.dlp.v2.CryptoKey) Assert.assertThrows(org.junit.Assert.assertThrows) RunWith(org.junit.runner.RunWith) Random(java.util.Random) PrimitiveTransformation(com.google.privacy.dlp.v2.PrimitiveTransformation) DlpEncryptConfig(com.google.cloud.solutions.autotokenize.AutoTokenizeMessages.DlpEncryptConfig) ImmutableList(com.google.common.collect.ImmutableList) CryptoDeterministicConfig(com.google.privacy.dlp.v2.CryptoDeterministicConfig) UnwrappedCryptoKey(com.google.privacy.dlp.v2.UnwrappedCryptoKey) FieldId(com.google.privacy.dlp.v2.FieldId) RecordFlattener(com.google.cloud.solutions.autotokenize.common.RecordFlattener) TestResourceLoader(com.google.cloud.solutions.autotokenize.testing.TestResourceLoader) Value(com.google.privacy.dlp.v2.Value) TokenizingColPatternChecker(com.google.cloud.solutions.autotokenize.testing.TokenizingColPatternChecker) ImmutableList.toImmutableList(com.google.common.collect.ImmutableList.toImmutableList) FieldTransformation(com.google.privacy.dlp.v2.FieldTransformation) Test(org.junit.Test) BatchPartialColumnDlpTable(com.google.cloud.solutions.autotokenize.dlp.PartialBatchAccumulator.BatchPartialColumnDlpTable) UUID(java.util.UUID) JUnit4(org.junit.runners.JUnit4) Truth.assertThat(com.google.common.truth.Truth.assertThat) StandardCharsets(java.nio.charset.StandardCharsets) ByteString(com.google.protobuf.ByteString) Base64(java.util.Base64) List(java.util.List) FieldIdMatchesTokenizeColumns(com.google.cloud.solutions.autotokenize.testing.FieldIdMatchesTokenizeColumns) FlatRecord(com.google.cloud.solutions.autotokenize.AutoTokenizeMessages.FlatRecord) DeidentifyColumns(com.google.cloud.solutions.autotokenize.common.DeidentifyColumns) ColumnTransform(com.google.cloud.solutions.autotokenize.AutoTokenizeMessages.ColumnTransform)

Example 3 with BatchPartialColumnDlpTable

Use of com.google.cloud.solutions.autotokenize.dlp.PartialBatchAccumulator.BatchPartialColumnDlpTable in project auto-data-tokenize by GoogleCloudPlatform.

From the class PartialBatchAccumulatorTest, method batch_arrayFields_itemTableContainsFlattenedEntries.

@Test
public void batch_arrayFields_itemTableContainsFlattenedEntries() {
    // Configure transforms for one top-level field and two fields nested inside arrays.
    PartialBatchAccumulator accumulator = PartialBatchAccumulator.withConfig(DlpEncryptConfig.newBuilder()
        .addTransforms(ColumnTransform.newBuilder().setColumnId("$.multi_level_arrays.simple_field1").setTransform(CRYPTO_UNWRAPPED_TRANSFORM))
        .addTransforms(ColumnTransform.newBuilder().setColumnId("$.multi_level_arrays.level1_array.level1_array_record.level2_simple_field").setTransform(CRYPTO_UNWRAPPED_TRANSFORM))
        .addTransforms(ColumnTransform.newBuilder().setColumnId("$.multi_level_arrays.level1_array.level1_array_record.level2_array").setTransform(CRYPTO_UNWRAPPED_TRANSFORM))
        .build());
    // Load a single Avro test record and flatten it.
    FlatRecord record = RecordFlattener.forGenericRecord().flatten(
        TestResourceLoader.classPath().forAvro()
            .withSchemaFile("avro_records/records_with_two_levels_of_arrays/two_level_arrays_schema.avsc")
            .loadRecord("avro_records/records_with_two_levels_of_arrays/simple_two_level_array_record.json"));
    accumulator.addElement(record.toBuilder().setRecordId(UUID.randomUUID().toString()).build());
    BatchPartialColumnDlpTable batch = accumulator.makeBatch();
    // The expected headers are the record ID column plus one indexed header per flattened entry.
    assertThat(batch.get().getTable().getHeadersList()).containsExactlyElementsIn(DeidentifyColumns.fieldIdsFor(ImmutableList.of(
        "__AUTOTOKENIZE__RECORD_ID__",
        "$.simple_field1",
        "$.level1_array[0].[\"level1_array_record\"].level2_simple_field.string",
        "$.level1_array[1].[\"level1_array_record\"].level2_array[1].string",
        "$.level1_array[0].[\"level1_array_record\"].level2_array[0].string",
        "$.level1_array[0].[\"level1_array_record\"].level2_array[1].string",
        "$.level1_array[1].[\"level1_array_record\"].level2_simple_field.string",
        "$.level1_array[1].[\"level1_array_record\"].level2_array[0].string")));
}
Also used : BatchPartialColumnDlpTable(com.google.cloud.solutions.autotokenize.dlp.PartialBatchAccumulator.BatchPartialColumnDlpTable) FlatRecord(com.google.cloud.solutions.autotokenize.AutoTokenizeMessages.FlatRecord) Test(org.junit.Test)
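
A note on how Examples 1 and 3 relate: the table headers asserted in Example 3 carry array positions (level1_array[0], level2_array[1]), while the field references asserted in Example 1 do not. The sketch below illustrates that relationship with plain string handling only. It is not the project's implementation: SchemaKeySketch, stripArrayIndices and schemaKeys are hypothetical names introduced here, and the regex assumes that array positions always appear as [digits].

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of auto-data-tokenize.
final class SchemaKeySketch {
  private SchemaKeySketch() {}

  // Removes numeric array positions such as [0] or [12].
  // Quoted segments such as ["level1_array_record"] are left untouched
  // because they do not consist of digits only.
  static String stripArrayIndices(String flatKey) {
    return flatKey.replaceAll("\\[\\d+\\]", "");
  }

  // Maps indexed headers to distinct index-free keys, keeping first-seen order.
  static Set<String> schemaKeys(List<String> flatKeys) {
    Set<String> keys = new LinkedHashSet<>();
    for (String flatKey : flatKeys) {
      keys.add(stripArrayIndices(flatKey));
    }
    return keys;
  }

  public static void main(String[] args) {
    // Header strings copied from the assertion in Example 3 (record ID column omitted).
    List<String> headers = List.of(
        "$.simple_field1",
        "$.level1_array[0].[\"level1_array_record\"].level2_simple_field.string",
        "$.level1_array[1].[\"level1_array_record\"].level2_array[1].string",
        "$.level1_array[0].[\"level1_array_record\"].level2_array[0].string",
        "$.level1_array[0].[\"level1_array_record\"].level2_array[1].string",
        "$.level1_array[1].[\"level1_array_record\"].level2_simple_field.string",
        "$.level1_array[1].[\"level1_array_record\"].level2_array[0].string");
    schemaKeys(headers).forEach(System.out::println);
  }
}
```

Applied to the seven data headers of Example 3, the sketch collapses them to the three index-free keys that Example 1 expects in the deidentify config. A LinkedHashSet is used so that duplicates are dropped while the first-seen order is preserved.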

Aggregations

FlatRecord (com.google.cloud.solutions.autotokenize.AutoTokenizeMessages.FlatRecord) 3
BatchPartialColumnDlpTable (com.google.cloud.solutions.autotokenize.dlp.PartialBatchAccumulator.BatchPartialColumnDlpTable) 3
Test (org.junit.Test) 3
FieldId (com.google.privacy.dlp.v2.FieldId) 2
FieldTransformation (com.google.privacy.dlp.v2.FieldTransformation) 2
ColumnTransform (com.google.cloud.solutions.autotokenize.AutoTokenizeMessages.ColumnTransform) 1
DlpEncryptConfig (com.google.cloud.solutions.autotokenize.AutoTokenizeMessages.DlpEncryptConfig) 1
DeidentifyColumns (com.google.cloud.solutions.autotokenize.common.DeidentifyColumns) 1
RecordFlattener (com.google.cloud.solutions.autotokenize.common.RecordFlattener) 1
FieldIdMatchesTokenizeColumns (com.google.cloud.solutions.autotokenize.testing.FieldIdMatchesTokenizeColumns) 1
TestResourceLoader (com.google.cloud.solutions.autotokenize.testing.TestResourceLoader) 1
TokenizingColPatternChecker (com.google.cloud.solutions.autotokenize.testing.TokenizingColPatternChecker) 1
ImmutableList (com.google.common.collect.ImmutableList) 1
ImmutableList.toImmutableList (com.google.common.collect.ImmutableList.toImmutableList) 1
Truth.assertThat (com.google.common.truth.Truth.assertThat) 1
CryptoDeterministicConfig (com.google.privacy.dlp.v2.CryptoDeterministicConfig) 1
CryptoKey (com.google.privacy.dlp.v2.CryptoKey) 1
PrimitiveTransformation (com.google.privacy.dlp.v2.PrimitiveTransformation) 1
UnwrappedCryptoKey (com.google.privacy.dlp.v2.UnwrappedCryptoKey) 1
Value (com.google.privacy.dlp.v2.Value) 1