Example 1 with TableScanLimitsConfig

Use of com.google.cloud.pso.bq_pii_classifier.entities.TableScanLimitsConfig in project bq-pii-classifier by GoogleCloudPlatform.

From the class Inspector, method execute.

public DlpJob execute(Operation request, String trackingId, String pubSubMessageId) throws IOException, NonRetryableApplicationException {
    logger.logFunctionStart(trackingId);
    logger.logInfoWithTracker(trackingId, String.format("Request : %s", request.toString()));
    // Check whether this pubSubMessageId was already processed, to avoid submitting BQ queries
    // again if PubSub unexpectedly re-sends the message. This is an extra measure to avoid
    // unnecessary cost. It works by keeping simple flag files in GCS, named after the pubSubMessageId.
    String flagFileName = String.format("%s/%s", persistentSetObjectPrefix, pubSubMessageId);
    if (persistentSet.contains(flagFileName)) {
        // log error and ACK and return
        String msg = String.format("PubSub message ID '%s' has been processed before by %s. The message should be ACK to PubSub to stop retries. Please investigate further why the message was retried in the first place.", pubSubMessageId, this.getClass().getSimpleName());
        throw new NonRetryableApplicationException(msg);
    }
    // get Table Scan Limits config and Table size
    TableScanLimitsConfig tableScanLimitsConfig = new TableScanLimitsConfig(config.getTableScanLimitsJsonConfig());
    logger.logInfoWithTracker(trackingId, String.format("TableScanLimitsConfig is %s", tableScanLimitsConfig.toString()));
    TableSpec targetTableSpec = TableSpec.fromSqlString(request.getEntityKey());
    // The DLP job config accepts only an Integer for the table scan limit, so NumRows
    // must be narrowed from BigInteger to Integer. Note that intValue() silently
    // truncates values above Integer.MAX_VALUE.
    Integer tableNumRows = bqService.getTableNumRows(targetTableSpec).intValue();
    InspectJobConfig jobConfig = createJob(targetTableSpec, tableScanLimitsConfig, tableNumRows, config);
    CreateDlpJobRequest createDlpJobRequest = CreateDlpJobRequest.newBuilder()
            // Job ID: letters, numbers, hyphens, and underscores allowed.
            .setJobId(trackingId)
            .setParent(LocationName.of(config.getProjectId(), config.getRegionId()).toString())
            .setInspectJob(jobConfig)
            .build();
    DlpJob submittedDlpJob = dlpService.submitJob(createDlpJobRequest);
    logger.logInfoWithTracker(trackingId, String.format("DLP job created successfully id='%s'", submittedDlpJob.getName()));
    // Add a flag key marking this request as completed, so that no additional runs are made
    // if PubSub keeps retrying due to an ACK timeout after the service has already processed the request.
    // This is an extra measure to avoid unnecessary cost due to config issues.
    logger.logInfoWithTracker(trackingId, String.format("Persisting processing key for PubSub message ID %s", pubSubMessageId));
    persistentSet.add(flagFileName);
    logger.logFunctionEnd(trackingId);
    return submittedDlpJob;
}
Also used : TableSpec(com.google.cloud.pso.bq_pii_classifier.entities.TableSpec) TableScanLimitsConfig(com.google.cloud.pso.bq_pii_classifier.entities.TableScanLimitsConfig) NonRetryableApplicationException(com.google.cloud.pso.bq_pii_classifier.entities.NonRetryableApplicationException) DlpJob(com.google.privacy.dlp.v2.DlpJob) CreateDlpJobRequest(com.google.privacy.dlp.v2.CreateDlpJobRequest) InspectJobConfig(com.google.privacy.dlp.v2.InspectJobConfig)
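
The duplicate-message guard in execute (check a flag key, do the work, then record the key) can be shown in isolation. The following is a minimal sketch, not the project's code: the PersistentSet interface, the in-memory implementation, and processOnce are hypothetical names introduced here, and the original keeps its flags as files in GCS and throws NonRetryableApplicationException on a duplicate instead of returning false.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class DedupGuardSketch {

    /** Hypothetical stand-in for the persistent set used in the original (flag files in GCS). */
    interface PersistentSet {
        boolean contains(String key);
        void add(String key);
    }

    /** In-memory implementation for illustration only; it does not survive a restart. */
    static final class InMemoryPersistentSet implements PersistentSet {
        private final Set<String> keys = ConcurrentHashMap.newKeySet();

        @Override
        public boolean contains(String key) {
            return keys.contains(key);
        }

        @Override
        public void add(String key) {
            keys.add(key);
        }
    }

    /**
     * Runs the work at most once per message ID.
     * Returns false when the ID was already recorded, true when the work ran.
     */
    static boolean processOnce(PersistentSet seen, String prefix, String messageId, Runnable work) {
        String flagKey = String.format("%s/%s", prefix, messageId);
        if (seen.contains(flagKey)) {
            // Duplicate delivery: skip the costly work.
            return false;
        }
        work.run();
        // Record the key only after the work succeeded, so a failed attempt can be retried.
        seen.add(flagKey);
        return true;
    }

    public static void main(String[] args) {
        PersistentSet seen = new InMemoryPersistentSet();
        Runnable submitJob = () -> System.out.println("submitting job");
        System.out.println(processOnce(seen, "inspector", "msg-1", submitJob));
        System.out.println(processOnce(seen, "inspector", "msg-1", submitJob));
    }
}
```

The check and the add are separate steps, so two concurrent deliveries of the same message could both pass the check. The sketch, like the method above, treats the guard as a cost-saving measure and not as a strict exactly-once guarantee.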

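The narrowing of the row count from BigInteger to Integer deserves a closer look, because BigInteger.intValue() keeps only the low-order 32 bits and can turn a large count into a negative number. The sketch below shows two standard-library alternatives. The helper name clampToInt is hypothetical, and whether clamping suits the DLP row limit is an assumption, since the excerpt does not show how createJob uses tableNumRows.

```java
import java.math.BigInteger;

final class RowCountNarrowingSketch {

    private static final BigInteger INT_MAX = BigInteger.valueOf(Integer.MAX_VALUE);
    private static final BigInteger INT_MIN = BigInteger.valueOf(Integer.MIN_VALUE);

    /** Hypothetical helper: narrows to int, saturating at the int range instead of wrapping. */
    static int clampToInt(BigInteger value) {
        return value.min(INT_MAX).max(INT_MIN).intValue();
    }

    public static void main(String[] args) {
        BigInteger numRows = BigInteger.valueOf(3_000_000_000L);

        // Plain narrowing wraps around for values beyond the int range.
        System.out.println("intValue:   " + numRows.intValue());

        // Clamping keeps the value at Integer.MAX_VALUE.
        System.out.println("clampToInt: " + clampToInt(numRows));

        // intValueExact (Java 8+) fails loudly instead of returning a wrong number.
        try {
            numRows.intValueExact();
        } catch (ArithmeticException e) {
            System.out.println("intValueExact threw ArithmeticException");
        }
    }
}
```

Clamping fits a case where the number only serves as an upper bound. intValueExact fits a case where an out-of-range count should stop the request.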