Example 6 with NonRetryableApplicationException

Use of com.google.cloud.pso.bq_pii_classifier.entities.NonRetryableApplicationException in project bq-pii-classifier by GoogleCloudPlatform.

Class StandardDlpResultsScannerImpl, method listParents.

// Returns a list of "project.dataset" strings.
@Override
public List<String> listParents(String project) throws NonRetryableApplicationException, InterruptedException {
    String queryTemplate = "SELECT \n"
            + "DISTINCT \n"
            + "CONCAT(l.record_location.record_key.big_query_key.table_reference.project_id, '.', l.record_location.record_key.big_query_key.table_reference.dataset_id) AS dataset \n"
            + "FROM `%s.%s.%s` , UNNEST(location.content_locations) l\n"
            + "WHERE l.record_location.record_key.big_query_key.table_reference.project_id = '%s'\n";
    String formattedQuery = String.format(queryTemplate, hostProject, hostDataset, dlpFindingsTable, project);
    // Submit the query as a BigQuery job and wait for its results.
    Job queryJob = bqService.submitJob(formattedQuery);
    TableResult result = bqService.waitAndGetJobResults(queryJob);
    List<String> projectDatasets = new ArrayList<>();
    // Collect the distinct "project.dataset" values returned by the query
    for (FieldValueList row : result.iterateAll()) {
        if (row.get("dataset").isNull()) {
            throw new NonRetryableApplicationException("listParents query returned rows with null 'dataset' field.");
        }
        String datasetSpec = row.get("dataset").getStringValue();
        projectDatasets.add(datasetSpec);
    }
    return projectDatasets;
}
Also used : TableResult(com.google.cloud.bigquery.TableResult) ArrayList(java.util.ArrayList) NonRetryableApplicationException(com.google.cloud.pso.bq_pii_classifier.entities.NonRetryableApplicationException) FieldValueList(com.google.cloud.bigquery.FieldValueList) Job(com.google.cloud.bigquery.Job)
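
To make the query construction in listParents easier to follow, here is a minimal sketch that reproduces only the String.format step and prints the resulting SQL. The class name ListParentsQuerySketch, the helper buildQuery, and all four argument values are hypothetical; the template text is copied from the method above.

```java
final class ListParentsQuerySketch {

    // Template copied from listParents: three %s for the findings table, one for the project filter.
    static final String QUERY_TEMPLATE = "SELECT \n"
            + "DISTINCT \n"
            + "CONCAT(l.record_location.record_key.big_query_key.table_reference.project_id, '.', "
            + "l.record_location.record_key.big_query_key.table_reference.dataset_id) AS dataset \n"
            + "FROM `%s.%s.%s` , UNNEST(location.content_locations) l\n"
            + "WHERE l.record_location.record_key.big_query_key.table_reference.project_id = '%s'\n";

    // Hypothetical helper: fills the placeholders in the same order as the original method.
    static String buildQuery(String hostProject, String hostDataset, String findingsTable, String project) {
        return String.format(QUERY_TEMPLATE, hostProject, hostDataset, findingsTable, project);
    }

    public static void main(String[] args) {
        // All values below are made-up placeholders, not names from the project.
        System.out.println(buildQuery("host-project", "host_dataset", "dlp_findings", "data-project"));
    }
}
```

Because the project ID is spliced directly into the SQL text, this approach presumably relies on the caller passing trusted values. For the filter value, the BigQuery Java client also offers named query parameters on QueryJobConfiguration, though table names still have to be formatted into the string.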

Example 7 with NonRetryableApplicationException

Use of com.google.cloud.pso.bq_pii_classifier.entities.NonRetryableApplicationException in project bq-pii-classifier by GoogleCloudPlatform.

Class Inspector, method execute.

public DlpJob execute(Operation request, String trackingId, String pubSubMessageId) throws IOException, NonRetryableApplicationException {
    logger.logFunctionStart(trackingId);
    logger.logInfoWithTracker(trackingId, String.format("Request : %s", request.toString()));
    /*
     *  Check if we already processed this pubSubMessageId before to avoid submitting BQ queries
     *  in case we have unexpected errors with PubSub re-sending the message. This is an extra measure to avoid unnecessary cost.
     *  We do that by keeping simple flag files in GCS with the pubSubMessageId as file name.
     */
    String flagFileName = String.format("%s/%s", persistentSetObjectPrefix, pubSubMessageId);
    if (persistentSet.contains(flagFileName)) {
        // log error and ACK and return
        String msg = String.format("PubSub message ID '%s' has been processed before by %s. The message should be ACK to PubSub to stop retries. Please investigate further why the message was retried in the first place.", pubSubMessageId, this.getClass().getSimpleName());
        throw new NonRetryableApplicationException(msg);
    }
    // get Table Scan Limits config and Table size
    TableScanLimitsConfig tableScanLimitsConfig = new TableScanLimitsConfig(config.getTableScanLimitsJsonConfig());
    logger.logInfoWithTracker(trackingId, String.format("TableScanLimitsConfig is %s", tableScanLimitsConfig.toString()));
    // DLP job config accepts Integer only for table scan limit. Must downcast
    // NumRows from BigInteger to Integer
    TableSpec targetTableSpec = TableSpec.fromSqlString(request.getEntityKey());
    Integer tableNumRows = bqService.getTableNumRows(targetTableSpec).intValue();
    InspectJobConfig jobConfig = createJob(targetTableSpec, tableScanLimitsConfig, tableNumRows, config);
    CreateDlpJobRequest createDlpJobRequest = CreateDlpJobRequest.newBuilder()
            // Job ID: letters, numbers, hyphens, and underscores allowed.
            .setJobId(trackingId)
            .setParent(LocationName.of(config.getProjectId(), config.getRegionId()).toString())
            .setInspectJob(jobConfig)
            .build();
    DlpJob submittedDlpJob = dlpService.submitJob(createDlpJobRequest);
    logger.logInfoWithTracker(trackingId, String.format("DLP job created successfully id='%s'", submittedDlpJob.getName()));
    // Persist a flag marking this request as completed, so that no additional runs happen
    // if PubSub keeps retrying due to an ACK timeout after the service has already processed it.
    // This is an extra measure to avoid unnecessary cost due to config issues.
    logger.logInfoWithTracker(trackingId, String.format("Persisting processing key for PubSub message ID %s", pubSubMessageId));
    persistentSet.add(flagFileName);
    logger.logFunctionEnd(trackingId);
    return submittedDlpJob;
}
Also used : TableSpec(com.google.cloud.pso.bq_pii_classifier.entities.TableSpec) TableScanLimitsConfig(com.google.cloud.pso.bq_pii_classifier.entities.TableScanLimitsConfig) NonRetryableApplicationException(com.google.cloud.pso.bq_pii_classifier.entities.NonRetryableApplicationException) DlpJob(com.google.privacy.dlp.v2.DlpJob) CreateDlpJobRequest(com.google.privacy.dlp.v2.CreateDlpJobRequest) InspectJobConfig(com.google.privacy.dlp.v2.InspectJobConfig)
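
The flag-file check in execute is a form of message deduplication: look up the message ID, refuse to repeat the work if it is already recorded, and record it only after the work succeeds. The sketch below shows that control flow in isolation. It is an illustration under assumptions: the in-memory HashSet stands in for the GCS-backed persistent set, and IdempotentHandlerSketch and NonRetryableException are hypothetical names (the stand-in exception is unchecked here only to keep the sketch short).

```java
import java.util.HashSet;
import java.util.Set;

final class IdempotentHandlerSketch {

    /** Hypothetical stand-in for the project's NonRetryableApplicationException. */
    static final class NonRetryableException extends RuntimeException {
        NonRetryableException(String message) {
            super(message);
        }
    }

    // In-memory stand-in for the persistent set of flag files (assumption for illustration).
    private final Set<String> processedFlags = new HashSet<>();
    private final String prefix;
    private int jobsSubmitted = 0;

    IdempotentHandlerSketch(String prefix) {
        this.prefix = prefix;
    }

    /** Runs the "expensive" step at most once per message ID; returns the total jobs submitted. */
    int execute(String messageId) {
        String flag = String.format("%s/%s", prefix, messageId);
        if (processedFlags.contains(flag)) {
            throw new NonRetryableException(
                    String.format("Message ID '%s' has been processed before.", messageId));
        }
        jobsSubmitted++;          // stands in for the costly call, e.g. submitting a job
        processedFlags.add(flag); // record the flag only after the work succeeded
        return jobsSubmitted;
    }

    int jobsSubmitted() {
        return jobsSubmitted;
    }

    public static void main(String[] args) {
        IdempotentHandlerSketch handler = new IdempotentHandlerSketch("inspector-flags");
        handler.execute("msg-1");
        try {
            handler.execute("msg-1"); // redelivery of the same message
        } catch (NonRetryableException e) {
            System.out.println("Rejected duplicate: " + e.getMessage());
        }
        System.out.println("Jobs submitted: " + handler.jobsSubmitted());
    }
}
```

Writing the flag after the work rather than before means a crash between the two steps can still lead to one repeated run, which appears to be the trade-off the original method accepts as well.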

Example 8 with NonRetryableApplicationException

Use of com.google.cloud.pso.bq_pii_classifier.entities.NonRetryableApplicationException in project bq-pii-classifier by GoogleCloudPlatform.

Class TaggingDispatcherController, method receiveMessage.

@RequestMapping(value = "/", method = RequestMethod.POST)
public ResponseEntity receiveMessage(@RequestBody PubSubEvent requestBody) {
    String runId = TrackingHelper.generateTaggingRunId();
    String state = "";
    try {
        if (requestBody == null || requestBody.getMessage() == null) {
            String msg = "Bad Request: invalid message format";
            logger.logSevereWithTracker(runId, msg);
            throw new NonRetryableApplicationException("Request body or message is Null.");
        }
        String requestJsonString = requestBody.getMessage().dataToUtf8String();
        // remove any escape characters (e.g. from Terraform)
        requestJsonString = requestJsonString.replace("\\", "");
        logger.logInfoWithTracker(runId, String.format("Received payload: %s", requestJsonString));
        BigQueryScope bqScope = gson.fromJson(requestJsonString, BigQueryScope.class);
        logger.logInfoWithTracker(runId, String.format("Parsed JSON input %s ", bqScope.toString()));
        Scanner dlpResultsScanner;
        if (environment.getIsAutoDlpMode()) {
            dlpResultsScanner = new AutoDlpResultsScannerImpl(environment.getProjectId(), environment.getSolutionDataset(), environment.getDlpTableAuto(), new BigQueryServiceImpl());
        } else {
            dlpResultsScanner = new StandardDlpResultsScannerImpl(environment.getProjectId(), environment.getSolutionDataset(), environment.getDlpTableStandard(), environment.getLoggingTable(), new BigQueryServiceImpl());
        }
        Dispatcher dispatcher = new Dispatcher(environment.toConfig(), new BigQueryServiceImpl(), new PubSubServiceImpl(), dlpResultsScanner, new GCSPersistentSetImpl(environment.getGcsFlagsBucket()), "tagging-dispatcher-flags", runId);
        PubSubPublishResults results = dispatcher.execute(bqScope, requestBody.getMessage().getMessageId());
        state = String.format("Publishing results: %s SUCCESS MESSAGES and %s FAILED MESSAGES", results.getSuccessMessages().size(), results.getFailedMessages().size());
        logger.logInfoWithTracker(runId, state);
    } catch (Exception e) {
        logger.logNonRetryableExceptions(runId, e);
        state = String.format("ERROR '%s'", e.getMessage());
    }
    return new ResponseEntity(String.format("Process completed with state = %s", state), HttpStatus.OK);
}
Also used : BigQueryServiceImpl(com.google.cloud.pso.bq_pii_classifier.services.bq.BigQueryServiceImpl) PubSubPublishResults(com.google.cloud.pso.bq_pii_classifier.services.pubsub.PubSubPublishResults) NonRetryableApplicationException(com.google.cloud.pso.bq_pii_classifier.entities.NonRetryableApplicationException) Dispatcher(com.google.cloud.pso.bq_pii_classifier.functions.dispatcher.Dispatcher) GCSPersistentSetImpl(com.google.cloud.pso.bq_pii_classifier.services.set.GCSPersistentSetImpl) BigQueryScope(com.google.cloud.pso.bq_pii_classifier.functions.dispatcher.BigQueryScope) PubSubServiceImpl(com.google.cloud.pso.bq_pii_classifier.services.pubsub.PubSubServiceImpl) ResponseEntity(org.springframework.http.ResponseEntity) RequestMapping(org.springframework.web.bind.annotation.RequestMapping)
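
receiveMessage catches every exception and still answers with HTTP 200, which for a Pub/Sub push endpoint acknowledges the message so it is not redelivered. The following sketch shows that "always acknowledge" shape without Spring. AckOnFailureSketch, Work, and Response are hypothetical names introduced only for illustration, and the status is modeled as a plain int.

```java
final class AckOnFailureSketch {

    static final int HTTP_OK = 200;

    /** Hypothetical unit of work that may fail, standing in for the dispatcher call. */
    interface Work {
        String run(String payload) throws Exception;
    }

    /** Minimal response holder (status code plus body text). */
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    /** Runs the work; any failure is folded into the body and still answered with 200. */
    static Response handle(String payload, Work work) {
        String state;
        try {
            if (payload == null) {
                throw new IllegalArgumentException("Request body or message is null.");
            }
            state = work.run(payload);
        } catch (Exception e) {
            // Treated as non-retryable: report the error, but do not ask for redelivery.
            state = String.format("ERROR '%s'", e.getMessage());
        }
        return new Response(HTTP_OK, String.format("Process completed with state = %s", state));
    }

    public static void main(String[] args) {
        Response ok = handle("{}", payload -> "published");
        Response failed = handle(null, payload -> "published");
        System.out.println(ok.status + " " + ok.body);
        System.out.println(failed.status + " " + failed.body);
    }
}
```

The cost of this design is that transient failures are acknowledged too, so anything worth retrying would need a separate path that returns a non-success status.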

Example 9 with NonRetryableApplicationException

Use of com.google.cloud.pso.bq_pii_classifier.entities.NonRetryableApplicationException in project bq-pii-classifier by GoogleCloudPlatform.

Class FindingsReaderAutoDlp, method getFieldsToPolicyTagsMap.

/**
 * Look for DLP results by a tableSpec. Returns a map of fields to policy tags, or null if DLP
 * has no findings.
 *
 * @param inspectedTableSpec table spec in the form "project.dataset.table"
 * @return the field-to-policy-tag mapping for the table, or null if there are no findings
 * @throws InterruptedException
 * @throws NonRetryableApplicationException
 * @throws IOException
 */
public TablePolicyTags getFieldsToPolicyTagsMap(String inspectedTableSpec) throws InterruptedException, NonRetryableApplicationException, IOException {
    String formattedQuery = generateQuery(inspectedTableSpec);
    // Submit the query as a BigQuery job and wait for its results.
    Job queryJob = bqService.submitJob(formattedQuery);
    TableResult result = bqService.waitAndGetJobResults(queryJob);
    // Construct a mapping between field names and policy tags
    Map<String, String> fieldsToPolicyTagMap = new HashMap<>();
    for (FieldValueList row : result.iterateAll()) {
        if (row.get("field_name").isNull()) {
            throw new NonRetryableApplicationException("getFieldsToPolicyTagsMap query returned rows with null field_name");
        }
        String column_name = row.get("field_name").getStringValue();
        if (row.get("info_type").isNull()) {
            throw new NonRetryableApplicationException(String.format("getFieldsToPolicyTagsMap query returned rows with null info_type for column '%s'", column_name));
        }
        String info_type = row.get("info_type").getStringValue();
        if (row.get("policy_tag").isNull()) {
            throw new NonRetryableApplicationException(String.format("getFieldsToPolicyTagsMap query returned rows with null policy_tag for column '%s' of info_type '%s'. Checkout the classification taxonomy configuration and the DLP inspection template. All InfoTypes defined in the inspection template must have corresponding entries in the classification taxonomies.", column_name, info_type));
        }
        String policy_tag = row.get("policy_tag").getStringValue();
        fieldsToPolicyTagMap.put(column_name, policy_tag);
    }
    if (fieldsToPolicyTagMap.isEmpty())
        return null;
    else
        return new TablePolicyTags(TableSpec.fromSqlString(inspectedTableSpec), fieldsToPolicyTagMap);
}
Also used : TableResult(com.google.cloud.bigquery.TableResult) TablePolicyTags(com.google.cloud.pso.bq_pii_classifier.entities.TablePolicyTags) HashMap(java.util.HashMap) NonRetryableApplicationException(com.google.cloud.pso.bq_pii_classifier.entities.NonRetryableApplicationException) FieldValueList(com.google.cloud.bigquery.FieldValueList) Job(com.google.cloud.bigquery.Job)

Example 10 with NonRetryableApplicationException

Use of com.google.cloud.pso.bq_pii_classifier.entities.NonRetryableApplicationException in project bq-pii-classifier by GoogleCloudPlatform.

Class FindingsReaderStandardDlp, method getFieldsToPolicyTagsMap.

/**
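 * Look for DLP results by DLP job name. Returns a map of fields to policy tags, or null if DLP
 * has no findings.
 *
 * @param dlpJobName job name in the form "projects/<PROJECT>/locations/<GCP REGION>/dlpJobs/<JOB ID>"
 * @return the field-to-policy-tag mapping for the inspected table, or null if there are no findings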
 * @throws InterruptedException
 * @throws NonRetryableApplicationException
 * @throws IOException
 */
public TablePolicyTags getFieldsToPolicyTagsMap(String dlpJobName) throws InterruptedException, NonRetryableApplicationException, IOException {
    String formattedQuery = generateQuery(dlpJobName);
    // Submit the query as a BigQuery job and wait for its results.
    Job queryJob = bqService.submitJob(formattedQuery);
    TableResult result = bqService.waitAndGetJobResults(queryJob);
    // Construct a mapping between field names and policy tags
    Map<String, String> fieldsToPolicyTagMap = new HashMap<>();
    String tableSpecStr = "";
    for (FieldValueList row : result.iterateAll()) {
        if (row.get("field_name").isNull()) {
            throw new NonRetryableApplicationException("getFieldsToPolicyTagsMap query returned rows with null field_name");
        }
        String column_name = row.get("field_name").getStringValue();
        if (row.get("info_type").isNull()) {
            throw new NonRetryableApplicationException(String.format("getFieldsToPolicyTagsMap query returned rows with null info_type for column '%s'", column_name));
        }
        String info_type = row.get("info_type").getStringValue();
        if (row.get("policy_tag").isNull()) {
            throw new NonRetryableApplicationException(String.format("getFieldsToPolicyTagsMap query returned rows with null policy_tag for column '%s' of info_type '%s'. Checkout the classification taxonomy configuration and the DLP inspection template. All InfoTypes defined in the inspection template must have corresponding entries in the classification taxonomies.", column_name, info_type));
        }
        String policy_tag = row.get("policy_tag").getStringValue();
        if (row.get("table_spec").isNull()) {
            throw new NonRetryableApplicationException("getFieldsToPolicyTagsMap query returned rows with null table_spec");
        }
        tableSpecStr = row.get("table_spec").getStringValue();
        fieldsToPolicyTagMap.put(column_name, policy_tag);
    }
    if (fieldsToPolicyTagMap.isEmpty())
        return null;
    else
        return new TablePolicyTags(TableSpec.fromSqlString(tableSpecStr), fieldsToPolicyTagMap);
}
Also used : TableResult(com.google.cloud.bigquery.TableResult) TablePolicyTags(com.google.cloud.pso.bq_pii_classifier.entities.TablePolicyTags) HashMap(java.util.HashMap) NonRetryableApplicationException(com.google.cloud.pso.bq_pii_classifier.entities.NonRetryableApplicationException) FieldValueList(com.google.cloud.bigquery.FieldValueList) Job(com.google.cloud.bigquery.Job)
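
Examples 9 and 10 share one pattern: walk the result rows, fail fast on any null column, and collect field_name to policy_tag pairs. A minimal sketch of that loop follows, assuming each row is modeled as a plain Map<String, String> instead of a BigQuery FieldValueList. PolicyTagRowsSketch and its helper require are hypothetical names, and IllegalStateException stands in for the project's exception type.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PolicyTagRowsSketch {

    /** Builds a field-name to policy-tag map, rejecting rows with any null column. */
    static Map<String, String> toFieldPolicyTags(List<Map<String, String>> rows) {
        Map<String, String> fieldsToPolicyTags = new LinkedHashMap<>();
        for (Map<String, String> row : rows) {
            String fieldName = require(row, "field_name");
            require(row, "info_type"); // validated only; the value is not stored
            String policyTag = require(row, "policy_tag");
            // As in the original loop, a later row for the same field overwrites the earlier one.
            fieldsToPolicyTags.put(fieldName, policyTag);
        }
        return fieldsToPolicyTags;
    }

    // Hypothetical helper: returns the column value or throws if it is missing or null.
    private static String require(Map<String, String> row, String column) {
        String value = row.get(column);
        if (value == null) {
            throw new IllegalStateException("query returned a row with null " + column);
        }
        return value;
    }

    public static void main(String[] args) {
        // Made-up sample row for illustration only.
        Map<String, String> row = new HashMap<>();
        row.put("field_name", "email");
        row.put("info_type", "EMAIL_ADDRESS");
        row.put("policy_tag", "example-policy-tag");
        System.out.println(toFieldPolicyTags(List.of(row)));
    }
}
```

Unlike the original methods, this sketch returns an empty map rather than null when there are no rows; callers of the real code need the null check instead.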
