
Example 6 with BigQueryCow

use of bio.terra.cloudres.google.bigquery.BigQueryCow in project terra-workspace-manager by DataBiosphere.

the class CreateTableCopyJobsStep method doStep.

/**
 * Create one BigQuery copy job for each table in the source dataset. Keep a running map from
 * table ID to job ID as new jobs are created, and only create jobs for tables that aren't in the
 * map already. Rerun the step after every table is processed so that the map may be persisted
 * incrementally.
 *
 * <p>On retry, create the jobs for any tables that don't have them. Use WRITE_TRUNCATE to avoid
 * the possibility of duplicate data.
 */
@Override
public StepResult doStep(FlightContext flightContext) throws InterruptedException, RetryException {
    final FlightMap workingMap = flightContext.getWorkingMap();
    final CloningInstructions effectiveCloningInstructions = flightContext.getInputParameters().get(ControlledResourceKeys.CLONING_INSTRUCTIONS, CloningInstructions.class);
    if (CloningInstructions.COPY_RESOURCE != effectiveCloningInstructions) {
        return StepResult.getStepResultSuccess();
    }
    // Gather inputs
    final DatasetCloneInputs sourceInputs = getSourceInputs();
    workingMap.put(ControlledResourceKeys.SOURCE_CLONE_INPUTS, sourceInputs);
    final DatasetCloneInputs destinationInputs = getDestinationInputs(flightContext);
    workingMap.put(ControlledResourceKeys.DESTINATION_CLONE_INPUTS, destinationInputs);
    final BigQueryCow bigQueryCow = crlService.createWsmSaBigQueryCow();
    // TODO(jaycarlton):  remove usage of this client when it's all in CRL PF-942
    final Bigquery bigQueryClient = crlService.createWsmSaNakedBigQueryClient();
    try {
        // Get a list of all tables in the source dataset
        final TableList sourceTables = bigQueryCow.tables().list(sourceInputs.getProjectId(), sourceInputs.getDatasetName()).execute();
        // Start a copy job for each source table
        final Map<String, String> tableToJobId = Optional.ofNullable(workingMap.get(ControlledResourceKeys.TABLE_TO_JOB_ID_MAP, new TypeReference<Map<String, String>>() {})).orElseGet(HashMap::new);
        final List<Tables> tables = Optional.ofNullable(sourceTables.getTables()).orElse(Collections.emptyList());
        // Find the first table whose ID isn't a key in the map.
        final Optional<Tables> tableMaybe = tables.stream().filter(t -> null != t.getId() && !tableToJobId.containsKey(t.getId())).findFirst();
        if (tableMaybe.isPresent()) {
            final Tables table = tableMaybe.get();
            checkStreamingBuffer(sourceInputs, bigQueryCow, table);
            final Job inputJob = buildTableCopyJob(sourceInputs, destinationInputs, table);
            // bill the job to the destination project
            final Job submittedJob = bigQueryClient.jobs().insert(destinationInputs.getProjectId(), inputJob).execute();
            // Update the map, which will be persisted
            tableToJobId.put(table.getId(), submittedJob.getId());
            workingMap.put(ControlledResourceKeys.TABLE_TO_JOB_ID_MAP, tableToJobId);
            return new StepResult(StepStatus.STEP_RESULT_RERUN);
        } else {
            // All tables have entries in the map, so all jobs are started.
            // Persist the map even if it's empty.
            workingMap.put(ControlledResourceKeys.TABLE_TO_JOB_ID_MAP, tableToJobId);
            return StepResult.getStepResultSuccess();
        }
    } catch (IOException e) {
        return new StepResult(StepStatus.STEP_RESULT_FAILURE_RETRY, e);
    }
}
Also used : TableList(com.google.api.services.bigquery.model.TableList) BigQueryCow(bio.terra.cloudres.google.bigquery.BigQueryCow) LoggerFactory(org.slf4j.LoggerFactory) HashMap(java.util.HashMap) Tables(com.google.api.services.bigquery.model.TableList.Tables) StepResult(bio.terra.stairway.StepResult) Step(bio.terra.stairway.Step) RetryException(bio.terra.stairway.exception.RetryException) Duration(java.time.Duration) Map(java.util.Map) TypeReference(com.fasterxml.jackson.core.type.TypeReference) Job(com.google.api.services.bigquery.model.Job) CrlService(bio.terra.workspace.service.crl.CrlService) TableReference(com.google.api.services.bigquery.model.TableReference) ControlledBigQueryDatasetResource(bio.terra.workspace.service.resource.controlled.cloud.gcp.bqdataset.ControlledBigQueryDatasetResource) Logger(org.slf4j.Logger) FlightMap(bio.terra.stairway.FlightMap) IOException(java.io.IOException) UUID(java.util.UUID) Instant(java.time.Instant) JobConfigurationTableCopy(com.google.api.services.bigquery.model.JobConfigurationTableCopy) Table(com.google.api.services.bigquery.model.Table) List(java.util.List) GcpCloudContextService(bio.terra.workspace.service.workspace.GcpCloudContextService) Bigquery(com.google.api.services.bigquery.Bigquery) CloningInstructions(bio.terra.workspace.service.resource.model.CloningInstructions) Optional(java.util.Optional) ControlledResourceKeys(bio.terra.workspace.service.workspace.flight.WorkspaceFlightMapKeys.ControlledResourceKeys) StepStatus(bio.terra.stairway.StepStatus) Collections(java.util.Collections) FlightContext(bio.terra.stairway.FlightContext) JobConfiguration(com.google.api.services.bigquery.model.JobConfiguration)
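The Javadoc for doStep describes a "one table per run, then rerun" pattern backed by a persisted map. The following is a minimal sketch of that control flow only, with no Stairway or BigQuery types. The class name, the Status enum, and the "job-N" IDs are hypothetical stand-ins and are not part of terra-workspace-manager.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the step above; it only models the rerun logic.
class IncrementalJobSketch {
    enum Status { RERUN, SUCCESS }

    // Handles at most one table that has no entry in the map, then asks to be rerun.
    static Status runOnce(List<String> tableIds, Map<String, String> tableToJobId) {
        Optional<String> next = tableIds.stream()
            .filter(id -> id != null && !tableToJobId.containsKey(id))
            .findFirst();
        if (next.isEmpty()) {
            return Status.SUCCESS; // every table already has a job ID
        }
        // Hypothetical job ID; the real step gets one back from the BigQuery jobs API.
        tableToJobId.put(next.get(), "job-" + (tableToJobId.size() + 1));
        return Status.RERUN; // the caller can persist the map before the next run
    }

    // Calls runOnce until it reports SUCCESS and returns how many calls were made.
    static int runToCompletion(List<String> tableIds, Map<String, String> tableToJobId) {
        int calls = 1;
        while (runOnce(tableIds, tableToJobId) == Status.RERUN) {
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        Map<String, String> persisted = new LinkedHashMap<>();
        persisted.put("t1", "job-1"); // pretend an earlier attempt got this far
        int calls = runToCompletion(List.of("t1", "t2", "t3"), persisted);
        System.out.println(calls + " calls, map = " + persisted);
    }
}
```

Because each run adds at most one entry, a retry after a crash only submits jobs for tables that are still missing from the map.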

Example 7 with BigQueryCow

use of bio.terra.cloudres.google.bigquery.BigQueryCow in project terra-workspace-manager by DataBiosphere.

the class RetrieveBigQueryDatasetCloudAttributesStep method doStep.

@Override
public StepResult doStep(FlightContext flightContext) throws InterruptedException, RetryException {
    final String suppliedLocation = flightContext.getInputParameters().get(ControlledResourceKeys.LOCATION, String.class);
    if (!Strings.isNullOrEmpty(suppliedLocation)) {
        flightContext.getWorkingMap().put(ControlledResourceKeys.LOCATION, suppliedLocation);
        // we can stop here as we don't need the original location
        return StepResult.getStepResultSuccess();
    }
    // Since no location was specified, we need to find the original one
    // from the source dataset.
    final String projectId = gcpCloudContextService.getRequiredGcpProject(datasetResource.getWorkspaceId());
    final BigQueryCow bigQueryCow = crlService.createWsmSaBigQueryCow();
    try {
        final Dataset dataset = bigQueryCow.datasets().get(projectId, datasetResource.getDatasetName()).execute();
        final String sourceLocation = dataset.getLocation();
        flightContext.getWorkingMap().put(ControlledResourceKeys.LOCATION, sourceLocation);
        return StepResult.getStepResultSuccess();
    } catch (IOException e) {
        // TODO: consider retry here
        return new StepResult(StepStatus.STEP_RESULT_FAILURE_FATAL, e);
    }
}
Also used : Dataset(com.google.api.services.bigquery.model.Dataset) IOException(java.io.IOException) StepResult(bio.terra.stairway.StepResult) BigQueryCow(bio.terra.cloudres.google.bigquery.BigQueryCow)
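The step above prefers a caller-supplied location and only queries the source dataset when none was given. A minimal sketch of that fallback follows. The class and method names are hypothetical, and a Supplier stands in for the BigQuery dataset lookup so that the lookup runs only when it is needed.

```java
import java.util.function.Supplier;

// Hypothetical helper; it is not part of terra-workspace-manager.
class LocationFallbackSketch {
    // Returns the supplied location if present, otherwise asks the lookup for the source location.
    static String resolveLocation(String supplied, Supplier<String> sourceLookup) {
        if (supplied != null && !supplied.isEmpty()) {
            return supplied; // no remote call is needed in this branch
        }
        return sourceLookup.get();
    }

    public static void main(String[] args) {
        // The lambda stands in for reading the location of the source dataset.
        System.out.println(resolveLocation("europe-west1", () -> "us-central1"));
        System.out.println(resolveLocation("", () -> "us-central1"));
    }
}
```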

Example 8 with BigQueryCow

use of bio.terra.cloudres.google.bigquery.BigQueryCow in project terra-cli by DataBiosphere.

the class ExternalBQDatasets method grantAccess.

/**
 * Grant a given user or group access to a dataset. This method uses SA credentials that have
 * permissions on the external (to WSM) project.
 */
private static void grantAccess(DatasetReference datasetRef, String memberEmail, IamMemberType memberType, String role) throws IOException {
    BigQueryCow bigQuery = getBQCow();
    Dataset datasetToUpdate = bigQuery.datasets().get(datasetRef.getProjectId(), datasetRef.getDatasetId()).execute();
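    // getAccess() can return null if the field is absent from the response, so fall back to an empty list.
    List<Dataset.Access> accessToUpdate = Optional.ofNullable(datasetToUpdate.getAccess()).orElseGet(ArrayList::new);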
    Dataset.Access newAccess = new Dataset.Access().setRole(role);
    if (memberType.equals(IamMemberType.USER)) {
        newAccess.setUserByEmail(memberEmail);
    } else {
        newAccess.setGroupByEmail(memberEmail);
    }
    accessToUpdate.add(newAccess);
    datasetToUpdate.setAccess(accessToUpdate);
    bigQuery.datasets().update(datasetRef.getProjectId(), datasetRef.getDatasetId(), datasetToUpdate).execute();
}
Also used : Dataset(com.google.api.services.bigquery.model.Dataset) BigQueryCow(bio.terra.cloudres.google.bigquery.BigQueryCow)

Example 9 with BigQueryCow

use of bio.terra.cloudres.google.bigquery.BigQueryCow in project terra-cli by DataBiosphere.

the class ExternalBQDatasets method grantReadAccessToTable.

/**
 * Grants a given group the dataViewer role on the specified table.
 */
public static void grantReadAccessToTable(String projectId, String datasetId, String tableId, String groupEmail) throws IOException {
    BigQueryCow bigQuery = getBQCow();
    Policy policy = bigQuery.tables().getIamPolicy(projectId, datasetId, tableId, new GetIamPolicyRequest()).execute();
    List<Binding> updatedBindings = Optional.ofNullable(policy.getBindings()).orElse(new ArrayList<>());
    updatedBindings.add(new Binding().setRole("roles/bigquery.dataViewer").setMembers(ImmutableList.of("group:" + groupEmail)));
    bigQuery.tables().setIamPolicy(projectId, datasetId, tableId, new SetIamPolicyRequest().setPolicy(policy.setBindings(updatedBindings))).execute();
    System.out.println("Granted dataViewer access to table " + tableId + " for group email: " + groupEmail);
}
Also used : Policy(com.google.api.services.bigquery.model.Policy) Binding(com.google.api.services.bigquery.model.Binding) SetIamPolicyRequest(com.google.api.services.bigquery.model.SetIamPolicyRequest) GetIamPolicyRequest(com.google.api.services.bigquery.model.GetIamPolicyRequest) BigQueryCow(bio.terra.cloudres.google.bigquery.BigQueryCow)
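grantReadAccessToTable appends a new binding on every call, so calling it repeatedly for the same role adds one binding per call. One possible refinement, shown here only as a sketch, is to merge the member into an existing binding for that role. SimpleBinding is a hypothetical stand-in for com.google.api.services.bigquery.model.Binding, and this helper is not taken from terra-cli.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class BindingMergeSketch {
    // Hypothetical minimal binding: one role and a duplicate-free set of members.
    static final class SimpleBinding {
        final String role;
        final Set<String> members = new LinkedHashSet<>();

        SimpleBinding(String role) {
            this.role = role;
        }
    }

    // Adds the member to the binding for the role, creating that binding only if it is missing.
    // The returned list is a new list, but existing binding objects are reused and updated.
    static List<SimpleBinding> addMember(List<SimpleBinding> existing, String role, String member) {
        List<SimpleBinding> bindings = existing == null ? new ArrayList<>() : new ArrayList<>(existing);
        for (SimpleBinding binding : bindings) {
            if (binding.role.equals(role)) {
                binding.members.add(member); // the set ignores a member that is already present
                return bindings;
            }
        }
        SimpleBinding created = new SimpleBinding(role);
        created.members.add(member);
        bindings.add(created);
        return bindings;
    }

    public static void main(String[] args) {
        String role = "roles/bigquery.dataViewer";
        List<SimpleBinding> bindings = addMember(null, role, "group:a@example.com");
        bindings = addMember(bindings, role, "group:a@example.com");
        bindings = addMember(bindings, role, "group:b@example.com");
        System.out.println(bindings.size() + " binding(s), members = " + bindings.get(0).members);
    }
}
```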

Example 10 with BigQueryCow

use of bio.terra.cloudres.google.bigquery.BigQueryCow in project terra-workspace-manager by DataBiosphere.

the class ControlledResourceServiceTest method deleteBqDatasetDo.

@Test
@DisabledIfEnvironmentVariable(named = "TEST_ENV", matches = BUFFER_SERVICE_DISABLED_ENVS_REG_EX)
void deleteBqDatasetDo() throws Exception {
    String datasetId = ControlledResourceFixtures.uniqueDatasetId();
    String location = "us-central1";
    ApiGcpBigQueryDatasetCreationParameters creationParameters = new ApiGcpBigQueryDatasetCreationParameters().datasetId(datasetId).location(location);
    ControlledBigQueryDatasetResource resource = ControlledResourceFixtures.makeDefaultControlledBigQueryBuilder(workspace.getWorkspaceId()).datasetName(datasetId).build();
    ControlledBigQueryDatasetResource createdDataset = controlledResourceService.createControlledResourceSync(resource, null, user.getAuthenticatedRequest(), creationParameters).castByEnum(WsmResourceType.CONTROLLED_GCP_BIG_QUERY_DATASET);
    assertEquals(resource, createdDataset);
    // Test idempotency of delete by retrying steps once.
    Map<String, StepStatus> retrySteps = new HashMap<>();
    retrySteps.put(DeleteMetadataStep.class.getName(), StepStatus.STEP_RESULT_FAILURE_RETRY);
    retrySteps.put(DeleteBigQueryDatasetStep.class.getName(), StepStatus.STEP_RESULT_FAILURE_RETRY);
    // Do not test lastStepFailure, as this flight has no undo steps, only dismal failure.
    jobService.setFlightDebugInfoForTest(FlightDebugInfo.newBuilder().doStepFailures(retrySteps).build());
    controlledResourceService.deleteControlledResourceSync(resource.getWorkspaceId(), resource.getResourceId(), user.getAuthenticatedRequest());
    BigQueryCow bqCow = crlService.createWsmSaBigQueryCow();
    GoogleJsonResponseException getException = assertThrows(GoogleJsonResponseException.class, () -> bqCow.datasets().get(projectId, resource.getDatasetName()).execute());
    assertEquals(HttpStatus.NOT_FOUND.value(), getException.getStatusCode());
    assertThrows(ResourceNotFoundException.class, () -> controlledResourceService.getControlledResource(workspace.getWorkspaceId(), resource.getResourceId(), user.getAuthenticatedRequest()));
}
Also used : GoogleJsonResponseException(com.google.api.client.googleapis.json.GoogleJsonResponseException) HashMap(java.util.HashMap) DeleteMetadataStep(bio.terra.workspace.service.resource.controlled.flight.delete.DeleteMetadataStep) ApiGcpBigQueryDatasetCreationParameters(bio.terra.workspace.generated.model.ApiGcpBigQueryDatasetCreationParameters) StepStatus(bio.terra.stairway.StepStatus) ControlledBigQueryDatasetResource(bio.terra.workspace.service.resource.controlled.cloud.gcp.bqdataset.ControlledBigQueryDatasetResource) DeleteBigQueryDatasetStep(bio.terra.workspace.service.resource.controlled.cloud.gcp.bqdataset.DeleteBigQueryDatasetStep) BigQueryCow(bio.terra.cloudres.google.bigquery.BigQueryCow) Test(org.junit.jupiter.api.Test) BaseConnectedTest(bio.terra.workspace.common.BaseConnectedTest) DisabledIfEnvironmentVariable(org.junit.jupiter.api.condition.DisabledIfEnvironmentVariable)

Aggregations

BigQueryCow (bio.terra.cloudres.google.bigquery.BigQueryCow) 14
StepResult (bio.terra.stairway.StepResult) 7
Dataset (com.google.api.services.bigquery.model.Dataset) 7
IOException (java.io.IOException) 7
GoogleJsonResponseException (com.google.api.client.googleapis.json.GoogleJsonResponseException) 6
ApiGcpBigQueryDatasetCreationParameters (bio.terra.workspace.generated.model.ApiGcpBigQueryDatasetCreationParameters) 5
ControlledBigQueryDatasetResource (bio.terra.workspace.service.resource.controlled.cloud.gcp.bqdataset.ControlledBigQueryDatasetResource) 5
StepStatus (bio.terra.stairway.StepStatus) 4
BaseConnectedTest (bio.terra.workspace.common.BaseConnectedTest) 4
HashMap (java.util.HashMap) 4
Test (org.junit.jupiter.api.Test) 4
DisabledIfEnvironmentVariable (org.junit.jupiter.api.condition.DisabledIfEnvironmentVariable) 4
FlightMap (bio.terra.stairway.FlightMap) 3
CreateBigQueryDatasetStep (bio.terra.workspace.service.resource.controlled.cloud.gcp.bqdataset.CreateBigQueryDatasetStep) 2
GcpCloudContext (bio.terra.workspace.service.workspace.model.GcpCloudContext) 2
FlightContext (bio.terra.stairway.FlightContext) 1
Step (bio.terra.stairway.Step) 1
RetryException (bio.terra.stairway.exception.RetryException) 1
ApiGcpBigQueryDatasetUpdateParameters (bio.terra.workspace.generated.model.ApiGcpBigQueryDatasetUpdateParameters) 1
CrlService (bio.terra.workspace.service.crl.CrlService) 1