Search in sources :

Example 1 with ErrorProto

use of com.google.api.services.bigquery.model.ErrorProto in project beam by apache.

the class FakeJobService method runLoadJob.

private JobStatus runLoadJob(JobReference jobRef, JobConfigurationLoad load) throws InterruptedException, IOException {
    TableReference destination = load.getDestinationTable();
    TableSchema schema = load.getSchema();
    List<ResourceId> sourceFiles = filesForLoadJobs.get(jobRef.getProjectId(), jobRef.getJobId());
    WriteDisposition writeDisposition = WriteDisposition.valueOf(load.getWriteDisposition());
    CreateDisposition createDisposition = CreateDisposition.valueOf(load.getCreateDisposition());
    checkArgument(load.getSourceFormat().equals("NEWLINE_DELIMITED_JSON"));
    Table existingTable = datasetService.getTable(destination);
    if (!validateDispositions(existingTable, createDisposition, writeDisposition)) {
        return new JobStatus().setState("FAILED").setErrorResult(new ErrorProto());
    }
    datasetService.createTable(new Table().setTableReference(destination).setSchema(schema));
    List<TableRow> rows = Lists.newArrayList();
    for (ResourceId filename : sourceFiles) {
        rows.addAll(readRows(filename.toString()));
    }
    datasetService.insertAll(destination, rows, null);
    return new JobStatus().setState("DONE");
}
Also used : JobStatus(com.google.api.services.bigquery.model.JobStatus) TableReference(com.google.api.services.bigquery.model.TableReference) CreateDisposition(org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition) HashBasedTable(com.google.common.collect.HashBasedTable) Table(com.google.api.services.bigquery.model.Table) ErrorProto(com.google.api.services.bigquery.model.ErrorProto) TableSchema(com.google.api.services.bigquery.model.TableSchema) ResourceId(org.apache.beam.sdk.io.fs.ResourceId) TableRow(com.google.api.services.bigquery.model.TableRow) WriteDisposition(org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition)

Example 2 with ErrorProto

use of com.google.api.services.bigquery.model.ErrorProto in project beam by apache.

the class BigQueryIOTest method testRetryPolicy.

@Test
public void testRetryPolicy() throws Exception {
    BigQueryOptions bqOptions = TestPipeline.testingPipelineOptions().as(BigQueryOptions.class);
    bqOptions.setProject("project-id");
    bqOptions.setTempLocation(testFolder.newFolder("BigQueryIOTest").getAbsolutePath());
    FakeDatasetService datasetService = new FakeDatasetService();
    FakeBigQueryServices fakeBqServices = new FakeBigQueryServices().withJobService(new FakeJobService()).withDatasetService(datasetService);
    datasetService.createDataset("project-id", "dataset-id", "", "");
    TableRow row1 = new TableRow().set("name", "a").set("number", "1");
    TableRow row2 = new TableRow().set("name", "b").set("number", "2");
    TableRow row3 = new TableRow().set("name", "c").set("number", "3");
    TableDataInsertAllResponse.InsertErrors ephemeralError = new TableDataInsertAllResponse.InsertErrors().setErrors(ImmutableList.of(new ErrorProto().setReason("timeout")));
    TableDataInsertAllResponse.InsertErrors persistentError = new TableDataInsertAllResponse.InsertErrors().setErrors(ImmutableList.of(new ErrorProto().setReason("invalidQuery")));
    datasetService.failOnInsert(ImmutableMap.<TableRow, List<TableDataInsertAllResponse.InsertErrors>>of(row1, ImmutableList.of(ephemeralError, ephemeralError), row2, ImmutableList.of(ephemeralError, ephemeralError, persistentError)));
    Pipeline p = TestPipeline.create(bqOptions);
    PCollection<TableRow> failedRows = p.apply(Create.of(row1, row2, row3)).setIsBoundedInternal(IsBounded.UNBOUNDED).apply(BigQueryIO.writeTableRows().to("project-id:dataset-id.table-id").withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED).withSchema(new TableSchema().setFields(ImmutableList.of(new TableFieldSchema().setName("name").setType("STRING"), new TableFieldSchema().setName("number").setType("INTEGER")))).withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()).withTestServices(fakeBqServices).withoutValidation()).getFailedInserts();
    // row2 finally fails with a non-retryable error, so we expect to see it in the collection of
    // failed rows.
    PAssert.that(failedRows).containsInAnyOrder(row2);
    p.run();
    // Only row1 and row3 were successfully inserted.
    assertThat(datasetService.getAllRows("project-id", "dataset-id", "table-id"), containsInAnyOrder(row1, row3));
}
Also used : ErrorProto(com.google.api.services.bigquery.model.ErrorProto) TableSchema(com.google.api.services.bigquery.model.TableSchema) JsonSchemaToTableSchema(org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.JsonSchemaToTableSchema) TableDataInsertAllResponse(com.google.api.services.bigquery.model.TableDataInsertAllResponse) TableFieldSchema(com.google.api.services.bigquery.model.TableFieldSchema) TestPipeline(org.apache.beam.sdk.testing.TestPipeline) Pipeline(org.apache.beam.sdk.Pipeline) TableRow(com.google.api.services.bigquery.model.TableRow) Test(org.junit.Test)

Example 3 with ErrorProto

use of com.google.api.services.bigquery.model.ErrorProto in project beam by apache.

the class BigQueryTableRowIterator method executeQueryAndWaitForCompletion.

/**
   * Executes the specified query and returns a reference to the temporary BigQuery table created
   * to hold the results.
   *
   * @throws IOException if the query fails.
   */
private TableReference executeQueryAndWaitForCompletion() throws IOException, InterruptedException {
    checkState(projectId != null, "Unable to execute a query without a configured project id");
    checkState(queryConfig != null, "Unable to execute a query without a configured query");
    // Dry run query to get source table location
    Job dryRunJob = new Job().setConfiguration(new JobConfiguration().setQuery(queryConfig).setDryRun(true));
    JobStatistics jobStats = executeWithBackOff(client.jobs().insert(projectId, dryRunJob), String.format("Error when trying to dry run query %s.", queryConfig.toPrettyString())).getStatistics();
    // Let BigQuery to pick default location if the query does not read any tables.
    String location = null;
    @Nullable List<TableReference> tables = jobStats.getQuery().getReferencedTables();
    if (tables != null && !tables.isEmpty()) {
        Table table = getTable(tables.get(0));
        location = table.getLocation();
    }
    // Create a temporary dataset to store results.
    // Starting dataset name with an "_" so that it is hidden.
    Random rnd = new Random(System.currentTimeMillis());
    temporaryDatasetId = "_beam_temporary_dataset_" + rnd.nextInt(1000000);
    temporaryTableId = "beam_temporary_table_" + rnd.nextInt(1000000);
    createDataset(temporaryDatasetId, location);
    Job job = new Job();
    JobConfiguration config = new JobConfiguration();
    config.setQuery(queryConfig);
    job.setConfiguration(config);
    TableReference destinationTable = new TableReference();
    destinationTable.setProjectId(projectId);
    destinationTable.setDatasetId(temporaryDatasetId);
    destinationTable.setTableId(temporaryTableId);
    queryConfig.setDestinationTable(destinationTable);
    queryConfig.setAllowLargeResults(true);
    Job queryJob = executeWithBackOff(client.jobs().insert(projectId, job), String.format("Error when trying to execute the job for query %s.", queryConfig.toPrettyString()));
    JobReference jobId = queryJob.getJobReference();
    while (true) {
        Job pollJob = executeWithBackOff(client.jobs().get(projectId, jobId.getJobId()), String.format("Error when trying to get status of the job for query %s.", queryConfig.toPrettyString()));
        JobStatus status = pollJob.getStatus();
        if (status.getState().equals("DONE")) {
            // Job is DONE, but did not necessarily succeed.
            ErrorProto error = status.getErrorResult();
            if (error == null) {
                return pollJob.getConfiguration().getQuery().getDestinationTable();
            } else {
                // There will be no temporary table to delete, so null out the reference.
                temporaryTableId = null;
                throw new IOException(String.format("Executing query %s failed: %s", queryConfig.toPrettyString(), error.getMessage()));
            }
        }
        Uninterruptibles.sleepUninterruptibly(QUERY_COMPLETION_POLL_TIME.getMillis(), TimeUnit.MILLISECONDS);
    }
}
Also used : JobStatistics(com.google.api.services.bigquery.model.JobStatistics) Table(com.google.api.services.bigquery.model.Table) JobReference(com.google.api.services.bigquery.model.JobReference) ErrorProto(com.google.api.services.bigquery.model.ErrorProto) IOException(java.io.IOException) JobStatus(com.google.api.services.bigquery.model.JobStatus) TableReference(com.google.api.services.bigquery.model.TableReference) Random(java.util.Random) Job(com.google.api.services.bigquery.model.Job) JobConfiguration(com.google.api.services.bigquery.model.JobConfiguration) Nullable(javax.annotation.Nullable)

Example 4 with ErrorProto

use of com.google.api.services.bigquery.model.ErrorProto in project google-cloud-java by GoogleCloudPlatform.

the class BigQueryImplTest method testInsertAll.

@Test
public void testInsertAll() {
    Map<String, Object> row1 = ImmutableMap.<String, Object>of("field", "value1");
    Map<String, Object> row2 = ImmutableMap.<String, Object>of("field", "value2");
    List<RowToInsert> rows = ImmutableList.of(new RowToInsert("row1", row1), new RowToInsert("row2", row2));
    InsertAllRequest request = InsertAllRequest.newBuilder(TABLE_ID).setRows(rows).setSkipInvalidRows(false).setIgnoreUnknownValues(true).setTemplateSuffix("suffix").build();
    TableDataInsertAllRequest requestPb = new TableDataInsertAllRequest().setRows(Lists.transform(rows, new Function<RowToInsert, TableDataInsertAllRequest.Rows>() {

        @Override
        public TableDataInsertAllRequest.Rows apply(RowToInsert rowToInsert) {
            return new TableDataInsertAllRequest.Rows().setInsertId(rowToInsert.getId()).setJson(rowToInsert.getContent());
        }
    })).setSkipInvalidRows(false).setIgnoreUnknownValues(true).setTemplateSuffix("suffix");
    TableDataInsertAllResponse responsePb = new TableDataInsertAllResponse().setInsertErrors(ImmutableList.of(new TableDataInsertAllResponse.InsertErrors().setIndex(0L).setErrors(ImmutableList.of(new ErrorProto().setMessage("ErrorMessage")))));
    EasyMock.expect(bigqueryRpcMock.insertAll(PROJECT, DATASET, TABLE, requestPb)).andReturn(responsePb);
    EasyMock.replay(bigqueryRpcMock);
    bigquery = options.getService();
    InsertAllResponse response = bigquery.insertAll(request);
    assertNotNull(response.getErrorsFor(0L));
    assertNull(response.getErrorsFor(1L));
    assertEquals(1, response.getErrorsFor(0L).size());
    assertEquals("ErrorMessage", response.getErrorsFor(0L).get(0).getMessage());
}
Also used : ErrorProto(com.google.api.services.bigquery.model.ErrorProto) TableDataInsertAllResponse(com.google.api.services.bigquery.model.TableDataInsertAllResponse) TableDataInsertAllResponse(com.google.api.services.bigquery.model.TableDataInsertAllResponse) TableDataInsertAllRequest(com.google.api.services.bigquery.model.TableDataInsertAllRequest) RowToInsert(com.google.cloud.bigquery.InsertAllRequest.RowToInsert) TableDataInsertAllRequest(com.google.api.services.bigquery.model.TableDataInsertAllRequest) Test(org.junit.Test)

Example 5 with ErrorProto

use of com.google.api.services.bigquery.model.ErrorProto in project beam by apache.

the class BigQueryServicesImplTest method testInsertRetrySelectRows.

/**
 * Tests that {@link DatasetServiceImpl#insertAll} retries selected rows on failure.
 */
@Test
public void testInsertRetrySelectRows() throws Exception {
    TableReference ref = new TableReference().setProjectId("project").setDatasetId("dataset").setTableId("table");
    List<FailsafeValueInSingleWindow<TableRow, TableRow>> rows = ImmutableList.of(wrapValue(new TableRow().set("row", "a")), wrapValue(new TableRow().set("row", "b")));
    List<String> insertIds = ImmutableList.of("a", "b");
    final TableDataInsertAllResponse bFailed = new TableDataInsertAllResponse().setInsertErrors(ImmutableList.of(new InsertErrors().setIndex(1L).setErrors(ImmutableList.of(new ErrorProto()))));
    final TableDataInsertAllResponse allRowsSucceeded = new TableDataInsertAllResponse();
    setupMockResponses(response -> {
        when(response.getContentType()).thenReturn(Json.MEDIA_TYPE);
        when(response.getStatusCode()).thenReturn(200);
        when(response.getContent()).thenReturn(toStream(bFailed));
    }, response -> {
        when(response.getContentType()).thenReturn(Json.MEDIA_TYPE);
        when(response.getStatusCode()).thenReturn(200);
        when(response.getContent()).thenReturn(toStream(allRowsSucceeded));
    });
    DatasetServiceImpl dataService = new DatasetServiceImpl(bigquery, null, PipelineOptionsFactory.create());
    dataService.insertAll(ref, rows, insertIds, BackOffAdapter.toGcpBackOff(TEST_BACKOFF.backoff()), TEST_BACKOFF, new MockSleeper(), InsertRetryPolicy.alwaysRetry(), null, null, false, false, false, null);
    verifyAllResponsesAreRead();
    verifyWriteMetricWasSet("project", "dataset", "table", "unknown", 1);
    verifyWriteMetricWasSet("project", "dataset", "table", "ok", 1);
}
Also used : TableReference(com.google.api.services.bigquery.model.TableReference) ErrorProto(com.google.api.services.bigquery.model.ErrorProto) DatasetServiceImpl(org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.DatasetServiceImpl) TableRow(com.google.api.services.bigquery.model.TableRow) TableDataInsertAllResponse(com.google.api.services.bigquery.model.TableDataInsertAllResponse) InsertErrors(com.google.api.services.bigquery.model.TableDataInsertAllResponse.InsertErrors) Matchers.containsString(org.hamcrest.Matchers.containsString) FailsafeValueInSingleWindow(org.apache.beam.sdk.values.FailsafeValueInSingleWindow) MockSleeper(com.google.api.client.testing.util.MockSleeper) Test(org.junit.Test)

Aggregations

ErrorProto (com.google.api.services.bigquery.model.ErrorProto)20 TableRow (com.google.api.services.bigquery.model.TableRow)14 Test (org.junit.Test)14 TableDataInsertAllResponse (com.google.api.services.bigquery.model.TableDataInsertAllResponse)13 TableReference (com.google.api.services.bigquery.model.TableReference)11 JobStatus (com.google.api.services.bigquery.model.JobStatus)8 TableSchema (com.google.api.services.bigquery.model.TableSchema)8 Table (com.google.api.services.bigquery.model.Table)6 MockSleeper (com.google.api.client.testing.util.MockSleeper)5 InsertErrors (com.google.api.services.bigquery.model.TableDataInsertAllResponse.InsertErrors)5 TableFieldSchema (com.google.api.services.bigquery.model.TableFieldSchema)5 DatasetServiceImpl (org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.DatasetServiceImpl)5 FailsafeValueInSingleWindow (org.apache.beam.sdk.values.FailsafeValueInSingleWindow)5 Job (com.google.api.services.bigquery.model.Job)4 JobReference (com.google.api.services.bigquery.model.JobReference)4 ValueInSingleWindow (org.apache.beam.sdk.values.ValueInSingleWindow)4 TableDataInsertAllRequest (com.google.api.services.bigquery.model.TableDataInsertAllRequest)3 CreateDisposition (org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition)3 WriteDisposition (org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition)3 RowToInsert (com.google.cloud.bigquery.InsertAllRequest.RowToInsert)2