Example 11 with FailsafeValueInSingleWindow

Use of org.apache.beam.sdk.values.FailsafeValueInSingleWindow in project beam by apache.

From the class BigQueryServicesImplTest, method testInsertWithinRequestByteSizeLimitsErrorsOut.

/**
 * Tests that {@link DatasetServiceImpl#insertAll} throws when a single row exceeds the request byte size limit.
 */
@Test
public void testInsertWithinRequestByteSizeLimitsErrorsOut() throws Exception {
    TableReference ref = new TableReference().setProjectId("project").setDatasetId("dataset").setTableId("table");
    List<FailsafeValueInSingleWindow<TableRow, TableRow>> rows = ImmutableList.of(wrapValue(new TableRow().set("row", Strings.repeat("abcdefghi", 1024 * 1025))), wrapValue(new TableRow().set("row", "a")), wrapValue(new TableRow().set("row", "b")));
    List<String> insertIds = ImmutableList.of("a", "b", "c");
    final TableDataInsertAllResponse allRowsSucceeded = new TableDataInsertAllResponse();
    setupMockResponses(response -> {
        when(response.getContentType()).thenReturn(Json.MEDIA_TYPE);
        when(response.getStatusCode()).thenReturn(200);
        when(response.getContent()).thenReturn(toStream(allRowsSucceeded));
    }, response -> {
        when(response.getContentType()).thenReturn(Json.MEDIA_TYPE);
        when(response.getStatusCode()).thenReturn(200);
        when(response.getContent()).thenReturn(toStream(allRowsSucceeded));
    });
    DatasetServiceImpl dataService = new DatasetServiceImpl(bigquery, null, PipelineOptionsFactory.fromArgs("--maxStreamingBatchSize=15").create());
    List<ValueInSingleWindow<TableRow>> failedInserts = Lists.newArrayList();
    List<ValueInSingleWindow<TableRow>> successfulRows = Lists.newArrayList();
    RuntimeException e = assertThrows(RuntimeException.class, () -> dataService.<TableRow>insertAll(ref, rows, insertIds, BackOffAdapter.toGcpBackOff(TEST_BACKOFF.backoff()), TEST_BACKOFF, new MockSleeper(), InsertRetryPolicy.alwaysRetry(), failedInserts, ErrorContainer.TABLE_ROW_ERROR_CONTAINER, false, false, false, successfulRows));
    assertThat(e.getMessage(), containsString("this row is too large."));
}
Also used : TableReference(com.google.api.services.bigquery.model.TableReference) DatasetServiceImpl(org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.DatasetServiceImpl) TableRow(com.google.api.services.bigquery.model.TableRow) TableDataInsertAllResponse(com.google.api.services.bigquery.model.TableDataInsertAllResponse) ValueInSingleWindow(org.apache.beam.sdk.values.ValueInSingleWindow) FailsafeValueInSingleWindow(org.apache.beam.sdk.values.FailsafeValueInSingleWindow) Matchers.containsString(org.hamcrest.Matchers.containsString) MockSleeper(com.google.api.client.testing.util.MockSleeper) Test(org.junit.Test)
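
The test above expects insertAll to fail when one row is too large to send. As a rough illustration of that idea only, here is a minimal standalone sketch that groups rows into batches under a byte limit and rejects any single row that cannot fit. The class ByteLimitedBatcher, its batch method, and the use of IllegalArgumentException are hypothetical names and choices made for this sketch; they are not Beam's implementation or API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; not part of Beam's DatasetServiceImpl. */
final class ByteLimitedBatcher {

    /** Splits rows into batches whose total UTF-8 size stays within maxBatchBytes. */
    static List<List<String>> batch(List<String> rows, long maxBatchBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String row : rows) {
            long rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
            if (rowBytes > maxBatchBytes) {
                // Assumption: a row that cannot fit in any request is rejected, not retried.
                throw new IllegalArgumentException(
                    "row of " + rowBytes + " bytes is too large for limit " + maxBatchBytes);
            }
            if (currentBytes + rowBytes > maxBatchBytes && !current.isEmpty()) {
                // Adding this row would overflow the batch, so close the batch first.
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(row);
            currentBytes += rowBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Rows of 4, 2 and 4 bytes with a 6-byte limit end up in two batches.
        System.out.println(batch(Arrays.asList("aaaa", "bb", "cccc"), 6));
    }
}
```

Checking the size before adding a row to any batch means the failure surfaces immediately, which mirrors the behavior the test asserts: the oversized first row causes an exception even though the remaining rows are small.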

Example 12 with FailsafeValueInSingleWindow

Use of org.apache.beam.sdk.values.FailsafeValueInSingleWindow in project beam by apache.

From the class BigQueryServicesImplTest, method testFailInsertOtherRetry.

/**
 * Tests that {@link DatasetServiceImpl#insertAll} will not retry other non-rate-limited,
 * non-quota-exceeded attempts.
 */
@Test
public void testFailInsertOtherRetry() throws Exception {
    TableReference ref = new TableReference().setProjectId("project").setDatasetId("dataset").setTableId("table");
    List<FailsafeValueInSingleWindow<TableRow, TableRow>> rows = new ArrayList<>();
    rows.add(wrapValue(new TableRow()));
    // First response is 403 non-{rate-limited, quota-exceeded}, second response has valid payload
    // but should not be invoked.
    setupMockResponses(response -> {
        when(response.getStatusCode()).thenReturn(403);
        when(response.getContentType()).thenReturn(Json.MEDIA_TYPE);
        when(response.getContent()).thenReturn(toStream(errorWithReasonAndStatus("actually forbidden", 403)));
    }, response -> {
        when(response.getContentType()).thenReturn(Json.MEDIA_TYPE);
        when(response.getStatusCode()).thenReturn(200);
        when(response.getContent()).thenReturn(toStream(new TableDataInsertAllResponse()));
    });
    DatasetServiceImpl dataService = new DatasetServiceImpl(bigquery, null, PipelineOptionsFactory.create());
    thrown.expect(RuntimeException.class);
    thrown.expectMessage("actually forbidden");
    try {
        dataService.insertAll(ref, rows, null, BackOffAdapter.toGcpBackOff(TEST_BACKOFF.backoff()), TEST_BACKOFF, new MockSleeper(), InsertRetryPolicy.alwaysRetry(), null, null, false, false, false, null);
    } finally {
        verify(responses[0], atLeastOnce()).getStatusCode();
        verify(responses[0]).getContent();
        verify(responses[0]).getContentType();
        // It should not invoke 2nd response
        verify(responses[1], never()).getStatusCode();
        verify(responses[1], never()).getContent();
        verify(responses[1], never()).getContentType();
    }
    verifyWriteMetricWasSet("project", "dataset", "table", "actually forbidden", 1);
}
Also used : TableReference(com.google.api.services.bigquery.model.TableReference) DatasetServiceImpl(org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.DatasetServiceImpl) TableRow(com.google.api.services.bigquery.model.TableRow) TableDataInsertAllResponse(com.google.api.services.bigquery.model.TableDataInsertAllResponse) ArrayList(java.util.ArrayList) FailsafeValueInSingleWindow(org.apache.beam.sdk.values.FailsafeValueInSingleWindow) MockSleeper(com.google.api.client.testing.util.MockSleeper) Test(org.junit.Test)

Example 13 with FailsafeValueInSingleWindow

Use of org.apache.beam.sdk.values.FailsafeValueInSingleWindow in project beam by apache.

From the class BigQueryUtilTest, method testInsertAll.

@Test
public void testInsertAll() throws Exception {
    // Build up a list of indices to fail on each invocation. This should result in
    // 5 calls to insertAll.
    List<List<Long>> errorsIndices = new ArrayList<>();
    errorsIndices.add(Arrays.asList(0L, 5L, 10L, 15L, 20L));
    errorsIndices.add(Arrays.asList(0L, 2L, 4L));
    errorsIndices.add(Arrays.asList(0L, 2L));
    errorsIndices.add(new ArrayList<>());
    onInsertAll(errorsIndices);
    TableReference ref = BigQueryHelpers.parseTableSpec("project:dataset.table");
    DatasetServiceImpl datasetService = new DatasetServiceImpl(mockClient, null, options, 5);
    List<FailsafeValueInSingleWindow<TableRow, TableRow>> rows = new ArrayList<>();
    List<String> ids = new ArrayList<>();
    for (int i = 0; i < 25; ++i) {
        rows.add(FailsafeValueInSingleWindow.of(rawRow("foo", 1234), GlobalWindow.TIMESTAMP_MAX_VALUE, GlobalWindow.INSTANCE, PaneInfo.ON_TIME_AND_ONLY_FIRING, rawRow("foo", 1234)));
        ids.add("");
    }
    long totalBytes = datasetService.insertAll(ref, rows, ids, InsertRetryPolicy.alwaysRetry(), null, null, false, false, false, null);
    verifyInsertAll(5);
    // Each of the 25 rows has 1 byte for length and 30 bytes: '{"f":[{"v":"foo"},{"v":1234}]}'
    assertEquals("Incorrect byte count", 25L * 31L, totalBytes);
}
Also used : TableReference(com.google.api.services.bigquery.model.TableReference) DatasetServiceImpl(org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.DatasetServiceImpl) ArrayList(java.util.ArrayList) List(java.util.List) TableDataList(com.google.api.services.bigquery.model.TableDataList) ArgumentMatchers.anyString(org.mockito.ArgumentMatchers.anyString) FailsafeValueInSingleWindow(org.apache.beam.sdk.values.FailsafeValueInSingleWindow) Test(org.junit.Test)
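
The byte count asserted in testInsertAll can be checked by hand. The sketch below reproduces the arithmetic from the test's comment: each row's JSON form is 30 bytes, plus 1 byte for the length, across 25 rows. The class RowByteCountSketch and its totalBytes method are hypothetical helpers written for this illustration, and the one-byte length prefix is taken from the test comment rather than from Beam's encoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of the byte accounting described in the test comment. */
final class RowByteCountSketch {

    /** Sums one length byte plus the UTF-8 payload size for every row. */
    static long totalBytes(List<String> jsonRows) {
        long total = 0;
        for (String row : jsonRows) {
            total += 1 + row.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    public static void main(String[] args) {
        // The JSON string quoted in the test comment; it is 30 bytes long.
        String row = "{\"f\":[{\"v\":\"foo\"},{\"v\":1234}]}";
        List<String> rows = Collections.nCopies(25, row);
        System.out.println(totalBytes(rows)); // 25 * (1 + 30) = 775
    }
}
```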

Example 14 with FailsafeValueInSingleWindow

Use of org.apache.beam.sdk.values.FailsafeValueInSingleWindow in project beam by apache.

From the class BigQueryServicesImplTest, method testInsertRetryPolicy.

/**
 * Tests that {@link DatasetServiceImpl#insertAll} uses the supplied {@link InsertRetryPolicy},
 * and returns the list of rows not retried.
 */
@Test
public void testInsertRetryPolicy() throws InterruptedException, IOException {
    TableReference ref = new TableReference().setProjectId("project").setDatasetId("dataset").setTableId("table");
    List<FailsafeValueInSingleWindow<TableRow, TableRow>> rows = ImmutableList.of(wrapValue(new TableRow()), wrapValue(new TableRow()));
    // First time row0 fails with a retryable error, and row1 fails with a persistent error.
    final TableDataInsertAllResponse firstFailure = new TableDataInsertAllResponse().setInsertErrors(ImmutableList.of(new InsertErrors().setIndex(0L).setErrors(ImmutableList.of(new ErrorProto().setReason("timeout"))), new InsertErrors().setIndex(1L).setErrors(ImmutableList.of(new ErrorProto().setReason("invalid")))));
    // Second time there is only one row, which fails with a retryable error.
    final TableDataInsertAllResponse secondFailure = new TableDataInsertAllResponse().setInsertErrors(ImmutableList.of(new InsertErrors().setIndex(0L).setErrors(ImmutableList.of(new ErrorProto().setReason("timeout")))));
    // On the final attempt, no failures are returned.
    final TableDataInsertAllResponse allRowsSucceeded = new TableDataInsertAllResponse();
    setupMockResponses(response -> {
        when(response.getContentType()).thenReturn(Json.MEDIA_TYPE);
        // Always return 200.
        when(response.getStatusCode()).thenReturn(200);
        when(response.getContent()).thenReturn(toStream(firstFailure));
    }, response -> {
        when(response.getContentType()).thenReturn(Json.MEDIA_TYPE);
        when(response.getStatusCode()).thenReturn(200);
        when(response.getContent()).thenReturn(toStream(secondFailure));
    }, response -> {
        when(response.getContentType()).thenReturn(Json.MEDIA_TYPE);
        when(response.getStatusCode()).thenReturn(200);
        when(response.getContent()).thenReturn(toStream(allRowsSucceeded));
    });
    DatasetServiceImpl dataService = new DatasetServiceImpl(bigquery, null, PipelineOptionsFactory.create());
    List<ValueInSingleWindow<TableRow>> failedInserts = Lists.newArrayList();
    dataService.insertAll(ref, rows, null, BackOffAdapter.toGcpBackOff(TEST_BACKOFF.backoff()), TEST_BACKOFF, new MockSleeper(), InsertRetryPolicy.retryTransientErrors(), failedInserts, ErrorContainer.TABLE_ROW_ERROR_CONTAINER, false, false, false, null);
    assertEquals(1, failedInserts.size());
    expectedLogs.verifyInfo("Retrying 1 failed inserts to BigQuery");
    verifyWriteMetricWasSet("project", "dataset", "table", "timeout", 2);
}
Also used : TableReference(com.google.api.services.bigquery.model.TableReference) ErrorProto(com.google.api.services.bigquery.model.ErrorProto) DatasetServiceImpl(org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.DatasetServiceImpl) TableRow(com.google.api.services.bigquery.model.TableRow) TableDataInsertAllResponse(com.google.api.services.bigquery.model.TableDataInsertAllResponse) ValueInSingleWindow(org.apache.beam.sdk.values.ValueInSingleWindow) FailsafeValueInSingleWindow(org.apache.beam.sdk.values.FailsafeValueInSingleWindow) InsertErrors(com.google.api.services.bigquery.model.TableDataInsertAllResponse.InsertErrors) MockSleeper(com.google.api.client.testing.util.MockSleeper) Test(org.junit.Test)
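
The scenario in testInsertRetryPolicy (row0 fails twice with a retryable error, row1 fails once with a persistent one) can be traced with a small standalone simulation. This is a sketch under stated assumptions, not Beam's retry logic: the class RetryPolicySketch, its insertAll method, and the predicate that treats only "timeout" as retryable are all hypothetical stand-ins for the real InsertRetryPolicy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Hypothetical simulation of per-row retry decisions; not Beam's implementation. */
final class RetryPolicySketch {

    /**
     * Each element of responses maps an index within that request to an error reason.
     * Rows whose error the policy accepts are resent; the rest are returned as failed.
     */
    static List<String> insertAll(
            List<String> rows, List<Map<Integer, String>> responses, Predicate<String> shouldRetry) {
        List<String> failed = new ArrayList<>();
        List<String> pending = new ArrayList<>(rows);
        for (Map<Integer, String> errors : responses) {
            if (pending.isEmpty()) {
                break; // Nothing left to send.
            }
            List<String> retry = new ArrayList<>();
            for (Map.Entry<Integer, String> error : errors.entrySet()) {
                String row = pending.get(error.getKey());
                if (shouldRetry.test(error.getValue())) {
                    retry.add(row);
                } else {
                    failed.add(row);
                }
            }
            pending = retry; // Only retryable rows go into the next request.
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<Integer, String> first = new TreeMap<>();
        first.put(0, "timeout");
        first.put(1, "invalid");
        Map<Integer, String> second = new TreeMap<>();
        second.put(0, "timeout");
        Map<Integer, String> third = new TreeMap<>(); // No errors on the final attempt.

        // Assumption for this sketch: only "timeout" counts as retryable.
        List<String> failed = insertAll(
            Arrays.asList("row0", "row1"), Arrays.asList(first, second, third), "timeout"::equals);
        System.out.println(failed);
    }
}
```

Note that the indices in each response refer to positions in the request that was just sent, not in the original list, which is why the second response reports index 0 for the single retried row.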

Example 15 with FailsafeValueInSingleWindow

Use of org.apache.beam.sdk.values.FailsafeValueInSingleWindow in project beam by apache.

From the class BigQueryServicesImplTest, method testInsertStoppedRetry.

/**
 * Tests that {@link DatasetServiceImpl#insertAll} can stop quotaExceeded retry attempts.
 */
@Test
public void testInsertStoppedRetry() throws Exception {
    TableReference ref = new TableReference().setProjectId("project").setDatasetId("dataset").setTableId("table");
    List<FailsafeValueInSingleWindow<TableRow, TableRow>> rows = new ArrayList<>();
    rows.add(wrapValue(new TableRow()));
    MockSetupFunction quotaExceededResponse = response -> {
        when(response.getStatusCode()).thenReturn(403);
        when(response.getContentType()).thenReturn(Json.MEDIA_TYPE);
        when(response.getContent()).thenReturn(toStream(errorWithReasonAndStatus("quotaExceeded", 403)));
    };
    // Respond 403 four times, then valid payload.
    setupMockResponses(quotaExceededResponse, quotaExceededResponse, quotaExceededResponse, quotaExceededResponse, response -> {
        when(response.getContentType()).thenReturn(Json.MEDIA_TYPE);
        when(response.getStatusCode()).thenReturn(200);
        when(response.getContent()).thenReturn(toStream(new TableDataInsertAllResponse()));
    });
    thrown.expect(RuntimeException.class);
    // Google-http-client 1.39.1 and higher does not read the content of the response with error
    // status code. How can we ensure appropriate exception is thrown?
    thrown.expectMessage("quotaExceeded");
    DatasetServiceImpl dataService = new DatasetServiceImpl(bigquery, null, PipelineOptionsFactory.create());
    dataService.insertAll(ref, rows, null, BackOffAdapter.toGcpBackOff(TEST_BACKOFF.backoff()), TEST_BACKOFF, new MockSleeper(), InsertRetryPolicy.alwaysRetry(), null, null, false, false, false, null);
    verifyAllResponsesAreRead();
    verifyWriteMetricWasSet("project", "dataset", "table", "quotaexceeded", 1);
}
Also used : TableReference(com.google.api.services.bigquery.model.TableReference) DatasetServiceImpl(org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.DatasetServiceImpl) TableRow(com.google.api.services.bigquery.model.TableRow) TableDataInsertAllResponse(com.google.api.services.bigquery.model.TableDataInsertAllResponse) ArrayList(java.util.ArrayList) FailsafeValueInSingleWindow(org.apache.beam.sdk.values.FailsafeValueInSingleWindow) MockSleeper(com.google.api.client.testing.util.MockSleeper) Test(org.junit.Test)
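
testInsertStoppedRetry relies on the retry loop giving up before the fifth, successful response is reached. A minimal sketch of a bounded retry loop follows, assuming a fixed maximum number of retries and no sleeping between attempts. The class BoundedRetrySketch, the callWithRetries method, the maxRetries parameter, and the "ok" success marker are hypothetical; the actual limit used by TEST_BACKOFF is not shown in the excerpt above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical bounded-retry loop; not Beam's BackOff or FluentBackoff. */
final class BoundedRetrySketch {

    /**
     * Consumes one scripted response per attempt. Returns the number of attempts made
     * when a response equals "ok", or throws once maxRetries retries have been used.
     */
    static int callWithRetries(Iterator<String> responses, int maxRetries) {
        String lastError = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            String response = responses.next();
            if ("ok".equals(response)) {
                return attempt + 1;
            }
            lastError = response; // Remember the reason for the final exception message.
        }
        throw new RuntimeException("retries exhausted, last error: " + lastError);
    }

    public static void main(String[] args) {
        List<String> scripted = Arrays.asList(
            "quotaExceeded", "quotaExceeded", "quotaExceeded", "quotaExceeded", "ok");
        try {
            // With 3 retries there are 4 attempts, so the "ok" response is never read.
            callWithRetries(scripted.iterator(), 3);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```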

Aggregations

TableReference (com.google.api.services.bigquery.model.TableReference) 16
DatasetServiceImpl (org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.DatasetServiceImpl) 16
FailsafeValueInSingleWindow (org.apache.beam.sdk.values.FailsafeValueInSingleWindow) 16
Test (org.junit.Test) 16
MockSleeper (com.google.api.client.testing.util.MockSleeper) 15
TableRow (com.google.api.services.bigquery.model.TableRow) 15
TableDataInsertAllResponse (com.google.api.services.bigquery.model.TableDataInsertAllResponse) 14
ArrayList (java.util.ArrayList) 8
ValueInSingleWindow (org.apache.beam.sdk.values.ValueInSingleWindow) 8
Matchers.containsString (org.hamcrest.Matchers.containsString) 8
ErrorProto (com.google.api.services.bigquery.model.ErrorProto) 7
InsertErrors (com.google.api.services.bigquery.model.TableDataInsertAllResponse.InsertErrors) 7
TableDataList (com.google.api.services.bigquery.model.TableDataList) 4
GoogleJsonError (com.google.api.client.googleapis.json.GoogleJsonError) 3
ErrorInfo (com.google.api.client.googleapis.json.GoogleJsonError.ErrorInfo) 3
GoogleJsonErrorContainer (com.google.api.client.googleapis.json.GoogleJsonErrorContainer) 3
GoogleJsonResponseException (com.google.api.client.googleapis.json.GoogleJsonResponseException) 3
HttpResponseException (com.google.api.client.http.HttpResponseException) 3
LowLevelHttpResponse (com.google.api.client.http.LowLevelHttpResponse) 3
GenericJson (com.google.api.client.json.GenericJson) 3