Example 1 with CreateReadSessionRequest

Use of com.google.cloud.bigquery.storage.v1beta1.Storage.CreateReadSessionRequest in project hadoop-connectors by GoogleCloudDataproc.

In class DirectBigQueryInputFormat, method startSession:

private static ReadSession startSession(Configuration configuration, Table table, BigQueryStorageClient client) {
    // Extract relevant configuration settings.
    String jobProjectId = PROJECT_ID.get(configuration, configuration::get);
    String filter = SQL_FILTER.get(configuration, configuration::get);
    Collection<String> selectedFields = SELECTED_FIELDS.getStringCollection(configuration);
    Builder readOptions = TableReadOptions.newBuilder().setRowRestriction(filter);
    if (!selectedFields.isEmpty()) {
        readOptions.addAllSelectedFields(selectedFields);
    }
    CreateReadSessionRequest request =
        CreateReadSessionRequest.newBuilder()
            .setTableReference(
                TableReferenceProto.TableReference.newBuilder()
                    .setProjectId(table.getTableReference().getProjectId())
                    .setDatasetId(table.getTableReference().getDatasetId())
                    .setTableId(table.getTableReference().getTableId()))
            .setRequestedStreams(DIRECT_PARALLELISM.get(configuration, configuration::getInt))
            .setParent("projects/" + jobProjectId)
            .setReadOptions(readOptions)
            .setFormat(DataFormat.AVRO)
            .build();
    return client.createReadSession(request);
}
Also used : Builder(com.google.cloud.bigquery.storage.v1beta1.ReadOptions.TableReadOptions.Builder) CreateReadSessionRequest(com.google.cloud.bigquery.storage.v1beta1.Storage.CreateReadSessionRequest)
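The session returned by startSession still has to be consumed. As a rough sketch (not part of the connector; countRows is a hypothetical helper), each stream in the session can be drained through the same v1beta1 ReadRows surface that the integration tests below use:

import com.google.cloud.bigquery.storage.v1beta1.BigQueryStorageClient;
import com.google.cloud.bigquery.storage.v1beta1.Storage.ReadRowsRequest;
import com.google.cloud.bigquery.storage.v1beta1.Storage.ReadRowsResponse;
import com.google.cloud.bigquery.storage.v1beta1.Storage.ReadSession;
import com.google.cloud.bigquery.storage.v1beta1.Storage.Stream;
import com.google.cloud.bigquery.storage.v1beta1.Storage.StreamPosition;

// Hypothetical helper: drain every stream of a session and count the rows read.
private static long countRows(BigQueryStorageClient client, ReadSession session) {
    long rows = 0;
    for (Stream stream : session.getStreamsList()) {
        ReadRowsRequest request = ReadRowsRequest.newBuilder()
            .setReadPosition(StreamPosition.newBuilder().setStream(stream))
            .build();
        // readRowsCallable().call(...) returns a ServerStream that can be iterated.
        for (ReadRowsResponse response : client.readRowsCallable().call(request)) {
            rows += response.getRowCount();
        }
    }
    return rows;
}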

Example 2 with CreateReadSessionRequest

Use of com.google.cloud.bigquery.storage.v1beta1.Storage.CreateReadSessionRequest in project java-bigquerystorage by googleapis.

In class ITBigQueryStorageTest, method testColumnSelection:

@Test
public void testColumnSelection() throws IOException {
    TableReference tableReference =
        TableReference.newBuilder()
            .setProjectId("bigquery-public-data")
            .setDatasetId("samples")
            .setTableId("shakespeare")
            .build();
    TableReadOptions options =
        TableReadOptions.newBuilder()
            .addSelectedFields("word")
            .addSelectedFields("word_count")
            .setRowRestriction("word_count > 100")
            .build();
    CreateReadSessionRequest request =
        CreateReadSessionRequest.newBuilder()
            .setParent(parentProjectId)
            .setRequestedStreams(1)
            .setTableReference(tableReference)
            .setReadOptions(options)
            .setFormat(DataFormat.AVRO)
            .build();
    ReadSession session = client.createReadSession(request);
    assertEquals(
        String.format(
            "Did not receive expected number of streams for table reference '%s' CreateReadSession response:%n%s",
            TextFormat.shortDebugString(tableReference), session.toString()),
        1,
        session.getStreamsCount());
    StreamPosition readPosition =
        StreamPosition.newBuilder().setStream(session.getStreams(0)).build();
    ReadRowsRequest readRowsRequest =
        ReadRowsRequest.newBuilder().setReadPosition(readPosition).build();
    Schema avroSchema = new Schema.Parser().parse(session.getAvroSchema().getSchema());
    String actualSchemaMessage =
        String.format("Unexpected schema. Actual schema:%n%s", avroSchema.toString(/* pretty = */ true));
    assertEquals(actualSchemaMessage, Schema.Type.RECORD, avroSchema.getType());
    assertEquals(actualSchemaMessage, "__root__", avroSchema.getName());
    assertEquals(actualSchemaMessage, 2, avroSchema.getFields().size());
    assertEquals(actualSchemaMessage, Schema.Type.STRING, avroSchema.getField("word").schema().getType());
    assertEquals(actualSchemaMessage, Schema.Type.LONG, avroSchema.getField("word_count").schema().getType());
    SimpleRowReader reader = new SimpleRowReader(avroSchema);
    long rowCount = 0;
    ServerStream<ReadRowsResponse> stream = client.readRowsCallable().call(readRowsRequest);
    for (ReadRowsResponse response : stream) {
        rowCount += response.getRowCount();
        reader.processRows(response.getAvroRows(), new SimpleRowReader.AvroRowConsumer() {

            @Override
            public void accept(GenericData.Record record) {
                String rowAssertMessage = String.format("Row not matching expectations: %s", record.toString());
                Long wordCount = (Long) record.get("word_count");
                assertWithMessage(rowAssertMessage).that(wordCount).isGreaterThan(100L);
                Utf8 word = (Utf8) record.get("word");
                assertWithMessage(rowAssertMessage).that(word.length()).isGreaterThan(0);
            }
        });
    }
    assertEquals(1_333, rowCount);
}
Also used : AvroRowConsumer(com.google.cloud.bigquery.storage.v1beta1.it.SimpleRowReader.AvroRowConsumer) ReadSession(com.google.cloud.bigquery.storage.v1beta1.Storage.ReadSession) Schema(org.apache.avro.Schema) StreamPosition(com.google.cloud.bigquery.storage.v1beta1.Storage.StreamPosition) ReadRowsRequest(com.google.cloud.bigquery.storage.v1beta1.Storage.ReadRowsRequest) GenericData(org.apache.avro.generic.GenericData) TableReference(com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableReference) ReadRowsResponse(com.google.cloud.bigquery.storage.v1beta1.Storage.ReadRowsResponse) Utf8(org.apache.avro.util.Utf8) TableReadOptions(com.google.cloud.bigquery.storage.v1beta1.ReadOptions.TableReadOptions) CreateReadSessionRequest(com.google.cloud.bigquery.storage.v1beta1.Storage.CreateReadSessionRequest) Test(org.junit.Test)
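SimpleRowReader itself is not reproduced on this page. As a minimal sketch of what such a reader could look like (the class below is illustrative, not the actual test utility), Avro's BinaryDecoder can walk each serialized block row by row:

import java.io.IOException;
import java.util.function.Consumer;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import com.google.cloud.bigquery.storage.v1beta1.AvroProto.AvroRows;

// Illustrative stand-in for SimpleRowReader: decodes the records packed into an
// AvroRows block and hands each one to a consumer.
class SketchRowReader {
    private final GenericDatumReader<GenericRecord> datumReader;
    // Reused across calls so DecoderFactory can recycle its buffer.
    private BinaryDecoder decoder;

    SketchRowReader(Schema schema) {
        this.datumReader = new GenericDatumReader<>(schema);
    }

    void processRows(AvroRows avroRows, Consumer<GenericRecord> consumer) throws IOException {
        decoder = DecoderFactory.get()
            .binaryDecoder(avroRows.getSerializedBinaryRows().toByteArray(), decoder);
        while (!decoder.isEnd()) {
            consumer.accept(datumReader.read(null, decoder));
        }
    }
}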

Example 3 with CreateReadSessionRequest

Use of com.google.cloud.bigquery.storage.v1beta1.Storage.CreateReadSessionRequest in project java-bigquerystorage by googleapis.

In class BigQueryStorageClientTest, method createReadSessionTest:

@Test
@SuppressWarnings("all")
public void createReadSessionTest() {
    String name = "name3373707";
    ReadSession expectedResponse = ReadSession.newBuilder().setName(name).build();
    mockBigQueryStorage.addResponse(expectedResponse);
    TableReference tableReference = TableReference.newBuilder().build();
    String parent = "parent-995424086";
    int requestedStreams = 1017221410;
    ReadSession actualResponse = client.createReadSession(tableReference, parent, requestedStreams);
    Assert.assertEquals(expectedResponse, actualResponse);
    List<AbstractMessage> actualRequests = mockBigQueryStorage.getRequests();
    Assert.assertEquals(1, actualRequests.size());
    CreateReadSessionRequest actualRequest = (CreateReadSessionRequest) actualRequests.get(0);
    Assert.assertEquals(tableReference, actualRequest.getTableReference());
    Assert.assertEquals(parent, actualRequest.getParent());
    Assert.assertEquals(requestedStreams, actualRequest.getRequestedStreams());
    Assert.assertTrue(
        channelProvider.isHeaderSent(
            ApiClientHeaderProvider.getDefaultApiClientHeaderKey(),
            GaxGrpcProperties.getDefaultApiClientHeaderPattern()));
}
Also used : TableReference(com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableReference) AbstractMessage(com.google.protobuf.AbstractMessage) ReadSession(com.google.cloud.bigquery.storage.v1beta1.Storage.ReadSession) CreateReadSessionRequest(com.google.cloud.bigquery.storage.v1beta1.Storage.CreateReadSessionRequest) Test(org.junit.Test)
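Generated client suites typically pair this happy path with an error path. A sketch of what that companion could look like, assuming the same mockBigQueryStorage and client fixtures (it uses io.grpc.Status, io.grpc.StatusRuntimeException, and com.google.api.gax.rpc.InvalidArgumentException; the sketch is illustrative, not copied from the repository):

@Test
@SuppressWarnings("all")
public void createReadSessionExceptionTest() throws Exception {
    // Arrange the mock service to fail the next call with INVALID_ARGUMENT.
    StatusRuntimeException exception = new StatusRuntimeException(Status.INVALID_ARGUMENT);
    mockBigQueryStorage.addException(exception);
    try {
        TableReference tableReference = TableReference.newBuilder().build();
        String parent = "parent-995424086";
        int requestedStreams = 1017221410;
        client.createReadSession(tableReference, parent, requestedStreams);
        Assert.fail("No exception raised");
    } catch (InvalidArgumentException e) {
        // Expected: the gRPC status surfaces as a GAX InvalidArgumentException.
    }
}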

Example 4 with CreateReadSessionRequest

Use of com.google.cloud.bigquery.storage.v1beta1.Storage.CreateReadSessionRequest in project hadoop-connectors by GoogleCloudDataproc.

In class DirectBigQueryInputFormatTest, method getSplits:

@Test
public void getSplits() throws IOException, InterruptedException {
    JobContext jobContext = new JobContextImpl(config, new JobID());
    CreateReadSessionRequest request =
        CreateReadSessionRequest.newBuilder()
            .setTableReference(
                TableReferenceProto.TableReference.newBuilder()
                    .setProjectId("publicdata")
                    .setDatasetId("test_dataset")
                    .setTableId("test_table"))
            // Request 3 streams, but only get 2 back.
            .setRequestedStreams(3)
            .setParent("projects/foo-project")
            .setReadOptions(
                TableReadOptions.newBuilder()
                    .addAllSelectedFields(ImmutableList.of("foo", "bar"))
                    .setRowRestriction("foo == 0")
                    .build())
            .setFormat(DataFormat.AVRO)
            .build();
    ReadSession session =
        ReadSession.newBuilder()
            .setAvroSchema(AvroSchema.newBuilder().setSchema("schema").build())
            .addAllStreams(
                ImmutableList.of(
                    Stream.newBuilder().setName("stream1").build(),
                    Stream.newBuilder().setName("stream2").build()))
            .build();
    ImmutableList<DirectBigQueryInputSplit> expected =
        ImmutableList.of(
            new DirectBigQueryInputSplit("stream1", "schema", 14),
            new DirectBigQueryInputSplit("stream2", "schema", 14));
    when(bqClient.createReadSession(any(CreateReadSessionRequest.class))).thenReturn(session);
    List<InputSplit> splits = input.getSplits(jobContext);
    assertThat(splits).containsExactlyElementsIn(expected);
    verify(bqHelper).getTable(tableRef);
    verify(bqClient).createReadSession(request);
}
Also used : JobContextImpl(org.apache.hadoop.mapreduce.task.JobContextImpl) ReadSession(com.google.cloud.bigquery.storage.v1beta1.Storage.ReadSession) DirectBigQueryInputSplit(com.google.cloud.hadoop.io.bigquery.DirectBigQueryInputFormat.DirectBigQueryInputSplit) JobContext(org.apache.hadoop.mapreduce.JobContext) InputSplit(org.apache.hadoop.mapreduce.InputSplit) DirectBigQueryInputSplit(com.google.cloud.hadoop.io.bigquery.DirectBigQueryInputFormat.DirectBigQueryInputSplit) JobID(org.apache.hadoop.mapreduce.JobID) CreateReadSessionRequest(com.google.cloud.bigquery.storage.v1beta1.Storage.CreateReadSessionRequest) IOException(java.io.IOException) Test(org.junit.Test)
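For the mocked call above to line up with the request that startSession builds (Example 1), the test's Configuration has to carry the project, filter, selected fields, and parallelism. A sketch of that wiring follows; the key strings are hypothetical placeholders, since the connector's configuration constants resolve their own key names:

import org.apache.hadoop.conf.Configuration;

// Key names are hypothetical placeholders, not the connector's actual keys.
Configuration config = new Configuration();
config.set("example.bq.project.id", "foo-project");            // read via PROJECT_ID
config.set("example.bq.sql.filter", "foo == 0");               // read via SQL_FILTER
config.setStrings("example.bq.selected.fields", "foo", "bar"); // read via SELECTED_FIELDS
config.setInt("example.bq.parallelism", 3);                    // read via DIRECT_PARALLELISM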

Example 5 with CreateReadSessionRequest

Use of com.google.cloud.bigquery.storage.v1beta1.Storage.CreateReadSessionRequest in project java-bigquerystorage by googleapis.

In class ITBigQueryStorageTest, method testFilter:

@Test
public void testFilter() throws IOException {
    TableReference tableReference =
        TableReference.newBuilder()
            .setProjectId("bigquery-public-data")
            .setDatasetId("samples")
            .setTableId("shakespeare")
            .build();
    TableReadOptions options =
        TableReadOptions.newBuilder().setRowRestriction("word_count > 100").build();
    CreateReadSessionRequest request =
        CreateReadSessionRequest.newBuilder()
            .setParent(parentProjectId)
            .setRequestedStreams(1)
            .setTableReference(tableReference)
            .setReadOptions(options)
            .setFormat(DataFormat.AVRO)
            .build();
    ReadSession session = client.createReadSession(request);
    assertEquals(
        String.format(
            "Did not receive expected number of streams for table reference '%s' CreateReadSession response:%n%s",
            TextFormat.shortDebugString(tableReference), session.toString()),
        1,
        session.getStreamsCount());
    StreamPosition readPosition =
        StreamPosition.newBuilder().setStream(session.getStreams(0)).build();
    ReadRowsRequest readRowsRequest =
        ReadRowsRequest.newBuilder().setReadPosition(readPosition).build();
    SimpleRowReader reader =
        new SimpleRowReader(new Schema.Parser().parse(session.getAvroSchema().getSchema()));
    long rowCount = 0;
    ServerStream<ReadRowsResponse> stream = client.readRowsCallable().call(readRowsRequest);
    for (ReadRowsResponse response : stream) {
        rowCount += response.getRowCount();
        reader.processRows(response.getAvroRows(), new SimpleRowReader.AvroRowConsumer() {

            @Override
            public void accept(GenericData.Record record) {
                Long wordCount = (Long) record.get("word_count");
                assertWithMessage("Row not matching expectations: %s", record.toString()).that(wordCount).isGreaterThan(100L);
            }
        });
    }
    assertEquals(1_333, rowCount);
}
Also used : AvroRowConsumer(com.google.cloud.bigquery.storage.v1beta1.it.SimpleRowReader.AvroRowConsumer) ReadSession(com.google.cloud.bigquery.storage.v1beta1.Storage.ReadSession) StreamPosition(com.google.cloud.bigquery.storage.v1beta1.Storage.StreamPosition) ReadRowsRequest(com.google.cloud.bigquery.storage.v1beta1.Storage.ReadRowsRequest) GenericData(org.apache.avro.generic.GenericData) TableReference(com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableReference) ReadRowsResponse(com.google.cloud.bigquery.storage.v1beta1.Storage.ReadRowsResponse) TableReadOptions(com.google.cloud.bigquery.storage.v1beta1.ReadOptions.TableReadOptions) CreateReadSessionRequest(com.google.cloud.bigquery.storage.v1beta1.Storage.CreateReadSessionRequest) Test(org.junit.Test)
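Both integration tests start each stream from the beginning. StreamPosition in v1beta1 also carries a row offset, so an interrupted read can be resumed mid-stream rather than restarted; a sketch (the offset value is arbitrary and would be tracked by the caller):

// Resume reading stream 0 at a caller-tracked row offset.
long resumeOffset = 1_000L;
ReadRowsRequest resumeRequest =
    ReadRowsRequest.newBuilder()
        .setReadPosition(
            StreamPosition.newBuilder()
                .setStream(session.getStreams(0))
                .setOffset(resumeOffset))
        .build();
ServerStream<ReadRowsResponse> resumed = client.readRowsCallable().call(resumeRequest);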

Aggregations

CreateReadSessionRequest (com.google.cloud.bigquery.storage.v1beta1.Storage.CreateReadSessionRequest): 5
ReadSession (com.google.cloud.bigquery.storage.v1beta1.Storage.ReadSession): 4
Test (org.junit.Test): 4
TableReference (com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableReference): 3
TableReadOptions (com.google.cloud.bigquery.storage.v1beta1.ReadOptions.TableReadOptions): 2
ReadRowsRequest (com.google.cloud.bigquery.storage.v1beta1.Storage.ReadRowsRequest): 2
ReadRowsResponse (com.google.cloud.bigquery.storage.v1beta1.Storage.ReadRowsResponse): 2
StreamPosition (com.google.cloud.bigquery.storage.v1beta1.Storage.StreamPosition): 2
AvroRowConsumer (com.google.cloud.bigquery.storage.v1beta1.it.SimpleRowReader.AvroRowConsumer): 2
GenericData (org.apache.avro.generic.GenericData): 2
Builder (com.google.cloud.bigquery.storage.v1beta1.ReadOptions.TableReadOptions.Builder): 1
DirectBigQueryInputSplit (com.google.cloud.hadoop.io.bigquery.DirectBigQueryInputFormat.DirectBigQueryInputSplit): 1
AbstractMessage (com.google.protobuf.AbstractMessage): 1
IOException (java.io.IOException): 1
Schema (org.apache.avro.Schema): 1
Utf8 (org.apache.avro.util.Utf8): 1
InputSplit (org.apache.hadoop.mapreduce.InputSplit): 1
JobContext (org.apache.hadoop.mapreduce.JobContext): 1
JobID (org.apache.hadoop.mapreduce.JobID): 1
JobContextImpl (org.apache.hadoop.mapreduce.task.JobContextImpl): 1