Example 1 with TableReference

Use of com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableReference in project presto by prestodb.

From the class ReadSessionCreator, method create:

public Storage.ReadSession create(TableId table, ImmutableList<String> selectedFields, Optional<String> filter, int parallelism) {
    TableInfo tableDetails = bigQueryClient.getTable(table);
    TableInfo actualTable = getActualTable(tableDetails, selectedFields, new String[] {});
    try (BigQueryStorageClient bigQueryStorageClient = bigQueryStorageClientFactory.createBigQueryStorageClient()) {
        ReadOptions.TableReadOptions.Builder readOptions = ReadOptions.TableReadOptions.newBuilder().addAllSelectedFields(selectedFields);
        filter.ifPresent(readOptions::setRowRestriction);
        TableReferenceProto.TableReference tableReference = toTableReference(actualTable.getTableId());
        Storage.ReadSession readSession = bigQueryStorageClient.createReadSession(
                Storage.CreateReadSessionRequest.newBuilder()
                        .setParent("projects/" + bigQueryClient.getProjectId())
                        .setFormat(Storage.DataFormat.AVRO)
                        .setRequestedStreams(parallelism)
                        .setReadOptions(readOptions)
                        .setTableReference(tableReference)
                        .setShardingStrategy(Storage.ShardingStrategy.BALANCED)
                        .build());
        return readSession;
    }
}
Also used : Storage(com.google.cloud.bigquery.storage.v1beta1.Storage) BigQueryStorageClient(com.google.cloud.bigquery.storage.v1beta1.BigQueryStorageClient) TableReferenceProto(com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto) TableInfo(com.google.cloud.bigquery.TableInfo)
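
For context, the ReadSession returned by ReadSessionCreator.create is consumed with the same v1beta1 ReadRows pattern that the test examples below use. The following is a minimal sketch, not Presto's actual consumer code; the helper name countRows and its parameters are illustrative assumptions.

import com.google.api.gax.rpc.ServerStream;
import com.google.cloud.bigquery.storage.v1beta1.BigQueryStorageClient;
import com.google.cloud.bigquery.storage.v1beta1.Storage;

// Hypothetical helper: counts the rows behind a ReadSession produced by ReadSessionCreator.create.
static long countRows(BigQueryStorageClient storageClient, Storage.ReadSession session) {
    long rowCount = 0;
    for (Storage.Stream stream : session.getStreamsList()) {
        // Start each stream at its beginning; StreamPosition also carries an offset for resuming.
        Storage.StreamPosition position = Storage.StreamPosition.newBuilder().setStream(stream).build();
        Storage.ReadRowsRequest request = Storage.ReadRowsRequest.newBuilder().setReadPosition(position).build();
        ServerStream<Storage.ReadRowsResponse> responses = storageClient.readRowsCallable().call(request);
        for (Storage.ReadRowsResponse response : responses) {
            rowCount += response.getRowCount();
        }
    }
    return rowCount;
}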

Example 2 with TableReference

Use of com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableReference in project java-bigquerystorage by googleapis.

From the class ITBigQueryStorageTest, method testColumnSelection:

@Test
public void testColumnSelection() throws IOException {
    TableReference tableReference = TableReference.newBuilder().setProjectId("bigquery-public-data").setDatasetId("samples").setTableId("shakespeare").build();
    TableReadOptions options = TableReadOptions.newBuilder().addSelectedFields("word").addSelectedFields("word_count").setRowRestriction("word_count > 100").build();
    CreateReadSessionRequest request = CreateReadSessionRequest.newBuilder()
            .setParent(parentProjectId)
            .setRequestedStreams(1)
            .setTableReference(tableReference)
            .setReadOptions(options)
            .setFormat(DataFormat.AVRO)
            .build();
    ReadSession session = client.createReadSession(request);
    assertEquals(String.format("Did not receive expected number of streams for table reference '%s' CreateReadSession response:%n%s", TextFormat.shortDebugString(tableReference), session.toString()), 1, session.getStreamsCount());
    StreamPosition readPosition = StreamPosition.newBuilder().setStream(session.getStreams(0)).build();
    ReadRowsRequest readRowsRequest = ReadRowsRequest.newBuilder().setReadPosition(readPosition).build();
    Schema avroSchema = new Schema.Parser().parse(session.getAvroSchema().getSchema());
    String actualSchemaMessage = String.format(
            "Unexpected schema. Actual schema:%n%s", avroSchema.toString(/* pretty = */ true));
    assertEquals(actualSchemaMessage, Schema.Type.RECORD, avroSchema.getType());
    assertEquals(actualSchemaMessage, "__root__", avroSchema.getName());
    assertEquals(actualSchemaMessage, 2, avroSchema.getFields().size());
    assertEquals(actualSchemaMessage, Schema.Type.STRING, avroSchema.getField("word").schema().getType());
    assertEquals(actualSchemaMessage, Schema.Type.LONG, avroSchema.getField("word_count").schema().getType());
    SimpleRowReader reader = new SimpleRowReader(avroSchema);
    long rowCount = 0;
    ServerStream<ReadRowsResponse> stream = client.readRowsCallable().call(readRowsRequest);
    for (ReadRowsResponse response : stream) {
        rowCount += response.getRowCount();
        reader.processRows(response.getAvroRows(), new SimpleRowReader.AvroRowConsumer() {
            @Override
            public void accept(GenericData.Record record) {
                String rowAssertMessage = String.format("Row not matching expectations: %s", record.toString());
                Long wordCount = (Long) record.get("word_count");
                assertWithMessage(rowAssertMessage).that(wordCount).isGreaterThan(100L);
                Utf8 word = (Utf8) record.get("word");
                assertWithMessage(rowAssertMessage).that(word.length()).isGreaterThan(0);
            }
        });
    }
    assertEquals(1_333, rowCount);
}
Also used : AvroRowConsumer(com.google.cloud.bigquery.storage.v1beta1.it.SimpleRowReader.AvroRowConsumer) ReadSession(com.google.cloud.bigquery.storage.v1beta1.Storage.ReadSession) Schema(org.apache.avro.Schema) StreamPosition(com.google.cloud.bigquery.storage.v1beta1.Storage.StreamPosition) ReadRowsRequest(com.google.cloud.bigquery.storage.v1beta1.Storage.ReadRowsRequest) GenericData(org.apache.avro.generic.GenericData) TableReference(com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableReference) ReadRowsResponse(com.google.cloud.bigquery.storage.v1beta1.Storage.ReadRowsResponse) Utf8(org.apache.avro.util.Utf8) TableReadOptions(com.google.cloud.bigquery.storage.v1beta1.ReadOptions.TableReadOptions) CreateReadSessionRequest(com.google.cloud.bigquery.storage.v1beta1.Storage.CreateReadSessionRequest) Test(org.junit.Test)
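
The SimpleRowReader test helper used above is not shown in this listing. A plausible sketch of what such a reader does with the AvroRows payload, using the standard Avro GenericDatumReader API, follows; the class name SketchRowReader and the Consumer-based callback are assumptions, and the real helper may differ.

import com.google.cloud.bigquery.storage.v1beta1.AvroProto.AvroRows;
import java.io.IOException;
import java.util.function.Consumer;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

// Plausible sketch of a row reader like SimpleRowReader; the real test helper may differ.
public class SketchRowReader {

    private final GenericDatumReader<GenericRecord> datumReader;

    public SketchRowReader(Schema schema) {
        this.datumReader = new GenericDatumReader<>(schema);
    }

    // Decodes every record in the serialized Avro block and hands it to the consumer.
    public void processRows(AvroRows avroRows, Consumer<GenericRecord> consumer) throws IOException {
        BinaryDecoder decoder = DecoderFactory.get()
                .binaryDecoder(avroRows.getSerializedBinaryRows().toByteArray(), /* reuse = */ null);
        while (!decoder.isEnd()) {
            consumer.accept(datumReader.read(/* reuse = */ null, decoder));
        }
    }
}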

Example 3 with TableReference

Use of com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableReference in project java-bigquerystorage by googleapis.

From the class ITBigQueryStorageTest, method testGeographySqlType:

@Test
public void testGeographySqlType() throws InterruptedException, IOException {
    String table_name = "test_geography_sql_type";
    String createTableStatement = String.format(
            " CREATE TABLE %s.%s "
                    + " (geo_field GEOGRAPHY NOT NULL)"
                    + " OPTIONS( "
                    + "   description=\"a table with a geography column type\" "
                    + " ) "
                    + "AS "
                    + "   SELECT ST_GEOGPOINT(1.1, 2.2)",
            DATASET, table_name);
    RunQueryJobAndExpectSuccess(QueryJobConfiguration.newBuilder(createTableStatement).build());
    TableReference tableReference = TableReference.newBuilder().setTableId(table_name).setDatasetId(DATASET).setProjectId(ServiceOptions.getDefaultProjectId()).build();
    List<GenericData.Record> rows = ReadAllRows(/* tableReference = */ tableReference, /* filter = */ null);
    assertEquals("Actual rows read: " + rows.toString(), 1, rows.size());
    GenericData.Record record = rows.get(0);
    Schema avroSchema = record.getSchema();
    String actualSchemaMessage = String.format(
            "Unexpected schema. Actual schema:%n%s", avroSchema.toString(/* pretty = */ true));
    String rowAssertMessage = String.format("Row not matching expectations: %s", record.toString());
    assertEquals(actualSchemaMessage, Schema.Type.RECORD, avroSchema.getType());
    assertEquals(actualSchemaMessage, "__root__", avroSchema.getName());
    assertEquals(actualSchemaMessage, 1, avroSchema.getFields().size());
    assertEquals(actualSchemaMessage, Schema.Type.STRING, avroSchema.getField("geo_field").schema().getType());
    assertEquals(actualSchemaMessage, "GEOGRAPHY", avroSchema.getField("geo_field").schema().getObjectProp("sqlType"));
    assertEquals(rowAssertMessage, new Utf8("POINT(1.1 2.2)"), (Utf8) record.get("geo_field"));
}
Also used : TableReference(com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableReference) Schema(org.apache.avro.Schema) Utf8(org.apache.avro.util.Utf8) GenericData(org.apache.avro.generic.GenericData) Test(org.junit.Test)
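
The sqlType annotation asserted above is an object property on the Avro field schema, so it can be inspected generically. A minimal sketch (the method name printSqlTypes is illustrative):

import org.apache.avro.Schema;

// Sketch: list any BigQuery "sqlType" annotations carried by an Avro schema's fields.
static void printSqlTypes(Schema avroSchema) {
    for (Schema.Field field : avroSchema.getFields()) {
        // getObjectProp returns null for fields without the annotation;
        // GEOGRAPHY is one annotated type, as the test above asserts.
        Object sqlType = field.schema().getObjectProp("sqlType");
        System.out.println(field.name() + " -> " + (sqlType == null ? field.schema().getType() : sqlType));
    }
}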

Example 4 with TableReference

Use of com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableReference in project java-bigquerystorage by googleapis.

From the class ITBigQueryStorageTest, method testSimpleRead:

@Test
public void testSimpleRead() {
    TableReference tableReference = TableReference.newBuilder().setProjectId("bigquery-public-data").setDatasetId("samples").setTableId("shakespeare").build();
    ReadSession session = client.createReadSession(
            /* tableReference = */ tableReference,
            /* parent = */ parentProjectId,
            /* requestedStreams = */ 1);
    assertEquals(String.format("Did not receive expected number of streams for table reference '%s' CreateReadSession response:%n%s", TextFormat.shortDebugString(tableReference), session.toString()), 1, session.getStreamsCount());
    StreamPosition readPosition = StreamPosition.newBuilder().setStream(session.getStreams(0)).build();
    ReadRowsRequest readRowsRequest = ReadRowsRequest.newBuilder().setReadPosition(readPosition).build();
    long rowCount = 0;
    ServerStream<ReadRowsResponse> stream = client.readRowsCallable().call(readRowsRequest);
    for (ReadRowsResponse response : stream) {
        rowCount += response.getRowCount();
    }
    assertEquals(164_656, rowCount);
}
Also used : TableReference(com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableReference) ReadRowsResponse(com.google.cloud.bigquery.storage.v1beta1.Storage.ReadRowsResponse) ReadSession(com.google.cloud.bigquery.storage.v1beta1.Storage.ReadSession) StreamPosition(com.google.cloud.bigquery.storage.v1beta1.Storage.StreamPosition) ReadRowsRequest(com.google.cloud.bigquery.storage.v1beta1.Storage.ReadRowsRequest) Test(org.junit.Test)
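
The three-argument createReadSession call above is a convenience overload. Built explicitly, in the style of Example 2, the request would look roughly like the sketch below; the method name is illustrative, and setting AVRO explicitly is an assumption (the convenience overload does not set a format itself).

import com.google.cloud.bigquery.storage.v1beta1.BigQueryStorageClient;
import com.google.cloud.bigquery.storage.v1beta1.Storage.CreateReadSessionRequest;
import com.google.cloud.bigquery.storage.v1beta1.Storage.DataFormat;
import com.google.cloud.bigquery.storage.v1beta1.Storage.ReadSession;
import com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableReference;

// Sketch: the explicit-request equivalent of client.createReadSession(tableReference, parent, 1).
static ReadSession createSessionExplicitly(BigQueryStorageClient client, TableReference tableReference, String parentProjectId) {
    CreateReadSessionRequest request = CreateReadSessionRequest.newBuilder()
            .setTableReference(tableReference)
            .setParent(parentProjectId)
            .setRequestedStreams(1)
            // The surrounding tests read AVRO; the convenience overload leaves the format unset.
            .setFormat(DataFormat.AVRO)
            .build();
    return client.createReadSession(request);
}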

Example 5 with TableReference

Use of com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableReference in project java-bigquerystorage by googleapis.

From the class ITBigQueryStorageTest, method testReadAtSnapshot:

@Test
public void testReadAtSnapshot() throws InterruptedException, IOException {
    Field intFieldSchema = Field.newBuilder("col", LegacySQLTypeName.INTEGER).setMode(Mode.REQUIRED).setDescription("IntegerDescription").build();
    com.google.cloud.bigquery.Schema tableSchema = com.google.cloud.bigquery.Schema.of(intFieldSchema);
    TableId testTableId = TableId.of(/* dataset = */ DATASET, /* table = */ "test_read_snapshot");
    bigquery.create(TableInfo.of(testTableId, StandardTableDefinition.of(tableSchema)));
    TableReference tableReference = TableReference.newBuilder().setTableId(testTableId.getTable()).setDatasetId(DATASET).setProjectId(ServiceOptions.getDefaultProjectId()).build();
    Job firstJob = RunQueryAppendJobAndExpectSuccess(/* destinationTableId = */ testTableId, /* query = */ "SELECT 1 AS col");
    Job secondJob = RunQueryAppendJobAndExpectSuccess(/* destinationTableId = */ testTableId, /* query = */ "SELECT 2 AS col");
    final List<Long> rowsAfterFirstSnapshot = new ArrayList<>();
    ProcessRowsAtSnapshot(
            /* tableReference = */ tableReference,
            /* snapshotInMillis = */ firstJob.getStatistics().getEndTime(),
            /* filter = */ null,
            /* consumer = */ new AvroRowConsumer() {
                @Override
                public void accept(GenericData.Record record) {
                    rowsAfterFirstSnapshot.add((Long) record.get("col"));
                }
            });
    assertEquals(Arrays.asList(1L), rowsAfterFirstSnapshot);
    final List<Long> rowsAfterSecondSnapshot = new ArrayList<>();
    ProcessRowsAtSnapshot(
            /* tableReference = */ tableReference,
            /* snapshotInMillis = */ secondJob.getStatistics().getEndTime(),
            /* filter = */ null,
            /* consumer = */ new AvroRowConsumer() {
                @Override
                public void accept(GenericData.Record record) {
                    rowsAfterSecondSnapshot.add((Long) record.get("col"));
                }
            });
    Collections.sort(rowsAfterSecondSnapshot);
    assertEquals(Arrays.asList(1L, 2L), rowsAfterSecondSnapshot);
}
Also used : TableId(com.google.cloud.bigquery.TableId) AvroRowConsumer(com.google.cloud.bigquery.storage.v1beta1.it.SimpleRowReader.AvroRowConsumer) ArrayList(java.util.ArrayList) GenericData(org.apache.avro.generic.GenericData) Field(com.google.cloud.bigquery.Field) TableReference(com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableReference) Job(com.google.cloud.bigquery.Job) Test(org.junit.Test)
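
ProcessRowsAtSnapshot is a test helper whose body is not shown in this listing. In the v1beta1 API, a point-in-time read is requested by attaching TableModifiers with a snapshotTime to the CreateReadSessionRequest, which is presumably what the helper does with snapshotInMillis; the sketch below is written under that assumption, with an illustrative method name.

import com.google.cloud.bigquery.storage.v1beta1.BigQueryStorageClient;
import com.google.cloud.bigquery.storage.v1beta1.Storage.CreateReadSessionRequest;
import com.google.cloud.bigquery.storage.v1beta1.Storage.DataFormat;
import com.google.cloud.bigquery.storage.v1beta1.Storage.ReadSession;
import com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableModifiers;
import com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableReference;
import com.google.protobuf.util.Timestamps;

// Sketch: create a read session pinned to a snapshot time (milliseconds since the epoch).
static ReadSession createSnapshotSession(BigQueryStorageClient client, TableReference tableReference, String parentProjectId, long snapshotInMillis) {
    TableModifiers modifiers = TableModifiers.newBuilder()
            .setSnapshotTime(Timestamps.fromMillis(snapshotInMillis))
            .build();
    CreateReadSessionRequest request = CreateReadSessionRequest.newBuilder()
            .setTableReference(tableReference)
            .setTableModifiers(modifiers)
            .setParent(parentProjectId)
            .setRequestedStreams(1)
            .setFormat(DataFormat.AVRO)
            .build();
    return client.createReadSession(request);
}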

Aggregations

TableReference (com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto.TableReference) 14
Test (org.junit.Test) 14
GenericData (org.apache.avro.generic.GenericData) 8
ReadSession (com.google.cloud.bigquery.storage.v1beta1.Storage.ReadSession) 7
ReadRowsRequest (com.google.cloud.bigquery.storage.v1beta1.Storage.ReadRowsRequest) 5
ReadRowsResponse (com.google.cloud.bigquery.storage.v1beta1.Storage.ReadRowsResponse) 5
StreamPosition (com.google.cloud.bigquery.storage.v1beta1.Storage.StreamPosition) 5
Schema (org.apache.avro.Schema) 5
Utf8 (org.apache.avro.util.Utf8) 5
CreateReadSessionRequest (com.google.cloud.bigquery.storage.v1beta1.Storage.CreateReadSessionRequest) 4
AvroRowConsumer (com.google.cloud.bigquery.storage.v1beta1.it.SimpleRowReader.AvroRowConsumer) 4
ArrayList (java.util.ArrayList) 3
Field (com.google.cloud.bigquery.Field) 2
TableId (com.google.cloud.bigquery.TableId) 2
TableInfo (com.google.cloud.bigquery.TableInfo) 2
BigQueryStorageClient (com.google.cloud.bigquery.storage.v1beta1.BigQueryStorageClient) 2
TableReadOptions (com.google.cloud.bigquery.storage.v1beta1.ReadOptions.TableReadOptions) 2
Storage (com.google.cloud.bigquery.storage.v1beta1.Storage) 2
TableReferenceProto (com.google.cloud.bigquery.storage.v1beta1.TableReferenceProto) 2
InvalidArgumentException (com.google.api.gax.rpc.InvalidArgumentException) 1