
Example 1 with ReadRowsResponseToInternalRowIteratorConverter

Use of com.google.cloud.spark.bigquery.ReadRowsResponseToInternalRowIteratorConverter in project spark-bigquery-connector by GoogleCloudDataproc.

The class BigQueryDataSourceReaderContext, method createConverter.

private ReadRowsResponseToInternalRowIteratorConverter createConverter(ImmutableList<String> selectedFields, ReadSessionResponse readSessionResponse, Optional<StructType> userProvidedSchema) {
    DataFormat format = readSessionCreatorConfig.getReadDataFormat();
    if (format == DataFormat.AVRO) {
        Schema schema = SchemaConverters.getSchemaWithPseudoColumns(readSessionResponse.getReadTableInfo());
        if (selectedFields.isEmpty()) {
            // No explicit selection means "select *": read every field in the schema.
            selectedFields = schema.getFields().stream().map(Field::getName).collect(ImmutableList.toImmutableList());
        } else {
            // Keep only the selected fields in the schema; the result follows schema order.
            Set<String> requiredColumnSet = ImmutableSet.copyOf(selectedFields);
            schema = Schema.of(schema.getFields().stream().filter(field -> requiredColumnSet.contains(field.getName())).collect(Collectors.toList()));
        }
        return ReadRowsResponseToInternalRowIteratorConverter.avro(schema, selectedFields, readSessionResponse.getReadSession().getAvroSchema().getSchema(), userProvidedSchema);
    }
    // Only AVRO is handled by this method; any other format is rejected.
    throw new IllegalArgumentException("No known converter for " + format);
}
Also used : ReadRowsResponseToInternalRowIteratorConverter(com.google.cloud.spark.bigquery.ReadRowsResponseToInternalRowIteratorConverter) IntStream(java.util.stream.IntStream) Iterables(com.google.common.collect.Iterables) InternalRow(org.apache.spark.sql.catalyst.InternalRow) TableId(com.google.cloud.bigquery.TableId) LoggerFactory(org.slf4j.LoggerFactory) ArrayList(java.util.ArrayList) LinkedHashMap(java.util.LinkedHashMap) OptionalLong(java.util.OptionalLong) ImmutableList(com.google.common.collect.ImmutableList) Schema(com.google.cloud.bigquery.Schema) Map(java.util.Map) ReadSessionResponse(com.google.cloud.bigquery.connector.common.ReadSessionResponse) StructField(org.apache.spark.sql.types.StructField) StructType(org.apache.spark.sql.types.StructType) Field(com.google.cloud.bigquery.Field) TableDefinition(com.google.cloud.bigquery.TableDefinition) ReadSessionCreator(com.google.cloud.bigquery.connector.common.ReadSessionCreator) JavaConversions(scala.collection.JavaConversions) ReadStream(com.google.cloud.bigquery.storage.v1.ReadStream) ImmutableSet(com.google.common.collect.ImmutableSet) Logger(org.slf4j.Logger) ReadSessionCreatorConfig(com.google.cloud.bigquery.connector.common.ReadSessionCreatorConfig) ReadSession(com.google.cloud.bigquery.storage.v1.ReadSession) BigQueryClient(com.google.cloud.bigquery.connector.common.BigQueryClient) Set(java.util.Set) SchemaConverters(com.google.cloud.spark.bigquery.SchemaConverters) Streams(com.google.common.collect.Streams) Collectors(java.util.stream.Collectors) DataFormat(com.google.cloud.bigquery.storage.v1.DataFormat) List(java.util.List) Stream(java.util.stream.Stream) ColumnarBatch(org.apache.spark.sql.vectorized.ColumnarBatch) BigQueryClientFactory(com.google.cloud.bigquery.connector.common.BigQueryClientFactory) SparkFilterUtils(com.google.cloud.spark.bigquery.SparkFilterUtils) Optional(java.util.Optional) Filter(org.apache.spark.sql.sources.Filter) TableInfo(com.google.cloud.bigquery.TableInfo) BigQueryUtil(com.google.cloud.bigquery.connector.common.BigQueryUtil) BigQueryTracerFactory(com.google.cloud.bigquery.connector.common.BigQueryTracerFactory)
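
The field-selection step in createConverter can be tried in isolation. The following is a minimal sketch that uses only JDK collections and plain column names in place of the BigQuery Schema and Field types; the class ColumnProjectionSketch and the method project are hypothetical names for illustration and are not part of the connector. It shows the two branches: an empty selection keeps every column, and a non-empty selection filters the schema, so the result follows schema order rather than the order of the request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not connector code: mirrors the projection logic
// of createConverter with plain strings instead of BigQuery schema types.
final class ColumnProjectionSketch {

    // Returns the columns to read. An empty selection means "select *";
    // otherwise only selected columns are kept, in schema order.
    static List<String> project(List<String> schemaColumns, List<String> selectedFields) {
        if (selectedFields.isEmpty()) {
            return new ArrayList<>(schemaColumns);
        }
        Set<String> required = new HashSet<>(selectedFields);
        return schemaColumns.stream()
                .filter(required::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name", "created_at");
        // Empty selection: prints [id, name, created_at]
        System.out.println(project(schema, Collections.emptyList()));
        // Explicit selection: prints [id, created_at] (schema order, not request order)
        System.out.println(project(schema, Arrays.asList("created_at", "id")));
    }
}
```

Names in the selection that do not exist in the schema are silently dropped by the filter in this sketch.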

Example 2 with ReadRowsResponseToInternalRowIteratorConverter

Use of com.google.cloud.spark.bigquery.ReadRowsResponseToInternalRowIteratorConverter in project spark-bigquery-connector by GoogleCloudDataproc.

The class BigQueryInputPartitionReaderContextTest, method testReadAvro.

@Test
public void testReadAvro() throws Exception {
    TableInfo allTypesTableInfo = allTypesTableInfo();
    ReadRowsResponse.Builder readRowsResponse = ReadRowsResponse.newBuilder();
    TextFormat.merge(ALL_TYPES_TABLE_READ_ROWS_RESPONSE_STR, readRowsResponse);
    Iterator<ReadRowsResponse> readRowsResponses = ImmutableList.of(readRowsResponse.build()).iterator();
    ReadRowsResponseToInternalRowIteratorConverter converter = ReadRowsResponseToInternalRowIteratorConverter.avro(ALL_TYPES_TABLE_BIGQUERY_SCHEMA, ALL_TYPES_TABLE_FIELDS, ALL_TYPES_TABLE_AVRO_RAW_SCHEMA, Optional.empty());
    BigQueryInputPartitionReaderContext reader = new BigQueryInputPartitionReaderContext(readRowsResponses, converter, null);
    assertThat(reader.next()).isTrue();
    InternalRow row = reader.get();
    assertThat(reader.next()).isFalse();
    assertThat(row.numFields()).isEqualTo(15);
    assertThat(row.getString(0)).isEqualTo("hello");
}
Also used : ReadRowsResponseToInternalRowIteratorConverter(com.google.cloud.spark.bigquery.ReadRowsResponseToInternalRowIteratorConverter) ReadRowsResponse(com.google.cloud.bigquery.storage.v1.ReadRowsResponse) InternalRow(org.apache.spark.sql.catalyst.InternalRow) Test(org.junit.Test)
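
The test above drives the reader through a next() / get() pair: next() advances and reports whether a row is available, and get() returns the current row. A minimal sketch of that pattern follows, assuming a reader that wraps an iterator of batches plus a converter function that turns each batch into an iterator of rows. BatchReaderSketch and its type parameters are hypothetical; this is not the implementation of BigQueryInputPartitionReaderContext, only an illustration of the calling convention the test relies on.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.function.Function;

// Hypothetical reader, not connector code: pulls batches lazily and
// exposes their rows one at a time through next() and get().
final class BatchReaderSketch<B, R> {
    private final Iterator<B> batches;
    private final Function<B, Iterator<R>> converter;
    private Iterator<R> rows = Collections.emptyIterator();
    private R current;

    BatchReaderSketch(Iterator<B> batches, Function<B, Iterator<R>> converter) {
        this.batches = batches;
        this.converter = converter;
    }

    // Advances to the next row, skipping empty batches.
    // Returns false once every batch has been consumed.
    boolean next() {
        while (!rows.hasNext()) {
            if (!batches.hasNext()) {
                return false;
            }
            rows = converter.apply(batches.next());
        }
        current = rows.next();
        return true;
    }

    // Returns the row selected by the last successful call to next().
    R get() {
        return current;
    }

    public static void main(String[] args) {
        // Each batch is a comma-separated string; an empty string is an empty batch.
        Iterator<String> batches = Arrays.asList("a,b", "", "c").iterator();
        BatchReaderSketch<String, String> reader = new BatchReaderSketch<>(
                batches,
                batch -> batch.isEmpty()
                        ? Collections.<String>emptyIterator()
                        : Arrays.asList(batch.split(",")).iterator());
        while (reader.next()) {
            System.out.println(reader.get()); // prints a, b, c on separate lines
        }
    }
}
```

Keeping conversion behind a function argument is what lets a test pass in a prebuilt converter and a fixed list of responses, as testReadAvro does.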

Aggregations

ReadRowsResponseToInternalRowIteratorConverter (com.google.cloud.spark.bigquery.ReadRowsResponseToInternalRowIteratorConverter): 2
InternalRow (org.apache.spark.sql.catalyst.InternalRow): 2
Field (com.google.cloud.bigquery.Field): 1
Schema (com.google.cloud.bigquery.Schema): 1
TableDefinition (com.google.cloud.bigquery.TableDefinition): 1
TableId (com.google.cloud.bigquery.TableId): 1
TableInfo (com.google.cloud.bigquery.TableInfo): 1
BigQueryClient (com.google.cloud.bigquery.connector.common.BigQueryClient): 1
BigQueryClientFactory (com.google.cloud.bigquery.connector.common.BigQueryClientFactory): 1
BigQueryTracerFactory (com.google.cloud.bigquery.connector.common.BigQueryTracerFactory): 1
BigQueryUtil (com.google.cloud.bigquery.connector.common.BigQueryUtil): 1
ReadSessionCreator (com.google.cloud.bigquery.connector.common.ReadSessionCreator): 1
ReadSessionCreatorConfig (com.google.cloud.bigquery.connector.common.ReadSessionCreatorConfig): 1
ReadSessionResponse (com.google.cloud.bigquery.connector.common.ReadSessionResponse): 1
DataFormat (com.google.cloud.bigquery.storage.v1.DataFormat): 1
ReadRowsResponse (com.google.cloud.bigquery.storage.v1.ReadRowsResponse): 1
ReadSession (com.google.cloud.bigquery.storage.v1.ReadSession): 1
ReadStream (com.google.cloud.bigquery.storage.v1.ReadStream): 1
SchemaConverters (com.google.cloud.spark.bigquery.SchemaConverters): 1
SparkFilterUtils (com.google.cloud.spark.bigquery.SparkFilterUtils): 1