Example 1 with BigQueryTracerFactory

Use of com.google.cloud.bigquery.connector.common.BigQueryTracerFactory in the project spark-bigquery-connector by GoogleCloudDataproc.

From the class BigQueryDataSourceReaderContext, method planBatchInputPartitionContexts.

public Stream<InputPartitionContext<ColumnarBatch>> planBatchInputPartitionContexts() {
    // Guard: this method is only valid when batch (columnar) reads are enabled.
    if (!enableBatchRead()) {
        throw new IllegalStateException("Batch reads should not be enabled");
    }
    // Use the required schema's field names if one is set, otherwise all known fields.
    ImmutableList<String> selectedFields = schema
        .map(requiredSchema -> ImmutableList.copyOf(requiredSchema.fieldNames()))
        .orElse(ImmutableList.copyOf(fields.keySet()));
    Optional<String> filter = getCombinedFilter();
    ReadSessionResponse readSessionResponse = readSessionCreator.create(tableId, selectedFields, filter);
    ReadSession readSession = readSessionResponse.getReadSession();
    logger.info("Created read session for {}: {} for application id: {}", tableId.toString(), readSession.getName(), applicationId);
    if (selectedFields.isEmpty()) {
        // An empty selection means "select *": take every field of the table schema.
        Schema tableSchema = SchemaConverters.getSchemaWithPseudoColumns(readSessionResponse.getReadTableInfo());
        selectedFields = tableSchema.getFields().stream().map(Field::getName).collect(ImmutableList.toImmutableList());
    }
    // Effectively final copy so the lambda below can capture it.
    ImmutableList<String> partitionSelectedFields = selectedFields;
    // Group the session's read streams, streamsPerPartition at a time, into one partition context per group.
    return Streams.stream(Iterables.partition(readSession.getStreamsList(), readSessionCreatorConfig.streamsPerPartition()))
        .map(streams -> new ArrowInputPartitionContext(
            bigQueryReadClientFactory,
            bigQueryTracerFactory,
            streams.stream().map(ReadStream::getName).collect(Collectors.toCollection(ArrayList::new)),
            readSessionCreatorConfig.toReadRowsHelperOptions(),
            partitionSelectedFields,
            readSessionResponse,
            userProvidedSchema));
}
Also used : IntStream(java.util.stream.IntStream) Iterables(com.google.common.collect.Iterables) InternalRow(org.apache.spark.sql.catalyst.InternalRow) TableId(com.google.cloud.bigquery.TableId) LoggerFactory(org.slf4j.LoggerFactory) ArrayList(java.util.ArrayList) LinkedHashMap(java.util.LinkedHashMap) OptionalLong(java.util.OptionalLong) ImmutableList(com.google.common.collect.ImmutableList) Schema(com.google.cloud.bigquery.Schema) Map(java.util.Map) ReadSessionResponse(com.google.cloud.bigquery.connector.common.ReadSessionResponse) StructField(org.apache.spark.sql.types.StructField) StructType(org.apache.spark.sql.types.StructType) Field(com.google.cloud.bigquery.Field) TableDefinition(com.google.cloud.bigquery.TableDefinition) ReadSessionCreator(com.google.cloud.bigquery.connector.common.ReadSessionCreator) JavaConversions(scala.collection.JavaConversions) ReadStream(com.google.cloud.bigquery.storage.v1.ReadStream) ImmutableSet(com.google.common.collect.ImmutableSet) Logger(org.slf4j.Logger) ReadSessionCreatorConfig(com.google.cloud.bigquery.connector.common.ReadSessionCreatorConfig) ReadSession(com.google.cloud.bigquery.storage.v1.ReadSession) BigQueryClient(com.google.cloud.bigquery.connector.common.BigQueryClient) Set(java.util.Set) SchemaConverters(com.google.cloud.spark.bigquery.SchemaConverters) Streams(com.google.common.collect.Streams) Collectors(java.util.stream.Collectors) DataFormat(com.google.cloud.bigquery.storage.v1.DataFormat) List(java.util.List) Stream(java.util.stream.Stream) ColumnarBatch(org.apache.spark.sql.vectorized.ColumnarBatch) ReadRowsResponseToInternalRowIteratorConverter(com.google.cloud.spark.bigquery.ReadRowsResponseToInternalRowIteratorConverter) BigQueryClientFactory(com.google.cloud.bigquery.connector.common.BigQueryClientFactory) SparkFilterUtils(com.google.cloud.spark.bigquery.SparkFilterUtils) Optional(java.util.Optional) Filter(org.apache.spark.sql.sources.Filter) TableInfo(com.google.cloud.bigquery.TableInfo) 
BigQueryUtil(com.google.cloud.bigquery.connector.common.BigQueryUtil) BigQueryTracerFactory(com.google.cloud.bigquery.connector.common.BigQueryTracerFactory)
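
The last statement of the method groups the read session's streams into chunks of streamsPerPartition and builds one partition context per chunk. The grouping step alone can be sketched with the plain JDK as below. This is only an illustration, not the connector's code: the class StreamPartitioningSketch, the helper partition, and the stream names are hypothetical, and the helper is a stand-in for Guava's Iterables.partition that copies each group rather than returning views.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StreamPartitioningSketch {

    // Hypothetical helper: splits items into consecutive groups of at most
    // groupSize elements. The last group may be smaller than groupSize.
    static <T> List<List<T>> partition(List<T> items, int groupSize) {
        if (groupSize <= 0) {
            throw new IllegalArgumentException("groupSize must be positive");
        }
        List<List<T>> groups = new ArrayList<>();
        for (int start = 0; start < items.size(); start += groupSize) {
            int end = Math.min(items.size(), start + groupSize);
            // Copy the slice so each group is independent of the input list.
            groups.add(new ArrayList<>(items.subList(start, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up stream names; in the method above they come from ReadStream::getName.
        List<String> streamNames = Arrays.asList("s0", "s1", "s2", "s3", "s4");
        // Two streams per partition gives three groups.
        System.out.println(partition(streamNames, 2)); // prints [[s0, s1], [s2, s3], [s4]]
    }
}
```

With five streams and two streams per partition, the sketch yields three groups, so the corresponding read would be planned as three input partitions.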
