
Example 1 with ArrowSchemaConverter

Use of com.google.cloud.spark.bigquery.ArrowSchemaConverter in the project spark-bigquery-connector by GoogleCloudDataproc.

From class ArrowColumnBatchPartitionReaderContext, method next:

public boolean next() throws IOException {
    tracer.nextBatchNeeded();
    if (closed) {
        return false;
    }
    tracer.rowsParseStarted();
    // loadNextBatch() returns false when no further batch is available.
    closed = !reader.loadNextBatch();
    if (closed) {
        return false;
    }
    VectorSchemaRoot root = reader.root();
    if (currentBatch == null) {
        // Still being confirmed on dev@spark, but this object should only
        // need to be created once: the underlying vectors stay the same
        // from batch to batch.
        ColumnVector[] columns =
            namesInOrder.stream()
                .map(root::getVector)
                .map(vector ->
                    new ArrowSchemaConverter(
                        vector, userProvidedFieldMap.get(vector.getName())))
                .toArray(ColumnVector[]::new);
        currentBatch = new ColumnarBatch(columns);
    }
    // Only the row count changes between batches.
    currentBatch.setNumRows(root.getRowCount());
    tracer.rowsParseFinished(currentBatch.numRows());
    return true;
}
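The reuse in next() depends on the reader refilling the same vector objects for every batch, which is what the code comment assumes; under that assumption the ColumnarBatch wrapper is built once and only its row count is updated afterwards. The stdlib-only Java sketch below mimics that pattern. `FakeReader`, `Vector`, and `Batch` are hypothetical stand-ins for the reader, the Arrow vectors, and ColumnarBatch; they are not the Arrow, Spark, or connector APIs, and the per-column user-provided field lookup is left out.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of "build the column wrappers once, then only update the row count". */
public class BatchReuseSketch {

  // Hypothetical stand-in for an Arrow vector: its contents are overwritten in place.
  static final class Vector {
    final String name;
    int[] values = new int[0];
    Vector(String name) { this.name = name; }
  }

  // Hypothetical stand-in for a reader whose root keeps the SAME Vector objects
  // and refills them on each loadNextBatch() call.
  static final class FakeReader {
    private final Map<String, Vector> root = new LinkedHashMap<>();
    private final Iterator<Map<String, int[]>> batches;
    int rowCount;

    FakeReader(List<String> fileOrder, Iterator<Map<String, int[]>> batches) {
      for (String name : fileOrder) root.put(name, new Vector(name));
      this.batches = batches;
    }

    Vector getVector(String name) { return root.get(name); }

    boolean loadNextBatch() {
      if (!batches.hasNext()) return false;     // stream exhausted
      Map<String, int[]> next = batches.next();
      for (Vector v : root.values()) v.values = next.get(v.name);
      rowCount = root.values().iterator().next().values.length;
      return true;
    }
  }

  // Hypothetical stand-in for ColumnarBatch: column references plus a row count.
  static final class Batch {
    final Vector[] columns;
    int numRows;
    Batch(Vector[] columns) { this.columns = columns; }
  }

  private final FakeReader reader;
  private final List<String> namesInOrder;
  private Batch currentBatch;   // created lazily on the first batch, then reused
  private boolean closed;
  int wrapperCreations;         // counts how often the wrapper was built

  BatchReuseSketch(FakeReader reader, List<String> namesInOrder) {
    this.reader = reader;
    this.namesInOrder = namesInOrder;
  }

  boolean next() {
    if (closed) return false;
    closed = !reader.loadNextBatch();
    if (closed) return false;
    if (currentBatch == null) {
      // Look the vectors up in the requested order, not the reader's file order.
      Vector[] columns = namesInOrder.stream().map(reader::getVector).toArray(Vector[]::new);
      currentBatch = new Batch(columns);
      wrapperCreations++;
    }
    currentBatch.numRows = reader.rowCount;   // the only per-batch update
    return true;
  }

  Batch get() { return currentBatch; }

  public static void main(String[] args) {
    List<Map<String, int[]>> data = List.of(
        Map.of("a", new int[] {1, 2}, "b", new int[] {10, 20}),
        Map.of("a", new int[] {3, 4, 5}, "b", new int[] {30, 40, 50}));
    FakeReader reader = new FakeReader(List.of("a", "b"), data.iterator());
    BatchReuseSketch ctx = new BatchReuseSketch(reader, List.of("b", "a"));
    while (ctx.next()) {
      Batch batch = ctx.get();
      System.out.println(batch.numRows + " rows, first column " + batch.columns[0].name
          + " = " + Arrays.toString(batch.columns[0].values));
    }
    System.out.println("wrapper built " + ctx.wrapperCreations + " time(s)");
  }
}
```

Running main reports a single wrapper construction across both batches, and the first column follows the requested order ("b") rather than the reader's file order, mirroring how namesInOrder drives the column layout in the original method.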
Also used: VectorLoader (org.apache.arrow.vector.VectorLoader), MoreExecutors (com.google.common.util.concurrent.MoreExecutors), Arrays (java.util.Arrays), Schema (org.apache.arrow.vector.types.pojo.Schema), ThreadPoolExecutor (java.util.concurrent.ThreadPoolExecutor), ReadRowsResponse (com.google.cloud.bigquery.storage.v1.ReadRowsResponse), ArrowSchemaConverter (com.google.cloud.spark.bigquery.ArrowSchemaConverter), ArrayList (java.util.ArrayList), IteratorMultiplexer (com.google.cloud.bigquery.connector.common.IteratorMultiplexer), ParallelArrowReader (com.google.cloud.bigquery.connector.common.ParallelArrowReader), ImmutableList (com.google.common.collect.ImmutableList), Map (java.util.Map), AutoCloseables (org.apache.arrow.util.AutoCloseables), ArrowStreamReader (org.apache.arrow.vector.ipc.ArrowStreamReader), ExecutorService (java.util.concurrent.ExecutorService), BufferAllocator (org.apache.arrow.memory.BufferAllocator), StructField (org.apache.spark.sql.types.StructField), StructType (org.apache.spark.sql.types.StructType), NonInterruptibleBlockingBytesChannel (com.google.cloud.bigquery.connector.common.NonInterruptibleBlockingBytesChannel), ArrowReader (org.apache.arrow.vector.ipc.ArrowReader), ColumnVector (org.apache.spark.sql.vectorized.ColumnVector), Iterator (java.util.Iterator), ReadRowsResponseInputStreamEnumeration (com.google.cloud.bigquery.connector.common.ReadRowsResponseInputStreamEnumeration), SynchronousQueue (java.util.concurrent.SynchronousQueue), SequenceInputStream (java.io.SequenceInputStream), CommonsCompressionFactory (org.apache.arrow.compression.CommonsCompressionFactory), VectorSchemaRoot (org.apache.arrow.vector.VectorSchemaRoot), IOException (java.io.IOException), ArrowUtil (com.google.cloud.bigquery.connector.common.ArrowUtil), Collectors (java.util.stream.Collectors), ByteString (com.google.protobuf.ByteString), TimeUnit (java.util.concurrent.TimeUnit), BigQueryStorageReadRowsTracer (com.google.cloud.bigquery.connector.common.BigQueryStorageReadRowsTracer), List (java.util.List), ColumnarBatch (org.apache.spark.sql.vectorized.ColumnarBatch), Optional (java.util.Optional), ReadRowsHelper (com.google.cloud.bigquery.connector.common.ReadRowsHelper), InputStream (java.io.InputStream)
