Example 6 with BlockMetaData

Use of org.apache.parquet.hadoop.metadata.BlockMetaData in project h2o-3 by h2oai.

From the class VecParquetReader, method initReader:

private void initReader() throws IOException {
    assert reader == null;
    // Row-group (block) metadata and the file schema come from the already-parsed footer.
    List<BlockMetaData> blocks = metadata.getBlocks();
    MessageType fileSchema = metadata.getFileMetaData().getSchema();
    // ChunkReadSupport materializes Parquet records directly into H2O chunks.
    reader = new InternalParquetRecordReader<>(new ChunkReadSupport(writer, chunkSchema));
    Configuration conf = VecFileSystem.makeConfiguration(vec);
    reader.initialize(fileSchema, metadata.getFileMetaData().getKeyValueMetaData(), VecFileSystem.VEC_PATH, blocks, conf);
}
Also used: BlockMetaData (org.apache.parquet.hadoop.metadata.BlockMetaData), Configuration (org.apache.hadoop.conf.Configuration), MessageType (org.apache.parquet.schema.MessageType), ChunkReadSupport (water.parser.parquet.ChunkReadSupport)
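
For context, the ParquetMetadata that initReader consumes is produced by parsing only the file footer; the data pages are never read at this stage. Below is a minimal sketch of obtaining and inspecting the resulting BlockMetaData entries. The class name FooterInspector is illustrative, not part of any of the projects above, and ParquetFileReader.readFooter(Configuration, Path) is deprecated in recent parquet-mr releases but matches the API generation used in these examples:

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;

// Illustrative helper, not part of h2o-3 or Drill.
public class FooterInspector {

    // Parses only the footer of the given Parquet file and prints
    // per-row-group statistics.
    public static void printBlocks(Configuration conf, Path path) throws IOException {
        ParquetMetadata footer = ParquetFileReader.readFooter(conf, path);
        List<BlockMetaData> blocks = footer.getBlocks();
        for (BlockMetaData block : blocks) {
            System.out.println("rows=" + block.getRowCount()
                + " bytes=" + block.getTotalByteSize()
                + " columns=" + block.getColumns().size());
        }
    }
}

Because only the footer is parsed, this is cheap even for very large files, which is why readers such as the one in initReader can plan their work from BlockMetaData before touching any data.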

Example 7 with BlockMetaData

Use of org.apache.parquet.hadoop.metadata.BlockMetaData in project drill by apache.

From the class HiveDrillNativeScanBatchCreator, method getRowGroupNumbersFromFileSplit:

/**
   * Get the list of row group numbers for the given file input split. The logic used here is the
   * same as how Hive's Parquet input format finds the row group numbers for an input split.
   */
private List<Integer> getRowGroupNumbersFromFileSplit(final FileSplit split, final ParquetMetadata footer) throws IOException {
    final List<BlockMetaData> blocks = footer.getBlocks();
    final long splitStart = split.getStart();
    final long splitLength = split.getLength();
    final List<Integer> rowGroupNums = Lists.newArrayList();
    int i = 0;
    for (final BlockMetaData block : blocks) {
        // A row group belongs to the split whose byte range contains the first data
        // page of its first column, so each row group maps to exactly one split.
        final long firstDataPage = block.getColumns().get(0).getFirstDataPageOffset();
        if (firstDataPage >= splitStart && firstDataPage < splitStart + splitLength) {
            rowGroupNums.add(i);
        }
        i++;
    }
    return rowGroupNums;
}
Also used: BlockMetaData (org.apache.parquet.hadoop.metadata.BlockMetaData)
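
A hedged standalone sketch of driving this selection rule follows. The class name SplitRowGroupDemo is hypothetical and the selection loop is copied in because the original method is private to HiveDrillNativeScanBatchCreator; readFooter is deprecated in newer parquet-mr but matches the version used here:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;

// Illustrative demo class, not part of Drill.
public class SplitRowGroupDemo {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path file = new Path(args[0]);
        long len = file.getFileSystem(conf).getFileStatus(file).getLen();

        // One split covering the whole file; a real InputFormat would
        // produce several splits with narrower (start, length) windows.
        FileSplit split = new FileSplit(file, 0, len, new String[0]);

        ParquetMetadata footer = ParquetFileReader.readFooter(conf, file);

        // Same rule as getRowGroupNumbersFromFileSplit above: test only the
        // first data page offset, so a row group that straddles a split
        // boundary is still claimed by exactly one split.
        List<Integer> rowGroups = new ArrayList<>();
        int i = 0;
        for (BlockMetaData block : footer.getBlocks()) {
            long firstDataPage = block.getColumns().get(0).getFirstDataPageOffset();
            if (firstDataPage >= split.getStart()
                    && firstDataPage < split.getStart() + split.getLength()) {
                rowGroups.add(i);
            }
            i++;
        }
        System.out.println("Row groups in split: " + rowGroups);
    }
}

Testing a single offset rather than the row group's full byte range is the design point worth noting: it guarantees that every row group is processed by exactly one task, with no duplicates and no gaps, even when row groups do not align with split boundaries.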

Aggregations

BlockMetaData (org.apache.parquet.hadoop.metadata.BlockMetaData): 7 usages
ArrayList (java.util.ArrayList): 4 usages
MessageType (org.apache.parquet.schema.MessageType): 4 usages
ParquetMetadata (org.apache.parquet.hadoop.metadata.ParquetMetadata): 3 usages
SchemaPath (org.apache.drill.common.expression.SchemaPath): 2 usages
Path (org.apache.hadoop.fs.Path): 2 usages
FilterCompat (org.apache.parquet.filter2.compat.FilterCompat): 2 usages
ColumnChunkMetaData (org.apache.parquet.hadoop.metadata.ColumnChunkMetaData): 2 usages
IOException (java.io.IOException): 1 usage
HashMap (java.util.HashMap): 1 usage
HashSet (java.util.HashSet): 1 usage
DrillRuntimeException (org.apache.drill.common.exceptions.DrillRuntimeException): 1 usage
ExecutionSetupException (org.apache.drill.common.exceptions.ExecutionSetupException): 1 usage
OutOfMemoryException (org.apache.drill.exec.exception.OutOfMemoryException): 1 usage
ParquetDirectByteBufferAllocator (org.apache.drill.exec.store.parquet.ParquetDirectByteBufferAllocator): 1 usage
ValueVector (org.apache.drill.exec.vector.ValueVector): 1 usage
VectorContainerWriter (org.apache.drill.exec.vector.complex.impl.VectorContainerWriter): 1 usage
Configuration (org.apache.hadoop.conf.Configuration): 1 usage
DataWritableReadSupport (org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport): 1 usage
FileSplit (org.apache.hadoop.mapred.FileSplit): 1 usage