Use of org.apache.parquet.hadoop.metadata.BlockMetaData in project h2o-3 by h2oai.
Class VecParquetReader, method initReader:
private void initReader() throws IOException {
  assert reader == null;
  // The row groups and schema come from the Parquet footer metadata
  List<BlockMetaData> blocks = metadata.getBlocks();
  MessageType fileSchema = metadata.getFileMetaData().getSchema();
  reader = new InternalParquetRecordReader<>(new ChunkReadSupport(writer, chunkSchema));
  Configuration conf = VecFileSystem.makeConfiguration(vec);
  // Initialize the record reader over all row groups at the Vec-backed path
  reader.initialize(fileSchema, metadata.getFileMetaData().getKeyValueMetaData(), VecFileSystem.VEC_PATH, blocks, conf);
}
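For reference, here is a minimal sketch (not part of h2o-3) of how the ParquetMetadata footer and its BlockMetaData row groups consumed by initReader can be obtained with plain parquet-hadoop APIs. The class name FooterInspector and the file path are hypothetical, and ParquetFileReader.readFooter is deprecated in newer parquet-hadoop releases in favor of ParquetFileReader.open.

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;
import org.apache.parquet.schema.MessageType;

public class FooterInspector {

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path path = new Path("/tmp/example.parquet"); // hypothetical file

    // Read only the footer; this is the same ParquetMetadata shape that
    // initReader() receives as `metadata`.
    ParquetMetadata footer = ParquetFileReader.readFooter(conf, path);

    MessageType schema = footer.getFileMetaData().getSchema();
    List<BlockMetaData> blocks = footer.getBlocks();

    System.out.println("schema: " + schema);
    for (BlockMetaData block : blocks) {
      System.out.println("row group: rows=" + block.getRowCount()
          + ", totalBytes=" + block.getTotalByteSize());
    }
  }
}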
Use of org.apache.parquet.hadoop.metadata.BlockMetaData in project drill by apache.
Class HiveDrillNativeScanBatchCreator, method getRowGroupNumbersFromFileSplit:
/**
 * Get the list of row group numbers for the given file input split. The logic used here is the same as how Hive's
 * Parquet input format finds the row group numbers for an input split.
 */
private List<Integer> getRowGroupNumbersFromFileSplit(final FileSplit split, final ParquetMetadata footer) throws IOException {
  final List<BlockMetaData> blocks = footer.getBlocks();
  final long splitStart = split.getStart();
  final long splitLength = split.getLength();

  final List<Integer> rowGroupNums = Lists.newArrayList();
  int i = 0;
  for (final BlockMetaData block : blocks) {
    // A row group belongs to this split if its first data page starts within the split's byte range
    final long firstDataPage = block.getColumns().get(0).getFirstDataPageOffset();
    if (firstDataPage >= splitStart && firstDataPage < splitStart + splitLength) {
      rowGroupNums.add(i);
    }
    i++;
  }
  return rowGroupNums;
}
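As a follow-up, here is a minimal sketch (not from Drill) that isolates the same overlap test without the Hadoop FileSplit dependency: a row group is assigned to a split when its first data page offset falls inside the split's byte range. The class RowGroupFilter and method blocksForSplit are hypothetical names.

import java.util.ArrayList;
import java.util.List;

import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;

public class RowGroupFilter {

  // Returns the row groups whose first data page starts inside
  // [splitStart, splitStart + splitLength), mirroring the check in
  // getRowGroupNumbersFromFileSplit above.
  public static List<BlockMetaData> blocksForSplit(ParquetMetadata footer,
                                                   long splitStart,
                                                   long splitLength) {
    final List<BlockMetaData> selected = new ArrayList<>();
    for (final BlockMetaData block : footer.getBlocks()) {
      final long firstDataPage = block.getColumns().get(0).getFirstDataPageOffset();
      if (firstDataPage >= splitStart && firstDataPage < splitStart + splitLength) {
        selected.add(block);
      }
    }
    return selected;
  }
}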