Example 1 with ColumnStatistics

Use of com.facebook.presto.orc.metadata.ColumnStatistics in project presto by prestodb.

Class StripeReader, method selectRowGroups.

private Set<Integer> selectRowGroups(StripeInformation stripe, Map<Integer, List<RowGroupIndex>> columnIndexes) throws IOException {
    int rowsInStripe = toIntExact(stripe.getNumberOfRows());
    // Number of row groups in the stripe; the last one may hold fewer rows.
    int groupsInStripe = ceil(rowsInStripe, rowsInRowGroup);
    ImmutableSet.Builder<Integer> selectedRowGroups = ImmutableSet.builder();
    int remainingRows = rowsInStripe;
    for (int rowGroup = 0; rowGroup < groupsInStripe; ++rowGroup) {
        // Row count of this group: rowsInRowGroup, or whatever is left for the final group.
        int rows = Math.min(remainingRows, rowsInRowGroup);
        Map<Integer, ColumnStatistics> statistics = getRowGroupStatistics(types.get(0), columnIndexes, rowGroup);
        // Keep the row group only if the predicate matches its statistics.
        if (predicate.matches(rows, statistics)) {
            selectedRowGroups.add(rowGroup);
        }
        remainingRows -= rows;
    }
    return selectedRowGroups.build();
}
Also used : ColumnStatistics(com.facebook.presto.orc.metadata.ColumnStatistics) ImmutableSet(com.google.common.collect.ImmutableSet) Checkpoints.getDictionaryStreamCheckpoint(com.facebook.presto.orc.checkpoint.Checkpoints.getDictionaryStreamCheckpoint) StreamCheckpoint(com.facebook.presto.orc.checkpoint.StreamCheckpoint)
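
The selection loop above can be illustrated without the Presto classes. The following is a minimal sketch, assuming each row group carries only a min/max pair for a single long column and that the predicate is a closed range [low, high]. RowGroupPruningSketch, MinMax, and groupCount are hypothetical names introduced here for illustration; they are not part of the Presto ORC reader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of row-group pruning; not the Presto API.
final class RowGroupPruningSketch {
    // Stand-in for per-row-group column statistics: min and max of one long column.
    static final class MinMax {
        final long min;
        final long max;

        MinMax(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    // Ceiling division: how many row groups are needed to hold rowsInStripe rows.
    static int groupCount(int rowsInStripe, int rowsInRowGroup) {
        return (rowsInStripe + rowsInRowGroup - 1) / rowsInRowGroup;
    }

    // Returns the indexes of row groups whose [min, max] range overlaps [low, high].
    static List<Integer> selectRowGroups(List<MinMax> stats, long low, long high) {
        List<Integer> selected = new ArrayList<>();
        for (int rowGroup = 0; rowGroup < stats.size(); rowGroup++) {
            MinMax s = stats.get(rowGroup);
            if (s.max >= low && s.min <= high) {
                selected.add(rowGroup);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<MinMax> stats = List.of(new MinMax(0, 99), new MinMax(100, 199), new MinMax(200, 250));
        System.out.println(groupCount(25_000, 10_000)); // prints 3
        System.out.println(selectRowGroups(stats, 150, 220)); // prints [1, 2]
    }
}
```

Row group 0 is skipped because its maximum (99) lies below the requested range, so its rows never need to be read.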

Example 2 with ColumnStatistics

Use of com.facebook.presto.orc.metadata.ColumnStatistics in project presto by prestodb.

Class StripeReader, method getRowGroupStatistics.

private static Map<Integer, ColumnStatistics> getRowGroupStatistics(OrcType rootStructType, Map<Integer, List<RowGroupIndex>> columnIndexes, int rowGroup) {
    requireNonNull(rootStructType, "rootStructType is null");
    checkArgument(rootStructType.getOrcTypeKind() == OrcTypeKind.STRUCT);
    requireNonNull(columnIndexes, "columnIndexes is null");
    checkArgument(rowGroup >= 0, "rowGroup is negative");
    // columnIndexes is keyed by ORC type index; the result is keyed by field ordinal in the root struct.
    ImmutableMap.Builder<Integer, ColumnStatistics> statistics = ImmutableMap.builder();
    for (int ordinal = 0; ordinal < rootStructType.getFieldCount(); ordinal++) {
        List<RowGroupIndex> rowGroupIndexes = columnIndexes.get(rootStructType.getFieldTypeIndex(ordinal));
        // Fields without a row index are left out of the result.
        if (rowGroupIndexes != null) {
            statistics.put(ordinal, rowGroupIndexes.get(rowGroup).getColumnStatistics());
        }
    }
    return statistics.build();
}
Also used : ColumnStatistics(com.facebook.presto.orc.metadata.ColumnStatistics) RowGroupIndex(com.facebook.presto.orc.metadata.RowGroupIndex) ImmutableMap(com.google.common.collect.ImmutableMap) Checkpoints.getDictionaryStreamCheckpoint(com.facebook.presto.orc.checkpoint.Checkpoints.getDictionaryStreamCheckpoint) StreamCheckpoint(com.facebook.presto.orc.checkpoint.StreamCheckpoint)
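
The difference between a field ordinal and an ORC type index is easier to see with concrete numbers. ORC assigns type IDs by a pre-order walk of the schema, so for struct<a:int, b:struct<c:int, d:int>, e:string> the root is 0 and the top-level fields a, b, e get type indexes 1, 2, 5. The sketch below is a simplified stand-in, assuming statistics are plain strings; OrdinalMappingSketch and statisticsByOrdinal are hypothetical names, not Presto classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the ordinal-to-type-index lookup; not the Presto API.
final class OrdinalMappingSketch {
    // Maps each top-level field ordinal to the statistics of the requested row group,
    // skipping fields whose type index has no entry in columnIndexes.
    static Map<Integer, String> statisticsByOrdinal(
            int[] fieldTypeIndexes,
            Map<Integer, List<String>> columnIndexes,
            int rowGroup) {
        Map<Integer, String> result = new LinkedHashMap<>();
        for (int ordinal = 0; ordinal < fieldTypeIndexes.length; ordinal++) {
            List<String> perRowGroup = columnIndexes.get(fieldTypeIndexes[ordinal]);
            if (perRowGroup != null) {
                result.put(ordinal, perRowGroup.get(rowGroup));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Type indexes of the top-level fields a, b, e in the schema described above.
        int[] fieldTypeIndexes = {1, 2, 5};
        // Only a (type 1) and e (type 5) have row indexes in this made-up example.
        Map<Integer, List<String>> columnIndexes = Map.of(
                1, List.of("a:rg0", "a:rg1"),
                5, List.of("e:rg0", "e:rg1"));
        System.out.println(statisticsByOrdinal(fieldTypeIndexes, columnIndexes, 1)); // prints {0=a:rg1, 2=e:rg1}
    }
}
```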

Example 3 with ColumnStatistics

Use of com.facebook.presto.orc.metadata.ColumnStatistics in project presto by prestodb.

Class StripeReader, method readColumnIndexes.

private Map<Integer, List<RowGroupIndex>> readColumnIndexes(Map<StreamId, Stream> streams, Map<StreamId, OrcInputStream> streamsData, Map<Integer, List<HiveBloomFilter>> bloomFilterIndexes) throws IOException {
    ImmutableMap.Builder<Integer, List<RowGroupIndex>> columnIndexes = ImmutableMap.builder();
    for (Entry<StreamId, Stream> entry : streams.entrySet()) {
        Stream stream = entry.getValue();
        // Only ROW_INDEX streams are decoded here; other stream kinds are ignored.
        if (stream.getStreamKind() == ROW_INDEX) {
            OrcInputStream inputStream = streamsData.get(entry.getKey());
            List<HiveBloomFilter> bloomFilters = bloomFilterIndexes.get(stream.getColumn());
            List<RowGroupIndex> rowGroupIndexes = metadataReader.readRowIndexes(hiveWriterVersion, inputStream);
            if (bloomFilters != null && !bloomFilters.isEmpty()) {
                // Rebuild each row group index with its bloom filter attached to the statistics.
                // This assumes bloomFilters has an entry for every row group index.
                ImmutableList.Builder<RowGroupIndex> newRowGroupIndexes = ImmutableList.builder();
                for (int i = 0; i < rowGroupIndexes.size(); i++) {
                    RowGroupIndex rowGroupIndex = rowGroupIndexes.get(i);
                    ColumnStatistics columnStatistics = rowGroupIndex.getColumnStatistics().withBloomFilter(bloomFilters.get(i));
                    newRowGroupIndexes.add(new RowGroupIndex(rowGroupIndex.getPositions(), columnStatistics));
                }
                rowGroupIndexes = newRowGroupIndexes.build();
            }
            columnIndexes.put(stream.getColumn(), rowGroupIndexes);
        }
    }
    return columnIndexes.build();
}
Also used : ColumnStatistics(com.facebook.presto.orc.metadata.ColumnStatistics) OrcInputStream(com.facebook.presto.orc.stream.OrcInputStream) ImmutableList(com.google.common.collect.ImmutableList) ImmutableMap(com.google.common.collect.ImmutableMap) Checkpoints.getDictionaryStreamCheckpoint(com.facebook.presto.orc.checkpoint.Checkpoints.getDictionaryStreamCheckpoint) StreamCheckpoint(com.facebook.presto.orc.checkpoint.StreamCheckpoint) HiveBloomFilter(com.facebook.presto.orc.metadata.HiveBloomFilter) RowGroupIndex(com.facebook.presto.orc.metadata.RowGroupIndex) List(java.util.List) ValueStream(com.facebook.presto.orc.stream.ValueStream) Stream(com.facebook.presto.orc.metadata.Stream) InputStream(java.io.InputStream)
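
The inner loop above pairs the i-th row group index with the i-th bloom filter and builds a new immutable list instead of mutating the old one. A minimal sketch of that pattern follows, assuming statistics and bloom filters are reduced to a count and a string. BloomFilterAttachSketch, Stats, and attach are hypothetical names. The explicit size check is an addition made in this sketch and does not appear in the excerpt above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of attaching bloom filters to statistics; not the Presto API.
final class BloomFilterAttachSketch {
    // Immutable statistics holder; bloomFilter is null when none is attached.
    static final class Stats {
        final long valueCount;
        final String bloomFilter;

        Stats(long valueCount, String bloomFilter) {
            this.valueCount = valueCount;
            this.bloomFilter = bloomFilter;
        }

        // Returns a copy carrying the given bloom filter; the receiver is unchanged.
        Stats withBloomFilter(String filter) {
            return new Stats(valueCount, filter);
        }
    }

    // Pairs stats.get(i) with bloomFilters.get(i); returns stats unchanged when no filters exist.
    static List<Stats> attach(List<Stats> stats, List<String> bloomFilters) {
        if (bloomFilters == null || bloomFilters.isEmpty()) {
            return stats;
        }
        // Assumption of this sketch: fail fast when the two lists do not line up.
        if (bloomFilters.size() != stats.size()) {
            throw new IllegalArgumentException("expected one bloom filter per row group");
        }
        List<Stats> result = new ArrayList<>(stats.size());
        for (int i = 0; i < stats.size(); i++) {
            result.add(stats.get(i).withBloomFilter(bloomFilters.get(i)));
        }
        return Collections.unmodifiableList(result);
    }

    public static void main(String[] args) {
        List<Stats> stats = List.of(new Stats(10, null), new Stats(20, null));
        List<Stats> withFilters = attach(stats, List.of("bf0", "bf1"));
        System.out.println(withFilters.get(1).valueCount + " " + withFilters.get(1).bloomFilter);
    }
}
```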

Example 4 with ColumnStatistics

Use of com.facebook.presto.orc.metadata.ColumnStatistics in project presto by prestodb.

Class OrcRecordReader, method getStatisticsByColumnOrdinal.

private static Map<Integer, ColumnStatistics> getStatisticsByColumnOrdinal(OrcType rootStructType, List<ColumnStatistics> fileStats) {
    requireNonNull(rootStructType, "rootStructType is null");
    checkArgument(rootStructType.getOrcTypeKind() == OrcTypeKind.STRUCT);
    requireNonNull(fileStats, "fileStats is null");
    // fileStats is indexed by ORC type index; the result is keyed by field ordinal in the root struct.
    ImmutableMap.Builder<Integer, ColumnStatistics> statistics = ImmutableMap.builder();
    for (int ordinal = 0; ordinal < rootStructType.getFieldCount(); ordinal++) {
        ColumnStatistics element = fileStats.get(rootStructType.getFieldTypeIndex(ordinal));
        // Fields with no statistics are left out of the result.
        if (element != null) {
            statistics.put(ordinal, element);
        }
    }
    return statistics.build();
}
Also used : ColumnStatistics(com.facebook.presto.orc.metadata.ColumnStatistics) ImmutableMap(com.google.common.collect.ImmutableMap)

Aggregations

ColumnStatistics (com.facebook.presto.orc.metadata.ColumnStatistics): 4
Checkpoints.getDictionaryStreamCheckpoint (com.facebook.presto.orc.checkpoint.Checkpoints.getDictionaryStreamCheckpoint): 3
StreamCheckpoint (com.facebook.presto.orc.checkpoint.StreamCheckpoint): 3
ImmutableMap (com.google.common.collect.ImmutableMap): 3
RowGroupIndex (com.facebook.presto.orc.metadata.RowGroupIndex): 2
HiveBloomFilter (com.facebook.presto.orc.metadata.HiveBloomFilter): 1
Stream (com.facebook.presto.orc.metadata.Stream): 1
OrcInputStream (com.facebook.presto.orc.stream.OrcInputStream): 1
ValueStream (com.facebook.presto.orc.stream.ValueStream): 1
ImmutableList (com.google.common.collect.ImmutableList): 1
ImmutableSet (com.google.common.collect.ImmutableSet): 1
InputStream (java.io.InputStream): 1
List (java.util.List): 1