Example 11 with BloomFilter

Use of io.trino.orc.metadata.statistics.BloomFilter in project trino by trinodb.

In the class SliceDictionaryColumnWriter, method getBloomFilters:

@Override
public List<StreamDataOutput> getBloomFilters(CompressedMetadataWriter metadataWriter) throws IOException {
    // collect the bloom filter of every row group that has one
    List<BloomFilter> bloomFilters = rowGroups.stream()
            .map(rowGroup -> rowGroup.getColumnStatistics().getBloomFilter())
            .filter(Objects::nonNull)
            .collect(toImmutableList());
    if (!bloomFilters.isEmpty()) {
        Slice slice = metadataWriter.writeBloomFilters(bloomFilters);
        Stream stream = new Stream(columnId, StreamKind.BLOOM_FILTER_UTF8, slice.length(), false);
        return ImmutableList.of(new StreamDataOutput(slice, stream));
    }
    return ImmutableList.of();
}
Also used : Slice(io.airlift.slice.Slice) PresentOutputStream(io.trino.orc.stream.PresentOutputStream) Stream(io.trino.orc.metadata.Stream) LongOutputStream(io.trino.orc.stream.LongOutputStream) LongOutputStream.createLengthOutputStream(io.trino.orc.stream.LongOutputStream.createLengthOutputStream) ByteArrayOutputStream(io.trino.orc.stream.ByteArrayOutputStream) StreamDataOutput(io.trino.orc.stream.StreamDataOutput) BloomFilter(io.trino.orc.metadata.statistics.BloomFilter)

Example 12 with BloomFilter

Use of io.trino.orc.metadata.statistics.BloomFilter in project trino by trinodb.

In the class SliceDirectColumnWriter, method getBloomFilters:

@Override
public List<StreamDataOutput> getBloomFilters(CompressedMetadataWriter metadataWriter) throws IOException {
    // collect the bloom filter of every row group that has one
    List<BloomFilter> bloomFilters = rowGroupColumnStatistics.stream()
            .map(ColumnStatistics::getBloomFilter)
            .filter(Objects::nonNull)
            .collect(toImmutableList());
    if (!bloomFilters.isEmpty()) {
        Slice slice = metadataWriter.writeBloomFilters(bloomFilters);
        Stream stream = new Stream(columnId, StreamKind.BLOOM_FILTER_UTF8, slice.length(), false);
        return ImmutableList.of(new StreamDataOutput(slice, stream));
    }
    return ImmutableList.of();
}
Also used : ColumnStatistics(io.trino.orc.metadata.statistics.ColumnStatistics) Slice(io.airlift.slice.Slice) PresentOutputStream(io.trino.orc.stream.PresentOutputStream) Stream(io.trino.orc.metadata.Stream) LongOutputStream(io.trino.orc.stream.LongOutputStream) LongOutputStream.createLengthOutputStream(io.trino.orc.stream.LongOutputStream.createLengthOutputStream) ByteArrayOutputStream(io.trino.orc.stream.ByteArrayOutputStream) StreamDataOutput(io.trino.orc.stream.StreamDataOutput) BloomFilter(io.trino.orc.metadata.statistics.BloomFilter)

Example 13 with BloomFilter

Use of io.trino.orc.metadata.statistics.BloomFilter in project trino by trinodb.

In the class StripeReader, method readColumnIndexes:

private Map<StreamId, List<RowGroupIndex>> readColumnIndexes(Map<StreamId, Stream> streams, Map<StreamId, OrcChunkLoader> streamsData, Map<OrcColumnId, List<BloomFilter>> bloomFilterIndexes) throws IOException {
    ImmutableMap.Builder<StreamId, List<RowGroupIndex>> columnIndexes = ImmutableMap.builder();
    for (Entry<StreamId, Stream> entry : streams.entrySet()) {
        Stream stream = entry.getValue();
        if (stream.getStreamKind() == ROW_INDEX) {
            OrcInputStream inputStream = new OrcInputStream(streamsData.get(entry.getKey()));
            List<BloomFilter> bloomFilters = bloomFilterIndexes.get(entry.getKey().getColumnId());
            List<RowGroupIndex> rowGroupIndexes = metadataReader.readRowIndexes(hiveWriterVersion, inputStream);
            // pair each row group index with the bloom filter at the same position
            if (bloomFilters != null && !bloomFilters.isEmpty()) {
                ImmutableList.Builder<RowGroupIndex> newRowGroupIndexes = ImmutableList.builder();
                for (int i = 0; i < rowGroupIndexes.size(); i++) {
                    RowGroupIndex rowGroupIndex = rowGroupIndexes.get(i);
                    ColumnStatistics columnStatistics = rowGroupIndex.getColumnStatistics().withBloomFilter(bloomFilters.get(i));
                    newRowGroupIndexes.add(new RowGroupIndex(rowGroupIndex.getPositions(), columnStatistics));
                }
                rowGroupIndexes = newRowGroupIndexes.build();
            }
            columnIndexes.put(entry.getKey(), rowGroupIndexes);
        }
    }
    return columnIndexes.buildOrThrow();
}
Also used : ColumnStatistics(io.trino.orc.metadata.statistics.ColumnStatistics) OrcInputStream(io.trino.orc.stream.OrcInputStream) ImmutableList(com.google.common.collect.ImmutableList) ImmutableMap(com.google.common.collect.ImmutableMap) ImmutableMap.toImmutableMap(com.google.common.collect.ImmutableMap.toImmutableMap) BloomFilter(io.trino.orc.metadata.statistics.BloomFilter) StreamCheckpoint(io.trino.orc.checkpoint.StreamCheckpoint) Checkpoints.getDictionaryStreamCheckpoint(io.trino.orc.checkpoint.Checkpoints.getDictionaryStreamCheckpoint) RowGroupIndex(io.trino.orc.metadata.RowGroupIndex) List(java.util.List) ArrayList(java.util.ArrayList) Stream(io.trino.orc.metadata.Stream) ValueInputStream(io.trino.orc.stream.ValueInputStream) InputStream(java.io.InputStream)
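
The positional pairing in readColumnIndexes (the i-th bloom filter is attached to the i-th row group index) can be sketched in isolation. This is a simplified illustration, not Trino code: RowGroupMergeSketch, its nested RowGroup class, and attach are hypothetical names, the statistics and filters are stand-in strings, and the explicit size check is an assumption added here that the original method does not perform.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are part of Trino.
final class RowGroupMergeSketch {
    // Immutable stand-in for a row group index entry.
    static final class RowGroup {
        final String statistics;
        final String bloomFilter; // null when no filter is attached

        RowGroup(String statistics, String bloomFilter) {
            this.statistics = statistics;
            this.bloomFilter = bloomFilter;
        }

        // Returns a copy with the filter set, leaving this instance unchanged.
        RowGroup withBloomFilter(String filter) {
            return new RowGroup(statistics, filter);
        }
    }

    // Attaches bloomFilters.get(i) to rowGroups.get(i); returns the input
    // list untouched when there are no filters to attach.
    static List<RowGroup> attach(List<RowGroup> rowGroups, List<String> bloomFilters) {
        if (bloomFilters == null || bloomFilters.isEmpty()) {
            return rowGroups;
        }
        // Assumption made for this sketch: fail fast on a length mismatch.
        if (bloomFilters.size() != rowGroups.size()) {
            throw new IllegalArgumentException("expected one bloom filter per row group");
        }
        List<RowGroup> merged = new ArrayList<>(rowGroups.size());
        for (int i = 0; i < rowGroups.size(); i++) {
            merged.add(rowGroups.get(i).withBloomFilter(bloomFilters.get(i)));
        }
        return List.copyOf(merged);
    }
}
```

Because each entry is copied rather than mutated, the original list stays valid for any caller that still holds it, which mirrors how the method above builds a new list of RowGroupIndex objects.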

Example 14 with BloomFilter

Use of io.trino.orc.metadata.statistics.BloomFilter in project trino by trinodb.

In the class TupleDomainOrcPredicate, method columnOverlaps:

private boolean columnOverlaps(Domain predicateDomain, long numberOfRows, ColumnStatistics columnStatistics) {
    Domain stripeDomain = getDomain(predicateDomain.getType(), numberOfRows, columnStatistics);
    if (!stripeDomain.overlaps(predicateDomain)) {
        // there is no overlap between the predicate and this column
        return false;
    }
    // if bloom filters are not enabled, we cannot restrict the range overlap
    if (!orcBloomFiltersEnabled) {
        return true;
    }
    // if there is an overlap in null values, the bloom filter cannot eliminate the overlap
    if (predicateDomain.isNullAllowed() && stripeDomain.isNullAllowed()) {
        return true;
    }
    // extract the discrete values from the predicate
    Optional<Collection<Object>> discreteValues = extractDiscreteValues(predicateDomain.getValues());
    if (discreteValues.isEmpty()) {
        // values are not discrete, so we can't exclude this section
        return true;
    }
    BloomFilter bloomFilter = columnStatistics.getBloomFilter();
    if (bloomFilter == null) {
        // no bloom filter so we can't exclude this section
        return true;
    }
    // if none of the discrete predicate values are found in the bloom filter, there is no overlap and the section should be skipped
    return discreteValues.get().stream()
            .anyMatch(value -> checkInBloomFilter(bloomFilter, value, stripeDomain.getType()));
}
Also used : Collection(java.util.Collection) Domain(io.trino.spi.predicate.Domain) BloomFilter(io.trino.orc.metadata.statistics.BloomFilter)
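
The final step of columnOverlaps relies on a basic Bloom filter property: a lookup can report a false positive but never a false negative, so a section can be skipped only when every predicate value is reported absent. The sketch below illustrates that decision with a toy filter over strings. It is a minimal illustration under stated assumptions: BloomFilterPruningSketch, its hashing scheme, and sectionMayMatch are hypothetical and are not Trino's BloomFilter or checkInBloomFilter.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.List;

// Hypothetical toy Bloom filter; not io.trino.orc.metadata.statistics.BloomFilter.
final class BloomFilterPruningSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomFilterPruningSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Sets one bit per hash function for the given value.
    void add(String value) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(value, i));
        }
    }

    // False means "definitely absent"; true means "possibly present".
    boolean mightContain(String value) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(value, i))) {
                return false;
            }
        }
        return true;
    }

    // Derives the i-th bit index from two base hashes (double hashing).
    // The hash choice is an assumption of this sketch.
    private int index(String value, int i) {
        int h1 = 0x811C9DC5; // FNV-1a offset basis
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h1 ^= (b & 0xFF);
            h1 *= 0x01000193; // FNV-1a prime
        }
        int h2 = value.hashCode() | 1; // force odd so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Mirrors the decision above: keep the section when there is no filter,
    // or when at least one predicate value might be present.
    static boolean sectionMayMatch(BloomFilterPruningSketch filter, List<String> predicateValues) {
        if (filter == null) {
            return true;
        }
        return predicateValues.stream().anyMatch(filter::mightContain);
    }
}
```

For instance, after adding "alice" and "bob" to a filter, sectionMayMatch returns true for the values "zzz" and "bob" because "bob" is always reported as possibly present, while a filter with nothing added rejects every value and lets the section be skipped.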

Aggregations

BloomFilter (io.trino.orc.metadata.statistics.BloomFilter) 14
Slice (io.airlift.slice.Slice) 8
Stream (io.trino.orc.metadata.Stream) 7
ColumnStatistics (io.trino.orc.metadata.statistics.ColumnStatistics) 6
PresentOutputStream (io.trino.orc.stream.PresentOutputStream) 6
StreamDataOutput (io.trino.orc.stream.StreamDataOutput) 6
TupleDomainOrcPredicate.checkInBloomFilter (io.trino.orc.TupleDomainOrcPredicate.checkInBloomFilter) 5
Test (org.testng.annotations.Test) 5
ImmutableMap (com.google.common.collect.ImmutableMap) 3
LongOutputStream (io.trino.orc.stream.LongOutputStream) 3
ImmutableList (com.google.common.collect.ImmutableList) 2
OrcProto (io.trino.orc.proto.OrcProto) 2
CodedInputStream (io.trino.orc.protobuf.CodedInputStream) 2
ByteArrayOutputStream (io.trino.orc.stream.ByteArrayOutputStream) 2
LongOutputStream.createLengthOutputStream (io.trino.orc.stream.LongOutputStream.createLengthOutputStream) 2
RealType (io.trino.spi.type.RealType) 2
Type (io.trino.spi.type.Type) 2
InputStream (java.io.InputStream) 2
Map (java.util.Map) 2
ImmutableList.toImmutableList (com.google.common.collect.ImmutableList.toImmutableList) 1