
Example 6 with HiveBloomFilter

use of com.facebook.presto.orc.metadata.statistics.HiveBloomFilter in project presto by prestodb.

the class TestOrcBloomFilters method testHiveBloomFilterSerde.

@Test
public void testHiveBloomFilterSerde() {
    BloomFilter bloomFilter = new BloomFilter(1_000_000L, 0.05);
    // String
    bloomFilter.addString(TEST_STRING);
    assertTrue(bloomFilter.testString(TEST_STRING));
    assertFalse(bloomFilter.testString(TEST_STRING_NOT_WRITTEN));
    // Integer
    bloomFilter.addLong(TEST_INTEGER);
    assertTrue(bloomFilter.testLong(TEST_INTEGER));
    assertFalse(bloomFilter.testLong(TEST_INTEGER + 1));
    // Re-construct
    HiveBloomFilter hiveBloomFilter = new HiveBloomFilter(
            ImmutableList.copyOf(Longs.asList(bloomFilter.getBitSet())),
            bloomFilter.getBitSize(),
            bloomFilter.getNumHashFunctions());
    // String
    assertTrue(hiveBloomFilter.testString(TEST_STRING));
    assertFalse(hiveBloomFilter.testString(TEST_STRING_NOT_WRITTEN));
    // Integer
    assertTrue(hiveBloomFilter.testLong(TEST_INTEGER));
    assertFalse(hiveBloomFilter.testLong(TEST_INTEGER + 1));
}
Also used : HiveBloomFilter(com.facebook.presto.orc.metadata.statistics.HiveBloomFilter) TupleDomainOrcPredicate.checkInBloomFilter(com.facebook.presto.orc.TupleDomainOrcPredicate.checkInBloomFilter) BloomFilter(com.facebook.presto.orc.metadata.statistics.BloomFilter) Test(org.testng.annotations.Test)
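The round trip exercised by the test above (build a filter, then rebuild it from its bit set, bit size, and hash-function count) can be illustrated with a minimal, self-contained sketch. `ToyBloomFilter`, its hash functions, and its sizing formulas are assumptions made for illustration; this is not the Presto `BloomFilter` or `HiveBloomFilter` implementation, and its bit layout is not compatible with theirs.

```java
import java.nio.charset.StandardCharsets;

/** Toy Bloom filter for illustration only; NOT Presto's BloomFilter or HiveBloomFilter. */
final class ToyBloomFilter {
    private final long[] bitSet;
    private final int bitSize;
    private final int numHashFunctions;

    ToyBloomFilter(long expectedEntries, double fpp) {
        if (expectedEntries <= 0 || fpp <= 0 || fpp >= 1) {
            throw new IllegalArgumentException("expectedEntries must be > 0 and fpp in (0, 1)");
        }
        double ln2 = Math.log(2);
        // Textbook sizing: m = -n * ln(p) / (ln 2)^2, rounded up to a multiple of 64 bits.
        long bits = (long) Math.ceil(-expectedEntries * Math.log(fpp) / (ln2 * ln2));
        this.bitSize = Math.toIntExact(Math.max(64, (bits + 63) / 64 * 64));
        this.bitSet = new long[bitSize / 64];
        // Textbook hash count: k = (m / n) * ln 2, at least 1.
        this.numHashFunctions = Math.max(1, (int) Math.round((double) bitSize / expectedEntries * ln2));
    }

    /** Rebuilds a filter from previously exported state (the "serde" half of the round trip). */
    ToyBloomFilter(long[] bitSet, int bitSize, int numHashFunctions) {
        this.bitSet = bitSet.clone();
        this.bitSize = bitSize;
        this.numHashFunctions = numHashFunctions;
    }

    long[] getBitSet() { return bitSet.clone(); }
    int getBitSize() { return bitSize; }
    int getNumHashFunctions() { return numHashFunctions; }

    void addLong(long value) { visit(mix(value), true); }
    boolean testLong(long value) { return visit(mix(value), false); }
    void addString(String value) { visit(hashBytes(value), true); }
    boolean testString(String value) { return visit(hashBytes(value), false); }

    /** Double hashing: derive k bit positions from the two 32-bit halves of one 64-bit hash. */
    private boolean visit(long hash64, boolean set) {
        int h1 = (int) hash64;
        int h2 = (int) (hash64 >>> 32);
        for (int i = 1; i <= numHashFunctions; i++) {
            int combined = h1 + i * h2;
            if (combined < 0) {
                combined = ~combined; // flip to a non-negative value
            }
            int pos = combined % bitSize;
            long mask = 1L << (pos & 63);
            if (set) {
                bitSet[pos >>> 6] |= mask;
            } else if ((bitSet[pos >>> 6] & mask) == 0) {
                return false; // a required bit is unset: definitely not present
            }
        }
        return true; // all bits set: possibly present
    }

    /** 64-bit finalizer (splitmix64 constants), used here as a stand-in hash. */
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** FNV-1a over the UTF-8 bytes, followed by the same finalizer. */
    private static long hashBytes(String s) {
        long h = 0xCBF29CE484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x100000001B3L;
        }
        return mix(h);
    }
}
```

A filter rebuilt with `new ToyBloomFilter(f.getBitSet(), f.getBitSize(), f.getNumHashFunctions())` answers every `testLong` and `testString` call exactly as `f` does, which is the property the test above checks for `HiveBloomFilter`.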

Example 7 with HiveBloomFilter

use of com.facebook.presto.orc.metadata.statistics.HiveBloomFilter in project presto by prestodb.

the class TestOrcBloomFilters method testMatches.

@Test
// simulate a query on 2 columns where 1 is used as part of the where clause, with and without bloom filter
public void testMatches() {
    // stripe column
    Domain testingColumnHandleDomain = Domain.singleValue(BIGINT, 1234L);
    TupleDomain.ColumnDomain<String> column0 = new TupleDomain.ColumnDomain<>(COLUMN_0, testingColumnHandleDomain);
    // predicate consists of bigint_0 = 1234
    TupleDomain<String> effectivePredicate = TupleDomain.fromColumnDomains(Optional.of(ImmutableList.of(column0)));
    TupleDomain<String> emptyEffectivePredicate = TupleDomain.all();
    // predicate column references
    List<ColumnReference<String>> columnReferences = ImmutableList.<ColumnReference<String>>builder()
            .add(new ColumnReference<>(COLUMN_0, 0, BIGINT))
            .add(new ColumnReference<>(COLUMN_1, 1, BIGINT))
            .build();
    TupleDomainOrcPredicate<String> predicate = new TupleDomainOrcPredicate<>(effectivePredicate, columnReferences, true, Optional.empty());
    TupleDomainOrcPredicate<String> emptyPredicate = new TupleDomainOrcPredicate<>(emptyEffectivePredicate, columnReferences, true, Optional.empty());
    // assemble a matching and a non-matching bloom filter
    HiveBloomFilter hiveBloomFilter = new HiveBloomFilter(new BloomFilter(1000, 0.01));
    OrcProto.BloomFilter emptyOrcBloomFilter = toOrcBloomFilter(hiveBloomFilter);
    hiveBloomFilter.addLong(1234);
    OrcProto.BloomFilter orcBloomFilter = toOrcBloomFilter(hiveBloomFilter);
    Map<Integer, ColumnStatistics> matchingStatisticsByColumnIndex = ImmutableMap.of(
            0, new IntegerColumnStatistics(null, toHiveBloomFilter(orcBloomFilter), new IntegerStatistics(10L, 2000L, null)));
    Map<Integer, ColumnStatistics> nonMatchingStatisticsByColumnIndex = ImmutableMap.of(
            0, new IntegerColumnStatistics(null, toHiveBloomFilter(emptyOrcBloomFilter), new IntegerStatistics(10L, 2000L, null)));
    Map<Integer, ColumnStatistics> withoutBloomFilterStatisticsByColumnIndex = ImmutableMap.of(
            0, new IntegerColumnStatistics(null, null, new IntegerStatistics(10L, 2000L, null)));
    assertTrue(predicate.matches(1L, matchingStatisticsByColumnIndex));
    assertTrue(predicate.matches(1L, withoutBloomFilterStatisticsByColumnIndex));
    assertFalse(predicate.matches(1L, nonMatchingStatisticsByColumnIndex));
    assertTrue(emptyPredicate.matches(1L, matchingStatisticsByColumnIndex));
}
Also used : IntegerColumnStatistics(com.facebook.presto.orc.metadata.statistics.IntegerColumnStatistics) ColumnStatistics(com.facebook.presto.orc.metadata.statistics.ColumnStatistics) OrcProto(com.facebook.presto.orc.proto.OrcProto) HiveBloomFilter(com.facebook.presto.orc.metadata.statistics.HiveBloomFilter) TupleDomainOrcPredicate.checkInBloomFilter(com.facebook.presto.orc.TupleDomainOrcPredicate.checkInBloomFilter) BloomFilter(com.facebook.presto.orc.metadata.statistics.BloomFilter) TupleDomain(com.facebook.presto.common.predicate.TupleDomain) Domain(com.facebook.presto.common.predicate.Domain) ColumnReference(com.facebook.presto.orc.TupleDomainOrcPredicate.ColumnReference) IntegerStatistics(com.facebook.presto.orc.metadata.statistics.IntegerStatistics) Test(org.testng.annotations.Test)
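The four assertions in the test above are consistent with a simple decision rule: a value within the min/max statistics may be present unless a bloom filter rules it out, and a missing bloom filter can never rule anything out. The sketch below illustrates that rule only. `RowGroupPruningSketch` and `mightContain` are hypothetical names, a `LongPredicate` stands in for the bloom filter, and the ordering of the checks is an assumption rather than a description of how `TupleDomainOrcPredicate.matches` is actually implemented.

```java
import java.util.function.LongPredicate;

/** Hypothetical illustration of statistics-based pruning; not Presto code. */
final class RowGroupPruningSketch {
    /**
     * Returns false only when the statistics prove the value is absent.
     *
     * @param bloomFilter membership test standing in for a bloom filter, or null if none exists
     */
    static boolean mightContain(long value, long min, long max, LongPredicate bloomFilter) {
        if (value < min || value > max) {
            return false; // outside the min/max range: definitely absent
        }
        if (bloomFilter == null) {
            return true; // no bloom filter: cannot rule the value out
        }
        return bloomFilter.test(value); // false means definitely absent
    }
}
```

With statistics of min 10 and max 2000, `mightContain(1234L, 10L, 2000L, null)` returns true, mirroring the `withoutBloomFilterStatisticsByColumnIndex` case, while a filter that rejects 1234 makes it return false, mirroring the `nonMatchingStatisticsByColumnIndex` case.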

Example 8 with HiveBloomFilter

use of com.facebook.presto.orc.metadata.statistics.HiveBloomFilter in project presto by prestodb.

the class OrcMetadataReader method readRowIndexes.

@Override
public List<RowGroupIndex> readRowIndexes(HiveWriterVersion hiveWriterVersion, InputStream inputStream, List<HiveBloomFilter> bloomFilters) throws IOException {
    long cpuStart = THREAD_MX_BEAN.getCurrentThreadCpuTime();
    CodedInputStream input = CodedInputStream.newInstance(inputStream);
    OrcProto.RowIndex rowIndex = OrcProto.RowIndex.parseFrom(input);
    runtimeStats.addMetricValue("OrcReadRowIndexesTimeNanos", THREAD_MX_BEAN.getCurrentThreadCpuTime() - cpuStart);
    return IntStream.range(0, rowIndex.getEntryCount())
            .mapToObj(i -> toRowGroupIndex(
                    hiveWriterVersion,
                    rowIndex.getEntry(i),
                    bloomFilters == null || bloomFilters.isEmpty() ? null : bloomFilters.get(i)))
            .collect(toImmutableList());
}
Also used : ORIGINAL(com.facebook.presto.orc.metadata.PostScript.HiveWriterVersion.ORIGINAL) ThreadMXBean(com.sun.management.ThreadMXBean) DoubleStatistics(com.facebook.presto.orc.metadata.statistics.DoubleStatistics) OrcTypeKind(com.facebook.presto.orc.metadata.OrcType.OrcTypeKind) BinaryStatistics(com.facebook.presto.orc.metadata.statistics.BinaryStatistics) BigDecimal(java.math.BigDecimal) Slices(io.airlift.slice.Slices) Map(java.util.Map) SliceUtf8.tryGetCodePointAt(io.airlift.slice.SliceUtf8.tryGetCodePointAt) RuntimeStats(com.facebook.presto.common.RuntimeStats) OrcDataSource(com.facebook.presto.orc.OrcDataSource) StreamKind(com.facebook.presto.orc.metadata.Stream.StreamKind) ImmutableMap(com.google.common.collect.ImmutableMap) NONE(com.facebook.presto.orc.metadata.CompressionKind.NONE) ImmutableList.toImmutableList(com.google.common.collect.ImmutableList.toImmutableList) RowIndexEntry(com.facebook.presto.orc.proto.OrcProto.RowIndexEntry) ColumnStatistics.createColumnStatistics(com.facebook.presto.orc.metadata.statistics.ColumnStatistics.createColumnStatistics) ColumnStatistics(com.facebook.presto.orc.metadata.statistics.ColumnStatistics) Preconditions.checkState(com.google.common.base.Preconditions.checkState) DataSize(io.airlift.units.DataSize) List(java.util.List) ImmutableMap.toImmutableMap(com.google.common.collect.ImmutableMap.toImmutableMap) SHORT_DECIMAL_VALUE_BYTES(com.facebook.presto.orc.metadata.statistics.ShortDecimalStatisticsBuilder.SHORT_DECIMAL_VALUE_BYTES) BooleanStatistics(com.facebook.presto.orc.metadata.statistics.BooleanStatistics) CodedInputStream(com.facebook.presto.orc.protobuf.CodedInputStream) DecimalStatistics(com.facebook.presto.orc.metadata.statistics.DecimalStatistics) Optional(java.util.Optional) MIN_SUPPLEMENTARY_CODE_POINT(java.lang.Character.MIN_SUPPLEMENTARY_CODE_POINT) IntStream(java.util.stream.IntStream) Slice(io.airlift.slice.Slice) DwrfKeyProvider(com.facebook.presto.orc.DwrfKeyProvider) 
StringStatistics(com.facebook.presto.orc.metadata.statistics.StringStatistics) OptionalInt(java.util.OptionalInt) DateStatistics(com.facebook.presto.orc.metadata.statistics.DateStatistics) OptionalLong(java.util.OptionalLong) ZLIB(com.facebook.presto.orc.metadata.CompressionKind.ZLIB) GIGABYTE(io.airlift.units.DataSize.Unit.GIGABYTE) ImmutableList(com.google.common.collect.ImmutableList) HiveBloomFilter(com.facebook.presto.orc.metadata.statistics.HiveBloomFilter) HiveWriterVersion(com.facebook.presto.orc.metadata.PostScript.HiveWriterVersion) Objects.requireNonNull(java.util.Objects.requireNonNull) ManagementFactory(java.lang.management.ManagementFactory) Math.toIntExact(java.lang.Math.toIntExact) DwrfEncryptionProvider(com.facebook.presto.orc.DwrfEncryptionProvider) OrcDataSourceId(com.facebook.presto.orc.OrcDataSourceId) OrcDecompressor(com.facebook.presto.orc.OrcDecompressor) SliceUtf8.lengthOfCodePoint(io.airlift.slice.SliceUtf8.lengthOfCodePoint) ORC_HIVE_8732(com.facebook.presto.orc.metadata.PostScript.HiveWriterVersion.ORC_HIVE_8732) ColumnEncodingKind(com.facebook.presto.orc.metadata.ColumnEncoding.ColumnEncodingKind) StripeStatistics(com.facebook.presto.orc.metadata.statistics.StripeStatistics) SNAPPY(com.facebook.presto.orc.metadata.CompressionKind.SNAPPY) IOException(java.io.IOException) OrcProto(com.facebook.presto.orc.proto.OrcProto) IntegerStatistics(com.facebook.presto.orc.metadata.statistics.IntegerStatistics) LZ4(com.facebook.presto.orc.metadata.CompressionKind.LZ4) ByteString(com.facebook.presto.orc.protobuf.ByteString) VisibleForTesting(com.google.common.annotations.VisibleForTesting) InputStream(java.io.InputStream) ZSTD(com.facebook.presto.orc.metadata.CompressionKind.ZSTD)
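The last statement of `readRowIndexes` pairs each row-index entry with the bloom filter at the same position, falling back to null when no filters were supplied. A minimal sketch of that pairing pattern, using only the JDK, might look as follows. `IndexPairingSketch` and `pairByIndex` are hypothetical names, and `Map.Entry` is used merely as a convenient pair type; none of this is Presto API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical illustration of index-aligned pairing with an optional side list. */
final class IndexPairingSketch {
    /**
     * Pairs entries.get(i) with extras.get(i); the value is null for every
     * entry when extras is null or empty.
     */
    static <A, B> List<Map.Entry<A, B>> pairByIndex(List<A> entries, List<B> extras) {
        boolean noExtras = extras == null || extras.isEmpty();
        return IntStream.range(0, entries.size())
                .<Map.Entry<A, B>>mapToObj(i ->
                        new SimpleEntry<A, B>(entries.get(i), noExtras ? null : extras.get(i)))
                .collect(Collectors.toList());
    }
}
```

As in the original method, a non-empty `extras` list is assumed to be at least as long as `entries`; a shorter list would throw `IndexOutOfBoundsException`.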

Aggregations

HiveBloomFilter (com.facebook.presto.orc.metadata.statistics.HiveBloomFilter) 8
CodedInputStream (com.facebook.presto.orc.protobuf.CodedInputStream) 4
InputStream (java.io.InputStream) 4
RuntimeStats (com.facebook.presto.common.RuntimeStats) 3
TupleDomainOrcPredicate.checkInBloomFilter (com.facebook.presto.orc.TupleDomainOrcPredicate.checkInBloomFilter) 3
BloomFilter (com.facebook.presto.orc.metadata.statistics.BloomFilter) 3
OrcProto (com.facebook.presto.orc.proto.OrcProto) 3
ImmutableList (com.google.common.collect.ImmutableList) 3
Test (org.testng.annotations.Test) 3
Domain (com.facebook.presto.common.predicate.Domain) 2
TupleDomain (com.facebook.presto.common.predicate.TupleDomain) 2
DwrfEncryptionProvider (com.facebook.presto.orc.DwrfEncryptionProvider) 2
DwrfKeyProvider (com.facebook.presto.orc.DwrfKeyProvider) 2
OrcDataSource (com.facebook.presto.orc.OrcDataSource) 2
OrcDataSourceId (com.facebook.presto.orc.OrcDataSourceId) 2
OrcDecompressor (com.facebook.presto.orc.OrcDecompressor) 2
ColumnEncodingKind (com.facebook.presto.orc.metadata.ColumnEncoding.ColumnEncodingKind) 2
LZ4 (com.facebook.presto.orc.metadata.CompressionKind.LZ4) 2
NONE (com.facebook.presto.orc.metadata.CompressionKind.NONE) 2
SNAPPY (com.facebook.presto.orc.metadata.CompressionKind.SNAPPY) 2