Search in sources :

Example 1 with RichColumnDescriptor

use of io.prestosql.parquet.RichColumnDescriptor in project hetu-core by openlookeng.

the class PredicateUtils method getDictionaries.

private static Map<ColumnDescriptor, DictionaryDescriptor> getDictionaries(BlockMetaData blockMetadata, ParquetDataSource dataSource, Map<List<String>, RichColumnDescriptor> descriptorsByPath, TupleDomain<ColumnDescriptor> parquetTupleDomain) {
    ImmutableMap.Builder<ColumnDescriptor, DictionaryDescriptor> dictionaries = ImmutableMap.builder();
    for (ColumnChunkMetaData columnMetaData : blockMetadata.getColumns()) {
        RichColumnDescriptor descriptor = descriptorsByPath.get(Arrays.asList(columnMetaData.getPath().toArray()));
        if (descriptor != null) {
            if (isOnlyDictionaryEncodingPages(columnMetaData) && isColumnPredicate(descriptor, parquetTupleDomain)) {
                int totalSize = toIntExact(columnMetaData.getTotalSize());
                byte[] buffer = new byte[totalSize];
                dataSource.readFully(columnMetaData.getStartingPos(), buffer);
                Optional<DictionaryPage> dictionaryPage = readDictionaryPage(buffer, columnMetaData.getCodec());
                dictionaries.put(descriptor, new DictionaryDescriptor(descriptor, dictionaryPage));
                break;
            }
        }
    }
    return dictionaries.build();
}
Also used : ColumnChunkMetaData(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData) RichColumnDescriptor(io.prestosql.parquet.RichColumnDescriptor) RichColumnDescriptor(io.prestosql.parquet.RichColumnDescriptor) ColumnDescriptor(org.apache.parquet.column.ColumnDescriptor) ImmutableMap(com.google.common.collect.ImmutableMap) DictionaryPage(io.prestosql.parquet.DictionaryPage)

Example 2 with RichColumnDescriptor

use of io.prestosql.parquet.RichColumnDescriptor in project hetu-core by openlookeng.

the class ParquetPageSourceFactory method getParquetTupleDomain.

public static TupleDomain<ColumnDescriptor> getParquetTupleDomain(Map<List<String>, RichColumnDescriptor> descriptorsByPath, TupleDomain<HiveColumnHandle> effectivePredicate) {
    if (effectivePredicate.isNone()) {
        return TupleDomain.none();
    }
    ImmutableMap.Builder<ColumnDescriptor, Domain> predicate = ImmutableMap.builder();
    for (Entry<HiveColumnHandle, Domain> entry : effectivePredicate.getDomains().get().entrySet()) {
        HiveColumnHandle columnHandle = entry.getKey();
        // skip looking up predicates for complex types as Parquet only stores stats for primitives
        if (!columnHandle.getHiveType().getCategory().equals(PRIMITIVE)) {
            continue;
        }
        RichColumnDescriptor descriptor = descriptorsByPath.get(ImmutableList.of(columnHandle.getName()));
        if (descriptor != null) {
            predicate.put(descriptor, entry.getValue());
        }
    }
    return TupleDomain.withColumnDomains(predicate.build());
}
Also used : RichColumnDescriptor(io.prestosql.parquet.RichColumnDescriptor) RichColumnDescriptor(io.prestosql.parquet.RichColumnDescriptor) ColumnDescriptor(org.apache.parquet.column.ColumnDescriptor) Domain(io.prestosql.spi.predicate.Domain) TupleDomain(io.prestosql.spi.predicate.TupleDomain) ImmutableMap(com.google.common.collect.ImmutableMap) HiveColumnHandle(io.prestosql.plugin.hive.HiveColumnHandle)

Example 3 with RichColumnDescriptor

use of io.prestosql.parquet.RichColumnDescriptor in project hetu-core by openlookeng.

the class TestParquetPredicateUtils method testParquetTupleDomainMap.

@Test
public void testParquetTupleDomainMap() {
    HiveColumnHandle columnHandle = new HiveColumnHandle("my_map", HiveType.valueOf("map<int,int>"), parseTypeSignature(StandardTypes.MAP), 0, REGULAR, Optional.empty());
    MapType mapType = new MapType(INTEGER, INTEGER, methodHandle(TestParquetPredicateUtils.class, "throwUnsupportedOperationException"), methodHandle(TestParquetPredicateUtils.class, "throwUnsupportedOperationException"), methodHandle(TestParquetPredicateUtils.class, "throwUnsupportedOperationException"), methodHandle(TestParquetPredicateUtils.class, "throwUnsupportedOperationException"));
    TupleDomain<HiveColumnHandle> domain = withColumnDomains(ImmutableMap.of(columnHandle, Domain.notNull(mapType)));
    MessageType fileSchema = new MessageType("hive_schema", new GroupType(OPTIONAL, "my_map", new GroupType(REPEATED, "map", new PrimitiveType(REQUIRED, INT32, "key"), new PrimitiveType(OPTIONAL, INT32, "value"))));
    Map<List<String>, RichColumnDescriptor> descriptorsByPath = getDescriptors(fileSchema, fileSchema);
    TupleDomain<ColumnDescriptor> tupleDomain = ParquetPageSourceFactory.getParquetTupleDomain(descriptorsByPath, domain);
    assertTrue(tupleDomain.getDomains().get().isEmpty());
}
Also used : GroupType(org.apache.parquet.schema.GroupType) RichColumnDescriptor(io.prestosql.parquet.RichColumnDescriptor) RichColumnDescriptor(io.prestosql.parquet.RichColumnDescriptor) ColumnDescriptor(org.apache.parquet.column.ColumnDescriptor) PrimitiveType(org.apache.parquet.schema.PrimitiveType) ImmutableList(com.google.common.collect.ImmutableList) List(java.util.List) HiveColumnHandle(io.prestosql.plugin.hive.HiveColumnHandle) MapType(io.prestosql.spi.type.MapType) MessageType(org.apache.parquet.schema.MessageType) Test(org.testng.annotations.Test)

Example 4 with RichColumnDescriptor

use of io.prestosql.parquet.RichColumnDescriptor in project hetu-core by openlookeng.

the class TestParquetPredicateUtils method testParquetTupleDomainStructArray.

@Test
public void testParquetTupleDomainStructArray() {
    HiveColumnHandle columnHandle = new HiveColumnHandle("my_array_struct", HiveType.valueOf("array<struct<a:int>>"), parseTypeSignature(StandardTypes.ARRAY), 0, REGULAR, Optional.empty());
    RowType.Field rowField = new RowType.Field(Optional.of("a"), INTEGER);
    RowType rowType = RowType.from(ImmutableList.of(rowField));
    TupleDomain<HiveColumnHandle> domain = withColumnDomains(ImmutableMap.of(columnHandle, Domain.notNull(new ArrayType(rowType))));
    MessageType fileSchema = new MessageType("hive_schema", new GroupType(OPTIONAL, "my_array_struct", new GroupType(REPEATED, "bag", new GroupType(OPTIONAL, "array_element", new PrimitiveType(OPTIONAL, INT32, "a")))));
    Map<List<String>, RichColumnDescriptor> descriptorsByPath = getDescriptors(fileSchema, fileSchema);
    TupleDomain<ColumnDescriptor> tupleDomain = ParquetPageSourceFactory.getParquetTupleDomain(descriptorsByPath, domain);
    assertTrue(tupleDomain.getDomains().get().isEmpty());
}
Also used : RichColumnDescriptor(io.prestosql.parquet.RichColumnDescriptor) RichColumnDescriptor(io.prestosql.parquet.RichColumnDescriptor) ColumnDescriptor(org.apache.parquet.column.ColumnDescriptor) RowType(io.prestosql.spi.type.RowType) ArrayType(io.prestosql.spi.type.ArrayType) GroupType(org.apache.parquet.schema.GroupType) PrimitiveType(org.apache.parquet.schema.PrimitiveType) ImmutableList(com.google.common.collect.ImmutableList) List(java.util.List) HiveColumnHandle(io.prestosql.plugin.hive.HiveColumnHandle) MessageType(org.apache.parquet.schema.MessageType) Test(org.testng.annotations.Test)

Example 5 with RichColumnDescriptor

use of io.prestosql.parquet.RichColumnDescriptor in project hetu-core by openlookeng.

the class TestParquetPredicateUtils method testParquetTupleDomainPrimitive.

@Test
public void testParquetTupleDomainPrimitive() {
    HiveColumnHandle columnHandle = new HiveColumnHandle("my_primitive", HiveType.valueOf("bigint"), parseTypeSignature(StandardTypes.BIGINT), 0, REGULAR, Optional.empty());
    Domain singleValueDomain = Domain.singleValue(BIGINT, 123L);
    TupleDomain<HiveColumnHandle> domain = withColumnDomains(ImmutableMap.of(columnHandle, singleValueDomain));
    MessageType fileSchema = new MessageType("hive_schema", new PrimitiveType(OPTIONAL, INT64, "my_primitive"));
    Map<List<String>, RichColumnDescriptor> descriptorsByPath = getDescriptors(fileSchema, fileSchema);
    TupleDomain<ColumnDescriptor> tupleDomain = ParquetPageSourceFactory.getParquetTupleDomain(descriptorsByPath, domain);
    assertEquals(tupleDomain.getDomains().get().size(), 1);
    ColumnDescriptor descriptor = tupleDomain.getDomains().get().keySet().iterator().next();
    assertEquals(descriptor.getPath().length, 1);
    assertEquals(descriptor.getPath()[0], "my_primitive");
    Domain predicateDomain = Iterables.getOnlyElement(tupleDomain.getDomains().get().values());
    assertEquals(predicateDomain, singleValueDomain);
}
Also used : RichColumnDescriptor(io.prestosql.parquet.RichColumnDescriptor) RichColumnDescriptor(io.prestosql.parquet.RichColumnDescriptor) ColumnDescriptor(org.apache.parquet.column.ColumnDescriptor) PrimitiveType(org.apache.parquet.schema.PrimitiveType) ImmutableList(com.google.common.collect.ImmutableList) List(java.util.List) TupleDomain(io.prestosql.spi.predicate.TupleDomain) Domain(io.prestosql.spi.predicate.Domain) HiveColumnHandle(io.prestosql.plugin.hive.HiveColumnHandle) MessageType(org.apache.parquet.schema.MessageType) Test(org.testng.annotations.Test)

Aggregations

RichColumnDescriptor (io.prestosql.parquet.RichColumnDescriptor)12 ColumnDescriptor (org.apache.parquet.column.ColumnDescriptor)10 ImmutableList (com.google.common.collect.ImmutableList)7 HiveColumnHandle (io.prestosql.plugin.hive.HiveColumnHandle)7 List (java.util.List)6 MessageType (org.apache.parquet.schema.MessageType)6 Domain (io.prestosql.spi.predicate.Domain)5 TupleDomain (io.prestosql.spi.predicate.TupleDomain)5 PrimitiveType (org.apache.parquet.schema.PrimitiveType)5 Test (org.testng.annotations.Test)5 GroupType (org.apache.parquet.schema.GroupType)4 ImmutableMap (com.google.common.collect.ImmutableMap)3 Optional (java.util.Optional)2 PrimitiveColumnIO (org.apache.parquet.io.PrimitiveColumnIO)2 Preconditions.checkArgument (com.google.common.base.Preconditions.checkArgument)1 Strings.nullToEmpty (com.google.common.base.Strings.nullToEmpty)1 ImmutableSet (com.google.common.collect.ImmutableSet)1 DataSize (io.airlift.units.DataSize)1 AggregatedMemoryContext (io.prestosql.memory.context.AggregatedMemoryContext)1 AggregatedMemoryContext.newSimpleAggregatedMemoryContext (io.prestosql.memory.context.AggregatedMemoryContext.newSimpleAggregatedMemoryContext)1