Search in sources:

Example 6 with AggregatedMemoryContext

Use of io.prestosql.memory.context.AggregatedMemoryContext in project hetu-core by openlookeng.

From the class TestBroadcastOutputBuffer, the method testSharedBufferBlockingNoBlockOnFull:

@Test
public void testSharedBufferBlockingNoBlockOnFull() {
    SettableFuture<?> blockedFuture = SettableFuture.create();
    MockMemoryReservationHandler reservationHandler = new MockMemoryReservationHandler(blockedFuture);
    AggregatedMemoryContext memoryContext = newRootAggregatedMemoryContext(reservationHandler, 0L);
    Page page = createPage(1);
    long pageSize = PAGES_SERDE.serialize(page).getRetainedSizeInBytes();
    // create a buffer that can only hold two pages
    BroadcastOutputBuffer buffer = createBroadcastBuffer(createInitialEmptyOutputBuffers(BROADCAST), new DataSize(pageSize * 2, BYTE), memoryContext, directExecutor());
    OutputBufferMemoryManager memoryManager = buffer.getMemoryManager();
    memoryManager.setNoBlockOnFull();
    // even if setNoBlockOnFull() is called, the buffer should block on memory when we add the first page
    // as no memory is available (MockMemoryReservationHandler will return a future that is not done)
    enqueuePage(buffer, page);
    // more memory is available
    blockedFuture.set(null);
    memoryManager.onMemoryAvailable();
    assertTrue(memoryManager.getBufferBlockedFuture().isDone(), "buffer shouldn't be blocked");
    // we should be able to add one more page after more memory is available
    addPage(buffer, page);
    // the buffer is full now, but setNoBlockOnFull() is called so the buffer shouldn't block
    addPage(buffer, page);
}
Also used : DataSize(io.airlift.units.DataSize) BufferTestUtils.createPage(io.prestosql.execution.buffer.BufferTestUtils.createPage) MarkerPage(io.prestosql.spi.snapshot.MarkerPage) BufferTestUtils.addPage(io.prestosql.execution.buffer.BufferTestUtils.addPage) BufferTestUtils.enqueuePage(io.prestosql.execution.buffer.BufferTestUtils.enqueuePage) Page(io.prestosql.spi.Page) AggregatedMemoryContext.newRootAggregatedMemoryContext(io.prestosql.memory.context.AggregatedMemoryContext.newRootAggregatedMemoryContext) AggregatedMemoryContext(io.prestosql.memory.context.AggregatedMemoryContext) AggregatedMemoryContext.newSimpleAggregatedMemoryContext(io.prestosql.memory.context.AggregatedMemoryContext.newSimpleAggregatedMemoryContext) Test(org.testng.annotations.Test)
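
The test above exercises the memory-context hierarchy from io.prestosql.memory.context: the root context is created with newRootAggregatedMemoryContext(reservationHandler, 0L), so the first reservation is gated by the handler's future. Below is a minimal sketch of how that hierarchy is typically used, assuming the standard interface (newSimpleAggregatedMemoryContext, newAggregatedMemoryContext, newLocalMemoryContext, setBytes); the class name and the allocation tag "example" are illustrative only.

import io.prestosql.memory.context.AggregatedMemoryContext;
import io.prestosql.memory.context.LocalMemoryContext;

import static io.prestosql.memory.context.AggregatedMemoryContext.newSimpleAggregatedMemoryContext;

public class MemoryContextSketch {
    public static void main(String[] args) {
        // Root context without a reservation handler: reservations are never blocked.
        AggregatedMemoryContext root = newSimpleAggregatedMemoryContext();
        // Child aggregated context, e.g. one per operator.
        AggregatedMemoryContext child = root.newAggregatedMemoryContext();
        // Leaf context that tracks a single allocation; the tag is illustrative.
        LocalMemoryContext local = child.newLocalMemoryContext("example");
        // setBytes returns a future that stays pending while the reservation is blocked
        // (with a real MemoryReservationHandler, as in the test above).
        local.setBytes(1024);
        // Reservations aggregate up the tree.
        System.out.println(child.getBytes() + " / " + root.getBytes());   // prints 1024 / 1024
        // Release: set back to zero and close the leaf context.
        local.setBytes(0);
        local.close();
    }
}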

Example 7 with AggregatedMemoryContext

Use of io.prestosql.memory.context.AggregatedMemoryContext in project boostkit-bigdata by kunpengcompute.

From the class ParquetPageSourceFactory, the method createParquetPushDownPageSource:

public HivePushDownPageSource createParquetPushDownPageSource(Path path, long start, long length, com.huawei.boostkit.omnidata.model.Predicate predicate) {
    AggregatedMemoryContext systemMemoryUsage = newSimpleAggregatedMemoryContext();
    Properties transProperties = new Properties();
    transProperties.put(OMNIDATA_CLIENT_TARGET_LIST, omniDataServerTarget);
    DataSource parquetPushDownDataSource = new com.huawei.boostkit.omnidata.model.datasource.hdfs.HdfsParquetDataSource(path.toString(), start, length, false);
    TaskSource readTaskInfo = new TaskSource(parquetPushDownDataSource, predicate, TaskSource.ONE_MEGABYTES);
    DataReader<Page> dataReader = DataReaderFactory.create(transProperties, readTaskInfo, new OpenLooKengDeserializer());
    return new HivePushDownPageSource(dataReader, systemMemoryUsage);
}
Also used : HdfsParquetDataSource.buildHdfsParquetDataSource(io.prestosql.plugin.hive.parquet.HdfsParquetDataSource.buildHdfsParquetDataSource) HivePushDownPageSource(io.prestosql.plugin.hive.HivePushDownPageSource) Page(io.prestosql.spi.Page) HiveSessionProperties(io.prestosql.plugin.hive.HiveSessionProperties) Properties(java.util.Properties) AggregatedMemoryContext(io.prestosql.memory.context.AggregatedMemoryContext) AggregatedMemoryContext.newSimpleAggregatedMemoryContext(io.prestosql.memory.context.AggregatedMemoryContext.newSimpleAggregatedMemoryContext) TaskSource(com.huawei.boostkit.omnidata.model.TaskSource) HdfsParquetDataSource.buildHdfsParquetDataSource(io.prestosql.plugin.hive.parquet.HdfsParquetDataSource.buildHdfsParquetDataSource) DataSource(com.huawei.boostkit.omnidata.model.datasource.DataSource) ParquetDataSource(io.prestosql.parquet.ParquetDataSource) OpenLooKengDeserializer(com.huawei.boostkit.omnidata.decode.impl.OpenLooKengDeserializer)
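
The returned HivePushDownPageSource is consumed like any ConnectorPageSource. A minimal sketch of the standard drain loop, assuming only the ConnectorPageSource interface from io.prestosql.spi.connector; the helper name and the positions counter are illustrative.

import io.prestosql.spi.Page;
import io.prestosql.spi.connector.ConnectorPageSource;

final class PageSourceDrain {
    private PageSourceDrain() {
    }

    // Drains a page source and returns the total number of positions read.
    static long drain(ConnectorPageSource pageSource) throws Exception {
        long positions = 0;
        try (ConnectorPageSource source = pageSource) {
            while (!source.isFinished()) {
                Page page = source.getNextPage();
                if (page == null) {
                    // the source is not ready yet; a real caller would yield instead of spinning
                    continue;
                }
                positions += page.getPositionCount();
            }
        }
        return positions;
    }
}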

Example 8 with AggregatedMemoryContext

Use of io.prestosql.memory.context.AggregatedMemoryContext in project boostkit-bigdata by kunpengcompute.

From the class ParquetPageSourceFactory, the method createParquetPageSource:

public static ParquetPageSource createParquetPageSource(HdfsEnvironment hdfsEnvironment, String user, Configuration configuration, Path path, long start, long length, long fileSize, Properties schema, List<HiveColumnHandle> columns, boolean useParquetColumnNames, boolean failOnCorruptedParquetStatistics, DataSize maxReadBlockSize, TypeManager typeManager, TupleDomain<HiveColumnHandle> effectivePredicate, FileFormatDataSourceStats stats, DateTimeZone timeZone) {
    AggregatedMemoryContext systemMemoryContext = newSimpleAggregatedMemoryContext();
    ParquetDataSource dataSource = null;
    DateTimeZone readerTimeZone = timeZone;
    try {
        FileSystem fileSystem = hdfsEnvironment.getFileSystem(user, path, configuration);
        FSDataInputStream inputStream = hdfsEnvironment.doAs(user, () -> fileSystem.open(path));
        ParquetMetadata parquetMetadata = MetadataReader.readFooter(inputStream, path, fileSize);
        FileMetaData fileMetaData = parquetMetadata.getFileMetaData();
        MessageType fileSchema = fileMetaData.getSchema();
        dataSource = buildHdfsParquetDataSource(inputStream, path, fileSize, stats);
        String writerTimeZoneId = fileMetaData.getKeyValueMetaData().get(WRITER_TIME_ZONE_KEY);
        if (writerTimeZoneId != null && !writerTimeZoneId.equalsIgnoreCase(readerTimeZone.getID())) {
            readerTimeZone = DateTimeZone.forID(writerTimeZoneId);
        }
        List<org.apache.parquet.schema.Type> fields = columns.stream().filter(column -> column.getColumnType() == REGULAR).map(column -> getParquetType(column, fileSchema, useParquetColumnNames)).filter(Objects::nonNull).collect(toList());
        MessageType requestedSchema = new MessageType(fileSchema.getName(), fields);
        ImmutableList.Builder<BlockMetaData> footerBlocks = ImmutableList.builder();
        for (BlockMetaData block : parquetMetadata.getBlocks()) {
            long firstDataPage = block.getColumns().get(0).getFirstDataPageOffset();
            if (firstDataPage >= start && firstDataPage < start + length) {
                footerBlocks.add(block);
            }
        }
        Map<List<String>, RichColumnDescriptor> descriptorsByPath = getDescriptors(fileSchema, requestedSchema);
        TupleDomain<ColumnDescriptor> parquetTupleDomain = getParquetTupleDomain(descriptorsByPath, effectivePredicate);
        Predicate parquetPredicate = buildPredicate(requestedSchema, parquetTupleDomain, descriptorsByPath);
        final ParquetDataSource finalDataSource = dataSource;
        ImmutableList.Builder<BlockMetaData> blocks = ImmutableList.builder();
        for (BlockMetaData block : footerBlocks.build()) {
            if (predicateMatches(parquetPredicate, block, finalDataSource, descriptorsByPath, parquetTupleDomain, failOnCorruptedParquetStatistics)) {
                blocks.add(block);
            }
        }
        MessageColumnIO messageColumnIO = getColumnIO(fileSchema, requestedSchema);
        ParquetReader parquetReader = new ParquetReader(Optional.ofNullable(fileMetaData.getCreatedBy()), messageColumnIO, blocks.build(), dataSource, readerTimeZone, systemMemoryContext, maxReadBlockSize);
        return new ParquetPageSource(parquetReader, fileSchema, messageColumnIO, typeManager, schema, columns, effectivePredicate, useParquetColumnNames);
    } catch (Exception e) {
        try {
            if (dataSource != null) {
                dataSource.close();
            }
        } catch (IOException ignored) {
        }
        if (e instanceof PrestoException) {
            throw (PrestoException) e;
        }
        if (e instanceof ParquetCorruptionException) {
            throw new PrestoException(HIVE_BAD_DATA, e);
        }
        if (nullToEmpty(e.getMessage()).trim().equals("Filesystem closed") || e instanceof FileNotFoundException) {
            throw new PrestoException(HIVE_CANNOT_OPEN_SPLIT, e);
        }
        String message = format("Error opening Hive split %s (offset=%s, length=%s): %s", path, start, length, e.getMessage());
        if (e instanceof BlockMissingException) {
            throw new PrestoException(HIVE_MISSING_DATA, message, e);
        }
        throw new PrestoException(HIVE_CANNOT_OPEN_SPLIT, message, e);
    }
}
Also used : DateTimeZone(org.joda.time.DateTimeZone) ParquetTypeUtils.getColumnIO(io.prestosql.parquet.ParquetTypeUtils.getColumnIO) HiveSessionProperties.isUseParquetColumnNames(io.prestosql.plugin.hive.HiveSessionProperties.isUseParquetColumnNames) FileSystem(org.apache.hadoop.fs.FileSystem) HivePartitionKey(io.prestosql.plugin.hive.HivePartitionKey) HiveColumnHandle(io.prestosql.plugin.hive.HiveColumnHandle) DataReader(com.huawei.boostkit.omnidata.reader.DataReader) BlockMissingException(org.apache.hadoop.hdfs.BlockMissingException) RichColumnDescriptor(io.prestosql.parquet.RichColumnDescriptor) ParquetTypeUtils.getParquetTypeByName(io.prestosql.parquet.ParquetTypeUtils.getParquetTypeByName) Preconditions.checkArgument(com.google.common.base.Preconditions.checkArgument) PredicateUtils.buildPredicate(io.prestosql.parquet.predicate.PredicateUtils.buildPredicate) ConnectorSession(io.prestosql.spi.connector.ConnectorSession) HiveConfig(io.prestosql.plugin.hive.HiveConfig) Configuration(org.apache.hadoop.conf.Configuration) Map(java.util.Map) AggregatedMemoryContext(io.prestosql.memory.context.AggregatedMemoryContext) Path(org.apache.hadoop.fs.Path) OMNIDATA_CLIENT_TARGET_LIST(com.huawei.boostkit.omnidata.transfer.OmniDataProperty.OMNIDATA_CLIENT_TARGET_LIST) FSDataInputStream(org.apache.hadoop.fs.FSDataInputStream) DataReaderFactory(com.huawei.boostkit.omnidata.reader.DataReaderFactory) HiveSessionProperties.isFailOnCorruptedParquetStatistics(io.prestosql.plugin.hive.HiveSessionProperties.isFailOnCorruptedParquetStatistics) PageSourceUtil.buildPushdownContext(io.prestosql.plugin.hive.util.PageSourceUtil.buildPushdownContext) PrestoException(io.prestosql.spi.PrestoException) HdfsParquetDataSource.buildHdfsParquetDataSource(io.prestosql.plugin.hive.parquet.HdfsParquetDataSource.buildHdfsParquetDataSource) ImmutableSet(com.google.common.collect.ImmutableSet) ImmutableMap(com.google.common.collect.ImmutableMap) Set(java.util.Set) HivePushDownPageSource(io.prestosql.plugin.hive.HivePushDownPageSource) MetadataReader(io.prestosql.parquet.reader.MetadataReader) FileFormatDataSourceStats(io.prestosql.plugin.hive.FileFormatDataSourceStats) FileNotFoundException(java.io.FileNotFoundException) String.format(java.lang.String.format) DataSource(com.huawei.boostkit.omnidata.model.datasource.DataSource) Objects(java.util.Objects) MessageType(org.apache.parquet.schema.MessageType) DataSize(io.airlift.units.DataSize) List(java.util.List) HiveOffloadExpression(io.prestosql.plugin.hive.HiveOffloadExpression) ConnectorPageSource(io.prestosql.spi.connector.ConnectorPageSource) HiveUtil.shouldUseRecordReaderFromInputFormat(io.prestosql.plugin.hive.HiveUtil.shouldUseRecordReaderFromInputFormat) HIVE_BAD_DATA(io.prestosql.plugin.hive.HiveErrorCode.HIVE_BAD_DATA) ColumnDescriptor(org.apache.parquet.column.ColumnDescriptor) BlockMetaData(org.apache.parquet.hadoop.metadata.BlockMetaData) PredicateUtils.predicateMatches(io.prestosql.parquet.predicate.PredicateUtils.predicateMatches) Domain(io.prestosql.spi.predicate.Domain) Entry(java.util.Map.Entry) Optional(java.util.Optional) IndexMetadata(io.prestosql.spi.heuristicindex.IndexMetadata) AggregatedMemoryContext.newSimpleAggregatedMemoryContext(io.prestosql.memory.context.AggregatedMemoryContext.newSimpleAggregatedMemoryContext) SplitMetadata(io.prestosql.spi.heuristicindex.SplitMetadata) MessageColumnIO(org.apache.parquet.io.MessageColumnIO) HIVE_CANNOT_OPEN_SPLIT(io.prestosql.plugin.hive.HiveErrorCode.HIVE_CANNOT_OPEN_SPLIT) 
Strings.nullToEmpty(com.google.common.base.Strings.nullToEmpty) ParquetReader(io.prestosql.parquet.reader.ParquetReader) HiveSessionProperties(io.prestosql.plugin.hive.HiveSessionProperties) ParquetTypeUtils.getDescriptors(io.prestosql.parquet.ParquetTypeUtils.getDescriptors) OptionalInt(java.util.OptionalInt) TaskSource(com.huawei.boostkit.omnidata.model.TaskSource) Predicate(io.prestosql.parquet.predicate.Predicate) Inject(javax.inject.Inject) HiveUtil.getDeserializerClassName(io.prestosql.plugin.hive.HiveUtil.getDeserializerClassName) HdfsEnvironment(io.prestosql.plugin.hive.HdfsEnvironment) HIVE_MISSING_DATA(io.prestosql.plugin.hive.HiveErrorCode.HIVE_MISSING_DATA) REGULAR(io.prestosql.plugin.hive.HiveColumnHandle.ColumnType.REGULAR) ImmutableList(com.google.common.collect.ImmutableList) Objects.requireNonNull(java.util.Objects.requireNonNull) DynamicFilterSupplier(io.prestosql.spi.dynamicfilter.DynamicFilterSupplier) HivePageSourceFactory(io.prestosql.plugin.hive.HivePageSourceFactory) Properties(java.util.Properties) DeleteDeltaLocations(io.prestosql.plugin.hive.DeleteDeltaLocations) TupleDomain(io.prestosql.spi.predicate.TupleDomain) TypeManager(io.prestosql.spi.type.TypeManager) Page(io.prestosql.spi.Page) IOException(java.io.IOException) PRIMITIVE(org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category.PRIMITIVE) ParquetDataSource(io.prestosql.parquet.ParquetDataSource) Collectors.toList(java.util.stream.Collectors.toList) HiveSessionProperties.getParquetMaxReadBlockSize(io.prestosql.plugin.hive.HiveSessionProperties.getParquetMaxReadBlockSize) ParquetCorruptionException(io.prestosql.parquet.ParquetCorruptionException) FileMetaData(org.apache.parquet.hadoop.metadata.FileMetaData) ParquetMetadata(org.apache.parquet.hadoop.metadata.ParquetMetadata) OpenLooKengDeserializer(com.huawei.boostkit.omnidata.decode.impl.OpenLooKengDeserializer)
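
The split-to-row-group mapping in createParquetPageSource follows a simple rule: a row group is read only when the first data page offset of its first column falls inside the split range [start, start + length). The same check, restated as a standalone helper over the Parquet metadata API already used above (the class and method names are illustrative):

import java.util.List;
import com.google.common.collect.ImmutableList;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;

final class RowGroupPruning {
    private RowGroupPruning() {
    }

    // Keeps only the row groups whose first data page starts inside this split.
    static List<BlockMetaData> blocksForSplit(ParquetMetadata parquetMetadata, long start, long length) {
        ImmutableList.Builder<BlockMetaData> selected = ImmutableList.builder();
        for (BlockMetaData block : parquetMetadata.getBlocks()) {
            long firstDataPage = block.getColumns().get(0).getFirstDataPageOffset();
            if (firstDataPage >= start && firstDataPage < start + length) {
                selected.add(block);
            }
        }
        return selected.build();
    }
}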

Example 9 with AggregatedMemoryContext

Use of io.prestosql.memory.context.AggregatedMemoryContext in project hetu-core by openlookeng.

From the class OrcSelectiveRecordReader, the method createColumnReaders:

public SelectiveColumnReader[] createColumnReaders(List<OrcColumn> fileColumns, AggregatedMemoryContext systemMemoryContext, OrcBlockFactory blockFactory, OrcCacheStore orcCacheStore, OrcCacheProperties orcCacheProperties, OrcPredicate predicate, Map<Integer, TupleDomainFilter> filters, DateTimeZone hiveStorageTimeZone, List<Integer> outputColumns, Map<Integer, Type> includedColumns, ColumnMetadata<OrcType> orcTypes, boolean useDataCache) throws OrcCorruptionException {
    int fieldCount = orcTypes.get(OrcColumnId.ROOT_COLUMN).getFieldCount();
    SelectiveColumnReader[] columnReaders = new SelectiveColumnReader[fieldCount];
    colReaderWithFilter = new IntArraySet();
    colReaderWithORFilter = new IntArraySet();
    colReaderWithoutFilter = new IntArraySet();
    IntArraySet remainingColumns = new IntArraySet();
    remainingColumns.addAll(includedColumns.keySet());
    for (int i = 0; i < fieldCount; i++) {
        // create column reader only for columns which are part of projection and filter.
        if (includedColumns.containsKey(i)) {
            int columnIndex = i;
            OrcColumn column = fileColumns.get(columnIndex);
            boolean outputRequired = outputColumns.contains(i);
            SelectiveColumnReader columnReader = null;
            if (useDataCache && orcCacheProperties.isRowDataCacheEnabled()) {
                ColumnReader cr = ColumnReaders.createColumnReader(includedColumns.get(i), column, systemMemoryContext, blockFactory.createNestedBlockFactory(block -> blockLoaded(columnIndex, block)));
                columnReader = SelectiveColumnReaders.wrapWithDataCachingStreamReader(cr, column, orcCacheStore.getRowDataCache());
            } else {
                columnReader = createColumnReader(orcTypes.get(column.getColumnId()), column, Optional.ofNullable(filters.get(i)), outputRequired ? Optional.of(includedColumns.get(i)) : Optional.empty(), hiveStorageTimeZone, systemMemoryContext);
                if (orcCacheProperties.isRowDataCacheEnabled()) {
                    columnReader = SelectiveColumnReaders.wrapWithResultCachingStreamReader(columnReader, column, predicate, orcCacheStore.getRowDataCache());
                }
            }
            columnReaders[columnIndex] = columnReader;
            if (filters.get(i) != null) {
                colReaderWithFilter.add(columnIndex);
            } else if (disjuctFilters.get(i) != null && disjuctFilters.get(i).size() > 0) {
                colReaderWithORFilter.add(columnIndex);
            } else {
                colReaderWithoutFilter.add(columnIndex);
            }
            remainingColumns.remove(columnIndex);
        }
    }
    /* any columns still remaining should have colIdx < 0; add them to the filter sets */
    remainingColumns.removeAll(missingColumns);
    for (Integer col : remainingColumns) {
        if (col < 0) {
            /* should be always true! */
            if (filters.get(col) != null) {
                colReaderWithFilter.add(col);
            } else if (disjuctFilters.get(col) != null && disjuctFilters.get(col).size() > 0) {
                colReaderWithORFilter.add(col);
            }
        }
    }
    // special handling for the ALTER TABLE ADD COLUMN case (missing columns):
    for (int missingColumn : missingColumns) {
        if (filters.get(missingColumn) != null) {
            colReaderWithFilter.add(missingColumn);
        } else if (disjuctFilters.get(missingColumn) != null && disjuctFilters.get(missingColumn).size() > 0) {
            colReaderWithORFilter.add(missingColumn);
        }
    }
    return columnReaders;
}
Also used : IntStream(java.util.stream.IntStream) StripeStatistics(io.prestosql.orc.metadata.statistics.StripeStatistics) DateTimeZone(org.joda.time.DateTimeZone) Arrays(java.util.Arrays) Slice(io.airlift.slice.Slice) Logger(io.airlift.log.Logger) OrcColumnId(io.prestosql.orc.metadata.OrcColumnId) RunLengthEncodedBlock(io.prestosql.spi.block.RunLengthEncodedBlock) TypeNotFoundException(io.prestosql.spi.type.TypeNotFoundException) PeekingIterator(com.google.common.collect.PeekingIterator) Function(java.util.function.Function) PostScript(io.prestosql.orc.metadata.PostScript) ArrayList(java.util.ArrayList) Preconditions.checkArgument(com.google.common.base.Preconditions.checkArgument) Map(java.util.Map) Objects.requireNonNull(java.util.Objects.requireNonNull) AggregatedMemoryContext(io.prestosql.memory.context.AggregatedMemoryContext) Type(io.prestosql.spi.type.Type) Math.toIntExact(java.lang.Math.toIntExact) SelectiveColumnReaders(io.prestosql.orc.reader.SelectiveColumnReaders) Block(io.prestosql.spi.block.Block) ColumnReaders(io.prestosql.orc.reader.ColumnReaders) SelectiveColumnReader(io.prestosql.orc.reader.SelectiveColumnReader) OrcType(io.prestosql.orc.metadata.OrcType) IntArraySet(it.unimi.dsi.fastutil.ints.IntArraySet) Set(java.util.Set) Page(io.prestosql.spi.Page) IOException(java.io.IOException) ColumnMetadata(io.prestosql.orc.metadata.ColumnMetadata) ColumnReader(io.prestosql.orc.reader.ColumnReader) MetadataReader(io.prestosql.orc.metadata.MetadataReader) StripeInformation(io.prestosql.orc.metadata.StripeInformation) DataSize(io.airlift.units.DataSize) List(java.util.List) SelectiveColumnReaders.createColumnReader(io.prestosql.orc.reader.SelectiveColumnReaders.createColumnReader) Domain(io.prestosql.spi.predicate.Domain) ColumnStatistics(io.prestosql.orc.metadata.statistics.ColumnStatistics) Optional(java.util.Optional) BitSet(java.util.BitSet) IndexMetadata(io.prestosql.spi.heuristicindex.IndexMetadata) SelectiveColumnReader(io.prestosql.orc.reader.SelectiveColumnReader) IntArraySet(it.unimi.dsi.fastutil.ints.IntArraySet) SelectiveColumnReader(io.prestosql.orc.reader.SelectiveColumnReader) ColumnReader(io.prestosql.orc.reader.ColumnReader) SelectiveColumnReaders.createColumnReader(io.prestosql.orc.reader.SelectiveColumnReaders.createColumnReader)
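
createColumnReaders partitions the included column indices into three sets: columns with a conjunctive filter, columns with only disjunctive (OR) filters, and columns with no filter. A simplified restatement of that classification, using plain collections; the filter maps and their value types are stand-ins for the reader's fields (TupleDomainFilter and the disjunct-filter list) and are illustrative only.

import java.util.List;
import java.util.Map;
import it.unimi.dsi.fastutil.ints.IntArraySet;

final class ColumnClassification {
    final IntArraySet withFilter = new IntArraySet();
    final IntArraySet withOrFilter = new IntArraySet();
    final IntArraySet withoutFilter = new IntArraySet();

    // filters: column index -> conjunctive filter; disjunctFilters: column index -> OR'ed filters.
    // Both maps stand in for the reader's fields; values are simplified to Object here.
    void classify(int columnIndex, Map<Integer, Object> filters, Map<Integer, List<Object>> disjunctFilters) {
        if (filters.get(columnIndex) != null) {
            withFilter.add(columnIndex);
        }
        else if (disjunctFilters.get(columnIndex) != null && !disjunctFilters.get(columnIndex).isEmpty()) {
            withOrFilter.add(columnIndex);
        }
        else {
            withoutFilter.add(columnIndex);
        }
    }
}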

Example 10 with AggregatedMemoryContext

Use of io.prestosql.memory.context.AggregatedMemoryContext in project hetu-core by openlookeng.

From the class ParquetPageSourceFactory, the method createParquetPageSource:

public static ParquetPageSource createParquetPageSource(HdfsEnvironment hdfsEnvironment, String user, Configuration configuration, Path path, long start, long length, long fileSize, Properties schema, List<HiveColumnHandle> columns, boolean useParquetColumnNames, boolean failOnCorruptedParquetStatistics, DataSize maxReadBlockSize, TypeManager typeManager, TupleDomain<HiveColumnHandle> effectivePredicate, FileFormatDataSourceStats stats, DateTimeZone timeZone) {
    AggregatedMemoryContext systemMemoryContext = newSimpleAggregatedMemoryContext();
    ParquetDataSource dataSource = null;
    DateTimeZone readerTimeZone = timeZone;
    try {
        FileSystem fileSystem = hdfsEnvironment.getFileSystem(user, path, configuration);
        FSDataInputStream inputStream = hdfsEnvironment.doAs(user, () -> fileSystem.open(path));
        ParquetMetadata parquetMetadata = MetadataReader.readFooter(inputStream, path, fileSize);
        FileMetaData fileMetaData = parquetMetadata.getFileMetaData();
        MessageType fileSchema = fileMetaData.getSchema();
        dataSource = buildHdfsParquetDataSource(inputStream, path, fileSize, stats);
        String writerTimeZoneId = fileMetaData.getKeyValueMetaData().get(WRITER_TIME_ZONE_KEY);
        if (writerTimeZoneId != null && !writerTimeZoneId.equalsIgnoreCase(readerTimeZone.getID())) {
            readerTimeZone = DateTimeZone.forID(writerTimeZoneId);
        }
        List<org.apache.parquet.schema.Type> fields = columns.stream().filter(column -> column.getColumnType() == REGULAR).map(column -> getParquetType(column, fileSchema, useParquetColumnNames)).filter(Objects::nonNull).collect(toList());
        MessageType requestedSchema = new MessageType(fileSchema.getName(), fields);
        ImmutableList.Builder<BlockMetaData> footerBlocks = ImmutableList.builder();
        for (BlockMetaData block : parquetMetadata.getBlocks()) {
            long firstDataPage = block.getColumns().get(0).getFirstDataPageOffset();
            if (firstDataPage >= start && firstDataPage < start + length) {
                footerBlocks.add(block);
            }
        }
        Map<List<String>, RichColumnDescriptor> descriptorsByPath = getDescriptors(fileSchema, requestedSchema);
        TupleDomain<ColumnDescriptor> parquetTupleDomain = getParquetTupleDomain(descriptorsByPath, effectivePredicate);
        Predicate parquetPredicate = buildPredicate(requestedSchema, parquetTupleDomain, descriptorsByPath);
        final ParquetDataSource finalDataSource = dataSource;
        ImmutableList.Builder<BlockMetaData> blocks = ImmutableList.builder();
        for (BlockMetaData block : footerBlocks.build()) {
            if (predicateMatches(parquetPredicate, block, finalDataSource, descriptorsByPath, parquetTupleDomain, failOnCorruptedParquetStatistics)) {
                blocks.add(block);
            }
        }
        MessageColumnIO messageColumnIO = getColumnIO(fileSchema, requestedSchema);
        ParquetReader parquetReader = new ParquetReader(Optional.ofNullable(fileMetaData.getCreatedBy()), messageColumnIO, blocks.build(), dataSource, readerTimeZone, systemMemoryContext, maxReadBlockSize);
        return new ParquetPageSource(parquetReader, fileSchema, messageColumnIO, typeManager, schema, columns, effectivePredicate, useParquetColumnNames);
    } catch (Exception e) {
        try {
            if (dataSource != null) {
                dataSource.close();
            }
        } catch (IOException ignored) {
        }
        if (e instanceof PrestoException) {
            throw (PrestoException) e;
        }
        if (e instanceof ParquetCorruptionException) {
            throw new PrestoException(HIVE_BAD_DATA, e);
        }
        if (nullToEmpty(e.getMessage()).trim().equals("Filesystem closed") || e instanceof FileNotFoundException) {
            throw new PrestoException(HIVE_CANNOT_OPEN_SPLIT, e);
        }
        String message = format("Error opening Hive split %s (offset=%s, length=%s): %s", path, start, length, e.getMessage());
        if (e instanceof BlockMissingException) {
            throw new PrestoException(HIVE_MISSING_DATA, message, e);
        }
        throw new PrestoException(HIVE_CANNOT_OPEN_SPLIT, message, e);
    }
}
Also used : DateTimeZone(org.joda.time.DateTimeZone) ParquetTypeUtils.getColumnIO(io.prestosql.parquet.ParquetTypeUtils.getColumnIO) HiveSessionProperties.isUseParquetColumnNames(io.prestosql.plugin.hive.HiveSessionProperties.isUseParquetColumnNames) FileSystem(org.apache.hadoop.fs.FileSystem) HiveColumnHandle(io.prestosql.plugin.hive.HiveColumnHandle) BlockMissingException(org.apache.hadoop.hdfs.BlockMissingException) RichColumnDescriptor(io.prestosql.parquet.RichColumnDescriptor) ParquetTypeUtils.getParquetTypeByName(io.prestosql.parquet.ParquetTypeUtils.getParquetTypeByName) Preconditions.checkArgument(com.google.common.base.Preconditions.checkArgument) PredicateUtils.buildPredicate(io.prestosql.parquet.predicate.PredicateUtils.buildPredicate) ConnectorSession(io.prestosql.spi.connector.ConnectorSession) HiveConfig(io.prestosql.plugin.hive.HiveConfig) Configuration(org.apache.hadoop.conf.Configuration) Map(java.util.Map) AggregatedMemoryContext(io.prestosql.memory.context.AggregatedMemoryContext) Path(org.apache.hadoop.fs.Path) FSDataInputStream(org.apache.hadoop.fs.FSDataInputStream) HiveSessionProperties.isFailOnCorruptedParquetStatistics(io.prestosql.plugin.hive.HiveSessionProperties.isFailOnCorruptedParquetStatistics) PrestoException(io.prestosql.spi.PrestoException) HdfsParquetDataSource.buildHdfsParquetDataSource(io.prestosql.plugin.hive.parquet.HdfsParquetDataSource.buildHdfsParquetDataSource) ImmutableSet(com.google.common.collect.ImmutableSet) ImmutableMap(com.google.common.collect.ImmutableMap) Set(java.util.Set) MetadataReader(io.prestosql.parquet.reader.MetadataReader) FileFormatDataSourceStats(io.prestosql.plugin.hive.FileFormatDataSourceStats) FileNotFoundException(java.io.FileNotFoundException) String.format(java.lang.String.format) Objects(java.util.Objects) MessageType(org.apache.parquet.schema.MessageType) DataSize(io.airlift.units.DataSize) List(java.util.List) ConnectorPageSource(io.prestosql.spi.connector.ConnectorPageSource) HiveUtil.shouldUseRecordReaderFromInputFormat(io.prestosql.plugin.hive.HiveUtil.shouldUseRecordReaderFromInputFormat) HIVE_BAD_DATA(io.prestosql.plugin.hive.HiveErrorCode.HIVE_BAD_DATA) ColumnDescriptor(org.apache.parquet.column.ColumnDescriptor) BlockMetaData(org.apache.parquet.hadoop.metadata.BlockMetaData) PredicateUtils.predicateMatches(io.prestosql.parquet.predicate.PredicateUtils.predicateMatches) Domain(io.prestosql.spi.predicate.Domain) Entry(java.util.Map.Entry) Optional(java.util.Optional) IndexMetadata(io.prestosql.spi.heuristicindex.IndexMetadata) AggregatedMemoryContext.newSimpleAggregatedMemoryContext(io.prestosql.memory.context.AggregatedMemoryContext.newSimpleAggregatedMemoryContext) SplitMetadata(io.prestosql.spi.heuristicindex.SplitMetadata) MessageColumnIO(org.apache.parquet.io.MessageColumnIO) HIVE_CANNOT_OPEN_SPLIT(io.prestosql.plugin.hive.HiveErrorCode.HIVE_CANNOT_OPEN_SPLIT) Strings.nullToEmpty(com.google.common.base.Strings.nullToEmpty) ParquetReader(io.prestosql.parquet.reader.ParquetReader) ParquetTypeUtils.getDescriptors(io.prestosql.parquet.ParquetTypeUtils.getDescriptors) Predicate(io.prestosql.parquet.predicate.Predicate) Inject(javax.inject.Inject) HiveUtil.getDeserializerClassName(io.prestosql.plugin.hive.HiveUtil.getDeserializerClassName) HdfsEnvironment(io.prestosql.plugin.hive.HdfsEnvironment) HIVE_MISSING_DATA(io.prestosql.plugin.hive.HiveErrorCode.HIVE_MISSING_DATA) REGULAR(io.prestosql.plugin.hive.HiveColumnHandle.ColumnType.REGULAR) ImmutableList(com.google.common.collect.ImmutableList) 
Objects.requireNonNull(java.util.Objects.requireNonNull) DynamicFilterSupplier(io.prestosql.spi.dynamicfilter.DynamicFilterSupplier) HivePageSourceFactory(io.prestosql.plugin.hive.HivePageSourceFactory) Properties(java.util.Properties) DeleteDeltaLocations(io.prestosql.plugin.hive.DeleteDeltaLocations) TupleDomain(io.prestosql.spi.predicate.TupleDomain) TypeManager(io.prestosql.spi.type.TypeManager) IOException(java.io.IOException) PRIMITIVE(org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category.PRIMITIVE) ParquetDataSource(io.prestosql.parquet.ParquetDataSource) Collectors.toList(java.util.stream.Collectors.toList) HiveSessionProperties.getParquetMaxReadBlockSize(io.prestosql.plugin.hive.HiveSessionProperties.getParquetMaxReadBlockSize) ParquetCorruptionException(io.prestosql.parquet.ParquetCorruptionException) FileMetaData(org.apache.parquet.hadoop.metadata.FileMetaData) ParquetMetadata(org.apache.parquet.hadoop.metadata.ParquetMetadata)

Aggregations

AggregatedMemoryContext (io.prestosql.memory.context.AggregatedMemoryContext): 14 usages
Page (io.prestosql.spi.Page): 12 usages
AggregatedMemoryContext.newSimpleAggregatedMemoryContext (io.prestosql.memory.context.AggregatedMemoryContext.newSimpleAggregatedMemoryContext): 11 usages
DataSize (io.airlift.units.DataSize): 9 usages
Optional (java.util.Optional): 7 usages
Properties (java.util.Properties): 7 usages
IndexMetadata (io.prestosql.spi.heuristicindex.IndexMetadata): 6 usages
Domain (io.prestosql.spi.predicate.Domain): 6 usages
IOException (java.io.IOException): 6 usages
List (java.util.List): 6 usages
Map (java.util.Map): 6 usages
DateTimeZone (org.joda.time.DateTimeZone): 6 usages
Preconditions.checkArgument (com.google.common.base.Preconditions.checkArgument): 5 usages
ImmutableMap (com.google.common.collect.ImmutableMap): 5 usages
SplitMetadata (io.prestosql.spi.heuristicindex.SplitMetadata): 5 usages
Type (io.prestosql.spi.type.Type): 5 usages
Strings.nullToEmpty (com.google.common.base.Strings.nullToEmpty): 4 usages
ImmutableList (com.google.common.collect.ImmutableList): 4 usages
Logger (io.airlift.log.Logger): 4 usages
Objects.requireNonNull (java.util.Objects.requireNonNull): 4 usages