
Example 16 with ConnectorPageSource

use of io.prestosql.spi.connector.ConnectorPageSource in project hetu-core by openlookeng.

the class TestOrcAcidPageSource method readFile.

private static List<Nation> readFile(Map<NationColumn, Integer> columns, TupleDomain<HiveColumnHandle> tupleDomain, Optional<DeleteDeltaLocations> deleteDeltaLocations) {
    List<HiveColumnHandle> columnHandles = columns.entrySet().stream().map(column -> toHiveColumnHandle(column.getKey(), column.getValue())).collect(toImmutableList());
    List<String> columnNames = columnHandles.stream().map(HiveColumnHandle::getName).collect(toImmutableList());
    // This file contains the TPC-H nation table, with each row repeated 1000 times
    File nationFileWithReplicatedRows = new File(TestOrcAcidPageSource.class.getClassLoader().getResource("nationFile25kRowsSortedOnNationKey/bucket_00000").getPath());
    ConnectorPageSource pageSource = PAGE_SOURCE_FACTORY.createPageSource(
            new JobConf(new Configuration(false)),
            HiveTestUtils.SESSION,
            new Path(nationFileWithReplicatedRows.getAbsoluteFile().toURI()),
            0,
            nationFileWithReplicatedRows.length(),
            nationFileWithReplicatedRows.length(),
            createSchema(),
            columnHandles,
            tupleDomain,
            Optional.empty(),
            deleteDeltaLocations,
            Optional.empty(),
            Optional.empty(),
            null,
            false,
            -1L).get();
    int nationKeyColumn = columnNames.indexOf("n_nationkey");
    int nameColumn = columnNames.indexOf("n_name");
    int regionKeyColumn = columnNames.indexOf("n_regionkey");
    int commentColumn = columnNames.indexOf("n_comment");
    ImmutableList.Builder<Nation> rows = ImmutableList.builder();
    while (!pageSource.isFinished()) {
        Page page = pageSource.getNextPage();
        if (page == null) {
            continue;
        }
        page = page.getLoadedPage();
        for (int position = 0; position < page.getPositionCount(); position++) {
            long nationKey = -42;
            if (nationKeyColumn >= 0) {
                nationKey = BIGINT.getLong(page.getBlock(nationKeyColumn), position);
            }
            String name = "<not read>";
            if (nameColumn >= 0) {
                name = VARCHAR.getSlice(page.getBlock(nameColumn), position).toStringUtf8();
            }
            long regionKey = -42;
            if (regionKeyColumn >= 0) {
                regionKey = BIGINT.getLong(page.getBlock(regionKeyColumn), position);
            }
            String comment = "<not read>";
            if (commentColumn >= 0) {
                comment = VARCHAR.getSlice(page.getBlock(commentColumn), position).toStringUtf8();
            }
            rows.add(new Nation(position, nationKey, name, regionKey, comment));
        }
    }
    return rows.build();
}
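As a quick orientation, here is a hedged sketch of how this helper might be invoked from a test; the column-to-index mapping and the expected row count are illustrative assumptions based on the resource name and the comment above, not copied from the original test.

    // Hypothetical call: read only n_nationkey and n_name, with no predicate and no delete deltas.
    List<Nation> nations = readFile(ImmutableMap.of(NATION_KEY, 0, NAME, 1), TupleDomain.all(), Optional.empty());
    // 25 nations, each repeated 1000 times, per the resource name above (assumed).
    assertEquals(nations.size(), 25_000);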
Also used : HiveType.toHiveType(io.prestosql.plugin.hive.HiveType.toHiveType) Nation(io.airlift.tpch.Nation) Test(org.testng.annotations.Test) HiveColumnHandle(io.prestosql.plugin.hive.HiveColumnHandle) HiveConfig(io.prestosql.plugin.hive.HiveConfig) HiveTestUtils(io.prestosql.plugin.hive.HiveTestUtils) Configuration(org.apache.hadoop.conf.Configuration) Duration(java.time.Duration) Map(java.util.Map) Path(org.apache.hadoop.fs.Path) Type(io.prestosql.spi.type.Type) LongPredicate(java.util.function.LongPredicate) COMMENT(io.airlift.tpch.NationColumn.COMMENT) BIGINT(io.prestosql.spi.type.BigintType.BIGINT) SERIALIZATION_LIB(org.apache.hadoop.hive.serde.serdeConstants.SERIALIZATION_LIB) ImmutableMap(com.google.common.collect.ImmutableMap) MetadataManager.createTestMetadataManager(io.prestosql.metadata.MetadataManager.createTestMetadataManager) Collections.nCopies(java.util.Collections.nCopies) ImmutableList.toImmutableList(com.google.common.collect.ImmutableList.toImmutableList) Set(java.util.Set) AcidUtils.deleteDeltaSubdir(org.apache.hadoop.hive.ql.io.AcidUtils.deleteDeltaSubdir) Metadata(io.prestosql.metadata.Metadata) FileFormatDataSourceStats(io.prestosql.plugin.hive.FileFormatDataSourceStats) List(java.util.List) ConnectorPageSource(io.prestosql.spi.connector.ConnectorPageSource) Domain(io.prestosql.spi.predicate.Domain) Optional(java.util.Optional) ORC(io.prestosql.plugin.hive.HiveStorageFormat.ORC) NAME(io.airlift.tpch.NationColumn.NAME) NationColumn(io.airlift.tpch.NationColumn) HiveTypeTranslator(io.prestosql.plugin.hive.HiveTypeTranslator) Assert.assertEquals(org.testng.Assert.assertEquals) ArrayList(java.util.ArrayList) NationGenerator(io.airlift.tpch.NationGenerator) OptionalLong(java.util.OptionalLong) REGULAR(io.prestosql.plugin.hive.HiveColumnHandle.ColumnType.REGULAR) VARCHAR(io.prestosql.spi.type.VarcharType.VARCHAR) ImmutableList(com.google.common.collect.ImmutableList) HivePageSourceFactory(io.prestosql.plugin.hive.HivePageSourceFactory) InternalTypeManager(io.prestosql.type.InternalTypeManager) Properties(java.util.Properties) DeleteDeltaLocations(io.prestosql.plugin.hive.DeleteDeltaLocations) TupleDomain(io.prestosql.spi.predicate.TupleDomain) TypeManager(io.prestosql.spi.type.TypeManager) Page(io.prestosql.spi.Page) TABLE_IS_TRANSACTIONAL(org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.TABLE_IS_TRANSACTIONAL) File(java.io.File) JobConf(org.apache.hadoop.mapred.JobConf) NATION_KEY(io.airlift.tpch.NationColumn.NATION_KEY) OrcCacheStore(io.prestosql.orc.OrcCacheStore) REGION_KEY(io.airlift.tpch.NationColumn.REGION_KEY) FILE_INPUT_FORMAT(org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.FILE_INPUT_FORMAT) Path(org.apache.hadoop.fs.Path) Nation(io.airlift.tpch.Nation) Configuration(org.apache.hadoop.conf.Configuration) ImmutableList.toImmutableList(com.google.common.collect.ImmutableList.toImmutableList) ImmutableList(com.google.common.collect.ImmutableList) Page(io.prestosql.spi.Page) ConnectorPageSource(io.prestosql.spi.connector.ConnectorPageSource) File(java.io.File) JobConf(org.apache.hadoop.mapred.JobConf) HiveColumnHandle(io.prestosql.plugin.hive.HiveColumnHandle)

Example 17 with ConnectorPageSource

use of io.prestosql.spi.connector.ConnectorPageSource in project hetu-core by openlookeng.

the class HivePageSourceProvider method createPageSourceInternal.

private ConnectorPageSource createPageSourceInternal(ConnectorSession session, Optional<DynamicFilterSupplier> dynamicFilterSupplier, List<Map<ColumnHandle, DynamicFilter>> dynamicFilters, HiveTableHandle hiveTable, List<HiveColumnHandle> hiveColumns, HiveSplit hiveSplit) {
    Path path = new Path(hiveSplit.getPath());
    List<Set<DynamicFilter>> dynamicFilterList = new ArrayList<>();
    if (dynamicFilters != null) {
        for (Map<ColumnHandle, DynamicFilter> df : dynamicFilters) {
            Set<DynamicFilter> values = df.values().stream().collect(Collectors.toSet());
            dynamicFilterList.add(values);
        }
    }
    // Filter out splits using partition values and dynamic filters
    if (dynamicFilters != null && !dynamicFilters.isEmpty() && isPartitionFiltered(hiveSplit.getPartitionKeys(), dynamicFilterList, typeManager)) {
        return new FixedPageSource(ImmutableList.of());
    }
    Configuration configuration = hdfsEnvironment.getConfiguration(new HdfsEnvironment.HdfsContext(session, hiveSplit.getDatabase(), hiveSplit.getTable()), path);
    Properties schema = hiveSplit.getSchema();
    String columnNameDelimiter = schema.containsKey(serdeConstants.COLUMN_NAME_DELIMITER) ? schema.getProperty(serdeConstants.COLUMN_NAME_DELIMITER) : String.valueOf(SerDeUtils.COMMA);
    List<String> partitionColumnNames;
    if (schema.containsKey(META_PARTITION_COLUMNS)) {
        partitionColumnNames = Arrays.asList(schema.getProperty(META_PARTITION_COLUMNS).split(columnNameDelimiter));
    } else if (schema.containsKey(META_TABLE_COLUMNS)) {
        partitionColumnNames = Arrays.asList(schema.getProperty(META_TABLE_COLUMNS).split(columnNameDelimiter));
    } else {
        partitionColumnNames = new ArrayList<>();
    }
    List<String> tableColumns = hiveColumns.stream().map(cols -> cols.getName()).collect(toList());
    List<String> missingColumns = tableColumns.stream().skip(partitionColumnNames.size()).collect(toList());
    List<IndexMetadata> indexes = new ArrayList<>();
    if (indexCache != null && session.isHeuristicIndexFilterEnabled()) {
        indexes.addAll(this.indexCache.getIndices(session.getCatalog().orElse(null), hiveTable.getSchemaTableName().toString(), hiveSplit, hiveTable.getCompactEffectivePredicate(), hiveTable.getPartitionColumns()));
        /* Bloom/Bitmap indices are checked for given table and added to the possible matchers for pushdown. */
        if (hiveTable.getDisjunctCompactEffectivePredicate().isPresent() && hiveTable.getDisjunctCompactEffectivePredicate().get().size() > 0) {
            hiveTable.getDisjunctCompactEffectivePredicate().get().forEach(orPredicate -> indexes.addAll(this.indexCache.getIndices(session.getCatalog().orElse(null), hiveTable.getSchemaTableName().toString(), hiveSplit, orPredicate, hiveTable.getPartitionColumns())));
        }
    }
    Optional<List<IndexMetadata>> indexOptional = indexes == null || indexes.isEmpty() ? Optional.empty() : Optional.of(indexes);
    URI splitUri = URI.create(URIUtil.encodePath(hiveSplit.getPath()));
    SplitMetadata splitMetadata = new SplitMetadata(splitUri.getRawPath(), hiveSplit.getLastModifiedTime());
    TupleDomain<HiveColumnHandle> predicate = TupleDomain.all();
    if (dynamicFilterSupplier.isPresent() && dynamicFilters != null && !dynamicFilters.isEmpty()) {
        if (dynamicFilters.size() == 1) {
            List<HiveColumnHandle> filteredHiveColumnHandles = hiveColumns.stream().filter(column -> dynamicFilters.get(0).containsKey(column)).collect(toList());
            HiveColumnHandle hiveColumnHandle = filteredHiveColumnHandles.get(0);
            Type type = hiveColumnHandle.getColumnMetadata(typeManager).getType();
            predicate = getPredicate(dynamicFilters.get(0).get(hiveColumnHandle), type, hiveColumnHandle);
            if (predicate.isNone()) {
                predicate = TupleDomain.all();
            }
        }
    }
    /**
     * This is the main decision point between the normal read flow and the filter pushdown case (also known as the
     * selective read flow). If the orc_predicate_pushdown_enabled configuration is true and every clause of the query
     * can be handled by the Hive selective read flow, then hiveTable.isSuitableToPush() returns true.
     * (Refer to HiveMetadata.checkIfSuitableToPush.)
     */
    if (hiveTable.isSuitableToPush()) {
        return createSelectivePageSource(
                selectivePageSourceFactories,
                configuration,
                session,
                hiveSplit,
                assignUniqueIndicesToPartitionColumns(hiveColumns),
                typeManager,
                dynamicFilterSupplier,
                hiveSplit.getDeleteDeltaLocations(),
                hiveSplit.getStartRowOffsetOfFile(),
                indexOptional,
                hiveSplit.isCacheable(),
                hiveTable.getCompactEffectivePredicate(),
                hiveTable.getPredicateColumns(),
                hiveTable.getDisjunctCompactEffectivePredicate(),
                hiveSplit.getBucketConversion(),
                hiveSplit.getBucketNumber(),
                hiveSplit.getLastModifiedTime(),
                missingColumns);
    }
    Optional<ConnectorPageSource> pageSource = createHivePageSource(
            cursorProviders,
            pageSourceFactories,
            configuration,
            session,
            path,
            hiveSplit.getBucketNumber(),
            hiveSplit.getStart(),
            hiveSplit.getLength(),
            hiveSplit.getFileSize(),
            hiveSplit.getSchema(),
            hiveTable.getCompactEffectivePredicate().intersect(predicate),
            hiveColumns,
            hiveSplit.getPartitionKeys(),
            typeManager,
            hiveSplit.getColumnCoercions(),
            hiveSplit.getBucketConversion(),
            hiveSplit.isS3SelectPushdownEnabled(),
            dynamicFilterSupplier,
            hiveSplit.getDeleteDeltaLocations(),
            hiveSplit.getStartRowOffsetOfFile(),
            indexOptional,
            splitMetadata,
            hiveSplit.isCacheable(),
            hiveSplit.getLastModifiedTime(),
            hiveSplit.getCustomSplitInfo(),
            missingColumns);
    if (pageSource.isPresent()) {
        return pageSource.get();
    }
    throw new RuntimeException("Could not find a file reader for split " + hiveSplit);
}
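To make the dynamic-filter branch above concrete, the following hedged sketch shows the shape of the single-column TupleDomain that getPredicate would typically return; the column handle, the type (BigintType.BIGINT), and the bound are hypothetical and not taken from this code.

    // Hypothetical: restrict one BIGINT column to values below 100 (nulls not allowed).
    TupleDomain<HiveColumnHandle> singleColumnPredicate = TupleDomain.withColumnDomains(ImmutableMap.of(
            someBigintColumnHandle,
            Domain.create(ValueSet.ofRanges(Range.lessThan(BIGINT, 100L)), false)));

A predicate of this shape is then intersected with the table's compact effective predicate before being handed to createHivePageSource, as shown above.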
Also used : Arrays(java.util.Arrays) DynamicFilter(io.prestosql.spi.dynamicfilter.DynamicFilter) BuiltInFunctionHandle(io.prestosql.spi.function.BuiltInFunctionHandle) ValueSet(io.prestosql.spi.predicate.ValueSet) Maps.uniqueIndex(com.google.common.collect.Maps.uniqueIndex) META_PARTITION_COLUMNS(io.prestosql.plugin.hive.metastore.MetastoreUtil.META_PARTITION_COLUMNS) CallExpression(io.prestosql.spi.relation.CallExpression) Preconditions.checkArgument(com.google.common.base.Preconditions.checkArgument) ConnectorSession(io.prestosql.spi.connector.ConnectorSession) HiveCoercer.createCoercer(io.prestosql.plugin.hive.coercions.HiveCoercer.createCoercer) BucketingVersion(io.prestosql.plugin.hive.HiveBucketing.BucketingVersion) FilteredDynamicFilter(io.prestosql.spi.dynamicfilter.FilteredDynamicFilter) Slices(io.airlift.slice.Slices) Configuration(org.apache.hadoop.conf.Configuration) Map(java.util.Map) Path(org.apache.hadoop.fs.Path) Type(io.prestosql.spi.type.Type) URI(java.net.URI) MAX_PARTITION_KEY_COLUMN_INDEX(io.prestosql.plugin.hive.HiveColumnHandle.MAX_PARTITION_KEY_COLUMN_INDEX) ImmutableSet(com.google.common.collect.ImmutableSet) ImmutableMap(com.google.common.collect.ImmutableMap) ImmutableList.toImmutableList(com.google.common.collect.ImmutableList.toImmutableList) org.apache.hadoop.hive.serde.serdeConstants(org.apache.hadoop.hive.serde.serdeConstants) Set(java.util.Set) Collectors(java.util.stream.Collectors) Preconditions.checkState(com.google.common.base.Preconditions.checkState) List(java.util.List) ImmutableMap.toImmutableMap(com.google.common.collect.ImmutableMap.toImmutableMap) ConnectorPageSource(io.prestosql.spi.connector.ConnectorPageSource) Domain(io.prestosql.spi.predicate.Domain) ConnectorTransactionHandle(io.prestosql.spi.connector.ConnectorTransactionHandle) URIUtil(org.eclipse.jetty.util.URIUtil) Optional(java.util.Optional) IndexMetadata(io.prestosql.spi.heuristicindex.IndexMetadata) SplitMetadata(io.prestosql.spi.heuristicindex.SplitMetadata) Slice(io.airlift.slice.Slice) FixedPageSource(io.prestosql.spi.connector.FixedPageSource) ConnectorSplit(io.prestosql.spi.connector.ConnectorSplit) META_TABLE_COLUMNS(org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_COLUMNS) OptionalInt(java.util.OptionalInt) ArrayList(java.util.ArrayList) Inject(javax.inject.Inject) HashSet(java.util.HashSet) REGULAR(io.prestosql.plugin.hive.HiveColumnHandle.ColumnType.REGULAR) ImmutableList(com.google.common.collect.ImmutableList) Range(io.prestosql.spi.predicate.Range) HiveCoercer(io.prestosql.plugin.hive.coercions.HiveCoercer) Objects.requireNonNull(java.util.Objects.requireNonNull) DynamicFilterSupplier(io.prestosql.spi.dynamicfilter.DynamicFilterSupplier) RecordCursor(io.prestosql.spi.connector.RecordCursor) Signature(io.prestosql.spi.function.Signature) SerDeUtils(org.apache.hadoop.hive.serde2.SerDeUtils) Properties(java.util.Properties) ConnectorTableHandle(io.prestosql.spi.connector.ConnectorTableHandle) TupleDomain(io.prestosql.spi.predicate.TupleDomain) TypeManager(io.prestosql.spi.type.TypeManager) HiveUtil.isPartitionFiltered(io.prestosql.plugin.hive.HiveUtil.isPartitionFiltered) CombinedDynamicFilter(io.prestosql.spi.dynamicfilter.CombinedDynamicFilter) Collectors.toList(java.util.stream.Collectors.toList) IndexCache(io.prestosql.plugin.hive.util.IndexCache) ColumnMapping.toColumnHandles(io.prestosql.plugin.hive.HivePageSourceProvider.ColumnMapping.toColumnHandles) ColumnHandle(io.prestosql.spi.connector.ColumnHandle) 
RowExpression(io.prestosql.spi.relation.RowExpression) RecordPageSource(io.prestosql.spi.connector.RecordPageSource) ConnectorPageSourceProvider(io.prestosql.spi.connector.ConnectorPageSourceProvider) OrcConcatPageSource(io.prestosql.plugin.hive.orc.OrcConcatPageSource) ValueSet(io.prestosql.spi.predicate.ValueSet) ImmutableSet(com.google.common.collect.ImmutableSet) Set(java.util.Set) HashSet(java.util.HashSet) Configuration(org.apache.hadoop.conf.Configuration) ArrayList(java.util.ArrayList) Properties(java.util.Properties) ConnectorPageSource(io.prestosql.spi.connector.ConnectorPageSource) URI(java.net.URI) SplitMetadata(io.prestosql.spi.heuristicindex.SplitMetadata) ImmutableList.toImmutableList(com.google.common.collect.ImmutableList.toImmutableList) List(java.util.List) ArrayList(java.util.ArrayList) ImmutableList(com.google.common.collect.ImmutableList) Collectors.toList(java.util.stream.Collectors.toList) IndexMetadata(io.prestosql.spi.heuristicindex.IndexMetadata) Path(org.apache.hadoop.fs.Path) ColumnHandle(io.prestosql.spi.connector.ColumnHandle) DynamicFilter(io.prestosql.spi.dynamicfilter.DynamicFilter) FilteredDynamicFilter(io.prestosql.spi.dynamicfilter.FilteredDynamicFilter) CombinedDynamicFilter(io.prestosql.spi.dynamicfilter.CombinedDynamicFilter) FixedPageSource(io.prestosql.spi.connector.FixedPageSource) Type(io.prestosql.spi.type.Type)

Example 18 with ConnectorPageSource

use of io.prestosql.spi.connector.ConnectorPageSource in project hetu-core by openlookeng.

the class HivePageSourceProvider method createSelectivePageSource.

/**
 * Create the selective page source used by the selective reader flow.
 * Unlike the normal page source, the selective page source must pass the following additional details to the reader:
 * a. Pre-filled values of all constant columns.
 * b. Coercion information for all columns.
 * c. The columns that need to be projected.
 * d. The total list of columns to be read (projection + filter).
 * The reader uses all of this information.
 * @param columns list of all columns that are part of the scan
 * @param effectivePredicate predicates from the AND clause
 * @param predicateColumns map of all column handles that are part of the predicate
 * @param additionPredicates predicates from the OR clause
 * The remaining parameters are the same as for createHivePageSource.
 * @param missingColumns
 * @return
 */
private static ConnectorPageSource createSelectivePageSource(Set<HiveSelectivePageSourceFactory> selectivePageSourceFactories, Configuration configuration, ConnectorSession session, HiveSplit split, List<HiveColumnHandle> columns, TypeManager typeManager, Optional<DynamicFilterSupplier> dynamicFilterSupplier, Optional<DeleteDeltaLocations> deleteDeltaLocations, Optional<Long> startRowOffsetOfFile, Optional<List<IndexMetadata>> indexes, boolean splitCacheable, TupleDomain<HiveColumnHandle> effectivePredicate, Map<String, HiveColumnHandle> predicateColumns, Optional<List<TupleDomain<HiveColumnHandle>>> additionPredicates, Optional<HiveSplit.BucketConversion> bucketConversion, OptionalInt bucketNumber, long dataSourceLastModifiedTime, List<String> missingColumns) {
    Set<HiveColumnHandle> interimColumns = ImmutableSet.<HiveColumnHandle>builder().addAll(predicateColumns.values()).addAll(bucketConversion.map(HiveSplit.BucketConversion::getBucketColumnHandles).orElse(ImmutableList.of())).build();
    Path path = new Path(split.getPath());
    List<ColumnMapping> columnMappings = ColumnMapping.buildColumnMappings(split.getPartitionKeys(), columns, ImmutableList.copyOf(interimColumns), split.getColumnCoercions(), path, bucketNumber, true, missingColumns);
    List<ColumnMapping> regularAndInterimColumnMappings = ColumnMapping.extractRegularAndInterimColumnMappings(columnMappings);
    Optional<BucketAdaptation> bucketAdaptation = toBucketAdaptation(bucketConversion, regularAndInterimColumnMappings, bucketNumber);
    checkArgument(!bucketAdaptation.isPresent(), "Bucket conversion is not yet supported");
    // Build a map of all PREFILLED column values that can be passed to the reader. Unlike the normal flow, the
    // selective read flow needs this at the reader level because blocks must be built for every column value.
    Map<Integer, String> prefilledValues = columnMappings.stream().filter(mapping -> mapping.getKind() == ColumnMappingKind.PREFILLED).collect(toImmutableMap(mapping -> mapping.getHiveColumnHandle().getHiveColumnIndex(), ColumnMapping::getPrefilledValue));
    // Build a map of the columns that require coercion. This also has to be passed down to the reader, because
    // coercion must be applied before values are added to a block.
    Map<Integer, HiveCoercer> coercers = columnMappings.stream().filter(mapping -> mapping.getCoercionFrom().isPresent()).collect(toImmutableMap(mapping -> mapping.getHiveColumnHandle().getHiveColumnIndex(), mapping -> createCoercer(typeManager, mapping.getCoercionFrom().get(), mapping.getHiveColumnHandle().getHiveType())));
    List<Integer> outputColumns = columns.stream().map(HiveColumnHandle::getHiveColumnIndex).collect(toImmutableList());
    for (HiveSelectivePageSourceFactory pageSourceFactory : selectivePageSourceFactories) {
        Optional<? extends ConnectorPageSource> pageSource = pageSourceFactory.createPageSource(configuration, session, path, split.getStart(), split.getLength(), split.getFileSize(), split.getSchema(), toColumnHandles(columnMappings, true), prefilledValues, outputColumns, effectivePredicate, additionPredicates, deleteDeltaLocations, startRowOffsetOfFile, indexes, splitCacheable, columnMappings, coercers, dataSourceLastModifiedTime);
        if (pageSource.isPresent()) {
            return new HivePageSource(columnMappings, Optional.empty(), typeManager, pageSource.get(), dynamicFilterSupplier, session, split.getPartitionKeys());
        }
    }
    throw new IllegalStateException("Could not find a file reader for split " + split);
}
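As a rough illustration of the extra inputs described in the Javadoc above, this hedged sketch shows the shape of the per-column structures handed to the selective reader; the column indexes and the partition value are invented for illustration.

    // Hypothetical shapes only: keys are hive column indexes.
    Map<Integer, String> prefilledValues = ImmutableMap.of(3, "2021-01-01"); // e.g. a PREFILLED partition value
    List<Integer> outputColumns = ImmutableList.of(0, 1);                    // hive column indexes to project
    Map<Integer, HiveCoercer> coercers = ImmutableMap.of();                  // empty when no column needs coercion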
Also used : Path(org.apache.hadoop.fs.Path) Arrays(java.util.Arrays) DynamicFilter(io.prestosql.spi.dynamicfilter.DynamicFilter) BuiltInFunctionHandle(io.prestosql.spi.function.BuiltInFunctionHandle) ValueSet(io.prestosql.spi.predicate.ValueSet) Maps.uniqueIndex(com.google.common.collect.Maps.uniqueIndex) META_PARTITION_COLUMNS(io.prestosql.plugin.hive.metastore.MetastoreUtil.META_PARTITION_COLUMNS) CallExpression(io.prestosql.spi.relation.CallExpression) Preconditions.checkArgument(com.google.common.base.Preconditions.checkArgument) ConnectorSession(io.prestosql.spi.connector.ConnectorSession) HiveCoercer.createCoercer(io.prestosql.plugin.hive.coercions.HiveCoercer.createCoercer) BucketingVersion(io.prestosql.plugin.hive.HiveBucketing.BucketingVersion) FilteredDynamicFilter(io.prestosql.spi.dynamicfilter.FilteredDynamicFilter) Slices(io.airlift.slice.Slices) Configuration(org.apache.hadoop.conf.Configuration) Map(java.util.Map) Path(org.apache.hadoop.fs.Path) Type(io.prestosql.spi.type.Type) URI(java.net.URI) MAX_PARTITION_KEY_COLUMN_INDEX(io.prestosql.plugin.hive.HiveColumnHandle.MAX_PARTITION_KEY_COLUMN_INDEX) ImmutableSet(com.google.common.collect.ImmutableSet) ImmutableMap(com.google.common.collect.ImmutableMap) ImmutableList.toImmutableList(com.google.common.collect.ImmutableList.toImmutableList) org.apache.hadoop.hive.serde.serdeConstants(org.apache.hadoop.hive.serde.serdeConstants) Set(java.util.Set) Collectors(java.util.stream.Collectors) Preconditions.checkState(com.google.common.base.Preconditions.checkState) List(java.util.List) ImmutableMap.toImmutableMap(com.google.common.collect.ImmutableMap.toImmutableMap) ConnectorPageSource(io.prestosql.spi.connector.ConnectorPageSource) Domain(io.prestosql.spi.predicate.Domain) ConnectorTransactionHandle(io.prestosql.spi.connector.ConnectorTransactionHandle) URIUtil(org.eclipse.jetty.util.URIUtil) Optional(java.util.Optional) IndexMetadata(io.prestosql.spi.heuristicindex.IndexMetadata) SplitMetadata(io.prestosql.spi.heuristicindex.SplitMetadata) Slice(io.airlift.slice.Slice) FixedPageSource(io.prestosql.spi.connector.FixedPageSource) ConnectorSplit(io.prestosql.spi.connector.ConnectorSplit) META_TABLE_COLUMNS(org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_COLUMNS) OptionalInt(java.util.OptionalInt) ArrayList(java.util.ArrayList) Inject(javax.inject.Inject) HashSet(java.util.HashSet) REGULAR(io.prestosql.plugin.hive.HiveColumnHandle.ColumnType.REGULAR) ImmutableList(com.google.common.collect.ImmutableList) Range(io.prestosql.spi.predicate.Range) HiveCoercer(io.prestosql.plugin.hive.coercions.HiveCoercer) Objects.requireNonNull(java.util.Objects.requireNonNull) DynamicFilterSupplier(io.prestosql.spi.dynamicfilter.DynamicFilterSupplier) RecordCursor(io.prestosql.spi.connector.RecordCursor) Signature(io.prestosql.spi.function.Signature) SerDeUtils(org.apache.hadoop.hive.serde2.SerDeUtils) Properties(java.util.Properties) ConnectorTableHandle(io.prestosql.spi.connector.ConnectorTableHandle) TupleDomain(io.prestosql.spi.predicate.TupleDomain) TypeManager(io.prestosql.spi.type.TypeManager) HiveUtil.isPartitionFiltered(io.prestosql.plugin.hive.HiveUtil.isPartitionFiltered) CombinedDynamicFilter(io.prestosql.spi.dynamicfilter.CombinedDynamicFilter) Collectors.toList(java.util.stream.Collectors.toList) IndexCache(io.prestosql.plugin.hive.util.IndexCache) ColumnMapping.toColumnHandles(io.prestosql.plugin.hive.HivePageSourceProvider.ColumnMapping.toColumnHandles) ColumnHandle(io.prestosql.spi.connector.ColumnHandle) 
RowExpression(io.prestosql.spi.relation.RowExpression) RecordPageSource(io.prestosql.spi.connector.RecordPageSource) ConnectorPageSourceProvider(io.prestosql.spi.connector.ConnectorPageSourceProvider) OrcConcatPageSource(io.prestosql.plugin.hive.orc.OrcConcatPageSource) HiveCoercer(io.prestosql.plugin.hive.coercions.HiveCoercer)

Example 19 with ConnectorPageSource

use of io.prestosql.spi.connector.ConnectorPageSource in project boostkit-bigdata by kunpengcompute.

the class AbstractTestHive method testGetRecordsUnpartitioned.

@Test
public void testGetRecordsUnpartitioned() throws Exception {
    try (Transaction transaction = newTransaction()) {
        ConnectorMetadata metadata = transaction.getMetadata();
        ConnectorSession session = newSession();
        metadata.beginQuery(session);
        ConnectorTableHandle tableHandle = getTableHandle(metadata, tableUnpartitioned);
        List<ColumnHandle> columnHandles = ImmutableList.copyOf(metadata.getColumnHandles(session, tableHandle).values());
        Map<String, Integer> columnIndex = indexColumns(columnHandles);
        List<ConnectorSplit> splits = getAllSplits(tableHandle, transaction, session);
        assertThat(splits).hasSameSizeAs(tableUnpartitionedPartitions);
        for (ConnectorSplit split : splits) {
            HiveSplit hiveSplit = HiveSplitWrapper.getOnlyHiveSplit(split);
            assertEquals(hiveSplit.getPartitionKeys(), ImmutableList.of());
            long rowNumber = 0;
            try (ConnectorPageSource pageSource = pageSourceProvider.createPageSource(transaction.getTransactionHandle(), session, split, tableHandle, columnHandles)) {
                assertPageSourceType(pageSource, TEXTFILE);
                MaterializedResult result = materializeSourceDataStream(session, pageSource, getTypes(columnHandles));
                for (MaterializedRow row : result) {
                    rowNumber++;
                    if (rowNumber % 19 == 0) {
                        assertNull(row.getField(columnIndex.get("t_string")));
                    } else if (rowNumber % 19 == 1) {
                        assertEquals(row.getField(columnIndex.get("t_string")), "");
                    } else {
                        assertEquals(row.getField(columnIndex.get("t_string")), "unpartitioned");
                    }
                    assertEquals(row.getField(columnIndex.get("t_tinyint")), (byte) (1 + rowNumber));
                }
            }
            assertEquals(rowNumber, 100);
        }
    }
}
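The indexColumns helper used above is not part of this excerpt. A plausible sketch, assuming every handle is a HiveColumnHandle as elsewhere in this test class, maps each column name to its position so rows can be read by name:

private static Map<String, Integer> indexColumns(List<ColumnHandle> columnHandles) {
    ImmutableMap.Builder<String, Integer> index = ImmutableMap.builder();
    int position = 0;
    for (ColumnHandle columnHandle : columnHandles) {
        // Cast is an assumption: the Hive connector hands out HiveColumnHandle instances.
        index.put(((HiveColumnHandle) columnHandle).getName(), position);
        position++;
    }
    return index.build();
}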
Also used : HiveColumnHandle.bucketColumnHandle(io.prestosql.plugin.hive.HiveColumnHandle.bucketColumnHandle) ColumnHandle(io.prestosql.spi.connector.ColumnHandle) ConnectorPageSource(io.prestosql.spi.connector.ConnectorPageSource) ConnectorTableHandle(io.prestosql.spi.connector.ConnectorTableHandle) ConnectorSession(io.prestosql.spi.connector.ConnectorSession) TestingConnectorSession(io.prestosql.testing.TestingConnectorSession) ConnectorMetadata(io.prestosql.spi.connector.ConnectorMetadata) MaterializedResult(io.prestosql.testing.MaterializedResult) ConnectorSplit(io.prestosql.spi.connector.ConnectorSplit) MaterializedRow(io.prestosql.testing.MaterializedRow) Test(org.testng.annotations.Test)

Example 20 with ConnectorPageSource

use of io.prestosql.spi.connector.ConnectorPageSource in project boostkit-bigdata by kunpengcompute.

the class AbstractTestHiveFileSystem method createTable.

private void createTable(SchemaTableName tableName, HiveStorageFormat storageFormat) throws Exception {
    List<ColumnMetadata> columns = ImmutableList.<ColumnMetadata>builder().add(new ColumnMetadata("id", BIGINT)).build();
    MaterializedResult data = MaterializedResult.resultBuilder(newSession(), BIGINT).row(1L).row(3L).row(2L).build();
    try (Transaction transaction = newTransaction()) {
        ConnectorMetadata metadata = transaction.getMetadata();
        ConnectorSession session = newSession();
        // begin creating the table
        ConnectorTableMetadata tableMetadata = new ConnectorTableMetadata(tableName, columns, createTableProperties(storageFormat));
        ConnectorOutputTableHandle outputHandle = metadata.beginCreateTable(session, tableMetadata, Optional.empty());
        // write the records
        ConnectorPageSink sink = pageSinkProvider.createPageSink(transaction.getTransactionHandle(), session, outputHandle);
        sink.appendPage(data.toPage());
        Collection<Slice> fragments = getFutureValue(sink.finish());
        // commit the table
        metadata.finishCreateTable(session, outputHandle, fragments, ImmutableList.of());
        transaction.commit();
        // Hack to work around the metastore not being configured for S3 or other FS.
        // The metastore tries to validate the location when creating the
        // table, which fails without explicit configuration for file system.
        // We work around that by using a dummy location when creating the
        // table and update it here to the correct location.
        metastoreClient.updateTableLocation(database, tableName.getTableName(), locationService.getTableWriteInfo(((HiveOutputTableHandle) outputHandle).getLocationHandle(), false).getTargetPath().toString());
    }
    try (Transaction transaction = newTransaction()) {
        ConnectorMetadata metadata = transaction.getMetadata();
        ConnectorSession session = newSession();
        // load the new table
        ConnectorTableHandle tableHandle = getTableHandle(metadata, tableName);
        List<ColumnHandle> columnHandles = filterNonHiddenColumnHandles(metadata.getColumnHandles(session, tableHandle).values());
        // verify the metadata
        ConnectorTableMetadata tableMetadata = metadata.getTableMetadata(session, getTableHandle(metadata, tableName));
        assertEquals(filterNonHiddenColumnMetadata(tableMetadata.getColumns()), columns);
        // verify the data
        ConnectorSplitSource splitSource = splitManager.getSplits(transaction.getTransactionHandle(), session, tableHandle, UNGROUPED_SCHEDULING);
        ConnectorSplit split = getOnlyElement(getAllSplits(splitSource));
        try (ConnectorPageSource pageSource = pageSourceProvider.createPageSource(transaction.getTransactionHandle(), session, split, tableHandle, columnHandles)) {
            MaterializedResult result = materializeSourceDataStream(session, pageSource, getTypes(columnHandles));
            assertEqualsIgnoreOrder(result.getMaterializedRows(), data.getMaterializedRows());
        }
    }
}
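The getAllSplits helper is also outside this excerpt. A minimal sketch, assuming airlift's getFutureValue (already used for sink.finish() above) and the SPI's NOT_PARTITIONED partition handle, drains the split source batch by batch:

private static List<ConnectorSplit> getAllSplits(ConnectorSplitSource splitSource) {
    ImmutableList.Builder<ConnectorSplit> splits = ImmutableList.builder();
    while (!splitSource.isFinished()) {
        // Fetch up to 1000 splits per batch until the source reports it is exhausted.
        splits.addAll(getFutureValue(splitSource.getNextBatch(NOT_PARTITIONED, 1000)).getSplits());
    }
    return splits.build();
}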
Also used : ColumnHandle(io.prestosql.spi.connector.ColumnHandle) AbstractTestHive.filterNonHiddenColumnMetadata(io.prestosql.plugin.hive.AbstractTestHive.filterNonHiddenColumnMetadata) ColumnMetadata(io.prestosql.spi.connector.ColumnMetadata) ConnectorSplitSource(io.prestosql.spi.connector.ConnectorSplitSource) ConnectorPageSource(io.prestosql.spi.connector.ConnectorPageSource) ConnectorTableHandle(io.prestosql.spi.connector.ConnectorTableHandle) ConnectorOutputTableHandle(io.prestosql.spi.connector.ConnectorOutputTableHandle) HiveTransaction(io.prestosql.plugin.hive.AbstractTestHive.HiveTransaction) Transaction(io.prestosql.plugin.hive.AbstractTestHive.Transaction) Slice(io.airlift.slice.Slice) ConnectorSession(io.prestosql.spi.connector.ConnectorSession) TestingConnectorSession(io.prestosql.testing.TestingConnectorSession) ConnectorMetadata(io.prestosql.spi.connector.ConnectorMetadata) MaterializedResult(io.prestosql.testing.MaterializedResult) ConnectorPageSink(io.prestosql.spi.connector.ConnectorPageSink) ConnectorSplit(io.prestosql.spi.connector.ConnectorSplit) ConnectorTableMetadata(io.prestosql.spi.connector.ConnectorTableMetadata)

Aggregations

ConnectorPageSource (io.prestosql.spi.connector.ConnectorPageSource) 52
ConnectorSession (io.prestosql.spi.connector.ConnectorSession) 33
TestingConnectorSession (io.prestosql.testing.TestingConnectorSession) 32
ColumnHandle (io.prestosql.spi.connector.ColumnHandle) 28
ConnectorTableHandle (io.prestosql.spi.connector.ConnectorTableHandle) 27
Test (org.testng.annotations.Test) 26
ImmutableList (com.google.common.collect.ImmutableList) 24
ConnectorSplit (io.prestosql.spi.connector.ConnectorSplit) 21
List (java.util.List) 20
MaterializedResult (io.prestosql.testing.MaterializedResult) 18
Page (io.prestosql.spi.Page) 17
ConnectorMetadata (io.prestosql.spi.connector.ConnectorMetadata) 16
Optional (java.util.Optional) 16
Properties (java.util.Properties) 16
ImmutableList.toImmutableList (com.google.common.collect.ImmutableList.toImmutableList) 14
HiveColumnHandle.bucketColumnHandle (io.prestosql.plugin.hive.HiveColumnHandle.bucketColumnHandle) 14
IOException (java.io.IOException) 14
Collectors.toList (java.util.stream.Collectors.toList) 14
ImmutableMap (com.google.common.collect.ImmutableMap) 13
PrestoException (io.prestosql.spi.PrestoException) 13