Example 1 with SplitContext

Use of com.facebook.presto.spi.SplitContext in project presto by prestodb.

From the class RaptorPageSourceProvider, the method createPageSource:

@Override
public ConnectorPageSource createPageSource(ConnectorTransactionHandle transactionHandle, ConnectorSession session, ConnectorSplit split, List<ColumnHandle> columns, SplitContext splitContext) {
    RaptorSplit raptorSplit = (RaptorSplit) split;
    OptionalInt bucketNumber = raptorSplit.getBucketNumber();
    TupleDomain<RaptorColumnHandle> predicate = raptorSplit.getEffectivePredicate();
    ReaderAttributes attributes = ReaderAttributes.from(session);
    OptionalLong transactionId = raptorSplit.getTransactionId();
    Optional<Map<String, Type>> columnTypes = raptorSplit.getColumnTypes();
    boolean tableSupportsDeltaDelete = raptorSplit.isTableSupportsDeltaDelete();
    HdfsContext context = new HdfsContext(session);
    Map<UUID, UUID> shardDeltaMap = raptorSplit.getShardDeltaMap();
    if (raptorSplit.getShardUuids().size() == 1) {
        // Fast path: a single shard needs no concatenating wrapper
        UUID shardUuid = raptorSplit.getShardUuids().iterator().next();
        return createPageSource(context, DEFAULT_HIVE_FILE_CONTEXT, shardUuid, Optional.ofNullable(shardDeltaMap.get(shardUuid)),
                tableSupportsDeltaDelete, bucketNumber, columns, predicate, attributes, transactionId, columnTypes);
    }
    // Multiple shards: the stream is lazy, so each per-shard page source is
    // opened only as the consuming ConcatPageSource advances to it
    Iterator<ConnectorPageSource> iterator = raptorSplit.getShardUuids().stream()
            .map(shardUuid -> createPageSource(context, DEFAULT_HIVE_FILE_CONTEXT, shardUuid, Optional.ofNullable(shardDeltaMap.get(shardUuid)),
                    tableSupportsDeltaDelete, bucketNumber, columns, predicate, attributes, transactionId, columnTypes))
            .iterator();
    return new ConcatPageSource(iterator);
}
Also used : OptionalInt(java.util.OptionalInt) ConnectorTransactionHandle(com.facebook.presto.spi.connector.ConnectorTransactionHandle) Inject(javax.inject.Inject) OptionalLong(java.util.OptionalLong) SplitContext(com.facebook.presto.spi.SplitContext) Map(java.util.Map) Objects.requireNonNull(java.util.Objects.requireNonNull) ReaderAttributes(com.facebook.presto.raptor.storage.ReaderAttributes) DEFAULT_HIVE_FILE_CONTEXT(com.facebook.presto.hive.HiveFileContext.DEFAULT_HIVE_FILE_CONTEXT) StorageManager(com.facebook.presto.raptor.storage.StorageManager) HdfsContext(com.facebook.presto.hive.HdfsContext) ConnectorPageSourceProvider(com.facebook.presto.spi.connector.ConnectorPageSourceProvider) Type(com.facebook.presto.common.type.Type) ConcatPageSource(com.facebook.presto.raptor.util.ConcatPageSource) Iterator(java.util.Iterator) HiveFileContext(com.facebook.presto.hive.HiveFileContext) UUID(java.util.UUID) TupleDomain(com.facebook.presto.common.predicate.TupleDomain) ConnectorSession(com.facebook.presto.spi.ConnectorSession) ConnectorSplit(com.facebook.presto.spi.ConnectorSplit) List(java.util.List) Collectors.toList(java.util.stream.Collectors.toList) ConnectorPageSource(com.facebook.presto.spi.ConnectorPageSource) ColumnHandle(com.facebook.presto.spi.ColumnHandle) Optional(java.util.Optional)
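
Whichever path is taken, callers see a single ConnectorPageSource; the concatenation stays invisible behind the interface. A minimal consumption sketch follows. The drain loop and the process consumer are illustrative, not part of this provider; getNextPage, isFinished, and close are standard ConnectorPageSource methods, and ConnectorPageSource is Closeable.

// Illustrative drain loop: a ConcatPageSource over many shards and a
// single-shard source behave identically behind ConnectorPageSource
try (ConnectorPageSource source = provider.createPageSource(transactionHandle, session, split, columns, splitContext)) {
    while (!source.isFinished()) {
        Page page = source.getNextPage();   // may return null while data is not yet available
        if (page != null) {
            process(page);                  // hypothetical downstream consumer
        }
    }
}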

Example 2 with SplitContext

Use of com.facebook.presto.spi.SplitContext in project presto by prestodb.

From the class TestDynamicPruning, the method testDynamicBucketPruning:

@Test
public void testDynamicBucketPruning() {
    HiveClientConfig config = new HiveClientConfig();
    MetastoreClientConfig metastoreClientConfig = new MetastoreClientConfig();
    HiveTransactionHandle transaction = new HiveTransactionHandle();
    try (TempFile tempFile = new TempFile()) {
        // A dynamic filter that excludes the split's partition should prune
        // the split entirely, yielding the empty page source
        ConnectorPageSource emptyPageSource = createTestingPageSource(transaction, config, new SplitContext(false, getToSkipTupleDomainForPartition()), metastoreClientConfig, tempFile.file());
        assertEquals(emptyPageSource.getClass(), HiveEmptySplitPageSource.class);
        // A dynamic filter that matches the partition must leave the split readable
        ConnectorPageSource nonEmptyPageSource = createTestingPageSource(transaction, config, new SplitContext(false, getToKeepTupleDomainForPartition()), metastoreClientConfig, tempFile.file());
        assertEquals(nonEmptyPageSource.getClass(), HivePageSource.class);
    } catch (IOException e) {
        fail("unexpected I/O failure", e);
    }
}
Also used : TempFile(com.facebook.airlift.testing.TempFile) SplitContext(com.facebook.presto.spi.SplitContext) IOException(java.io.IOException) ConnectorPageSource(com.facebook.presto.spi.ConnectorPageSource) Test(org.testng.annotations.Test)
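
The two tuple-domain helpers are defined elsewhere in TestDynamicPruning and are not shown here. Purely as a hypothetical sketch of the shape such a filter could take, assuming the two-argument SplitContext constructor seen above accepts a TupleDomain over column handles; BIGINT, Domain, and ImmutableMap are assumed imports, and the handle and value are invented for illustration:

// Hypothetical stand-in for getToSkipTupleDomainForPartition(): pin the
// partition column to a value the test partition does not carry, so the
// dynamic-pruning check rejects the split
private static SplitContext toSkipContext(ColumnHandle partitionColumn)
{
    TupleDomain<ColumnHandle> toSkip = TupleDomain.withColumnDomains(
            ImmutableMap.of(partitionColumn, Domain.singleValue(BIGINT, 12345L)));
    return new SplitContext(false, toSkip);
}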

Example 3 with SplitContext

Use of com.facebook.presto.spi.SplitContext in project presto by prestodb.

From the class NodeScheduler, the method selectDistributionNodes:

public static SplitPlacementResult selectDistributionNodes(NodeMap nodeMap, NodeTaskMap nodeTaskMap, long maxSplitsWeightPerNode, long maxPendingSplitsWeightPerTask, int maxUnacknowledgedSplitsPerTask, Set<Split> splits, List<RemoteTask> existingTasks, BucketNodeMap bucketNodeMap, NodeSelectionStats nodeSelectionStats) {
    Multimap<InternalNode, Split> assignments = HashMultimap.create();
    NodeAssignmentStats assignmentStats = new NodeAssignmentStats(nodeTaskMap, nodeMap, existingTasks);
    Set<InternalNode> blockedNodes = new HashSet<>();
    for (Split split : splits) {
        // node placement is forced by the bucket to node map
        InternalNode node = bucketNodeMap.getAssignedNode(split).get();
        boolean isCacheable = bucketNodeMap.isSplitCacheable(split);
        SplitWeight splitWeight = split.getSplitWeight();
        // if the node is full, don't schedule now; this pushes back on split scheduling
        if (canAssignSplitToDistributionNode(assignmentStats, node, maxSplitsWeightPerNode, maxPendingSplitsWeightPerTask, maxUnacknowledgedSplitsPerTask, splitWeight)) {
            if (isCacheable) {
                // Split is immutable here, so rebuild it around a cacheable SplitContext
                split = new Split(split.getConnectorId(), split.getTransactionHandle(), split.getConnectorSplit(), split.getLifespan(), new SplitContext(true));
                nodeSelectionStats.incrementBucketedPreferredNodeSelectedCount();
            } else {
                nodeSelectionStats.incrementBucketedNonPreferredNodeSelectedCount();
            }
            assignments.put(node, split);
            assignmentStats.addAssignedSplit(node, splitWeight);
        } else {
            blockedNodes.add(node);
        }
    }
    ListenableFuture<?> blocked = toWhenHasSplitQueueSpaceFuture(blockedNodes, existingTasks, calculateLowWatermark(maxPendingSplitsWeightPerTask));
    return new SplitPlacementResult(blocked, ImmutableMultimap.copyOf(assignments));
}
Also used : SplitWeight(com.facebook.presto.spi.SplitWeight) SplitContext(com.facebook.presto.spi.SplitContext) InternalNode(com.facebook.presto.metadata.InternalNode) Split(com.facebook.presto.metadata.Split) HashSet(java.util.HashSet) LinkedHashSet(java.util.LinkedHashSet)
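
The cacheable branch is the only place this method touches SplitContext, and it is worth isolating: Split appears to be immutable here, so marking a split cacheable means rebuilding the whole Split around a fresh context rather than flipping a flag. Pulled out of the loop above for emphasis:

// Rewrap rather than mutate: new SplitContext(true) marks the split
// cacheable on the node the bucket-to-node map deliberately routed it to
split = new Split(
        split.getConnectorId(),
        split.getTransactionHandle(),
        split.getConnectorSplit(),
        split.getLifespan(),
        new SplitContext(true));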

Example 4 with SplitContext

Use of com.facebook.presto.spi.SplitContext in project presto by prestodb.

From the class SimpleNodeSelector, the method computeAssignments:

@Override
public SplitPlacementResult computeAssignments(Set<Split> splits, List<RemoteTask> existingTasks) {
    Multimap<InternalNode, Split> assignment = HashMultimap.create();
    NodeMap nodeMap = this.nodeMap.get().get();
    NodeAssignmentStats assignmentStats = new NodeAssignmentStats(nodeTaskMap, nodeMap, existingTasks);
    List<InternalNode> eligibleNodes = getEligibleNodes(maxTasksPerStage, nodeMap, existingTasks);
    NodeSelection randomNodeSelection = new RandomNodeSelection(eligibleNodes, minCandidates);
    Set<InternalNode> blockedExactNodes = new HashSet<>();
    boolean splitWaitingForAnyNode = false;
    NodeProvider nodeProvider = nodeMap.getActiveNodeProvider(nodeSelectionHashStrategy);
    OptionalInt preferredNodeCount = OptionalInt.empty();
    for (Split split : splits) {
        List<InternalNode> candidateNodes;
        switch(split.getNodeSelectionStrategy()) {
            case HARD_AFFINITY:
                candidateNodes = selectExactNodes(nodeMap, split.getPreferredNodes(nodeProvider), includeCoordinator);
                preferredNodeCount = OptionalInt.of(candidateNodes.size());
                break;
            case SOFT_AFFINITY:
                // Using all nodes for soft affinity scheduling with modular hashing because otherwise temporarily down nodes would trigger too much rehashing
                if (nodeSelectionHashStrategy == MODULAR_HASHING) {
                    nodeProvider = new ModularHashingNodeProvider(nodeMap.getAllNodes());
                }
                candidateNodes = selectExactNodes(nodeMap, split.getPreferredNodes(nodeProvider), includeCoordinator);
                preferredNodeCount = OptionalInt.of(candidateNodes.size());
                candidateNodes = ImmutableList.<InternalNode>builder().addAll(candidateNodes).addAll(randomNodeSelection.pickNodes(split)).build();
                break;
            case NO_PREFERENCE:
                candidateNodes = randomNodeSelection.pickNodes(split);
                break;
            default:
                throw new PrestoException(NODE_SELECTION_NOT_SUPPORTED, format("Unsupported node selection strategy %s", split.getNodeSelectionStrategy()));
        }
        if (candidateNodes.isEmpty()) {
            log.debug("No nodes available to schedule %s. Available nodes %s", split, nodeMap.getActiveNodes());
            throw new PrestoException(NO_NODES_AVAILABLE, "No nodes available to run query");
        }
        SplitWeight splitWeight = split.getSplitWeight();
        Optional<InternalNodeInfo> chosenNodeInfo = chooseLeastBusyNode(splitWeight, candidateNodes, assignmentStats::getTotalSplitsWeight, preferredNodeCount, maxSplitsWeightPerNode, assignmentStats);
        if (!chosenNodeInfo.isPresent()) {
            chosenNodeInfo = chooseLeastBusyNode(splitWeight, candidateNodes, assignmentStats::getQueuedSplitsWeightForStage, preferredNodeCount, maxPendingSplitsWeightPerTask, assignmentStats);
        }
        if (chosenNodeInfo.isPresent()) {
            // As in selectDistributionNodes, rebuild the immutable Split with a
            // context recording whether the chosen node can cache its data
            split = new Split(split.getConnectorId(), split.getTransactionHandle(), split.getConnectorSplit(), split.getLifespan(),
                    new SplitContext(chosenNodeInfo.get().isCacheable()));
            InternalNode chosenNode = chosenNodeInfo.get().getInternalNode();
            assignment.put(chosenNode, split);
            assignmentStats.addAssignedSplit(chosenNode, splitWeight);
        } else {
            if (split.getNodeSelectionStrategy() != HARD_AFFINITY) {
                splitWaitingForAnyNode = true;
            }
            else if (!splitWaitingForAnyNode) {
                // The exact node set won't matter if some split is already waiting for any node
                blockedExactNodes.addAll(candidateNodes);
            }
        }
    }
    ListenableFuture<?> blocked;
    if (splitWaitingForAnyNode) {
        blocked = toWhenHasSplitQueueSpaceFuture(existingTasks, calculateLowWatermark(maxPendingSplitsWeightPerTask));
    } else {
        blocked = toWhenHasSplitQueueSpaceFuture(blockedExactNodes, existingTasks, calculateLowWatermark(maxPendingSplitsWeightPerTask));
    }
    return new SplitPlacementResult(blocked, assignment);
}
Also used : NodeAssignmentStats(com.facebook.presto.execution.scheduler.NodeAssignmentStats) InternalNodeInfo(com.facebook.presto.execution.scheduler.InternalNodeInfo) PrestoException(com.facebook.presto.spi.PrestoException) NodeProvider(com.facebook.presto.spi.NodeProvider) ModularHashingNodeProvider(com.facebook.presto.execution.scheduler.ModularHashingNodeProvider) OptionalInt(java.util.OptionalInt) SplitWeight(com.facebook.presto.spi.SplitWeight) SplitContext(com.facebook.presto.spi.SplitContext) BucketNodeMap(com.facebook.presto.execution.scheduler.BucketNodeMap) NodeMap(com.facebook.presto.execution.scheduler.NodeMap) InternalNode(com.facebook.presto.metadata.InternalNode) Split(com.facebook.presto.metadata.Split) SplitPlacementResult(com.facebook.presto.execution.scheduler.SplitPlacementResult) Sets.newHashSet(com.google.common.collect.Sets.newHashSet) HashSet(java.util.HashSet)
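
For SOFT_AFFINITY the candidate list is a union: preferred (cache-friendly) nodes first, random fallbacks appended. The preferredNodeCount recorded alongside presumably tells chooseLeastBusyNode how many leading entries to favor, so locality wins only while the preferred nodes have capacity. The assembly, isolated from the switch above:

// SOFT_AFFINITY candidates: exact preferred nodes, then random fallbacks
candidateNodes = ImmutableList.<InternalNode>builder()
        .addAll(selectExactNodes(nodeMap, split.getPreferredNodes(nodeProvider), includeCoordinator))
        .addAll(randomNodeSelection.pickNodes(split))
        .build();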

Example 5 with SplitContext

Use of com.facebook.presto.spi.SplitContext in project presto by prestodb.

From the class HivePageSourceProvider, the method createSelectivePageSource:

private static Optional<ConnectorPageSource> createSelectivePageSource(Set<HiveSelectivePageSourceFactory> selectivePageSourceFactories, Configuration configuration, ConnectorSession session, HiveSplit split, HiveTableLayoutHandle layout, List<HiveColumnHandle> columns, DateTimeZone hiveStorageTimeZone, TypeManager typeManager, LoadingCache<RowExpressionCacheKey, RowExpression> rowExpressionCache, SplitContext splitContext, Optional<EncryptionInformation> encryptionInformation) {
    Set<HiveColumnHandle> interimColumns = ImmutableSet.<HiveColumnHandle>builder().addAll(layout.getPredicateColumns().values()).addAll(split.getBucketConversion().map(BucketConversion::getBucketColumnHandles).orElse(ImmutableList.of())).build();
    Set<String> columnNames = columns.stream().map(HiveColumnHandle::getName).collect(toImmutableSet());
    List<HiveColumnHandle> allColumns = ImmutableList.<HiveColumnHandle>builder().addAll(columns).addAll(interimColumns.stream().filter(column -> !columnNames.contains(column.getName())).collect(toImmutableList())).build();
    Path path = new Path(split.getPath());
    List<ColumnMapping> columnMappings = ColumnMapping.buildColumnMappings(split.getPartitionKeys(), allColumns, ImmutableList.of(), split.getTableToPartitionMapping(), path, split.getTableBucketNumber(), split.getFileSize(), split.getFileModifiedTime());
    Optional<BucketAdaptation> bucketAdaptation = split.getBucketConversion().map(conversion -> toBucketAdaptation(conversion, columnMappings, split.getTableBucketNumber(), mapping -> mapping.getHiveColumnHandle().getHiveColumnIndex()));
    Map<Integer, String> prefilledValues = columnMappings.stream().filter(mapping -> mapping.getKind() == ColumnMappingKind.PREFILLED).collect(toImmutableMap(mapping -> mapping.getHiveColumnHandle().getHiveColumnIndex(), ColumnMapping::getPrefilledValue));
    Map<Integer, HiveCoercer> coercers = columnMappings.stream().filter(mapping -> mapping.getCoercionFrom().isPresent()).collect(toImmutableMap(mapping -> mapping.getHiveColumnHandle().getHiveColumnIndex(), mapping -> createCoercer(typeManager, mapping.getCoercionFrom().get(), mapping.getHiveColumnHandle().getHiveType())));
    List<Integer> outputColumns = columns.stream().map(HiveColumnHandle::getHiveColumnIndex).collect(toImmutableList());
    RowExpression optimizedRemainingPredicate = rowExpressionCache.getUnchecked(new RowExpressionCacheKey(layout.getRemainingPredicate(), session));
    // Dynamic pruning: if the split context's filter rules out this split's
    // bucket or partition, return an empty source instead of opening the file
    if (shouldSkipBucket(layout, split, splitContext)) {
        return Optional.of(new HiveEmptySplitPageSource());
    }
    if (shouldSkipPartition(typeManager, layout, hiveStorageTimeZone, split, splitContext)) {
        return Optional.of(new HiveEmptySplitPageSource());
    }
    CacheQuota cacheQuota = generateCacheQuota(split);
    for (HiveSelectivePageSourceFactory pageSourceFactory : selectivePageSourceFactories) {
        Optional<? extends ConnectorPageSource> pageSource = pageSourceFactory.createPageSource(
                configuration, session, path,
                split.getStart(), split.getLength(), split.getFileSize(), split.getStorage(),
                toColumnHandles(columnMappings, true), prefilledValues, coercers, bucketAdaptation, outputColumns,
                splitContext.getDynamicFilterPredicate()
                        .map(filter -> filter.transform(handle -> new Subfield(((HiveColumnHandle) handle).getName())).intersect(layout.getDomainPredicate()))
                        .orElse(layout.getDomainPredicate()),
                optimizedRemainingPredicate, hiveStorageTimeZone,
                new HiveFileContext(splitContext.isCacheable(), cacheQuota, split.getExtraFileInfo().map(BinaryExtraHiveFileInfo::new),
                        Optional.of(split.getFileSize()), split.getFileModifiedTime(), HiveSessionProperties.isVerboseRuntimeStatsEnabled(session)),
                encryptionInformation);
        if (pageSource.isPresent()) {
            return Optional.of(pageSource.get());
        }
    }
    return Optional.empty();
}
Also used : NestedField(com.facebook.presto.common.Subfield.NestedField) RecordPageSource(com.facebook.presto.spi.RecordPageSource) DateTimeZone(org.joda.time.DateTimeZone) LoadingCache(com.google.common.cache.LoadingCache) HiveColumnHandle.isPushedDownSubfield(com.facebook.presto.hive.HiveColumnHandle.isPushedDownSubfield) Maps.uniqueIndex(com.google.common.collect.Maps.uniqueIndex) ConnectorTransactionHandle(com.facebook.presto.spi.connector.ConnectorTransactionHandle) ColumnMapping.toColumnHandles(com.facebook.presto.hive.HivePageSourceProvider.ColumnMapping.toColumnHandles) AGGREGATED(com.facebook.presto.hive.HiveColumnHandle.ColumnType.AGGREGATED) Preconditions.checkArgument(com.google.common.base.Preconditions.checkArgument) SchemaTableName(com.facebook.presto.spi.SchemaTableName) SplitContext(com.facebook.presto.spi.SplitContext) Configuration(org.apache.hadoop.conf.Configuration) Map(java.util.Map) Path(org.apache.hadoop.fs.Path) HiveBucketing.getHiveBucketFilter(com.facebook.presto.hive.HiveBucketing.getHiveBucketFilter) ConnectorPageSourceProvider(com.facebook.presto.spi.connector.ConnectorPageSourceProvider) BucketConversion(com.facebook.presto.hive.HiveSplit.BucketConversion) ImmutableSet(com.google.common.collect.ImmutableSet) HIVE_UNKNOWN_ERROR(com.facebook.presto.hive.HiveErrorCode.HIVE_UNKNOWN_ERROR) NullableValue(com.facebook.presto.common.predicate.NullableValue) ImmutableList.toImmutableList(com.google.common.collect.ImmutableList.toImmutableList) Set(java.util.Set) String.format(java.lang.String.format) Preconditions.checkState(com.google.common.base.Preconditions.checkState) ConnectorSession(com.facebook.presto.spi.ConnectorSession) CacheLoader(com.google.common.cache.CacheLoader) RecordCursor(com.facebook.presto.spi.RecordCursor) DataSize(io.airlift.units.DataSize) List(java.util.List) ImmutableMap.toImmutableMap(com.google.common.collect.ImmutableMap.toImmutableMap) RowExpressionService(com.facebook.presto.spi.relation.RowExpressionService) Optional(java.util.Optional) CacheBuilder(com.google.common.cache.CacheBuilder) HiveCoercer.createCoercer(com.facebook.presto.hive.HiveCoercer.createCoercer) PARTITION_KEY(com.facebook.presto.hive.HiveColumnHandle.ColumnType.PARTITION_KEY) Column(com.facebook.presto.hive.metastore.Column) REGULAR(com.facebook.presto.hive.HiveColumnHandle.ColumnType.REGULAR) ConnectorTableLayoutHandle(com.facebook.presto.spi.ConnectorTableLayoutHandle) PrestoException(com.facebook.presto.spi.PrestoException) OptionalInt(java.util.OptionalInt) Function(java.util.function.Function) HiveSessionProperties.isUseRecordPageSourceForCustomSplit(com.facebook.presto.hive.HiveSessionProperties.isUseRecordPageSourceForCustomSplit) System.identityHashCode(java.lang.System.identityHashCode) Inject(javax.inject.Inject) HashSet(java.util.HashSet) Subfield(com.facebook.presto.common.Subfield) ImmutableList(com.google.common.collect.ImmutableList) TypeManager(com.facebook.presto.common.type.TypeManager) Objects.requireNonNull(java.util.Objects.requireNonNull) ImmutableSet.toImmutableSet(com.google.common.collect.ImmutableSet.toImmutableSet) Type(com.facebook.presto.common.type.Type) RowExpression(com.facebook.presto.spi.relation.RowExpression) Storage(com.facebook.presto.hive.metastore.Storage) Properties(java.util.Properties) PathElement(com.facebook.presto.common.Subfield.PathElement) HiveUtil.getPrefilledColumnValue(com.facebook.presto.hive.HiveUtil.getPrefilledColumnValue) MetastoreUtil.reconstructPartitionSchema(com.facebook.presto.hive.metastore.MetastoreUtil.reconstructPartitionSchema) Iterables.getOnlyElement(com.google.common.collect.Iterables.getOnlyElement) Domain(com.facebook.presto.common.predicate.Domain) TupleDomain(com.facebook.presto.common.predicate.TupleDomain) OPTIMIZED(com.facebook.presto.spi.relation.ExpressionOptimizer.Level.OPTIMIZED) ConnectorSplit(com.facebook.presto.spi.ConnectorSplit) Collectors.toList(java.util.stream.Collectors.toList) ConnectorPageSource(com.facebook.presto.spi.ConnectorPageSource) ColumnHandle(com.facebook.presto.spi.ColumnHandle) SYNTHESIZED(com.facebook.presto.hive.HiveColumnHandle.ColumnType.SYNTHESIZED) HiveUtil.parsePartitionValue(com.facebook.presto.hive.HiveUtil.parsePartitionValue) VisibleForTesting(com.google.common.annotations.VisibleForTesting) MetastoreUtil.getHiveSchema(com.facebook.presto.hive.metastore.MetastoreUtil.getHiveSchema) HiveUtil.shouldUseRecordReaderFromInputFormat(com.facebook.presto.hive.HiveUtil.shouldUseRecordReaderFromInputFormat)
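
The densest argument in the factory call above is the pushed-down predicate; unpacked, it rebases the split's dynamic filter (when one arrived with the split) onto Subfields and intersects it with the layout's static domain predicate, falling back to the static predicate alone otherwise:

// Equivalent to the inline predicate argument in the createPageSource call
TupleDomain<Subfield> effectivePredicate = splitContext.getDynamicFilterPredicate()
        .map(filter -> filter
                .transform(handle -> new Subfield(((HiveColumnHandle) handle).getName()))
                .intersect(layout.getDomainPredicate()))
        .orElse(layout.getDomainPredicate());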

Aggregations

SplitContext (com.facebook.presto.spi.SplitContext): 11 usages
ConnectorPageSource (com.facebook.presto.spi.ConnectorPageSource): 7 usages
ColumnHandle (com.facebook.presto.spi.ColumnHandle): 6 usages
ConnectorSession (com.facebook.presto.spi.ConnectorSession): 6 usages
PrestoException (com.facebook.presto.spi.PrestoException): 6 usages
TupleDomain (com.facebook.presto.common.predicate.TupleDomain): 5 usages
Type (com.facebook.presto.common.type.Type): 5 usages
ConnectorSplit (com.facebook.presto.spi.ConnectorSplit): 5 usages
ConnectorPageSourceProvider (com.facebook.presto.spi.connector.ConnectorPageSourceProvider): 5 usages
ConnectorTransactionHandle (com.facebook.presto.spi.connector.ConnectorTransactionHandle): 5 usages
List (java.util.List): 5 usages
Map (java.util.Map): 5 usages
Objects.requireNonNull (java.util.Objects.requireNonNull): 5 usages
Optional (java.util.Optional): 5 usages
Inject (javax.inject.Inject): 5 usages
Domain (com.facebook.presto.common.predicate.Domain): 4 usages
TypeManager (com.facebook.presto.common.type.TypeManager): 4 usages
Split (com.facebook.presto.metadata.Split): 4 usages
Subfield (com.facebook.presto.common.Subfield): 3 usages
HdfsContext (com.facebook.presto.hive.HdfsContext): 3 usages
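
Taken together, the examples exercise a small SplitContext surface. The sketch below summarizes it, with signatures inferred from the call sites above rather than from the class definition itself:

// Constructors, as seen in Examples 2, 3, and 4
SplitContext cacheable = new SplitContext(true);
SplitContext withDynamicFilter = new SplitContext(false, dynamicFilterTupleDomain);

// Accessors, as seen in Example 5
boolean shouldCacheFile = splitContext.isCacheable();          // feeds HiveFileContext
Optional<TupleDomain<ColumnHandle>> dynamicFilter =
        splitContext.getDynamicFilterPredicate();              // feeds predicate pushdown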