Example 1 with HivePartition

Use of io.trino.plugin.hive.HivePartition in project trino by trinodb.

From the class ThriftHiveMetastore, method acquireSharedLock.

private void acquireSharedLock(HiveIdentity identity, AcidTransactionOwner transactionOwner, String queryId, long transactionId, List<SchemaTableName> fullTables, List<HivePartition> partitions, DataOperationType operation, boolean isDynamicPartitionWrite) {
    requireNonNull(operation, "operation is null");
    requireNonNull(transactionOwner, "transactionOwner is null");
    requireNonNull(queryId, "queryId is null");
    if (fullTables.isEmpty() && partitions.isEmpty()) {
        return;
    }
    LockRequestBuilder request = new LockRequestBuilder(queryId).setTransactionId(transactionId).setUser(transactionOwner.toString());
    for (SchemaTableName table : fullTables) {
        request.addLockComponent(createLockComponentForOperation(table, operation, isDynamicPartitionWrite, Optional.empty()));
    }
    for (HivePartition partition : partitions) {
        request.addLockComponent(createLockComponentForOperation(partition.getTableName(), operation, isDynamicPartitionWrite, Optional.of(partition.getPartitionId())));
    }
    acquireLock(identity, format("hive transaction %s for query %s", transactionId, queryId), request.build());
}
Also used : LockRequestBuilder(org.apache.hadoop.hive.metastore.LockRequestBuilder) SchemaTableName(io.trino.spi.connector.SchemaTableName) HivePartition(io.trino.plugin.hive.HivePartition)

Example 2 with HivePartition

Use of io.trino.plugin.hive.HivePartition in project trino by trinodb.

From the class MetastoreHiveStatisticsProvider, method calculateDataSizeForPartitioningKey.

@VisibleForTesting
static Estimate calculateDataSizeForPartitioningKey(HiveColumnHandle column, Type type, List<HivePartition> partitions, Map<String, PartitionStatistics> statistics, double averageRowsPerPartition) {
    if (!hasDataSize(type)) {
        return Estimate.unknown();
    }
    double dataSize = 0;
    for (HivePartition partition : partitions) {
        int length = getSize(partition.getKeys().get(column));
        double rowCount = getPartitionRowCount(partition.getPartitionId(), statistics).orElse(averageRowsPerPartition);
        dataSize += length * rowCount;
    }
    return Estimate.of(dataSize);
}
Also used : HivePartition(io.trino.plugin.hive.HivePartition) VisibleForTesting(com.google.common.annotations.VisibleForTesting)
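
The estimate in Example 2 is a sum of (key length x row count) over the partitions, falling back to the average row count when a partition has no statistics. A minimal, dependency-free sketch of that arithmetic follows. The class name, the use of plain strings for partition ids and key values, and the choice of UTF-8 byte length with zero bytes for a NULL key are all assumptions made for illustration; they are not taken from the Trino source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical stand-in for the data-size estimate; not the Trino implementation.
final class PartitionKeyDataSizeSketch {
    // keyValueByPartition: partition id -> partition key value (null models a NULL key).
    // rowCountByPartition: partition id -> known row count (may be missing for some ids).
    static double estimateDataSize(
            Map<String, String> keyValueByPartition,
            Map<String, Double> rowCountByPartition,
            double averageRowsPerPartition) {
        double dataSize = 0;
        for (Map.Entry<String, String> entry : keyValueByPartition.entrySet()) {
            String value = entry.getValue();
            // Assumption: a NULL key contributes zero bytes.
            int length = (value == null) ? 0 : value.getBytes(StandardCharsets.UTF_8).length;
            // Use the average when the partition has no recorded row count.
            double rowCount = rowCountByPartition.getOrDefault(entry.getKey(), averageRowsPerPartition);
            dataSize += length * rowCount;
        }
        return dataSize;
    }
}
```

For a partition with key "us" and 100 rows plus a partition with key "emea" and no statistics, an average of 10 rows gives 2 * 100 + 4 * 10 = 240.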

Example 3 with HivePartition

Use of io.trino.plugin.hive.HivePartition in project trino by trinodb.

From the class HiveTransaction, method getValidWriteIds.

public ValidTxnWriteIdList getValidWriteIds(AcidTransactionOwner transactionOwner, HiveMetastoreClosure metastore, HiveTableHandle tableHandle) {
    List<SchemaTableName> lockedTables;
    List<HivePartition> lockedPartitions;
    if (tableHandle.getPartitionColumns().isEmpty() || tableHandle.getPartitions().isEmpty()) {
        lockedTables = ImmutableList.of(tableHandle.getSchemaTableName());
        lockedPartitions = ImmutableList.of();
    } else {
        lockedTables = ImmutableList.of();
        lockedPartitions = tableHandle.getPartitions().get();
    }
    // Different calls for same table might need to lock different partitions so acquire locks every time
    metastore.acquireSharedReadLock(transactionOwner, queryId, transactionId, lockedTables, lockedPartitions);
    // For repeatable reads within a query, use the same list of valid transactions for a table which have once been used
    return validHiveTransactionsForTable.computeIfAbsent(tableHandle.getSchemaTableName(), schemaTableName -> new ValidTxnWriteIdList(metastore.getValidWriteIds(ImmutableList.of(schemaTableName), transactionId)));
}
Also used : ValidTxnWriteIdList(org.apache.hadoop.hive.common.ValidTxnWriteIdList) SchemaTableName(io.trino.spi.connector.SchemaTableName) HivePartition(io.trino.plugin.hive.HivePartition)
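
The comments in Example 3 describe a repeatable-read pattern: locks are acquired on every call, but the list of valid write ids is fetched once per table and then reused. The caching half of that pattern can be sketched with Map.computeIfAbsent. Everything below (class name, string-typed values, the call counter, the lookup function) is a hypothetical stand-in, and a plain HashMap is used, so the sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-query cache illustrating the computeIfAbsent pattern.
final class RepeatableReadCacheSketch {
    private final Map<String, String> validWriteIdsByTable = new HashMap<>();
    private int metastoreCalls;

    // Returns the cached value for the table, calling the lookup only on first use.
    String getValidWriteIds(String tableName, Function<String, String> metastoreLookup) {
        return validWriteIdsByTable.computeIfAbsent(tableName, name -> {
            metastoreCalls++; // counts how often the (simulated) metastore is hit
            return metastoreLookup.apply(name);
        });
    }

    int metastoreCalls() {
        return metastoreCalls;
    }
}
```

A second call for the same table returns the first result even if the underlying lookup would now answer differently, which is the repeatable-read property the comment refers to.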

Example 4 with HivePartition

Use of io.trino.plugin.hive.HivePartition in project trino by trinodb.

From the class MetastoreHiveStatisticsProvider, method getPartitionsSample.

@VisibleForTesting
static List<HivePartition> getPartitionsSample(List<HivePartition> partitions, int sampleSize) {
    checkArgument(sampleSize > 0, "sampleSize is expected to be greater than zero");
    if (partitions.size() <= sampleSize) {
        return partitions;
    }
    List<HivePartition> result = new ArrayList<>();
    int samplesLeft = sampleSize;
    HivePartition min = partitions.get(0);
    HivePartition max = partitions.get(0);
    for (HivePartition partition : partitions) {
        if (partition.getPartitionId().compareTo(min.getPartitionId()) < 0) {
            min = partition;
        } else if (partition.getPartitionId().compareTo(max.getPartitionId()) > 0) {
            max = partition;
        }
    }
    result.add(min);
    samplesLeft--;
    if (samplesLeft > 0) {
        result.add(max);
        samplesLeft--;
    }
    if (samplesLeft > 0) {
        HashFunction hashFunction = murmur3_128();
        Comparator<Map.Entry<HivePartition, Long>> hashComparator = Comparator
                .<Map.Entry<HivePartition, Long>, Long>comparing(Map.Entry::getValue)
                .thenComparing(entry -> entry.getKey().getPartitionId());
        partitions.stream()
                .filter(partition -> !result.contains(partition))
                .map(partition -> immutableEntry(partition, hashFunction.hashUnencodedChars(partition.getPartitionId()).asLong()))
                .sorted(hashComparator)
                .limit(samplesLeft)
                .forEachOrdered(entry -> result.add(entry.getKey()));
    }
    return unmodifiableList(result);
}
Also used : DateStatistics(io.trino.plugin.hive.metastore.DateStatistics) Arrays(java.util.Arrays) Collections.unmodifiableList(java.util.Collections.unmodifiableList) BigDecimal(java.math.BigDecimal) StatsUtil.toStatsRepresentation(io.trino.spi.statistics.StatsUtil.toStatsRepresentation) Preconditions.checkArgument(com.google.common.base.Preconditions.checkArgument) Maps.immutableEntry(com.google.common.collect.Maps.immutableEntry) Map(java.util.Map) HIVE_CORRUPTED_COLUMN_STATISTICS(io.trino.plugin.hive.HiveErrorCode.HIVE_CORRUPTED_COLUMN_STATISTICS) HiveColumnHandle(io.trino.plugin.hive.HiveColumnHandle) INTEGER(io.trino.spi.type.IntegerType.INTEGER) SMALLINT(io.trino.spi.type.SmallintType.SMALLINT) HiveBasicStatistics(io.trino.plugin.hive.HiveBasicStatistics) ImmutableMap(com.google.common.collect.ImmutableMap) HivePartition(io.trino.plugin.hive.HivePartition) Collection(java.util.Collection) ImmutableList.toImmutableList(com.google.common.collect.ImmutableList.toImmutableList) HiveSessionProperties.isStatisticsEnabled(io.trino.plugin.hive.HiveSessionProperties.isStatisticsEnabled) Set(java.util.Set) TrinoException(io.trino.spi.TrinoException) SchemaTableName(io.trino.spi.connector.SchemaTableName) String.format(java.lang.String.format) DoubleStream(java.util.stream.DoubleStream) Preconditions.checkState(com.google.common.base.Preconditions.checkState) Objects(java.util.Objects) List(java.util.List) BIGINT(io.trino.spi.type.BigintType.BIGINT) LocalDate(java.time.LocalDate) Optional(java.util.Optional) HashFunction(com.google.common.hash.HashFunction) DecimalType(io.trino.spi.type.DecimalType) DATE(io.trino.spi.type.DateType.DATE) REAL(io.trino.spi.type.RealType.REAL) MoreObjects.toStringHelper(com.google.common.base.MoreObjects.toStringHelper) DoubleRange(io.trino.spi.statistics.DoubleRange) Verify.verifyNotNull(com.google.common.base.Verify.verifyNotNull) 
HiveSessionProperties.isIgnoreCorruptedStatistics(io.trino.plugin.hive.HiveSessionProperties.isIgnoreCorruptedStatistics) PartitionStatistics(io.trino.plugin.hive.PartitionStatistics) Slice(io.airlift.slice.Slice) Logger(io.airlift.log.Logger) NullableValue(io.trino.spi.predicate.NullableValue) HiveSessionProperties.getPartitionStatisticsSampleSize(io.trino.plugin.hive.HiveSessionProperties.getPartitionStatisticsSampleSize) Type(io.trino.spi.type.Type) OptionalDouble(java.util.OptionalDouble) Shorts(com.google.common.primitives.Shorts) UNPARTITIONED_ID(io.trino.plugin.hive.HivePartition.UNPARTITIONED_ID) ArrayList(java.util.ArrayList) VarcharType(io.trino.spi.type.VarcharType) OptionalLong(java.util.OptionalLong) HiveColumnStatistics(io.trino.plugin.hive.metastore.HiveColumnStatistics) Verify.verify(com.google.common.base.Verify.verify) SemiTransactionalHiveMetastore(io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore) Objects.requireNonNull(java.util.Objects.requireNonNull) ColumnHandle(io.trino.spi.connector.ColumnHandle) TableStatistics(io.trino.spi.statistics.TableStatistics) ImmutableSet.toImmutableSet(com.google.common.collect.ImmutableSet.toImmutableSet) Double.isFinite(java.lang.Double.isFinite) IntegerStatistics(io.trino.plugin.hive.metastore.IntegerStatistics) Estimate(io.trino.spi.statistics.Estimate) VerifyException(com.google.common.base.VerifyException) ColumnStatistics(io.trino.spi.statistics.ColumnStatistics) SignedBytes(com.google.common.primitives.SignedBytes) DecimalStatistics(io.trino.plugin.hive.metastore.DecimalStatistics) ConnectorSession(io.trino.spi.connector.ConnectorSession) DoubleStatistics(io.trino.plugin.hive.metastore.DoubleStatistics) Hashing.murmur3_128(com.google.common.hash.Hashing.murmur3_128) Ints(com.google.common.primitives.Ints) DOUBLE(io.trino.spi.type.DoubleType.DOUBLE) Double.isNaN(java.lang.Double.isNaN) CharType(io.trino.spi.type.CharType) VisibleForTesting(com.google.common.annotations.VisibleForTesting) 
TINYINT(io.trino.spi.type.TinyintType.TINYINT) Comparator(java.util.Comparator)
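
Example 4 builds a deterministic sample: it always keeps the partitions with the smallest and largest ids, then fills the remaining slots with the partitions whose hashed ids sort first. The sketch below reproduces that idea on plain strings. It is an illustration under stated assumptions, not the Trino code: the class name is hypothetical, CRC32 from the JDK stands in for Guava's murmur3_128 (so the chosen "middle" partitions will differ from the original), and Collections.min and Collections.max replace the hand-written min/max loop.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.CRC32;

// Hypothetical sketch of deterministic, hash-ordered sampling of partition ids.
final class PartitionSampleSketch {
    // Stand-in hash: CRC32 of the UTF-8 bytes (the original uses murmur3_128).
    static long hash(String partitionId) {
        CRC32 crc = new CRC32();
        crc.update(partitionId.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    static List<String> sample(List<String> partitionIds, int sampleSize) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize is expected to be greater than zero");
        }
        if (partitionIds.size() <= sampleSize) {
            return partitionIds;
        }
        List<String> result = new ArrayList<>();
        // Always keep the extremes so range-style statistics stay representative.
        result.add(Collections.min(partitionIds));
        if (sampleSize > 1) {
            result.add(Collections.max(partitionIds));
        }
        int samplesLeft = sampleSize - result.size();
        // Order the rest by hash, breaking ties by id, and take the first few.
        Comparator<String> byHashThenId = Comparator
                .<String>comparingLong(PartitionSampleSketch::hash)
                .thenComparing(Comparator.<String>naturalOrder());
        List<String> extra = partitionIds.stream()
                .filter(id -> !result.contains(id))
                .sorted(byHashThenId)
                .limit(samplesLeft)
                .collect(Collectors.toList());
        result.addAll(extra);
        return Collections.unmodifiableList(result);
    }
}
```

Because the ordering depends only on the ids, repeated calls with the same input return the same sample, regardless of the order in which the partitions were listed after the first two entries.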

Example 5 with HivePartition

Use of io.trino.plugin.hive.HivePartition in project trino by trinodb.

From the class MetastoreHiveStatisticsProvider, method calculateRangeForPartitioningKey.

@VisibleForTesting
static Optional<DoubleRange> calculateRangeForPartitioningKey(HiveColumnHandle column, Type type, List<HivePartition> partitions) {
    List<OptionalDouble> convertedValues = partitions.stream()
            .map(HivePartition::getKeys)
            .map(keys -> keys.get(column))
            .filter(value -> !value.isNull())
            .map(NullableValue::getValue)
            .map(value -> convertPartitionValueToDouble(type, value))
            .collect(toImmutableList());
    if (convertedValues.stream().noneMatch(OptionalDouble::isPresent)) {
        return Optional.empty();
    }
    double[] values = convertedValues.stream()
            .peek(convertedValue -> checkState(convertedValue.isPresent(), "convertedValue is missing"))
            .mapToDouble(OptionalDouble::getAsDouble)
            .toArray();
    verify(values.length != 0, "No values");
    if (DoubleStream.of(values).anyMatch(Double::isNaN)) {
        return Optional.empty();
    }
    double min = DoubleStream.of(values).min().orElseThrow();
    double max = DoubleStream.of(values).max().orElseThrow();
    return Optional.of(new DoubleRange(min, max));
}
Also used : DateStatistics(io.trino.plugin.hive.metastore.DateStatistics) Arrays(java.util.Arrays) Collections.unmodifiableList(java.util.Collections.unmodifiableList) BigDecimal(java.math.BigDecimal) StatsUtil.toStatsRepresentation(io.trino.spi.statistics.StatsUtil.toStatsRepresentation) Preconditions.checkArgument(com.google.common.base.Preconditions.checkArgument) Maps.immutableEntry(com.google.common.collect.Maps.immutableEntry) Map(java.util.Map) HIVE_CORRUPTED_COLUMN_STATISTICS(io.trino.plugin.hive.HiveErrorCode.HIVE_CORRUPTED_COLUMN_STATISTICS) HiveColumnHandle(io.trino.plugin.hive.HiveColumnHandle) INTEGER(io.trino.spi.type.IntegerType.INTEGER) SMALLINT(io.trino.spi.type.SmallintType.SMALLINT) HiveBasicStatistics(io.trino.plugin.hive.HiveBasicStatistics) ImmutableMap(com.google.common.collect.ImmutableMap) HivePartition(io.trino.plugin.hive.HivePartition) Collection(java.util.Collection) ImmutableList.toImmutableList(com.google.common.collect.ImmutableList.toImmutableList) HiveSessionProperties.isStatisticsEnabled(io.trino.plugin.hive.HiveSessionProperties.isStatisticsEnabled) Set(java.util.Set) TrinoException(io.trino.spi.TrinoException) SchemaTableName(io.trino.spi.connector.SchemaTableName) String.format(java.lang.String.format) DoubleStream(java.util.stream.DoubleStream) Preconditions.checkState(com.google.common.base.Preconditions.checkState) Objects(java.util.Objects) List(java.util.List) BIGINT(io.trino.spi.type.BigintType.BIGINT) LocalDate(java.time.LocalDate) Optional(java.util.Optional) HashFunction(com.google.common.hash.HashFunction) DecimalType(io.trino.spi.type.DecimalType) DATE(io.trino.spi.type.DateType.DATE) REAL(io.trino.spi.type.RealType.REAL) MoreObjects.toStringHelper(com.google.common.base.MoreObjects.toStringHelper) DoubleRange(io.trino.spi.statistics.DoubleRange) Verify.verifyNotNull(com.google.common.base.Verify.verifyNotNull) 
HiveSessionProperties.isIgnoreCorruptedStatistics(io.trino.plugin.hive.HiveSessionProperties.isIgnoreCorruptedStatistics) PartitionStatistics(io.trino.plugin.hive.PartitionStatistics) Slice(io.airlift.slice.Slice) Logger(io.airlift.log.Logger) NullableValue(io.trino.spi.predicate.NullableValue) HiveSessionProperties.getPartitionStatisticsSampleSize(io.trino.plugin.hive.HiveSessionProperties.getPartitionStatisticsSampleSize) Type(io.trino.spi.type.Type) OptionalDouble(java.util.OptionalDouble) Shorts(com.google.common.primitives.Shorts) UNPARTITIONED_ID(io.trino.plugin.hive.HivePartition.UNPARTITIONED_ID) ArrayList(java.util.ArrayList) VarcharType(io.trino.spi.type.VarcharType) OptionalLong(java.util.OptionalLong) HiveColumnStatistics(io.trino.plugin.hive.metastore.HiveColumnStatistics) Verify.verify(com.google.common.base.Verify.verify) SemiTransactionalHiveMetastore(io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore) Objects.requireNonNull(java.util.Objects.requireNonNull) ColumnHandle(io.trino.spi.connector.ColumnHandle) TableStatistics(io.trino.spi.statistics.TableStatistics) ImmutableSet.toImmutableSet(com.google.common.collect.ImmutableSet.toImmutableSet) Double.isFinite(java.lang.Double.isFinite) IntegerStatistics(io.trino.plugin.hive.metastore.IntegerStatistics) Estimate(io.trino.spi.statistics.Estimate) VerifyException(com.google.common.base.VerifyException) ColumnStatistics(io.trino.spi.statistics.ColumnStatistics) SignedBytes(com.google.common.primitives.SignedBytes) DecimalStatistics(io.trino.plugin.hive.metastore.DecimalStatistics) ConnectorSession(io.trino.spi.connector.ConnectorSession) DoubleStatistics(io.trino.plugin.hive.metastore.DoubleStatistics) Hashing.murmur3_128(com.google.common.hash.Hashing.murmur3_128) Ints(com.google.common.primitives.Ints) DOUBLE(io.trino.spi.type.DoubleType.DOUBLE) Double.isNaN(java.lang.Double.isNaN) CharType(io.trino.spi.type.CharType) VisibleForTesting(com.google.common.annotations.VisibleForTesting) 
TINYINT(io.trino.spi.type.TinyintType.TINYINT) Comparator(java.util.Comparator)
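
The control flow of Example 5 can be followed more easily on already-converted values: return nothing if no value could be converted, fail if only some could, return nothing if any value is NaN, and otherwise return the minimum and maximum. The sketch below assumes the conversion step has already happened and takes a list of OptionalDouble; the class name and the use of a two-element double array in place of DoubleRange are hypothetical simplifications.

```java
import java.util.List;
import java.util.Optional;
import java.util.OptionalDouble;
import java.util.stream.DoubleStream;

// Hypothetical sketch of the min/max range logic over converted partition values.
final class PartitionKeyRangeSketch {
    // Returns {min, max}, or empty when no usable range exists.
    static Optional<double[]> range(List<OptionalDouble> convertedValues) {
        // No value is convertible (or the list is empty): no range.
        if (convertedValues.stream().noneMatch(OptionalDouble::isPresent)) {
            return Optional.empty();
        }
        // A mix of convertible and non-convertible values is treated as a bug.
        double[] values = convertedValues.stream()
                .mapToDouble(value -> value.orElseThrow(
                        () -> new IllegalStateException("convertedValue is missing")))
                .toArray();
        // NaN makes min/max meaningless, so give up on the range.
        if (DoubleStream.of(values).anyMatch(Double::isNaN)) {
            return Optional.empty();
        }
        double min = DoubleStream.of(values).min().getAsDouble();
        double max = DoubleStream.of(values).max().getAsDouble();
        return Optional.of(new double[] {min, max});
    }
}
```

Returning an empty result for NaN input, rather than throwing, matches the behavior visible in the example above: a range is optional information for the planner, so an unusable value downgrades the estimate instead of failing the query.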

Aggregations

HivePartition (io.trino.plugin.hive.HivePartition): 7
SchemaTableName (io.trino.spi.connector.SchemaTableName): 4
VisibleForTesting (com.google.common.annotations.VisibleForTesting): 3
HiveBasicStatistics (io.trino.plugin.hive.HiveBasicStatistics): 3
HiveColumnHandle (io.trino.plugin.hive.HiveColumnHandle): 3
PartitionStatistics (io.trino.plugin.hive.PartitionStatistics): 3
DoubleRange (io.trino.spi.statistics.DoubleRange): 3
TableStatistics (io.trino.spi.statistics.TableStatistics): 3
MoreObjects.toStringHelper (com.google.common.base.MoreObjects.toStringHelper): 2
Preconditions.checkArgument (com.google.common.base.Preconditions.checkArgument): 2
Preconditions.checkState (com.google.common.base.Preconditions.checkState): 2
Verify.verify (com.google.common.base.Verify.verify): 2
Verify.verifyNotNull (com.google.common.base.Verify.verifyNotNull): 2
VerifyException (com.google.common.base.VerifyException): 2
ImmutableList.toImmutableList (com.google.common.collect.ImmutableList.toImmutableList): 2
ImmutableMap (com.google.common.collect.ImmutableMap): 2
ImmutableSet.toImmutableSet (com.google.common.collect.ImmutableSet.toImmutableSet): 2
Maps.immutableEntry (com.google.common.collect.Maps.immutableEntry): 2
HashFunction (com.google.common.hash.HashFunction): 2
Hashing.murmur3_128 (com.google.common.hash.Hashing.murmur3_128): 2