
Example 1 with PrunedPartitionList

use of org.apache.hadoop.hive.ql.parse.PrunedPartitionList in project hive by apache.

the class GlobalLimitOptimizer method transform.

@Override
public ParseContext transform(ParseContext pctx) throws SemanticException {
    Context ctx = pctx.getContext();
    Map<String, TableScanOperator> topOps = pctx.getTopOps();
    GlobalLimitCtx globalLimitCtx = pctx.getGlobalLimitCtx();
    Map<String, SplitSample> nameToSplitSample = pctx.getNameToSplitSample();
    // This optimization only applies when the query has a single top-level table scan,
    // contains no transform or UDTF, and no block sampling is used.
    if (ctx.getTryCount() == 0 && topOps.size() == 1 && !globalLimitCtx.ifHasTransformOrUDTF() && nameToSplitSample.isEmpty()) {
        // Here we recursively check:
        // 1. whether there is exactly one LIMIT in the query
        // 2. whether there is no aggregation, group-by, distinct, sort by,
        //    distribute by, or table sampling in any of the sub-queries.
        // The query only qualifies if both conditions are satisfied.
        //
        // Example qualified queries:
        //    CREATE TABLE ... AS SELECT col1, col2 FROM tbl LIMIT ..
        //    INSERT OVERWRITE TABLE ... SELECT col1, hash(col2), split(col1)
        //                               FROM ... LIMIT...
        //    SELECT * FROM (SELECT col1 AS col2 FROM (SELECT * FROM ...) t1 LIMIT ...) t2;
        //
        TableScanOperator ts = topOps.values().iterator().next();
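        // checkQbpForGlobalLimit() walks the plan under the table scan and returns the single
        // LimitOperator when the query matches the shape described above, or null otherwise.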
        LimitOperator tempGlobalLimit = checkQbpForGlobalLimit(ts);
        // the query qualifies for the optimization
        if (tempGlobalLimit != null) {
            LimitDesc tempGlobalLimitDesc = tempGlobalLimit.getConf();
            Table tab = ts.getConf().getTableMetadata();
            Set<FilterOperator> filterOps = OperatorUtils.findOperators(ts, FilterOperator.class);
            if (!tab.isPartitioned()) {
                if (filterOps.size() == 0) {
                    Integer tempOffset = tempGlobalLimitDesc.getOffset();
                    globalLimitCtx.enableOpt(tempGlobalLimitDesc.getLimit(), (tempOffset == null) ? 0 : tempOffset);
                }
            } else {
                // check if the pruner only contains partition columns
                if (onlyContainsPartnCols(tab, filterOps)) {
                    String alias = (String) topOps.keySet().toArray()[0];
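                    // ask the ParseContext for the pruned partition list of this scan;
                    // the result is computed by the partition pruner and cached on the context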
                    PrunedPartitionList partsList = pctx.getPrunedPartitions(alias, ts);
                    // only enable the optimization when there are no unknown partitions,
                    // i.e. the pruner could evaluate the filter to prune exactly
                    if (!partsList.hasUnknownPartitions()) {
                        Integer tempOffset = tempGlobalLimitDesc.getOffset();
                        globalLimitCtx.enableOpt(tempGlobalLimitDesc.getLimit(), (tempOffset == null) ? 0 : tempOffset);
                    }
                }
            }
            if (globalLimitCtx.isEnable()) {
                LOG.info("Qualify the optimize that reduces input size for 'offset' for offset " + globalLimitCtx.getGlobalOffset());
                LOG.info("Qualify the optimize that reduces input size for 'limit' for limit " + globalLimitCtx.getGlobalLimit());
            }
        }
    }
    return pctx;
}
Also used : Context(org.apache.hadoop.hive.ql.Context) ParseContext(org.apache.hadoop.hive.ql.parse.ParseContext) TableScanOperator(org.apache.hadoop.hive.ql.exec.TableScanOperator) Table(org.apache.hadoop.hive.ql.metadata.Table) SplitSample(org.apache.hadoop.hive.ql.parse.SplitSample) LimitDesc(org.apache.hadoop.hive.ql.plan.LimitDesc) FilterOperator(org.apache.hadoop.hive.ql.exec.FilterOperator) PrunedPartitionList(org.apache.hadoop.hive.ql.parse.PrunedPartitionList) LimitOperator(org.apache.hadoop.hive.ql.exec.LimitOperator) GlobalLimitCtx(org.apache.hadoop.hive.ql.parse.GlobalLimitCtx)
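
The decisive check in this example is hasUnknownPartitions(): the optimization is only safe when the pruner resolved the partition set exactly. A minimal sketch of that gate pulled into a standalone helper follows; the class and method names are illustrative, not part of Hive.

import org.apache.hadoop.hive.ql.parse.PrunedPartitionList;

// Sketch only: this helper is not part of Hive.
public final class GlobalLimitGate {

    private GlobalLimitGate() {
    }

    // Returns true only when the pruner fully resolved the partition set; unknown
    // partitions mean the filter could not be evaluated at compile time, so reading
    // only a subset of the input could silently drop rows.
    public static boolean canReduceInputSize(PrunedPartitionList partsList) {
        return partsList != null && !partsList.hasUnknownPartitions();
    }
}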

Example 2 with PrunedPartitionList

use of org.apache.hadoop.hive.ql.parse.PrunedPartitionList in project hive by apache.

the class TableSizeBasedBigTableSelectorForAutoSMJ method getBigTablePosition.

public int getBigTablePosition(ParseContext parseCtx, JoinOperator joinOp, Set<Integer> bigTableCandidates) throws SemanticException {
    int bigTablePos = -1;
    long maxSize = -1;
    HiveConf conf = parseCtx.getConf();
    try {
        List<TableScanOperator> topOps = new ArrayList<TableScanOperator>();
        getListTopOps(joinOp, topOps);
        int currentPos = 0;
        for (TableScanOperator topOp : topOps) {
            if (topOp == null) {
                return -1;
            }
            if (!bigTableCandidates.contains(currentPos)) {
                currentPos++;
                continue;
            }
            Table table = topOp.getConf().getTableMetadata();
            long currentSize = 0;
            if (!table.isPartitioned()) {
                currentSize = getSize(conf, table);
            } else {
                // For partitioned tables, get the size of all the partitions
                PrunedPartitionList partsList = PartitionPruner.prune(topOp, parseCtx, null);
                for (Partition part : partsList.getNotDeniedPartns()) {
                    currentSize += getSize(conf, part);
                }
            }
            if (currentSize > maxSize) {
                maxSize = currentSize;
                bigTablePos = currentPos;
            }
            currentPos++;
        }
    } catch (HiveException e) {
        throw new SemanticException(e.getMessage());
    }
    return bigTablePos;
}
Also used : Partition(org.apache.hadoop.hive.ql.metadata.Partition) TableScanOperator(org.apache.hadoop.hive.ql.exec.TableScanOperator) Table(org.apache.hadoop.hive.ql.metadata.Table) PrunedPartitionList(org.apache.hadoop.hive.ql.parse.PrunedPartitionList) HiveException(org.apache.hadoop.hive.ql.metadata.HiveException) ArrayList(java.util.ArrayList) HiveConf(org.apache.hadoop.hive.conf.HiveConf) SemanticException(org.apache.hadoop.hive.ql.parse.SemanticException)
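
The selector prunes first and then sums only the partitions the pruner kept (Example 5 below averages the same per-partition sizes). A minimal sketch of the summing step, assuming a caller-supplied size function in place of the getSize(conf, part) helper used above:

import java.util.function.ToLongFunction;
import org.apache.hadoop.hive.ql.metadata.Partition;
import org.apache.hadoop.hive.ql.parse.PrunedPartitionList;

// Sketch only: this helper is not part of Hive.
public final class PartitionSizes {

    private PartitionSizes() {
    }

    // Sums the size of every partition left after pruning; the sizer argument stands in
    // for whatever per-partition size lookup the caller uses (stats, file system, ...).
    public static long totalSize(PrunedPartitionList partsList, ToLongFunction<Partition> sizer) {
        long total = 0;
        for (Partition part : partsList.getNotDeniedPartns()) {
            total += sizer.applyAsLong(part);
        }
        return total;
    }
}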

Example 3 with PrunedPartitionList

use of org.apache.hadoop.hive.ql.parse.PrunedPartitionList in project hive by apache.

the class SparkProcessAnalyzeTable method process.

@SuppressWarnings("unchecked")
@Override
public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procContext, Object... nodeOutputs) throws SemanticException {
    GenSparkProcContext context = (GenSparkProcContext) procContext;
    TableScanOperator tableScan = (TableScanOperator) nd;
    ParseContext parseContext = context.parseContext;
    @SuppressWarnings("rawtypes") Class<? extends InputFormat> inputFormat = tableScan.getConf().getTableMetadata().getInputFormatClass();
    if (parseContext.getQueryProperties().isAnalyzeCommand()) {
        Preconditions.checkArgument(tableScan.getChildOperators() == null || tableScan.getChildOperators().size() == 0, "AssertionError: expected tableScan.getChildOperators() to be null, " + "or tableScan.getChildOperators().size() to be 0");
        String alias = null;
        for (String a : parseContext.getTopOps().keySet()) {
            if (tableScan == parseContext.getTopOps().get(a)) {
                alias = a;
            }
        }
        Preconditions.checkArgument(alias != null, "AssertionError: expected alias to be not null");
        SparkWork sparkWork = context.currentTask.getWork();
        boolean partialScan = parseContext.getQueryProperties().isPartialScanAnalyzeCommand();
        boolean noScan = parseContext.getQueryProperties().isNoScanAnalyzeCommand();
        if (inputFormat.equals(OrcInputFormat.class) && (noScan || partialScan)) {
            // ANALYZE TABLE T [PARTITION (...)] COMPUTE STATISTICS partialscan;
            // ANALYZE TABLE T [PARTITION (...)] COMPUTE STATISTICS noscan;
            // There will not be any Spark job above this task
            StatsNoJobWork snjWork = new StatsNoJobWork(tableScan.getConf().getTableMetadata().getTableSpec());
            snjWork.setStatsReliable(parseContext.getConf().getBoolVar(HiveConf.ConfVars.HIVE_STATS_RELIABLE));
            Task<StatsNoJobWork> snjTask = TaskFactory.get(snjWork, parseContext.getConf());
            snjTask.setParentTasks(null);
            context.rootTasks.remove(context.currentTask);
            context.rootTasks.add(snjTask);
            return true;
        } else {
            // ANALYZE TABLE T [PARTITION (...)] COMPUTE STATISTICS;
            // The plan consists of a simple SparkTask followed by a StatsTask.
            // The Spark task is just a simple TableScanOperator
            StatsWork statsWork = new StatsWork(tableScan.getConf().getTableMetadata().getTableSpec());
            statsWork.setAggKey(tableScan.getConf().getStatsAggPrefix());
            statsWork.setStatsTmpDir(tableScan.getConf().getTmpStatsDir());
            statsWork.setSourceTask(context.currentTask);
            statsWork.setStatsReliable(parseContext.getConf().getBoolVar(HiveConf.ConfVars.HIVE_STATS_RELIABLE));
            Task<StatsWork> statsTask = TaskFactory.get(statsWork, parseContext.getConf());
            context.currentTask.addDependentTask(statsTask);
            // The plan consists of a StatsTask only.
            if (parseContext.getQueryProperties().isNoScanAnalyzeCommand()) {
                statsTask.setParentTasks(null);
                statsWork.setNoScanAnalyzeCommand(true);
                context.rootTasks.remove(context.currentTask);
                context.rootTasks.add(statsTask);
            }
            // ANALYZE TABLE T [PARTITION (...)] COMPUTE STATISTICS partialscan;
            if (parseContext.getQueryProperties().isPartialScanAnalyzeCommand()) {
                handlePartialScanCommand(tableScan, parseContext, statsWork, context, statsTask);
            }
            // NOTE: here we should use the new partition predicate pushdown API to get the pruned partition list,
            // and pass it to setTaskPlan as the last parameter
            Set<Partition> confirmedPartns = GenMapRedUtils.getConfirmedPartitionsForScan(tableScan);
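            // partitions named explicitly in the ANALYZE ... PARTITION (...) spec of the command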
            PrunedPartitionList partitions = null;
            if (confirmedPartns.size() > 0) {
                Table source = tableScan.getConf().getTableMetadata();
                List<String> partCols = GenMapRedUtils.getPartitionColumns(tableScan);
                partitions = new PrunedPartitionList(source, confirmedPartns, partCols, false);
            }
            MapWork w = utils.createMapWork(context, tableScan, sparkWork, partitions);
            w.setGatheringStats(true);
            return true;
        }
    }
    return null;
}
Also used : Partition(org.apache.hadoop.hive.ql.metadata.Partition) TableScanOperator(org.apache.hadoop.hive.ql.exec.TableScanOperator) Table(org.apache.hadoop.hive.ql.metadata.Table) SparkWork(org.apache.hadoop.hive.ql.plan.SparkWork) PrunedPartitionList(org.apache.hadoop.hive.ql.parse.PrunedPartitionList) StatsWork(org.apache.hadoop.hive.ql.plan.StatsWork) MapWork(org.apache.hadoop.hive.ql.plan.MapWork) OrcInputFormat(org.apache.hadoop.hive.ql.io.orc.OrcInputFormat) ParseContext(org.apache.hadoop.hive.ql.parse.ParseContext) StatsNoJobWork(org.apache.hadoop.hive.ql.plan.StatsNoJobWork)
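
Example 3 is the one place where a PrunedPartitionList is constructed directly rather than obtained from the pruner: ANALYZE already knows its partition set, so the last constructor argument (hasUnknownPartitions) can be hard-coded to false. A minimal sketch of that construction, using the same four-argument constructor as the snippet above; the factory class is illustrative.

import java.util.List;
import java.util.Set;
import org.apache.hadoop.hive.ql.metadata.Partition;
import org.apache.hadoop.hive.ql.metadata.Table;
import org.apache.hadoop.hive.ql.parse.PrunedPartitionList;

// Sketch only: this factory is not part of Hive.
public final class ConfirmedPartitionLists {

    private ConfirmedPartitionLists() {
    }

    // Wraps an already-confirmed partition set; no unknown partitions by construction.
    public static PrunedPartitionList of(Table source, Set<Partition> confirmed, List<String> partCols) {
        return new PrunedPartitionList(source, confirmed, partCols, false);
    }
}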

Example 4 with PrunedPartitionList

use of org.apache.hadoop.hive.ql.parse.PrunedPartitionList in project hive by apache.

the class Driver method getTablePartitionUsedColumns.

private static void getTablePartitionUsedColumns(HiveOperation op, BaseSemanticAnalyzer sem, Map<Table, List<String>> tab2Cols, Map<Partition, List<String>> part2Cols, Map<String, Boolean> tableUsePartLevelAuth) throws HiveException {
    // for a CTAS or regular query, collect the columns read from each table (tab2Cols)
    // and, where partition-level authorization applies, from each partition (part2Cols)
    if (op.equals(HiveOperation.CREATETABLE_AS_SELECT) || op.equals(HiveOperation.QUERY)) {
        SemanticAnalyzer querySem = (SemanticAnalyzer) sem;
        ParseContext parseCtx = querySem.getParseContext();
        for (Map.Entry<String, TableScanOperator> topOpMap : querySem.getParseContext().getTopOps().entrySet()) {
            TableScanOperator tableScanOp = topOpMap.getValue();
            if (!tableScanOp.isInsideView()) {
                Table tbl = tableScanOp.getConf().getTableMetadata();
                List<Integer> neededColumnIds = tableScanOp.getNeededColumnIDs();
                List<FieldSchema> columns = tbl.getCols();
                List<String> cols = new ArrayList<String>();
                for (int i = 0; i < neededColumnIds.size(); i++) {
                    cols.add(columns.get(neededColumnIds.get(i)).getName());
                }
                // if partition-level privileges are used for this table, record the columns per
                // partition; otherwise fall back to table-level permission below
                if (tbl.isPartitioned() && Boolean.TRUE.equals(tableUsePartLevelAuth.get(tbl.getTableName()))) {
                    String alias_id = topOpMap.getKey();
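                    // prune with the table scan's pushed-down filter so that authorization only
                    // covers partitions the query can actually read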
                    PrunedPartitionList partsList = PartitionPruner.prune(tableScanOp, parseCtx, alias_id);
                    Set<Partition> parts = partsList.getPartitions();
                    for (Partition part : parts) {
                        List<String> existingCols = part2Cols.get(part);
                        if (existingCols == null) {
                            existingCols = new ArrayList<String>();
                        }
                        existingCols.addAll(cols);
                        part2Cols.put(part, existingCols);
                    }
                } else {
                    List<String> existingCols = tab2Cols.get(tbl);
                    if (existingCols == null) {
                        existingCols = new ArrayList<String>();
                    }
                    existingCols.addAll(cols);
                    tab2Cols.put(tbl, existingCols);
                }
            }
        }
    }
}
Also used : Partition(org.apache.hadoop.hive.ql.metadata.Partition) TableScanOperator(org.apache.hadoop.hive.ql.exec.TableScanOperator) Table(org.apache.hadoop.hive.ql.metadata.Table) FieldSchema(org.apache.hadoop.hive.metastore.api.FieldSchema) ArrayList(java.util.ArrayList) SemanticAnalyzer(org.apache.hadoop.hive.ql.parse.SemanticAnalyzer) BaseSemanticAnalyzer(org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer) ImportSemanticAnalyzer(org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer) PrunedPartitionList(org.apache.hadoop.hive.ql.parse.PrunedPartitionList) ParseContext(org.apache.hadoop.hive.ql.parse.ParseContext) Map(java.util.Map) LinkedHashMap(java.util.LinkedHashMap) ImmutableMap(com.google.common.collect.ImmutableMap) HashMap(java.util.HashMap)
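
Example 4 repeats the same merge step for part2Cols and tab2Cols: look up the existing column list, create it if missing, then append. A compact equivalent with computeIfAbsent (plain Java, no Hive types; the helper name is made up):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: this helper is not part of Hive.
public final class UsedColumns {

    private UsedColumns() {
    }

    // Creates the per-key column list on first use, then appends the newly referenced columns.
    public static <K> void add(Map<K, List<String>> target, K key, List<String> cols) {
        target.computeIfAbsent(key, k -> new ArrayList<>()).addAll(cols);
    }
}

With this helper the two branches collapse to UsedColumns.add(part2Cols, part, cols) and UsedColumns.add(tab2Cols, tbl, cols).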

Example 5 with PrunedPartitionList

use of org.apache.hadoop.hive.ql.parse.PrunedPartitionList in project hive by apache.

the class AvgPartitionSizeBasedBigTableSelectorForAutoSMJ method getBigTablePosition.

public int getBigTablePosition(ParseContext parseCtx, JoinOperator joinOp, Set<Integer> bigTableCandidates) throws SemanticException {
    int bigTablePos = -1;
    long maxSize = -1;
    // number of partitions for the chosen big table
    int numPartitionsCurrentBigTable = 0;
    HiveConf conf = parseCtx.getConf();
    try {
        List<TableScanOperator> topOps = new ArrayList<TableScanOperator>();
        getListTopOps(joinOp, topOps);
        int currentPos = 0;
        for (TableScanOperator topOp : topOps) {
            if (topOp == null) {
                return -1;
            }
            if (!bigTableCandidates.contains(currentPos)) {
                currentPos++;
                continue;
            }
            // in case the sizes match, preference is given to the table with fewer partitions
            int numPartitions = 1;
            Table table = topOp.getConf().getTableMetadata();
            long averageSize = 0;
            if (!table.isPartitioned()) {
                averageSize = getSize(conf, table);
            } else {
                // For partitioned tables, compute the average size of the pruned partitions
                PrunedPartitionList partsList = PartitionPruner.prune(topOp, parseCtx, null);
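                // only partitions that survived pruning count toward the average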
                numPartitions = partsList.getNotDeniedPartns().size();
                long totalSize = 0;
                for (Partition part : partsList.getNotDeniedPartns()) {
                    totalSize += getSize(conf, part);
                }
                averageSize = numPartitions == 0 ? 0 : totalSize / numPartitions;
            }
            if (averageSize > maxSize) {
                maxSize = averageSize;
                bigTablePos = currentPos;
                numPartitionsCurrentBigTable = numPartitions;
            } else if (averageSize == maxSize) {
                // If the sizes match, prefer the table with fewer partitions
                if (numPartitions < numPartitionsCurrentBigTable) {
                    bigTablePos = currentPos;
                    numPartitionsCurrentBigTable = numPartitions;
                }
            }
            currentPos++;
        }
    } catch (HiveException e) {
        throw new SemanticException(e.getMessage());
    }
    return bigTablePos;
}
Also used : Partition(org.apache.hadoop.hive.ql.metadata.Partition) TableScanOperator(org.apache.hadoop.hive.ql.exec.TableScanOperator) Table(org.apache.hadoop.hive.ql.metadata.Table) PrunedPartitionList(org.apache.hadoop.hive.ql.parse.PrunedPartitionList) HiveException(org.apache.hadoop.hive.ql.metadata.HiveException) ArrayList(java.util.ArrayList) HiveConf(org.apache.hadoop.hive.conf.HiveConf) SemanticException(org.apache.hadoop.hive.ql.parse.SemanticException)

Aggregations

PrunedPartitionList (org.apache.hadoop.hive.ql.parse.PrunedPartitionList)18 Partition (org.apache.hadoop.hive.ql.metadata.Partition)14 Table (org.apache.hadoop.hive.ql.metadata.Table)10 ArrayList (java.util.ArrayList)9 TableScanOperator (org.apache.hadoop.hive.ql.exec.TableScanOperator)8 SemanticException (org.apache.hadoop.hive.ql.parse.SemanticException)6 ParseContext (org.apache.hadoop.hive.ql.parse.ParseContext)5 HiveException (org.apache.hadoop.hive.ql.metadata.HiveException)4 HashMap (java.util.HashMap)3 Map (java.util.Map)3 LinkedHashMap (java.util.LinkedHashMap)2 List (java.util.List)2 HiveConf (org.apache.hadoop.hive.conf.HiveConf)2 FieldSchema (org.apache.hadoop.hive.metastore.api.FieldSchema)2 OrcInputFormat (org.apache.hadoop.hive.ql.io.orc.OrcInputFormat)2 ImmutableMap (com.google.common.collect.ImmutableMap)1 HashSet (java.util.HashSet)1 LinkedHashSet (java.util.LinkedHashSet)1 AtomicInteger (java.util.concurrent.atomic.AtomicInteger)1 DruidSchema (org.apache.calcite.adapter.druid.DruidSchema)1