Use of org.apache.hadoop.hive.ql.plan.VectorGroupByDesc.ProcessingMode in the Apache Hive project.
From the class Vectorizer, method validateGroupByOperator:
```java
private boolean validateGroupByOperator(GroupByOperator op, boolean isReduce, boolean isTezOrSpark) {
  GroupByDesc desc = op.getConf();
  if (desc.isGroupingSetsPresent()) {
    setOperatorIssue("Grouping sets not supported");
    return false;
  }
  if (desc.pruneGroupingSetId()) {
    setOperatorIssue("Pruning grouping set id not supported");
    return false;
  }
  if (desc.getMode() != GroupByDesc.Mode.HASH && desc.isDistinct()) {
    setOperatorIssue("DISTINCT not supported");
    return false;
  }
  boolean ret = validateExprNodeDesc(desc.getKeys(), "Key");
  if (!ret) {
    return false;
  }
  /**
   *
   * GROUP BY DEFINITIONS:
   *
   * GroupByDesc.Mode enumeration:
   *
   *    The different modes of a GROUP BY operator.
   *
   *    These descriptions are hopefully less cryptic than the comments for GroupByDesc.Mode.
   *
   *        COMPLETE       Aggregates original rows into full aggregation row(s).
   *
   *                       If the key length is 0, this is also called Global aggregation and
   *                       1 output row is produced.
   *
   *                       When the key length is > 0, the original rows come in ALREADY GROUPED.
   *
   *                       An example for key length > 0 is a GROUP BY being applied to the
   *                       ALREADY GROUPED rows coming from an upstream JOIN operator. Or,
   *                       ALREADY GROUPED rows coming from upstream MERGEPARTIAL GROUP BY
   *                       operator.
   *
   *        PARTIAL1       The first of 2 (or more) phases that aggregates ALREADY GROUPED
   *                       original rows into partial aggregations.
   *
   *                       Subsequent phases PARTIAL2 (optional) and MERGEPARTIAL will merge
   *                       the partial aggregations and output full aggregations.
   *
   *        PARTIAL2       Accept ALREADY GROUPED partial aggregations and merge them into another
   *                       partial aggregation. Output the merged partial aggregations.
   *
   *                       (Haven't seen this one used)
   *
   *        PARTIALS       (Behaves for non-distinct the same as PARTIAL2; and behaves for
   *                       distinct the same as PARTIAL1.)
   *
   *        FINAL          Accept ALREADY GROUPED original rows and aggregate them into
   *                       full aggregations.
   *
   *                       Example is a GROUP BY being applied to rows from a sorted table, where
   *                       the group key is the table sort key (or a prefix).
   *
   *        HASH           Accept UNORDERED original rows and aggregate them into a memory table.
   *                       Output the partial aggregations on closeOp (or low memory).
   *
   *                       Similar to PARTIAL1 except original rows are UNORDERED.
   *
   *                       Commonly used in both Mapper and Reducer nodes. Always followed by
   *                       a Reducer with MERGEPARTIAL GROUP BY.
   *
   *        MERGEPARTIAL   Always first operator of a Reducer. Data is grouped by reduce-shuffle.
   *
   *                       (Behaves for non-distinct aggregations the same as FINAL; and behaves
   *                       for distinct aggregations the same as COMPLETE.)
   *
   *                       The output is full aggregation(s).
   *
   *                       Used in Reducers after a stage with a HASH GROUP BY operator.
   *
   *
   * VectorGroupByDesc.ProcessingMode for VectorGroupByOperator:
   *
   *     GLOBAL         No key. All rows --> 1 full aggregation on end of input
   *
   *     HASH           Rows aggregated into hash table on group key -->
   *                        1 partial aggregation per key (normally, unless there is spilling)
   *
   *     MERGE_PARTIAL  As first operator in a REDUCER, partial aggregations come grouped from
   *                    reduce-shuffle -->
   *                        aggregate the partial aggregations and emit full aggregation on
   *                        endGroup / closeOp
   *
   *     STREAMING      Rows come from PARENT operator ALREADY GROUPED -->
   *                        aggregate the rows and emit full aggregation on key change / closeOp
   *
   *     NOTE: Hash can spill partial result rows prematurely if it runs low on memory.
   *     NOTE: Streaming has to compare keys where MergePartial gets an endGroup call.
   *
   *
   * DECIDER: Which VectorGroupByDesc.ProcessingMode for VectorGroupByOperator?
   *
   *     Decides using GroupByDesc.Mode and whether there are keys with the
   *     VectorGroupByDesc.groupByDescModeToVectorProcessingMode method.
   *
   *         Mode.COMPLETE      --> (numKeys == 0 ? ProcessingMode.GLOBAL : ProcessingMode.STREAMING)
   *
   *         Mode.HASH          --> ProcessingMode.HASH
   *
   *         Mode.MERGEPARTIAL  --> (numKeys == 0 ? ProcessingMode.GLOBAL : ProcessingMode.MERGE_PARTIAL)
   *
   *         Mode.PARTIAL1,
   *         Mode.PARTIAL2,
   *         Mode.PARTIALS,
   *         Mode.FINAL         --> ProcessingMode.STREAMING
   *
   */
  boolean hasKeys = (desc.getKeys().size() > 0);
  ProcessingMode processingMode = VectorGroupByDesc.groupByDescModeToVectorProcessingMode(desc.getMode(), hasKeys);
  Pair<Boolean, Boolean> retPair = validateAggregationDescs(desc.getAggregators(), processingMode, hasKeys);
  if (!retPair.left) {
    return false;
  }
  // If all the aggregation outputs are primitive, we can output a VectorizedRowBatch.
  // Otherwise, the rest of the operator tree will run in row mode.
  VectorGroupByDesc vectorDesc = new VectorGroupByDesc();
  desc.setVectorDesc(vectorDesc);
  vectorDesc.setVectorOutput(retPair.right);
  vectorDesc.setProcessingMode(processingMode);
  LOG.info("Vector GROUP BY operator will use processing mode " + processingMode.name()
      + ", isVectorOutput " + vectorDesc.isVectorOutput());
  return true;
}
```
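The DECIDER table in the method's comment is a pure function of the GROUP BY mode and whether the operator has keys. As a sketch, here is that table encoded as a standalone Java switch. The enums `GroupByMode` and `VectorProcessingMode` and the class `ProcessingModeDecider` are hypothetical stand-ins that only copy the constant names from the comment; the sketch follows the comment's table and is not necessarily the exact body of Hive's `VectorGroupByDesc.groupByDescModeToVectorProcessingMode`, which may differ between Hive versions.

```java
// Hypothetical stand-ins mirroring the names used in the DECIDER comment;
// these are NOT the real org.apache.hadoop.hive.ql.plan classes.
enum GroupByMode { COMPLETE, PARTIAL1, PARTIAL2, PARTIALS, FINAL, HASH, MERGEPARTIAL }

enum VectorProcessingMode { GLOBAL, HASH, MERGE_PARTIAL, STREAMING }

final class ProcessingModeDecider {

    private ProcessingModeDecider() {}

    // Encodes the DECIDER table: only COMPLETE and MERGEPARTIAL depend on
    // whether the GROUP BY has keys; HASH always stays HASH, and the
    // PARTIAL*/FINAL modes always stream over ALREADY GROUPED input.
    static VectorProcessingMode decide(GroupByMode mode, boolean hasKeys) {
        switch (mode) {
            case COMPLETE:
                return hasKeys ? VectorProcessingMode.STREAMING : VectorProcessingMode.GLOBAL;
            case HASH:
                return VectorProcessingMode.HASH;
            case MERGEPARTIAL:
                return hasKeys ? VectorProcessingMode.MERGE_PARTIAL : VectorProcessingMode.GLOBAL;
            case PARTIAL1:
            case PARTIAL2:
            case PARTIALS:
            case FINAL:
                return VectorProcessingMode.STREAMING;
            default:
                throw new IllegalArgumentException("Unexpected GROUP BY mode " + mode);
        }
    }

    public static void main(String[] args) {
        // Print the whole table for both the keyed and the global (no key) case.
        for (GroupByMode mode : GroupByMode.values()) {
            System.out.println(mode + ": keys -> " + decide(mode, true)
                    + ", no keys -> " + decide(mode, false));
        }
    }
}
```

Keeping the decision in one switch means `validateGroupByOperator` only has to compute `hasKeys` once and pass it along, which matches how the method above calls the decider before validating the aggregations.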
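The HASH and MERGE_PARTIAL processing modes, and the two NOTEs about premature spilling and key comparison, can be illustrated with a toy COUNT(*) pipeline. This is a hypothetical, row-at-a-time sketch: `GroupByModesSketch`, `hashPhase`, `mergePartialPhase` and the `maxEntries` limit are invented for illustration, and `maxEntries` merely stands in for "low memory". The real VectorGroupByOperator works on VectorizedRowBatch columns and does its own memory accounting. Because there is no reduce-shuffle here, the merge step sorts the partials and then finds group boundaries by comparing keys, which is the STREAMING pattern; per the comment, a real MERGE_PARTIAL operator is told about boundaries through endGroup instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of HASH followed by a merge of partials for COUNT(*) GROUP BY key.
// Not Hive code: the real VectorGroupByOperator processes VectorizedRowBatch columns.
final class GroupByModesSketch {

    private GroupByModesSketch() {}

    // One partial aggregation row: (group key, partial count).
    static final class Partial {
        final String key;
        final long count;

        Partial(String key, long count) {
            this.key = key;
            this.count = count;
        }
    }

    // HASH: UNORDERED rows are counted in an in-memory table. maxEntries is an
    // assumed stand-in for "low memory": when a new key would exceed it, the table
    // is flushed early, so one key can appear in several partial rows.
    static List<Partial> hashPhase(List<String> rows, int maxEntries) {
        List<Partial> out = new ArrayList<>();
        Map<String, Long> table = new HashMap<>();
        for (String key : rows) {
            if (!table.containsKey(key) && table.size() >= maxEntries) {
                flush(table, out); // premature spill of partial aggregations
            }
            table.merge(key, 1L, Long::sum);
        }
        flush(table, out); // closeOp: emit whatever is still in the table
        return out;
    }

    private static void flush(Map<String, Long> table, List<Partial> out) {
        for (Map.Entry<String, Long> e : table.entrySet()) {
            out.add(new Partial(e.getKey(), e.getValue()));
        }
        table.clear();
    }

    // Merge step: sorting simulates the reduce-shuffle that groups partials by key.
    // Group boundaries are found by comparing keys (the STREAMING pattern); a real
    // MERGE_PARTIAL operator would be told about them through endGroup instead.
    static Map<String, Long> mergePartialPhase(List<Partial> partials) {
        List<Partial> grouped = new ArrayList<>(partials);
        grouped.sort((a, b) -> a.key.compareTo(b.key));
        Map<String, Long> full = new TreeMap<>();
        String currentKey = null;
        long sum = 0;
        for (Partial p : grouped) {
            if (currentKey != null && !currentKey.equals(p.key)) {
                full.put(currentKey, sum); // key change: emit the full aggregation
                sum = 0;
            }
            currentKey = p.key;
            sum += p.count;
        }
        if (currentKey != null) {
            full.put(currentKey, sum); // end of input (closeOp)
        }
        return full;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "a", "c", "a", "b");
        List<Partial> partials = hashPhase(rows, 2);
        System.out.println("partial rows: " + partials.size());
        System.out.println("full aggregations: " + mergePartialPhase(partials));
    }
}
```

With `maxEntries = 2`, the input a, b, a, c, a, b forces two early flushes, so the HASH phase emits five partial rows (a and b each appear twice). The merge phase still produces one full aggregation per key: a=3, b=2, c=1. This is why a HASH GROUP BY is always followed by a MERGEPARTIAL GROUP BY rather than being treated as final output.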