Use of org.apache.hadoop.hive.ql.plan.VectorGroupByDesc in project hive by apache.
The class Vectorizer, method validateGroupByOperator.
private boolean validateGroupByOperator(GroupByOperator op, boolean isReduce, boolean isTezOrSpark) {
  GroupByDesc desc = op.getConf();
  if (desc.isGroupingSetsPresent()) {
    setOperatorIssue("Grouping sets not supported");
    return false;
  }
  if (desc.pruneGroupingSetId()) {
    setOperatorIssue("Pruning grouping set id not supported");
    return false;
  }
  if (desc.getMode() != GroupByDesc.Mode.HASH && desc.isDistinct()) {
    setOperatorIssue("DISTINCT not supported");
    return false;
  }
  boolean ret = validateExprNodeDesc(desc.getKeys(), "Key");
  if (!ret) {
    return false;
  }
  /**
   *
   * GROUP BY DEFINITIONS:
   *
   * GroupByDesc.Mode enumeration:
   *
   *    The different modes of a GROUP BY operator.
   *
   *    These descriptions are hopefully less cryptic than the comments for GroupByDesc.Mode.
   *
   *    COMPLETE       Aggregates original rows into full aggregation row(s).
   *
   *                   If the key length is 0, this is also called Global aggregation and
   *                   1 output row is produced.
   *
   *                   When the key length is > 0, the original rows come in ALREADY GROUPED.
   *
   *                   An example for key length > 0 is a GROUP BY being applied to the
   *                   ALREADY GROUPED rows coming from an upstream JOIN operator. Or,
   *                   ALREADY GROUPED rows coming from an upstream MERGEPARTIAL GROUP BY
   *                   operator.
   *
   *    PARTIAL1       The first of 2 (or more) phases that aggregates ALREADY GROUPED
   *                   original rows into partial aggregations.
   *
   *                   Subsequent phases PARTIAL2 (optional) and MERGEPARTIAL will merge
   *                   the partial aggregations and output full aggregations.
   *
   *    PARTIAL2       Accepts ALREADY GROUPED partial aggregations and merges them into
   *                   another partial aggregation. Outputs the merged partial aggregations.
   *
   *                   (Haven't seen this one used.)
   *
   *    PARTIALS       (Behaves for non-distinct the same as PARTIAL2; and behaves for
   *                   distinct the same as PARTIAL1.)
   *
   *    FINAL          Accepts ALREADY GROUPED original rows and aggregates them into
   *                   full aggregations.
   *
   *                   An example is a GROUP BY being applied to rows from a sorted table,
   *                   where the group key is the table sort key (or a prefix).
   *
   *    HASH           Accepts UNORDERED original rows and aggregates them into a memory table.
   *                   Outputs the partial aggregations on closeOp (or low memory).
   *
   *                   Similar to PARTIAL1, except original rows are UNORDERED.
   *
   *                   Commonly used in both Mapper and Reducer nodes. Always followed by
   *                   a Reducer with a MERGEPARTIAL GROUP BY.
   *
   *    MERGEPARTIAL   Always the first operator of a Reducer. Data is grouped by reduce-shuffle.
   *
   *                   (Behaves for non-distinct aggregations the same as FINAL; and behaves
   *                   for distinct aggregations the same as COMPLETE.)
   *
   *                   The output is full aggregation(s).
   *
   *                   Used in Reducers after a stage with a HASH GROUP BY operator.
   *
   *
   * VectorGroupByDesc.ProcessingMode for VectorGroupByOperator:
   *
   *    GLOBAL         No key. All rows --> 1 full aggregation at end of input.
   *
   *    HASH           Rows aggregated into a hash table on group key -->
   *                   1 partial aggregation per key (normally, unless there is spilling).
   *
   *    MERGE_PARTIAL  As the first operator in a Reducer, partial aggregations come grouped
   *                   from reduce-shuffle -->
   *                   aggregate the partial aggregations and emit a full aggregation on
   *                   endGroup / closeOp.
   *
   *    STREAMING      Rows come from the PARENT operator ALREADY GROUPED -->
   *                   aggregate the rows and emit a full aggregation on key change / closeOp.
   *
   *    NOTE: Hash can spill partial result rows prematurely if it runs low on memory.
   *    NOTE: Streaming has to compare keys, where MergePartial gets an endGroup call.
   *
   *
   * DECIDER: Which VectorGroupByDesc.ProcessingMode for VectorGroupByOperator?
   *
   *    Decides using GroupByDesc.Mode and whether there are keys, via the
   *    VectorGroupByDesc.groupByDescModeToVectorProcessingMode method:
   *
   *       Mode.COMPLETE      --> (numKeys == 0 ? ProcessingMode.GLOBAL : ProcessingMode.STREAMING)
   *
   *       Mode.HASH          --> ProcessingMode.HASH
   *
   *       Mode.MERGEPARTIAL  --> (numKeys == 0 ? ProcessingMode.GLOBAL : ProcessingMode.MERGE_PARTIAL)
   *
   *       Mode.PARTIAL1,
   *       Mode.PARTIAL2,
   *       Mode.PARTIALS,
   *       Mode.FINAL         --> ProcessingMode.STREAMING
   *
   */
  boolean hasKeys = (desc.getKeys().size() > 0);
  ProcessingMode processingMode =
      VectorGroupByDesc.groupByDescModeToVectorProcessingMode(desc.getMode(), hasKeys);
  Pair<Boolean, Boolean> retPair =
      validateAggregationDescs(desc.getAggregators(), processingMode, hasKeys);
  if (!retPair.left) {
    return false;
  }
  // If all the aggregation outputs are primitive, we can output a VectorizedRowBatch.
  // Otherwise, the rest of the operator tree will run in row mode.
  VectorGroupByDesc vectorDesc = new VectorGroupByDesc();
  desc.setVectorDesc(vectorDesc);
  vectorDesc.setVectorOutput(retPair.right);
  vectorDesc.setProcessingMode(processingMode);
  LOG.info("Vector GROUP BY operator will use processing mode " + processingMode.name() +
      ", isVectorOutput " + vectorDesc.isVectorOutput());
  return true;
}
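The DECIDER table in the comment above is compact enough to restate as code. The following is an illustrative restatement of what VectorGroupByDesc.groupByDescModeToVectorProcessingMode returns according to that table; it is a sketch for clarity, not the actual implementation in VectorGroupByDesc.

// Illustrative sketch of the documented mode mapping (not the real
// VectorGroupByDesc.groupByDescModeToVectorProcessingMode source).
static ProcessingMode processingModeFor(GroupByDesc.Mode mode, boolean hasKeys) {
  switch (mode) {
    case COMPLETE:
      // No keys means one global aggregation; otherwise rows arrive already grouped.
      return hasKeys ? ProcessingMode.STREAMING : ProcessingMode.GLOBAL;
    case HASH:
      return ProcessingMode.HASH;
    case MERGEPARTIAL:
      return hasKeys ? ProcessingMode.MERGE_PARTIAL : ProcessingMode.GLOBAL;
    case PARTIAL1:
    case PARTIAL2:
    case PARTIALS:
    case FINAL:
      return ProcessingMode.STREAMING;
    default:
      throw new IllegalArgumentException("Unexpected GROUP BY mode " + mode);
  }
}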
Use of org.apache.hadoop.hive.ql.plan.VectorGroupByDesc in project hive by apache.
The class Vectorizer, method vectorizeOperator.
public Operator<? extends OperatorDesc> vectorizeOperator(Operator<? extends OperatorDesc> op,
    VectorizationContext vContext, boolean isTezOrSpark,
    VectorTaskColumnInfo vectorTaskColumnInfo) throws HiveException {
  Operator<? extends OperatorDesc> vectorOp = null;
  boolean isNative;
  switch (op.getType()) {
    case TABLESCAN:
      vectorOp = vectorizeTableScanOperator(op, vContext);
      isNative = true;
      break;
    case MAPJOIN:
      {
        if (op instanceof MapJoinOperator) {
          VectorMapJoinInfo vectorMapJoinInfo = new VectorMapJoinInfo();
          MapJoinDesc desc = (MapJoinDesc) op.getConf();
          boolean specialize = canSpecializeMapJoin(op, desc, isTezOrSpark, vContext, vectorMapJoinInfo);
          if (!specialize) {
            Class<? extends Operator<?>> opClass = null;
            // *NON-NATIVE* vector map differences for LEFT OUTER JOIN and Filtered...
            List<ExprNodeDesc> bigTableFilters = desc.getFilters().get((byte) desc.getPosBigTable());
            boolean isOuterAndFiltered = (!desc.isNoOuterJoin() && bigTableFilters.size() > 0);
            if (!isOuterAndFiltered) {
              opClass = VectorMapJoinOperator.class;
            } else {
              opClass = VectorMapJoinOuterFilteredOperator.class;
            }
            vectorOp = OperatorFactory.getVectorOperator(opClass, op.getCompilationOpContext(), op.getConf(), vContext);
            isNative = false;
          } else {
            // TEMPORARY Until Native Vector Map Join with Hybrid passes tests...
            // HiveConf.setBoolVar(physicalContext.getConf(),
            //     HiveConf.ConfVars.HIVEUSEHYBRIDGRACEHASHJOIN, false);
            vectorOp = specializeMapJoinOperator(op, vContext, desc, vectorMapJoinInfo);
            isNative = true;
            if (vectorTaskColumnInfo != null) {
              if (usesVectorUDFAdaptor(vectorMapJoinInfo.getBigTableKeyExpressions())) {
                vectorTaskColumnInfo.setUsesVectorUDFAdaptor(true);
              }
              if (usesVectorUDFAdaptor(vectorMapJoinInfo.getBigTableValueExpressions())) {
                vectorTaskColumnInfo.setUsesVectorUDFAdaptor(true);
              }
            }
          }
        } else {
          Preconditions.checkState(op instanceof SMBMapJoinOperator);
          SMBJoinDesc smbJoinSinkDesc = (SMBJoinDesc) op.getConf();
          VectorSMBJoinDesc vectorSMBJoinDesc = new VectorSMBJoinDesc();
          smbJoinSinkDesc.setVectorDesc(vectorSMBJoinDesc);
          vectorOp = OperatorFactory.getVectorOperator(op.getCompilationOpContext(), smbJoinSinkDesc, vContext);
          isNative = false;
        }
      }
      break;
    case REDUCESINK:
      {
        VectorReduceSinkInfo vectorReduceSinkInfo = new VectorReduceSinkInfo();
        ReduceSinkDesc desc = (ReduceSinkDesc) op.getConf();
        boolean specialize = canSpecializeReduceSink(desc, isTezOrSpark, vContext, vectorReduceSinkInfo);
        if (!specialize) {
          vectorOp = OperatorFactory.getVectorOperator(op.getCompilationOpContext(), op.getConf(), vContext);
          isNative = false;
        } else {
          vectorOp = specializeReduceSinkOperator(op, vContext, desc, vectorReduceSinkInfo);
          isNative = true;
          if (vectorTaskColumnInfo != null) {
            if (usesVectorUDFAdaptor(vectorReduceSinkInfo.getReduceSinkKeyExpressions())) {
              vectorTaskColumnInfo.setUsesVectorUDFAdaptor(true);
            }
            if (usesVectorUDFAdaptor(vectorReduceSinkInfo.getReduceSinkValueExpressions())) {
              vectorTaskColumnInfo.setUsesVectorUDFAdaptor(true);
            }
          }
        }
      }
      break;
    case FILTER:
      {
        vectorOp = vectorizeFilterOperator(op, vContext);
        isNative = true;
        if (vectorTaskColumnInfo != null) {
          VectorFilterDesc vectorFilterDesc =
              (VectorFilterDesc) ((AbstractOperatorDesc) vectorOp.getConf()).getVectorDesc();
          VectorExpression vectorPredicateExpr = vectorFilterDesc.getPredicateExpression();
          if (usesVectorUDFAdaptor(vectorPredicateExpr)) {
            vectorTaskColumnInfo.setUsesVectorUDFAdaptor(true);
          }
        }
      }
      break;
    case SELECT:
      {
        vectorOp = vectorizeSelectOperator(op, vContext);
        isNative = true;
        if (vectorTaskColumnInfo != null) {
          VectorSelectDesc vectorSelectDesc =
              (VectorSelectDesc) ((AbstractOperatorDesc) vectorOp.getConf()).getVectorDesc();
          VectorExpression[] vectorSelectExprs = vectorSelectDesc.getSelectExpressions();
          if (usesVectorUDFAdaptor(vectorSelectExprs)) {
            vectorTaskColumnInfo.setUsesVectorUDFAdaptor(true);
          }
        }
      }
      break;
    case GROUPBY:
      {
        vectorOp = vectorizeGroupByOperator(op, vContext);
        isNative = false;
        if (vectorTaskColumnInfo != null) {
          VectorGroupByDesc vectorGroupByDesc =
              (VectorGroupByDesc) ((AbstractOperatorDesc) vectorOp.getConf()).getVectorDesc();
          if (!vectorGroupByDesc.isVectorOutput()) {
            vectorTaskColumnInfo.setGroupByVectorOutput(false);
          }
          VectorExpression[] vecKeyExpressions = vectorGroupByDesc.getKeyExpressions();
          if (usesVectorUDFAdaptor(vecKeyExpressions)) {
            vectorTaskColumnInfo.setUsesVectorUDFAdaptor(true);
          }
          VectorAggregateExpression[] vecAggregators = vectorGroupByDesc.getAggregators();
          for (VectorAggregateExpression vecAggr : vecAggregators) {
            if (usesVectorUDFAdaptor(vecAggr.inputExpression())) {
              vectorTaskColumnInfo.setUsesVectorUDFAdaptor(true);
            }
          }
        }
      }
      break;
    case FILESINK:
      {
        FileSinkDesc fileSinkDesc = (FileSinkDesc) op.getConf();
        VectorFileSinkDesc vectorFileSinkDesc = new VectorFileSinkDesc();
        fileSinkDesc.setVectorDesc(vectorFileSinkDesc);
        vectorOp = OperatorFactory.getVectorOperator(op.getCompilationOpContext(), fileSinkDesc, vContext);
        isNative = false;
      }
      break;
    case LIMIT:
      {
        LimitDesc limitDesc = (LimitDesc) op.getConf();
        VectorLimitDesc vectorLimitDesc = new VectorLimitDesc();
        limitDesc.setVectorDesc(vectorLimitDesc);
        vectorOp = OperatorFactory.getVectorOperator(op.getCompilationOpContext(), limitDesc, vContext);
        isNative = true;
      }
      break;
    case EVENT:
      {
        AppMasterEventDesc eventDesc = (AppMasterEventDesc) op.getConf();
        VectorAppMasterEventDesc vectorEventDesc = new VectorAppMasterEventDesc();
        eventDesc.setVectorDesc(vectorEventDesc);
        vectorOp = OperatorFactory.getVectorOperator(op.getCompilationOpContext(), eventDesc, vContext);
        isNative = true;
      }
      break;
    case HASHTABLESINK:
      {
        SparkHashTableSinkDesc sparkHashTableSinkDesc = (SparkHashTableSinkDesc) op.getConf();
        VectorSparkHashTableSinkDesc vectorSparkHashTableSinkDesc = new VectorSparkHashTableSinkDesc();
        sparkHashTableSinkDesc.setVectorDesc(vectorSparkHashTableSinkDesc);
        vectorOp = OperatorFactory.getVectorOperator(op.getCompilationOpContext(), sparkHashTableSinkDesc, vContext);
        isNative = true;
      }
      break;
    case SPARKPRUNINGSINK:
      {
        SparkPartitionPruningSinkDesc sparkPartitionPruningSinkDesc = (SparkPartitionPruningSinkDesc) op.getConf();
        VectorSparkPartitionPruningSinkDesc vectorSparkPartitionPruningSinkDesc = new VectorSparkPartitionPruningSinkDesc();
        sparkPartitionPruningSinkDesc.setVectorDesc(vectorSparkPartitionPruningSinkDesc);
        vectorOp = OperatorFactory.getVectorOperator(op.getCompilationOpContext(), sparkPartitionPruningSinkDesc, vContext);
        isNative = true;
      }
      break;
    default:
      // These are children of GROUP BY operators with non-vector outputs.
      isNative = false;
      vectorOp = op;
      break;
  }
  Preconditions.checkState(vectorOp != null);
  if (vectorTaskColumnInfo != null && !isNative) {
    vectorTaskColumnInfo.setAllNative(false);
  }
  LOG.debug("vectorizeOperator " + vectorOp.getClass().getName());
  LOG.debug("vectorizeOperator " + vectorOp.getConf().getClass().getName());
  if (vectorOp != op) {
    fixupParentChildOperators(op, vectorOp);
    ((AbstractOperatorDesc) vectorOp.getConf()).setVectorMode(true);
  }
  return vectorOp;
}
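Note how the method threads isNative into vectorTaskColumnInfo and swaps the vectorized operator into the tree via fixupParentChildOperators. A caller typically applies it over the whole operator tree; the following recursive walk is a minimal, hypothetical sketch of such a driver, assuming only the method above and Operator.getChildOperators(). The real Vectorizer traverses the plan with node processors and per-branch VectorizationContexts, so treat this as illustration only.

// Hypothetical driver sketch -- not Vectorizer's actual traversal.
private void vectorizeOperatorTree(Operator<? extends OperatorDesc> op,
    VectorizationContext vContext, boolean isTezOrSpark,
    VectorTaskColumnInfo vectorTaskColumnInfo) throws HiveException {
  // Vectorize this operator; when it changes, it replaces itself in the tree.
  Operator<? extends OperatorDesc> vectorOp =
      vectorizeOperator(op, vContext, isTezOrSpark, vectorTaskColumnInfo);
  if (vectorOp.getChildOperators() == null) {
    return;
  }
  // Iterate over a copy: vectorizeOperator mutates parent/child links.
  for (Operator<? extends OperatorDesc> child :
      new ArrayList<>(vectorOp.getChildOperators())) {
    vectorizeOperatorTree(child, vContext, isTezOrSpark, vectorTaskColumnInfo);
  }
}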
Use of org.apache.hadoop.hive.ql.plan.VectorGroupByDesc in project hive by apache.
The class Vectorizer, method vectorizeGroupByOperator.
/*
 * NOTE: The VectorGroupByDesc has already been allocated and partially populated.
 */
public static Operator<? extends OperatorDesc> vectorizeGroupByOperator(
    Operator<? extends OperatorDesc> groupByOp, VectorizationContext vContext) throws HiveException {
  GroupByDesc groupByDesc = (GroupByDesc) groupByOp.getConf();
  List<ExprNodeDesc> keysDesc = groupByDesc.getKeys();
  VectorExpression[] vecKeyExpressions = vContext.getVectorExpressions(keysDesc);
  ArrayList<AggregationDesc> aggrDesc = groupByDesc.getAggregators();
  final int size = aggrDesc.size();
  VectorAggregateExpression[] vecAggregators = new VectorAggregateExpression[size];
  int[] projectedOutputColumns = new int[size];
  for (int i = 0; i < size; ++i) {
    AggregationDesc aggDesc = aggrDesc.get(i);
    vecAggregators[i] = vContext.getAggregatorExpression(aggDesc);
    // GroupBy generates a new vectorized row batch...
    projectedOutputColumns[i] = i;
  }
  VectorGroupByDesc vectorGroupByDesc = (VectorGroupByDesc) groupByDesc.getVectorDesc();
  vectorGroupByDesc.setKeyExpressions(vecKeyExpressions);
  vectorGroupByDesc.setAggregators(vecAggregators);
  vectorGroupByDesc.setProjectedOutputColumns(projectedOutputColumns);
  return OperatorFactory.getVectorOperator(groupByOp.getCompilationOpContext(), groupByDesc, vContext);
}
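The NOTE refers to the split visible in validateGroupByOperator above: validation allocates the VectorGroupByDesc and records the processing mode and vector-output decision, while this method compiles the key expressions, aggregators, and projected output columns into the same object. A hedged sketch of phase 1's contribution, restated as a standalone helper for illustration (the helper itself is not part of Vectorizer):

// Illustrative helper: the fields validateGroupByOperator populates before
// vectorizeGroupByOperator fills in the compiled expressions. Not actual
// Vectorizer code.
static VectorGroupByDesc allocatePartiallyPopulated(GroupByDesc groupByDesc,
    ProcessingMode processingMode, boolean isVectorOutput) {
  VectorGroupByDesc vectorDesc = new VectorGroupByDesc();
  vectorDesc.setProcessingMode(processingMode); // decided from GroupByDesc.Mode and keys
  vectorDesc.setVectorOutput(isVectorOutput);   // true only if all aggregate outputs are primitive
  groupByDesc.setVectorDesc(vectorDesc);
  return vectorDesc;
}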
Use of org.apache.hadoop.hive.ql.plan.VectorGroupByDesc in project hive by apache.
The class TestVectorGroupByOperator, method buildKeyGroupByDesc.
private static GroupByDesc buildKeyGroupByDesc(VectorizationContext ctx, String aggregate,
    String column, TypeInfo dataTypeInfo, String key, TypeInfo keyTypeInfo) {
  GroupByDesc desc = buildGroupByDescType(ctx, aggregate, GenericUDAFEvaluator.Mode.PARTIAL1,
      column, dataTypeInfo);
  ((VectorGroupByDesc) desc.getVectorDesc()).setProcessingMode(ProcessingMode.HASH);
  ExprNodeDesc keyExp = buildColumnDesc(ctx, key, keyTypeInfo);
  ArrayList<ExprNodeDesc> keys = new ArrayList<ExprNodeDesc>();
  keys.add(keyExp);
  desc.setKeys(keys);
  desc.getOutputColumnNames().add("_col1");
  return desc;
}
Use of org.apache.hadoop.hive.ql.plan.VectorGroupByDesc in project hive by apache.
The class TestVectorGroupByOperator, method buildGroupByDescType.
private static GroupByDesc buildGroupByDescType(VectorizationContext ctx, String aggregate,
    GenericUDAFEvaluator.Mode mode, String column, TypeInfo dataType) {
  AggregationDesc agg = buildAggregationDesc(ctx, aggregate, mode, column, dataType);
  ArrayList<AggregationDesc> aggs = new ArrayList<AggregationDesc>();
  aggs.add(agg);
  ArrayList<String> outputColumnNames = new ArrayList<String>();
  outputColumnNames.add("_col0");
  GroupByDesc desc = new GroupByDesc();
  desc.setVectorDesc(new VectorGroupByDesc());
  desc.setOutputColumnNames(outputColumnNames);
  desc.setAggregators(aggs);
  ((VectorGroupByDesc) desc.getVectorDesc()).setProcessingMode(ProcessingMode.GLOBAL);
  return desc;
}
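Taken together, the two test helpers build a GroupByDesc whose VectorGroupByDesc is pre-populated the same way validateGroupByOperator would populate it, which lets a test feed it straight into vectorizeGroupByOperator. A minimal sketch of how a test might combine them; the column names and VectorizationContext setup here are illustrative assumptions, not code from the test class:

// Illustrative test sketch (column names and context setup are assumed).
List<String> columns = Arrays.asList("key", "value");
VectorizationContext ctx = new VectorizationContext("name", columns);
GroupByDesc desc = buildKeyGroupByDesc(ctx, "max", "value",
    TypeInfoFactory.longTypeInfo, "key", TypeInfoFactory.longTypeInfo);
CompilationOpContext cCtx = new CompilationOpContext();
Operator<? extends OperatorDesc> groupByOp = OperatorFactory.get(cCtx, desc);
// Produces a VectorGroupByOperator in ProcessingMode.HASH (set by the helper).
VectorGroupByOperator vgo =
    (VectorGroupByOperator) Vectorizer.vectorizeGroupByOperator(groupByOp, ctx);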