use of io.trino.cost.LocalCostEstimate in project trino by trinodb.
the class DetermineSemiJoinDistributionType method getSemiJoinNodeWithCost.
private PlanNodeWithCost getSemiJoinNodeWithCost(SemiJoinNode possibleJoinNode, Context context) {
TypeProvider types = context.getSymbolAllocator().getTypes();
StatsProvider stats = context.getStatsProvider();
boolean replicated = possibleJoinNode.getDistributionType().get() == REPLICATED;
/*
* HACK!
*
* Currently cost model always has to compute the total cost of an operation.
* For SEMI-JOIN the total cost consist of 4 parts:
* - Cost of exchanges that have to be introduced to execute a JOIN
* - Cost of building a hash table
* - Cost of probing a hash table
* - Cost of building an output for matched rows
*
* When output size for a SEMI-JOIN cannot be estimated the cost model returns
* UNKNOWN cost for the join.
*
* However assuming the cost of SEMI-JOIN output is always the same, we can still make
* cost based decisions based on the input cost for different types of SEMI-JOINs.
*
* TODO Decision about the distribution should be based on LocalCostEstimate only when PlanCostEstimate cannot be calculated. Otherwise cost comparator cannot take query.max-memory into account.
*/
int estimatedSourceDistributedTaskCount = taskCountEstimator.estimateSourceDistributedTaskCount(context.getSession());
LocalCostEstimate cost = calculateJoinCostWithoutOutput(possibleJoinNode.getSource(), possibleJoinNode.getFilteringSource(), stats, types, replicated, estimatedSourceDistributedTaskCount);
return new PlanNodeWithCost(cost.toPlanCost(), possibleJoinNode);
}
use of io.trino.cost.LocalCostEstimate in project trino by trinodb.
the class DetermineJoinDistributionType method getJoinNodeWithCost.
private PlanNodeWithCost getJoinNodeWithCost(Context context, JoinNode possibleJoinNode) {
TypeProvider types = context.getSymbolAllocator().getTypes();
StatsProvider stats = context.getStatsProvider();
boolean replicated = possibleJoinNode.getDistributionType().get() == REPLICATED;
/*
* HACK!
*
* Currently cost model always has to compute the total cost of an operation.
* For JOIN the total cost consist of 4 parts:
* - Cost of exchanges that have to be introduced to execute a JOIN
* - Cost of building a hash table
* - Cost of probing a hash table
* - Cost of building an output for matched rows
*
* When output size for a JOIN cannot be estimated the cost model returns
* UNKNOWN cost for the join.
*
* However assuming the cost of JOIN output is always the same, we can still make
* cost based decisions based on the input cost for different types of JOINs.
*
* Although the side flipping can be made purely based on stats (smaller side
* always goes to the right), determining JOIN type is not that simple. As when
* choosing REPLICATED over REPARTITIONED join the cost of exchanging and building
* the hash table scales with the number of nodes where the build side is replicated.
*
* TODO Decision about the distribution should be based on LocalCostEstimate only when PlanCostEstimate cannot be calculated. Otherwise cost comparator cannot take query.max-memory into account.
*/
int estimatedSourceDistributedTaskCount = taskCountEstimator.estimateSourceDistributedTaskCount(context.getSession());
LocalCostEstimate cost = calculateJoinCostWithoutOutput(possibleJoinNode.getLeft(), possibleJoinNode.getRight(), stats, types, replicated, estimatedSourceDistributedTaskCount);
return new PlanNodeWithCost(cost.toPlanCost(), possibleJoinNode);
}
Aggregations