Examples with HashJoinPhase - io.crate.execution.dsl.phases.HashJoinPhase

Example 1 with HashJoinPhase

Use of io.crate.execution.dsl.phases.HashJoinPhase in the project crate, by crate.

From the class HashJoin, method build.

@Override
public ExecutionPlan build(PlannerContext plannerContext, Set<PlanHint> hints, ProjectionBuilder projectionBuilder, int limit, int offset, @Nullable OrderBy order, @Nullable Integer pageSizeHint, Row params, SubQueryResults subQueryResults) {
    ExecutionPlan leftExecutionPlan = lhs.build(plannerContext, hints, projectionBuilder, NO_LIMIT, 0, null, null, params, subQueryResults);
    ExecutionPlan rightExecutionPlan = rhs.build(plannerContext, hints, projectionBuilder, NO_LIMIT, 0, null, null, params, subQueryResults);
    LogicalPlan leftLogicalPlan = lhs;
    LogicalPlan rightLogicalPlan = rhs;
    boolean tablesSwitched = false;
    // Move the smaller relation to the right side, since benchmarking
    // revealed that this improves performance in most cases.
    if (lhs.numExpectedRows() < rhs.numExpectedRows()) {
        tablesSwitched = true;
        leftLogicalPlan = rhs;
        rightLogicalPlan = lhs;
        ExecutionPlan tmp = leftExecutionPlan;
        leftExecutionPlan = rightExecutionPlan;
        rightExecutionPlan = tmp;
    }
    SubQueryAndParamBinder paramBinder = new SubQueryAndParamBinder(params, subQueryResults);
    Tuple<List<Symbol>, List<Symbol>> hashSymbols = extractHashJoinSymbolsFromJoinSymbolsAndSplitPerSide(tablesSwitched);
    ResultDescription leftResultDesc = leftExecutionPlan.resultDescription();
    ResultDescription rightResultDesc = rightExecutionPlan.resultDescription();
    Collection<String> joinExecutionNodes = leftResultDesc.nodeIds();
    List<Symbol> leftOutputs = leftLogicalPlan.outputs();
    List<Symbol> rightOutputs = rightLogicalPlan.outputs();
    MergePhase leftMerge = null;
    MergePhase rightMerge = null;
    // We can only run the join distributed if no remaining limit or offset must be applied on the source relations.
    // In a distributed join, each node joins only a slice of the data (selected by modulo distribution), so no
    // limit/offset can be applied there. Limit/offset can only be applied to the whole data set, after all partial
    // rows from the shards are merged.
    boolean isDistributed = leftResultDesc.hasRemainingLimitOrOffset() == false && rightResultDesc.hasRemainingLimitOrOffset() == false;
    if (joinExecutionNodes.isEmpty()) {
        // The left source might have zero execution nodes, for example in the case of `sys.shards` without any tables
        // If the join then also uses zero execution nodes, a distributed plan no longer works because
        // the source operators wouldn't have a downstream node where they can send the results to.
        // → we switch to non-distributed which results in the join running on the handlerNode.
        isDistributed = false;
    }
    if (joinExecutionNodes.size() == 1 && joinExecutionNodes.equals(rightResultDesc.nodeIds()) && !rightResultDesc.hasRemainingLimitOrOffset()) {
        // If the left and the right plan are executed on the same single node, the merge phase
        // should be omitted. This is the case if the left and right tables each have only one shard
        // and both shards are on the same node.
        leftExecutionPlan.setDistributionInfo(DistributionInfo.DEFAULT_SAME_NODE);
        rightExecutionPlan.setDistributionInfo(DistributionInfo.DEFAULT_SAME_NODE);
    } else {
        if (isDistributed) {
            // Run the join distributed by modulo distribution algorithm
            leftOutputs = setModuloDistribution(Lists2.map(hashSymbols.v1(), paramBinder), leftLogicalPlan.outputs(), leftExecutionPlan);
            rightOutputs = setModuloDistribution(Lists2.map(hashSymbols.v2(), paramBinder), rightLogicalPlan.outputs(), rightExecutionPlan);
        } else {
            // Run the join non-distributed on the handler node
            joinExecutionNodes = Collections.singletonList(plannerContext.handlerNode());
            leftExecutionPlan.setDistributionInfo(DistributionInfo.DEFAULT_BROADCAST);
            rightExecutionPlan.setDistributionInfo(DistributionInfo.DEFAULT_BROADCAST);
        }
        leftMerge = JoinOperations.buildMergePhaseForJoin(plannerContext, leftResultDesc, joinExecutionNodes);
        rightMerge = JoinOperations.buildMergePhaseForJoin(plannerContext, rightResultDesc, joinExecutionNodes);
    }
    List<Symbol> joinOutputs = Lists2.concat(leftOutputs, rightOutputs);
    HashJoinPhase joinPhase = new HashJoinPhase(
        plannerContext.jobId(),
        plannerContext.nextExecutionPhaseId(),
        "hash-join",
        Collections.singletonList(JoinOperations.createJoinProjection(outputs, joinOutputs)),
        leftMerge,
        rightMerge,
        leftOutputs.size(),
        rightOutputs.size(),
        joinExecutionNodes,
        InputColumns.create(paramBinder.apply(joinCondition), joinOutputs),
        InputColumns.create(Lists2.map(hashSymbols.v1(), paramBinder), new InputColumns.SourceSymbols(leftOutputs)),
        InputColumns.create(Lists2.map(hashSymbols.v2(), paramBinder), new InputColumns.SourceSymbols(rightOutputs)),
        Symbols.typeView(leftOutputs),
        leftLogicalPlan.estimatedRowSize(),
        leftLogicalPlan.numExpectedRows());
    return new Join(joinPhase, leftExecutionPlan, rightExecutionPlan, TopN.NO_LIMIT, 0, TopN.NO_LIMIT, outputs.size(), null);
}

Also used: HashJoinPhase (io.crate.execution.dsl.phases.HashJoinPhase), SelectSymbol (io.crate.expression.symbol.SelectSymbol), Symbol (io.crate.expression.symbol.Symbol), Join (io.crate.planner.node.dql.join.Join), ExecutionPlan (io.crate.planner.ExecutionPlan), MergePhase (io.crate.execution.dsl.phases.MergePhase), ResultDescription (io.crate.planner.ResultDescription), ArrayList (java.util.ArrayList), List (java.util.List)
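
The comments in the build method above describe running the join on "a slice (modulo) set of the data". The following is a minimal, self-contained sketch of that idea, not CrateDB's implementation: rows from both sides are routed to a bucket chosen by the hash of the join key modulo the number of nodes, so rows with equal keys always land in the same bucket and each bucket can be joined independently. All names here (ModuloHashJoinSketch, Row, distributeByModulo, hashJoin, distributedJoin) are hypothetical, and the choice to build the hash table on the right input is an assumption of the sketch only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; none of these names exist in CrateDB. */
final class ModuloHashJoinSketch {

    /** A row is modelled as an integer join key plus an opaque payload. */
    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Routes each row to bucket floorMod(hash(key), numNodes); equal keys share a bucket. */
    static List<List<Row>> distributeByModulo(List<Row> rows, int numNodes) {
        List<List<Row>> buckets = new ArrayList<>();
        for (int i = 0; i < numNodes; i++) {
            buckets.add(new ArrayList<>());
        }
        for (Row row : rows) {
            int bucket = Math.floorMod(Integer.hashCode(row.key), numNodes);
            buckets.get(bucket).add(row);
        }
        return buckets;
    }

    /** Local equi-join: builds a hash table on the right input and probes it with the left. */
    static List<String> hashJoin(List<Row> left, List<Row> right) {
        Map<Integer, List<Row>> table = new HashMap<>();
        for (Row r : right) {
            table.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r);
        }
        List<String> result = new ArrayList<>();
        for (Row l : left) {
            for (Row r : table.getOrDefault(l.key, List.of())) {
                result.add(l.value + "|" + r.value);
            }
        }
        return result;
    }

    /** Partitions both inputs the same way, then joins each partition on its own. */
    static List<String> distributedJoin(List<Row> left, List<Row> right, int numNodes) {
        List<List<Row>> leftParts = distributeByModulo(left, numNodes);
        List<List<Row>> rightParts = distributeByModulo(right, numNodes);
        List<String> result = new ArrayList<>();
        for (int node = 0; node < numNodes; node++) {
            result.addAll(hashJoin(leftParts.get(node), rightParts.get(node)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> left = List.of(new Row(1, "a"), new Row(2, "b"), new Row(3, "c"));
        List<Row> right = List.of(new Row(2, "x"), new Row(3, "y"), new Row(3, "z"), new Row(4, "w"));
        System.out.println(distributedJoin(left, right, 2));
    }
}
```

The sketch also suggests why a remaining limit or offset rules out this mode: each bucket sees only part of the data, so a limit applied per bucket would not equal a limit applied to the merged result.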

Example 2 with HashJoinPhase

Use of io.crate.execution.dsl.phases.HashJoinPhase in the project crate, by crate.

From the class JoinPhaseTest, method testHashJoinSerialization.

@Test
public void testHashJoinSerialization() throws Exception {
    HashJoinPhase node = new HashJoinPhase(
        jobId,
        1,                                                   // execution phase id
        "nestedLoop",                                        // phase name
        List.of(topNProjection),                             // projections
        mp1,                                                 // left merge phase
        mp2,                                                 // right merge phase
        2,                                                   // number of left outputs
        3,                                                   // number of right outputs
        Set.of("node1", "node2"),                            // execution nodes
        joinCondition,
        List.of(Literal.of("testLeft"), Literal.of(10)),     // left join condition inputs
        List.of(Literal.of("testRight"), Literal.of(20)),    // right join condition inputs
        List.of(DataTypes.STRING, DataTypes.INTEGER),        // left output types
        111,                                                 // estimated row size for left
        222);                                                // number of rows for left
    // Serialize the phase, then read it back through the stream constructor.
    BytesStreamOutput output = new BytesStreamOutput();
    node.writeTo(output);
    StreamInput input = output.bytes().streamInput();
    HashJoinPhase node2 = new HashJoinPhase(input);
    // Every streamed property must survive the round trip.
    assertThat(node.nodeIds(), is(node2.nodeIds()));
    assertThat(node.jobId(), is(node2.jobId()));
    assertThat(node.joinCondition(), is(node2.joinCondition()));
    assertThat(node.type(), is(node2.type()));
    assertThat(node.name(), is(node2.name()));
    assertThat(node.outputTypes(), is(node2.outputTypes()));
    assertThat(node.joinType(), is(node2.joinType()));
    assertThat(node.leftJoinConditionInputs(), is(node2.leftJoinConditionInputs()));
    assertThat(node.rightJoinConditionInputs(), is(node2.rightJoinConditionInputs()));
    assertThat(node.numLeftOutputs(), is(node2.numLeftOutputs()));
    assertThat(node.numRightOutputs(), is(node2.numRightOutputs()));
    assertThat(node.leftOutputTypes(), is(node2.leftOutputTypes()));
    assertThat(node.estimatedRowSizeForLeft(), is(node2.estimatedRowSizeForLeft()));
    assertThat(node.numberOfRowsForLeft(), is(node2.numberOfRowsForLeft()));
}

Also used: HashJoinPhase (io.crate.execution.dsl.phases.HashJoinPhase), StreamInput (org.elasticsearch.common.io.stream.StreamInput), BytesStreamOutput (org.elasticsearch.common.io.stream.BytesStreamOutput), Test (org.junit.Test)
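
The test above follows a common round-trip pattern: write an object to a byte stream, rebuild it through a stream-reading constructor, and compare the fields. The sketch below shows the same pattern using only the JDK's DataOutputStream and DataInputStream, so it runs without CrateDB or Elasticsearch on the classpath. PhaseRoundTripSketch and DemoPhase are hypothetical stand-ins and do not reflect the real wire format of HashJoinPhase.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a write/read round trip; not CrateDB code. */
final class PhaseRoundTripSketch {

    /** Stand-in for a streamable phase with a few representative fields. */
    static final class DemoPhase {
        final String name;
        final int numLeftOutputs;
        final int numRightOutputs;
        final List<String> nodeIds;

        DemoPhase(String name, int numLeftOutputs, int numRightOutputs, List<String> nodeIds) {
            this.name = name;
            this.numLeftOutputs = numLeftOutputs;
            this.numRightOutputs = numRightOutputs;
            this.nodeIds = List.copyOf(nodeIds);
        }

        /** Reads the fields in exactly the order writeTo emits them. */
        DemoPhase(DataInputStream in) throws IOException {
            this.name = in.readUTF();
            this.numLeftOutputs = in.readInt();
            this.numRightOutputs = in.readInt();
            int count = in.readInt();
            List<String> ids = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                ids.add(in.readUTF());
            }
            this.nodeIds = List.copyOf(ids);
        }

        /** Writes the list length first so the reader knows how many ids follow. */
        void writeTo(DataOutputStream out) throws IOException {
            out.writeUTF(name);
            out.writeInt(numLeftOutputs);
            out.writeInt(numRightOutputs);
            out.writeInt(nodeIds.size());
            for (String id : nodeIds) {
                out.writeUTF(id);
            }
        }
    }

    /** Serializes the phase to bytes and rebuilds a new instance from them. */
    static DemoPhase roundTrip(DemoPhase phase) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                phase.writeTo(out);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return new DemoPhase(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DemoPhase original = new DemoPhase("hash-join", 2, 3, List.of("node1", "node2"));
        DemoPhase copy = roundTrip(original);
        System.out.println(copy.name + " " + copy.nodeIds);
    }
}
```

Comparing field by field, as the test does, catches a reader and writer that disagree on field order or count, which an equals check on a single field would miss.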

Aggregations

HashJoinPhase (io.crate.execution.dsl.phases.HashJoinPhase): 2
MergePhase (io.crate.execution.dsl.phases.MergePhase): 1
SelectSymbol (io.crate.expression.symbol.SelectSymbol): 1
Symbol (io.crate.expression.symbol.Symbol): 1
ExecutionPlan (io.crate.planner.ExecutionPlan): 1
ResultDescription (io.crate.planner.ResultDescription): 1
Join (io.crate.planner.node.dql.join.Join): 1
ArrayList (java.util.ArrayList): 1
List (java.util.List): 1
BytesStreamOutput (org.elasticsearch.common.io.stream.BytesStreamOutput): 1
StreamInput (org.elasticsearch.common.io.stream.StreamInput): 1
Test (org.junit.Test): 1