Examples with HashDistribution - org.apache.flink.table.planner.plan.nodes.exec.InputProperty.HashDistribution

Example 1 with HashDistribution

Use of org.apache.flink.table.planner.plan.nodes.exec.InputProperty.HashDistribution in the Apache Flink project.

From the class StreamExecExchange, method translateToPlanInternal.

@SuppressWarnings("unchecked")
@Override
protected Transformation<RowData> translateToPlanInternal(PlannerBase planner, ExecNodeConfig config) {
    final Transformation<RowData> inputTransform = (Transformation<RowData>) getInputEdges().get(0).translateToPlan(planner);
    final StreamPartitioner<RowData> partitioner;
    final int parallelism;
    final InputProperty inputProperty = getInputProperties().get(0);
    final InputProperty.DistributionType distributionType = inputProperty.getRequiredDistribution().getType();
    switch(distributionType) {
        case SINGLETON:
            partitioner = new GlobalPartitioner<>();
            parallelism = 1;
            break;
        case HASH:
            // TODO Eliminate duplicate keys
            int[] keys = ((HashDistribution) inputProperty.getRequiredDistribution()).getKeys();
            InternalTypeInfo<RowData> inputType = (InternalTypeInfo<RowData>) inputTransform.getOutputType();
            RowDataKeySelector keySelector = KeySelectorUtil.getRowDataSelector(keys, inputType);
            partitioner = new KeyGroupStreamPartitioner<>(keySelector, DEFAULT_LOWER_BOUND_MAX_PARALLELISM);
            parallelism = ExecutionConfig.PARALLELISM_DEFAULT;
            break;
        default:
            throw new TableException(String.format("%s is not supported now!", distributionType));
    }
    final Transformation<RowData> transformation = new PartitionTransformation<>(inputTransform, partitioner);
    createTransformationMeta(EXCHANGE_TRANSFORMATION, config).fill(transformation);
    transformation.setParallelism(parallelism);
    transformation.setOutputType(InternalTypeInfo.of(getOutputType()));
    return transformation;
}

Also used : PartitionTransformation(org.apache.flink.streaming.api.transformations.PartitionTransformation) Transformation(org.apache.flink.api.dag.Transformation) TableException(org.apache.flink.table.api.TableException) InputProperty(org.apache.flink.table.planner.plan.nodes.exec.InputProperty) HashDistribution(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.HashDistribution) InternalTypeInfo(org.apache.flink.table.runtime.typeutils.InternalTypeInfo) RowData(org.apache.flink.table.data.RowData) RowDataKeySelector(org.apache.flink.table.runtime.keyselector.RowDataKeySelector)
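Example 1 reads the key indices from the HashDistribution and builds a key-based partitioner from them. The following is a minimal sketch of that general idea only: rows that agree on the key fields always land on the same channel. The class HashRoutingSketch and the method selectChannel are hypothetical names introduced for illustration, and the hashing here is a plain Arrays.hashCode with a modulo, not the key-group assignment that Flink's KeyGroupStreamPartitioner performs.

```java
import java.util.Arrays;

/** Hypothetical stand-in for key-based routing; not Flink's KeyGroupStreamPartitioner. */
final class HashRoutingSketch {

    /** Picks a channel from the hash of the key fields only; non-key fields are ignored. */
    static int selectChannel(Object[] row, int[] keys, int numChannels) {
        Object[] keyFields = new Object[keys.length];
        for (int i = 0; i < keys.length; i++) {
            keyFields[i] = row[keys[i]];
        }
        // floorMod keeps the result in [0, numChannels) even when the hash is negative.
        return Math.floorMod(Arrays.hashCode(keyFields), numChannels);
    }

    public static void main(String[] args) {
        int[] keys = {0, 2};
        Object[] first = {"user-1", 10, "EU"};
        Object[] second = {"user-1", 99, "EU"}; // differs only in a non-key field
        // Both rows share the same key fields, so they map to the same channel.
        System.out.println(selectChannel(first, keys, 4) == selectChannel(second, keys, 4));
    }
}
```

Duplicate entries in the key array (the TODO in the example) would not change which rows are grouped together in this sketch; they would only add redundant work to the hash.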

Example 2 with HashDistribution

Use of org.apache.flink.table.planner.plan.nodes.exec.InputProperty.HashDistribution in the Apache Flink project.

From the class BatchExecExchange, method getHashDistributionDescription.

private String getHashDistributionDescription(HashDistribution hashDistribution) {
    RowType inputRowType = (RowType) getInputEdges().get(0).getOutputType();
    String[] fieldNames = Arrays.stream(hashDistribution.getKeys()).mapToObj(i -> inputRowType.getFieldNames().get(i)).toArray(String[]::new);
    return Arrays.stream(fieldNames).collect(Collectors.joining(", ", "[", "]"));
}

Also used : Arrays(java.util.Arrays) InputProperty(org.apache.flink.table.planner.plan.nodes.exec.InputProperty) ForwardForConsecutiveHashPartitioner(org.apache.flink.streaming.runtime.partitioner.ForwardForConsecutiveHashPartitioner) BinaryHashPartitioner(org.apache.flink.table.runtime.partitioner.BinaryHashPartitioner) BroadcastPartitioner(org.apache.flink.streaming.runtime.partitioner.BroadcastPartitioner) RequiredDistribution(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.RequiredDistribution) RowType(org.apache.flink.table.types.logical.RowType) ExecNode(org.apache.flink.table.planner.plan.nodes.exec.ExecNode) KeepInputAsIsDistribution(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.KeepInputAsIsDistribution) StreamPartitioner(org.apache.flink.streaming.runtime.partitioner.StreamPartitioner) StreamExchangeModeUtils.getBatchStreamExchangeMode(org.apache.flink.table.planner.utils.StreamExchangeModeUtils.getBatchStreamExchangeMode) CodeGeneratorContext(org.apache.flink.table.planner.codegen.CodeGeneratorContext) Nullable(javax.annotation.Nullable) ExecNodeContext(org.apache.flink.table.planner.plan.nodes.exec.ExecNodeContext) PartitionTransformation(org.apache.flink.streaming.api.transformations.PartitionTransformation) ForwardPartitioner(org.apache.flink.streaming.runtime.partitioner.ForwardPartitioner) RowData(org.apache.flink.table.data.RowData) PlannerBase(org.apache.flink.table.planner.delegation.PlannerBase) ExecNodeConfig(org.apache.flink.table.planner.plan.nodes.exec.ExecNodeConfig) TableException(org.apache.flink.table.api.TableException) Collectors(java.util.stream.Collectors) VisibleForTesting(org.apache.flink.annotation.VisibleForTesting) GlobalPartitioner(org.apache.flink.streaming.runtime.partitioner.GlobalPartitioner) HashCodeGenerator(org.apache.flink.table.planner.codegen.HashCodeGenerator) StreamExchangeMode(org.apache.flink.streaming.api.transformations.StreamExchangeMode) HashDistribution(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.HashDistribution) InternalTypeInfo(org.apache.flink.table.runtime.typeutils.InternalTypeInfo) ExecEdge(org.apache.flink.table.planner.plan.nodes.exec.ExecEdge) Preconditions.checkArgument(org.apache.flink.util.Preconditions.checkArgument) ExecutionConfig(org.apache.flink.api.common.ExecutionConfig) Optional(java.util.Optional) Transformation(org.apache.flink.api.dag.Transformation) CommonExecExchange(org.apache.flink.table.planner.plan.nodes.exec.common.CommonExecExchange) Collections(java.util.Collections)
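The method in Example 2 turns key indices into a bracketed list of field names. A standalone sketch of the same index-to-name mapping, without the intermediate array, could look as follows. The class KeyDescriptionSketch, the method describeKeys, and the sample field names are assumptions made for illustration; a plain List<String> stands in for the RowType field names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper mirroring the index-to-name mapping, using a plain list of names. */
final class KeyDescriptionSketch {

    /** Looks up each key index in fieldNames and joins the names as "[a, b, ...]". */
    static String describeKeys(int[] keys, List<String> fieldNames) {
        return Arrays.stream(keys)
                .mapToObj(fieldNames::get)
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> fieldNames = Arrays.asList("id", "amount", "region");
        System.out.println(describeKeys(new int[] {0, 2}, fieldNames)); // prints "[id, region]"
    }
}
```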

Example 3 with HashDistribution

Use of org.apache.flink.table.planner.plan.nodes.exec.InputProperty.HashDistribution in the Apache Flink project.

From the class BatchExecExchange, method translateToPlanInternal.

@SuppressWarnings("unchecked")
@Override
protected Transformation<RowData> translateToPlanInternal(PlannerBase planner, ExecNodeConfig config) {
    final ExecEdge inputEdge = getInputEdges().get(0);
    final Transformation<RowData> inputTransform = (Transformation<RowData>) inputEdge.translateToPlan(planner);
    final RowType inputType = (RowType) inputEdge.getOutputType();
    boolean requireUndefinedExchangeMode = false;
    final StreamPartitioner<RowData> partitioner;
    final int parallelism;
    final InputProperty inputProperty = getInputProperties().get(0);
    final RequiredDistribution requiredDistribution = inputProperty.getRequiredDistribution();
    final InputProperty.DistributionType distributionType = requiredDistribution.getType();
    switch(distributionType) {
        case ANY:
            partitioner = null;
            parallelism = ExecutionConfig.PARALLELISM_DEFAULT;
            break;
        case BROADCAST:
            partitioner = new BroadcastPartitioner<>();
            parallelism = ExecutionConfig.PARALLELISM_DEFAULT;
            break;
        case SINGLETON:
            partitioner = new GlobalPartitioner<>();
            parallelism = 1;
            break;
        case HASH:
            partitioner = createHashPartitioner(((HashDistribution) requiredDistribution), inputType, config);
            parallelism = ExecutionConfig.PARALLELISM_DEFAULT;
            break;
        case KEEP_INPUT_AS_IS:
            KeepInputAsIsDistribution keepInputAsIsDistribution = (KeepInputAsIsDistribution) requiredDistribution;
            if (keepInputAsIsDistribution.isStrict()) {
                // explicitly use ForwardPartitioner to guarantee the data distribution is
                // exactly the same as input
                partitioner = new ForwardPartitioner<>();
                requireUndefinedExchangeMode = true;
            } else {
                RequiredDistribution inputDistribution = keepInputAsIsDistribution.getInputDistribution();
                checkArgument(inputDistribution instanceof HashDistribution, "Only HashDistribution is supported now");
                partitioner = new ForwardForConsecutiveHashPartitioner<>(createHashPartitioner(((HashDistribution) inputDistribution), inputType, config));
            }
            parallelism = inputTransform.getParallelism();
            break;
        default:
            throw new TableException(distributionType + " is not supported now!");
    }
    final StreamExchangeMode exchangeMode = requireUndefinedExchangeMode ? StreamExchangeMode.UNDEFINED : getBatchStreamExchangeMode(config, requiredExchangeMode);
    final Transformation<RowData> transformation = new PartitionTransformation<>(inputTransform, partitioner, exchangeMode);
    transformation.setParallelism(parallelism);
    transformation.setOutputType(InternalTypeInfo.of(getOutputType()));
    return transformation;
}

Also used : RequiredDistribution(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.RequiredDistribution) PartitionTransformation(org.apache.flink.streaming.api.transformations.PartitionTransformation) Transformation(org.apache.flink.api.dag.Transformation) TableException(org.apache.flink.table.api.TableException) ExecEdge(org.apache.flink.table.planner.plan.nodes.exec.ExecEdge) InputProperty(org.apache.flink.table.planner.plan.nodes.exec.InputProperty) RowType(org.apache.flink.table.types.logical.RowType) HashDistribution(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.HashDistribution) RowData(org.apache.flink.table.data.RowData) KeepInputAsIsDistribution(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.KeepInputAsIsDistribution) StreamExchangeModeUtils.getBatchStreamExchangeMode(org.apache.flink.table.planner.utils.StreamExchangeModeUtils.getBatchStreamExchangeMode) StreamExchangeMode(org.apache.flink.streaming.api.transformations.StreamExchangeMode)

Example 4 with HashDistribution

Use of org.apache.flink.table.planner.plan.nodes.exec.InputProperty.HashDistribution in the Apache Flink project.

From the class BatchExecExchange, method getDescription.

@Override
public String getDescription() {
    // keep the description consistent with the previous format; update this once the plan is stable
    RequiredDistribution requiredDistribution = getInputProperties().get(0).getRequiredDistribution();
    StringBuilder sb = new StringBuilder();
    String type = requiredDistribution.getType().name().toLowerCase();
    if (type.equals("singleton")) {
        type = "single";
    } else if (requiredDistribution instanceof KeepInputAsIsDistribution && ((KeepInputAsIsDistribution) requiredDistribution).isStrict()) {
        type = "forward";
    }
    sb.append("distribution=[").append(type);
    if (requiredDistribution instanceof HashDistribution) {
        sb.append(getHashDistributionDescription((HashDistribution) requiredDistribution));
    } else if (requiredDistribution instanceof KeepInputAsIsDistribution && !((KeepInputAsIsDistribution) requiredDistribution).isStrict()) {
        KeepInputAsIsDistribution distribution = (KeepInputAsIsDistribution) requiredDistribution;
        sb.append("[hash").append(getHashDistributionDescription((HashDistribution) distribution.getInputDistribution())).append("]");
    }
    sb.append("]");
    if (requiredExchangeMode == StreamExchangeMode.BATCH) {
        sb.append(", shuffle_mode=[BATCH]");
    }
    return String.format("Exchange(%s)", sb);
}

Also used : RequiredDistribution(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.RequiredDistribution) KeepInputAsIsDistribution(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.KeepInputAsIsDistribution) HashDistribution(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.HashDistribution)
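To make the string layout produced by Example 4 easier to see, here is a reduced sketch of the same assembly steps with the Flink types replaced by plain parameters. The class ExchangeDescriptionSketch, the method describe, and its parameters are hypothetical; the caller is assumed to have already resolved the type label ("hash", "single", "forward", and so on) and the bracketed key list.

```java
/** Hypothetical reduction of the description format; takes pre-resolved strings instead of Flink types. */
final class ExchangeDescriptionSketch {

    /**
     * @param type         the already-mapped distribution label, e.g. "hash" or "single"
     * @param hashKeys     the bracketed key list such as "[id, region]", or null if there are no keys
     * @param batchShuffle whether the shuffle-mode suffix should be appended
     */
    static String describe(String type, String hashKeys, boolean batchShuffle) {
        StringBuilder sb = new StringBuilder("distribution=[").append(type);
        if (hashKeys != null) {
            sb.append(hashKeys);
        }
        sb.append("]");
        if (batchShuffle) {
            sb.append(", shuffle_mode=[BATCH]");
        }
        return String.format("Exchange(%s)", sb);
    }

    public static void main(String[] args) {
        // prints "Exchange(distribution=[hash[id, region]])"
        System.out.println(describe("hash", "[id, region]", false));
        // prints "Exchange(distribution=[single], shuffle_mode=[BATCH])"
        System.out.println(describe("single", null, true));
    }
}
```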

Example 5 with HashDistribution

Use of org.apache.flink.table.planner.plan.nodes.exec.InputProperty.HashDistribution in the Apache Flink project.

From the class RequiredDistributionJsonSerializer, method serialize.

@Override
public void serialize(RequiredDistribution requiredDistribution, JsonGenerator jsonGenerator, SerializerProvider serializerProvider) throws IOException {
    jsonGenerator.writeStartObject();
    DistributionType type = requiredDistribution.getType();
    jsonGenerator.writeStringField("type", type.name());
    switch(type) {
        case ANY:
        case SINGLETON:
        case BROADCAST:
        case UNKNOWN:
            // do nothing, type name is enough
            break;
        case HASH:
            HashDistribution hashDistribution = (HashDistribution) requiredDistribution;
            jsonGenerator.writeFieldName("keys");
            jsonGenerator.writeArray(hashDistribution.getKeys(), /* offset */ 0, hashDistribution.getKeys().length);
            break;
        default:
            throw new TableException("Unsupported distribution type: " + type);
    }
    jsonGenerator.writeEndObject();
}

Also used : TableException(org.apache.flink.table.api.TableException) HashDistribution(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.HashDistribution) DistributionType(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.DistributionType)
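The serializer in Example 5 writes a "type" field for every distribution and adds a "keys" array only for HASH. The sketch below builds the same logical shape by hand so it runs without Jackson. The class DistributionJsonSketch and the method toJson are hypothetical, and the compact output with no whitespace is an assumption, since the real formatting depends on how the JsonGenerator is configured.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical hand-rolled writer showing the logical JSON shape; the real code uses Jackson. */
final class DistributionJsonSketch {

    /** Emits {"type":...} and, when keys is non-null, a "keys" array of the indices. */
    static String toJson(String type, int[] keys) {
        StringBuilder sb = new StringBuilder("{\"type\":\"").append(type).append('"');
        if (keys != null) {
            String keyArray = Arrays.stream(keys)
                    .mapToObj(String::valueOf)
                    .collect(Collectors.joining(",", "[", "]"));
            sb.append(",\"keys\":").append(keyArray);
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson("HASH", new int[] {0, 2})); // prints {"type":"HASH","keys":[0,2]}
        System.out.println(toJson("SINGLETON", null));        // prints {"type":"SINGLETON"}
    }
}
```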

Aggregations

HashDistribution (org.apache.flink.table.planner.plan.nodes.exec.InputProperty.HashDistribution): 5
TableException (org.apache.flink.table.api.TableException): 4
Transformation (org.apache.flink.api.dag.Transformation): 3
PartitionTransformation (org.apache.flink.streaming.api.transformations.PartitionTransformation): 3
RowData (org.apache.flink.table.data.RowData): 3
InputProperty (org.apache.flink.table.planner.plan.nodes.exec.InputProperty): 3
KeepInputAsIsDistribution (org.apache.flink.table.planner.plan.nodes.exec.InputProperty.KeepInputAsIsDistribution): 3
RequiredDistribution (org.apache.flink.table.planner.plan.nodes.exec.InputProperty.RequiredDistribution): 3
StreamExchangeMode (org.apache.flink.streaming.api.transformations.StreamExchangeMode): 2
ExecEdge (org.apache.flink.table.planner.plan.nodes.exec.ExecEdge): 2
StreamExchangeModeUtils.getBatchStreamExchangeMode (org.apache.flink.table.planner.utils.StreamExchangeModeUtils.getBatchStreamExchangeMode): 2
InternalTypeInfo (org.apache.flink.table.runtime.typeutils.InternalTypeInfo): 2
RowType (org.apache.flink.table.types.logical.RowType): 2
Arrays (java.util.Arrays): 1
Collections (java.util.Collections): 1
Optional (java.util.Optional): 1
Collectors (java.util.stream.Collectors): 1
Nullable (javax.annotation.Nullable): 1
VisibleForTesting (org.apache.flink.annotation.VisibleForTesting): 1
ExecutionConfig (org.apache.flink.api.common.ExecutionConfig): 1