Search in sources:

Example 1 with PartitionTransformation

Use of org.apache.flink.streaming.api.transformations.PartitionTransformation in project flink by apache.

Class PythonConfigUtil, method configForwardPartitioner:

private static void configForwardPartitioner(Transformation<?> upTransformation, Transformation<?> transformation) throws IllegalAccessException, NoSuchFieldException {
    // Wrap the upstream transformation in a PartitionTransformation backed by a
    // ForwardPartitioner, so records stay in the same subtask between the two operators.
    PartitionTransformation<?> partitionTransform = new PartitionTransformation<>(upTransformation, new ForwardPartitioner<>());
    // The downstream transformation keeps its upstream in a private "input" field,
    // so the new exchange has to be spliced in reflectively.
    Field inputTransformationField = transformation.getClass().getDeclaredField("input");
    inputTransformationField.setAccessible(true);
    inputTransformationField.set(transformation, partitionTransform);
}
Also used : Field(java.lang.reflect.Field) PartitionTransformation(org.apache.flink.streaming.api.transformations.PartitionTransformation)
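
For comparison, here is a minimal sketch of what this reflection-based rewiring achieves when you control the graph directly: wrapping an upstream transformation in a forward exchange and continuing to build on it via the public DataStream constructor, as the chaining tests further below do. This is an illustrative example, not part of PythonConfigUtil; the class and stream names are made up.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.transformations.PartitionTransformation;
import org.apache.flink.streaming.runtime.partitioner.ForwardPartitioner;

public class ForwardExchangeSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Integer> upstream = env.fromElements(1, 2, 3);

        // Insert a forward exchange between the source and the map, mirroring what
        // configForwardPartitioner does by rewriting the downstream "input" field.
        PartitionTransformation<Integer> forward =
                new PartitionTransformation<>(upstream.getTransformation(), new ForwardPartitioner<>());
        new DataStream<>(env, forward).map(value -> value).print();

        env.execute("forward-exchange-sketch");
    }
}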

Example 2 with PartitionTransformation

Use of org.apache.flink.streaming.api.transformations.PartitionTransformation in project flink by apache.

Class StreamExecExchange, method translateToPlanInternal:

@SuppressWarnings("unchecked")
@Override
protected Transformation<RowData> translateToPlanInternal(PlannerBase planner, ExecNodeConfig config) {
    final Transformation<RowData> inputTransform = (Transformation<RowData>) getInputEdges().get(0).translateToPlan(planner);
    final StreamPartitioner<RowData> partitioner;
    final int parallelism;
    final InputProperty inputProperty = getInputProperties().get(0);
    final InputProperty.DistributionType distributionType = inputProperty.getRequiredDistribution().getType();
    switch(distributionType) {
        case SINGLETON:
            partitioner = new GlobalPartitioner<>();
            parallelism = 1;
            break;
        case HASH:
            // TODO Eliminate duplicate keys
            int[] keys = ((HashDistribution) inputProperty.getRequiredDistribution()).getKeys();
            InternalTypeInfo<RowData> inputType = (InternalTypeInfo<RowData>) inputTransform.getOutputType();
            RowDataKeySelector keySelector = KeySelectorUtil.getRowDataSelector(keys, inputType);
            partitioner = new KeyGroupStreamPartitioner<>(keySelector, DEFAULT_LOWER_BOUND_MAX_PARALLELISM);
            parallelism = ExecutionConfig.PARALLELISM_DEFAULT;
            break;
        default:
            throw new TableException(String.format("%s is not supported now!", distributionType));
    }
    final Transformation<RowData> transformation = new PartitionTransformation<>(inputTransform, partitioner);
    createTransformationMeta(EXCHANGE_TRANSFORMATION, config).fill(transformation);
    transformation.setParallelism(parallelism);
    transformation.setOutputType(InternalTypeInfo.of(getOutputType()));
    return transformation;
}
Also used : PartitionTransformation(org.apache.flink.streaming.api.transformations.PartitionTransformation) Transformation(org.apache.flink.api.dag.Transformation) TableException(org.apache.flink.table.api.TableException) InputProperty(org.apache.flink.table.planner.plan.nodes.exec.InputProperty) HashDistribution(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.HashDistribution) InternalTypeInfo(org.apache.flink.table.runtime.typeutils.InternalTypeInfo) RowData(org.apache.flink.table.data.RowData) RowDataKeySelector(org.apache.flink.table.runtime.keyselector.RowDataKeySelector)
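
The HASH branch above is the interesting one: key field indices from the required distribution are turned into a RowDataKeySelector, which then drives a KeyGroupStreamPartitioner. A hypothetical standalone helper that isolates just that step might look as follows; the class and method names are illustrative, and the import path for the planner's KeySelectorUtil is assumed from the classes referenced above.

import org.apache.flink.api.dag.Transformation;
import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
import org.apache.flink.streaming.api.transformations.PartitionTransformation;
import org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.planner.plan.utils.KeySelectorUtil;
import org.apache.flink.table.runtime.keyselector.RowDataKeySelector;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;

public class HashExchangeSketch {

    /** Wraps the input in a hash exchange on the given key field indices. */
    @SuppressWarnings("unchecked")
    static Transformation<RowData> hashExchange(Transformation<RowData> input, int[] keys) {
        InternalTypeInfo<RowData> inputType = (InternalTypeInfo<RowData>) input.getOutputType();
        RowDataKeySelector keySelector = KeySelectorUtil.getRowDataSelector(keys, inputType);
        KeyGroupStreamPartitioner<RowData, RowData> partitioner =
                new KeyGroupStreamPartitioner<>(keySelector, KeyGroupRangeAssignment.DEFAULT_LOWER_BOUND_MAX_PARALLELISM);
        return new PartitionTransformation<>(input, partitioner);
    }
}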

Example 3 with PartitionTransformation

Use of org.apache.flink.streaming.api.transformations.PartitionTransformation in project flink by apache.

Class StreamingJobGraphGeneratorTest, method testExchangeModeUndefined:

/**
 * Test setting exchange mode to {@link StreamExchangeMode#UNDEFINED}.
 */
@Test
public void testExchangeModeUndefined() {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // fromElements -> Map -> Print
    DataStream<Integer> sourceDataStream = env.fromElements(1, 2, 3);
    DataStream<Integer> partitionAfterSourceDataStream = new DataStream<>(env, new PartitionTransformation<>(sourceDataStream.getTransformation(), new ForwardPartitioner<>(), StreamExchangeMode.UNDEFINED));
    DataStream<Integer> mapDataStream = partitionAfterSourceDataStream.map(value -> value).setParallelism(1);
    DataStream<Integer> partitionAfterMapDataStream = new DataStream<>(env, new PartitionTransformation<>(mapDataStream.getTransformation(), new RescalePartitioner<>(), StreamExchangeMode.UNDEFINED));
    partitionAfterMapDataStream.print().setParallelism(2);
    JobGraph jobGraph = StreamingJobGraphGenerator.createJobGraph(env.getStreamGraph());
    List<JobVertex> verticesSorted = jobGraph.getVerticesSortedTopologicallyFromSources();
    assertEquals(2, verticesSorted.size());
    // it can be chained with UNDEFINED exchange mode
    JobVertex sourceAndMapVertex = verticesSorted.get(0);
    // UNDEFINED exchange mode is translated into PIPELINED_BOUNDED result partition by default
    assertEquals(ResultPartitionType.PIPELINED_BOUNDED, sourceAndMapVertex.getProducedDataSets().get(0).getResultType());
}
Also used : Arrays(java.util.Arrays) Tuple2(org.apache.flink.api.java.tuple.Tuple2) TypeSerializerInputFormat(org.apache.flink.api.java.io.TypeSerializerInputFormat) YieldingOperatorFactory(org.apache.flink.streaming.api.operators.YieldingOperatorFactory) AbstractStreamOperatorFactory(org.apache.flink.streaming.api.operators.AbstractStreamOperatorFactory) UserCodeWrapper(org.apache.flink.api.common.operators.util.UserCodeWrapper) ResourceSpec(org.apache.flink.api.common.operators.ResourceSpec) ManagedMemoryUseCase(org.apache.flink.core.memory.ManagedMemoryUseCase) Map(java.util.Map) CoLocationGroup(org.apache.flink.runtime.jobmanager.scheduler.CoLocationGroup) ForwardPartitioner(org.apache.flink.streaming.runtime.partitioner.ForwardPartitioner) SinkFunction(org.apache.flink.streaming.api.functions.sink.SinkFunction) TaskConfig(org.apache.flink.runtime.operators.util.TaskConfig) Set(java.util.Set) FlatMapFunction(org.apache.flink.api.common.functions.FlatMapFunction) FilterFunction(org.apache.flink.api.common.functions.FilterFunction) Assert.assertFalse(org.junit.Assert.assertFalse) StreamingJobGraphGenerator.areOperatorsChainable(org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.areOperatorsChainable) Boundedness(org.apache.flink.api.connector.source.Boundedness) OneInputStreamOperatorFactory(org.apache.flink.streaming.api.operators.OneInputStreamOperatorFactory) MultipleInputTransformation(org.apache.flink.streaming.api.transformations.MultipleInputTransformation) NumberSequenceSource(org.apache.flink.api.connector.source.lib.NumberSequenceSource) CoreMatchers.equalTo(org.hamcrest.CoreMatchers.equalTo) ArrayList(java.util.ArrayList) TaskManagerOptions(org.apache.flink.configuration.TaskManagerOptions) Collector(org.apache.flink.util.Collector) Iterables(org.apache.flink.shaded.guava30.com.google.common.collect.Iterables) MatcherAssert.assertThat(org.hamcrest.MatcherAssert.assertThat) Types(org.apache.flink.api.common.typeinfo.Types) DataStreamSink(org.apache.flink.streaming.api.datastream.DataStreamSink) DiscardingOutputFormat(org.apache.flink.api.java.io.DiscardingOutputFormat) MailboxExecutor(org.apache.flink.api.common.operators.MailboxExecutor) SingleOutputStreamOperator(org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator) Assert.assertTrue(org.junit.Assert.assertTrue) Test(org.junit.Test) OneInputTransformation(org.apache.flink.streaming.api.transformations.OneInputTransformation) Assert.assertNotEquals(org.junit.Assert.assertNotEquals) StreamOperator(org.apache.flink.streaming.api.operators.StreamOperator) Assert.assertNull(org.junit.Assert.assertNull) Matcher(org.hamcrest.Matcher) Transformation(org.apache.flink.api.dag.Transformation) Assert(org.junit.Assert) SavepointRestoreSettings(org.apache.flink.runtime.jobgraph.SavepointRestoreSettings) Assert.assertEquals(org.junit.Assert.assertEquals) CoreMatchers.is(org.hamcrest.CoreMatchers.is) PipelineOptions(org.apache.flink.configuration.PipelineOptions) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) CheckpointingMode(org.apache.flink.streaming.api.CheckpointingMode) MapFunction(org.apache.flink.api.common.functions.MapFunction) BasicTypeInfo(org.apache.flink.api.common.typeinfo.BasicTypeInfo) ChainingStrategy(org.apache.flink.streaming.api.operators.ChainingStrategy) TestLogger(org.apache.flink.util.TestLogger) InputFormat(org.apache.flink.api.common.io.InputFormat) Assert.fail(org.junit.Assert.fail) TypeInformation(org.apache.flink.api.common.typeinfo.TypeInformation) 
Method(java.lang.reflect.Method) OutputFormat(org.apache.flink.api.common.io.OutputFormat) JobCheckpointingSettings(org.apache.flink.runtime.jobgraph.tasks.JobCheckpointingSettings) PartitionTransformation(org.apache.flink.streaming.api.transformations.PartitionTransformation) WatermarkStrategy(org.apache.flink.api.common.eventtime.WatermarkStrategy) Collectors(java.util.stream.Collectors) ResourceProfile(org.apache.flink.runtime.clusterframework.types.ResourceProfile) SimpleOperatorFactory(org.apache.flink.streaming.api.operators.SimpleOperatorFactory) List(java.util.List) MultipleInputStreamTask(org.apache.flink.streaming.runtime.tasks.MultipleInputStreamTask) SerializedValue(org.apache.flink.util.SerializedValue) ExecutionConfig(org.apache.flink.api.common.ExecutionConfig) OperatorID(org.apache.flink.runtime.jobgraph.OperatorID) CheckpointConfig(org.apache.flink.streaming.api.environment.CheckpointConfig) ParallelSourceFunction(org.apache.flink.streaming.api.functions.source.ParallelSourceFunction) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment) JobVertex(org.apache.flink.runtime.jobgraph.JobVertex) SlotSharingGroup(org.apache.flink.runtime.jobmanager.scheduler.SlotSharingGroup) IterativeStream(org.apache.flink.streaming.api.datastream.IterativeStream) StreamOperatorFactory(org.apache.flink.streaming.api.operators.StreamOperatorFactory) InputOutputFormatVertex(org.apache.flink.runtime.jobgraph.InputOutputFormatVertex) ResultPartitionType(org.apache.flink.runtime.io.network.partition.ResultPartitionType) HashMap(java.util.HashMap) DataStreamSource(org.apache.flink.streaming.api.datastream.DataStreamSource) JobType(org.apache.flink.runtime.jobgraph.JobType) SourceOperatorFactory(org.apache.flink.streaming.api.operators.SourceOperatorFactory) MockSource(org.apache.flink.api.connector.source.mocks.MockSource) SourceOperatorStreamTask(org.apache.flink.streaming.runtime.tasks.SourceOperatorStreamTask) StreamMap(org.apache.flink.streaming.api.operators.StreamMap) ReduceFunction(org.apache.flink.api.common.functions.ReduceFunction) RebalancePartitioner(org.apache.flink.streaming.runtime.partitioner.RebalancePartitioner) DiscardingSink(org.apache.flink.streaming.api.functions.sink.DiscardingSink) Assert.assertNotNull(org.junit.Assert.assertNotNull) Configuration(org.apache.flink.configuration.Configuration) CoordinatedOperatorFactory(org.apache.flink.streaming.api.operators.CoordinatedOperatorFactory) StreamOperatorParameters(org.apache.flink.streaming.api.operators.StreamOperatorParameters) InputFormatSourceFunction(org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction) DataStream(org.apache.flink.streaming.api.datastream.DataStream) RescalePartitioner(org.apache.flink.streaming.runtime.partitioner.RescalePartitioner) FeatureMatcher(org.hamcrest.FeatureMatcher) StreamExchangeMode(org.apache.flink.streaming.api.transformations.StreamExchangeMode) TestAnyModeReadingStreamOperator(org.apache.flink.streaming.util.TestAnyModeReadingStreamOperator) OperatorCoordinator(org.apache.flink.runtime.operators.coordination.OperatorCoordinator) InputOutputFormatContainer(org.apache.flink.runtime.jobgraph.InputOutputFormatContainer) Comparator(java.util.Comparator) RuntimeExecutionMode(org.apache.flink.api.common.RuntimeExecutionMode) Collections(java.util.Collections) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) JobVertex(org.apache.flink.runtime.jobgraph.JobVertex) DataStream(org.apache.flink.streaming.api.datastream.DataStream) 
StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment) ForwardPartitioner(org.apache.flink.streaming.runtime.partitioner.ForwardPartitioner) RescalePartitioner(org.apache.flink.streaming.runtime.partitioner.RescalePartitioner) Test(org.junit.Test)

Example 4 with PartitionTransformation

Use of org.apache.flink.streaming.api.transformations.PartitionTransformation in project flink by apache.

Class StreamingJobGraphGeneratorTest, method testExchangeModePipelined:

/**
 * Test setting exchange mode to {@link StreamExchangeMode#PIPELINED}.
 */
@Test
public void testExchangeModePipelined() {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // fromElements -> Map -> Print
    DataStream<Integer> sourceDataStream = env.fromElements(1, 2, 3);
    DataStream<Integer> partitionAfterSourceDataStream = new DataStream<>(env, new PartitionTransformation<>(sourceDataStream.getTransformation(), new ForwardPartitioner<>(), StreamExchangeMode.PIPELINED));
    DataStream<Integer> mapDataStream = partitionAfterSourceDataStream.map(value -> value).setParallelism(1);
    DataStream<Integer> partitionAfterMapDataStream = new DataStream<>(env, new PartitionTransformation<>(mapDataStream.getTransformation(), new RescalePartitioner<>(), StreamExchangeMode.PIPELINED));
    partitionAfterMapDataStream.print().setParallelism(2);
    JobGraph jobGraph = StreamingJobGraphGenerator.createJobGraph(env.getStreamGraph());
    List<JobVertex> verticesSorted = jobGraph.getVerticesSortedTopologicallyFromSources();
    assertEquals(2, verticesSorted.size());
    // it can be chained with PIPELINED exchange mode
    JobVertex sourceAndMapVertex = verticesSorted.get(0);
    // PIPELINED exchange mode is translated into PIPELINED_BOUNDED result partition
    assertEquals(ResultPartitionType.PIPELINED_BOUNDED, sourceAndMapVertex.getProducedDataSets().get(0).getResultType());
}
Also used : PartitionTransformation(org.apache.flink.streaming.api.transformations.PartitionTransformation) StreamExchangeMode(org.apache.flink.streaming.api.transformations.StreamExchangeMode) ForwardPartitioner(org.apache.flink.streaming.runtime.partitioner.ForwardPartitioner) RescalePartitioner(org.apache.flink.streaming.runtime.partitioner.RescalePartitioner) DataStream(org.apache.flink.streaming.api.datastream.DataStream) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) JobVertex(org.apache.flink.runtime.jobgraph.JobVertex) ResultPartitionType(org.apache.flink.runtime.io.network.partition.ResultPartitionType) Test(org.junit.Test)

Example 5 with PartitionTransformation

Use of org.apache.flink.streaming.api.transformations.PartitionTransformation in project flink by apache.

Class CommonExecSink, method applyKeyBy:

/**
 * Apply a primary key partition transformation to guarantee the strict ordering of changelog
 * messages.
 */
private Transformation<RowData> applyKeyBy(ReadableConfig config, Transformation<RowData> inputTransform, int[] primaryKeys, int sinkParallelism, int inputParallelism, boolean inputInsertOnly, boolean needMaterialize) {
    final ExecutionConfigOptions.SinkKeyedShuffle sinkShuffleByPk = config.get(ExecutionConfigOptions.TABLE_EXEC_SINK_KEYED_SHUFFLE);
    boolean sinkKeyBy = false;
    switch(sinkShuffleByPk) {
        case NONE:
            break;
        case AUTO:
            sinkKeyBy = inputInsertOnly && sinkParallelism != inputParallelism;
            break;
        case FORCE:
            // single parallelism has no problem
            sinkKeyBy = sinkParallelism != 1 || inputParallelism != 1;
            break;
    }
    if (!sinkKeyBy && !needMaterialize) {
        return inputTransform;
    }
    final RowDataKeySelector selector = KeySelectorUtil.getRowDataSelector(primaryKeys, getInputTypeInfo());
    final KeyGroupStreamPartitioner<RowData, RowData> partitioner = new KeyGroupStreamPartitioner<>(selector, KeyGroupRangeAssignment.DEFAULT_LOWER_BOUND_MAX_PARALLELISM);
    Transformation<RowData> partitionedTransform = new PartitionTransformation<>(inputTransform, partitioner);
    createTransformationMeta(PARTITIONER_TRANSFORMATION, "Partitioner", "Partitioner", config).fill(partitionedTransform);
    partitionedTransform.setParallelism(sinkParallelism);
    return partitionedTransform;
}
Also used : RowData(org.apache.flink.table.data.RowData) ExecutionConfigOptions(org.apache.flink.table.api.config.ExecutionConfigOptions) RowDataKeySelector(org.apache.flink.table.runtime.keyselector.RowDataKeySelector) PartitionTransformation(org.apache.flink.streaming.api.transformations.PartitionTransformation) KeyGroupStreamPartitioner(org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner)
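
The decision logic alone reads like a small truth table: NONE never shuffles, AUTO shuffles only insert-only input across a parallelism change, and FORCE shuffles everything except a 1-to-1 pipeline. A hypothetical restatement of just that decision, separated from the rest of applyKeyBy (class and method names are illustrative):

import org.apache.flink.table.api.config.ExecutionConfigOptions;

public class SinkShuffleDecisionSketch {

    /** Mirrors the switch in applyKeyBy, before the needMaterialize override is applied. */
    static boolean sinkKeyBy(ExecutionConfigOptions.SinkKeyedShuffle mode,
                             boolean inputInsertOnly,
                             int inputParallelism,
                             int sinkParallelism) {
        switch (mode) {
            case NONE:
                return false;
            case AUTO:
                // Only re-key insert-only input when the parallelism actually changes.
                return inputInsertOnly && sinkParallelism != inputParallelism;
            case FORCE:
                // A single-parallelism pipeline preserves ordering anyway, so 1 -> 1 is exempt.
                return sinkParallelism != 1 || inputParallelism != 1;
            default:
                return false;
        }
    }
}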

Aggregations

PartitionTransformation (org.apache.flink.streaming.api.transformations.PartitionTransformation): 13
Transformation (org.apache.flink.api.dag.Transformation): 10
ArrayList (java.util.ArrayList): 7
List (java.util.List): 7
StreamExchangeMode (org.apache.flink.streaming.api.transformations.StreamExchangeMode): 7
Arrays (java.util.Arrays): 6
Collections (java.util.Collections): 6
HashMap (java.util.HashMap): 6
Map (java.util.Map): 6
ExecutionConfig (org.apache.flink.api.common.ExecutionConfig): 6
ResourceSpec (org.apache.flink.api.common.operators.ResourceSpec): 6
BasicTypeInfo (org.apache.flink.api.common.typeinfo.BasicTypeInfo): 6
TypeInformation (org.apache.flink.api.common.typeinfo.TypeInformation): 6
Tuple2 (org.apache.flink.api.java.tuple.Tuple2): 6
Configuration (org.apache.flink.configuration.Configuration): 6
ManagedMemoryUseCase (org.apache.flink.core.memory.ManagedMemoryUseCase): 6
ResourceProfile (org.apache.flink.runtime.clusterframework.types.ResourceProfile): 6
ResultPartitionType (org.apache.flink.runtime.io.network.partition.ResultPartitionType): 6
DataStream (org.apache.flink.streaming.api.datastream.DataStream): 5
StreamExecutionEnvironment (org.apache.flink.streaming.api.environment.StreamExecutionEnvironment): 5