Search in sources :

Example 11 with PartitionTransformation

use of org.apache.flink.streaming.api.transformations.PartitionTransformation in project flink by apache.

the class StreamingJobGraphGeneratorWithGlobalStreamExchangeModeTest method createStreamGraph.

/**
 * Topology: source(parallelism=1) --(forward)--> map1(parallelism=1) --(rescale)-->
 * map2(parallelism=2) --(rebalance)--> sink(parallelism=2).
 */
private static StreamGraph createStreamGraph(GlobalStreamExchangeMode globalStreamExchangeMode) {
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    if (globalStreamExchangeMode != GlobalStreamExchangeMode.ALL_EDGES_PIPELINED) {
        env.setBufferTimeout(-1);
    }
    final DataStream<Integer> source = env.fromElements(1, 2, 3).setParallelism(1);
    final DataStream<Integer> forward = new DataStream<>(env, new PartitionTransformation<>(source.getTransformation(), new ForwardPartitioner<>(), StreamExchangeMode.UNDEFINED));
    final DataStream<Integer> map1 = forward.map(i -> i).startNewChain().setParallelism(1);
    final DataStream<Integer> rescale = new DataStream<>(env, new PartitionTransformation<>(map1.getTransformation(), new RescalePartitioner<>(), StreamExchangeMode.UNDEFINED));
    final DataStream<Integer> map2 = rescale.map(i -> i).setParallelism(2);
    map2.rebalance().print().setParallelism(2);
    final StreamGraph streamGraph = env.getStreamGraph();
    streamGraph.setGlobalStreamExchangeMode(globalStreamExchangeMode);
    return streamGraph;
}
Also used : CoreMatchers.is(org.hamcrest.CoreMatchers.is) PartitionTransformation(org.apache.flink.streaming.api.transformations.PartitionTransformation) ForwardPartitioner(org.apache.flink.streaming.runtime.partitioner.ForwardPartitioner) JobVertex(org.apache.flink.runtime.jobgraph.JobVertex) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) ResultPartitionType(org.apache.flink.runtime.io.network.partition.ResultPartitionType) Test(org.junit.Test) DataStream(org.apache.flink.streaming.api.datastream.DataStream) RescalePartitioner(org.apache.flink.streaming.runtime.partitioner.RescalePartitioner) Assert.assertThat(org.junit.Assert.assertThat) StreamExchangeMode(org.apache.flink.streaming.api.transformations.StreamExchangeMode) List(java.util.List) TestLogger(org.apache.flink.util.TestLogger) Assert.assertEquals(org.junit.Assert.assertEquals) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment) DataStream(org.apache.flink.streaming.api.datastream.DataStream) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment) ForwardPartitioner(org.apache.flink.streaming.runtime.partitioner.ForwardPartitioner) RescalePartitioner(org.apache.flink.streaming.runtime.partitioner.RescalePartitioner)

Example 12 with PartitionTransformation

use of org.apache.flink.streaming.api.transformations.PartitionTransformation in project flink by apache.

the class StreamGraphGeneratorTest method testResetBatchExchangeModeInStreamingExecution.

@Test
public void testResetBatchExchangeModeInStreamingExecution() {
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<Integer> sourceDataStream = env.fromElements(1, 2, 3);
    PartitionTransformation<Integer> transformation = new PartitionTransformation<>(sourceDataStream.getTransformation(), new RebalancePartitioner<>(), StreamExchangeMode.BATCH);
    DataStream<Integer> partitionStream = new DataStream<>(env, transformation);
    partitionStream.map(value -> value).print();
    final StreamGraph streamGraph = env.getStreamGraph();
    Assertions.assertThat(streamGraph.getStreamEdges(1, 3)).hasSize(1).satisfies(e -> Assertions.assertThat(e.get(0).getExchangeMode()).isEqualTo(StreamExchangeMode.UNDEFINED));
}
Also used : Arrays(java.util.Arrays) Tuple2(org.apache.flink.api.java.tuple.Tuple2) BroadcastPartitioner(org.apache.flink.streaming.runtime.partitioner.BroadcastPartitioner) SlotSharingGroup(org.apache.flink.api.common.operators.SlotSharingGroup) KeyedBroadcastProcessFunction(org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction) BasicTypeInfo(org.apache.flink.api.common.typeinfo.BasicTypeInfo) ShufflePartitioner(org.apache.flink.streaming.runtime.partitioner.ShufflePartitioner) ChainingStrategy(org.apache.flink.streaming.api.operators.ChainingStrategy) ResourceSpec(org.apache.flink.api.common.operators.ResourceSpec) ManagedMemoryUseCase(org.apache.flink.core.memory.ManagedMemoryUseCase) Map(java.util.Map) TestLogger(org.apache.flink.util.TestLogger) Function(org.apache.flink.api.common.functions.Function) Assertions(org.assertj.core.api.Assertions) TypeInformation(org.apache.flink.api.common.typeinfo.TypeInformation) CoMapFunction(org.apache.flink.streaming.api.functions.co.CoMapFunction) PartitionTransformation(org.apache.flink.streaming.api.transformations.PartitionTransformation) SinkFunction(org.apache.flink.streaming.api.functions.sink.SinkFunction) StreamTask(org.apache.flink.streaming.runtime.tasks.StreamTask) Collection(java.util.Collection) ConnectedStreams(org.apache.flink.streaming.api.datastream.ConnectedStreams) TypeSafeMatcher(org.hamcrest.TypeSafeMatcher) ResourceProfile(org.apache.flink.runtime.clusterframework.types.ResourceProfile) GlobalPartitioner(org.apache.flink.streaming.runtime.partitioner.GlobalPartitioner) List(java.util.List) NoOpIntMap(org.apache.flink.streaming.util.NoOpIntMap) Matchers.equalTo(org.hamcrest.Matchers.equalTo) ExecutionConfig(org.apache.flink.api.common.ExecutionConfig) CheckpointConfig(org.apache.flink.streaming.api.environment.CheckpointConfig) Matchers.is(org.hamcrest.Matchers.is) OneInputStreamOperator(org.apache.flink.streaming.api.operators.OneInputStreamOperator) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment) MultipleInputTransformation(org.apache.flink.streaming.api.transformations.MultipleInputTransformation) IterativeStream(org.apache.flink.streaming.api.datastream.IterativeStream) BroadcastStream(org.apache.flink.streaming.api.datastream.BroadcastStream) AbstractUdfStreamOperator(org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator) StreamOperatorFactory(org.apache.flink.streaming.api.operators.StreamOperatorFactory) Watermark(org.apache.flink.streaming.api.watermark.Watermark) SavepointConfigOptions(org.apache.flink.runtime.jobgraph.SavepointConfigOptions) HashMap(java.util.HashMap) MapStateDescriptor(org.apache.flink.api.common.state.MapStateDescriptor) ArrayList(java.util.ArrayList) StreamPartitioner(org.apache.flink.streaming.runtime.partitioner.StreamPartitioner) StreamRecord(org.apache.flink.streaming.runtime.streamrecord.StreamRecord) Assertions.assertThatThrownBy(org.assertj.core.api.Assertions.assertThatThrownBy) Collector(org.apache.flink.util.Collector) Matchers.iterableWithSize(org.hamcrest.Matchers.iterableWithSize) Output(org.apache.flink.streaming.api.operators.Output) MatcherAssert.assertThat(org.hamcrest.MatcherAssert.assertThat) TestExpandingSink(org.apache.flink.streaming.util.TestExpandingSink) RebalancePartitioner(org.apache.flink.streaming.runtime.partitioner.RebalancePartitioner) Description(org.hamcrest.Description) TwoInputStreamOperator(org.apache.flink.streaming.api.operators.TwoInputStreamOperator) DiscardingSink(org.apache.flink.streaming.api.functions.sink.DiscardingSink) Assert.assertNotNull(org.junit.Assert.assertNotNull) Configuration(org.apache.flink.configuration.Configuration) SingleOutputStreamOperator(org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator) Assert.assertTrue(org.junit.Assert.assertTrue) StreamOperatorParameters(org.apache.flink.streaming.api.operators.StreamOperatorParameters) Test(org.junit.Test) AbstractStreamOperator(org.apache.flink.streaming.api.operators.AbstractStreamOperator) DataStream(org.apache.flink.streaming.api.datastream.DataStream) StreamOperator(org.apache.flink.streaming.api.operators.StreamOperator) FeatureMatcher(org.hamcrest.FeatureMatcher) StreamExchangeMode(org.apache.flink.streaming.api.transformations.StreamExchangeMode) BroadcastProcessFunction(org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction) Matcher(org.hamcrest.Matcher) Transformation(org.apache.flink.api.dag.Transformation) LatencyMarker(org.apache.flink.streaming.runtime.streamrecord.LatencyMarker) SavepointRestoreSettings(org.apache.flink.runtime.jobgraph.SavepointRestoreSettings) OutputTypeConfigurable(org.apache.flink.streaming.api.operators.OutputTypeConfigurable) StreamSource(org.apache.flink.streaming.api.operators.StreamSource) Collections(java.util.Collections) Assert.assertEquals(org.junit.Assert.assertEquals) DataStream(org.apache.flink.streaming.api.datastream.DataStream) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment) PartitionTransformation(org.apache.flink.streaming.api.transformations.PartitionTransformation) Test(org.junit.Test)

Example 13 with PartitionTransformation

use of org.apache.flink.streaming.api.transformations.PartitionTransformation in project flink by apache.

the class BatchExecExchange method translateToPlanInternal.

@SuppressWarnings("unchecked")
@Override
protected Transformation<RowData> translateToPlanInternal(PlannerBase planner, ExecNodeConfig config) {
    final ExecEdge inputEdge = getInputEdges().get(0);
    final Transformation<RowData> inputTransform = (Transformation<RowData>) inputEdge.translateToPlan(planner);
    final RowType inputType = (RowType) inputEdge.getOutputType();
    boolean requireUndefinedExchangeMode = false;
    final StreamPartitioner<RowData> partitioner;
    final int parallelism;
    final InputProperty inputProperty = getInputProperties().get(0);
    final RequiredDistribution requiredDistribution = inputProperty.getRequiredDistribution();
    final InputProperty.DistributionType distributionType = requiredDistribution.getType();
    switch(distributionType) {
        case ANY:
            partitioner = null;
            parallelism = ExecutionConfig.PARALLELISM_DEFAULT;
            break;
        case BROADCAST:
            partitioner = new BroadcastPartitioner<>();
            parallelism = ExecutionConfig.PARALLELISM_DEFAULT;
            break;
        case SINGLETON:
            partitioner = new GlobalPartitioner<>();
            parallelism = 1;
            break;
        case HASH:
            partitioner = createHashPartitioner(((HashDistribution) requiredDistribution), inputType, config);
            parallelism = ExecutionConfig.PARALLELISM_DEFAULT;
            break;
        case KEEP_INPUT_AS_IS:
            KeepInputAsIsDistribution keepInputAsIsDistribution = (KeepInputAsIsDistribution) requiredDistribution;
            if (keepInputAsIsDistribution.isStrict()) {
                // explicitly use ForwardPartitioner to guarantee the data distribution is
                // exactly the same as input
                partitioner = new ForwardPartitioner<>();
                requireUndefinedExchangeMode = true;
            } else {
                RequiredDistribution inputDistribution = ((KeepInputAsIsDistribution) requiredDistribution).getInputDistribution();
                checkArgument(inputDistribution instanceof HashDistribution, "Only HashDistribution is supported now");
                partitioner = new ForwardForConsecutiveHashPartitioner<>(createHashPartitioner(((HashDistribution) inputDistribution), inputType, config));
            }
            parallelism = inputTransform.getParallelism();
            break;
        default:
            throw new TableException(distributionType + "is not supported now!");
    }
    final StreamExchangeMode exchangeMode = requireUndefinedExchangeMode ? StreamExchangeMode.UNDEFINED : getBatchStreamExchangeMode(config, requiredExchangeMode);
    final Transformation<RowData> transformation = new PartitionTransformation<>(inputTransform, partitioner, exchangeMode);
    transformation.setParallelism(parallelism);
    transformation.setOutputType(InternalTypeInfo.of(getOutputType()));
    return transformation;
}
Also used : RequiredDistribution(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.RequiredDistribution) PartitionTransformation(org.apache.flink.streaming.api.transformations.PartitionTransformation) Transformation(org.apache.flink.api.dag.Transformation) TableException(org.apache.flink.table.api.TableException) ExecEdge(org.apache.flink.table.planner.plan.nodes.exec.ExecEdge) InputProperty(org.apache.flink.table.planner.plan.nodes.exec.InputProperty) RowType(org.apache.flink.table.types.logical.RowType) PartitionTransformation(org.apache.flink.streaming.api.transformations.PartitionTransformation) HashDistribution(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.HashDistribution) RowData(org.apache.flink.table.data.RowData) KeepInputAsIsDistribution(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.KeepInputAsIsDistribution) StreamExchangeModeUtils.getBatchStreamExchangeMode(org.apache.flink.table.planner.utils.StreamExchangeModeUtils.getBatchStreamExchangeMode) StreamExchangeMode(org.apache.flink.streaming.api.transformations.StreamExchangeMode)

Aggregations

PartitionTransformation (org.apache.flink.streaming.api.transformations.PartitionTransformation)13 Transformation (org.apache.flink.api.dag.Transformation)10 ArrayList (java.util.ArrayList)7 List (java.util.List)7 StreamExchangeMode (org.apache.flink.streaming.api.transformations.StreamExchangeMode)7 Arrays (java.util.Arrays)6 Collections (java.util.Collections)6 HashMap (java.util.HashMap)6 Map (java.util.Map)6 ExecutionConfig (org.apache.flink.api.common.ExecutionConfig)6 ResourceSpec (org.apache.flink.api.common.operators.ResourceSpec)6 BasicTypeInfo (org.apache.flink.api.common.typeinfo.BasicTypeInfo)6 TypeInformation (org.apache.flink.api.common.typeinfo.TypeInformation)6 Tuple2 (org.apache.flink.api.java.tuple.Tuple2)6 Configuration (org.apache.flink.configuration.Configuration)6 ManagedMemoryUseCase (org.apache.flink.core.memory.ManagedMemoryUseCase)6 ResourceProfile (org.apache.flink.runtime.clusterframework.types.ResourceProfile)6 ResultPartitionType (org.apache.flink.runtime.io.network.partition.ResultPartitionType)6 DataStream (org.apache.flink.streaming.api.datastream.DataStream)5 StreamExecutionEnvironment (org.apache.flink.streaming.api.environment.StreamExecutionEnvironment)5