Search in sources :

Example 6 with Transformation

use of org.apache.flink.api.dag.Transformation in project flink by apache.

the class UpsertKafkaDynamicTableFactoryTest method assertKafkaSource.

private void assertKafkaSource(ScanTableSource.ScanRuntimeProvider provider) {
    assertThat(provider, instanceOf(DataStreamScanProvider.class));
    final DataStreamScanProvider dataStreamScanProvider = (DataStreamScanProvider) provider;
    final Transformation<RowData> transformation = dataStreamScanProvider.produceDataStream(n -> Optional.empty(), StreamExecutionEnvironment.createLocalEnvironment()).getTransformation();
    assertThat(transformation, instanceOf(SourceTransformation.class));
    SourceTransformation<RowData, KafkaPartitionSplit, KafkaSourceEnumState> sourceTransformation = (SourceTransformation<RowData, KafkaPartitionSplit, KafkaSourceEnumState>) transformation;
    assertThat(sourceTransformation.getSource(), instanceOf(KafkaSource.class));
}
Also used : DataType(org.apache.flink.table.types.DataType) AtomicDataType(org.apache.flink.table.types.AtomicDataType) Arrays(java.util.Arrays) ResolvedSchema(org.apache.flink.table.catalog.ResolvedSchema) SourceTransformation(org.apache.flink.streaming.api.transformations.SourceTransformation) DataStreamScanProvider(org.apache.flink.table.connector.source.DataStreamScanProvider) CoreMatchers.instanceOf(org.hamcrest.CoreMatchers.instanceOf) DecodingFormat(org.apache.flink.table.connector.format.DecodingFormat) Map(java.util.Map) TestLogger(org.apache.flink.util.TestLogger) FactoryMocks.createTableSink(org.apache.flink.table.factories.utils.FactoryMocks.createTableSink) ConfluentRegistryAvroSerializationSchema(org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroSerializationSchema) DynamicTableSource(org.apache.flink.table.connector.source.DynamicTableSource) DynamicTableSink(org.apache.flink.table.connector.sink.DynamicTableSink) FlinkMatchers.containsCause(org.apache.flink.core.testutils.FlinkMatchers.containsCause) AVRO_CONFLUENT(org.apache.flink.streaming.connectors.kafka.table.KafkaConnectorOptionsUtil.AVRO_CONFLUENT) AvroRowDataSerializationSchema(org.apache.flink.formats.avro.AvroRowDataSerializationSchema) FactoryUtil(org.apache.flink.table.factories.FactoryUtil) DataStreamSinkProvider(org.apache.flink.table.connector.sink.DataStreamSinkProvider) ValidationException(org.apache.flink.table.api.ValidationException) Optional(java.util.Optional) ScanRuntimeProviderContext(org.apache.flink.table.runtime.connector.source.ScanRuntimeProviderContext) SerializationSchema(org.apache.flink.api.common.serialization.SerializationSchema) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment) TestFormatFactory(org.apache.flink.table.factories.TestFormatFactory) DeliveryGuarantee(org.apache.flink.connector.base.DeliveryGuarantee) EncodingFormat(org.apache.flink.table.connector.format.EncodingFormat) Sink(org.apache.flink.api.connector.sink2.Sink) ChangelogMode(org.apache.flink.table.connector.ChangelogMode) StreamOperatorFactory(org.apache.flink.streaming.api.operators.StreamOperatorFactory) Column(org.apache.flink.table.catalog.Column) HashMap(java.util.HashMap) RowType(org.apache.flink.table.types.logical.RowType) ScanTableSource(org.apache.flink.table.connector.source.ScanTableSource) SinkV2Provider(org.apache.flink.table.connector.sink.SinkV2Provider) KafkaSink(org.apache.flink.connector.kafka.sink.KafkaSink) RowDataToAvroConverters(org.apache.flink.formats.avro.RowDataToAvroConverters) FactoryMocks.createTableSource(org.apache.flink.table.factories.utils.FactoryMocks.createTableSource) MatcherAssert.assertThat(org.hamcrest.MatcherAssert.assertThat) SinkWriterOperatorFactory(org.apache.flink.streaming.runtime.operators.sink.SinkWriterOperatorFactory) ExpectedException(org.junit.rules.ExpectedException) RowData(org.apache.flink.table.data.RowData) Properties(java.util.Properties) Assert.assertTrue(org.junit.Assert.assertTrue) DataTypes(org.apache.flink.table.api.DataTypes) VarCharType(org.apache.flink.table.types.logical.VarCharType) Test(org.junit.Test) BinaryRowData(org.apache.flink.table.data.binary.BinaryRowData) KafkaSourceEnumState(org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumState) DeserializationSchema(org.apache.flink.api.common.serialization.DeserializationSchema) Consumer(java.util.function.Consumer) StartupMode(org.apache.flink.streaming.connectors.kafka.config.StartupMode) Rule(org.junit.Rule) KafkaSource(org.apache.flink.connector.kafka.source.KafkaSource) UniqueConstraint(org.apache.flink.table.catalog.UniqueConstraint) SinkRuntimeProviderContext(org.apache.flink.table.runtime.connector.sink.SinkRuntimeProviderContext) FactoryMocks(org.apache.flink.table.factories.utils.FactoryMocks) KafkaPartitionSplit(org.apache.flink.connector.kafka.source.split.KafkaPartitionSplit) Transformation(org.apache.flink.api.dag.Transformation) Collections(java.util.Collections) Assert.assertEquals(org.junit.Assert.assertEquals) AvroSchemaConverter(org.apache.flink.formats.avro.typeutils.AvroSchemaConverter) KafkaPartitionSplit(org.apache.flink.connector.kafka.source.split.KafkaPartitionSplit) RowData(org.apache.flink.table.data.RowData) BinaryRowData(org.apache.flink.table.data.binary.BinaryRowData) KafkaSource(org.apache.flink.connector.kafka.source.KafkaSource) KafkaSourceEnumState(org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumState) DataStreamScanProvider(org.apache.flink.table.connector.source.DataStreamScanProvider) SourceTransformation(org.apache.flink.streaming.api.transformations.SourceTransformation)

Example 7 with Transformation

use of org.apache.flink.api.dag.Transformation in project flink by apache.

the class StreamExecDeduplicate method translateToPlanInternal.

@SuppressWarnings("unchecked")
@Override
protected Transformation<RowData> translateToPlanInternal(PlannerBase planner, ExecNodeConfig config) {
    final ExecEdge inputEdge = getInputEdges().get(0);
    final Transformation<RowData> inputTransform = (Transformation<RowData>) inputEdge.translateToPlan(planner);
    final RowType inputRowType = (RowType) inputEdge.getOutputType();
    final InternalTypeInfo<RowData> rowTypeInfo = (InternalTypeInfo<RowData>) inputTransform.getOutputType();
    final TypeSerializer<RowData> rowSerializer = rowTypeInfo.createSerializer(planner.getExecEnv().getConfig());
    final OneInputStreamOperator<RowData, RowData> operator;
    if (isRowtime) {
        operator = new RowtimeDeduplicateOperatorTranslator(config, rowTypeInfo, rowSerializer, inputRowType, keepLastRow, generateUpdateBefore).createDeduplicateOperator();
    } else {
        operator = new ProcTimeDeduplicateOperatorTranslator(config, rowTypeInfo, rowSerializer, inputRowType, keepLastRow, generateUpdateBefore).createDeduplicateOperator();
    }
    final OneInputTransformation<RowData, RowData> transform = ExecNodeUtil.createOneInputTransformation(inputTransform, createTransformationMeta(DEDUPLICATE_TRANSFORMATION, config), operator, rowTypeInfo, inputTransform.getParallelism());
    final RowDataKeySelector selector = KeySelectorUtil.getRowDataSelector(uniqueKeys, rowTypeInfo);
    transform.setStateKeySelector(selector);
    transform.setStateKeyType(selector.getProducedType());
    return transform;
}
Also used : OneInputTransformation(org.apache.flink.streaming.api.transformations.OneInputTransformation) Transformation(org.apache.flink.api.dag.Transformation) ExecEdge(org.apache.flink.table.planner.plan.nodes.exec.ExecEdge) RowType(org.apache.flink.table.types.logical.RowType) InternalTypeInfo(org.apache.flink.table.runtime.typeutils.InternalTypeInfo) RowData(org.apache.flink.table.data.RowData) RowDataKeySelector(org.apache.flink.table.runtime.keyselector.RowDataKeySelector)

Example 8 with Transformation

use of org.apache.flink.api.dag.Transformation in project flink by apache.

the class StreamExecExchange method translateToPlanInternal.

@SuppressWarnings("unchecked")
@Override
protected Transformation<RowData> translateToPlanInternal(PlannerBase planner, ExecNodeConfig config) {
    final Transformation<RowData> inputTransform = (Transformation<RowData>) getInputEdges().get(0).translateToPlan(planner);
    final StreamPartitioner<RowData> partitioner;
    final int parallelism;
    final InputProperty inputProperty = getInputProperties().get(0);
    final InputProperty.DistributionType distributionType = inputProperty.getRequiredDistribution().getType();
    switch(distributionType) {
        case SINGLETON:
            partitioner = new GlobalPartitioner<>();
            parallelism = 1;
            break;
        case HASH:
            // TODO Eliminate duplicate keys
            int[] keys = ((HashDistribution) inputProperty.getRequiredDistribution()).getKeys();
            InternalTypeInfo<RowData> inputType = (InternalTypeInfo<RowData>) inputTransform.getOutputType();
            RowDataKeySelector keySelector = KeySelectorUtil.getRowDataSelector(keys, inputType);
            partitioner = new KeyGroupStreamPartitioner<>(keySelector, DEFAULT_LOWER_BOUND_MAX_PARALLELISM);
            parallelism = ExecutionConfig.PARALLELISM_DEFAULT;
            break;
        default:
            throw new TableException(String.format("%s is not supported now!", distributionType));
    }
    final Transformation<RowData> transformation = new PartitionTransformation<>(inputTransform, partitioner);
    createTransformationMeta(EXCHANGE_TRANSFORMATION, config).fill(transformation);
    transformation.setParallelism(parallelism);
    transformation.setOutputType(InternalTypeInfo.of(getOutputType()));
    return transformation;
}
Also used : PartitionTransformation(org.apache.flink.streaming.api.transformations.PartitionTransformation) Transformation(org.apache.flink.api.dag.Transformation) TableException(org.apache.flink.table.api.TableException) InputProperty(org.apache.flink.table.planner.plan.nodes.exec.InputProperty) PartitionTransformation(org.apache.flink.streaming.api.transformations.PartitionTransformation) HashDistribution(org.apache.flink.table.planner.plan.nodes.exec.InputProperty.HashDistribution) InternalTypeInfo(org.apache.flink.table.runtime.typeutils.InternalTypeInfo) RowData(org.apache.flink.table.data.RowData) RowDataKeySelector(org.apache.flink.table.runtime.keyselector.RowDataKeySelector)

Example 9 with Transformation

use of org.apache.flink.api.dag.Transformation in project flink by apache.

the class StreamExecGlobalGroupAggregate method translateToPlanInternal.

@SuppressWarnings("unchecked")
@Override
protected Transformation<RowData> translateToPlanInternal(PlannerBase planner, ExecNodeConfig config) {
    if (grouping.length > 0 && config.getStateRetentionTime() < 0) {
        LOG.warn("No state retention interval configured for a query which accumulates state. " + "Please provide a query configuration with valid retention interval to prevent excessive " + "state size. You may specify a retention time of 0 to not clean up the state.");
    }
    final ExecEdge inputEdge = getInputEdges().get(0);
    final Transformation<RowData> inputTransform = (Transformation<RowData>) inputEdge.translateToPlan(planner);
    final RowType inputRowType = (RowType) inputEdge.getOutputType();
    final AggregateInfoList localAggInfoList = AggregateUtil.transformToStreamAggregateInfoList(localAggInputRowType, JavaScalaConversionUtil.toScala(Arrays.asList(aggCalls)), aggCallNeedRetractions, needRetraction, JavaScalaConversionUtil.toScala(Optional.ofNullable(indexOfCountStar)), // isStateBackendDataViews
    false, // needDistinctInfo
    true);
    final AggregateInfoList globalAggInfoList = AggregateUtil.transformToStreamAggregateInfoList(localAggInputRowType, JavaScalaConversionUtil.toScala(Arrays.asList(aggCalls)), aggCallNeedRetractions, needRetraction, JavaScalaConversionUtil.toScala(Optional.ofNullable(indexOfCountStar)), // isStateBackendDataViews
    true, // needDistinctInfo
    true);
    final GeneratedAggsHandleFunction localAggsHandler = generateAggsHandler("LocalGroupAggsHandler", localAggInfoList, grouping.length, localAggInfoList.getAccTypes(), config, planner.getRelBuilder());
    final GeneratedAggsHandleFunction globalAggsHandler = generateAggsHandler("GlobalGroupAggsHandler", globalAggInfoList, // mergedAccOffset
    0, localAggInfoList.getAccTypes(), config, planner.getRelBuilder());
    final int indexOfCountStar = globalAggInfoList.getIndexOfCountStar();
    final LogicalType[] globalAccTypes = Arrays.stream(globalAggInfoList.getAccTypes()).map(LogicalTypeDataTypeConverter::fromDataTypeToLogicalType).toArray(LogicalType[]::new);
    final LogicalType[] globalAggValueTypes = Arrays.stream(globalAggInfoList.getActualValueTypes()).map(LogicalTypeDataTypeConverter::fromDataTypeToLogicalType).toArray(LogicalType[]::new);
    final GeneratedRecordEqualiser recordEqualiser = new EqualiserCodeGenerator(globalAggValueTypes).generateRecordEqualiser("GroupAggValueEqualiser");
    final OneInputStreamOperator<RowData, RowData> operator;
    final boolean isMiniBatchEnabled = config.get(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_ENABLED);
    if (isMiniBatchEnabled) {
        MiniBatchGlobalGroupAggFunction aggFunction = new MiniBatchGlobalGroupAggFunction(localAggsHandler, globalAggsHandler, recordEqualiser, globalAccTypes, indexOfCountStar, generateUpdateBefore, config.getStateRetentionTime());
        operator = new KeyedMapBundleOperator<>(aggFunction, AggregateUtil.createMiniBatchTrigger(config));
    } else {
        throw new TableException("Local-Global optimization is only worked in miniBatch mode");
    }
    // partitioned aggregation
    final OneInputTransformation<RowData, RowData> transform = ExecNodeUtil.createOneInputTransformation(inputTransform, createTransformationMeta(GLOBAL_GROUP_AGGREGATE_TRANSFORMATION, config), operator, InternalTypeInfo.of(getOutputType()), inputTransform.getParallelism());
    // set KeyType and Selector for state
    final RowDataKeySelector selector = KeySelectorUtil.getRowDataSelector(grouping, InternalTypeInfo.of(inputRowType));
    transform.setStateKeySelector(selector);
    transform.setStateKeyType(selector.getProducedType());
    return transform;
}
Also used : OneInputTransformation(org.apache.flink.streaming.api.transformations.OneInputTransformation) Transformation(org.apache.flink.api.dag.Transformation) TableException(org.apache.flink.table.api.TableException) AggregateInfoList(org.apache.flink.table.planner.plan.utils.AggregateInfoList) ExecEdge(org.apache.flink.table.planner.plan.nodes.exec.ExecEdge) RowType(org.apache.flink.table.types.logical.RowType) LogicalType(org.apache.flink.table.types.logical.LogicalType) GeneratedAggsHandleFunction(org.apache.flink.table.runtime.generated.GeneratedAggsHandleFunction) EqualiserCodeGenerator(org.apache.flink.table.planner.codegen.EqualiserCodeGenerator) GeneratedRecordEqualiser(org.apache.flink.table.runtime.generated.GeneratedRecordEqualiser) RowData(org.apache.flink.table.data.RowData) RowDataKeySelector(org.apache.flink.table.runtime.keyselector.RowDataKeySelector) MiniBatchGlobalGroupAggFunction(org.apache.flink.table.runtime.operators.aggregate.MiniBatchGlobalGroupAggFunction)

Example 10 with Transformation

use of org.apache.flink.api.dag.Transformation in project flink by apache.

the class StreamExecGroupAggregate method translateToPlanInternal.

@SuppressWarnings("unchecked")
@Override
protected Transformation<RowData> translateToPlanInternal(PlannerBase planner, ExecNodeConfig config) {
    if (grouping.length > 0 && config.getStateRetentionTime() < 0) {
        LOG.warn("No state retention interval configured for a query which accumulates state. " + "Please provide a query configuration with valid retention interval to prevent excessive " + "state size. You may specify a retention time of 0 to not clean up the state.");
    }
    final ExecEdge inputEdge = getInputEdges().get(0);
    final Transformation<RowData> inputTransform = (Transformation<RowData>) inputEdge.translateToPlan(planner);
    final RowType inputRowType = (RowType) inputEdge.getOutputType();
    final AggsHandlerCodeGenerator generator = new AggsHandlerCodeGenerator(new CodeGeneratorContext(config.getTableConfig()), planner.getRelBuilder(), JavaScalaConversionUtil.toScala(inputRowType.getChildren()), // TODO: but other operators do not copy this input field.....
    true).needAccumulate();
    if (needRetraction) {
        generator.needRetract();
    }
    final AggregateInfoList aggInfoList = AggregateUtil.transformToStreamAggregateInfoList(inputRowType, JavaScalaConversionUtil.toScala(Arrays.asList(aggCalls)), aggCallNeedRetractions, needRetraction, true, true);
    final GeneratedAggsHandleFunction aggsHandler = generator.generateAggsHandler("GroupAggsHandler", aggInfoList);
    final LogicalType[] accTypes = Arrays.stream(aggInfoList.getAccTypes()).map(LogicalTypeDataTypeConverter::fromDataTypeToLogicalType).toArray(LogicalType[]::new);
    final LogicalType[] aggValueTypes = Arrays.stream(aggInfoList.getActualValueTypes()).map(LogicalTypeDataTypeConverter::fromDataTypeToLogicalType).toArray(LogicalType[]::new);
    final GeneratedRecordEqualiser recordEqualiser = new EqualiserCodeGenerator(aggValueTypes).generateRecordEqualiser("GroupAggValueEqualiser");
    final int inputCountIndex = aggInfoList.getIndexOfCountStar();
    final boolean isMiniBatchEnabled = config.get(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_ENABLED);
    final OneInputStreamOperator<RowData, RowData> operator;
    if (isMiniBatchEnabled) {
        MiniBatchGroupAggFunction aggFunction = new MiniBatchGroupAggFunction(aggsHandler, recordEqualiser, accTypes, inputRowType, inputCountIndex, generateUpdateBefore, config.getStateRetentionTime());
        operator = new KeyedMapBundleOperator<>(aggFunction, AggregateUtil.createMiniBatchTrigger(config));
    } else {
        GroupAggFunction aggFunction = new GroupAggFunction(aggsHandler, recordEqualiser, accTypes, inputCountIndex, generateUpdateBefore, config.getStateRetentionTime());
        operator = new KeyedProcessOperator<>(aggFunction);
    }
    // partitioned aggregation
    final OneInputTransformation<RowData, RowData> transform = ExecNodeUtil.createOneInputTransformation(inputTransform, createTransformationMeta(GROUP_AGGREGATE_TRANSFORMATION, config), operator, InternalTypeInfo.of(getOutputType()), inputTransform.getParallelism());
    // set KeyType and Selector for state
    final RowDataKeySelector selector = KeySelectorUtil.getRowDataSelector(grouping, InternalTypeInfo.of(inputRowType));
    transform.setStateKeySelector(selector);
    transform.setStateKeyType(selector.getProducedType());
    return transform;
}
Also used : OneInputTransformation(org.apache.flink.streaming.api.transformations.OneInputTransformation) Transformation(org.apache.flink.api.dag.Transformation) AggregateInfoList(org.apache.flink.table.planner.plan.utils.AggregateInfoList) ExecEdge(org.apache.flink.table.planner.plan.nodes.exec.ExecEdge) CodeGeneratorContext(org.apache.flink.table.planner.codegen.CodeGeneratorContext) GroupAggFunction(org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction) MiniBatchGroupAggFunction(org.apache.flink.table.runtime.operators.aggregate.MiniBatchGroupAggFunction) MiniBatchGroupAggFunction(org.apache.flink.table.runtime.operators.aggregate.MiniBatchGroupAggFunction) RowType(org.apache.flink.table.types.logical.RowType) AggsHandlerCodeGenerator(org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator) LogicalType(org.apache.flink.table.types.logical.LogicalType) GeneratedAggsHandleFunction(org.apache.flink.table.runtime.generated.GeneratedAggsHandleFunction) EqualiserCodeGenerator(org.apache.flink.table.planner.codegen.EqualiserCodeGenerator) GeneratedRecordEqualiser(org.apache.flink.table.runtime.generated.GeneratedRecordEqualiser) RowData(org.apache.flink.table.data.RowData) RowDataKeySelector(org.apache.flink.table.runtime.keyselector.RowDataKeySelector)

Aggregations

Transformation (org.apache.flink.api.dag.Transformation)98 RowData (org.apache.flink.table.data.RowData)69 ExecEdge (org.apache.flink.table.planner.plan.nodes.exec.ExecEdge)53 RowType (org.apache.flink.table.types.logical.RowType)50 OneInputTransformation (org.apache.flink.streaming.api.transformations.OneInputTransformation)45 TableException (org.apache.flink.table.api.TableException)28 RowDataKeySelector (org.apache.flink.table.runtime.keyselector.RowDataKeySelector)28 ArrayList (java.util.ArrayList)25 CodeGeneratorContext (org.apache.flink.table.planner.codegen.CodeGeneratorContext)21 Configuration (org.apache.flink.configuration.Configuration)19 TwoInputTransformation (org.apache.flink.streaming.api.transformations.TwoInputTransformation)18 List (java.util.List)17 PartitionTransformation (org.apache.flink.streaming.api.transformations.PartitionTransformation)17 AggregateInfoList (org.apache.flink.table.planner.plan.utils.AggregateInfoList)17 LogicalType (org.apache.flink.table.types.logical.LogicalType)16 Test (org.junit.Test)16 StreamExecutionEnvironment (org.apache.flink.streaming.api.environment.StreamExecutionEnvironment)13 SourceTransformation (org.apache.flink.streaming.api.transformations.SourceTransformation)13 Arrays (java.util.Arrays)11 Collections (java.util.Collections)10