Search in sources :

Example 81 with Transformation

use of org.apache.flink.api.dag.Transformation in project flink by apache.

the class StreamExecDropUpdateBefore method translateToPlanInternal.

@SuppressWarnings("unchecked")
@Override
protected Transformation<RowData> translateToPlanInternal(PlannerBase planner, ExecNodeConfig config) {
    final Transformation<RowData> inputTransform = (Transformation<RowData>) getInputEdges().get(0).translateToPlan(planner);
    final StreamFilter<RowData> operator = new StreamFilter<>(new DropUpdateBeforeFunction());
    return ExecNodeUtil.createOneInputTransformation(inputTransform, createTransformationMeta(DROP_UPDATE_BEFORE_TRANSFORMATION, config), operator, inputTransform.getOutputType(), inputTransform.getParallelism());
}
Also used : RowData(org.apache.flink.table.data.RowData) Transformation(org.apache.flink.api.dag.Transformation) StreamFilter(org.apache.flink.streaming.api.operators.StreamFilter) DropUpdateBeforeFunction(org.apache.flink.table.runtime.operators.misc.DropUpdateBeforeFunction)

Example 82 with Transformation

use of org.apache.flink.api.dag.Transformation in project flink by apache.

the class StreamExecChangelogNormalize method translateToPlanInternal.

@SuppressWarnings("unchecked")
@Override
protected Transformation<RowData> translateToPlanInternal(PlannerBase planner, ExecNodeConfig config) {
    final ExecEdge inputEdge = getInputEdges().get(0);
    final Transformation<RowData> inputTransform = (Transformation<RowData>) inputEdge.translateToPlan(planner);
    final InternalTypeInfo<RowData> rowTypeInfo = (InternalTypeInfo<RowData>) inputTransform.getOutputType();
    final OneInputStreamOperator<RowData, RowData> operator;
    final long stateIdleTime = config.getStateRetentionTime();
    final boolean isMiniBatchEnabled = config.get(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_ENABLED);
    GeneratedRecordEqualiser generatedEqualiser = new EqualiserCodeGenerator(rowTypeInfo.toRowType()).generateRecordEqualiser("DeduplicateRowEqualiser");
    if (isMiniBatchEnabled) {
        TypeSerializer<RowData> rowSerializer = rowTypeInfo.createSerializer(planner.getExecEnv().getConfig());
        ProcTimeMiniBatchDeduplicateKeepLastRowFunction processFunction = new ProcTimeMiniBatchDeduplicateKeepLastRowFunction(rowTypeInfo, rowSerializer, stateIdleTime, generateUpdateBefore, // generateInsert
        true, // inputInsertOnly
        false, generatedEqualiser);
        CountBundleTrigger<RowData> trigger = AggregateUtil.createMiniBatchTrigger(config);
        operator = new KeyedMapBundleOperator<>(processFunction, trigger);
    } else {
        ProcTimeDeduplicateKeepLastRowFunction processFunction = new ProcTimeDeduplicateKeepLastRowFunction(rowTypeInfo, stateIdleTime, generateUpdateBefore, // generateInsert
        true, // inputInsertOnly
        false, generatedEqualiser);
        operator = new KeyedProcessOperator<>(processFunction);
    }
    final OneInputTransformation<RowData, RowData> transform = ExecNodeUtil.createOneInputTransformation(inputTransform, createTransformationMeta(CHANGELOG_NORMALIZE_TRANSFORMATION, config), operator, rowTypeInfo, inputTransform.getParallelism());
    final RowDataKeySelector selector = KeySelectorUtil.getRowDataSelector(uniqueKeys, rowTypeInfo);
    transform.setStateKeySelector(selector);
    transform.setStateKeyType(selector.getProducedType());
    return transform;
}
Also used : OneInputTransformation(org.apache.flink.streaming.api.transformations.OneInputTransformation) Transformation(org.apache.flink.api.dag.Transformation) ExecEdge(org.apache.flink.table.planner.plan.nodes.exec.ExecEdge) InternalTypeInfo(org.apache.flink.table.runtime.typeutils.InternalTypeInfo) EqualiserCodeGenerator(org.apache.flink.table.planner.codegen.EqualiserCodeGenerator) GeneratedRecordEqualiser(org.apache.flink.table.runtime.generated.GeneratedRecordEqualiser) RowData(org.apache.flink.table.data.RowData) ProcTimeMiniBatchDeduplicateKeepLastRowFunction(org.apache.flink.table.runtime.operators.deduplicate.ProcTimeMiniBatchDeduplicateKeepLastRowFunction) RowDataKeySelector(org.apache.flink.table.runtime.keyselector.RowDataKeySelector) ProcTimeDeduplicateKeepLastRowFunction(org.apache.flink.table.runtime.operators.deduplicate.ProcTimeDeduplicateKeepLastRowFunction)

Example 83 with Transformation

use of org.apache.flink.api.dag.Transformation in project flink by apache.

the class InternalConfigOptionsTest method testTranslateExecNodeGraphWithInternalTemporalConf.

@Test
public void testTranslateExecNodeGraphWithInternalTemporalConf() {
    Table table = tEnv.sqlQuery("SELECT LOCALTIME, LOCALTIMESTAMP, CURRENT_TIME, CURRENT_TIMESTAMP");
    RelNode relNode = planner.optimize(TableTestUtil.toRelNode(table));
    ExecNodeGraph execNodeGraph = planner.translateToExecNodeGraph(toScala(Collections.singletonList(relNode)));
    // PlannerBase#translateToExecNodeGraph will set internal temporal configurations and
    // cleanup them after translate finished
    List<Transformation<?>> transformation = planner.translateToPlan(execNodeGraph);
    // check the translation success
    Assert.assertEquals(1, transformation.size());
}
Also used : ExecNodeGraph(org.apache.flink.table.planner.plan.nodes.exec.ExecNodeGraph) Transformation(org.apache.flink.api.dag.Transformation) Table(org.apache.flink.table.api.Table) RelNode(org.apache.calcite.rel.RelNode) Test(org.junit.Test)

Example 84 with Transformation

use of org.apache.flink.api.dag.Transformation in project flink by apache.

the class StreamExecPythonGroupAggregate method translateToPlanInternal.

@SuppressWarnings("unchecked")
@Override
protected Transformation<RowData> translateToPlanInternal(PlannerBase planner, ExecNodeConfig config) {
    if (grouping.length > 0 && config.getStateRetentionTime() < 0) {
        LOG.warn("No state retention interval configured for a query which accumulates state. " + "Please provide a query configuration with valid retention interval " + "to prevent excessive state size. You may specify a retention time " + "of 0 to not clean up the state.");
    }
    final ExecEdge inputEdge = getInputEdges().get(0);
    final Transformation<RowData> inputTransform = (Transformation<RowData>) inputEdge.translateToPlan(planner);
    final RowType inputRowType = (RowType) inputEdge.getOutputType();
    final AggregateInfoList aggInfoList = AggregateUtil.transformToStreamAggregateInfoList(inputRowType, JavaScalaConversionUtil.toScala(Arrays.asList(aggCalls)), aggCallNeedRetractions, needRetraction, // isStateBackendDataViews
    true, // needDistinctInfo
    true);
    final int inputCountIndex = aggInfoList.getIndexOfCountStar();
    final boolean countStarInserted = aggInfoList.countStarInserted();
    Tuple2<PythonAggregateFunctionInfo[], DataViewSpec[][]> aggInfosAndDataViewSpecs = CommonPythonUtil.extractPythonAggregateFunctionInfos(aggInfoList, aggCalls);
    PythonAggregateFunctionInfo[] pythonFunctionInfos = aggInfosAndDataViewSpecs.f0;
    DataViewSpec[][] dataViewSpecs = aggInfosAndDataViewSpecs.f1;
    Configuration pythonConfig = CommonPythonUtil.getMergedConfig(planner.getExecEnv(), config.getTableConfig());
    final OneInputStreamOperator<RowData, RowData> operator = getPythonAggregateFunctionOperator(pythonConfig, inputRowType, InternalTypeInfo.of(getOutputType()).toRowType(), pythonFunctionInfos, dataViewSpecs, config.getStateRetentionTime(), config.getMaxIdleStateRetentionTime(), inputCountIndex, countStarInserted);
    // partitioned aggregation
    OneInputTransformation<RowData, RowData> transform = ExecNodeUtil.createOneInputTransformation(inputTransform, createTransformationName(config), createTransformationDescription(config), operator, InternalTypeInfo.of(getOutputType()), inputTransform.getParallelism());
    if (CommonPythonUtil.isPythonWorkerUsingManagedMemory(pythonConfig)) {
        transform.declareManagedMemoryUseCaseAtSlotScope(ManagedMemoryUseCase.PYTHON);
    }
    // set KeyType and Selector for state
    final RowDataKeySelector selector = KeySelectorUtil.getRowDataSelector(grouping, InternalTypeInfo.of(inputRowType));
    transform.setStateKeySelector(selector);
    transform.setStateKeyType(selector.getProducedType());
    return transform;
}
Also used : OneInputTransformation(org.apache.flink.streaming.api.transformations.OneInputTransformation) Transformation(org.apache.flink.api.dag.Transformation) AggregateInfoList(org.apache.flink.table.planner.plan.utils.AggregateInfoList) ExecEdge(org.apache.flink.table.planner.plan.nodes.exec.ExecEdge) DataViewSpec(org.apache.flink.table.runtime.dataview.DataViewSpec) Configuration(org.apache.flink.configuration.Configuration) RowType(org.apache.flink.table.types.logical.RowType) RowData(org.apache.flink.table.data.RowData) PythonAggregateFunctionInfo(org.apache.flink.table.functions.python.PythonAggregateFunctionInfo) RowDataKeySelector(org.apache.flink.table.runtime.keyselector.RowDataKeySelector)

Example 85 with Transformation

use of org.apache.flink.api.dag.Transformation in project flink by apache.

the class StreamExecPythonGroupWindowAggregate method translateToPlanInternal.

@SuppressWarnings("unchecked")
@Override
protected Transformation<RowData> translateToPlanInternal(PlannerBase planner, ExecNodeConfig config) {
    final boolean isCountWindow;
    if (window instanceof TumblingGroupWindow) {
        isCountWindow = hasRowIntervalType(((TumblingGroupWindow) window).size());
    } else if (window instanceof SlidingGroupWindow) {
        isCountWindow = hasRowIntervalType(((SlidingGroupWindow) window).size());
    } else {
        isCountWindow = false;
    }
    if (isCountWindow && grouping.length > 0 && config.getStateRetentionTime() < 0) {
        LOGGER.warn("No state retention interval configured for a query which accumulates state." + " Please provide a query configuration with valid retention interval to" + " prevent excessive state size. You may specify a retention time of 0 to" + " not clean up the state.");
    }
    final ExecEdge inputEdge = getInputEdges().get(0);
    final Transformation<RowData> inputTransform = (Transformation<RowData>) inputEdge.translateToPlan(planner);
    final RowType inputRowType = (RowType) inputEdge.getOutputType();
    final RowType outputRowType = InternalTypeInfo.of(getOutputType()).toRowType();
    final int inputTimeFieldIndex;
    if (isRowtimeAttribute(window.timeAttribute())) {
        inputTimeFieldIndex = timeFieldIndex(FlinkTypeFactory.INSTANCE().buildRelNodeRowType(inputRowType), planner.getRelBuilder(), window.timeAttribute());
        if (inputTimeFieldIndex < 0) {
            throw new TableException("Group window must defined on a time attribute, " + "but the time attribute can't be found.\n" + "This should never happen. Please file an issue.");
        }
    } else {
        inputTimeFieldIndex = -1;
    }
    final ZoneId shiftTimeZone = TimeWindowUtil.getShiftTimeZone(window.timeAttribute().getOutputDataType().getLogicalType(), config.getLocalTimeZone());
    Tuple2<WindowAssigner<?>, Trigger<?>> windowAssignerAndTrigger = generateWindowAssignerAndTrigger();
    WindowAssigner<?> windowAssigner = windowAssignerAndTrigger.f0;
    Trigger<?> trigger = windowAssignerAndTrigger.f1;
    Configuration pythonConfig = CommonPythonUtil.getMergedConfig(planner.getExecEnv(), config.getTableConfig());
    boolean isGeneralPythonUDAF = Arrays.stream(aggCalls).anyMatch(x -> PythonUtil.isPythonAggregate(x, PythonFunctionKind.GENERAL));
    OneInputTransformation<RowData, RowData> transform;
    WindowEmitStrategy emitStrategy = WindowEmitStrategy.apply(config, window);
    if (isGeneralPythonUDAF) {
        final boolean[] aggCallNeedRetractions = new boolean[aggCalls.length];
        Arrays.fill(aggCallNeedRetractions, needRetraction);
        final AggregateInfoList aggInfoList = transformToStreamAggregateInfoList(inputRowType, JavaScalaConversionUtil.toScala(Arrays.asList(aggCalls)), aggCallNeedRetractions, needRetraction, true, true);
        transform = createGeneralPythonStreamWindowGroupOneInputTransformation(inputTransform, inputRowType, outputRowType, inputTimeFieldIndex, windowAssigner, aggInfoList, emitStrategy.getAllowLateness(), pythonConfig, shiftTimeZone);
    } else {
        transform = createPandasPythonStreamWindowGroupOneInputTransformation(inputTransform, inputRowType, outputRowType, inputTimeFieldIndex, windowAssigner, trigger, emitStrategy.getAllowLateness(), pythonConfig, config, shiftTimeZone);
    }
    if (CommonPythonUtil.isPythonWorkerUsingManagedMemory(pythonConfig)) {
        transform.declareManagedMemoryUseCaseAtSlotScope(ManagedMemoryUseCase.PYTHON);
    }
    // set KeyType and Selector for state
    final RowDataKeySelector selector = KeySelectorUtil.getRowDataSelector(grouping, InternalTypeInfo.of(inputRowType));
    transform.setStateKeySelector(selector);
    transform.setStateKeyType(selector.getProducedType());
    return transform;
}
Also used : OneInputTransformation(org.apache.flink.streaming.api.transformations.OneInputTransformation) Transformation(org.apache.flink.api.dag.Transformation) TableException(org.apache.flink.table.api.TableException) AggregateInfoList(org.apache.flink.table.planner.plan.utils.AggregateInfoList) AggregateUtil.transformToStreamAggregateInfoList(org.apache.flink.table.planner.plan.utils.AggregateUtil.transformToStreamAggregateInfoList) ExecEdge(org.apache.flink.table.planner.plan.nodes.exec.ExecEdge) ZoneId(java.time.ZoneId) Configuration(org.apache.flink.configuration.Configuration) RowType(org.apache.flink.table.types.logical.RowType) RowData(org.apache.flink.table.data.RowData) SlidingWindowAssigner(org.apache.flink.table.runtime.operators.window.assigners.SlidingWindowAssigner) CountTumblingWindowAssigner(org.apache.flink.table.runtime.operators.window.assigners.CountTumblingWindowAssigner) WindowAssigner(org.apache.flink.table.runtime.operators.window.assigners.WindowAssigner) TumblingWindowAssigner(org.apache.flink.table.runtime.operators.window.assigners.TumblingWindowAssigner) CountSlidingWindowAssigner(org.apache.flink.table.runtime.operators.window.assigners.CountSlidingWindowAssigner) SessionWindowAssigner(org.apache.flink.table.runtime.operators.window.assigners.SessionWindowAssigner) TumblingGroupWindow(org.apache.flink.table.planner.plan.logical.TumblingGroupWindow) Trigger(org.apache.flink.table.runtime.operators.window.triggers.Trigger) WindowEmitStrategy(org.apache.flink.table.planner.plan.utils.WindowEmitStrategy) RowDataKeySelector(org.apache.flink.table.runtime.keyselector.RowDataKeySelector) SlidingGroupWindow(org.apache.flink.table.planner.plan.logical.SlidingGroupWindow)

Aggregations

Transformation (org.apache.flink.api.dag.Transformation)98 RowData (org.apache.flink.table.data.RowData)69 ExecEdge (org.apache.flink.table.planner.plan.nodes.exec.ExecEdge)53 RowType (org.apache.flink.table.types.logical.RowType)50 OneInputTransformation (org.apache.flink.streaming.api.transformations.OneInputTransformation)45 TableException (org.apache.flink.table.api.TableException)28 RowDataKeySelector (org.apache.flink.table.runtime.keyselector.RowDataKeySelector)28 ArrayList (java.util.ArrayList)25 CodeGeneratorContext (org.apache.flink.table.planner.codegen.CodeGeneratorContext)21 Configuration (org.apache.flink.configuration.Configuration)19 TwoInputTransformation (org.apache.flink.streaming.api.transformations.TwoInputTransformation)18 List (java.util.List)17 PartitionTransformation (org.apache.flink.streaming.api.transformations.PartitionTransformation)17 AggregateInfoList (org.apache.flink.table.planner.plan.utils.AggregateInfoList)17 LogicalType (org.apache.flink.table.types.logical.LogicalType)16 Test (org.junit.Test)16 StreamExecutionEnvironment (org.apache.flink.streaming.api.environment.StreamExecutionEnvironment)13 SourceTransformation (org.apache.flink.streaming.api.transformations.SourceTransformation)13 Arrays (java.util.Arrays)11 Collections (java.util.Collections)10