
Example 6 with PythonFunctionInfo

Use of org.apache.flink.table.functions.python.PythonFunctionInfo in the apache/flink project.

Class BatchExecPythonOverAggregate, method getPythonOverWindowAggregateFunctionOperator.

@SuppressWarnings("unchecked")
private OneInputStreamOperator<RowData, RowData> getPythonOverWindowAggregateFunctionOperator(ExecNodeConfig config, Configuration pythonConfig, RowType inputRowType, RowType outputRowType, boolean[] isRangeWindows, int[] udafInputOffsets, PythonFunctionInfo[] pythonFunctionInfos) {
    Class<?> clazz = CommonPythonUtil.loadClass(ARROW_PYTHON_OVER_WINDOW_AGGREGATE_FUNCTION_OPERATOR_NAME);
    RowType udfInputType = (RowType) Projection.of(udafInputOffsets).project(inputRowType);
    RowType udfOutputType = (RowType) Projection.range(inputRowType.getFieldCount(), outputRowType.getFieldCount()).project(outputRowType);
    PartitionSpec partitionSpec = overSpec.getPartition();
    List<OverSpec.GroupSpec> groups = overSpec.getGroups();
    SortSpec sortSpec = groups.get(groups.size() - 1).getSort();
    try {
        Constructor<?> ctor = clazz.getConstructor(Configuration.class, PythonFunctionInfo[].class, RowType.class, RowType.class, RowType.class, long[].class, long[].class, boolean[].class, int[].class, int.class, boolean.class, GeneratedProjection.class, GeneratedProjection.class, GeneratedProjection.class);
        return (OneInputStreamOperator<RowData, RowData>) ctor.newInstance(pythonConfig, pythonFunctionInfos, inputRowType, udfInputType, udfOutputType, lowerBoundary.stream().mapToLong(i -> i).toArray(), upperBoundary.stream().mapToLong(i -> i).toArray(), isRangeWindows, aggWindowIndex.stream().mapToInt(i -> i).toArray(), sortSpec.getFieldIndices()[0], sortSpec.getAscendingOrders()[0], ProjectionCodeGenerator.generateProjection(CodeGeneratorContext.apply(config.getTableConfig()), "UdafInputProjection", inputRowType, udfInputType, udafInputOffsets), ProjectionCodeGenerator.generateProjection(CodeGeneratorContext.apply(config.getTableConfig()), "GroupKey", inputRowType, (RowType) Projection.of(partitionSpec.getFieldIndices()).project(inputRowType), partitionSpec.getFieldIndices()), ProjectionCodeGenerator.generateProjection(CodeGeneratorContext.apply(config.getTableConfig()), "GroupSet", inputRowType, (RowType) Projection.of(partitionSpec.getFieldIndices()).project(inputRowType), partitionSpec.getFieldIndices()));
    } catch (NoSuchMethodException | InstantiationException | IllegalAccessException | InvocationTargetException e) {
        throw new TableException("Python BatchArrowPythonOverWindowAggregateFunctionOperator constructed failed.", e);
    }
}
Also used : OverAggregateUtil(org.apache.flink.table.planner.plan.utils.OverAggregateUtil) InputProperty(org.apache.flink.table.planner.plan.nodes.exec.InputProperty) Tuple2(org.apache.flink.api.java.tuple.Tuple2) RowType(org.apache.flink.table.types.logical.RowType) Constructor(java.lang.reflect.Constructor) ExecNode(org.apache.flink.table.planner.plan.nodes.exec.ExecNode) ArrayList(java.util.ArrayList) ExecNodeUtil(org.apache.flink.table.planner.plan.nodes.exec.utils.ExecNodeUtil) ManagedMemoryUseCase(org.apache.flink.core.memory.ManagedMemoryUseCase) PartitionSpec(org.apache.flink.table.planner.plan.nodes.exec.spec.PartitionSpec) CodeGeneratorContext(org.apache.flink.table.planner.codegen.CodeGeneratorContext) Projection(org.apache.flink.table.connector.Projection) ProjectionCodeGenerator(org.apache.flink.table.planner.codegen.ProjectionCodeGenerator) ExecNodeContext(org.apache.flink.table.planner.plan.nodes.exec.ExecNodeContext) RowData(org.apache.flink.table.data.RowData) PlannerBase(org.apache.flink.table.planner.delegation.PlannerBase) CommonPythonUtil(org.apache.flink.table.planner.plan.nodes.exec.utils.CommonPythonUtil) ExecNodeConfig(org.apache.flink.table.planner.plan.nodes.exec.ExecNodeConfig) Configuration(org.apache.flink.configuration.Configuration) TableException(org.apache.flink.table.api.TableException) PythonFunctionInfo(org.apache.flink.table.functions.python.PythonFunctionInfo) OverSpec(org.apache.flink.table.planner.plan.nodes.exec.spec.OverSpec) OneInputTransformation(org.apache.flink.streaming.api.transformations.OneInputTransformation) InvocationTargetException(java.lang.reflect.InvocationTargetException) List(java.util.List) InternalTypeInfo(org.apache.flink.table.runtime.typeutils.InternalTypeInfo) ExecEdge(org.apache.flink.table.planner.plan.nodes.exec.ExecEdge) AggregateCall(org.apache.calcite.rel.core.AggregateCall) GeneratedProjection(org.apache.flink.table.runtime.generated.GeneratedProjection) Transformation(org.apache.flink.api.dag.Transformation) OneInputStreamOperator(org.apache.flink.streaming.api.operators.OneInputStreamOperator) SortSpec(org.apache.flink.table.planner.plan.nodes.exec.spec.SortSpec)
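
The method above loads the operator class by name and invokes a specific constructor through reflection, wrapping the checked reflection exceptions in a single unchecked one. The following is a minimal sketch of that pattern using only JDK classes; `ReflectiveFactory` and `create` are hypothetical names, and `java.lang.StringBuilder` merely stands in for the Flink operator class.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

public class ReflectiveFactory {

    /** Loads a class by name and calls its public (String) constructor reflectively. */
    static Object create(String className, String arg) {
        try {
            Class<?> clazz = Class.forName(className);
            // Look up the constructor by its exact parameter types.
            Constructor<?> ctor = clazz.getConstructor(String.class);
            return ctor.newInstance(arg);
        } catch (ClassNotFoundException | NoSuchMethodException | InstantiationException
                | IllegalAccessException | InvocationTargetException e) {
            // Wrap every reflection failure in one unchecked exception, keeping the cause.
            throw new IllegalStateException("Failed to construct " + className, e);
        }
    }

    public static void main(String[] args) {
        // StringBuilder is only a stand-in for a class resolved at runtime.
        Object instance = create("java.lang.StringBuilder", "flink");
        System.out.println(instance.getClass().getName() + ": " + instance);
    }
}
```

Because the constructor is resolved by parameter types at runtime, a mismatch between the argument list and the target class only surfaces as a `NoSuchMethodException`, which is why preserving the cause in the wrapping exception matters.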

Example 7 with PythonFunctionInfo

Use of org.apache.flink.table.functions.python.PythonFunctionInfo in the apache/flink project.

Class CommonExecPythonCorrelate, method extractPythonTableFunctionInfo.

private Tuple2<int[], PythonFunctionInfo> extractPythonTableFunctionInfo() {
    LinkedHashMap<RexNode, Integer> inputNodes = new LinkedHashMap<>();
    PythonFunctionInfo pythonTableFunctionInfo = CommonPythonUtil.createPythonFunctionInfo(invocation, inputNodes);
    int[] udtfInputOffsets = inputNodes.keySet().stream().filter(x -> x instanceof RexInputRef).map(x -> ((RexInputRef) x).getIndex()).mapToInt(i -> i).toArray();
    return Tuple2.of(udtfInputOffsets, pythonTableFunctionInfo);
}
Also used : InputProperty(org.apache.flink.table.planner.plan.nodes.exec.InputProperty) Tuple2(org.apache.flink.api.java.tuple.Tuple2) RowType(org.apache.flink.table.types.logical.RowType) Constructor(java.lang.reflect.Constructor) ExecNode(org.apache.flink.table.planner.plan.nodes.exec.ExecNode) LinkedHashMap(java.util.LinkedHashMap) ExecNodeUtil(org.apache.flink.table.planner.plan.nodes.exec.utils.ExecNodeUtil) RexNode(org.apache.calcite.rex.RexNode) ManagedMemoryUseCase(org.apache.flink.core.memory.ManagedMemoryUseCase) CodeGeneratorContext(org.apache.flink.table.planner.codegen.CodeGeneratorContext) FlinkJoinType(org.apache.flink.table.runtime.operators.join.FlinkJoinType) Projection(org.apache.flink.table.connector.Projection) ProjectionCodeGenerator(org.apache.flink.table.planner.codegen.ProjectionCodeGenerator) ExecNodeContext(org.apache.flink.table.planner.plan.nodes.exec.ExecNodeContext) RowData(org.apache.flink.table.data.RowData) PlannerBase(org.apache.flink.table.planner.delegation.PlannerBase) CommonPythonUtil(org.apache.flink.table.planner.plan.nodes.exec.utils.CommonPythonUtil) SingleTransformationTranslator(org.apache.flink.table.planner.plan.nodes.exec.SingleTransformationTranslator) ExecNodeConfig(org.apache.flink.table.planner.plan.nodes.exec.ExecNodeConfig) Configuration(org.apache.flink.configuration.Configuration) TableException(org.apache.flink.table.api.TableException) PythonFunctionInfo(org.apache.flink.table.functions.python.PythonFunctionInfo) OneInputTransformation(org.apache.flink.streaming.api.transformations.OneInputTransformation) RexInputRef(org.apache.calcite.rex.RexInputRef) List(java.util.List) InternalTypeInfo(org.apache.flink.table.runtime.typeutils.InternalTypeInfo) ExecEdge(org.apache.flink.table.planner.plan.nodes.exec.ExecEdge) Preconditions.checkArgument(org.apache.flink.util.Preconditions.checkArgument) ExecNodeBase(org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase) GeneratedProjection(org.apache.flink.table.runtime.generated.GeneratedProjection) Transformation(org.apache.flink.api.dag.Transformation) OneInputStreamOperator(org.apache.flink.streaming.api.operators.OneInputStreamOperator) RexCall(org.apache.calcite.rex.RexCall)
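
The offset extraction above walks the keys of an insertion-ordered map, keeps only the input references, and collects their indices into an `int[]`. A minimal sketch of that stream pipeline follows; `Node`, `InputRef`, and `Literal` are hypothetical stand-ins for Calcite's `RexNode` hierarchy, not the real classes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class InputOffsets {

    /** Hypothetical stand-in for an expression node. */
    interface Node {}

    /** Hypothetical stand-in for a reference to an input column. */
    static final class InputRef implements Node {
        final int index;
        InputRef(int index) { this.index = index; }
    }

    /** Hypothetical stand-in for a constant, which has no input offset. */
    static final class Literal implements Node {
        final Object value;
        Literal(Object value) { this.value = value; }
    }

    /** Returns the column indices of all InputRef keys, in map iteration order. */
    static int[] inputOffsets(Map<Node, Integer> inputNodes) {
        return inputNodes.keySet().stream()
                .filter(n -> n instanceof InputRef)
                .mapToInt(n -> ((InputRef) n).index)
                .toArray();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the offsets come out in that order.
        Map<Node, Integer> nodes = new LinkedHashMap<>();
        nodes.put(new InputRef(2), 0);
        nodes.put(new Literal("x"), 1);
        nodes.put(new InputRef(0), 2);
        System.out.println(Arrays.toString(inputOffsets(nodes)));
    }
}
```

The sketch folds the original `map(...).mapToInt(i -> i)` pair into a single `mapToInt`, which avoids boxing each index without changing the result.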

Example 8 with PythonFunctionInfo

Use of org.apache.flink.table.functions.python.PythonFunctionInfo in the apache/flink project.

Class BatchExecPythonGroupWindowAggregate, method createPythonOneInputTransformation.

private OneInputTransformation<RowData, RowData> createPythonOneInputTransformation(Transformation<RowData> inputTransform, RowType inputRowType, RowType outputRowType, int maxLimitSize, long windowSize, long slideSize, Configuration pythonConfig, ExecNodeConfig config) {
    int[] namePropertyTypeArray = Arrays.stream(namedWindowProperties).mapToInt(p -> {
        WindowProperty property = p.getProperty();
        if (property instanceof WindowStart) {
            return 0;
        }
        if (property instanceof WindowEnd) {
            return 1;
        }
        if (property instanceof RowtimeAttribute) {
            return 2;
        }
        throw new TableException("Unexpected property " + property);
    }).toArray();
    Tuple2<int[], PythonFunctionInfo[]> aggInfos = CommonPythonUtil.extractPythonAggregateFunctionInfosFromAggregateCall(aggCalls);
    int[] pythonUdafInputOffsets = aggInfos.f0;
    PythonFunctionInfo[] pythonFunctionInfos = aggInfos.f1;
    OneInputStreamOperator<RowData, RowData> pythonOperator = getPythonGroupWindowAggregateFunctionOperator(config, pythonConfig, inputRowType, outputRowType, maxLimitSize, windowSize, slideSize, namePropertyTypeArray, pythonUdafInputOffsets, pythonFunctionInfos);
    return ExecNodeUtil.createOneInputTransformation(inputTransform, createTransformationName(config), createTransformationDescription(config), pythonOperator, InternalTypeInfo.of(outputRowType), inputTransform.getParallelism());
}
Also used : Arrays(java.util.Arrays) InputProperty(org.apache.flink.table.planner.plan.nodes.exec.InputProperty) Tuple2(org.apache.flink.api.java.tuple.Tuple2) RowtimeAttribute(org.apache.flink.table.runtime.groupwindow.RowtimeAttribute) RowType(org.apache.flink.table.types.logical.RowType) Constructor(java.lang.reflect.Constructor) ExecNode(org.apache.flink.table.planner.plan.nodes.exec.ExecNode) ExecNodeUtil(org.apache.flink.table.planner.plan.nodes.exec.utils.ExecNodeUtil) WindowEnd(org.apache.flink.table.runtime.groupwindow.WindowEnd) ManagedMemoryUseCase(org.apache.flink.core.memory.ManagedMemoryUseCase) CodeGeneratorContext(org.apache.flink.table.planner.codegen.CodeGeneratorContext) Projection(org.apache.flink.table.connector.Projection) ProjectionCodeGenerator(org.apache.flink.table.planner.codegen.ProjectionCodeGenerator) WindowCodeGenerator(org.apache.flink.table.planner.codegen.agg.batch.WindowCodeGenerator) ExecNodeContext(org.apache.flink.table.planner.plan.nodes.exec.ExecNodeContext) WindowStart(org.apache.flink.table.runtime.groupwindow.WindowStart) RowData(org.apache.flink.table.data.RowData) PlannerBase(org.apache.flink.table.planner.delegation.PlannerBase) CommonPythonUtil(org.apache.flink.table.planner.plan.nodes.exec.utils.CommonPythonUtil) SingleTransformationTranslator(org.apache.flink.table.planner.plan.nodes.exec.SingleTransformationTranslator) ExecNodeConfig(org.apache.flink.table.planner.plan.nodes.exec.ExecNodeConfig) Configuration(org.apache.flink.configuration.Configuration) TableException(org.apache.flink.table.api.TableException) PythonFunctionInfo(org.apache.flink.table.functions.python.PythonFunctionInfo) OneInputTransformation(org.apache.flink.streaming.api.transformations.OneInputTransformation) InvocationTargetException(java.lang.reflect.InvocationTargetException) InternalTypeInfo(org.apache.flink.table.runtime.typeutils.InternalTypeInfo) ExecEdge(org.apache.flink.table.planner.plan.nodes.exec.ExecEdge) LogicalWindow(org.apache.flink.table.planner.plan.logical.LogicalWindow) AggregateCall(org.apache.calcite.rel.core.AggregateCall) ExecNodeBase(org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase) GeneratedProjection(org.apache.flink.table.runtime.generated.GeneratedProjection) Transformation(org.apache.flink.api.dag.Transformation) OneInputStreamOperator(org.apache.flink.streaming.api.operators.OneInputStreamOperator) ExecutionConfigOptions(org.apache.flink.table.api.config.ExecutionConfigOptions) WindowProperty(org.apache.flink.table.runtime.groupwindow.WindowProperty) Collections(java.util.Collections) NamedWindowProperty(org.apache.flink.table.runtime.groupwindow.NamedWindowProperty)
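
The first statement of the method above encodes each window property as a small integer (0 for window start, 1 for window end, 2 for the rowtime attribute) and fails fast on anything else. A minimal sketch of that encoding is shown here; `Property`, `Start`, `End`, and `Rowtime` are hypothetical stand-ins for Flink's `WindowProperty` types, and `IllegalArgumentException` replaces `TableException` so the example needs only the JDK.

```java
import java.util.Arrays;

public class WindowPropertyCodes {

    /** Hypothetical stand-ins for the window property types. */
    interface Property {}
    static final class Start implements Property {}
    static final class End implements Property {}
    static final class Rowtime implements Property {}

    /** Maps each property to its integer code, rejecting unknown property types. */
    static int[] toCodes(Property[] properties) {
        return Arrays.stream(properties).mapToInt(p -> {
            if (p instanceof Start) {
                return 0;
            }
            if (p instanceof End) {
                return 1;
            }
            if (p instanceof Rowtime) {
                return 2;
            }
            throw new IllegalArgumentException("Unexpected property " + p);
        }).toArray();
    }

    public static void main(String[] args) {
        Property[] props = {new End(), new Start(), new Rowtime()};
        System.out.println(Arrays.toString(toCodes(props)));
    }
}
```

Encoding the types as plain integers keeps the array easy to hand across a language boundary, at the cost of the receiving side having to agree on what each code means.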

Example 9 with PythonFunctionInfo

Use of org.apache.flink.table.functions.python.PythonFunctionInfo in the apache/flink project.

Class CommonExecPythonCalc, method getPythonScalarFunctionOperator.

@SuppressWarnings("unchecked")
private OneInputStreamOperator<RowData, RowData> getPythonScalarFunctionOperator(ExecNodeConfig config, Configuration pythonConfig, InternalTypeInfo<RowData> inputRowTypeInfo, InternalTypeInfo<RowData> outputRowTypeInfo, int[] udfInputOffsets, PythonFunctionInfo[] pythonFunctionInfos, int[] forwardedFields, boolean isArrow) {
    Class<?> clazz;
    boolean isInProcessMode = CommonPythonUtil.isPythonWorkerInProcessMode(pythonConfig);
    if (isArrow) {
        clazz = CommonPythonUtil.loadClass(ARROW_PYTHON_SCALAR_FUNCTION_OPERATOR_NAME);
    } else {
        if (isInProcessMode) {
            clazz = CommonPythonUtil.loadClass(PYTHON_SCALAR_FUNCTION_OPERATOR_NAME);
        } else {
            clazz = CommonPythonUtil.loadClass(EMBEDDED_PYTHON_SCALAR_FUNCTION_OPERATOR_NAME);
        }
    }
    final RowType inputType = inputRowTypeInfo.toRowType();
    final RowType outputType = outputRowTypeInfo.toRowType();
    final RowType udfInputType = (RowType) Projection.of(udfInputOffsets).project(inputType);
    final RowType forwardedFieldType = (RowType) Projection.of(forwardedFields).project(inputType);
    final RowType udfOutputType = (RowType) Projection.range(forwardedFields.length, outputType.getFieldCount()).project(outputType);
    try {
        if (isInProcessMode) {
            Constructor<?> ctor = clazz.getConstructor(Configuration.class, PythonFunctionInfo[].class, RowType.class, RowType.class, RowType.class, GeneratedProjection.class, GeneratedProjection.class);
            return (OneInputStreamOperator<RowData, RowData>) ctor.newInstance(pythonConfig, pythonFunctionInfos, inputType, udfInputType, udfOutputType, ProjectionCodeGenerator.generateProjection(CodeGeneratorContext.apply(config.getTableConfig()), "UdfInputProjection", inputType, udfInputType, udfInputOffsets), ProjectionCodeGenerator.generateProjection(CodeGeneratorContext.apply(config.getTableConfig()), "ForwardedFieldProjection", inputType, forwardedFieldType, forwardedFields));
        } else {
            if (forwardedFields.length > 0) {
                Constructor<?> ctor = clazz.getConstructor(Configuration.class, PythonFunctionInfo[].class, RowType.class, RowType.class, RowType.class, int[].class, GeneratedProjection.class);
                return (OneInputStreamOperator<RowData, RowData>) ctor.newInstance(pythonConfig, pythonFunctionInfos, inputType, udfInputType, udfOutputType, udfInputOffsets, ProjectionCodeGenerator.generateProjection(CodeGeneratorContext.apply(config.getTableConfig()), "ForwardedFieldProjection", inputType, forwardedFieldType, forwardedFields));
            } else {
                Constructor<?> ctor = clazz.getConstructor(Configuration.class, PythonFunctionInfo[].class, RowType.class, RowType.class, RowType.class, int[].class);
                return (OneInputStreamOperator<RowData, RowData>) ctor.newInstance(pythonConfig, pythonFunctionInfos, inputType, udfInputType, udfOutputType, udfInputOffsets);
            }
        }
    } catch (Exception e) {
        throw new TableException("Python Scalar Function Operator constructed failed.", e);
    }
}
Also used : PythonFunctionInfo(org.apache.flink.table.functions.python.PythonFunctionInfo) TableException(org.apache.flink.table.api.TableException) OneInputStreamOperator(org.apache.flink.streaming.api.operators.OneInputStreamOperator) RowType(org.apache.flink.table.types.logical.RowType)
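
The method above derives three row types from the input and output types: the UDF inputs and the forwarded fields are picked by index, and the UDF outputs are the fields that follow the forwarded ones in the output row. The sketch below illustrates that arithmetic on plain lists of field names. `projectByIndex` and `projectRange` are hypothetical helpers, not Flink's `Projection` API, and the sketch assumes the range is start-inclusive and end-exclusive.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldProjectionSketch {

    /** Picks the fields at the given indices, in the order the indices are listed. */
    static List<String> projectByIndex(List<String> fields, int[] indices) {
        List<String> result = new ArrayList<>();
        for (int index : indices) {
            result.add(fields.get(index));
        }
        return result;
    }

    /** Picks the fields in [startInclusive, endExclusive) (assumed range semantics). */
    static List<String> projectRange(List<String> fields, int startInclusive, int endExclusive) {
        return new ArrayList<>(fields.subList(startInclusive, endExclusive));
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b", "c", "d");
        List<String> output = List.of("a", "r1", "r2");
        int[] udfInputOffsets = {1, 3};
        int[] forwardedFields = {0};

        // Fields passed to the UDFs.
        System.out.println(projectByIndex(input, udfInputOffsets));
        // Fields copied unchanged to the output.
        System.out.println(projectByIndex(input, forwardedFields));
        // Everything after the forwarded fields is produced by the UDFs.
        System.out.println(projectRange(output, forwardedFields.length, output.size()));
    }
}
```

With one forwarded field and a three-field output, the UDF output slice starts at index 1 and covers the remaining two fields.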

Example 10 with PythonFunctionInfo

Use of org.apache.flink.table.functions.python.PythonFunctionInfo in the apache/flink project.

Class EmbeddedPythonScalarFunctionOperator, method getUserDefinedFunctionsProto.

@Override
public FlinkFnApi.UserDefinedFunctions getUserDefinedFunctionsProto() {
    FlinkFnApi.UserDefinedFunctions.Builder builder = FlinkFnApi.UserDefinedFunctions.newBuilder();
    // add udf proto
    for (PythonFunctionInfo pythonFunctionInfo : scalarFunctions) {
        builder.addUdfs(ProtoUtils.getUserDefinedFunctionProto(pythonFunctionInfo));
    }
    builder.setMetricEnabled(pythonConfig.isMetricEnabled());
    builder.setProfileEnabled(pythonConfig.isProfileEnabled());
    return builder.build();
}
Also used : PythonFunctionInfo(org.apache.flink.table.functions.python.PythonFunctionInfo)

Aggregations

PythonFunctionInfo (org.apache.flink.table.functions.python.PythonFunctionInfo): 18
RowType (org.apache.flink.table.types.logical.RowType): 13
TableException (org.apache.flink.table.api.TableException): 9
OneInputStreamOperator (org.apache.flink.streaming.api.operators.OneInputStreamOperator): 8
InvocationTargetException (java.lang.reflect.InvocationTargetException): 7
RowData (org.apache.flink.table.data.RowData): 6
Tuple2 (org.apache.flink.api.java.tuple.Tuple2): 4
Configuration (org.apache.flink.configuration.Configuration): 4
GeneratedProjection (org.apache.flink.table.runtime.generated.GeneratedProjection): 4
InternalTypeInfo (org.apache.flink.table.runtime.typeutils.InternalTypeInfo): 4
Constructor (java.lang.reflect.Constructor): 3
ArrayList (java.util.ArrayList): 3
List (java.util.List): 3
AggregateCall (org.apache.calcite.rel.core.AggregateCall): 3
RexCall (org.apache.calcite.rex.RexCall): 3
RexNode (org.apache.calcite.rex.RexNode): 3
Transformation (org.apache.flink.api.dag.Transformation): 3
ManagedMemoryUseCase (org.apache.flink.core.memory.ManagedMemoryUseCase): 3
OneInputTransformation (org.apache.flink.streaming.api.transformations.OneInputTransformation): 3
Projection (org.apache.flink.table.connector.Projection): 3