
Example 1 with PythonFunctionInfo

Use of org.apache.flink.table.functions.python.PythonFunctionInfo in the Apache Flink project.

From the class AbstractPythonScalarFunctionOperator, method getUserDefinedFunctionsProto.

/**
 * Gets the proto representation of the Python user-defined functions to be executed.
 */
@Override
public FlinkFnApi.UserDefinedFunctions getUserDefinedFunctionsProto() {
    FlinkFnApi.UserDefinedFunctions.Builder builder = FlinkFnApi.UserDefinedFunctions.newBuilder();
    // add udf proto
    for (PythonFunctionInfo pythonFunctionInfo : scalarFunctions) {
        builder.addUdfs(ProtoUtils.getUserDefinedFunctionProto(pythonFunctionInfo));
    }
    builder.setMetricEnabled(pythonConfig.isMetricEnabled());
    builder.setProfileEnabled(pythonConfig.isProfileEnabled());
    return builder.build();
}
Also used : PythonFunctionInfo(org.apache.flink.table.functions.python.PythonFunctionInfo)

Example 2 with PythonFunctionInfo

Use of org.apache.flink.table.functions.python.PythonFunctionInfo in the Apache Flink project.

From the class AbstractBatchArrowPythonAggregateFunctionOperatorTest, method getTestHarness.

public OneInputStreamOperatorTestHarness<RowData, RowData> getTestHarness(Configuration config) throws Exception {
    RowType inputType = getInputType();
    RowType outputType = getOutputType();
    AbstractArrowPythonAggregateFunctionOperator operator = getTestOperator(
            config,
            new PythonFunctionInfo[] {
                new PythonFunctionInfo(PythonScalarFunctionOperatorTestBase.DummyPythonFunction.INSTANCE, new Integer[] { 0 })
            },
            inputType,
            outputType,
            new int[] { 0 },
            new int[] { 2 });
    OneInputStreamOperatorTestHarness<RowData, RowData> testHarness = new OneInputStreamOperatorTestHarness<>(operator);
    testHarness.getStreamConfig().setManagedMemoryFractionOperatorOfUseCase(ManagedMemoryUseCase.PYTHON, 0.5);
    testHarness.setup(new RowDataSerializer(outputType));
    return testHarness;
}
Also used : PythonFunctionInfo(org.apache.flink.table.functions.python.PythonFunctionInfo) RowData(org.apache.flink.table.data.RowData) AbstractArrowPythonAggregateFunctionOperator(org.apache.flink.table.runtime.operators.python.aggregate.arrow.AbstractArrowPythonAggregateFunctionOperator) RowType(org.apache.flink.table.types.logical.RowType) OneInputStreamOperatorTestHarness(org.apache.flink.streaming.util.OneInputStreamOperatorTestHarness) RowDataSerializer(org.apache.flink.table.runtime.typeutils.RowDataSerializer)

Example 3 with PythonFunctionInfo

Use of org.apache.flink.table.functions.python.PythonFunctionInfo in the Apache Flink project.

From the class AbstractStreamArrowPythonAggregateFunctionOperatorTest, method getTestHarness.

public OneInputStreamOperatorTestHarness<RowData, RowData> getTestHarness(Configuration config) throws Exception {
    RowType inputType = getInputType();
    RowType outputType = getOutputType();
    AbstractArrowPythonAggregateFunctionOperator operator = getTestOperator(
            config,
            new PythonFunctionInfo[] {
                new PythonFunctionInfo(PythonScalarFunctionOperatorTestBase.DummyPythonFunction.INSTANCE, new Integer[] { 0 })
            },
            inputType,
            outputType,
            new int[] { 0 },
            new int[] { 2 });
    int[] grouping = new int[] { 0 };
    RowDataKeySelector keySelector = KeySelectorUtil.getRowDataSelector(grouping, InternalTypeInfo.of(getInputType()));
    OneInputStreamOperatorTestHarness<RowData, RowData> testHarness = new KeyedOneInputStreamOperatorTestHarness<>(operator, keySelector, keySelector.getProducedType());
    testHarness.getStreamConfig().setManagedMemoryFractionOperatorOfUseCase(ManagedMemoryUseCase.PYTHON, 0.5);
    testHarness.setup(new RowDataSerializer(outputType));
    return testHarness;
}
Also used : PythonFunctionInfo(org.apache.flink.table.functions.python.PythonFunctionInfo) RowData(org.apache.flink.table.data.RowData) AbstractArrowPythonAggregateFunctionOperator(org.apache.flink.table.runtime.operators.python.aggregate.arrow.AbstractArrowPythonAggregateFunctionOperator) RowDataKeySelector(org.apache.flink.table.runtime.keyselector.RowDataKeySelector) RowType(org.apache.flink.table.types.logical.RowType) KeyedOneInputStreamOperatorTestHarness(org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness) RowDataSerializer(org.apache.flink.table.runtime.typeutils.RowDataSerializer)

Example 4 with PythonFunctionInfo

Use of org.apache.flink.table.functions.python.PythonFunctionInfo in the Apache Flink project.

From the class CommonPythonUtil, method extractPythonAggregateFunctionInfosFromAggregateCall.

public static Tuple2<int[], PythonFunctionInfo[]> extractPythonAggregateFunctionInfosFromAggregateCall(AggregateCall[] aggCalls) {
    Map<Integer, Integer> inputNodes = new LinkedHashMap<>();
    List<PythonFunctionInfo> pythonFunctionInfos = new ArrayList<>();
    for (AggregateCall aggregateCall : aggCalls) {
        List<Integer> inputs = new ArrayList<>();
        List<Integer> argList = aggregateCall.getArgList();
        for (Integer arg : argList) {
            // Reuse the offset already assigned to this input field, or assign the next free one.
            if (inputNodes.containsKey(arg)) {
                inputs.add(inputNodes.get(arg));
            } else {
                Integer inputOffset = inputNodes.size();
                inputs.add(inputOffset);
                inputNodes.put(arg, inputOffset);
            }
        }
        PythonFunction pythonFunction = null;
        SqlAggFunction aggregateFunction = aggregateCall.getAggregation();
        if (aggregateFunction instanceof AggSqlFunction) {
            pythonFunction = (PythonFunction) ((AggSqlFunction) aggregateFunction).aggregateFunction();
        } else if (aggregateFunction instanceof BridgingSqlAggFunction) {
            pythonFunction = (PythonFunction) ((BridgingSqlAggFunction) aggregateFunction).getDefinition();
        }
        PythonFunctionInfo pythonFunctionInfo = new PythonAggregateFunctionInfo(pythonFunction, inputs.toArray(), aggregateCall.filterArg, aggregateCall.isDistinct());
        pythonFunctionInfos.add(pythonFunctionInfo);
    }
    // Distinct input field indexes in first-seen order (LinkedHashMap keeps insertion order).
    int[] udafInputOffsets = inputNodes.keySet().stream().mapToInt(i -> i).toArray();
    return Tuple2.of(udafInputOffsets, pythonFunctionInfos.toArray(new PythonFunctionInfo[0]));
}
Also used : TypeInference(org.apache.flink.table.types.inference.TypeInference) MapView(org.apache.flink.table.api.dataview.MapView) SumAggFunction(org.apache.flink.table.planner.functions.aggfunctions.SumAggFunction) DataType(org.apache.flink.table.types.DataType) Sum0AggFunction(org.apache.flink.table.planner.functions.aggfunctions.Sum0AggFunction) Arrays(java.util.Arrays) Tuple2(org.apache.flink.api.java.tuple.Tuple2) DataViewSpec(org.apache.flink.table.runtime.dataview.DataViewSpec) StructuredType(org.apache.flink.table.types.logical.StructuredType) LastValueWithRetractAggFunction(org.apache.flink.table.runtime.functions.aggregate.LastValueWithRetractAggFunction) ListViewSpec(org.apache.flink.table.runtime.dataview.ListViewSpec) CountAggFunction(org.apache.flink.table.planner.functions.aggfunctions.CountAggFunction) BigDecimal(java.math.BigDecimal) PythonFunction(org.apache.flink.table.functions.python.PythonFunction) RexNode(org.apache.calcite.rex.RexNode) MaxAggFunction(org.apache.flink.table.planner.functions.aggfunctions.MaxAggFunction) Map(java.util.Map) FirstValueAggFunction(org.apache.flink.table.runtime.functions.aggregate.FirstValueAggFunction) Method(java.lang.reflect.Method) TableSqlFunction(org.apache.flink.table.planner.functions.utils.TableSqlFunction) BuiltInPythonAggregateFunction(org.apache.flink.table.functions.python.BuiltInPythonAggregateFunction) TableConfig(org.apache.flink.table.api.TableConfig) ListAggWsWithRetractAggFunction(org.apache.flink.table.runtime.functions.aggregate.ListAggWsWithRetractAggFunction) RexLiteral(org.apache.calcite.rex.RexLiteral) AggregateInfoList(org.apache.flink.table.planner.plan.utils.AggregateInfoList) MapViewSpec(org.apache.flink.table.runtime.dataview.MapViewSpec) UserDefinedFunction(org.apache.flink.table.functions.UserDefinedFunction) Count1AggFunction(org.apache.flink.table.planner.functions.aggfunctions.Count1AggFunction) 
FirstValueWithRetractAggFunction(org.apache.flink.table.runtime.functions.aggregate.FirstValueWithRetractAggFunction) InvocationTargetException(java.lang.reflect.InvocationTargetException) Objects(java.util.Objects) AggSqlFunction(org.apache.flink.table.planner.functions.utils.AggSqlFunction) List(java.util.List) ListAggFunction(org.apache.flink.table.planner.functions.aggfunctions.ListAggFunction) BridgingSqlAggFunction(org.apache.flink.table.planner.functions.bridging.BridgingSqlAggFunction) LogicalType(org.apache.flink.table.types.logical.LogicalType) PythonAggregateFunctionInfo(org.apache.flink.table.functions.python.PythonAggregateFunctionInfo) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment) RexCall(org.apache.calcite.rex.RexCall) IntStream(java.util.stream.IntStream) MinAggFunction(org.apache.flink.table.planner.functions.aggfunctions.MinAggFunction) ListAggWithRetractAggFunction(org.apache.flink.table.runtime.functions.aggregate.ListAggWithRetractAggFunction) DummyStreamExecutionEnvironment(org.apache.flink.table.planner.utils.DummyStreamExecutionEnvironment) MinWithRetractAggFunction(org.apache.flink.table.runtime.functions.aggregate.MinWithRetractAggFunction) RowType(org.apache.flink.table.types.logical.RowType) ArrayList(java.util.ArrayList) LinkedHashMap(java.util.LinkedHashMap) DataView(org.apache.flink.table.api.dataview.DataView) FieldsDataType(org.apache.flink.table.types.FieldsDataType) ConfigOption(org.apache.flink.configuration.ConfigOption) SqlOperator(org.apache.calcite.sql.SqlOperator) ListView(org.apache.flink.table.api.dataview.ListView) AggregateInfo(org.apache.flink.table.planner.plan.utils.AggregateInfo) SqlTypeName(org.apache.calcite.sql.type.SqlTypeName) FunctionDefinition(org.apache.flink.table.functions.FunctionDefinition) AvgAggFunction(org.apache.flink.table.planner.functions.aggfunctions.AvgAggFunction) 
MaxWithRetractAggFunction(org.apache.flink.table.runtime.functions.aggregate.MaxWithRetractAggFunction) Configuration(org.apache.flink.configuration.Configuration) TableException(org.apache.flink.table.api.TableException) SumWithRetractAggFunction(org.apache.flink.table.planner.functions.aggfunctions.SumWithRetractAggFunction) PythonFunctionInfo(org.apache.flink.table.functions.python.PythonFunctionInfo) Field(java.lang.reflect.Field) LastValueAggFunction(org.apache.flink.table.runtime.functions.aggregate.LastValueAggFunction) BridgingSqlFunction(org.apache.flink.table.planner.functions.bridging.BridgingSqlFunction) ScalarSqlFunction(org.apache.flink.table.planner.functions.utils.ScalarSqlFunction) AggregateCall(org.apache.calcite.rel.core.AggregateCall) SqlAggFunction(org.apache.calcite.sql.SqlAggFunction)
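
The offset bookkeeping in Example 4 can be tried without Flink or Calcite on the classpath. What follows is a minimal sketch in plain Java, assuming each aggregate call is reduced to its argument list. The names InputOffsetSketch, Remapped and remap are hypothetical and do not exist in Flink. It shows how a LinkedHashMap assigns every distinct input field one offset and how later calls reuse it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not part of Flink.
public class InputOffsetSketch {

    /** Distinct input field indexes plus the remapped offsets of each call. */
    public static final class Remapped {
        public final int[] udafInputOffsets;
        public final List<List<Integer>> perCallInputs;

        Remapped(int[] udafInputOffsets, List<List<Integer>> perCallInputs) {
            this.udafInputOffsets = udafInputOffsets;
            this.perCallInputs = perCallInputs;
        }
    }

    /** Each inner list stands in for the argument list of one aggregate call. */
    public static Remapped remap(List<List<Integer>> argLists) {
        // Maps an input field index to its offset in the projected UDAF input row.
        Map<Integer, Integer> inputNodes = new LinkedHashMap<>();
        List<List<Integer>> perCallInputs = new ArrayList<>();
        for (List<Integer> argList : argLists) {
            List<Integer> inputs = new ArrayList<>();
            for (Integer arg : argList) {
                Integer offset = inputNodes.get(arg);
                if (offset == null) {
                    // First time this field is seen: give it the next free offset.
                    offset = inputNodes.size();
                    inputNodes.put(arg, offset);
                }
                inputs.add(offset);
            }
            perCallInputs.add(inputs);
        }
        int[] udafInputOffsets = inputNodes.keySet().stream().mapToInt(i -> i).toArray();
        return new Remapped(udafInputOffsets, perCallInputs);
    }

    public static void main(String[] args) {
        Remapped r = remap(Arrays.asList(
                Arrays.asList(3, 5), Arrays.asList(5), Arrays.asList(1, 3)));
        System.out.println(Arrays.toString(r.udafInputOffsets)); // [3, 5, 1]
        System.out.println(r.perCallInputs); // [[0, 1], [1], [2, 0]]
    }
}
```

With three calls over fields (3, 5), (5) and (1, 3), only three columns are projected, and the second and third calls point back at offsets assigned earlier.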

Example 5 with PythonFunctionInfo

Use of org.apache.flink.table.functions.python.PythonFunctionInfo in the Apache Flink project.

From the class StreamExecPythonOverAggregate, method getPythonOverWindowAggregateFunctionOperator.

@SuppressWarnings("unchecked")
private OneInputStreamOperator<RowData, RowData> getPythonOverWindowAggregateFunctionOperator(ExecNodeConfig config, Configuration pythonConfig, RowType inputRowType, RowType outputRowType, int rowTiemIdx, long lowerBoundary, boolean isRowsClause, int[] udafInputOffsets, PythonFunctionInfo[] pythonFunctionInfos, long minIdleStateRetentionTime, long maxIdleStateRetentionTime) {
    RowType userDefinedFunctionInputType = (RowType) Projection.of(udafInputOffsets).project(inputRowType);
    RowType userDefinedFunctionOutputType = (RowType) Projection.range(inputRowType.getFieldCount(), outputRowType.getFieldCount()).project(outputRowType);
    GeneratedProjection generatedProjection = ProjectionCodeGenerator.generateProjection(CodeGeneratorContext.apply(config.getTableConfig()), "UdafInputProjection", inputRowType, userDefinedFunctionInputType, udafInputOffsets);
    if (isRowsClause) {
        String className;
        if (rowTiemIdx != -1) {
            className = ARROW_PYTHON_OVER_WINDOW_ROWS_ROW_TIME_AGGREGATE_FUNCTION_OPERATOR_NAME;
        } else {
            className = ARROW_PYTHON_OVER_WINDOW_ROWS_PROC_TIME_AGGREGATE_FUNCTION_OPERATOR_NAME;
        }
        Class<?> clazz = CommonPythonUtil.loadClass(className);
        try {
            Constructor<?> ctor = clazz.getConstructor(Configuration.class, long.class, long.class, PythonFunctionInfo[].class, RowType.class, RowType.class, RowType.class, int.class, long.class, GeneratedProjection.class);
            return (OneInputStreamOperator<RowData, RowData>) ctor.newInstance(pythonConfig, minIdleStateRetentionTime, maxIdleStateRetentionTime, pythonFunctionInfos, inputRowType, userDefinedFunctionInputType, userDefinedFunctionOutputType, rowTiemIdx, lowerBoundary, generatedProjection);
        } catch (NoSuchMethodException | InstantiationException | IllegalAccessException | InvocationTargetException e) {
            throw new TableException("Python Arrow Over Rows Window Function Operator constructed failed.", e);
        }
    } else {
        String className;
        if (rowTiemIdx != -1) {
            className = ARROW_PYTHON_OVER_WINDOW_RANGE_ROW_TIME_AGGREGATE_FUNCTION_OPERATOR_NAME;
        } else {
            className = ARROW_PYTHON_OVER_WINDOW_RANGE_PROC_TIME_AGGREGATE_FUNCTION_OPERATOR_NAME;
        }
        Class<?> clazz = CommonPythonUtil.loadClass(className);
        try {
            Constructor<?> ctor = clazz.getConstructor(Configuration.class, PythonFunctionInfo[].class, RowType.class, RowType.class, RowType.class, int.class, long.class, GeneratedProjection.class);
            return (OneInputStreamOperator<RowData, RowData>) ctor.newInstance(pythonConfig, pythonFunctionInfos, inputRowType, userDefinedFunctionInputType, userDefinedFunctionOutputType, rowTiemIdx, lowerBoundary, generatedProjection);
        } catch (NoSuchMethodException | InstantiationException | IllegalAccessException | InvocationTargetException e) {
            throw new TableException("Python Arrow Over Range Window Function Operator constructed failed.", e);
        }
    }
}
Also used : PythonFunctionInfo(org.apache.flink.table.functions.python.PythonFunctionInfo) TableException(org.apache.flink.table.api.TableException) RowType(org.apache.flink.table.types.logical.RowType) InvocationTargetException(java.lang.reflect.InvocationTargetException) OneInputStreamOperator(org.apache.flink.streaming.api.operators.OneInputStreamOperator) GeneratedProjection(org.apache.flink.table.runtime.generated.GeneratedProjection)
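
Example 5 resolves the operator class by name and invokes its constructor reflectively. The same pattern can be sketched in plain Java. The names ReflectiveCtorSketch, Window and create are hypothetical, and Class.forName stands in for CommonPythonUtil.loadClass, whose internals are not shown above. The point to notice is that primitive constructor parameters are looked up with int.class and long.class.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

// Hypothetical stand-alone sketch; not part of Flink.
public class ReflectiveCtorSketch {

    /** Stand-in for an operator class that is only known by name at run time. */
    public static class Window {
        public final String name;
        public final int rowTimeIdx;
        public final long lowerBoundary;

        public Window(String name, int rowTimeIdx, long lowerBoundary) {
            this.name = name;
            this.rowTimeIdx = rowTimeIdx;
            this.lowerBoundary = lowerBoundary;
        }
    }

    public static Object create(String className, String name, int rowTimeIdx, long lowerBoundary) {
        try {
            Class<?> clazz = Class.forName(className);
            // The parameter types must match the declared constructor exactly,
            // so primitives use int.class / long.class rather than the wrapper classes.
            Constructor<?> ctor = clazz.getConstructor(String.class, int.class, long.class);
            // Arguments are boxed for the call and unboxed by the reflection layer.
            return ctor.newInstance(name, rowTimeIdx, lowerBoundary);
        } catch (ClassNotFoundException | NoSuchMethodException | InstantiationException
                | IllegalAccessException | InvocationTargetException e) {
            // Wrap the checked reflection failures in one unchecked exception.
            throw new IllegalStateException("Could not construct " + className, e);
        }
    }

    public static void main(String[] args) {
        Window w = (Window) create(Window.class.getName(), "rows", -1, 10L);
        System.out.println(w.name + " " + w.rowTimeIdx + " " + w.lowerBoundary);
    }
}
```

A mismatch between the getConstructor parameter list and the newInstance argument list is only detected at run time, which is why both lists in Example 5 have to be kept in step by hand.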

Aggregations

PythonFunctionInfo (org.apache.flink.table.functions.python.PythonFunctionInfo): 18
RowType (org.apache.flink.table.types.logical.RowType): 13
TableException (org.apache.flink.table.api.TableException): 9
OneInputStreamOperator (org.apache.flink.streaming.api.operators.OneInputStreamOperator): 8
InvocationTargetException (java.lang.reflect.InvocationTargetException): 7
RowData (org.apache.flink.table.data.RowData): 6
Tuple2 (org.apache.flink.api.java.tuple.Tuple2): 4
Configuration (org.apache.flink.configuration.Configuration): 4
GeneratedProjection (org.apache.flink.table.runtime.generated.GeneratedProjection): 4
InternalTypeInfo (org.apache.flink.table.runtime.typeutils.InternalTypeInfo): 4
Constructor (java.lang.reflect.Constructor): 3
ArrayList (java.util.ArrayList): 3
List (java.util.List): 3
AggregateCall (org.apache.calcite.rel.core.AggregateCall): 3
RexCall (org.apache.calcite.rex.RexCall): 3
RexNode (org.apache.calcite.rex.RexNode): 3
Transformation (org.apache.flink.api.dag.Transformation): 3
ManagedMemoryUseCase (org.apache.flink.core.memory.ManagedMemoryUseCase): 3
OneInputTransformation (org.apache.flink.streaming.api.transformations.OneInputTransformation): 3
Projection (org.apache.flink.table.connector.Projection): 3