Example 36 with DataSet

Use of org.apache.flink.api.java.DataSet in project flink by apache.

Class PythonPlanBinder, method createUnionOperation.

@SuppressWarnings("unchecked")
private void createUnionOperation(PythonOperationInfo info) throws IOException {
    DataSet op1 = (DataSet) sets.get(info.parentID);
    DataSet op2 = (DataSet) sets.get(info.otherID);
    // Union the two parent sets and register the result under the new set ID.
    sets.put(info.setID, op1.union(op2)
            .setParallelism(getParallelism(info))
            .name("Union"));
}
Also used : DataSet(org.apache.flink.api.java.DataSet)
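
The methods in these examples share one pattern: look up the parent data sets by ID in `sets`, apply an operation, and store the result under `info.setID`. The following is a minimal plain-Java sketch of that pattern, not Flink code. `OperatorRegistrySketch` and its methods are hypothetical names, and in-memory lists stand in for `DataSet`. Concatenation models the union because Flink's `union` keeps duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the ID-keyed `sets` registry; lists replace DataSet.
class OperatorRegistrySketch {
    private final Map<Integer, List<String>> sets = new HashMap<>();

    // Registers a copy of the data under the given set ID.
    void put(int id, List<String> data) {
        sets.put(id, new ArrayList<>(data));
    }

    // Returns the data registered under the given set ID, or null if absent.
    List<String> get(int id) {
        return sets.get(id);
    }

    // Looks up both parents by ID, concatenates them (duplicates are kept),
    // and registers the result under setId. Missing parents are not handled.
    void createUnion(int parentId, int otherId, int setId) {
        List<String> result = new ArrayList<>(sets.get(parentId));
        result.addAll(sets.get(otherId));
        sets.put(setId, result);
    }

    public static void main(String[] args) {
        OperatorRegistrySketch registry = new OperatorRegistrySketch();
        registry.put(1, Arrays.asList("a", "b"));
        registry.put(2, Arrays.asList("b", "c"));
        registry.createUnion(1, 2, 3);
        System.out.println(registry.get(3)); // prints [a, b, b, c]
    }
}
```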

Example 37 with DataSet

Use of org.apache.flink.api.java.DataSet in project flink by apache.

Class PythonPlanBinder, method createDistinctOperation.

@SuppressWarnings("unchecked")
private void createDistinctOperation(PythonOperationInfo info) throws IOException {
    DataSet op = (DataSet) sets.get(info.parentID);
    // Deduplicate on the given keys, then pass each record through KeyDiscarder.
    sets.put(info.setID, op.distinct(info.keys)
            .setParallelism(getParallelism(info)).name("Distinct")
            .map(new KeyDiscarder())
            .setParallelism(getParallelism(info)).name("DistinctPostStep"));
}
Also used : NestedKeyDiscarder(org.apache.flink.python.api.functions.util.NestedKeyDiscarder) KeyDiscarder(org.apache.flink.python.api.functions.util.KeyDiscarder) DataSet(org.apache.flink.api.java.DataSet)

Example 38 with DataSet

Use of org.apache.flink.api.java.DataSet in project flink by apache.

Class PythonPlanBinder, method createCsvSink.

@SuppressWarnings("unchecked")
private void createCsvSink(PythonOperationInfo info) throws IOException {
    DataSet parent = (DataSet) sets.get(info.parentID);
    // Convert records with StringTupleDeserializerMap, then write them out as CSV.
    parent.map(new StringTupleDeserializerMap())
            .setParallelism(getParallelism(info)).name("CsvSinkPreStep")
            .writeAsCsv(info.path, info.lineDelimiter, info.fieldDelimiter, info.writeMode)
            .setParallelism(getParallelism(info)).name("CsvSink");
}
Also used : StringTupleDeserializerMap(org.apache.flink.python.api.functions.util.StringTupleDeserializerMap) DataSet(org.apache.flink.api.java.DataSet)

Example 39 with DataSet

Use of org.apache.flink.api.java.DataSet in project flink by apache.

Class PythonPlanBinder, method createJoinOperation.

@SuppressWarnings("unchecked")
private void createJoinOperation(DatasizeHint mode, PythonOperationInfo info) {
    DataSet op1 = (DataSet) sets.get(info.parentID);
    DataSet op2 = (DataSet) sets.get(info.otherID);
    if (info.usesUDF) {
        // With a UDF, pass the join result through PythonMapPartition.
        sets.put(info.setID, createDefaultJoin(op1, op2, info.keys1, info.keys2, mode, getParallelism(info))
                .mapPartition(new PythonMapPartition(info.setID, info.types))
                .setParallelism(getParallelism(info))
                .name(info.name));
    } else {
        sets.put(info.setID, createDefaultJoin(op1, op2, info.keys1, info.keys2, mode, getParallelism(info)));
    }
}
Also used : DataSet(org.apache.flink.api.java.DataSet) PythonMapPartition(org.apache.flink.python.api.functions.PythonMapPartition)

Example 40 with DataSet

Use of org.apache.flink.api.java.DataSet in project flink by apache.

Class PythonPlanBinder, method createBroadcastVariable.

private void createBroadcastVariable(PythonOperationInfo info) throws IOException {
    UdfOperator<?> op1 = (UdfOperator) sets.get(info.parentID);
    DataSet<?> op2 = (DataSet) sets.get(info.otherID);
    // Attach op2 to the operator as a broadcast set under the given name.
    op1.withBroadcastSet(op2, info.name);
    Configuration c = op1.getParameters();
    if (c == null) {
        c = new Configuration();
    }
    // Record the name under an indexed key and bump the broadcast-variable count.
    int count = c.getInteger(PLANBINDER_CONFIG_BCVAR_COUNT, 0);
    c.setInteger(PLANBINDER_CONFIG_BCVAR_COUNT, count + 1);
    c.setString(PLANBINDER_CONFIG_BCVAR_NAME_PREFIX + count, info.name);
    op1.withParameters(c);
}
Also used : UdfOperator(org.apache.flink.api.java.operators.UdfOperator) Configuration(org.apache.flink.configuration.Configuration) GlobalConfiguration(org.apache.flink.configuration.GlobalConfiguration) DataSet(org.apache.flink.api.java.DataSet) DatasizeHint(org.apache.flink.python.api.PythonOperationInfo.DatasizeHint)
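
The bookkeeping in createBroadcastVariable keeps a counter in the configuration and stores each broadcast-variable name under a prefix plus the current count. The sketch below illustrates that counter-plus-indexed-keys pattern in plain Java, with a `Map<String, String>` standing in for Flink's `Configuration`. The class name, the `names` helper, and both key strings are hypothetical; the source does not show the values of the `PLANBINDER_CONFIG_*` constants.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; a Map replaces Flink's Configuration.
class BroadcastNameRegistry {
    // Hypothetical key names, chosen for this sketch.
    static final String COUNT_KEY = "bcvar.count";
    static final String NAME_PREFIX = "bcvar.name.";

    // Stores the name under NAME_PREFIX + current count, then increments the count.
    static void register(Map<String, String> config, String name) {
        int count = Integer.parseInt(config.getOrDefault(COUNT_KEY, "0"));
        config.put(COUNT_KEY, String.valueOf(count + 1));
        config.put(NAME_PREFIX + count, name);
    }

    // Reads the registered names back in registration order.
    static List<String> names(Map<String, String> config) {
        int count = Integer.parseInt(config.getOrDefault(COUNT_KEY, "0"));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            result.add(config.get(NAME_PREFIX + i));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        register(config, "weights");
        register(config, "centroids");
        System.out.println(names(config)); // prints [weights, centroids]
    }
}
```

Indexing the names by count lets a reader of the configuration recover them without a separate list type, at the cost of one extra counter key.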

Aggregations

DataSet (org.apache.flink.api.java.DataSet): 43
ExecutionEnvironment (org.apache.flink.api.java.ExecutionEnvironment): 18
Test (org.junit.Test): 15
Graph (org.apache.flink.graph.Graph): 14
DiscardingOutputFormat (org.apache.flink.api.java.io.DiscardingOutputFormat): 11
Tuple2 (org.apache.flink.api.java.tuple.Tuple2): 11
NullValue (org.apache.flink.types.NullValue): 11
Plan (org.apache.flink.api.common.Plan): 7
FieldList (org.apache.flink.api.common.operators.util.FieldList): 6
DualInputPlanNode (org.apache.flink.optimizer.plan.DualInputPlanNode): 6
OptimizedPlan (org.apache.flink.optimizer.plan.OptimizedPlan): 6
PlanNode (org.apache.flink.optimizer.plan.PlanNode): 6
SinkPlanNode (org.apache.flink.optimizer.plan.SinkPlanNode): 6
WorksetIterationPlanNode (org.apache.flink.optimizer.plan.WorksetIterationPlanNode): 6
PythonMapPartition (org.apache.flink.python.api.functions.PythonMapPartition): 6
LongSumAggregator (org.apache.flink.api.common.aggregators.LongSumAggregator): 5
MapFunction (org.apache.flink.api.common.functions.MapFunction): 5
Tuple3 (org.apache.flink.api.java.tuple.Tuple3): 5
Edge (org.apache.flink.graph.Edge): 5
Tuple2ToVertexMap (org.apache.flink.graph.utils.Tuple2ToVertexMap): 5