Examples with DataSet - org.apache.flink.api.java.DataSet

Example 6 with DataSet

Use of org.apache.flink.api.java.DataSet in project flink by apache.

In class PythonPlanBinder, method createHashPartitionOperation.

@SuppressWarnings("unchecked")
private void createHashPartitionOperation(PythonOperationInfo info) throws IOException {
    DataSet op1 = (DataSet) sets.get(info.parentID);
    // Hash-partition on the given keys, then apply KeyDiscarder in a follow-up map step.
    sets.put(info.setID, op1.partitionByHash(info.keys)
            .setParallelism(getParallelism(info))
            .map(new KeyDiscarder())
            .setParallelism(getParallelism(info))
            .name("HashPartitionPostStep"));
}

Also used: NestedKeyDiscarder (org.apache.flink.python.api.functions.util.NestedKeyDiscarder), KeyDiscarder (org.apache.flink.python.api.functions.util.KeyDiscarder), DataSet (org.apache.flink.api.java.DataSet)
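
The partitionByHash call in the example above redistributes records so that records with equal keys end up in the same parallel partition. The following is a minimal plain-Java sketch of that idea only, written without Flink on the classpath. The class HashPartitionSketch and its method are hypothetical names introduced for illustration, and the bucket formula is an assumption, not Flink's internal partitioner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration; this is not Flink code.
final class HashPartitionSketch {

    // Assigns each record to one of `parallelism` buckets based on the hash of its key.
    static <T, K> List<List<T>> partitionByHash(List<T> records, Function<T, K> keyOf, int parallelism) {
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new ArrayList<>());
        }
        for (T record : records) {
            // floorMod keeps the bucket index non-negative even for negative hash codes.
            int bucket = Math.floorMod(keyOf.apply(record).hashCode(), parallelism);
            partitions.get(bucket).add(record);
        }
        return partitions;
    }
}
```

Records with the same key always map to the same bucket, which is the property the later per-partition steps rely on.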

Example 7 with DataSet

Use of org.apache.flink.api.java.DataSet in project flink by apache.

In class PythonPlanBinder, method createCrossOperation.

@SuppressWarnings("unchecked")
private void createCrossOperation(DatasizeHint mode, PythonOperationInfo info) {
    DataSet op1 = (DataSet) sets.get(info.parentID);
    DataSet op2 = (DataSet) sets.get(info.otherID);
    DefaultCross defaultResult;
    switch(mode) {
        case NONE:
            defaultResult = op1.cross(op2);
            break;
        case HUGE:
            defaultResult = op1.crossWithHuge(op2);
            break;
        case TINY:
            defaultResult = op1.crossWithTiny(op2);
            break;
        default:
            throw new IllegalArgumentException("Invalid Cross mode specified: " + mode);
    }
    defaultResult.setParallelism(getParallelism(info));
    if (info.usesUDF) {
        // Run the user-defined function over each partition of the cross result.
        sets.put(info.setID, defaultResult
                .mapPartition(new PythonMapPartition(info.setID, info.types))
                .setParallelism(getParallelism(info))
                .name(info.name));
    } else {
        sets.put(info.setID, defaultResult.name("DefaultCross"));
    }
}

Also used: DefaultCross (org.apache.flink.api.java.operators.CrossOperator.DefaultCross), DataSet (org.apache.flink.api.java.DataSet), PythonMapPartition (org.apache.flink.python.api.functions.PythonMapPartition)
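
All three branches of the switch above build the same logical result, the Cartesian product of the two inputs. crossWithHuge and crossWithTiny only pass a hint about the expected size of the second input. A minimal plain-Java sketch of the product itself, assuming in-memory lists instead of DataSets; CrossSketch is a hypothetical name and not part of Flink.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; this is not Flink code.
final class CrossSketch {

    // Pairs every element of `left` with every element of `right`.
    static <A, B> List<Map.Entry<A, B>> cross(List<A> left, List<B> right) {
        List<Map.Entry<A, B>> out = new ArrayList<>(left.size() * right.size());
        for (A a : left) {
            for (B b : right) {
                out.add(new SimpleEntry<>(a, b));
            }
        }
        return out;
    }
}
```

The output size is the product of the input sizes, which is why a size hint for one side can matter when choosing an execution strategy.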

Example 8 with DataSet

Use of org.apache.flink.api.java.DataSet in project flink by apache.

In class PythonPlanBinder, method createCoGroupOperation.

@SuppressWarnings("unchecked")
private void createCoGroupOperation(PythonOperationInfo info) {
    DataSet op1 = (DataSet) sets.get(info.parentID);
    DataSet op2 = (DataSet) sets.get(info.otherID);
    Keys.ExpressionKeys<?> key1 = new Keys.ExpressionKeys(info.keys1, op1.getType());
    Keys.ExpressionKeys<?> key2 = new Keys.ExpressionKeys(info.keys2, op2.getType());
    PythonCoGroup pcg = new PythonCoGroup(info.setID, info.types);
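    // Co-group both inputs on their respective keys and hand each pair of groups to the Python function.
    sets.put(info.setID,
            new CoGroupRawOperator(op1, op2, key1, key2, pcg, info.types, info.name)
                    .setParallelism(getParallelism(info)));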
}

Also used: PythonCoGroup (org.apache.flink.python.api.functions.PythonCoGroup), CoGroupRawOperator (org.apache.flink.api.java.operators.CoGroupRawOperator), DataSet (org.apache.flink.api.java.DataSet), Keys (org.apache.flink.api.common.operators.Keys)
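
A co-group groups both inputs by key and presents, for each key, the group from the first input together with the group from the second, where either side may be empty. The following plain-Java sketch illustrates that grouping under simplifying assumptions: both inputs share one element type and one key extractor, and everything fits in memory. CoGroupSketch is a hypothetical name and not part of Flink.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration; this is not Flink code.
final class CoGroupSketch {

    // Returns, per key, a two-element list: index 0 holds the left group, index 1 the right group.
    static <K, T> Map<K, List<List<T>>> coGroup(List<T> left, List<T> right, Function<T, K> keyOf) {
        Map<K, List<List<T>>> groups = new LinkedHashMap<>();
        for (T t : left) {
            slot(groups, keyOf.apply(t)).get(0).add(t);
        }
        for (T t : right) {
            slot(groups, keyOf.apply(t)).get(1).add(t);
        }
        return groups;
    }

    // Creates the empty (left, right) pair for a key the first time that key is seen.
    private static <K, T> List<List<T>> slot(Map<K, List<List<T>>> groups, K key) {
        return groups.computeIfAbsent(key, k -> {
            List<List<T>> pair = new ArrayList<>();
            pair.add(new ArrayList<>());
            pair.add(new ArrayList<>());
            return pair;
        });
    }
}
```

Unlike a join, a key that appears in only one input still produces a result entry, with an empty group for the other side.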

Example 9 with DataSet

Use of org.apache.flink.api.java.DataSet in project flink by apache.

In class PythonPlanBinder, method createMapPartitionOperation.

@SuppressWarnings("unchecked")
private void createMapPartitionOperation(PythonOperationInfo info) {
    DataSet op1 = (DataSet) sets.get(info.parentID);
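    // Apply the partition-wise function to the parent set and register the result under this set's ID.
    sets.put(info.setID, op1
            .mapPartition(new PythonMapPartition(info.setID, info.types))
            .setParallelism(getParallelism(info))
            .name(info.name));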
}

Also used: DataSet (org.apache.flink.api.java.DataSet), PythonMapPartition (org.apache.flink.python.api.functions.PythonMapPartition)
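
mapPartition differs from map in that the function is called once per partition and receives all of that partition's elements, emitting any number of results. A minimal plain-Java sketch of that calling pattern, assuming partitions are modelled as in-memory lists and the collector as a Consumer; MapPartitionSketch is a hypothetical name and not part of Flink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical illustration; this is not Flink code.
final class MapPartitionSketch {

    // Calls `fn` exactly once per partition; `fn` may emit zero or more results through the consumer.
    static <T, R> List<R> mapPartition(List<List<T>> partitions, BiConsumer<Iterable<T>, Consumer<R>> fn) {
        List<R> out = new ArrayList<>();
        for (List<T> partition : partitions) {
            fn.accept(partition, out::add);
        }
        return out;
    }
}
```

For instance, a function that counts the elements it receives and emits that count yields one number per partition rather than one per element.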

Example 10 with DataSet

Use of org.apache.flink.api.java.DataSet in project flink by apache.

In class PythonPlanBinder, method createGroupOperation.

private void createGroupOperation(PythonOperationInfo info) throws IOException {
    DataSet op1 = (DataSet) sets.get(info.parentID);
    sets.put(info.setID, op1.groupBy(info.keys));
}

Also used: DataSet (org.apache.flink.api.java.DataSet)

Aggregations

DataSet (org.apache.flink.api.java.DataSet): 43
ExecutionEnvironment (org.apache.flink.api.java.ExecutionEnvironment): 18
Test (org.junit.Test): 15
Graph (org.apache.flink.graph.Graph): 14
DiscardingOutputFormat (org.apache.flink.api.java.io.DiscardingOutputFormat): 11
Tuple2 (org.apache.flink.api.java.tuple.Tuple2): 11
NullValue (org.apache.flink.types.NullValue): 11
Plan (org.apache.flink.api.common.Plan): 7
FieldList (org.apache.flink.api.common.operators.util.FieldList): 6
DualInputPlanNode (org.apache.flink.optimizer.plan.DualInputPlanNode): 6
OptimizedPlan (org.apache.flink.optimizer.plan.OptimizedPlan): 6
PlanNode (org.apache.flink.optimizer.plan.PlanNode): 6
SinkPlanNode (org.apache.flink.optimizer.plan.SinkPlanNode): 6
WorksetIterationPlanNode (org.apache.flink.optimizer.plan.WorksetIterationPlanNode): 6
PythonMapPartition (org.apache.flink.python.api.functions.PythonMapPartition): 6
LongSumAggregator (org.apache.flink.api.common.aggregators.LongSumAggregator): 5
MapFunction (org.apache.flink.api.common.functions.MapFunction): 5
Tuple3 (org.apache.flink.api.java.tuple.Tuple3): 5
Edge (org.apache.flink.graph.Edge): 5
Tuple2ToVertexMap (org.apache.flink.graph.utils.Tuple2ToVertexMap): 5