Example 1 with MapPartitionOperator

Use of org.apache.flink.api.java.operators.MapPartitionOperator in the Apache Flink project.

From class DataSetUtils, method sampleWithSize.

/**
 * Generates a sample of a DataSet that contains a fixed number of elements.
 * <p>
 * <strong>NOTE:</strong> Sampling with a fixed size is not as efficient as sampling with a fraction; use
 * sampling with a fraction unless you need an exact sample size.
 * </p>
 *
 * @param input           The DataSet to sample from.
 * @param withReplacement Whether an element can be selected more than once.
 * @param numSamples      The expected sample size.
 * @param seed            Random number generator seed.
 * @return The sampled DataSet
 */
public static <T> DataSet<T> sampleWithSize(DataSet<T> input, final boolean withReplacement, final int numSamples, final long seed) {
    // Phase 1: sample within each partition in parallel.
    SampleInPartition<T> sampleInPartition = new SampleInPartition<>(withReplacement, numSamples, seed);
    MapPartitionOperator mapPartitionOperator = input.mapPartition(sampleInPartition);
    // Phase 2: merge the per-partition samples in a single coordinator.
    // There is no previous group, so the parallelism of GroupReduceOperator is always 1.
    String callLocation = Utils.getCallLocationName();
    SampleInCoordinator<T> sampleInCoordinator = new SampleInCoordinator<>(withReplacement, numSamples, seed);
    return new GroupReduceOperator<>(mapPartitionOperator, input.getType(), sampleInCoordinator, callLocation);
}
Also used: GroupReduceOperator (org.apache.flink.api.java.operators.GroupReduceOperator), SampleInPartition (org.apache.flink.api.java.functions.SampleInPartition), SampleInCoordinator (org.apache.flink.api.java.functions.SampleInCoordinator), MapPartitionOperator (org.apache.flink.api.java.operators.MapPartitionOperator)
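
The method above chains a per-partition sampling step into a single-parallelism merge step. The plain-Java sketch below illustrates that two-phase idea for the without-replacement case only. It is not Flink's implementation: the class TwoPhaseSampleSketch, the Weighted holder, the topK helper, and the per-partition seed offset are all hypothetical names and choices made for illustration, and partitions are modelled as in-memory lists. The assumed technique is to tag every element with a random weight, keep the numSamples largest weights in each partition, then keep the numSamples largest weights overall.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical sketch, not Flink code: two-phase fixed-size sampling without replacement.
final class TwoPhaseSampleSketch {

    // An element tagged with a random weight (illustrative helper type).
    static final class Weighted<T> {
        final double weight;
        final T value;

        Weighted(double weight, T value) {
            this.weight = weight;
            this.value = value;
        }
    }

    // Keeps the k candidates with the largest weights, using a min-heap bounded to size k.
    static <T> List<Weighted<T>> topK(Iterable<Weighted<T>> candidates, int k) {
        PriorityQueue<Weighted<T>> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Weighted<T> w) -> w.weight));
        for (Weighted<T> candidate : candidates) {
            heap.add(candidate);
            if (heap.size() > k) {
                heap.poll(); // evict the smallest weight seen so far
            }
        }
        return new ArrayList<>(heap);
    }

    // Phase 1: runs independently on each partition and returns at most k weighted candidates.
    static <T> List<Weighted<T>> sampleInPartition(List<T> partition, int k, Random rnd) {
        List<Weighted<T>> weighted = new ArrayList<>();
        for (T value : partition) {
            weighted.add(new Weighted<>(rnd.nextDouble(), value));
        }
        return topK(weighted, k);
    }

    // Phase 2: a single coordinator merges all partial samples and keeps the global top k.
    static <T> List<T> sampleInCoordinator(List<List<Weighted<T>>> partials, int k) {
        List<Weighted<T>> all = new ArrayList<>();
        for (List<Weighted<T>> partial : partials) {
            all.addAll(partial);
        }
        List<T> result = new ArrayList<>();
        for (Weighted<T> w : topK(all, k)) {
            result.add(w.value);
        }
        return result;
    }

    static <T> List<T> sampleWithSize(List<List<T>> partitions, int k, long seed) {
        List<List<Weighted<T>>> partials = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) {
            // Assumption: offset the seed per partition so partitions draw different weights.
            partials.add(sampleInPartition(partitions.get(i), k, new Random(seed + i)));
        }
        return sampleInCoordinator(partials, k);
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = List.of(
            List.of(1, 2, 3, 4), List.of(5, 6, 7), List.of(8, 9, 10));
        System.out.println(sampleWithSize(partitions, 3, 42L));
    }
}
```

Because the k largest weights overall are always among the k largest weights of their own partition, the coordinator only needs to see at most k candidates per partition. In this sketch that bounded merge is what allows the second phase to run in one place.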