Use of org.apache.flink.api.java.operators.MapPartitionOperator in project flink by apache.
The sampleWithSize method of the class DataSetUtils.
/**
 * Generates a fixed-size sample of the elements in a DataSet.
 * <p>
 * <strong>NOTE:</strong> Sampling with a fixed size is less efficient than sampling with a fraction. Use sampling
 * with a fraction unless you need an exact sample size.
 * </p>
 *
 * @param input The DataSet to sample from.
 * @param withReplacement Whether an element can be selected more than once.
 * @param numSamples The expected sample size.
 * @param seed Random number generator seed.
 * @return The sampled DataSet.
 */
public static <T> DataSet<T> sampleWithSize(DataSet<T> input, final boolean withReplacement, final int numSamples, final long seed) {
    // Phase 1: sample within each partition in parallel.
    SampleInPartition<T> sampleInPartition = new SampleInPartition<>(withReplacement, numSamples, seed);
    MapPartitionOperator mapPartitionOperator = input.mapPartition(sampleInPartition);

    // Phase 2: combine the per-partition samples into the final sample.
    // There is no previous group, so the parallelism of GroupReduceOperator is always 1.
    String callLocation = Utils.getCallLocationName();
    SampleInCoordinator<T> sampleInCoordinator = new SampleInCoordinator<>(withReplacement, numSamples, seed);
    return new GroupReduceOperator<>(mapPartitionOperator, input.getType(), sampleInCoordinator, callLocation);
}
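The method above samples in two phases: once inside each partition, then once more in a single coordinator that merges the partial results. The following plain-Java sketch illustrates that pattern for the without-replacement case only. It assumes each element draws a uniform random weight and the k highest-weight elements are kept, which is one standard way to do distributed reservoir sampling; it is not Flink's code, and the actual SampleInPartition and SampleInCoordinator classes may work differently. Every name in it (TwoPhaseSampleSketch, Weighted, offer, newHeap, sampleWithoutReplacement) is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

/** Hypothetical stand-alone sketch of two-phase sampling; not part of Flink. */
final class TwoPhaseSampleSketch {

    /** An element paired with the random weight it drew (hypothetical helper type). */
    static final class Weighted<T> {
        final double weight;
        final T element;

        Weighted(double weight, T element) {
            this.weight = weight;
            this.element = element;
        }
    }

    /** Creates a min-heap ordered by weight, so the lowest-weight item is evicted first. */
    private static <T> PriorityQueue<Weighted<T>> newHeap() {
        return new PriorityQueue<>(Comparator.comparingDouble((Weighted<T> w) -> w.weight));
    }

    /** Keeps only the k highest-weight items seen so far. */
    private static <T> void offer(PriorityQueue<Weighted<T>> heap, Weighted<T> item, int k) {
        if (heap.size() < k) {
            heap.add(item);
        } else if (k > 0 && heap.peek().weight < item.weight) {
            heap.poll();
            heap.add(item);
        }
    }

    /** Phase 1: within one partition, give each element a random weight and keep the top k. */
    static <T> List<Weighted<T>> sampleInPartition(List<T> partition, int k, long seed) {
        Random random = new Random(seed);
        PriorityQueue<Weighted<T>> heap = newHeap();
        for (T element : partition) {
            offer(heap, new Weighted<>(random.nextDouble(), element), k);
        }
        return new ArrayList<>(heap);
    }

    /** Phase 2: merge all partial samples in a single place and keep the global top k. */
    static <T> List<T> sampleInCoordinator(List<List<Weighted<T>>> partials, int k) {
        PriorityQueue<Weighted<T>> heap = newHeap();
        for (List<Weighted<T>> partial : partials) {
            for (Weighted<T> item : partial) {
                offer(heap, item, k);
            }
        }
        List<T> result = new ArrayList<>();
        for (Weighted<T> item : heap) {
            result.add(item.element);
        }
        return result;
    }

    /** Runs both phases over in-memory "partitions" (lists standing in for parallel tasks). */
    static <T> List<T> sampleWithoutReplacement(List<List<T>> partitions, int k, long seed) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        // Derive a separate seed per partition so partitions do not draw identical weights.
        Random seeds = new Random(seed);
        List<List<Weighted<T>>> partials = new ArrayList<>();
        for (List<T> partition : partitions) {
            partials.add(sampleInPartition(partition, k, seeds.nextLong()));
        }
        return sampleInCoordinator(partials, k);
    }
}
```

Because each partition forwards at most k candidates, the coordinator handles at most k times the number of partitions, which is why a single-parallelism merge step can stay cheap. Calling `TwoPhaseSampleSketch.sampleWithoutReplacement(partitions, 3, 42L)` on a list of lists returns three distinct elements when at least three exist, and all elements otherwise.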