Example 1 with HoodieJavaRDD

Use of org.apache.hudi.data.HoodieJavaRDD in project hudi by apache.

In the class MultipleSparkJobExecutionStrategy, method performClustering:

@Override
public HoodieWriteMetadata<HoodieData<WriteStatus>> performClustering(final HoodieClusteringPlan clusteringPlan, final Schema schema, final String instantTime) {
    JavaSparkContext engineContext = HoodieSparkEngineContext.getSparkContext(getEngineContext());
    // Run clustering for each input group asynchronously, then wait for all groups and collect their WriteStatus data.
    Stream<HoodieData<WriteStatus>> writeStatusesStream = FutureUtils.allOf(
        clusteringPlan.getInputGroups().stream()
            .map(inputGroup -> runClusteringForGroupAsync(inputGroup,
                clusteringPlan.getStrategy().getStrategyParams(),
                Option.ofNullable(clusteringPlan.getPreserveHoodieMetadata()).orElse(false),
                instantTime))
            .collect(Collectors.toList()))
        .join().stream();
    // Unwrap each HoodieData into its underlying JavaRDD, then union the per-group RDDs into a single RDD.
    JavaRDD<WriteStatus>[] writeStatuses = convertStreamToArray(writeStatusesStream.map(HoodieJavaRDD::getJavaRDD));
    JavaRDD<WriteStatus> writeStatusRDD = engineContext.union(writeStatuses);
    HoodieWriteMetadata<HoodieData<WriteStatus>> writeMetadata = new HoodieWriteMetadata<>();
    writeMetadata.setWriteStatuses(HoodieJavaRDD.of(writeStatusRDD));
    return writeMetadata;
}
Also used: HoodieData (org.apache.hudi.common.data.HoodieData), HoodieWriteMetadata (org.apache.hudi.table.action.HoodieWriteMetadata), JavaSparkContext (org.apache.spark.api.java.JavaSparkContext), WriteStatus (org.apache.hudi.client.WriteStatus), HoodieJavaRDD (org.apache.hudi.data.HoodieJavaRDD), JavaRDD (org.apache.spark.api.java.JavaRDD)
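
The method above follows a fan-out, join, union pattern: one asynchronous task per input group, a blocking wait for all of them, then a merge of the per-group results. The sketch below shows the same pattern with the JDK only, so it runs without Spark or Hudi on the classpath. Everything in it is a stand-in chosen for illustration: plain lists take the place of JavaRDD<WriteStatus>, runGroupAsync is a hypothetical substitute for runClusteringForGroupAsync, and the local allOf helper only assumes that FutureUtils.allOf yields a single future holding a list of results, as the .join().stream() call in the example suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class FanOutUnionSketch {

    // Hypothetical stand-in for runClusteringForGroupAsync: processes one group on another thread.
    static CompletableFuture<List<String>> runGroupAsync(List<String> group, String instantTime) {
        return CompletableFuture.supplyAsync(() ->
            group.stream().map(file -> file + "@" + instantTime).collect(Collectors.toList()));
    }

    // Combines many futures into one future holding all results, in input order.
    // Assumed to mirror what FutureUtils.allOf does in the example; not Hudi's code.
    static <T> CompletableFuture<List<T>> allOf(List<CompletableFuture<T>> futures) {
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
            .thenApply(ignored -> futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList()));
    }

    // Fan out one task per group, block until all finish, then concatenate ("union") the results.
    static List<String> processAll(List<List<String>> groups, String instantTime) {
        List<CompletableFuture<List<String>>> futures = groups.stream()
            .map(group -> runGroupAsync(group, instantTime))
            .collect(Collectors.toList());
        List<String> union = new ArrayList<>();
        allOf(futures).join().forEach(union::addAll);
        return union;
    }

    public static void main(String[] args) {
        List<List<String>> groups = Arrays.asList(
            Arrays.asList("f1", "f2"),
            Arrays.asList("f3"));
        System.out.println(processAll(groups, "001")); // prints [f1@001, f2@001, f3@001]
    }
}
```

In the Spark version the final merge is engineContext.union(...), which combines RDDs lazily rather than copying elements into a list as this sketch does.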
