
Example 1 with SingleFileHandleCreateFactory

Use of org.apache.hudi.io.SingleFileHandleCreateFactory in project hudi by apache.

From the class SparkSingleFileSortExecutionStrategy, method performClusteringWithRecordsRDD.

@Override
public HoodieData<WriteStatus> performClusteringWithRecordsRDD(
        HoodieData<HoodieRecord<T>> inputRecords,
        int numOutputGroups,
        String instantTime,
        Map<String, String> strategyParams,
        Schema schema,
        List<HoodieFileGroupId> fileGroupIdList,
        boolean preserveHoodieMetadata) {
    // This strategy rewrites exactly one file group into exactly one output group.
    if (numOutputGroups != 1 || fileGroupIdList.size() != 1) {
        throw new HoodieClusteringException("Expect only one file group for strategy: " + getClass().getName());
    }
    LOG.info("Starting clustering for a group, parallelism:" + numOutputGroups + " commit:" + instantTime);
    Properties props = getWriteConfig().getProps();
    props.put(HoodieWriteConfig.BULKINSERT_PARALLELISM_VALUE.key(), String.valueOf(numOutputGroups));
    // We are calling another action executor - disable auto commit. Strategy is only expected to write data in new files.
    props.put(HoodieWriteConfig.AUTO_COMMIT_ENABLE.key(), Boolean.FALSE.toString());
    // Since clustering will write to single file group using HoodieUnboundedCreateHandle, set max file size to a large value.
    props.put(HoodieStorageConfig.PARQUET_MAX_FILE_SIZE.key(), String.valueOf(Long.MAX_VALUE));
    HoodieWriteConfig newConfig = HoodieWriteConfig.newBuilder().withProps(props).build();
    return (HoodieData<WriteStatus>) SparkBulkInsertHelper.newInstance().bulkInsert(
            inputRecords, instantTime, getHoodieTable(), newConfig,
            false, getPartitioner(strategyParams, schema), true, numOutputGroups,
            // Write every record of this group through one handle bound to the existing file group ID.
            new SingleFileHandleCreateFactory(fileGroupIdList.get(0).getFileId(), preserveHoodieMetadata));
}
Also used: HoodieData (org.apache.hudi.common.data.HoodieData), HoodieClusteringException (org.apache.hudi.exception.HoodieClusteringException), HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig), Properties (java.util.Properties), SingleFileHandleCreateFactory (org.apache.hudi.io.SingleFileHandleCreateFactory)
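The three props.put calls in the method follow one pattern: take the write properties, force a few keys, and rebuild a config for the nested bulk insert. The method puts the overrides straight into the object returned by getWriteConfig().getProps(); if that is the live Properties of the shared write config (an assumption, not verified here), the changes are visible beyond this call. The plain-Java sketch below applies the same three overrides to a copy instead. ClusteringPropsSketch and clusteringOverrides are hypothetical names, and the literal key strings are assumed to be the keys behind HoodieWriteConfig.BULKINSERT_PARALLELISM_VALUE, HoodieWriteConfig.AUTO_COMMIT_ENABLE and HoodieStorageConfig.PARQUET_MAX_FILE_SIZE; check them against the Hudi version in use.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hudi: mirrors the three overrides made in
// performClusteringWithRecordsRDD, but on a copy of the caller's properties.
class ClusteringPropsSketch {

    // Assumed literal keys; verify against the Hudi release in use.
    static final String BULK_INSERT_PARALLELISM = "hoodie.bulkinsert.shuffle.parallelism";
    static final String AUTO_COMMIT = "hoodie.auto.commit";
    static final String PARQUET_MAX_FILE_SIZE = "hoodie.parquet.max.file.size";

    static Properties clusteringOverrides(Properties base, int numOutputGroups) {
        // Copy every key (including defaults) so the overrides stay local to the result.
        Properties props = new Properties();
        for (String key : base.stringPropertyNames()) {
            props.setProperty(key, base.getProperty(key));
        }
        // Parallelism equals the number of output groups (1 for this strategy).
        props.setProperty(BULK_INSERT_PARALLELISM, String.valueOf(numOutputGroups));
        // The nested bulk insert must not commit on its own.
        props.setProperty(AUTO_COMMIT, Boolean.FALSE.toString());
        // Effectively disable size-based rollover so the group stays in one file.
        props.setProperty(PARQUET_MAX_FILE_SIZE, String.valueOf(Long.MAX_VALUE));
        return props;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty(AUTO_COMMIT, "true");
        Properties overridden = clusteringOverrides(base, 1);
        System.out.println(overridden.getProperty(AUTO_COMMIT)); // false
        System.out.println(base.getProperty(AUTO_COMMIT));       // true
    }
}
```

Copying trades a little allocation for isolation: the caller's properties keep their original values, and only the rebuilt config sees the clustering-specific overrides.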
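The bulk insert receives a SingleFileHandleCreateFactory built from the one file group's ID, and the comment above the max-file-size override says the group is written through a single HoodieUnboundedCreateHandle. The idea of a factory that always targets one fixed file ID and refuses to open a second writer can be sketched in plain Java as follows. OneShotFileHandleFactory and its methods are hypothetical, this is not Hudi's implementation, and a String stands in for the real write handle.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; NOT Hudi's SingleFileHandleCreateFactory.
// Models a factory that always targets one fixed file ID and hands out at most one writer.
class OneShotFileHandleFactory {
    private final String fileId;
    private final AtomicBoolean handleCreated = new AtomicBoolean(false);

    OneShotFileHandleFactory(String fileId) {
        this.fileId = fileId;
    }

    // Every caller is steered to the same file ID, whatever prefix it asks for.
    String nextFileId(String ignoredPrefix) {
        return fileId;
    }

    // Returns a label standing in for a write handle; a second call fails
    // instead of silently opening another file.
    String createHandle(String partitionPath) {
        if (!handleCreated.compareAndSet(false, true)) {
            throw new IllegalStateException("handle for " + fileId + " already created");
        }
        return partitionPath + "/" + fileId;
    }

    public static void main(String[] args) {
        OneShotFileHandleFactory factory = new OneShotFileHandleFactory("fg-0001");
        System.out.println(factory.createHandle("2024/01/01")); // 2024/01/01/fg-0001
        try {
            factory.createHandle("2024/01/01");
        } catch (IllegalStateException e) {
            System.out.println("second handle rejected");
        }
    }
}
```

The AtomicBoolean makes the "only once" check safe even if several threads ask for a handle concurrently; with exactly one output group, as the method's guard enforces, a single handle is all the write should ever need.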
