
Example 6 with HoodieClusteringException

Use of org.apache.hudi.exception.HoodieClusteringException in the apache/hudi project.

From the class SparkSingleFileSortExecutionStrategy, method performClusteringWithRecordsRDD:

@Override
public HoodieData<WriteStatus> performClusteringWithRecordsRDD(HoodieData<HoodieRecord<T>> inputRecords,
                                                               int numOutputGroups,
                                                               String instantTime,
                                                               Map<String, String> strategyParams,
                                                               Schema schema,
                                                               List<HoodieFileGroupId> fileGroupIdList,
                                                               boolean preserveHoodieMetadata) {
    if (numOutputGroups != 1 || fileGroupIdList.size() != 1) {
        throw new HoodieClusteringException("Expect only one file group for strategy: " + getClass().getName());
    }
    LOG.info("Starting clustering for a group, parallelism:" + numOutputGroups + " commit:" + instantTime);
    Properties props = getWriteConfig().getProps();
    props.put(HoodieWriteConfig.BULKINSERT_PARALLELISM_VALUE.key(), String.valueOf(numOutputGroups));
    // We are calling another action executor - disable auto commit. Strategy is only expected to write data in new files.
    props.put(HoodieWriteConfig.AUTO_COMMIT_ENABLE.key(), Boolean.FALSE.toString());
    // Since clustering will write to single file group using HoodieUnboundedCreateHandle, set max file size to a large value.
    props.put(HoodieStorageConfig.PARQUET_MAX_FILE_SIZE.key(), String.valueOf(Long.MAX_VALUE));
    HoodieWriteConfig newConfig = HoodieWriteConfig.newBuilder().withProps(props).build();
    return (HoodieData<WriteStatus>) SparkBulkInsertHelper.newInstance().bulkInsert(
        inputRecords, instantTime, getHoodieTable(), newConfig,
        false, getPartitioner(strategyParams, schema), true, numOutputGroups,
        new SingleFileHandleCreateFactory(fileGroupIdList.get(0).getFileId(), preserveHoodieMetadata));
}
Also used: HoodieData (org.apache.hudi.common.data.HoodieData), HoodieClusteringException (org.apache.hudi.exception.HoodieClusteringException), HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig), Properties (java.util.Properties), SingleFileHandleCreateFactory (org.apache.hudi.io.SingleFileHandleCreateFactory)
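The pattern in this example is: validate the single-file-group precondition, then override the write-config properties (parallelism, auto-commit, max file size) before delegating to the bulk-insert helper. A minimal sketch of that pattern without any Hudi dependency follows; the class name is illustrative, the config key strings are taken from Hudi's documented write configs but should be treated as assumptions here, and IllegalStateException stands in for HoodieClusteringException:

```java
import java.util.List;
import java.util.Properties;

public class SingleGroupGuardDemo {

    // Mirrors the strategy's guard plus its property overrides.
    static Properties buildProps(int numOutputGroups, List<String> fileGroupIds) {
        // Guard: this strategy only supports exactly one output group / file group.
        if (numOutputGroups != 1 || fileGroupIds.size() != 1) {
            // Stand-in for HoodieClusteringException, which lives in Hudi itself.
            throw new IllegalStateException(
                "Expect only one file group for strategy: " + SingleGroupGuardDemo.class.getName());
        }
        Properties props = new Properties();
        // Match bulk-insert parallelism to the single output group (key string assumed from Hudi docs).
        props.put("hoodie.bulkinsert.shuffle.parallelism", String.valueOf(numOutputGroups));
        // Disable auto-commit: the strategy only writes new data files; the caller commits.
        props.put("hoodie.auto.commit", "false");
        // Raise the max file size so the single file group is never split.
        props.put("hoodie.parquet.max.file.size", String.valueOf(Long.MAX_VALUE));
        return props;
    }

    public static void main(String[] args) {
        Properties p = buildProps(1, List.of("fg-0001"));
        System.out.println(p.getProperty("hoodie.auto.commit")); // prints "false"
    }
}
```

The override-then-delegate shape keeps the base write config immutable: a fresh Properties (or, in the real code, a new HoodieWriteConfig built with withProps) carries the clustering-specific settings, so other writers sharing the original config are unaffected.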

Aggregations

HoodieClusteringException (org.apache.hudi.exception.HoodieClusteringException): 6
HoodieRecord (org.apache.hudi.common.model.HoodieRecord): 5
HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig): 5
IOException (java.io.IOException): 4
ArrayList (java.util.ArrayList): 4
Schema (org.apache.avro.Schema): 4
Path (org.apache.hadoop.fs.Path): 4
HoodieData (org.apache.hudi.common.data.HoodieData): 4
HoodieTable (org.apache.hudi.table.HoodieTable): 4
List (java.util.List): 3
Map (java.util.Map): 3
Collectors (java.util.stream.Collectors): 3
IndexedRecord (org.apache.avro.generic.IndexedRecord): 3
HoodieSparkEngineContext (org.apache.hudi.client.common.HoodieSparkEngineContext): 3
HoodieEngineContext (org.apache.hudi.common.engine.HoodieEngineContext): 3
HoodieKey (org.apache.hudi.common.model.HoodieKey): 3
HoodieRecordPayload (org.apache.hudi.common.model.HoodieRecordPayload): 3
Option (org.apache.hudi.common.util.Option): 3
Iterator (java.util.Iterator): 2
CompletableFuture (java.util.concurrent.CompletableFuture): 2