
Example 1 with FileIdPrefixProvider

Use of org.apache.hudi.table.FileIdPrefixProvider in the Apache Hudi project.

From the class JavaBulkInsertHelper, method bulkInsert.

@Override
public List<WriteStatus> bulkInsert(List<HoodieRecord<T>> inputRecords, String instantTime, HoodieTable<T, List<HoodieRecord<T>>, List<HoodieKey>, List<WriteStatus>> table, HoodieWriteConfig config, boolean performDedupe, Option<BulkInsertPartitioner> userDefinedBulkInsertPartitioner, boolean useWriterSchema, int parallelism, WriteHandleFactory writeHandleFactory) {
    // De-dupe/merge if needed
    List<HoodieRecord<T>> dedupedRecords = inputRecords;
    if (performDedupe) {
        dedupedRecords = (List<HoodieRecord<T>>) JavaWriteHelper.newInstance().combineOnCondition(config.shouldCombineBeforeInsert(), inputRecords, parallelism, table);
    }
    final List<HoodieRecord<T>> repartitionedRecords;
    BulkInsertPartitioner partitioner = userDefinedBulkInsertPartitioner.isPresent() ? userDefinedBulkInsertPartitioner.get() : JavaBulkInsertInternalPartitionerFactory.get(config.getBulkInsertSortMode());
    // Only List is supported by the Java partitioner, but the BulkInsertPartitioner API does not enforce this. TODO HUDI-3463: improve this.
    repartitionedRecords = (List<HoodieRecord<T>>) partitioner.repartitionRecords(dedupedRecords, parallelism);
    // Load the configured FileIdPrefixProvider implementation reflectively, passing it the write config properties
    FileIdPrefixProvider fileIdPrefixProvider = (FileIdPrefixProvider) ReflectionUtils.loadClass(config.getFileIdPrefixProviderClassName(), config.getProps());
    List<WriteStatus> writeStatuses = new ArrayList<>();
    // Write the records lazily; the provider supplies the file ID prefix (an empty partition path is passed here)
    new JavaLazyInsertIterable<>(repartitionedRecords.iterator(), true, config, instantTime, table, fileIdPrefixProvider.createFilePrefix(""), table.getTaskContextSupplier(), new CreateHandleFactory<>()).forEachRemaining(writeStatuses::addAll);
    return writeStatuses;
}
Also used: FileIdPrefixProvider (org.apache.hudi.table.FileIdPrefixProvider), HoodieRecord (org.apache.hudi.common.model.HoodieRecord), ArrayList (java.util.ArrayList), CreateHandleFactory (org.apache.hudi.io.CreateHandleFactory), BulkInsertPartitioner (org.apache.hudi.table.BulkInsertPartitioner), WriteStatus (org.apache.hudi.client.WriteStatus)
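
The method above picks its prefix provider by class name from configuration and instantiates it reflectively. The following is a minimal, self-contained sketch of that pattern using only the JDK. Every name in it (PrefixProviderSketch, PrefixProvider, FixedPrefixProvider, RandomPrefixProvider, and the "sketch.*" property keys) is a hypothetical stand-in and not part of Hudi, and the "prefix-N" file ID scheme is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.UUID;

// Hypothetical stand-ins for illustration only; none of these are Hudi classes.
class PrefixProviderSketch {

    /** Base type: implementations receive the configuration through their constructor. */
    public abstract static class PrefixProvider {
        protected final Properties props;

        public PrefixProvider(Properties props) {
            this.props = props;
        }

        public abstract String createFilePrefix(String partitionPath);
    }

    /** Returns a prefix read from the properties (assumed key: "sketch.prefix"). */
    public static class FixedPrefixProvider extends PrefixProvider {
        public FixedPrefixProvider(Properties props) {
            super(props);
        }

        @Override
        public String createFilePrefix(String partitionPath) {
            return props.getProperty("sketch.prefix", "file");
        }
    }

    /** Returns a fresh random prefix on every call. */
    public static class RandomPrefixProvider extends PrefixProvider {
        public RandomPrefixProvider(Properties props) {
            super(props);
        }

        @Override
        public String createFilePrefix(String partitionPath) {
            return UUID.randomUUID().toString();
        }
    }

    /** Instantiates the named class through its public (Properties) constructor. */
    static PrefixProvider load(String className, Properties props) {
        try {
            Class<?> clazz = Class.forName(className);
            return (PrefixProvider) clazz.getConstructor(Properties.class).newInstance(props);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not load provider " + className, e);
        }
    }

    /** Derives file IDs from one prefix by appending a counter (assumed scheme). */
    static List<String> fileIds(PrefixProvider provider, int count) {
        String prefix = provider.createFilePrefix("");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(prefix + "-" + i);
        }
        return ids;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("sketch.prefix", "batch42");
        // The implementation is chosen by class name, as a config value would be.
        props.setProperty("sketch.provider.class", FixedPrefixProvider.class.getName());

        PrefixProvider provider = load(props.getProperty("sketch.provider.class"), props);
        System.out.println(fileIds(provider, 3)); // prints [batch42-0, batch42-1, batch42-2]
    }
}
```

Swapping the configured class name for RandomPrefixProvider changes the prefix strategy without touching the calling code, which is the main reason to load the provider by name rather than construct it directly.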
