Example 1 with CreateHandleFactory

Use of org.apache.hudi.io.CreateHandleFactory in project hudi by apache.

From the class JavaSortAndSizeExecutionStrategy, the method performClusteringWithRecordList:

@Override
public List<WriteStatus> performClusteringWithRecordList(final List<HoodieRecord<T>> inputRecords,
                                                         final int numOutputGroups,
                                                         final String instantTime,
                                                         final Map<String, String> strategyParams,
                                                         final Schema schema,
                                                         final List<HoodieFileGroupId> fileGroupIdList,
                                                         final boolean preserveHoodieMetadata) {
    LOG.info("Starting clustering for a group, parallelism:" + numOutputGroups + " commit:" + instantTime);
    Properties props = getWriteConfig().getProps();
    // Match bulk-insert parallelism to the requested number of output groups.
    props.put(HoodieWriteConfig.BULKINSERT_PARALLELISM_VALUE.key(), String.valueOf(numOutputGroups));
    // We are calling another action executor, so disable auto commit; the strategy is only expected to write data to new files.
    props.put(HoodieWriteConfig.AUTO_COMMIT_ENABLE.key(), Boolean.FALSE.toString());
    // Cap new base files at the clustering target size rather than the normal write target.
    props.put(HoodieStorageConfig.PARQUET_MAX_FILE_SIZE.key(), String.valueOf(getWriteConfig().getClusteringTargetFileMaxBytes()));
    HoodieWriteConfig newConfig = HoodieWriteConfig.newBuilder().withProps(props).build();
    return (List<WriteStatus>) JavaBulkInsertHelper.newInstance().bulkInsert(
        inputRecords, instantTime, getHoodieTable(), newConfig, false,
        getPartitioner(strategyParams, schema), true, numOutputGroups,
        new CreateHandleFactory(preserveHoodieMetadata));
}
Also used: HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig), List (java.util.List), CreateHandleFactory (org.apache.hudi.io.CreateHandleFactory), Properties (java.util.Properties)
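
The heart of this strategy is the config-override pattern: copy the current props, retune them for clustering, and rebuild the config before delegating. A minimal sketch of that pattern in isolation; the helper name clusteringWriteConfig is hypothetical, and the HoodieStorageConfig import path is an assumption since it has moved between Hudi versions:

import java.util.Properties;
import org.apache.hudi.common.config.HoodieStorageConfig; // assumed package; differs across Hudi versions
import org.apache.hudi.config.HoodieWriteConfig;

// Hypothetical helper extracting the overrides performed above.
static HoodieWriteConfig clusteringWriteConfig(HoodieWriteConfig base, int numOutputGroups) {
    Properties props = base.getProps();
    // Match bulk-insert parallelism to the requested number of output groups.
    props.put(HoodieWriteConfig.BULKINSERT_PARALLELISM_VALUE.key(), String.valueOf(numOutputGroups));
    // The outer clustering action owns the commit; the inner executor must not auto-commit.
    props.put(HoodieWriteConfig.AUTO_COMMIT_ENABLE.key(), Boolean.FALSE.toString());
    // Cap new base files at the clustering target size rather than the normal write target.
    props.put(HoodieStorageConfig.PARQUET_MAX_FILE_SIZE.key(), String.valueOf(base.getClusteringTargetFileMaxBytes()));
    return HoodieWriteConfig.newBuilder().withProps(props).build();
}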

Example 2 with CreateHandleFactory

Use of org.apache.hudi.io.CreateHandleFactory in project hudi by apache.

From the class JavaBulkInsertHelper, the overload of bulkInsert that returns a List<WriteStatus>:

@Override
public List<WriteStatus> bulkInsert(List<HoodieRecord<T>> inputRecords,
                                    String instantTime,
                                    HoodieTable<T, List<HoodieRecord<T>>, List<HoodieKey>, List<WriteStatus>> table,
                                    HoodieWriteConfig config,
                                    boolean performDedupe,
                                    Option<BulkInsertPartitioner> userDefinedBulkInsertPartitioner,
                                    boolean useWriterSchema,
                                    int parallelism,
                                    WriteHandleFactory writeHandleFactory) {
    // De-dupe/merge if needed.
    List<HoodieRecord<T>> dedupedRecords = inputRecords;
    if (performDedupe) {
        dedupedRecords = (List<HoodieRecord<T>>) JavaWriteHelper.newInstance()
            .combineOnCondition(config.shouldCombineBeforeInsert(), inputRecords, parallelism, table);
    }
    // Prefer the user-supplied partitioner; otherwise derive one from the configured sort mode.
    final List<HoodieRecord<T>> repartitionedRecords;
    BulkInsertPartitioner partitioner = userDefinedBulkInsertPartitioner.isPresent()
        ? userDefinedBulkInsertPartitioner.get()
        : JavaBulkInsertInternalPartitionerFactory.get(config.getBulkInsertSortMode());
    // Only a List is supported by the Java partitioner, but this is not enforced by the BulkInsertPartitioner API (TODO HUDI-3463).
    repartitionedRecords = (List<HoodieRecord<T>>) partitioner.repartitionRecords(dedupedRecords, parallelism);
    FileIdPrefixProvider fileIdPrefixProvider = (FileIdPrefixProvider) ReflectionUtils.loadClass(
        config.getFileIdPrefixProviderClassName(), config.getProps());
    // Lazily write the partitioned records into new files, collecting a WriteStatus per handle.
    List<WriteStatus> writeStatuses = new ArrayList<>();
    new JavaLazyInsertIterable<>(repartitionedRecords.iterator(), true, config, instantTime, table,
        fileIdPrefixProvider.createFilePrefix(""), table.getTaskContextSupplier(), new CreateHandleFactory<>())
        .forEachRemaining(writeStatuses::addAll);
    return writeStatuses;
}
Also used: FileIdPrefixProvider (org.apache.hudi.table.FileIdPrefixProvider), HoodieRecord (org.apache.hudi.common.model.HoodieRecord), ArrayList (java.util.ArrayList), CreateHandleFactory (org.apache.hudi.io.CreateHandleFactory), BulkInsertPartitioner (org.apache.hudi.table.BulkInsertPartitioner), WriteStatus (org.apache.hudi.client.WriteStatus)
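
The Option<BulkInsertPartitioner> parameter above is the extension point for custom record ordering. As a minimal sketch, here is a hypothetical no-op partitioner for the Java engine; IdentityPartitioner is not part of Hudi, and arePartitionRecordsSorted is assumed to be the other method of the BulkInsertPartitioner contract in this version:

import java.util.List;
import org.apache.hudi.common.model.HoodieRecord;
import org.apache.hudi.table.BulkInsertPartitioner;

// Hypothetical partitioner: keeps the input order and does no redistribution.
public class IdentityPartitioner<T> implements BulkInsertPartitioner<List<HoodieRecord<T>>> {

    @Override
    public List<HoodieRecord<T>> repartitionRecords(List<HoodieRecord<T>> records, int outputPartitions) {
        return records; // the Java engine works on plain Lists, so there is nothing to shuffle
    }

    @Override
    public boolean arePartitionRecordsSorted() {
        return false; // records are handed over unsorted
    }
}

A caller would pass Option.of(new IdentityPartitioner<>()); with Option.empty(), the helper falls back to JavaBulkInsertInternalPartitionerFactory as shown above.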

Example 3 with CreateHandleFactory

Use of org.apache.hudi.io.CreateHandleFactory in project hudi by apache.

From the class JavaBulkInsertHelper, the overload of bulkInsert that returns HoodieWriteMetadata and delegates to the overload in Example 2:

@Override
public HoodieWriteMetadata<List<WriteStatus>> bulkInsert(final List<HoodieRecord<T>> inputRecords,
                                                         final String instantTime,
                                                         final HoodieTable<T, List<HoodieRecord<T>>, List<HoodieKey>, List<WriteStatus>> table,
                                                         final HoodieWriteConfig config,
                                                         final BaseCommitActionExecutor<T, List<HoodieRecord<T>>, List<HoodieKey>, List<WriteStatus>, R> executor,
                                                         final boolean performDedupe,
                                                         final Option<BulkInsertPartitioner> userDefinedBulkInsertPartitioner) {
    HoodieWriteMetadata result = new HoodieWriteMetadata();
    // It's possible the transition to inflight has already happened, so only transition if needed.
    if (!table.getActiveTimeline().filterInflights().containsInstant(instantTime)) {
        table.getActiveTimeline().transitionRequestedToInflight(
            new HoodieInstant(HoodieInstant.State.REQUESTED, table.getMetaClient().getCommitActionType(), instantTime),
            Option.empty(), config.shouldAllowMultiWriteOnSameInstant());
    }
    // Write the new files, then update the index and commit if needed.
    List<WriteStatus> writeStatuses = bulkInsert(inputRecords, instantTime, table, config, performDedupe,
        userDefinedBulkInsertPartitioner, false, config.getBulkInsertShuffleParallelism(), new CreateHandleFactory(false));
    ((BaseJavaCommitActionExecutor) executor).updateIndexAndCommitIfNeeded(writeStatuses, result);
    return result;
}
Also used: HoodieInstant (org.apache.hudi.common.table.timeline.HoodieInstant), HoodieWriteMetadata (org.apache.hudi.table.action.HoodieWriteMetadata), CreateHandleFactory (org.apache.hudi.io.CreateHandleFactory), WriteStatus (org.apache.hudi.client.WriteStatus)
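
Note the guard around the timeline transition: it makes this overload safe to re-enter when the instant was already moved to inflight (the Spark variant in Example 5 transitions unconditionally). A minimal sketch of the guard as a standalone helper; the method name is hypothetical, but every call inside it appears verbatim above:

import org.apache.hudi.common.table.timeline.HoodieInstant;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.config.HoodieWriteConfig;
import org.apache.hudi.table.HoodieTable;

// Hypothetical helper: move REQUESTED -> INFLIGHT only if no other writer
// has already transitioned this instant.
static void transitionToInflightIfNotAlready(HoodieTable<?, ?, ?, ?> table, HoodieWriteConfig config, String instantTime) {
    if (!table.getActiveTimeline().filterInflights().containsInstant(instantTime)) {
        table.getActiveTimeline().transitionRequestedToInflight(
            new HoodieInstant(HoodieInstant.State.REQUESTED, table.getMetaClient().getCommitActionType(), instantTime),
            Option.empty(), config.shouldAllowMultiWriteOnSameInstant());
    }
}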

Example 4 with CreateHandleFactory

Use of org.apache.hudi.io.CreateHandleFactory in project hudi by apache.

From the class SparkSortAndSizeExecutionStrategy, the method performClusteringWithRecordsRDD:

@Override
public HoodieData<WriteStatus> performClusteringWithRecordsRDD(final HoodieData<HoodieRecord<T>> inputRecords,
                                                               final int numOutputGroups,
                                                               final String instantTime,
                                                               final Map<String, String> strategyParams,
                                                               final Schema schema,
                                                               final List<HoodieFileGroupId> fileGroupIdList,
                                                               final boolean preserveHoodieMetadata) {
    LOG.info("Starting clustering for a group, parallelism:" + numOutputGroups + " commit:" + instantTime);
    Properties props = getWriteConfig().getProps();
    // Match bulk-insert parallelism to the requested number of output groups.
    props.put(HoodieWriteConfig.BULKINSERT_PARALLELISM_VALUE.key(), String.valueOf(numOutputGroups));
    // We are calling another action executor, so disable auto commit; the strategy is only expected to write data to new files.
    props.put(HoodieWriteConfig.AUTO_COMMIT_ENABLE.key(), Boolean.FALSE.toString());
    // Cap new base files at the clustering target size rather than the normal write target.
    props.put(HoodieStorageConfig.PARQUET_MAX_FILE_SIZE.key(), String.valueOf(getWriteConfig().getClusteringTargetFileMaxBytes()));
    HoodieWriteConfig newConfig = HoodieWriteConfig.newBuilder().withProps(props).build();
    return (HoodieData<WriteStatus>) SparkBulkInsertHelper.newInstance().bulkInsert(
        inputRecords, instantTime, getHoodieTable(), newConfig, false,
        getPartitioner(strategyParams, schema), true, numOutputGroups,
        new CreateHandleFactory(preserveHoodieMetadata));
}
Also used: HoodieData (org.apache.hudi.common.data.HoodieData), HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig), CreateHandleFactory (org.apache.hudi.io.CreateHandleFactory), Properties (java.util.Properties)
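
Apart from the input and return types, this mirrors Example 1: the Spark strategy traffics in the engine-agnostic HoodieData rather than a List. A minimal sketch, assuming the HoodieJavaRDD adapter from Hudi's Spark client module, of how a JavaRDD would be bridged to the HoodieData parameter above:

import org.apache.hudi.common.data.HoodieData;
import org.apache.hudi.common.model.HoodieRecord;
import org.apache.hudi.data.HoodieJavaRDD; // assumed adapter from the hudi-spark-client module
import org.apache.spark.api.java.JavaRDD;

// Wrap an engine-specific JavaRDD so engine-agnostic code such as
// performClusteringWithRecordsRDD can consume it.
static <T> HoodieData<HoodieRecord<T>> toHoodieData(JavaRDD<HoodieRecord<T>> records) {
    return HoodieJavaRDD.of(records);
}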

Example 5 with CreateHandleFactory

Use of org.apache.hudi.io.CreateHandleFactory in project hudi by apache.

From the class SparkBulkInsertHelper, the HoodieWriteMetadata overload of the method bulkInsert:

@Override
public HoodieWriteMetadata<HoodieData<WriteStatus>> bulkInsert(final HoodieData<HoodieRecord<T>> inputRecords,
                                                               final String instantTime,
                                                               final HoodieTable<T, HoodieData<HoodieRecord<T>>, HoodieData<HoodieKey>, HoodieData<WriteStatus>> table,
                                                               final HoodieWriteConfig config,
                                                               final BaseCommitActionExecutor<T, HoodieData<HoodieRecord<T>>, HoodieData<HoodieKey>, HoodieData<WriteStatus>, R> executor,
                                                               final boolean performDedupe,
                                                               final Option<BulkInsertPartitioner> userDefinedBulkInsertPartitioner) {
    HoodieWriteMetadata result = new HoodieWriteMetadata();
    // Transition the requested bulk_insert instant to inflight (done unconditionally here, unlike the guarded Java variant in Example 3).
    table.getActiveTimeline().transitionRequestedToInflight(
        new HoodieInstant(HoodieInstant.State.REQUESTED, executor.getCommitActionType(), instantTime),
        Option.empty(), config.shouldAllowMultiWriteOnSameInstant());
    // Write the new files.
    HoodieData<WriteStatus> writeStatuses = bulkInsert(inputRecords, instantTime, table, config, performDedupe,
        userDefinedBulkInsertPartitioner, false, config.getBulkInsertShuffleParallelism(), new CreateHandleFactory(false));
    // Update the index and commit if needed.
    ((BaseSparkCommitActionExecutor) executor).updateIndexAndCommitIfNeeded(writeStatuses, result);
    return result;
}
Also used: HoodieInstant (org.apache.hudi.common.table.timeline.HoodieInstant), HoodieWriteMetadata (org.apache.hudi.table.action.HoodieWriteMetadata), CreateHandleFactory (org.apache.hudi.io.CreateHandleFactory), WriteStatus (org.apache.hudi.client.WriteStatus)
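
Across the five examples, CreateHandleFactory is constructed in three ways. A minimal recap in one hypothetical method, keeping the raw types as they appear in the source; the effect attributed to the boolean is inferred from the preserveHoodieMetadata parameter that feeds it:

import org.apache.hudi.io.CreateHandleFactory;
import org.apache.hudi.io.WriteHandleFactory;

// Hypothetical summary of the constructor variants seen above.
static void createHandleFactoryVariants(boolean preserveHoodieMetadata) {
    WriteHandleFactory clustering = new CreateHandleFactory(preserveHoodieMetadata); // Examples 1 and 4: forward the strategy's flag
    WriteHandleFactory bulkInsert = new CreateHandleFactory(false);                  // Examples 3 and 5: metadata is regenerated
    WriteHandleFactory byDefault = new CreateHandleFactory<>();                      // Example 2: default construction inside the helper
}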

Aggregations

CreateHandleFactory (org.apache.hudi.io.CreateHandleFactory): 5 uses
WriteStatus (org.apache.hudi.client.WriteStatus): 3 uses
Properties (java.util.Properties): 2 uses
HoodieInstant (org.apache.hudi.common.table.timeline.HoodieInstant): 2 uses
HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig): 2 uses
HoodieWriteMetadata (org.apache.hudi.table.action.HoodieWriteMetadata): 2 uses
ArrayList (java.util.ArrayList): 1 use
List (java.util.List): 1 use
HoodieData (org.apache.hudi.common.data.HoodieData): 1 use
HoodieRecord (org.apache.hudi.common.model.HoodieRecord): 1 use
BulkInsertPartitioner (org.apache.hudi.table.BulkInsertPartitioner): 1 use
FileIdPrefixProvider (org.apache.hudi.table.FileIdPrefixProvider): 1 use