
Example 26 with HoodieWriteMetadata

Use of org.apache.hudi.table.action.HoodieWriteMetadata in project hudi by apache.

From the class SparkBulkInsertHelper, the method bulkInsert:

@Override
public HoodieWriteMetadata<HoodieData<WriteStatus>> bulkInsert(final HoodieData<HoodieRecord<T>> inputRecords,
                                                               final String instantTime,
                                                               final HoodieTable<T, HoodieData<HoodieRecord<T>>, HoodieData<HoodieKey>, HoodieData<WriteStatus>> table,
                                                               final HoodieWriteConfig config,
                                                               final BaseCommitActionExecutor<T, HoodieData<HoodieRecord<T>>, HoodieData<HoodieKey>, HoodieData<WriteStatus>, R> executor,
                                                               final boolean performDedupe,
                                                               final Option<BulkInsertPartitioner> userDefinedBulkInsertPartitioner) {
    HoodieWriteMetadata<HoodieData<WriteStatus>> result = new HoodieWriteMetadata<>();
    // transition the requested bulk_insert instant to inflight on the active timeline
    table.getActiveTimeline().transitionRequestedToInflight(
        new HoodieInstant(HoodieInstant.State.REQUESTED, executor.getCommitActionType(), instantTime),
        Option.empty(), config.shouldAllowMultiWriteOnSameInstant());
    // write the new base files, optionally deduplicating and repartitioning the input first
    HoodieData<WriteStatus> writeStatuses = bulkInsert(inputRecords, instantTime, table, config,
        performDedupe, userDefinedBulkInsertPartitioner, false,
        config.getBulkInsertShuffleParallelism(), new CreateHandleFactory(false));
    // update the index with the write statuses and commit if needed
    ((BaseSparkCommitActionExecutor) executor).updateIndexAndCommitIfNeeded(writeStatuses, result);
    return result;
}
Also used : HoodieInstant(org.apache.hudi.common.table.timeline.HoodieInstant) HoodieWriteMetadata(org.apache.hudi.table.action.HoodieWriteMetadata) CreateHandleFactory(org.apache.hudi.io.CreateHandleFactory) WriteStatus(org.apache.hudi.client.WriteStatus)
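
For context, here is a minimal sketch of how a commit action executor might delegate to this helper. The wiring (the inputRecords, instantTime, table, config, and executor variables, and the use of SparkBulkInsertHelper.newInstance()) is modeled on Hudi's bulk-insert executors but is an illustrative assumption, not the exact upstream call site:

import org.apache.hudi.client.WriteStatus;
import org.apache.hudi.common.data.HoodieData;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.table.action.HoodieWriteMetadata;

// Hypothetical call site, modeled on Hudi's SparkBulkInsertCommitActionExecutor.
// inputRecords, instantTime, table, config, and executor are assumed to be
// provided by the surrounding executor.
HoodieWriteMetadata<HoodieData<WriteStatus>> writeMetadata =
    SparkBulkInsertHelper.newInstance().bulkInsert(
        inputRecords,    // records to write as new base files
        instantTime,     // instant this bulk_insert commits under
        table,           // target HoodieTable
        config,          // HoodieWriteConfig controlling parallelism, dedupe, etc.
        executor,        // BaseSparkCommitActionExecutor driving the action
        true,            // performDedupe: drop duplicate keys before writing
        Option.empty()); // no user-defined BulkInsertPartitioner; use the config default

Passing Option.empty() for the partitioner means the helper falls back to the partitioner configured in the write config, so the caller only supplies one when it needs a custom sort or layout.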

Example 27 with HoodieWriteMetadata

Use of org.apache.hudi.table.action.HoodieWriteMetadata in project hudi by apache.

From the class SparkValidatorUtils, the method getRecordsFromPendingCommits:

/**
 * Get the records from the partitions modified by this write, including any inflight commits.
 * Note that this only works for COW tables.
 */
public static Dataset<Row> getRecordsFromPendingCommits(SQLContext sqlContext,
                                                        Set<String> partitionsAffected,
                                                        HoodieWriteMetadata<HoodieData<WriteStatus>> writeMetadata,
                                                        HoodieTable table,
                                                        String instantTime) {
    // build a file system view that also includes the pending (inflight) commit
    HoodieTablePreCommitFileSystemView fsView = new HoodieTablePreCommitFileSystemView(
        table.getMetaClient(), table.getHoodieView(), writeMetadata.getWriteStats().get(),
        writeMetadata.getPartitionToReplaceFileIds(), instantTime);
    // collect the latest base file of every file group in each affected partition
    List<String> newFiles = partitionsAffected.stream()
        .flatMap(partition -> fsView.getLatestBaseFiles(partition).map(BaseFile::getPath))
        .collect(Collectors.toList());
    if (newFiles.isEmpty()) {
        return sqlContext.emptyDataFrame();
    }
    return readRecordsForBaseFiles(sqlContext, newFiles);
}
Also used : HoodieTable(org.apache.hudi.table.HoodieTable) Arrays(java.util.Arrays) Dataset(org.apache.spark.sql.Dataset) CompletableFuture(java.util.concurrent.CompletableFuture) HoodieEngineContext(org.apache.hudi.common.engine.HoodieEngineContext) HoodieValidationException(org.apache.hudi.exception.HoodieValidationException) BaseSparkCommitActionExecutor(org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor) Logger(org.apache.log4j.Logger) StringUtils(org.apache.hudi.common.util.StringUtils) HoodieSparkTable(org.apache.hudi.table.HoodieSparkTable) BaseFile(org.apache.hudi.common.model.BaseFile) HoodieSparkEngineContext(org.apache.hudi.client.common.HoodieSparkEngineContext) HoodieWriteMetadata(org.apache.hudi.table.action.HoodieWriteMetadata) HoodieData(org.apache.hudi.common.data.HoodieData) HoodieWriteConfig(org.apache.hudi.config.HoodieWriteConfig) SQLContext(org.apache.spark.sql.SQLContext) Set(java.util.Set) Row(org.apache.spark.sql.Row) Collectors(java.util.stream.Collectors) WriteStatus(org.apache.hudi.client.WriteStatus) SparkPreCommitValidator(org.apache.hudi.client.validator.SparkPreCommitValidator) List(java.util.List) Stream(java.util.stream.Stream) HoodieTablePreCommitFileSystemView(org.apache.hudi.common.table.view.HoodieTablePreCommitFileSystemView) JavaConverters(scala.collection.JavaConverters) ReflectionUtils(org.apache.hudi.common.util.ReflectionUtils) LogManager(org.apache.log4j.LogManager)
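
As a usage sketch, a pre-commit validator could derive the affected partitions from the write metadata and then read back what the inflight commit wrote. The variable names below (sqlContext, table, writeMetadata, instantTime) are assumptions about the caller's state, not a specific Hudi validator:

import java.util.Set;
import java.util.stream.Collectors;
import org.apache.hudi.common.model.HoodieWriteStat;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Illustrative sketch: derive the partitions touched by the pending write,
// then read back the records written by the inflight commit.
Set<String> partitionsAffected = writeMetadata.getWriteStats().get().stream()
    .map(HoodieWriteStat::getPartitionPath)
    .collect(Collectors.toSet());
Dataset<Row> pendingRows = SparkValidatorUtils.getRecordsFromPendingCommits(
    sqlContext, partitionsAffected, writeMetadata, table, instantTime);
// a validator would now compare pendingRows against the pre-commit table state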

Aggregations

HoodieWriteMetadata (org.apache.hudi.table.action.HoodieWriteMetadata): 27 uses
WriteStatus (org.apache.hudi.client.WriteStatus): 23 uses
List (java.util.List): 20 uses
HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig): 16 uses
Collectors (java.util.stream.Collectors): 15 uses
HoodieRecord (org.apache.hudi.common.model.HoodieRecord): 15 uses
HoodieInstant (org.apache.hudi.common.table.timeline.HoodieInstant): 14 uses
HoodieTable (org.apache.hudi.table.HoodieTable): 14 uses
IOException (java.io.IOException): 12 uses
HoodieTableMetaClient (org.apache.hudi.common.table.HoodieTableMetaClient): 12 uses
JavaRDD (org.apache.spark.api.java.JavaRDD): 12 uses
HoodieData (org.apache.hudi.common.data.HoodieData): 11 uses
HoodieTimeline (org.apache.hudi.common.table.timeline.HoodieTimeline): 11 uses
Option (org.apache.hudi.common.util.Option): 11 uses
Path (org.apache.hadoop.fs.Path): 10 uses
HoodieSparkTable (org.apache.hudi.table.HoodieSparkTable): 10 uses
HashMap (java.util.HashMap): 9 uses
Map (java.util.Map): 9 uses
Stream (java.util.stream.Stream): 9 uses
HoodieKey (org.apache.hudi.common.model.HoodieKey): 9 uses