Example 51 with HoodieWriteStat

Use of org.apache.hudi.common.model.HoodieWriteStat in project hudi by apache.

The class HoodieMergeHandle, method close:

@Override
public List<WriteStatus> close() {
    try {
        writeIncomingRecords();
        if (keyToNewRecords instanceof ExternalSpillableMap) {
            ((ExternalSpillableMap) keyToNewRecords).close();
        } else {
            keyToNewRecords.clear();
        }
        writtenRecordKeys.clear();
        if (fileWriter != null) {
            fileWriter.close();
            fileWriter = null;
        }
        long fileSizeInBytes = FSUtils.getFileSize(fs, newFilePath);
        HoodieWriteStat stat = writeStatus.getStat();
        stat.setTotalWriteBytes(fileSizeInBytes);
        stat.setFileSizeInBytes(fileSizeInBytes);
        stat.setNumWrites(recordsWritten);
        stat.setNumDeletes(recordsDeleted);
        stat.setNumUpdateWrites(updatedRecordsWritten);
        stat.setNumInserts(insertRecordsWritten);
        stat.setTotalWriteErrors(writeStatus.getTotalErrorRecords());
        RuntimeStats runtimeStats = new RuntimeStats();
        runtimeStats.setTotalUpsertTime(timer.endTimer());
        stat.setRuntimeStats(runtimeStats);
        performMergeDataValidationCheck(writeStatus);
        LOG.info(String.format("MergeHandle for partitionPath %s fileID %s, took %d ms.", stat.getPartitionPath(), stat.getFileId(), runtimeStats.getTotalUpsertTime()));
        return Collections.singletonList(writeStatus);
    } catch (IOException e) {
        throw new HoodieUpsertException("Failed to close UpdateHandle", e);
    }
}
Also used : HoodieWriteStat(org.apache.hudi.common.model.HoodieWriteStat) HoodieUpsertException(org.apache.hudi.exception.HoodieUpsertException) ExternalSpillableMap(org.apache.hudi.common.util.collection.ExternalSpillableMap) RuntimeStats(org.apache.hudi.common.model.HoodieWriteStat.RuntimeStats) IOException(java.io.IOException) HoodieIOException(org.apache.hudi.exception.HoodieIOException)
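Stripped of Hudi-specific types, the close() pattern above amounts to: flush pending writes, then snapshot accumulated counters into a per-file stat object. The sketch below uses a hypothetical WriteStat stand-in, not Hudi's real HoodieWriteStat or HoodieMergeHandle:

```java
// Simplified stand-in for the HoodieMergeHandle.close() pattern:
// accumulate counters while writing, then snapshot them into a stat object.
public class MergeHandleSketch {

    // Hypothetical, minimal analogue of HoodieWriteStat.
    public static final class WriteStat {
        public long numWrites;
        public long numDeletes;
        public long totalWriteBytes;
    }

    private long recordsWritten;
    private long recordsDeleted;
    private long bytesWritten;

    public void write(String record) {
        recordsWritten++;
        bytesWritten += record.length();
    }

    public void delete() {
        recordsDeleted++;
    }

    // close() snapshots the counters into the stat, mirroring the
    // stat.setNumWrites(...) / stat.setTotalWriteBytes(...) calls above.
    public WriteStat close() {
        WriteStat stat = new WriteStat();
        stat.numWrites = recordsWritten;
        stat.numDeletes = recordsDeleted;
        stat.totalWriteBytes = bytesWritten;
        return stat;
    }

    public static void main(String[] args) {
        MergeHandleSketch handle = new MergeHandleSketch();
        handle.write("abc");
        handle.write("de");
        handle.delete();
        WriteStat stat = handle.close();
        System.out.println(stat.numWrites + " " + stat.numDeletes + " " + stat.totalWriteBytes);
    }
}
```

Note that in the real handle the stat object is fetched from the WriteStatus rather than constructed in close(), but the "count during writes, publish on close" shape is the same.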

Example 52 with HoodieWriteStat

Use of org.apache.hudi.common.model.HoodieWriteStat in project hudi by apache.

The class IncrementalTimelineSyncFileSystemView, method updatePartitionWriteFileGroups:

private void updatePartitionWriteFileGroups(Map<String, List<HoodieWriteStat>> partitionToWriteStats, HoodieTimeline timeline, HoodieInstant instant) {
    partitionToWriteStats.entrySet().stream().forEach(entry -> {
        String partition = entry.getKey();
        if (isPartitionAvailableInStore(partition)) {
            LOG.info("Syncing partition (" + partition + ") of instant (" + instant + ")");
            FileStatus[] statuses = entry.getValue().stream().map(p -> {
                FileStatus status = new FileStatus(p.getFileSizeInBytes(), false, 0, 0, 0, 0, null, null, null, new Path(String.format("%s/%s", metaClient.getBasePath(), p.getPath())));
                return status;
            }).toArray(FileStatus[]::new);
            List<HoodieFileGroup> fileGroups = buildFileGroups(statuses, timeline.filterCompletedAndCompactionInstants(), false);
            applyDeltaFileSlicesToPartitionView(partition, fileGroups, DeltaApplyMode.ADD);
        } else {
            LOG.warn("Skipping partition (" + partition + ") when syncing instant (" + instant + ") as it is not loaded");
        }
    });
    LOG.info("Done Syncing committed instant (" + instant + ")");
}
Also used : HoodieInstant(org.apache.hudi.common.table.timeline.HoodieInstant) FileSlice(org.apache.hudi.common.model.FileSlice) TimelineDiffHelper(org.apache.hudi.common.table.timeline.TimelineDiffHelper) HoodieException(org.apache.hudi.exception.HoodieException) Option(org.apache.hudi.common.util.Option) FileStatus(org.apache.hadoop.fs.FileStatus) Logger(org.apache.log4j.Logger) HoodieFileGroup(org.apache.hudi.common.model.HoodieFileGroup) CleanerUtils(org.apache.hudi.common.util.CleanerUtils) Map(java.util.Map) HoodieRollbackMetadata(org.apache.hudi.avro.model.HoodieRollbackMetadata) Path(org.apache.hadoop.fs.Path) HoodieLogFile(org.apache.hudi.common.model.HoodieLogFile) HoodieFileGroupId(org.apache.hudi.common.model.HoodieFileGroupId) HoodieTimeline(org.apache.hudi.common.table.timeline.HoodieTimeline) Set(java.util.Set) HoodieCommitMetadata(org.apache.hudi.common.model.HoodieCommitMetadata) TimelineMetadataUtils(org.apache.hudi.common.table.timeline.TimelineMetadataUtils) IOException(java.io.IOException) Collectors(java.util.stream.Collectors) CompactionOperation(org.apache.hudi.common.model.CompactionOperation) HoodieReplaceCommitMetadata(org.apache.hudi.common.model.HoodieReplaceCommitMetadata) HoodieBaseFile(org.apache.hudi.common.model.HoodieBaseFile) List(java.util.List) HoodieCleanMetadata(org.apache.hudi.avro.model.HoodieCleanMetadata) TimelineDiffResult(org.apache.hudi.common.table.timeline.TimelineDiffHelper.TimelineDiffResult) HoodieWriteStat(org.apache.hudi.common.model.HoodieWriteStat) HoodieCompactionPlan(org.apache.hudi.avro.model.HoodieCompactionPlan) HoodieRestoreMetadata(org.apache.hudi.avro.model.HoodieRestoreMetadata) LogManager(org.apache.log4j.LogManager) FSUtils(org.apache.hudi.common.fs.FSUtils) CompactionUtils(org.apache.hudi.common.util.CompactionUtils) Pair(org.apache.hudi.common.util.collection.Pair)
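The mapping step above rebuilds each file's absolute location by joining the table base path with the relative path stored in the write stat, via String.format("%s/%s", ...). That step in isolation, with a hypothetical base path and relative paths, looks like this:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the path-resolution step in updatePartitionWriteFileGroups:
// each write stat stores a path relative to the table base path, and the
// view rebuilds the absolute path before constructing FileStatus objects.
public class PathResolutionSketch {

    public static List<String> resolve(String basePath, List<String> relativePaths) {
        return relativePaths.stream()
                .map(p -> String.format("%s/%s", basePath, p))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "/data/hudi/table" and the file names are made-up values.
        List<String> resolved = resolve("/data/hudi/table",
                List.of("2024/01/01/file-1.parquet", "2024/01/02/file-2.parquet"));
        resolved.forEach(System.out::println);
    }
}
```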

Example 53 with HoodieWriteStat

Use of org.apache.hudi.common.model.HoodieWriteStat in project hudi by apache.

The class HiveTestUtil, method createLogFiles:

private static HoodieCommitMetadata createLogFiles(Map<String, List<HoodieWriteStat>> partitionWriteStats, boolean isLogSchemaSimple, boolean useSchemaFromCommitMetadata) throws InterruptedException, IOException, URISyntaxException {
    HoodieCommitMetadata commitMetadata = new HoodieCommitMetadata();
    for (Entry<String, List<HoodieWriteStat>> wEntry : partitionWriteStats.entrySet()) {
        String partitionPath = wEntry.getKey();
        for (HoodieWriteStat wStat : wEntry.getValue()) {
            Path path = new Path(wStat.getPath());
            HoodieBaseFile dataFile = new HoodieBaseFile(fileSystem.getFileStatus(path));
            HoodieLogFile logFile = generateLogData(path, isLogSchemaSimple);
            HoodieDeltaWriteStat writeStat = new HoodieDeltaWriteStat();
            writeStat.setFileId(dataFile.getFileId());
            writeStat.setPath(logFile.getPath().toString());
            commitMetadata.addWriteStat(partitionPath, writeStat);
        }
    }
    addSchemaToCommitMetadata(commitMetadata, isLogSchemaSimple, useSchemaFromCommitMetadata);
    return commitMetadata;
}
Also used : HoodieCommitMetadata(org.apache.hudi.common.model.HoodieCommitMetadata) Path(org.apache.hadoop.fs.Path) HoodieWriteStat(org.apache.hudi.common.model.HoodieWriteStat) HoodieBaseFile(org.apache.hudi.common.model.HoodieBaseFile) HoodieDeltaWriteStat(org.apache.hudi.common.model.HoodieDeltaWriteStat) List(java.util.List) ArrayList(java.util.ArrayList) HoodieLogFile(org.apache.hudi.common.model.HoodieLogFile)
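The repeated addWriteStat calls above accumulate per-file stats under their partition path inside the commit metadata. A minimal stand-in for that grouping behavior (not Hudi's real HoodieCommitMetadata, which stores full stat objects) can be sketched as:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal analogue of HoodieCommitMetadata.addWriteStat: group per-file
// write stats (here represented just by their paths) under a partition key.
public class CommitMetadataSketch {

    private final Map<String, List<String>> partitionToWriteStats = new HashMap<>();

    public void addWriteStat(String partitionPath, String statPath) {
        partitionToWriteStats
                .computeIfAbsent(partitionPath, k -> new ArrayList<>())
                .add(statPath);
    }

    public Map<String, List<String>> getPartitionToWriteStats() {
        return partitionToWriteStats;
    }

    public static void main(String[] args) {
        CommitMetadataSketch metadata = new CommitMetadataSketch();
        metadata.addWriteStat("2024/01/01", "2024/01/01/log-1");
        metadata.addWriteStat("2024/01/01", "2024/01/01/log-2");
        metadata.addWriteStat("2024/01/02", "2024/01/02/log-1");
        System.out.println(metadata.getPartitionToWriteStats().get("2024/01/01").size());
    }
}
```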

Example 54 with HoodieWriteStat

Use of org.apache.hudi.common.model.HoodieWriteStat in project hudi by apache.

The class HiveTestUtil, method createTestData:

private static List<HoodieWriteStat> createTestData(Path partPath, boolean isParquetSchemaSimple, String instantTime) throws IOException, URISyntaxException {
    List<HoodieWriteStat> writeStats = new ArrayList<>();
    for (int i = 0; i < 5; i++) {
        // Create 5 files
        String fileId = UUID.randomUUID().toString();
        Path filePath = new Path(partPath.toString() + "/" + FSUtils.makeDataFileName(instantTime, "1-0-1", fileId));
        generateParquetData(filePath, isParquetSchemaSimple);
        HoodieWriteStat writeStat = new HoodieWriteStat();
        writeStat.setFileId(fileId);
        writeStat.setPath(filePath.toString());
        writeStats.add(writeStat);
    }
    return writeStats;
}
Also used : Path(org.apache.hadoop.fs.Path) HoodieWriteStat(org.apache.hudi.common.model.HoodieWriteStat) ArrayList(java.util.ArrayList)
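The loop above manufactures one write stat per generated file, each with a fresh UUID file id and a path under the partition directory. The same shape, using a hypothetical local Stat record instead of HoodieWriteStat (and a plain ".parquet" suffix rather than FSUtils.makeDataFileName's real naming scheme), is:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Sketch of createTestData's shape: one stat per generated file, each with
// a unique UUID file id and a path built under the partition directory.
public class TestDataSketch {

    // Hypothetical stand-in for HoodieWriteStat's (fileId, path) pair.
    public record Stat(String fileId, String path) {}

    public static List<Stat> createStats(String partPath, int count) {
        List<Stat> stats = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String fileId = UUID.randomUUID().toString();
            stats.add(new Stat(fileId, partPath + "/" + fileId + ".parquet"));
        }
        return stats;
    }

    public static void main(String[] args) {
        List<Stat> stats = createStats("/tmp/part", 5);
        System.out.println(stats.size());
    }
}
```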

Example 55 with HoodieWriteStat

Use of org.apache.hudi.common.model.HoodieWriteStat in project hudi by apache.

The class HiveTestUtil, method createPartitions:

private static HoodieCommitMetadata createPartitions(int numberOfPartitions, boolean isParquetSchemaSimple, boolean useSchemaFromCommitMetadata, ZonedDateTime startFrom, String instantTime, String basePath) throws IOException, URISyntaxException {
    startFrom = startFrom.truncatedTo(ChronoUnit.DAYS);
    HoodieCommitMetadata commitMetadata = new HoodieCommitMetadata();
    for (int i = 0; i < numberOfPartitions; i++) {
        String partitionPath = startFrom.format(dtfOut);
        Path partPath = new Path(basePath + "/" + partitionPath);
        fileSystem.makeQualified(partPath);
        fileSystem.mkdirs(partPath);
        List<HoodieWriteStat> writeStats = createTestData(partPath, isParquetSchemaSimple, instantTime);
        startFrom = startFrom.minusDays(1);
        writeStats.forEach(s -> commitMetadata.addWriteStat(partitionPath, s));
    }
    addSchemaToCommitMetadata(commitMetadata, isParquetSchemaSimple, useSchemaFromCommitMetadata);
    return commitMetadata;
}
Also used : HoodieCommitMetadata(org.apache.hudi.common.model.HoodieCommitMetadata) Path(org.apache.hadoop.fs.Path) HoodieWriteStat(org.apache.hudi.common.model.HoodieWriteStat)
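The partition loop above walks backwards one calendar day per partition from a day-truncated start. The date arithmetic alone is straightforward java.time code; note the "yyyy/MM/dd" pattern below is an assumption, since HiveTestUtil's dtfOut formatter is not shown in the snippet:

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Sketch of createPartitions' date walk: truncate to the day boundary,
// then emit one partition path per day, stepping backwards with minusDays.
public class PartitionDatesSketch {

    // Hypothetical formatter; the real dtfOut pattern may differ.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    public static List<String> partitionPaths(ZonedDateTime startFrom, int numberOfPartitions) {
        startFrom = startFrom.truncatedTo(ChronoUnit.DAYS);
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < numberOfPartitions; i++) {
            paths.add(startFrom.format(FMT));
            startFrom = startFrom.minusDays(1);
        }
        return paths;
    }

    public static void main(String[] args) {
        ZonedDateTime start = ZonedDateTime.of(2024, 1, 3, 10, 30, 0, 0, ZoneOffset.UTC);
        partitionPaths(start, 3).forEach(System.out::println);
    }
}
```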

Aggregations

HoodieWriteStat (org.apache.hudi.common.model.HoodieWriteStat): 74
HoodieCommitMetadata (org.apache.hudi.common.model.HoodieCommitMetadata): 42
List (java.util.List): 38
ArrayList (java.util.ArrayList): 33
HashMap (java.util.HashMap): 32
Map (java.util.Map): 32
Path (org.apache.hadoop.fs.Path): 28
HoodieInstant (org.apache.hudi.common.table.timeline.HoodieInstant): 24
HoodieTimeline (org.apache.hudi.common.table.timeline.HoodieTimeline): 23
IOException (java.io.IOException): 22
Option (org.apache.hudi.common.util.Option): 19
Collectors (java.util.stream.Collectors): 18
HoodieTableMetaClient (org.apache.hudi.common.table.HoodieTableMetaClient): 18
WriteStatus (org.apache.hudi.client.WriteStatus): 17
HoodieReplaceCommitMetadata (org.apache.hudi.common.model.HoodieReplaceCommitMetadata): 17
LogManager (org.apache.log4j.LogManager): 16
Logger (org.apache.log4j.Logger): 16
HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig): 15
FileSlice (org.apache.hudi.common.model.FileSlice): 14
HoodieRecord (org.apache.hudi.common.model.HoodieRecord): 14