Search in sources:

Example 1 with JavaTaskContextSupplier

Use of org.apache.hudi.client.common.JavaTaskContextSupplier in project hudi by apache.

In class JavaExecutionStrategy, method readRecordsForGroupWithLogs:

/**
 * Read records from base files and apply the updates from the corresponding log files.
 */
private List<HoodieRecord<T>> readRecordsForGroupWithLogs(List<ClusteringOperation> clusteringOps, String instantTime) {
    HoodieWriteConfig config = getWriteConfig();
    HoodieTable table = getHoodieTable();
    List<HoodieRecord<T>> records = new ArrayList<>();
    clusteringOps.forEach(clusteringOp -> {
        long maxMemoryPerCompaction = IOUtils.getMaxMemoryPerCompaction(new JavaTaskContextSupplier(), config);
        LOG.info("MaxMemoryPerCompaction run as part of clustering => " + maxMemoryPerCompaction);
        try {
            Schema readerSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(config.getSchema()));
            // Scan and merge the log files of this file slice.
            HoodieMergedLogRecordScanner scanner = HoodieMergedLogRecordScanner.newBuilder()
                .withFileSystem(table.getMetaClient().getFs())
                .withBasePath(table.getMetaClient().getBasePath())
                .withLogFilePaths(clusteringOp.getDeltaFilePaths())
                .withReaderSchema(readerSchema)
                .withLatestInstantTime(instantTime)
                .withMaxMemorySizeInBytes(maxMemoryPerCompaction)
                .withReadBlocksLazily(config.getCompactionLazyBlockReadEnabled())
                .withReverseReader(config.getCompactionReverseLogReadEnabled())
                .withBufferSize(config.getMaxDFSStreamBufferSize())
                .withSpillableMapBasePath(config.getSpillableMapBasePath())
                .withPartition(clusteringOp.getPartitionPath())
                .build();
            // The base file may be absent for a log-only file slice.
            Option<HoodieFileReader> baseFileReader = StringUtils.isNullOrEmpty(clusteringOp.getDataFilePath())
                ? Option.empty()
                : Option.of(HoodieFileReaderFactory.getFileReader(table.getHadoopConf(), new Path(clusteringOp.getDataFilePath())));
            HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
            // Merge the base file records with the log records and collect the result.
            Iterator<HoodieRecord<T>> fileSliceReader = getFileSliceReader(baseFileReader, scanner, readerSchema,
                tableConfig.getPayloadClass(), tableConfig.getPreCombineField(),
                tableConfig.populateMetaFields() ? Option.empty() : Option.of(Pair.of(tableConfig.getRecordKeyFieldProp(), tableConfig.getPartitionFieldProp())));
            fileSliceReader.forEachRemaining(records::add);
        } catch (IOException e) {
            throw new HoodieClusteringException("Error reading input data for " + clusteringOp.getDataFilePath() + " and " + clusteringOp.getDeltaFilePaths(), e);
        }
    });
    return records;
}
Also used: Path(org.apache.hadoop.fs.Path) HoodieRecord(org.apache.hudi.common.model.HoodieRecord) HoodieMergedLogRecordScanner(org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner) Schema(org.apache.avro.Schema) ArrayList(java.util.ArrayList) HoodieWriteConfig(org.apache.hudi.config.HoodieWriteConfig) HoodieFileReader(org.apache.hudi.io.storage.HoodieFileReader) IOException(java.io.IOException) HoodieTableConfig(org.apache.hudi.common.table.HoodieTableConfig) HoodieClusteringException(org.apache.hudi.exception.HoodieClusteringException) HoodieTable(org.apache.hudi.table.HoodieTable) JavaTaskContextSupplier(org.apache.hudi.client.common.JavaTaskContextSupplier)
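
The only place JavaTaskContextSupplier appears above is the IOUtils.getMaxMemoryPerCompaction call, which sizes the spillable map used by the merged log scanner; JavaTaskContextSupplier is the TaskContextSupplier implementation for Hudi's Java (non-Spark) engine, so no Spark task context is needed to derive that bound. Below is a minimal sketch, not taken from the Hudi source, of making the same call standalone; the table path, schema string, and the org.apache.hudi.io package for IOUtils are assumptions made for illustration.

    import org.apache.hudi.client.common.JavaTaskContextSupplier;
    import org.apache.hudi.config.HoodieWriteConfig;
    import org.apache.hudi.io.IOUtils; // assumed package for the IOUtils used in the method above

    // Sketch: compute the per-compaction memory budget for the Java engine.
    // The base path and schema string are hypothetical placeholders.
    String avroSchemaString = "{\"type\":\"record\",\"name\":\"rec\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}";
    HoodieWriteConfig writeConfig = HoodieWriteConfig.newBuilder()
        .withPath("/tmp/hudi_table")
        .withSchema(avroSchemaString)
        .build();
    long maxMemoryPerCompaction =
        IOUtils.getMaxMemoryPerCompaction(new JavaTaskContextSupplier(), writeConfig);

In the clustering strategy, the resulting value feeds withMaxMemorySizeInBytes on the HoodieMergedLogRecordScanner builder, capping how much of the merged log data is held in memory before spilling to the configured spillable map path.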

Aggregations

IOException (java.io.IOException): 1
ArrayList (java.util.ArrayList): 1
Schema (org.apache.avro.Schema): 1
Path (org.apache.hadoop.fs.Path): 1
JavaTaskContextSupplier (org.apache.hudi.client.common.JavaTaskContextSupplier): 1
HoodieRecord (org.apache.hudi.common.model.HoodieRecord): 1
HoodieTableConfig (org.apache.hudi.common.table.HoodieTableConfig): 1
HoodieMergedLogRecordScanner (org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner): 1
HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig): 1
HoodieClusteringException (org.apache.hudi.exception.HoodieClusteringException): 1
HoodieFileReader (org.apache.hudi.io.storage.HoodieFileReader): 1
HoodieTable (org.apache.hudi.table.HoodieTable): 1