Example 1 with HoodieParquetWriter

Use of org.apache.hudi.io.storage.HoodieParquetWriter in project hudi by apache.

The withInserts method of the class HoodieWriteableTestTable writes the given HoodieRecords into a new base file under the given partition. Depending on the table's default base file format, it writes either a Parquet file (via HoodieParquetWriter) or an ORC file (via HoodieOrcWriter), and returns the path of the base file it created.

public Path withInserts(String partition, String fileId, List<HoodieRecord> records, TaskContextSupplier contextSupplier) throws Exception {
    FileCreateUtils.createPartitionMetaFile(basePath, partition);
    String fileName = baseFileName(currentInstantTime, fileId);
    Path baseFilePath = new Path(Paths.get(basePath, partition, fileName).toString());
    // Remove any base file left over from a previous run so the write starts clean.
    if (this.fs.exists(baseFilePath)) {
        LOG.warn("Deleting the existing base file " + baseFilePath);
        this.fs.delete(baseFilePath, true);
    }
    if (HoodieTableConfig.BASE_FILE_FORMAT.defaultValue().equals(HoodieFileFormat.PARQUET)) {
        // Wrap the Avro schema in a Parquet write support that also maintains the bloom filter.
        HoodieAvroWriteSupport writeSupport = new HoodieAvroWriteSupport(
            new AvroSchemaConverter().convert(schema), schema, Option.of(filter));
        HoodieAvroParquetConfig config = new HoodieAvroParquetConfig(
            writeSupport,
            CompressionCodecName.GZIP,
            ParquetWriter.DEFAULT_BLOCK_SIZE,
            ParquetWriter.DEFAULT_PAGE_SIZE,
            // Max file size: 120 MB.
            120 * 1024 * 1024,
            new Configuration(),
            Double.parseDouble(HoodieStorageConfig.PARQUET_COMPRESSION_RATIO_FRACTION.defaultValue()));
        try (HoodieParquetWriter writer = new HoodieParquetWriter(currentInstantTime, baseFilePath, config, schema, contextSupplier, populateMetaFields)) {
            int seqId = 1;
            for (HoodieRecord record : records) {
                GenericRecord avroRecord = (GenericRecord) ((HoodieRecordPayload) record.getData()).getInsertValue(schema).get();
                if (populateMetaFields) {
                    // Stamp the Hudi meta columns (commit time, sequence number, record key,
                    // partition path, file name) onto the record and track its key in the bloom filter.
                    HoodieAvroUtils.addCommitMetadataToRecord(avroRecord, currentInstantTime, String.valueOf(seqId++));
                    HoodieAvroUtils.addHoodieKeyToRecord(avroRecord, record.getRecordKey(), record.getPartitionPath(), fileName);
                    writer.writeAvro(record.getRecordKey(), avroRecord);
                    filter.add(record.getRecordKey());
                } else {
                    // Write the raw payload without Hudi meta columns.
                    writer.writeAvro(record.getRecordKey(), avroRecord);
                }
            }
        }
    } else if (HoodieTableConfig.BASE_FILE_FORMAT.defaultValue().equals(HoodieFileFormat.ORC)) {
        Configuration conf = new Configuration();
        int orcStripeSize = Integer.parseInt(HoodieStorageConfig.ORC_STRIPE_SIZE.defaultValue());
        int orcBlockSize = Integer.parseInt(HoodieStorageConfig.ORC_BLOCK_SIZE.defaultValue());
        int maxFileSize = Integer.parseInt(HoodieStorageConfig.ORC_FILE_MAX_SIZE.defaultValue());
        HoodieOrcConfig config = new HoodieOrcConfig(conf, CompressionKind.ZLIB, orcStripeSize, orcBlockSize, maxFileSize, filter);
        try (HoodieOrcWriter writer = new HoodieOrcWriter(currentInstantTime, baseFilePath, config, schema, contextSupplier)) {
            int seqId = 1;
            for (HoodieRecord record : records) {
                GenericRecord avroRecord = (GenericRecord) ((HoodieRecordPayload) record.getData()).getInsertValue(schema).get();
                // The ORC path always populates the Hudi meta columns and the bloom filter.
                HoodieAvroUtils.addCommitMetadataToRecord(avroRecord, currentInstantTime, String.valueOf(seqId++));
                HoodieAvroUtils.addHoodieKeyToRecord(avroRecord, record.getRecordKey(), record.getPartitionPath(), fileName);
                writer.writeAvro(record.getRecordKey(), avroRecord);
                filter.add(record.getRecordKey());
            }
        }
    }
    return baseFilePath;
}
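
For context, here is a minimal sketch of how a test might drive this method. The testTable and contextSupplier instances, the instant time "001", and the file id "file-1" are illustrative assumptions, not part of the excerpt above; HoodieTestDataGenerator is Hudi's test utility for producing sample HoodieRecords, and "2016/03/15" is one of its default partition paths.

// Minimal usage sketch; testTable, contextSupplier, and the literals below
// are assumed for illustration and not taken from the excerpt above.
HoodieTestDataGenerator dataGen = new HoodieTestDataGenerator();
// Generate 10 sample insert records tagged with instant time "001".
List<HoodieRecord> records = dataGen.generateInserts("001", 10);
// Write the records into a fresh base file under the partition; the returned Path points at that file.
Path baseFile = testTable.withInserts("2016/03/15", "file-1", records, contextSupplier);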
Also used:
Path (org.apache.hadoop.fs.Path)
HoodieParquetWriter (org.apache.hudi.io.storage.HoodieParquetWriter)
AvroSchemaConverter (org.apache.parquet.avro.AvroSchemaConverter)
Configuration (org.apache.hadoop.conf.Configuration)
HoodieRecord (org.apache.hudi.common.model.HoodieRecord)
HoodieAvroWriteSupport (org.apache.hudi.avro.HoodieAvroWriteSupport)
HoodieOrcConfig (org.apache.hudi.io.storage.HoodieOrcConfig)
HoodieRecordPayload (org.apache.hudi.common.model.HoodieRecordPayload)
HoodieOrcWriter (org.apache.hudi.io.storage.HoodieOrcWriter)
HoodieAvroParquetConfig (org.apache.hudi.io.storage.HoodieAvroParquetConfig)
GenericRecord (org.apache.avro.generic.GenericRecord)
