Example 6 with HoodieAvroWriteSupport

Use of org.apache.hudi.avro.HoodieAvroWriteSupport in the Apache Hudi project.

Class HoodieFileWriterFactory, method newParquetFileWriter.

// Builds a Parquet file writer whose write support optionally carries a bloom filter.
private static <T extends HoodieRecordPayload, R extends IndexedRecord> HoodieFileWriter<R> newParquetFileWriter(String instantTime, Path path, HoodieWriteConfig config,
        Schema schema, HoodieTable hoodieTable, TaskContextSupplier taskContextSupplier, boolean populateMetaFields, boolean enableBloomFilter) throws IOException {
    // Create the bloom filter only when requested; otherwise pass an empty Option.
    Option<BloomFilter> filter = enableBloomFilter ? Option.of(createBloomFilter(config)) : Option.empty();
    // The write support needs both the Parquet schema (converted from Avro) and the original Avro schema.
    HoodieAvroWriteSupport writeSupport = new HoodieAvroWriteSupport(new AvroSchemaConverter(hoodieTable.getHadoopConf()).convert(schema), schema, filter);
    // Bundle codec, block/page/file sizes, compression ratio and dictionary setting from the write config.
    HoodieAvroParquetConfig parquetConfig = new HoodieAvroParquetConfig(writeSupport, config.getParquetCompressionCodec(), config.getParquetBlockSize(), config.getParquetPageSize(),
            config.getParquetMaxFileSize(), hoodieTable.getHadoopConf(), config.getParquetCompressionRatio(), config.parquetDictionaryEnabled());
    return new HoodieParquetWriter<>(instantTime, path, parquetConfig, schema, taskContextSupplier, populateMetaFields);
}
Also used: AvroSchemaConverter (org.apache.parquet.avro.AvroSchemaConverter), HoodieAvroWriteSupport (org.apache.hudi.avro.HoodieAvroWriteSupport), BloomFilter (org.apache.hudi.common.bloom.BloomFilter)
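
The excerpt above passes the bloom filter to the write support as an Option that is empty when the filter is disabled. The following is a minimal, standalone sketch of that pattern using only the JDK. It assumes java.util.Optional as a stand-in for Hudi's Option and a plain HashSet as a stand-in for a real bloom filter; OptionalFilterSink and its methods are hypothetical names, not part of Hudi.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for a writer with an optional key filter; not a Hudi class.
final class OptionalFilterSink {
    // Empty when filtering is disabled, mirroring Option.empty() in the excerpt.
    private final Optional<Set<String>> filter;

    OptionalFilterSink(boolean enableFilter) {
        this.filter = enableFilter
            ? Optional.<Set<String>>of(new HashSet<>())
            : Optional.<Set<String>>empty();
    }

    // Records the key in the filter only if a filter was configured.
    void write(String recordKey) {
        filter.ifPresent(keys -> keys.add(recordKey));
    }

    // Without a filter nothing can be ruled out, so answer "maybe" (true).
    boolean mightContain(String recordKey) {
        return filter.map(keys -> keys.contains(recordKey)).orElse(true);
    }
}
```

With the filter enabled, a lookup for a key that was never written returns false; with it disabled, every lookup returns true, so callers must fall back to reading the file.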

Example 7 with HoodieAvroWriteSupport

Use of org.apache.hudi.avro.HoodieAvroWriteSupport in the Apache Hudi project.

Class HiveTestUtil, method generateParquetData.

@SuppressWarnings({ "unchecked", "deprecation" })
private static void generateParquetData(Path filePath, boolean isParquetSchemaSimple) throws IOException, URISyntaxException {
    Schema schema = getTestDataSchema(isParquetSchemaSimple);
    // Convert the Avro schema to the equivalent Parquet schema.
    org.apache.parquet.schema.MessageType parquetSchema = new AvroSchemaConverter().convert(schema);
    // Bloom filter arguments: 1000 entries, 0.0001 error rate, -1 for the third argument, SIMPLE filter type.
    BloomFilter filter = BloomFilterFactory.createBloomFilter(1000, 0.0001, -1, BloomFilterTypeCode.SIMPLE.name());
    HoodieAvroWriteSupport writeSupport = new HoodieAvroWriteSupport(parquetSchema, schema, Option.of(filter));
    // Deprecated constructor: 120 MB block size; DEFAULT_PAGE_SIZE is passed for both the page size and the dictionary page size.
    ParquetWriter writer = new ParquetWriter(filePath, writeSupport, CompressionCodecName.GZIP, 120 * 1024 * 1024, ParquetWriter.DEFAULT_PAGE_SIZE, ParquetWriter.DEFAULT_PAGE_SIZE, ParquetWriter.DEFAULT_IS_DICTIONARY_ENABLED, ParquetWriter.DEFAULT_IS_VALIDATING_ENABLED, ParquetWriter.DEFAULT_WRITER_VERSION, fileSystem.getConf());
    List<IndexedRecord> testRecords = (isParquetSchemaSimple ? SchemaTestUtil.generateTestRecords(0, 100) : SchemaTestUtil.generateEvolvedTestRecords(100, 100));
    testRecords.forEach(s -> {
        try {
            writer.write(s);
        } catch (IOException e) {
            // Fail the test with a readable message (separator added before the exception text).
            fail("IOException while writing test records as parquet: " + e.toString());
        }
    });
    writer.close();
}
Also used: AvroSchemaConverter (org.apache.parquet.avro.AvroSchemaConverter), IndexedRecord (org.apache.avro.generic.IndexedRecord), ParquetWriter (org.apache.parquet.hadoop.ParquetWriter), Schema (org.apache.avro.Schema), HoodieAvroWriteSupport (org.apache.hudi.avro.HoodieAvroWriteSupport), IOException (java.io.IOException), BloomFilter (org.apache.hudi.common.bloom.BloomFilter)
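
The test above requests a bloom filter for 1000 entries at an error rate of 0.0001. As a rough guide to what those two numbers imply, here is a sketch of the textbook sizing formulas, m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. This is an assumption for illustration: BloomSizing is a hypothetical class, and Hudi's own sizing and rounding may differ.

```java
// Sketch only: textbook Bloom filter sizing, not Hudi's actual implementation.
final class BloomSizing {
    private BloomSizing() {}

    // Bits needed for numEntries keys at false-positive probability fpp:
    // m = -n * ln(p) / (ln 2)^2, rounded up.
    static long optimalBits(long numEntries, double fpp) {
        if (numEntries <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("need numEntries > 0 and 0 < fpp < 1");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-numEntries * Math.log(fpp) / (ln2 * ln2));
    }

    // Number of hash functions that minimizes the error rate for the given size:
    // k = (m / n) * ln 2, rounded to the nearest integer and at least 1.
    static int optimalHashes(long numEntries, long numBits) {
        return Math.max(1, (int) Math.round((double) numBits / numEntries * Math.log(2)));
    }

    public static void main(String[] args) {
        long bits = optimalBits(1000, 0.0001);
        int hashes = optimalHashes(1000, bits);
        System.out.println("bits=" + bits + ", hashes=" + hashes);
    }
}
```

Tightening the error rate increases the bit count logarithmically, which is why a rate as low as 0.0001 still costs only a few kilobytes for 1000 keys.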

Aggregations

HoodieAvroWriteSupport (org.apache.hudi.avro.HoodieAvroWriteSupport): 7
AvroSchemaConverter (org.apache.parquet.avro.AvroSchemaConverter): 7
BloomFilter (org.apache.hudi.common.bloom.BloomFilter): 5
IndexedRecord (org.apache.avro.generic.IndexedRecord): 4
ParquetWriter (org.apache.parquet.hadoop.ParquetWriter): 4
IOException (java.io.IOException): 3
Schema (org.apache.avro.Schema): 3
GenericRecord (org.apache.avro.generic.GenericRecord): 2
Configuration (org.apache.hadoop.conf.Configuration): 2
Path (org.apache.hadoop.fs.Path): 2
HoodieRecord (org.apache.hudi.common.model.HoodieRecord): 2
HoodieAvroParquetConfig (org.apache.hudi.io.storage.HoodieAvroParquetConfig): 2
ByteArrayOutputStream (java.io.ByteArrayOutputStream): 1
FSDataOutputStream (org.apache.hadoop.fs.FSDataOutputStream): 1
HoodieRecordPayload (org.apache.hudi.common.model.HoodieRecordPayload): 1
HoodieOrcConfig (org.apache.hudi.io.storage.HoodieOrcConfig): 1
HoodieOrcWriter (org.apache.hudi.io.storage.HoodieOrcWriter): 1
HoodieParquetStreamWriter (org.apache.hudi.io.storage.HoodieParquetStreamWriter): 1
HoodieParquetWriter (org.apache.hudi.io.storage.HoodieParquetWriter): 1