
Example 1 with DeltaWriterAdapter

Use of org.apache.hudi.integ.testsuite.writer.DeltaWriterAdapter in the Apache Hudi project.

From the class DeltaGenerator, method writeRecords.

public JavaRDD<DeltaWriteStats> writeRecords(JavaRDD<GenericRecord> records) {
    if (deltaOutputConfig.shouldDeleteOldInputData() && batchId > 1) {
        Path oldInputDir = new Path(deltaOutputConfig.getDeltaBasePath(), Integer.toString(batchId - 1));
        try {
            FileSystem fs = FSUtils.getFs(oldInputDir.toString(), deltaOutputConfig.getConfiguration());
            fs.delete(oldInputDir, true);
        } catch (IOException e) {
            log.error("Failed to delete older input data directory " + oldInputDir, e);
        }
    }
    // The following creates a new anonymous function for iterator and hence results in serialization issues
    JavaRDD<DeltaWriteStats> ws = records.mapPartitions(itr -> {
        try {
            DeltaWriterAdapter<GenericRecord> deltaWriterAdapter = DeltaWriterFactory.getDeltaWriterAdapter(deltaOutputConfig, batchId);
            return Collections.singletonList(deltaWriterAdapter.write(itr)).iterator();
        } catch (IOException io) {
            throw new UncheckedIOException(io);
        }
    }).flatMap(List::iterator);
    batchId++;
    return ws;
}
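
The per-partition pattern in writeRecords (create a writer inside the lambda, rethrow IOException as UncheckedIOException, then flatten the per-partition lists) can be tried without a Spark cluster. The following is only a minimal sketch using java.util.stream in place of JavaRDD; PartitionWriteSketch, Writer, countingWriter and writePartitions are hypothetical names introduced for illustration and are not part of Hudi.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;

public class PartitionWriteSketch {

    // Hypothetical stand-in for DeltaWriterAdapter: consumes one partition's
    // records and may fail with a checked IOException.
    public interface Writer<I> {
        List<String> write(Iterator<I> input) throws IOException;
    }

    // Hypothetical writer that returns one "stats" string per partition and
    // rejects null records with an IOException.
    public static Writer<String> countingWriter(int batchId) {
        return input -> {
            int count = 0;
            while (input.hasNext()) {
                if (input.next() == null) {
                    throw new IOException("null record in batch " + batchId);
                }
                count++;
            }
            return Collections.singletonList("batch=" + batchId + ",records=" + count);
        };
    }

    // Mirrors the shape of writeRecords: the writer is created inside the
    // lambda, checked exceptions are wrapped, and the results are flattened.
    public static List<String> writePartitions(List<List<String>> partitions, int batchId) {
        return partitions.stream()
            .map(partition -> {
                try {
                    Writer<String> writer = countingWriter(batchId);
                    return writer.write(partition.iterator());
                } catch (IOException io) {
                    // Lambdas passed to map cannot throw checked exceptions.
                    throw new UncheckedIOException(io);
                }
            })
            .flatMap(List::stream)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
            Arrays.asList("a", "b"),
            Arrays.asList("c"));
        // prints [batch=1,records=2, batch=1,records=1]
        System.out.println(writePartitions(partitions, 1));
    }
}
```

Creating the writer inside the lambda, as the original method does, means only the small configuration values it needs are captured by the closure rather than a writer instance that may not be serializable.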
Also used: Path(org.apache.hadoop.fs.Path) IntStream(java.util.stream.IntStream) Arrays(java.util.Arrays) FileSystem(org.apache.hadoop.fs.FileSystem) DeltaWriterFactory(org.apache.hudi.integ.testsuite.writer.DeltaWriterFactory) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) LoggerFactory(org.slf4j.LoggerFactory) HashMap(java.util.HashMap) Option(org.apache.hudi.common.util.Option) BuiltinKeyGenerator(org.apache.hudi.keygen.BuiltinKeyGenerator) UpdateConverter(org.apache.hudi.integ.testsuite.converter.UpdateConverter) ArrayList(java.util.ArrayList) Converter(org.apache.hudi.integ.testsuite.converter.Converter) DFSHoodieDatasetInputReader(org.apache.hudi.integ.testsuite.reader.DFSHoodieDatasetInputReader) SchemaUtils(org.apache.hudi.integ.testsuite.schema.SchemaUtils) StorageLevel(org.apache.spark.storage.StorageLevel) Config(org.apache.hudi.integ.testsuite.configuration.DeltaConfig.Config) Map(java.util.Map) DFSAvroDeltaInputReader(org.apache.hudi.integ.testsuite.reader.DFSAvroDeltaInputReader) StreamSupport(java.util.stream.StreamSupport) DeltaOutputMode(org.apache.hudi.integ.testsuite.writer.DeltaOutputMode) DeltaInputReader(org.apache.hudi.integ.testsuite.reader.DeltaInputReader) DeltaWriterAdapter(org.apache.hudi.integ.testsuite.writer.DeltaWriterAdapter) JavaRDD(org.apache.spark.api.java.JavaRDD) DFSDeltaConfig(org.apache.hudi.integ.testsuite.configuration.DFSDeltaConfig) SparkSession(org.apache.spark.sql.SparkSession) GenericRecord(org.apache.avro.generic.GenericRecord) Logger(org.slf4j.Logger) Iterator(java.util.Iterator) DeltaWriteStats(org.apache.hudi.integ.testsuite.writer.DeltaWriteStats) IOException(java.io.IOException) DeleteConverter(org.apache.hudi.integ.testsuite.converter.DeleteConverter) Tuple2(scala.Tuple2) Collectors(java.util.stream.Collectors) Serializable(java.io.Serializable) UncheckedIOException(java.io.UncheckedIOException) List(java.util.List) Collections(java.util.Collections) FSUtils(org.apache.hudi.common.fs.FSUtils)
