Example 1 with UpdateConverter

Use of org.apache.hudi.integ.testsuite.converter.UpdateConverter in project hudi by apache.

From the class DeltaGenerator, method generateUpdates.

public JavaRDD<GenericRecord> generateUpdates(Config config) throws IOException {
    if (deltaOutputConfig.getDeltaOutputMode() == DeltaOutputMode.DFS) {
        JavaRDD<GenericRecord> inserts = null;
        if (config.getNumRecordsInsert() > 0) {
            inserts = generateInserts(config);
        }
        DeltaInputReader deltaInputReader = null;
        JavaRDD<GenericRecord> adjustedRDD = null;
        if (config.getNumUpsertPartitions() != 0) {
            if (config.getNumUpsertPartitions() < 0) {
                // randomly generate updates for a given number of records without regard to partitions and files
                deltaInputReader = new DFSAvroDeltaInputReader(sparkSession, schemaStr, ((DFSDeltaConfig) deltaOutputConfig).getDeltaBasePath(), Option.empty(), Option.empty());
                adjustedRDD = deltaInputReader.read(config.getNumRecordsUpsert());
                adjustedRDD = adjustRDDToGenerateExactNumUpdates(adjustedRDD, jsc, config.getNumRecordsUpsert());
            } else {
                deltaInputReader = new DFSHoodieDatasetInputReader(jsc, ((DFSDeltaConfig) deltaOutputConfig).getDatasetOutputPath(), schemaStr);
                if (config.getFractionUpsertPerFile() > 0) {
                    adjustedRDD = deltaInputReader.read(config.getNumUpsertPartitions(), config.getNumUpsertFiles(), config.getFractionUpsertPerFile());
                } else {
                    adjustedRDD = deltaInputReader.read(config.getNumUpsertPartitions(), config.getNumUpsertFiles(), config.getNumRecordsUpsert());
                }
            }
            // cap the partition count at the input parallelism, and use at least one partition
            int numPartition = Math.min(deltaOutputConfig.getInputParallelism(), Math.max(1, config.getNumUpsertPartitions()));
            log.info("Repartitioning records into " + numPartition + " partitions for updates");
            adjustedRDD = adjustedRDD.repartition(numPartition);
            log.info("Repartitioning records done for updates");
            UpdateConverter converter = new UpdateConverter(schemaStr, config.getRecordSize(), partitionPathFieldNames, recordRowKeyFieldNames);
            JavaRDD<GenericRecord> convertedRecords = converter.convert(adjustedRDD);
            // stamp every update with the current batch id as its source ordering value
            JavaRDD<GenericRecord> updates = convertedRecords.map(record -> {
                record.put(SchemaUtils.SOURCE_ORDERING_FIELD, batchId);
                return record;
            });
            // persist this since we will make multiple passes over this
            updates.persist(StorageLevel.DISK_ONLY());
            if (inserts == null) {
                inserts = updates;
            } else {
                inserts = inserts.union(updates);
            }
        }
        // TODO : Generate updates for only N partitions.
        return inserts;
    } else {
        throw new IllegalArgumentException("Other formats are not supported at the moment");
    }
}
Also used : DFSHoodieDatasetInputReader (org.apache.hudi.integ.testsuite.reader.DFSHoodieDatasetInputReader), UpdateConverter (org.apache.hudi.integ.testsuite.converter.UpdateConverter), DFSAvroDeltaInputReader (org.apache.hudi.integ.testsuite.reader.DFSAvroDeltaInputReader), DeltaInputReader (org.apache.hudi.integ.testsuite.reader.DeltaInputReader), GenericRecord (org.apache.avro.generic.GenericRecord), DFSDeltaConfig (org.apache.hudi.integ.testsuite.configuration.DFSDeltaConfig)
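
The branching in generateUpdates can be hard to follow inside the Spark plumbing, so here is a minimal standalone sketch of just its decision logic. It is not part of Hudi: the class UpdatePlanSketch, the enum ReadStrategy and both method names are hypothetical, and the parameter types (int for the partition count, double for the per-file fraction) are assumptions about what the Config getters return. It only mirrors which reader path is taken and how the repartition count is clamped.

```java
// Hypothetical sketch, not Hudi code: mirrors only the branch selection and
// the partition-count arithmetic of generateUpdates, with no Spark dependency.
final class UpdatePlanSketch {

    /** Which read path the method above would take for a given config. */
    enum ReadStrategy {
        NONE,               // numUpsertPartitions == 0: no updates are generated
        RANDOM_FROM_DELTA,  // numUpsertPartitions < 0: DFSAvroDeltaInputReader, read(numRecordsUpsert)
        FRACTION_PER_FILE,  // > 0 and fraction > 0: DFSHoodieDatasetInputReader, fraction-based read
        RECORD_COUNT        // > 0 and fraction <= 0: DFSHoodieDatasetInputReader, count-based read
    }

    /** Same order of checks as the nested if/else blocks in generateUpdates. */
    static ReadStrategy chooseStrategy(int numUpsertPartitions, double fractionUpsertPerFile) {
        if (numUpsertPartitions == 0) {
            return ReadStrategy.NONE;
        }
        if (numUpsertPartitions < 0) {
            return ReadStrategy.RANDOM_FROM_DELTA;
        }
        return fractionUpsertPerFile > 0 ? ReadStrategy.FRACTION_PER_FILE : ReadStrategy.RECORD_COUNT;
    }

    /** Same expression as numPartition: at least 1, at most the input parallelism. */
    static int repartitionCount(int inputParallelism, int numUpsertPartitions) {
        return Math.min(inputParallelism, Math.max(1, numUpsertPartitions));
    }

    public static void main(String[] args) {
        // prints "RANDOM_FROM_DELTA 1"
        System.out.println(chooseStrategy(-1, 0.0) + " " + repartitionCount(8, -1));
        // prints "FRACTION_PER_FILE 3"
        System.out.println(chooseStrategy(3, 0.5) + " " + repartitionCount(8, 3));
        // prints "RECORD_COUNT 8"
        System.out.println(chooseStrategy(20, 0.0) + " " + repartitionCount(8, 20));
    }
}
```

Note that a negative partition count still yields one partition after clamping, so the random-update path always repartitions into a single partition in this arithmetic.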

Aggregations

GenericRecord (org.apache.avro.generic.GenericRecord): 1
DFSDeltaConfig (org.apache.hudi.integ.testsuite.configuration.DFSDeltaConfig): 1
UpdateConverter (org.apache.hudi.integ.testsuite.converter.UpdateConverter): 1
DFSAvroDeltaInputReader (org.apache.hudi.integ.testsuite.reader.DFSAvroDeltaInputReader): 1
DFSHoodieDatasetInputReader (org.apache.hudi.integ.testsuite.reader.DFSHoodieDatasetInputReader): 1
DeltaInputReader (org.apache.hudi.integ.testsuite.reader.DeltaInputReader): 1