Example 86 with TypedProperties

Use of org.apache.hudi.common.config.TypedProperties in the apache/hudi project.

From the class TestDFSPathSelectorCommonMethods, method setUp:

@BeforeEach
void setUp() {
    initSparkContexts();
    initPath();
    initFileSystem();
    props = new TypedProperties();
    props.setProperty(ROOT_INPUT_PATH_PROP, basePath);
    props.setProperty(PARTITIONS_LIST_PARALLELISM, "1");
    inputPath = new Path(basePath);
}
Also used: Path (org.apache.hadoop.fs.Path), TypedProperties (org.apache.hudi.common.config.TypedProperties), BeforeEach (org.junit.jupiter.api.BeforeEach)
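
TypedProperties extends java.util.Properties with typed getters, so values stored as strings in setUp can be read back with conversion and defaults. A minimal sketch of that round trip (the key strings here are illustrative, not the exact Hudi config names):

import org.apache.hudi.common.config.TypedProperties;

public class TypedPropertiesRoundTrip {
    public static void main(String[] args) {
        TypedProperties props = new TypedProperties();
        // Values always go in as strings, as in the setUp method above.
        props.setProperty("hoodie.deltastreamer.source.dfs.root", "/tmp/hudi-input");
        props.setProperty("hoodie.deltastreamer.source.input.partitions.list.parallelism", "1");

        // Typed getters convert on read; the two-argument forms supply a
        // default when the key is absent.
        String rootPath = props.getString("hoodie.deltastreamer.source.dfs.root");
        int parallelism = props.getInteger("hoodie.deltastreamer.source.input.partitions.list.parallelism", 20);
        boolean missing = props.getBoolean("some.unset.flag", false);

        System.out.println(rootPath + ", " + parallelism + ", " + missing);
    }
}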

Example 87 with TypedProperties

Use of org.apache.hudi.common.config.TypedProperties in the apache/hudi project.

From the class TestDatePartitionPathSelector, method testPruneDatePartitionPaths:

@ParameterizedTest(name = "[{index}] {0}")
@MethodSource("configParams")
public void testPruneDatePartitionPaths(String tableName, String dateFormat, int datePartitionDepth, int numPrevDaysToList, String currentDate, boolean isHiveStylePartition, int expectedNumFiles) throws IOException {
    TypedProperties props = getProps(basePath + "/" + tableName, dateFormat, datePartitionDepth, numPrevDaysToList, currentDate);
    DatePartitionPathSelector pathSelector = new DatePartitionPathSelector(props, jsc.hadoopConfiguration());
    Path root = new Path(props.getString(ROOT_INPUT_PATH_PROP));
    int totalDepthBeforeDatePartitions = props.getInteger(DATE_PARTITION_DEPTH) - 1;
    // Create parent dir
    List<Path> leafDirs = new ArrayList<>();
    createParentDirsBeforeDatePartitions(root, generateRandomStrings(), totalDepthBeforeDatePartitions, leafDirs);
    createDatePartitionsWithFiles(leafDirs, isHiveStylePartition, dateFormat);
    List<String> paths = pathSelector.pruneDatePartitionPaths(context, fs, root.toString(), LocalDate.parse(currentDate));
    assertEquals(expectedNumFiles, paths.size());
}
Also used: Path (org.apache.hadoop.fs.Path), ArrayList (java.util.ArrayList), TypedProperties (org.apache.hudi.common.config.TypedProperties), ParameterizedTest (org.junit.jupiter.params.ParameterizedTest), MethodSource (org.junit.jupiter.params.provider.MethodSource)
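
The assertion above hinges on the selector's pruning rule: only date partitions within numPrevDaysToList of currentDate survive listing. A self-contained sketch of that rule, using our own helper rather than Hudi's internals:

import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class DatePartitionPruneSketch {
    // Dates a selector configured with numPrevDaysToList would retain,
    // newest first: the current date plus the N days before it.
    static List<LocalDate> datesToList(LocalDate currentDate, int numPrevDaysToList) {
        List<LocalDate> dates = new ArrayList<>();
        for (int i = 0; i <= numPrevDaysToList; i++) {
            dates.add(currentDate.minusDays(i));
        }
        return dates;
    }

    public static void main(String[] args) {
        // With currentDate = 2020-07-25 and numPrevDaysToList = 2, the
        // partitions for 2020-07-25, 2020-07-24 and 2020-07-23 are kept.
        System.out.println(datesToList(LocalDate.parse("2020-07-25"), 2));
    }
}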

Example 88 with TypedProperties

Use of org.apache.hudi.common.config.TypedProperties in the apache/hudi project.

From the class TestKafkaOffsetGen, method getConsumerConfigs:

private TypedProperties getConsumerConfigs(String autoOffsetReset, String kafkaCheckpointType) {
    TypedProperties props = new TypedProperties();
    props.put("hoodie.deltastreamer.source.kafka.checkpoint.type", kafkaCheckpointType);
    props.put("auto.offset.reset", autoOffsetReset);
    props.put("hoodie.deltastreamer.source.kafka.topic", TEST_TOPIC_NAME);
    props.setProperty("bootstrap.servers", testUtils.brokerAddress());
    props.setProperty("key.deserializer", StringDeserializer.class.getName());
    props.setProperty("value.deserializer", StringDeserializer.class.getName());
    props.setProperty(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString());
    return props;
}
Also used: StringDeserializer (org.apache.kafka.common.serialization.StringDeserializer), TypedProperties (org.apache.hudi.common.config.TypedProperties)
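
Because TypedProperties is a java.util.Properties subclass, the object built by getConsumerConfigs can be handed straight to a Kafka consumer. A minimal sketch, assuming a broker at localhost:9092 and a topic named test-topic:

import java.util.Collections;
import java.util.UUID;
import org.apache.hudi.common.config.TypedProperties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaConsumerFromTypedProps {
    public static void main(String[] args) {
        TypedProperties props = new TypedProperties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("auto.offset.reset", "earliest");
        props.setProperty("key.deserializer", StringDeserializer.class.getName());
        props.setProperty("value.deserializer", StringDeserializer.class.getName());
        props.setProperty(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString());

        // KafkaConsumer accepts any java.util.Properties, which TypedProperties is.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic"));
        }
    }
}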

Example 89 with TypedProperties

Use of org.apache.hudi.common.config.TypedProperties in the apache/hudi project.

From the class TestS3EventsMetaSelector, method setUp:

@BeforeEach
void setUp() {
    initSparkContexts();
    initPath();
    initFileSystem();
    MockitoAnnotations.initMocks(this);
    props = new TypedProperties();
    sqsUrl = "test-queue";
    props.setProperty(S3_SOURCE_QUEUE_URL, sqsUrl);
    props.setProperty(S3_SOURCE_QUEUE_REGION, REGION_NAME);
}
Also used: TypedProperties (org.apache.hudi.common.config.TypedProperties), BeforeEach (org.junit.jupiter.api.BeforeEach)
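
Selectors like this one treat the queue URL and region as required, so a missing key should fail fast rather than surface later as a null. A sketch of that validation with a hypothetical checkRequired helper (not a Hudi API; the key names are illustrative):

import java.util.Arrays;
import java.util.List;
import org.apache.hudi.common.config.TypedProperties;

public class RequiredKeysSketch {
    // Hypothetical helper: reject the configuration before any AWS client is built.
    static void checkRequired(TypedProperties props, List<String> keys) {
        for (String key : keys) {
            if (!props.containsKey(key)) {
                throw new IllegalArgumentException("Required property " + key + " is missing");
            }
        }
    }

    public static void main(String[] args) {
        TypedProperties props = new TypedProperties();
        props.setProperty("hoodie.deltastreamer.s3.source.queue.url", "test-queue");
        // Throws: the region key was never set.
        checkRequired(props, Arrays.asList(
                "hoodie.deltastreamer.s3.source.queue.url",
                "hoodie.deltastreamer.s3.source.queue.region"));
    }
}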

Example 90 with TypedProperties

Use of org.apache.hudi.common.config.TypedProperties in the apache/hudi project.

From the class DistributedTestDataSource, method fetchNewData:

@Override
protected InputBatch<JavaRDD<GenericRecord>> fetchNewData(Option<String> lastCkptStr, long sourceLimit) {
    int nextCommitNum = lastCkptStr.map(s -> Integer.parseInt(s) + 1).orElse(0);
    String instantTime = String.format("%05d", nextCommitNum);
    LOG.info("Source Limit is set to " + sourceLimit);
    // No new data.
    if (sourceLimit <= 0) {
        return new InputBatch<>(Option.empty(), instantTime);
    }
    TypedProperties newProps = new TypedProperties();
    newProps.putAll(props);
    // Set the maxUniqueRecords per partition for TestDataSource
    int maxUniqueRecords = props.getInteger(SourceConfigs.MAX_UNIQUE_RECORDS_PROP, SourceConfigs.DEFAULT_MAX_UNIQUE_RECORDS);
    String maxUniqueRecordsPerPartition = String.valueOf(Math.max(1, maxUniqueRecords / numTestSourcePartitions));
    newProps.setProperty(SourceConfigs.MAX_UNIQUE_RECORDS_PROP, maxUniqueRecordsPerPartition);
    int perPartitionSourceLimit = Math.max(1, (int) (sourceLimit / numTestSourcePartitions));
    JavaRDD<GenericRecord> avroRDD = sparkContext.parallelize(IntStream.range(0, numTestSourcePartitions).boxed().collect(Collectors.toList()), numTestSourcePartitions).mapPartitionsWithIndex((p, idx) -> {
        LOG.info("Initializing source with newProps=" + newProps);
        if (!dataGeneratorMap.containsKey(p)) {
            initDataGen(newProps, p);
        }
        return fetchNextBatch(newProps, perPartitionSourceLimit, instantTime, p).iterator();
    }, true);
    return new InputBatch<>(Option.of(avroRDD), instantTime);
}
Also used: IntStream (java.util.stream.IntStream), SchemaProvider (org.apache.hudi.utilities.schema.SchemaProvider), GenericRecord (org.apache.avro.generic.GenericRecord), TypedProperties (org.apache.hudi.common.config.TypedProperties), SourceConfigs (org.apache.hudi.utilities.testutils.sources.config.SourceConfigs), JavaSparkContext (org.apache.spark.api.java.JavaSparkContext), Option (org.apache.hudi.common.util.Option), InputBatch (org.apache.hudi.utilities.sources.InputBatch), Collectors (java.util.stream.Collectors), Logger (org.apache.log4j.Logger), LogManager (org.apache.log4j.LogManager), JavaRDD (org.apache.spark.api.java.JavaRDD), SparkSession (org.apache.spark.sql.SparkSession)
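
The detail worth copying from fetchNewData is that it clones the shared props into a fresh TypedProperties before overriding the per-partition record limit, leaving the driver's configuration untouched. A condensed sketch of that copy-and-override step (key name illustrative):

import org.apache.hudi.common.config.TypedProperties;

public class PerPartitionPropsSketch {
    public static void main(String[] args) {
        TypedProperties shared = new TypedProperties();
        shared.setProperty("hoodie.deltastreamer.source.test.max_unique_records", "1000");

        int numTestSourcePartitions = 4;

        // putAll copies the entries; later writes only touch the copy.
        TypedProperties perPartition = new TypedProperties();
        perPartition.putAll(shared);

        int maxUnique = perPartition.getInteger("hoodie.deltastreamer.source.test.max_unique_records", 100);
        perPartition.setProperty("hoodie.deltastreamer.source.test.max_unique_records",
                String.valueOf(Math.max(1, maxUnique / numTestSourcePartitions)));

        System.out.println(shared.getString("hoodie.deltastreamer.source.test.max_unique_records"));       // 1000
        System.out.println(perPartition.getString("hoodie.deltastreamer.source.test.max_unique_records")); // 250
    }
}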

Aggregations

TypedProperties (org.apache.hudi.common.config.TypedProperties): 143
Test (org.junit.jupiter.api.Test): 47
HoodieTestDataGenerator (org.apache.hudi.common.testutils.HoodieTestDataGenerator): 22
JavaRDD (org.apache.spark.api.java.JavaRDD): 16
ParameterizedTest (org.junit.jupiter.params.ParameterizedTest): 15
IOException (java.io.IOException): 14
Path (org.apache.hadoop.fs.Path): 14
Properties (java.util.Properties): 13
GenericRecord (org.apache.avro.generic.GenericRecord): 13
SourceFormatAdapter (org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter): 12
Row (org.apache.spark.sql.Row): 12
BeforeEach (org.junit.jupiter.api.BeforeEach): 11
ArrayList (java.util.ArrayList): 10
HoodieTableMetaClient (org.apache.hudi.common.table.HoodieTableMetaClient): 10
HoodieKey (org.apache.hudi.common.model.HoodieKey): 9
DFSPropertiesConfiguration (org.apache.hudi.common.config.DFSPropertiesConfiguration): 8
HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig): 8
HoodieIOException (org.apache.hudi.exception.HoodieIOException): 8
Dataset (org.apache.spark.sql.Dataset): 8
HoodieRecord (org.apache.hudi.common.model.HoodieRecord): 7