Example 1 with HoodieConfig

Use of org.apache.hudi.common.config.HoodieConfig in the Apache Hudi project.

From the class HoodieTableConfig, method create:

/**
 * Initialize the hoodie meta directory and any necessary files inside the meta (including the hoodie.properties).
 */
public static void create(FileSystem fs, Path metadataFolder, Properties properties) throws IOException {
    if (!fs.exists(metadataFolder)) {
        fs.mkdirs(metadataFolder);
    }
    HoodieConfig hoodieConfig = new HoodieConfig(properties);
    Path propertyPath = new Path(metadataFolder, HOODIE_PROPERTIES_FILE);
    try (FSDataOutputStream outputStream = fs.create(propertyPath)) {
        if (!hoodieConfig.contains(NAME)) {
            throw new IllegalArgumentException(NAME.key() + " property needs to be specified");
        }
        hoodieConfig.setDefaultValue(TYPE);
        if (hoodieConfig.getString(TYPE).equals(HoodieTableType.MERGE_ON_READ.name())) {
            hoodieConfig.setDefaultValue(PAYLOAD_CLASS_NAME);
        }
        hoodieConfig.setDefaultValue(ARCHIVELOG_FOLDER);
        if (!hoodieConfig.contains(TIMELINE_LAYOUT_VERSION)) {
            // Use latest Version as default unless forced by client
            hoodieConfig.setValue(TIMELINE_LAYOUT_VERSION, TimelineLayoutVersion.CURR_VERSION.toString());
        }
        if (hoodieConfig.contains(BOOTSTRAP_BASE_PATH)) {
            // Use the default bootstrap index class.
            hoodieConfig.setDefaultValue(BOOTSTRAP_INDEX_CLASS_NAME, getDefaultBootstrapIndexClass(properties));
        }
        if (hoodieConfig.contains(TIMELINE_TIMEZONE)) {
            HoodieInstantTimeGenerator.setCommitTimeZone(HoodieTimelineTimeZone.valueOf(hoodieConfig.getString(TIMELINE_TIMEZONE)));
        }
        storeProperties(hoodieConfig.getProps(), outputStream);
    }
}
Also used: Path (org.apache.hadoop.fs.Path), HoodieConfig (org.apache.hudi.common.config.HoodieConfig), FSDataOutputStream (org.apache.hadoop.fs.FSDataOutputStream)
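The create flow above (validate the table name, fill in defaults, persist hoodie.properties) can be mirrored as a minimal, dependency-free sketch using java.nio.file instead of the Hadoop FileSystem API. The key names, default type, and file name below are stand-ins for the HoodieConfig constants (NAME, TYPE, HOODIE_PROPERTIES_FILE) referenced in the example, not the library's API:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class TableConfigSketch {
    // Hypothetical stand-ins for the HoodieConfig constants used above.
    static final String NAME_KEY = "hoodie.table.name";
    static final String TYPE_KEY = "hoodie.table.type";
    static final String PROPERTIES_FILE = "hoodie.properties";

    // Mirrors create(...): ensure the meta folder exists, require the table
    // name, fill in a default table type, then persist the properties file.
    static Path create(Path metaFolder, Properties props) throws IOException {
        if (props.getProperty(NAME_KEY) == null) {
            throw new IllegalArgumentException(NAME_KEY + " property needs to be specified");
        }
        props.putIfAbsent(TYPE_KEY, "COPY_ON_WRITE");
        Files.createDirectories(metaFolder);
        Path propertyPath = metaFolder.resolve(PROPERTIES_FILE);
        try (OutputStream out = Files.newOutputStream(propertyPath)) {
            props.store(out, "Table properties");
        }
        return propertyPath;
    }

    public static void main(String[] args) throws IOException {
        Path meta = Files.createTempDirectory("meta");
        Properties props = new Properties();
        props.setProperty(NAME_KEY, "trips");
        Path written = create(meta, props);
        System.out.println(Files.exists(written)); // expect: true
    }
}
```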

Example 2 with HoodieConfig

From the class HoodieTableConfig, method getDefaultBootstrapIndexClass:

public static String getDefaultBootstrapIndexClass(Properties props) {
    HoodieConfig hoodieConfig = new HoodieConfig(props);
    String defaultClass = BOOTSTRAP_INDEX_CLASS_NAME.defaultValue();
    if (!hoodieConfig.getBooleanOrDefault(BOOTSTRAP_INDEX_ENABLE)) {
        defaultClass = NO_OP_BOOTSTRAP_INDEX_CLASS;
    }
    return defaultClass;
}
Also used: HoodieConfig (org.apache.hudi.common.config.HoodieConfig)
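The branching in getDefaultBootstrapIndexClass reduces to a boolean-gated default: the index class is the standard default unless the bootstrap index is explicitly disabled. A minimal, dependency-free sketch; the property key and class-name constants below are assumed stand-ins for BOOTSTRAP_INDEX_ENABLE, BOOTSTRAP_INDEX_CLASS_NAME.defaultValue(), and NO_OP_BOOTSTRAP_INDEX_CLASS:

```java
import java.util.Properties;

public class BootstrapIndexDefault {
    // Hypothetical stand-ins for the Hudi config constants used above.
    static final String BOOTSTRAP_INDEX_ENABLE_KEY = "hoodie.bootstrap.index.enable";
    static final String DEFAULT_INDEX_CLASS = "org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex";
    static final String NO_OP_INDEX_CLASS = "org.apache.hudi.common.bootstrap.index.NoOpBootstrapIndex";

    // Mirrors the logic above: return the default class unless the
    // bootstrap index is explicitly disabled (enabled defaults to true).
    static String getDefaultBootstrapIndexClass(Properties props) {
        boolean enabled = Boolean.parseBoolean(props.getProperty(BOOTSTRAP_INDEX_ENABLE_KEY, "true"));
        return enabled ? DEFAULT_INDEX_CLASS : NO_OP_INDEX_CLASS;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(getDefaultBootstrapIndexClass(props)); // default index class
        props.setProperty(BOOTSTRAP_INDEX_ENABLE_KEY, "false");
        System.out.println(getDefaultBootstrapIndexClass(props)); // no-op index class
    }
}
```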

Example 3 with HoodieConfig

From the class HoodieTableMetaClient, method initTableAndGetMetaClient:

/**
 * Helper method to initialize a given path as a hoodie table with configs passed in as Properties.
 *
 * @return Instance of HoodieTableMetaClient
 */
public static HoodieTableMetaClient initTableAndGetMetaClient(Configuration hadoopConf, String basePath, Properties props) throws IOException {
    LOG.info("Initializing " + basePath + " as hoodie table");
    Path basePathDir = new Path(basePath);
    final FileSystem fs = FSUtils.getFs(basePath, hadoopConf);
    if (!fs.exists(basePathDir)) {
        fs.mkdirs(basePathDir);
    }
    Path metaPathDir = new Path(basePath, METAFOLDER_NAME);
    if (!fs.exists(metaPathDir)) {
        fs.mkdirs(metaPathDir);
    }
    // if anything other than default archive log folder is specified, create that too
    String archiveLogPropVal = new HoodieConfig(props).getStringOrDefault(HoodieTableConfig.ARCHIVELOG_FOLDER);
    if (!StringUtils.isNullOrEmpty(archiveLogPropVal)) {
        Path archiveLogDir = new Path(metaPathDir, archiveLogPropVal);
        if (!fs.exists(archiveLogDir)) {
            fs.mkdirs(archiveLogDir);
        }
    }
    // Always create temporaryFolder which is needed for finalizeWrite for Hoodie tables
    final Path temporaryFolder = new Path(basePath, HoodieTableMetaClient.TEMPFOLDER_NAME);
    if (!fs.exists(temporaryFolder)) {
        fs.mkdirs(temporaryFolder);
    }
    // Always create auxiliary folder which is needed to track compaction workloads (stats and any metadata in future)
    final Path auxiliaryFolder = new Path(basePath, HoodieTableMetaClient.AUXILIARYFOLDER_NAME);
    if (!fs.exists(auxiliaryFolder)) {
        fs.mkdirs(auxiliaryFolder);
    }
    initializeBootstrapDirsIfNotExists(hadoopConf, basePath, fs);
    HoodieTableConfig.create(fs, metaPathDir, props);
    // We should not use fs.getConf as this might be different from the original configuration
    // used to create the fs in unit tests
    HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(basePath).build();
    LOG.info("Finished initializing Table of type " + metaClient.getTableConfig().getTableType() + " from " + basePath);
    return metaClient;
}
Also used: Path (org.apache.hadoop.fs.Path), HoodieConfig (org.apache.hudi.common.config.HoodieConfig), HoodieWrapperFileSystem (org.apache.hudi.common.fs.HoodieWrapperFileSystem), HoodieRetryWrapperFileSystem (org.apache.hudi.common.fs.HoodieRetryWrapperFileSystem), FileSystem (org.apache.hadoop.fs.FileSystem)
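The directory bootstrapping in initTableAndGetMetaClient can be sketched with java.nio.file instead of the Hadoop FileSystem API. The folder names below are assumptions about the values of METAFOLDER_NAME, TEMPFOLDER_NAME, and AUXILIARYFOLDER_NAME referenced above, used here only for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class InitTableSketch {
    // Assumed values of the Hudi folder-name constants used above.
    static final String METAFOLDER = ".hoodie";
    static final String TEMPFOLDER = METAFOLDER + "/.temp";
    static final String AUXFOLDER = METAFOLDER + "/.aux";

    // Mirrors the directory creation in initTableAndGetMetaClient(...):
    // meta folder, optional archive-log folder, temp folder, aux folder.
    static void initDirs(Path basePath, String archiveLogFolder) throws IOException {
        Path metaDir = basePath.resolve(METAFOLDER);
        Files.createDirectories(metaDir);
        // If anything other than an empty archive log folder is specified, create it too.
        if (archiveLogFolder != null && !archiveLogFolder.isEmpty()) {
            Files.createDirectories(metaDir.resolve(archiveLogFolder));
        }
        // Always create the temporary and auxiliary folders.
        Files.createDirectories(basePath.resolve(TEMPFOLDER));
        Files.createDirectories(basePath.resolve(AUXFOLDER));
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("hudi_table");
        initDirs(base, "archived");
        System.out.println(Files.isDirectory(base.resolve(TEMPFOLDER))); // expect: true
    }
}
```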

Example 4 with HoodieConfig

From the class TestHoodieDeltaStreamer, method testPayloadClassUpdate:

@Test
public void testPayloadClassUpdate() throws Exception {
    String dataSetBasePath = dfsBasePath + "/test_dataset_mor";
    HoodieDeltaStreamer.Config cfg = TestHelpers.makeConfig(dataSetBasePath, WriteOperationType.BULK_INSERT, Collections.singletonList(SqlQueryBasedTransformer.class.getName()), PROPS_FILENAME_TEST_SOURCE, true, true, false, null, "MERGE_ON_READ");
    new HoodieDeltaStreamer(cfg, jsc, dfs, hiveServer.getHiveConf()).sync();
    TestHelpers.assertRecordCount(1000, dataSetBasePath + "/*/*.parquet", sqlContext);
    // now create one more deltaStreamer instance and update payload class
    cfg = TestHelpers.makeConfig(dataSetBasePath, WriteOperationType.BULK_INSERT, Collections.singletonList(SqlQueryBasedTransformer.class.getName()), PROPS_FILENAME_TEST_SOURCE, true, true, true, DummyAvroPayload.class.getName(), "MERGE_ON_READ");
    new HoodieDeltaStreamer(cfg, jsc, dfs, hiveServer.getHiveConf());
    // now assert that the hoodie.properties file has the updated payload class name
    Properties props = new Properties();
    String metaPath = dataSetBasePath + "/.hoodie/hoodie.properties";
    FileSystem fs = FSUtils.getFs(cfg.targetBasePath, jsc.hadoopConfiguration());
    try (FSDataInputStream inputStream = fs.open(new Path(metaPath))) {
        props.load(inputStream);
    }
    assertEquals(new HoodieConfig(props).getString(HoodieTableConfig.PAYLOAD_CLASS_NAME), DummyAvroPayload.class.getName());
}
Also used: Path (org.apache.hadoop.fs.Path), HoodieDeltaStreamer (org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer), HoodieConfig (org.apache.hudi.common.config.HoodieConfig), SqlQueryBasedTransformer (org.apache.hudi.utilities.transform.SqlQueryBasedTransformer), FileSystem (org.apache.hadoop.fs.FileSystem), FSDataInputStream (org.apache.hadoop.fs.FSDataInputStream), Properties (java.util.Properties), TypedProperties (org.apache.hudi.common.config.TypedProperties), ParameterizedTest (org.junit.jupiter.params.ParameterizedTest), Test (org.junit.jupiter.api.Test)

Example 5 with HoodieConfig

From the class HoodieDeltaStreamer, method combineProperties:

private static TypedProperties combineProperties(Config cfg, Option<TypedProperties> propsOverride, Configuration hadoopConf) {
    HoodieConfig hoodieConfig = new HoodieConfig();
    // Prefer explicit overrides; else, if only the default props path is set, use CLI
    // configs alone; otherwise parse the specified props file (merging in CLI overrides).
    if (propsOverride.isPresent()) {
        hoodieConfig.setAll(propsOverride.get());
    } else if (cfg.propsFilePath.equals(Config.DEFAULT_DFS_SOURCE_PROPERTIES)) {
        hoodieConfig.setAll(UtilHelpers.getConfig(cfg.configs).getProps());
    } else {
        hoodieConfig.setAll(UtilHelpers.readConfig(hadoopConf, new Path(cfg.propsFilePath), cfg.configs).getProps());
    }
    hoodieConfig.setDefaultValue(DataSourceWriteOptions.RECONCILE_SCHEMA());
    return hoodieConfig.getProps(true);
}
Also used: Path (org.apache.hadoop.fs.Path), HoodieConfig (org.apache.hudi.common.config.HoodieConfig)
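The resolution order in combineProperties is three-way: an explicit properties override wins; otherwise, if the props file path is still the built-in default, only CLI-supplied configs apply; otherwise the specified props file is read with CLI overrides merged in. A minimal sketch of that precedence, with a hypothetical default-path constant standing in for Config.DEFAULT_DFS_SOURCE_PROPERTIES:

```java
import java.util.Optional;
import java.util.Properties;

public class CombinePropsSketch {
    // Hypothetical stand-in for Config.DEFAULT_DFS_SOURCE_PROPERTIES.
    static final String DEFAULT_PROPS_PATH = "file://default/source.properties";

    // Mirrors the branching above, returning which source would be used.
    static String resolveSource(Optional<Properties> propsOverride, String propsFilePath) {
        if (propsOverride.isPresent()) {
            return "override";          // explicit override wins
        } else if (propsFilePath.equals(DEFAULT_PROPS_PATH)) {
            return "cli-only";          // no custom file: CLI configs alone
        } else {
            return "props-file";        // custom file, merged with CLI configs
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveSource(Optional.empty(), DEFAULT_PROPS_PATH)); // expect: cli-only
    }
}
```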

Aggregations

HoodieConfig (org.apache.hudi.common.config.HoodieConfig): 8 usages
Path (org.apache.hadoop.fs.Path): 6 usages
FSDataInputStream (org.apache.hadoop.fs.FSDataInputStream): 3 usages
FileSystem (org.apache.hadoop.fs.FileSystem): 2 usages
Test (org.junit.jupiter.api.Test): 2 usages
BasicSessionCredentials (com.amazonaws.auth.BasicSessionCredentials): 1 usage
Properties (java.util.Properties): 1 usage
FSDataOutputStream (org.apache.hadoop.fs.FSDataOutputStream): 1 usage
TypedProperties (org.apache.hudi.common.config.TypedProperties): 1 usage
HoodieRetryWrapperFileSystem (org.apache.hudi.common.fs.HoodieRetryWrapperFileSystem): 1 usage
HoodieWrapperFileSystem (org.apache.hudi.common.fs.HoodieWrapperFileSystem): 1 usage
HoodieDeltaStreamer (org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer): 1 usage
SqlQueryBasedTransformer (org.apache.hudi.utilities.transform.SqlQueryBasedTransformer): 1 usage
ParameterizedTest (org.junit.jupiter.params.ParameterizedTest): 1 usage