
Example 1 with ConfigurableCleanableDataset

use of org.apache.gobblin.data.management.retention.dataset.ConfigurableCleanableDataset in project incubator-gobblin by apache.

the class ConfigBasedCleanabledDatasetFinder method findDatasetsCallable.

protected Callable<Void> findDatasetsCallable(final ConfigClient confClient, final URI u, final Properties p, Optional<List<String>> blacklistURNs, final Collection<Dataset> datasets) {
    return new Callable<Void>() {

        @Override
        public Void call() throws Exception {
            // Fetch the Config for this URI, build the dataset it describes, and add it to the shared collection
            Config c = confClient.getConfig(u);
            Dataset datasetForConfig = new ConfigurableCleanableDataset(fileSystem, p, new Path(c.getString(DATASET_PATH)), c, log);
            datasets.add(datasetForConfig);
            return null;
        }
    };
}
Also used : Path(org.apache.hadoop.fs.Path) ConfigurableCleanableDataset(org.apache.gobblin.data.management.retention.dataset.ConfigurableCleanableDataset) Config(com.typesafe.config.Config) Dataset(org.apache.gobblin.dataset.Dataset) Callable(java.util.concurrent.Callable)
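
The snippet above only builds the Callable; the source does not show how it is executed. As a rough sketch, assuming one such callable is created per config URI and all of them run on a thread pool, the pattern could look like the following. The class FindDatasetsSketch, the String stand-in for Dataset, and the pool size of 4 are hypothetical and are not part of Gobblin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: plain Strings stand in for Gobblin's Dataset objects.
class FindDatasetsSketch {

    // Stand-in for findDatasetsCallable: each task resolves one dataset path
    // and adds it to the shared collection.
    static Callable<Void> findDatasetCallable(final String path, final Collection<String> datasets) {
        return new Callable<Void>() {
            @Override
            public Void call() {
                datasets.add(path);
                return null;
            }
        };
    }

    // Runs one callable per path on a small pool and waits for all of them.
    static Collection<String> findAll(List<String> paths) {
        // The collection is written from several threads, so it must be thread-safe.
        Collection<String> datasets = new ConcurrentLinkedQueue<String>();
        List<Callable<Void>> tasks = new ArrayList<Callable<Void>>();
        for (String path : paths) {
            tasks.add(findDatasetCallable(path, datasets));
        }
        ExecutorService pool = Executors.newFixedThreadPool(4); // pool size is an assumption
        try {
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // surfaces any exception thrown inside call()
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while finding datasets", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Dataset lookup failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return datasets;
    }

    public static void main(String[] args) {
        Collection<String> found = findAll(Arrays.asList("/data/a", "/data/b", "/data/c"));
        System.out.println(found.size()); // prints 3
    }
}
```

Because every task writes to the same collection, the sketch uses a concurrent collection; whether the real caller passes a thread-safe collection is not visible in the excerpt.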

Example 2 with ConfigurableCleanableDataset

use of org.apache.gobblin.data.management.retention.dataset.ConfigurableCleanableDataset in project incubator-gobblin by apache.

the class ConfigurableCleanableDatasetTest method testDatasetIsBlacklisted.

@Test
public void testDatasetIsBlacklisted() throws Exception {
    Config conf = ConfigFactory.parseMap(ImmutableMap.<String, String>of(
        "gobblin.retention.version.finder.class", "org.apache.gobblin.data.management.version.finder.WatermarkDatasetVersionFinder",
        "gobblin.retention.selection.policy.class", "org.apache.gobblin.data.management.policy.NewestKSelectionPolicy",
        "gobblin.retention.selection.newestK.versionsSelected", "2",
        "gobblin.retention.dataset.is.blacklisted", "true"));
    ConfigurableCleanableDataset<FileSystemDatasetVersion> dataset = new ConfigurableCleanableDataset<FileSystemDatasetVersion>(FileSystem.get(new URI(ConfigurationKeys.LOCAL_FS_URI), new Configuration()), new Properties(), new Path("/someroot"), conf, LoggerFactory.getLogger(ConfigurableCleanableDatasetTest.class));
    Assert.assertEquals(dataset.isDatasetBlacklisted(), true);
}
Also used : Path(org.apache.hadoop.fs.Path) ConfigurableCleanableDataset(org.apache.gobblin.data.management.retention.dataset.ConfigurableCleanableDataset) Configuration(org.apache.hadoop.conf.Configuration) Config(com.typesafe.config.Config) FileSystemDatasetVersion(org.apache.gobblin.data.management.version.FileSystemDatasetVersion) Properties(java.util.Properties) URI(java.net.URI) Test(org.testng.annotations.Test)

Example 3 with ConfigurableCleanableDataset

use of org.apache.gobblin.data.management.retention.dataset.ConfigurableCleanableDataset in project incubator-gobblin by apache.

the class ConfigurableCleanableDatasetTest method testConfigureWithMulitplePolicies.

@Test
public void testConfigureWithMulitplePolicies() throws Exception {
    Map<String, String> partitionConf = ImmutableMap.<String, String>of(
        "version.finder.class", "org.apache.gobblin.data.management.version.finder.WatermarkDatasetVersionFinder",
        "selection.policy.class", "org.apache.gobblin.data.management.policy.NewestKSelectionPolicy",
        "selection.newestK.versionsSelected", "2");
    Config conf = ConfigFactory.parseMap(ImmutableMap.<String, List<Map<String, String>>>of(
        "gobblin.retention.dataset.partitions", ImmutableList.of(partitionConf, partitionConf)));
    ConfigurableCleanableDataset<FileSystemDatasetVersion> dataset = new ConfigurableCleanableDataset<FileSystemDatasetVersion>(FileSystem.get(new URI(ConfigurationKeys.LOCAL_FS_URI), new Configuration()), new Properties(), new Path("/someroot"), conf, LoggerFactory.getLogger(ConfigurableCleanableDatasetTest.class));
    Assert.assertEquals(dataset.getVersionFindersAndPolicies().get(0).getVersionSelectionPolicy().getClass(), NewestKSelectionPolicy.class);
    Assert.assertEquals(dataset.getVersionFindersAndPolicies().get(0).getVersionFinder().getClass(), WatermarkDatasetVersionFinder.class);
    Assert.assertEquals(dataset.getVersionFindersAndPolicies().get(1).getVersionSelectionPolicy().getClass(), NewestKSelectionPolicy.class);
    Assert.assertEquals(dataset.getVersionFindersAndPolicies().get(1).getVersionFinder().getClass(), WatermarkDatasetVersionFinder.class);
    Assert.assertEquals(dataset.isDatasetBlacklisted(), false);
}
Also used : Path(org.apache.hadoop.fs.Path) ConfigurableCleanableDataset(org.apache.gobblin.data.management.retention.dataset.ConfigurableCleanableDataset) Configuration(org.apache.hadoop.conf.Configuration) Config(com.typesafe.config.Config) FileSystemDatasetVersion(org.apache.gobblin.data.management.version.FileSystemDatasetVersion) Properties(java.util.Properties) ImmutableMap(com.google.common.collect.ImmutableMap) Map(java.util.Map) URI(java.net.URI) Test(org.testng.annotations.Test)

Example 4 with ConfigurableCleanableDataset

use of org.apache.gobblin.data.management.retention.dataset.ConfigurableCleanableDataset in project incubator-gobblin by apache.

the class ConfigurableCleanableDatasetTest method testConfigureWithSelectionPolicy.

@Test
public void testConfigureWithSelectionPolicy() throws Exception {
    Config conf = ConfigFactory.parseMap(ImmutableMap.<String, String>of(
        "gobblin.retention.version.finder.class", "org.apache.gobblin.data.management.version.finder.WatermarkDatasetVersionFinder",
        "gobblin.retention.selection.policy.class", "org.apache.gobblin.data.management.policy.NewestKSelectionPolicy",
        "gobblin.retention.selection.newestK.versionsSelected", "2"));
    ConfigurableCleanableDataset<FileSystemDatasetVersion> dataset = new ConfigurableCleanableDataset<FileSystemDatasetVersion>(FileSystem.get(new URI(ConfigurationKeys.LOCAL_FS_URI), new Configuration()), new Properties(), new Path("/someroot"), conf, LoggerFactory.getLogger(ConfigurableCleanableDatasetTest.class));
    Assert.assertEquals(dataset.getVersionFindersAndPolicies().get(0).getVersionSelectionPolicy().getClass(), NewestKSelectionPolicy.class);
    Assert.assertEquals(dataset.getVersionFindersAndPolicies().get(0).getVersionFinder().getClass(), WatermarkDatasetVersionFinder.class);
    Assert.assertEquals(dataset.isDatasetBlacklisted(), false);
}
Also used : Path(org.apache.hadoop.fs.Path) ConfigurableCleanableDataset(org.apache.gobblin.data.management.retention.dataset.ConfigurableCleanableDataset) Configuration(org.apache.hadoop.conf.Configuration) Config(com.typesafe.config.Config) FileSystemDatasetVersion(org.apache.gobblin.data.management.version.FileSystemDatasetVersion) Properties(java.util.Properties) URI(java.net.URI) Test(org.testng.annotations.Test)

Example 5 with ConfigurableCleanableDataset

use of org.apache.gobblin.data.management.retention.dataset.ConfigurableCleanableDataset in project incubator-gobblin by apache.

the class ConfigurableCleanableDatasetTest method testConfigureWithRetentionPolicy.

@Test
public void testConfigureWithRetentionPolicy() throws Exception {
    Config conf = ConfigFactory.parseMap(ImmutableMap.<String, String>of(
        "gobblin.retention.version.finder.class", "org.apache.gobblin.data.management.version.finder.WatermarkDatasetVersionFinder",
        "gobblin.retention.retention.policy.class", "org.apache.gobblin.data.management.retention.policy.NewestKRetentionPolicy",
        "gobblin.retention.newestK.versions.retained", "2"));
    ConfigurableCleanableDataset<FileSystemDatasetVersion> dataset = new ConfigurableCleanableDataset<FileSystemDatasetVersion>(FileSystem.get(new URI(ConfigurationKeys.LOCAL_FS_URI), new Configuration()), new Properties(), new Path("/someroot"), conf, LoggerFactory.getLogger(ConfigurableCleanableDatasetTest.class));
    Assert.assertEquals(dataset.getVersionFindersAndPolicies().get(0).getVersionSelectionPolicy().getClass(), EmbeddedRetentionSelectionPolicy.class);
    Assert.assertEquals(dataset.getVersionFindersAndPolicies().get(0).getVersionFinder().getClass(), WatermarkDatasetVersionFinder.class);
    Assert.assertEquals(dataset.isDatasetBlacklisted(), false);
}
Also used : Path(org.apache.hadoop.fs.Path) ConfigurableCleanableDataset(org.apache.gobblin.data.management.retention.dataset.ConfigurableCleanableDataset) Configuration(org.apache.hadoop.conf.Configuration) Config(com.typesafe.config.Config) FileSystemDatasetVersion(org.apache.gobblin.data.management.version.FileSystemDatasetVersion) Properties(java.util.Properties) URI(java.net.URI) Test(org.testng.annotations.Test)

Aggregations

Config (com.typesafe.config.Config): 5
ConfigurableCleanableDataset (org.apache.gobblin.data.management.retention.dataset.ConfigurableCleanableDataset): 5
Path (org.apache.hadoop.fs.Path): 5
URI (java.net.URI): 4
Properties (java.util.Properties): 4
FileSystemDatasetVersion (org.apache.gobblin.data.management.version.FileSystemDatasetVersion): 4
Configuration (org.apache.hadoop.conf.Configuration): 4
Test (org.testng.annotations.Test): 4
ImmutableMap (com.google.common.collect.ImmutableMap): 1
Map (java.util.Map): 1
Callable (java.util.concurrent.Callable): 1
Dataset (org.apache.gobblin.dataset.Dataset): 1