Example 6 with FileSystemDatasetVersion

Use of org.apache.gobblin.data.management.version.FileSystemDatasetVersion in project incubator-gobblin by apache.

From class ConfigurableCleanableDatasetTest, method testConfigureWithMulitplePolicies.

@Test
public void testConfigureWithMulitplePolicies() throws Exception {
    // Two identical partition configs: each pairs a version finder with a selection policy.
    Map<String, String> partitionConf = ImmutableMap.<String, String>of(
        "version.finder.class", "org.apache.gobblin.data.management.version.finder.WatermarkDatasetVersionFinder",
        "selection.policy.class", "org.apache.gobblin.data.management.policy.NewestKSelectionPolicy",
        "selection.newestK.versionsSelected", "2");
    Config conf = ConfigFactory.parseMap(ImmutableMap.<String, List<Map<String, String>>>of(
        "gobblin.retention.dataset.partitions", ImmutableList.of(partitionConf, partitionConf)));
    ConfigurableCleanableDataset<FileSystemDatasetVersion> dataset = new ConfigurableCleanableDataset<FileSystemDatasetVersion>(
        FileSystem.get(new URI(ConfigurationKeys.LOCAL_FS_URI), new Configuration()),
        new Properties(), new Path("/someroot"), conf,
        LoggerFactory.getLogger(ConfigurableCleanableDatasetTest.class));
    // Each configured partition should yield its own (version finder, selection policy) pair.
    Assert.assertEquals(dataset.getVersionFindersAndPolicies().get(0).getVersionSelectionPolicy().getClass(), NewestKSelectionPolicy.class);
    Assert.assertEquals(dataset.getVersionFindersAndPolicies().get(0).getVersionFinder().getClass(), WatermarkDatasetVersionFinder.class);
    Assert.assertEquals(dataset.getVersionFindersAndPolicies().get(1).getVersionSelectionPolicy().getClass(), NewestKSelectionPolicy.class);
    Assert.assertEquals(dataset.getVersionFindersAndPolicies().get(1).getVersionFinder().getClass(), WatermarkDatasetVersionFinder.class);
    Assert.assertFalse(dataset.isDatasetBlacklisted());
}
Also used : Path(org.apache.hadoop.fs.Path) ConfigurableCleanableDataset(org.apache.gobblin.data.management.retention.dataset.ConfigurableCleanableDataset) Configuration(org.apache.hadoop.conf.Configuration) Config(com.typesafe.config.Config) FileSystemDatasetVersion(org.apache.gobblin.data.management.version.FileSystemDatasetVersion) Properties(java.util.Properties) ImmutableMap(com.google.common.collect.ImmutableMap) Map(java.util.Map) URI(java.net.URI) Test(org.testng.annotations.Test)
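
The test above configures a policy named NewestKSelectionPolicy with "selection.newestK.versionsSelected" set to "2". As a rough illustration of what "select the newest K versions" could mean, here is a minimal standalone sketch. It is not Gobblin's implementation: the class NewestKSketch and the method selectNewestK are hypothetical, and versions are assumed to be plain strings whose lexicographic order matches their age (for example ISO dates).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; this is not Gobblin's NewestKSelectionPolicy.
final class NewestKSketch {

    // Returns at most k versions, newest first.
    // Assumption: a lexicographically larger string is a newer version.
    static List<String> selectNewestK(List<String> versions, int k) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<String> sorted = new ArrayList<>(versions);
        sorted.sort(Comparator.reverseOrder());
        // Copy the sublist so the result does not depend on the sorted list.
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    public static void main(String[] args) {
        List<String> versions = List.of("2020-01-01", "2020-01-03", "2020-01-02");
        System.out.println(selectNewestK(versions, 2)); // prints [2020-01-03, 2020-01-02]
    }
}
```

With two partitions configured as in the test, a selection like this would presumably be applied once per (version finder, selection policy) pair.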

Example 7 with FileSystemDatasetVersion

Use of org.apache.gobblin.data.management.version.FileSystemDatasetVersion in project incubator-gobblin by apache.

From class ConfigurableCleanableDatasetTest, method testConfigureWithSelectionPolicy.

@Test
public void testConfigureWithSelectionPolicy() throws Exception {
    // Single, top-level configuration: one version finder plus one selection policy.
    Config conf = ConfigFactory.parseMap(ImmutableMap.<String, String>of(
        "gobblin.retention.version.finder.class", "org.apache.gobblin.data.management.version.finder.WatermarkDatasetVersionFinder",
        "gobblin.retention.selection.policy.class", "org.apache.gobblin.data.management.policy.NewestKSelectionPolicy",
        "gobblin.retention.selection.newestK.versionsSelected", "2"));
    ConfigurableCleanableDataset<FileSystemDatasetVersion> dataset = new ConfigurableCleanableDataset<FileSystemDatasetVersion>(
        FileSystem.get(new URI(ConfigurationKeys.LOCAL_FS_URI), new Configuration()),
        new Properties(), new Path("/someroot"), conf,
        LoggerFactory.getLogger(ConfigurableCleanableDatasetTest.class));
    Assert.assertEquals(dataset.getVersionFindersAndPolicies().get(0).getVersionSelectionPolicy().getClass(), NewestKSelectionPolicy.class);
    Assert.assertEquals(dataset.getVersionFindersAndPolicies().get(0).getVersionFinder().getClass(), WatermarkDatasetVersionFinder.class);
    Assert.assertFalse(dataset.isDatasetBlacklisted());
}
Also used : Path(org.apache.hadoop.fs.Path) ConfigurableCleanableDataset(org.apache.gobblin.data.management.retention.dataset.ConfigurableCleanableDataset) Configuration(org.apache.hadoop.conf.Configuration) Config(com.typesafe.config.Config) FileSystemDatasetVersion(org.apache.gobblin.data.management.version.FileSystemDatasetVersion) Properties(java.util.Properties) URI(java.net.URI) Test(org.testng.annotations.Test)

Example 8 with FileSystemDatasetVersion

Use of org.apache.gobblin.data.management.version.FileSystemDatasetVersion in project incubator-gobblin by apache.

From class ConfigurableCleanableDatasetTest, method testConfigureWithRetentionPolicy.

@Test
public void testConfigureWithRetentionPolicy() throws Exception {
    // A retention policy is configured here instead of a selection policy.
    Config conf = ConfigFactory.parseMap(ImmutableMap.<String, String>of(
        "gobblin.retention.version.finder.class", "org.apache.gobblin.data.management.version.finder.WatermarkDatasetVersionFinder",
        "gobblin.retention.retention.policy.class", "org.apache.gobblin.data.management.retention.policy.NewestKRetentionPolicy",
        "gobblin.retention.newestK.versions.retained", "2"));
    ConfigurableCleanableDataset<FileSystemDatasetVersion> dataset = new ConfigurableCleanableDataset<FileSystemDatasetVersion>(
        FileSystem.get(new URI(ConfigurationKeys.LOCAL_FS_URI), new Configuration()),
        new Properties(), new Path("/someroot"), conf,
        LoggerFactory.getLogger(ConfigurableCleanableDatasetTest.class));
    // The dataset is expected to expose the retention policy through an EmbeddedRetentionSelectionPolicy.
    Assert.assertEquals(dataset.getVersionFindersAndPolicies().get(0).getVersionSelectionPolicy().getClass(), EmbeddedRetentionSelectionPolicy.class);
    Assert.assertEquals(dataset.getVersionFindersAndPolicies().get(0).getVersionFinder().getClass(), WatermarkDatasetVersionFinder.class);
    Assert.assertFalse(dataset.isDatasetBlacklisted());
}
Also used : Path(org.apache.hadoop.fs.Path) ConfigurableCleanableDataset(org.apache.gobblin.data.management.retention.dataset.ConfigurableCleanableDataset) Configuration(org.apache.hadoop.conf.Configuration) Config(com.typesafe.config.Config) FileSystemDatasetVersion(org.apache.gobblin.data.management.version.FileSystemDatasetVersion) Properties(java.util.Properties) URI(java.net.URI) Test(org.testng.annotations.Test)
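
The assertion on EmbeddedRetentionSelectionPolicy suggests that a configured retention policy is wrapped so it can be used wherever a selection policy is expected. The sketch below shows one way such an adapter could look. It is an assumption-laden illustration, not Gobblin's code: the interfaces RetentionPolicy and SelectionPolicy, the methods listDeletable and listSelected, and the helpers retainNewestK and asSelectionPolicy are all hypothetical names, and versions are assumed to be comparable values where larger means newer.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of adapting a retention policy to a selection policy.
final class RetentionAdapterSketch {

    // Hypothetical: decides which versions may be deleted.
    interface RetentionPolicy<T> {
        Collection<T> listDeletable(List<T> allVersions);
    }

    // Hypothetical: decides which versions an action applies to.
    interface SelectionPolicy<T> {
        Collection<T> listSelected(List<T> allVersions);
    }

    // Retains the k newest versions and reports everything older as deletable.
    // Assumption: a larger value is a newer version.
    static <T extends Comparable<T>> RetentionPolicy<T> retainNewestK(int k) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        return all -> {
            List<T> sorted = new ArrayList<>(all);
            sorted.sort(Comparator.reverseOrder());
            return new ArrayList<>(sorted.subList(Math.min(k, sorted.size()), sorted.size()));
        };
    }

    // The adapter: "selected" simply means "deletable according to the wrapped policy".
    static <T> SelectionPolicy<T> asSelectionPolicy(RetentionPolicy<T> retention) {
        return retention::listDeletable;
    }

    public static void main(String[] args) {
        SelectionPolicy<String> policy =
            asSelectionPolicy(RetentionAdapterSketch.<String>retainNewestK(2));
        System.out.println(policy.listSelected(List.of("v1", "v3", "v2", "v4"))); // prints [v2, v1]
    }
}
```

The design point is that callers only need to understand one abstraction (selection), while older retention-style configuration keeps working through the wrapper.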

Example 9 with FileSystemDatasetVersion

Use of org.apache.gobblin.data.management.version.FileSystemDatasetVersion in project incubator-gobblin by apache.

From class FsCleanableHelper, method clean.

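/**
 * Deletes every {@link FileSystemDatasetVersion} in <code>deletableVersions</code>, then deletes any parent directories left empty.
 *
 * @param deletableVersions the versions to delete; if empty, a warning is logged and nothing is deleted.
 * @param fsDataset the dataset to which the versions belong.
 */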
public void clean(final Collection<? extends FileSystemDatasetVersion> deletableVersions, final FileSystemDataset fsDataset) throws IOException {
    if (deletableVersions.isEmpty()) {
        log.warn("No deletable dataset version can be found. Ignoring.");
        return;
    }
    Set<Path> possiblyEmptyDirectories = new HashSet<>();
    for (FileSystemDatasetVersion fsdv : deletableVersions) {
        clean(fsdv, possiblyEmptyDirectories);
    }
    cleanEmptyDirectories(possiblyEmptyDirectories, fsDataset);
}
Also used : Path(org.apache.hadoop.fs.Path) FileSystemDatasetVersion(org.apache.gobblin.data.management.version.FileSystemDatasetVersion) HashSet(java.util.HashSet)
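
The two-phase pattern in clean (delete each version while collecting candidate directories, then prune the ones that ended up empty) can be sketched without Hadoop. The following is a minimal standalone illustration under stated assumptions, not the FsCleanableHelper implementation: it uses java.nio.file.Path rather than org.apache.hadoop.fs.Path, each deletable entry is assumed to be a single file, pruning is assumed to stop below the dataset root, and the names CleanSketch, pruneEmptyParents and isEmpty are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration using java.nio.file, not Hadoop's FileSystem API.
final class CleanSketch {

    // Phase 1: delete each file and remember its parent.
    // Phase 2: prune parents that are now empty.
    static void clean(Collection<Path> deletable, Path datasetRoot) throws IOException {
        if (deletable.isEmpty()) {
            return;
        }
        Set<Path> possiblyEmpty = new HashSet<>();
        for (Path file : deletable) {
            Files.deleteIfExists(file); // assumption: each entry is a single file
            if (file.getParent() != null) {
                possiblyEmpty.add(file.getParent());
            }
        }
        for (Path dir : possiblyEmpty) {
            pruneEmptyParents(dir, datasetRoot);
        }
    }

    // Walks upward from dir, deleting empty directories, and never deletes root itself.
    static void pruneEmptyParents(Path dir, Path root) throws IOException {
        Path current = dir;
        while (current != null && current.startsWith(root) && !current.equals(root)
                && Files.isDirectory(current) && isEmpty(current)) {
            Files.delete(current);
            current = current.getParent();
        }
    }

    static boolean isEmpty(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return !entries.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("clean-sketch");
        Path file = Files.createFile(Files.createDirectories(root.resolve("a/b")).resolve("part-0"));
        clean(List.of(file), root);
        System.out.println(Files.exists(root.resolve("a"))); // prints false
    }
}
```

Collecting candidate directories in a set first avoids checking the same parent repeatedly when many deleted files share a directory.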

Aggregations

FileSystemDatasetVersion (org.apache.gobblin.data.management.version.FileSystemDatasetVersion) 9
Path (org.apache.hadoop.fs.Path) 8
Config (com.typesafe.config.Config) 6
Test (org.testng.annotations.Test) 6
Properties (java.util.Properties) 5
Configuration (org.apache.hadoop.conf.Configuration) 5
URI (java.net.URI) 4
ConfigurableCleanableDataset (org.apache.gobblin.data.management.retention.dataset.ConfigurableCleanableDataset) 4
ArrayList (java.util.ArrayList) 2
HashSet (java.util.HashSet) 2
FileSystemDataset (org.apache.gobblin.dataset.FileSystemDataset) 2
ImmutableMap (com.google.common.collect.ImmutableMap) 1
IOException (java.io.IOException) 1
InvocationTargetException (java.lang.reflect.InvocationTargetException) 1
Map (java.util.Map) 1
RetentionAction (org.apache.gobblin.data.management.retention.action.RetentionAction) 1
FsCleanableHelper (org.apache.gobblin.data.management.retention.dataset.FsCleanableHelper) 1
DatasetVersion (org.apache.gobblin.data.management.version.DatasetVersion) 1
TimestampedDatasetVersion (org.apache.gobblin.data.management.version.TimestampedDatasetVersion) 1
FileSystem (org.apache.hadoop.fs.FileSystem) 1