Example 1 with AutoReturnableObject

Use of org.apache.gobblin.util.AutoReturnableObject in the project incubator-gobblin by apache.

From the class Avro2OrcStaleDatasetCleaner, method run.

@Override
public void run() throws Exception {
    Iterator<HiveDataset> iterator = this.datasetFinder.getDatasetsIterator();
    while (iterator.hasNext()) {
        ConvertibleHiveDataset hiveDataset = (ConvertibleHiveDataset) iterator.next();
        // The metastore client is obtained from the dataset's client pool; try-with-resources closes the wrapper on exit.
        try (AutoReturnableObject<IMetaStoreClient> client = hiveDataset.getClientPool().getClient()) {
            Set<Partition> sourcePartitions = new HashSet<>(
                HiveUtils.getPartitions(client.get(), hiveDataset.getTable(), Optional.<String>absent()));
            sourcePartitions.parallelStream()
                // Only consider partitions whose data directory name passes isUnixTimeStamp.
                .filter(partition -> isUnixTimeStamp(partition.getDataLocation().getName()))
                .forEach(partition ->
                    // Pass every sibling of the partition's current data directory to deletePath.
                    Arrays.stream(listFiles(partition.getDataLocation().getParent()))
                        .filter(fileStatus -> !fileStatus.getPath().toString()
                            .equalsIgnoreCase(partition.getDataLocation().toString()))
                        .forEach(fileStatus -> deletePath(fileStatus, this.graceTimeInMillis, true)));
        }
    }
}
Also used : Arrays(java.util.Arrays) HiveUtils(org.apache.gobblin.data.management.copy.hive.HiveUtils) FileSystem(org.apache.hadoop.fs.FileSystem) MetricContext(org.apache.gobblin.metrics.MetricContext) EventConstants(org.apache.gobblin.data.management.conversion.hive.events.EventConstants) ConfigUtils(org.apache.gobblin.util.ConfigUtils) FileStatus(org.apache.hadoop.fs.FileStatus) HashSet(java.util.HashSet) Logger(org.apache.log4j.Logger) Optional(com.google.common.base.Optional) Configuration(org.apache.hadoop.conf.Configuration) Path(org.apache.hadoop.fs.Path) ConfigFactory(com.typesafe.config.ConfigFactory) HiveDatasetFinder(org.apache.gobblin.data.management.copy.hive.HiveDatasetFinder) ConvertibleHiveDatasetFinder(org.apache.gobblin.data.management.conversion.hive.dataset.ConvertibleHiveDatasetFinder) Properties(java.util.Properties) Iterator(java.util.Iterator) ValidationJob(org.apache.gobblin.data.management.conversion.hive.validation.ValidationJob) Config(com.typesafe.config.Config) Instrumented(org.apache.gobblin.instrumented.Instrumented) Set(java.util.Set) ConvertibleHiveDataset(org.apache.gobblin.data.management.conversion.hive.dataset.ConvertibleHiveDataset) IOException(java.io.IOException) TimeUnit(java.util.concurrent.TimeUnit) Partition(org.apache.hadoop.hive.ql.metadata.Partition) EventSubmitter(org.apache.gobblin.metrics.event.EventSubmitter) AbstractJob(azkaban.jobExecutor.AbstractJob) HiveDataset(org.apache.gobblin.data.management.copy.hive.HiveDataset) IMetaStoreClient(org.apache.hadoop.hive.metastore.IMetaStoreClient) AutoReturnableObject(org.apache.gobblin.util.AutoReturnableObject)
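
The example above relies on try-with-resources to hand a pooled client back once the block exits. The following is a minimal sketch of that borrow-and-return pattern, not Gobblin's actual implementation: the names PooledHandleSketch, SimplePool, Handle, borrow, giveBack and idleCount are hypothetical and exist only for illustration. The only things taken from the example are that the wrapper is usable in try-with-resources and exposes get().

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class PooledHandleSketch {

    /** Hypothetical stand-in for a pool of reusable clients. */
    static final class SimplePool<T> {
        private final Deque<T> idle = new ArrayDeque<>();

        SimplePool(Iterable<T> items) {
            for (T item : items) {
                idle.push(item);
            }
        }

        /** Takes an object out of the pool; throws NoSuchElementException if none is idle. */
        synchronized Handle<T> borrow() {
            return new Handle<>(idle.pop(), this);
        }

        synchronized void giveBack(T item) {
            idle.push(item);
        }

        synchronized int idleCount() {
            return idle.size();
        }
    }

    /** Hypothetical wrapper: close() returns the wrapped object to its pool. */
    static final class Handle<T> implements AutoCloseable {
        private final T object;
        private final SimplePool<T> pool;
        private boolean returned = false;

        Handle(T object, SimplePool<T> pool) {
            this.object = object;
            this.pool = pool;
        }

        T get() {
            return object;
        }

        @Override
        public void close() {
            // Guard against returning the same object twice.
            if (!returned) {
                returned = true;
                pool.giveBack(object);
            }
        }
    }

    public static void main(String[] args) {
        SimplePool<String> pool = new SimplePool<>(Arrays.asList("client-1"));
        try (Handle<String> handle = pool.borrow()) {
            System.out.println("borrowed " + handle.get() + ", idle=" + pool.idleCount());
        }
        // The handle has been closed here, so the client is back in the pool.
        System.out.println("after block, idle=" + pool.idleCount());
    }
}
```

Because the return happens in close(), the client goes back to the pool even if the body of the try block throws, which is why the run method above needs no explicit cleanup code.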

Aggregations

AbstractJob (azkaban.jobExecutor.AbstractJob) 1
Optional (com.google.common.base.Optional) 1
Config (com.typesafe.config.Config) 1
ConfigFactory (com.typesafe.config.ConfigFactory) 1
IOException (java.io.IOException) 1
Arrays (java.util.Arrays) 1
HashSet (java.util.HashSet) 1
Iterator (java.util.Iterator) 1
Properties (java.util.Properties) 1
Set (java.util.Set) 1
TimeUnit (java.util.concurrent.TimeUnit) 1
ConvertibleHiveDataset (org.apache.gobblin.data.management.conversion.hive.dataset.ConvertibleHiveDataset) 1
ConvertibleHiveDatasetFinder (org.apache.gobblin.data.management.conversion.hive.dataset.ConvertibleHiveDatasetFinder) 1
EventConstants (org.apache.gobblin.data.management.conversion.hive.events.EventConstants) 1
ValidationJob (org.apache.gobblin.data.management.conversion.hive.validation.ValidationJob) 1
HiveDataset (org.apache.gobblin.data.management.copy.hive.HiveDataset) 1
HiveDatasetFinder (org.apache.gobblin.data.management.copy.hive.HiveDatasetFinder) 1
HiveUtils (org.apache.gobblin.data.management.copy.hive.HiveUtils) 1
Instrumented (org.apache.gobblin.instrumented.Instrumented) 1
MetricContext (org.apache.gobblin.metrics.MetricContext) 1