
Example 1 with HiveDatasetFinder

Use of org.apache.gobblin.data.management.copy.hive.HiveDatasetFinder in the Apache project incubator-gobblin.

Class ComplianceRetentionJob, method initDatasetFinder.

public void initDatasetFinder(Properties properties) throws IOException {
    Preconditions.checkArgument(properties.containsKey(GOBBLIN_COMPLIANCE_DATASET_FINDER_CLASS), "Missing required property " + GOBBLIN_COMPLIANCE_DATASET_FINDER_CLASS);
    String finderClass = properties.getProperty(GOBBLIN_COMPLIANCE_DATASET_FINDER_CLASS);
    this.finder = GobblinConstructorUtils.invokeConstructor(DatasetsFinder.class, finderClass, new State(properties));
    Iterator<HiveDataset> datasetsIterator = new HiveDatasetFinder(FileSystem.newInstance(new Configuration()), properties).getDatasetsIterator();
    while (datasetsIterator.hasNext()) {
        // Drop partitions from empty tables if property is set, otherwise skip the table
        HiveDataset hiveDataset = datasetsIterator.next();
        List<Partition> partitionsFromDataset = hiveDataset.getPartitionsFromDataset();
        String completeTableName = hiveDataset.getTable().getCompleteName();
        if (!partitionsFromDataset.isEmpty()) {
            this.tableNamesList.add(completeTableName);
            continue;
        }
        if (!Boolean.parseBoolean(properties.getProperty(ComplianceConfigurationKeys.SHOULD_DROP_EMPTY_TABLES, ComplianceConfigurationKeys.DEFAULT_SHOULD_DROP_EMPTY_TABLES))) {
            continue;
        }
        if (completeTableName.contains(ComplianceConfigurationKeys.TRASH) || completeTableName.contains(ComplianceConfigurationKeys.BACKUP) || completeTableName.contains(ComplianceConfigurationKeys.STAGING)) {
            this.tablesToDrop.add(hiveDataset);
        }
    }
}
Also used: Partition (org.apache.hadoop.hive.ql.metadata.Partition), Configuration (org.apache.hadoop.conf.Configuration), State (org.apache.gobblin.configuration.State), DatasetsFinder (org.apache.gobblin.dataset.DatasetsFinder), HiveDatasetFinder (org.apache.gobblin.data.management.copy.hive.HiveDatasetFinder), HiveDataset (org.apache.gobblin.data.management.copy.hive.HiveDataset)
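
The loop in initDatasetFinder sorts each table into one of three outcomes: tables with partitions are recorded by name, empty tables are skipped unless the drop flag is set, and flagged empty tables are queued for dropping only when their name carries a trash, backup, or staging marker. The following is a minimal, dependency-free sketch of that decision logic, not the Gobblin implementation. The class name EmptyTableTriage, the property key, the default value, and the marker strings are all hypothetical stand-ins for the ComplianceConfigurationKeys constants, whose real values are not shown in the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Properties;

class EmptyTableTriage {
    // Hypothetical stand-ins for ComplianceConfigurationKeys constants; the real values may differ.
    static final String SHOULD_DROP_EMPTY_TABLES = "example.should.drop.empty.tables";
    static final String DEFAULT_SHOULD_DROP_EMPTY_TABLES = "false";
    static final List<String> DROPPABLE_MARKERS = Arrays.asList("_trash_", "_backup_", "_staging_");

    final List<String> tableNames = new ArrayList<>();
    final List<String> tablesToDrop = new ArrayList<>();

    // partitionCounts maps a complete table name to the number of partitions it has.
    void triage(Map<String, Integer> partitionCounts, Properties properties) {
        boolean dropEmpty = Boolean.parseBoolean(
            properties.getProperty(SHOULD_DROP_EMPTY_TABLES, DEFAULT_SHOULD_DROP_EMPTY_TABLES));
        for (Map.Entry<String, Integer> entry : partitionCounts.entrySet()) {
            String name = entry.getKey();
            if (entry.getValue() > 0) {
                // Non-empty table: keep its name for later processing.
                tableNames.add(name);
                continue;
            }
            if (!dropEmpty) {
                // Empty table, but dropping is disabled: skip it.
                continue;
            }
            // Empty table with dropping enabled: queue it only if the name has a marker.
            if (DROPPABLE_MARKERS.stream().anyMatch(name::contains)) {
                tablesToDrop.add(name);
            }
        }
    }
}
```

Reading the flag once before the loop, rather than on every iteration as the original does, gives the same result because the properties do not change while iterating.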

Example 2 with HiveDatasetFinder

Use of org.apache.gobblin.data.management.copy.hive.HiveDatasetFinder in the Apache project incubator-gobblin.

Class HivePartitionFinder, method getHiveDatasets.

private static List<HiveDataset> getHiveDatasets(FileSystem fs, State state) throws IOException {
    Preconditions.checkArgument(state.contains(ComplianceConfigurationKeys.COMPLIANCE_DATASET_WHITELIST), "Missing required property " + ComplianceConfigurationKeys.COMPLIANCE_DATASET_WHITELIST);
    Properties prop = new Properties();
    prop.setProperty(ComplianceConfigurationKeys.HIVE_DATASET_WHITELIST, state.getProp(ComplianceConfigurationKeys.COMPLIANCE_DATASET_WHITELIST));
    HiveDatasetFinder finder = new HiveDatasetFinder(fs, prop);
    return finder.findDatasets();
}
Also used: HiveDatasetFinder (org.apache.gobblin.data.management.copy.hive.HiveDatasetFinder), Properties (java.util.Properties)
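
getHiveDatasets does not hand the whole job state to the finder. It validates that the compliance whitelist is present, then copies that single value into a fresh Properties object under the key the finder reads. A minimal sketch of that bridging step using only java.util.Properties is shown here. The class name WhitelistBridge and both key strings are hypothetical, and a plain Properties object stands in for Gobblin's State.

```java
import java.util.Properties;

class WhitelistBridge {
    // Hypothetical key names; the real ComplianceConfigurationKeys values are not shown in the example.
    static final String COMPLIANCE_DATASET_WHITELIST = "example.compliance.dataset.whitelist";
    static final String HIVE_DATASET_WHITELIST = "example.hive.dataset.whitelist";

    // Builds the narrow property set passed to the finder from the wider job state.
    static Properties toFinderProperties(Properties jobState) {
        String whitelist = jobState.getProperty(COMPLIANCE_DATASET_WHITELIST);
        if (whitelist == null) {
            // Same exception type that Guava's Preconditions.checkArgument throws.
            throw new IllegalArgumentException(
                "Missing required property " + COMPLIANCE_DATASET_WHITELIST);
        }
        Properties finderProps = new Properties();
        finderProps.setProperty(HIVE_DATASET_WHITELIST, whitelist);
        return finderProps;
    }
}
```

Because the finder receives only the whitelist, any other settings in the job state cannot influence which datasets it discovers.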

Aggregations

HiveDatasetFinder (org.apache.gobblin.data.management.copy.hive.HiveDatasetFinder): 2
Properties (java.util.Properties): 1
State (org.apache.gobblin.configuration.State): 1
HiveDataset (org.apache.gobblin.data.management.copy.hive.HiveDataset): 1
DatasetsFinder (org.apache.gobblin.dataset.DatasetsFinder): 1
Configuration (org.apache.hadoop.conf.Configuration): 1
Partition (org.apache.hadoop.hive.ql.metadata.Partition): 1