
Example 1 with DatasetsFinder

Use of org.apache.gobblin.dataset.DatasetsFinder in project incubator-gobblin by Apache.

From class CompactionSource, method getWorkunitStream.

@Override
public WorkUnitStream getWorkunitStream(SourceState state) {
    try {
        fs = getSourceFileSystem(state);
        state.setProp(COMPACTION_INIT_TIME, DateTimeUtils.currentTimeMillis());
        suite = CompactionSuiteUtils.getCompactionSuiteFactory(state).createSuite(state);
        initRequestAllocator(state);
        initJobDir(state);
        copyJarDependencies(state);
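        // Use the finder class named in the job properties, or DefaultFileSystemGlobFinder if none is configured
        DatasetsFinder finder = DatasetUtils.instantiateDatasetFinder(state.getProperties(), getSourceFileSystem(state), DefaultFileSystemGlobFinder.class.getName());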
        List<Dataset> datasets = finder.findDatasets();
        CompactionWorkUnitIterator workUnitIterator = new CompactionWorkUnitIterator();
        // Spawn a single thread to create work units
        new Thread(new SingleWorkUnitGeneratorService(state, prioritize(datasets, state), workUnitIterator), "SingleWorkUnitGeneratorService").start();
        return new BasicWorkUnitStream.Builder(workUnitIterator).build();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
Also used: DefaultFileSystemGlobFinder (org.apache.gobblin.data.management.dataset.DefaultFileSystemGlobFinder), Dataset (org.apache.gobblin.dataset.Dataset), BasicWorkUnitStream (org.apache.gobblin.source.workunit.BasicWorkUnitStream), DatasetsFinder (org.apache.gobblin.dataset.DatasetsFinder), IOException (java.io.IOException)
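getWorkunitStream returns right away: a background thread runs SingleWorkUnitGeneratorService, which fills a CompactionWorkUnitIterator while the job consumes work units through the returned stream. Neither class is shown here, so the following is only a minimal producer/consumer sketch of that pattern, assuming the iterator is backed by a BlockingQueue with an end-of-stream sentinel. QueueBackedIterator, put, close, and QueueDemo are hypothetical names, not Gobblin APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for CompactionWorkUnitIterator: a producer thread puts items,
// and the consumer's hasNext() blocks until an item or the end marker arrives.
class QueueBackedIterator<T> implements Iterator<T> {
  // Sentinel marking end of stream; never returned to callers.
  private static final Object END = new Object();
  private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
  private Object next; // element fetched ahead by hasNext(), or null if none fetched yet

  // Called by the producer thread for each generated item (null items are rejected).
  public void put(T item) {
    queue.add(item);
  }

  // Called by the producer thread once no more items will be added.
  public void close() {
    queue.add(END);
  }

  @Override
  public boolean hasNext() {
    if (next == null) {
      try {
        next = queue.take(); // blocks until the producer supplies an item or END
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false; // treat interruption as end of stream in this sketch
      }
    }
    return next != END;
  }

  @Override
  @SuppressWarnings("unchecked")
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    T item = (T) next;
    next = null;
    return item;
  }
}

class QueueDemo {
  // A producer thread emits three items and closes; the caller drains them in order.
  static List<String> run() {
    QueueBackedIterator<String> it = new QueueBackedIterator<>();
    Thread producer = new Thread(() -> {
      for (String item : Arrays.asList("a", "b", "c")) {
        it.put(item);
      }
      it.close();
    }, "producer");
    producer.start();
    List<String> out = new ArrayList<>();
    while (it.hasNext()) {
      out.add(it.next());
    }
    return out;
  }
}
```

The sentinel lets hasNext() tell "nothing generated yet" (block) apart from "generation finished" (return false), which is what allows a stream to be handed back before all work units exist.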

Example 2 with DatasetsFinder

Use of org.apache.gobblin.dataset.DatasetsFinder in project incubator-gobblin by Apache.

From class RetentionTestHelper, method clean.

/**
 * Runs Gobblin retention on test data. {@link DatasetCleaner}, which performs retention in production, cannot be called directly because
 * runtime properties such as ${testNameTempPath} must be resolved first. ${testNameTempPath} is a directory, unique to each test, that holds
 * all the setup data created by {@link RetentionTestDataGenerator#setup()}.
 * The default {@link ConfigClient} used by {@link DatasetCleaner} reads configs from a config store. The test configs are on the classpath
 * rather than in a config store, so a mock {@link ConfigClient} is supplied instead.
 *
 * @param fs {@link FileSystem} holding the test data
 * @param retentionConfigClasspathResource the same job properties (.job) or config file used when running a real retention job
 * @param additionalJobPropsClasspathResource optional classpath resource with extra job properties; only applied when
 *        retentionConfigClasspathResource is not a .job file
 * @param testNameTempPath temp path for this test where test data is generated
 */
public static void clean(FileSystem fs, Path retentionConfigClasspathResource, Optional<Path> additionalJobPropsClasspathResource, Path testNameTempPath) throws Exception {
    Properties additionalJobProps = new Properties();
    if (additionalJobPropsClasspathResource.isPresent()) {
        try (final InputStream stream = RetentionTestHelper.class.getClassLoader().getResourceAsStream(additionalJobPropsClasspathResource.get().toString())) {
            additionalJobProps.load(stream);
        }
    }
    if (retentionConfigClasspathResource.getName().endsWith(".job")) {
        Properties jobProps = new Properties();
        try (final InputStream stream = RetentionTestHelper.class.getClassLoader().getResourceAsStream(retentionConfigClasspathResource.toString())) {
            jobProps.load(stream);
            for (Entry<Object, Object> entry : jobProps.entrySet()) {
                jobProps.put(entry.getKey(), StringUtils.replace((String) entry.getValue(), "${testNameTempPath}", testNameTempPath.toString()));
            }
        }
        MultiCleanableDatasetFinder finder = new MultiCleanableDatasetFinder(fs, jobProps);
        for (Dataset dataset : finder.findDatasets()) {
            ((CleanableDataset) dataset).clean();
        }
    } else {
        Config testConfig = ConfigFactory.parseResources(retentionConfigClasspathResource.toString())
            .withFallback(ConfigFactory.parseMap(ImmutableMap.of("testNameTempPath",
                PathUtils.getPathWithoutSchemeAndAuthority(testNameTempPath).toString())))
            .resolve();
        ConfigClient client = mock(ConfigClient.class);
        when(client.getConfig(any(String.class))).thenReturn(testConfig);
        Properties jobProps = new Properties();
        jobProps.setProperty(CleanableDatasetBase.SKIP_TRASH_KEY, Boolean.toString(true));
        jobProps.setProperty(ConfigurationKeys.CONFIG_MANAGEMENT_STORE_URI, "dummy");
        jobProps.setProperty(ConfigurationKeys.CONFIG_MANAGEMENT_STORE_ENABLED, "true");
        jobProps.putAll(additionalJobProps);
        // Try a (fs, jobProps, config, client) constructor first, then fall back to (fs, jobProps, client)
        @SuppressWarnings("unchecked")
        DatasetsFinder<CleanableDataset> finder = (DatasetsFinder<CleanableDataset>) GobblinConstructorUtils.invokeFirstConstructor(
            Class.forName(testConfig.getString(MultiCleanableDatasetFinder.DATASET_FINDER_CLASS_KEY)),
            ImmutableList.of(fs, jobProps, testConfig, client),
            ImmutableList.of(fs, jobProps, client));
        for (CleanableDataset dataset : finder.findDatasets()) {
            dataset.clean();
        }
    }
}
Also used: ConfigClient (org.apache.gobblin.config.client.ConfigClient), CleanableDataset (org.apache.gobblin.data.management.retention.dataset.CleanableDataset), InputStream (java.io.InputStream), Dataset (org.apache.gobblin.dataset.Dataset), Config (com.typesafe.config.Config), Properties (java.util.Properties), DatasetsFinder (org.apache.gobblin.dataset.DatasetsFinder), MultiCleanableDatasetFinder (org.apache.gobblin.data.management.retention.profile.MultiCleanableDatasetFinder)
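In the .job branch, clean() replaces the ${testNameTempPath} placeholder in every property value with the test's temporary path before building the finder. A JDK-only sketch of that substitution is below, assuming String.replace in place of Commons Lang StringUtils.replace (both replace every literal occurrence). PlaceholderSubstitution and resolve are hypothetical names. This variant writes into a new Properties object instead of calling put on the map being iterated.

```java
import java.util.Properties;

// Hypothetical helper mirroring the placeholder loop in RetentionTestHelper.clean.
class PlaceholderSubstitution {
  // Returns a copy of props in which every literal occurrence of placeholder in a value
  // is replaced by replacement. The input Properties object is left unchanged.
  static Properties resolve(Properties props, String placeholder, String replacement) {
    Properties resolved = new Properties();
    // stringPropertyNames() covers keys whose key and value are both Strings
    for (String key : props.stringPropertyNames()) {
      resolved.setProperty(key, props.getProperty(key).replace(placeholder, replacement));
    }
    return resolved;
  }
}
```

Called as resolve(jobProps, "${testNameTempPath}", testNameTempPath.toString()), it would produce the properties handed to MultiCleanableDatasetFinder.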

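The else branch builds the finder with GobblinConstructorUtils.invokeFirstConstructor, passing two candidate argument lists: (fs, jobProps, testConfig, client) and, as a fallback, (fs, jobProps, client). A rough JDK-only sketch of that "try each argument list in order" idea is below. FirstConstructor, DemoFinder, and the exact-type matching rule are assumptions for illustration, not Gobblin's implementation, which may treat primitives, nulls, or overloads differently.

```java
import java.lang.reflect.Constructor;

// Hypothetical helper: tries each candidate argument list in order and builds the
// instance from the first list that some public constructor accepts.
class FirstConstructor {
  static <T> T invoke(Class<T> cls, Object[]... candidates) {
    for (Object[] args : candidates) {
      for (Constructor<?> ctor : cls.getConstructors()) {
        if (accepts(ctor.getParameterTypes(), args)) {
          try {
            return cls.cast(ctor.newInstance(args));
          } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + cls.getName(), e);
          }
        }
      }
    }
    throw new IllegalArgumentException("No public constructor of " + cls.getName() + " accepts any candidate list");
  }

  // Exact arity; each argument must be non-null and an instance of the declared parameter type.
  // Primitive parameters are not handled in this sketch.
  private static boolean accepts(Class<?>[] params, Object[] args) {
    if (params.length != args.length) {
      return false;
    }
    for (int i = 0; i < params.length; i++) {
      if (args[i] == null || !params[i].isInstance(args[i])) {
        return false;
      }
    }
    return true;
  }
}

// Hypothetical finder with a "full" constructor and a "fallback" constructor.
class DemoFinder {
  final String built;

  public DemoFinder(String fs, Integer config) {
    built = "with-config";
  }

  public DemoFinder(String fs) {
    built = "fs-only";
  }
}
```

Listing the richer argument list first means a finder that accepts a Config gets one, while older finders that only take (fs, jobProps, client) still work.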
Example 3 with DatasetsFinder

Use of org.apache.gobblin.dataset.DatasetsFinder in project incubator-gobblin by Apache.

From class DatasetUtils, method instantiateDatasetFinder.

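/**
 * Instantiates a {@link DatasetsFinder}. The finder class is read from property
 * {@link #DATASET_PROFILE_CLASS_KEY}; if that property is absent, {@code default_class} is used.
 * The finder is built via {@code GobblinConstructorUtils.invokeLongestConstructor} with
 * {@code fs}, {@code props}, and any {@code additionalArgs}, in that order.
 *
 * @param props Properties used for building the {@link DatasetsFinder}.
 * @param fs {@link FileSystem} where datasets are located.
 * @param default_class fully qualified name of the finder class to use when {@link #DATASET_PROFILE_CLASS_KEY} is not set.
 * @param additionalArgs extra constructor arguments, passed after {@code fs} and {@code props}.
 * @return A new instance of {@link DatasetsFinder}.
 * @throws IOException if the finder class cannot be loaded or instantiated.
 */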
@SuppressWarnings("unchecked")
public static <T extends org.apache.gobblin.dataset.Dataset> DatasetsFinder<T> instantiateDatasetFinder(Properties props, FileSystem fs, String default_class, Object... additionalArgs) throws IOException {
    String className = default_class;
    if (props.containsKey(DATASET_PROFILE_CLASS_KEY)) {
        className = props.getProperty(DATASET_PROFILE_CLASS_KEY);
    }
    try {
        Class<?> datasetFinderClass = Class.forName(className);
        List<Object> args = Lists.newArrayList(fs, props);
        if (additionalArgs != null) {
            args.addAll(Lists.newArrayList(additionalArgs));
        }
        return (DatasetsFinder<T>) GobblinConstructorUtils.invokeLongestConstructor(datasetFinderClass, args.toArray());
    } catch (ReflectiveOperationException exception) {
        throw new IOException(exception);
    }
}
Also used: DatasetsFinder (org.apache.gobblin.dataset.DatasetsFinder), IOException (java.io.IOException)
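instantiateDatasetFinder resolves a class name (the DATASET_PROFILE_CLASS_KEY property if present, otherwise default_class), places fs and props ahead of any additional arguments, and calls GobblinConstructorUtils.invokeLongestConstructor. A JDK-only sketch of that flow is below, assuming "longest constructor" means: try prefixes of the argument list from longest to shortest and use the first prefix some public constructor accepts. LongestConstructor, ProfileFinder, and the property key in the usage are hypothetical, and getProperty(key, default) only roughly mirrors the containsKey check above.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.Properties;

// Hypothetical helper sketching "invoke the longest matching constructor".
class LongestConstructor {
  // Tries prefixes of args from all of them down to none, and instantiates cls with
  // the first prefix that some public constructor accepts.
  static <T> T invoke(Class<T> cls, Object... args) {
    for (int k = args.length; k >= 0; k--) {
      Object[] prefix = Arrays.copyOf(args, k);
      for (Constructor<?> ctor : cls.getConstructors()) {
        if (accepts(ctor.getParameterTypes(), prefix)) {
          try {
            return cls.cast(ctor.newInstance(prefix));
          } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + cls.getName(), e);
          }
        }
      }
    }
    throw new IllegalArgumentException("No public constructor of " + cls.getName() + " accepts a prefix of the arguments");
  }

  // Picks the configured class name if the key is set, else the default, then builds it.
  static Object instantiate(Properties props, String key, String defaultClass, Object... args) {
    String className = props.getProperty(key, defaultClass);
    try {
      return invoke(Class.forName(className), args);
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException("Unknown class " + className, e);
    }
  }

  // Exact arity; each argument must be non-null and an instance of the parameter type.
  private static boolean accepts(Class<?>[] params, Object[] args) {
    if (params.length != args.length) {
      return false;
    }
    for (int i = 0; i < params.length; i++) {
      if (args[i] == null || !params[i].isInstance(args[i])) {
        return false;
      }
    }
    return true;
  }
}

// Hypothetical finder whose two-argument constructor mirrors (fs, props).
class ProfileFinder {
  final int used;

  public ProfileFinder(String fs, Properties props) {
    used = 2;
  }

  public ProfileFinder(String fs) {
    used = 1;
  }
}
```

With arguments ("fs", props, "extra") the three-argument prefix matches nothing, so the (fs, props) constructor is used; this is why callers can pass additionalArgs that only some finder classes accept.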

Aggregations

DatasetsFinder (org.apache.gobblin.dataset.DatasetsFinder): 3
IOException (java.io.IOException): 2
Dataset (org.apache.gobblin.dataset.Dataset): 2
Config (com.typesafe.config.Config): 1
InputStream (java.io.InputStream): 1
Properties (java.util.Properties): 1
ConfigClient (org.apache.gobblin.config.client.ConfigClient): 1
DefaultFileSystemGlobFinder (org.apache.gobblin.data.management.dataset.DefaultFileSystemGlobFinder): 1
CleanableDataset (org.apache.gobblin.data.management.retention.dataset.CleanableDataset): 1
MultiCleanableDatasetFinder (org.apache.gobblin.data.management.retention.profile.MultiCleanableDatasetFinder): 1
BasicWorkUnitStream (org.apache.gobblin.source.workunit.BasicWorkUnitStream): 1