
Example 1 with PartitionableDataset

Use of org.apache.gobblin.dataset.PartitionableDataset in project incubator-gobblin by apache.

Class DatasetFinderSource, method createWorkUnitStream.

private Stream<WorkUnit> createWorkUnitStream(SourceState state) throws IOException {
    IterableDatasetFinder datasetsFinder = createDatasetsFinder(state);
    Stream<Dataset> datasetStream = datasetsFinder.getDatasetsStream(0, null);
    if (this.drilldownIntoPartitions) {
        // Expand each dataset into its partitions and emit one work unit per partition.
        return datasetStream.flatMap(dataset -> {
            if (dataset instanceof PartitionableDataset) {
                try {
                    return (Stream<PartitionableDataset.DatasetPartition>) ((PartitionableDataset) dataset).getPartitions(0, null);
                } catch (IOException ioe) {
                    // Skip this dataset and keep going; pass the exception so its cause is logged.
                    log.error("Failed to get partitions for dataset " + dataset.getUrn(), ioe);
                    return Stream.empty();
                }
            } else {
                // Wrap a non-partitionable dataset so it flows through the same partition mapping.
                return Stream.of(new DatasetWrapper(dataset));
            }
        }).map(this::workUnitForPartitionInternal);
    } else {
        // No drill-down: one work unit per dataset.
        return datasetStream.map(this::workUnitForDataset);
    }
}
Also used: DatasetUtils(org.apache.gobblin.data.management.dataset.DatasetUtils), WorkUnitStream(org.apache.gobblin.source.workunit.WorkUnitStream), Getter(lombok.Getter), IOException(java.io.IOException), Collectors(java.util.stream.Collectors), PartitionableDataset(org.apache.gobblin.dataset.PartitionableDataset), IterableDatasetFinder(org.apache.gobblin.dataset.IterableDatasetFinder), List(java.util.List), Slf4j(lombok.extern.slf4j.Slf4j), Stream(java.util.stream.Stream), BasicWorkUnitStream(org.apache.gobblin.source.workunit.BasicWorkUnitStream), SourceState(org.apache.gobblin.configuration.SourceState), WorkUnitStreamSource(org.apache.gobblin.source.WorkUnitStreamSource), HadoopUtils(org.apache.gobblin.util.HadoopUtils), AllArgsConstructor(lombok.AllArgsConstructor), Dataset(org.apache.gobblin.dataset.Dataset), WorkUnit(org.apache.gobblin.source.workunit.WorkUnit)
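
The drill-down pattern in createWorkUnitStream can be tried without Gobblin on the classpath. The following is only a minimal sketch under stated assumptions: SimpleDataset, SimplePartitionable, ListPartitioned, plain and toWorkUnits are hypothetical stand-ins invented for illustration, not part of the Gobblin API, and strings stand in for WorkUnit objects. It shows a flatMap that expands partitionable items, passes plain items through unchanged, and skips an item whose partition listing throws IOException.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-ins for Dataset / PartitionableDataset; NOT the real Gobblin types.
class PartitionDrilldownSketch {

    interface SimpleDataset {
        String urn();
    }

    interface SimplePartitionable extends SimpleDataset {
        Stream<String> partitions() throws IOException;
    }

    // A dataset without partitions.
    static SimpleDataset plain(String urn) {
        return () -> urn;
    }

    // A dataset with a fixed partition list; 'failing' simulates an I/O error while listing.
    static final class ListPartitioned implements SimplePartitionable {
        private final String urn;
        private final List<String> parts;
        private final boolean failing;

        ListPartitioned(String urn, List<String> parts, boolean failing) {
            this.urn = urn;
            this.parts = parts;
            this.failing = failing;
        }

        @Override
        public String urn() {
            return urn;
        }

        @Override
        public Stream<String> partitions() throws IOException {
            if (failing) {
                throw new IOException("cannot list partitions of " + urn);
            }
            return parts.stream().map(p -> urn + "/" + p);
        }
    }

    // Strings stand in for work units: one per partition when drilling down, else one per dataset.
    static List<String> toWorkUnits(Stream<SimpleDataset> datasets, boolean drilldown) {
        if (!drilldown) {
            return datasets.map(d -> "dataset:" + d.urn()).collect(Collectors.toList());
        }
        return datasets.flatMap(d -> {
            if (d instanceof SimplePartitionable) {
                try {
                    return ((SimplePartitionable) d).partitions();
                } catch (IOException ioe) {
                    // Checked exceptions cannot escape the lambda, so report and skip this dataset.
                    System.err.println("Failed to get partitions for dataset " + d.urn() + ": " + ioe.getMessage());
                    return Stream.empty();
                }
            }
            // A plain dataset is treated as a single partition.
            return Stream.of(d.urn());
        }).map(p -> "partition:" + p).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> units = toWorkUnits(Stream.of(
                plain("db/logs"),
                new ListPartitioned("db/events", Arrays.asList("2024-01", "2024-02"), false),
                new ListPartitioned("db/broken", Arrays.asList("x"), true)), true);
        units.forEach(System.out::println);
    }
}
```

In this sketch the failing dataset contributes no entries while the others are still processed, which mirrors the choice in the example above to return Stream.empty() from the catch block rather than abort the whole stream.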

Aggregations

IOException (java.io.IOException): 1
List (java.util.List): 1
Collectors (java.util.stream.Collectors): 1
Stream (java.util.stream.Stream): 1
AllArgsConstructor (lombok.AllArgsConstructor): 1
Getter (lombok.Getter): 1
Slf4j (lombok.extern.slf4j.Slf4j): 1
SourceState (org.apache.gobblin.configuration.SourceState): 1
DatasetUtils (org.apache.gobblin.data.management.dataset.DatasetUtils): 1
Dataset (org.apache.gobblin.dataset.Dataset): 1
IterableDatasetFinder (org.apache.gobblin.dataset.IterableDatasetFinder): 1
PartitionableDataset (org.apache.gobblin.dataset.PartitionableDataset): 1
WorkUnitStreamSource (org.apache.gobblin.source.WorkUnitStreamSource): 1
BasicWorkUnitStream (org.apache.gobblin.source.workunit.BasicWorkUnitStream): 1
WorkUnit (org.apache.gobblin.source.workunit.WorkUnit): 1
WorkUnitStream (org.apache.gobblin.source.workunit.WorkUnitStream): 1
HadoopUtils (org.apache.gobblin.util.HadoopUtils): 1