Example 1 with TimedCallable

Use of org.apache.drill.exec.store.TimedCallable in the Apache Drill project.

Class BlockMapBuilder, method generateFileWork.

public List<CompleteFileWork> generateFileWork(List<FileStatus> files, boolean blockify) throws IOException {
    List<TimedCallable<List<CompleteFileWork>>> readers = new ArrayList<>(files.size());
    for (FileStatus status : files) {
        readers.add(new BlockMapReader(status, blockify));
    }
    List<List<CompleteFileWork>> work = TimedCallable.run("Get block maps", logger, readers, 16);
    List<CompleteFileWork> singleList = Lists.newArrayList();
    for (List<CompleteFileWork> innerWorkList : work) {
        singleList.addAll(innerWorkList);
    }
    return singleList;
}
Also used : FileStatus(org.apache.hadoop.fs.FileStatus) ArrayList(java.util.ArrayList) TimedCallable(org.apache.drill.exec.store.TimedCallable) List(java.util.List)
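
The method above follows a common pattern: build one task per file, run the tasks with a bounded number of threads, then flatten the per-file lists into one. The sketch below shows that pattern using only java.util.concurrent. It is not Drill's TimedCallable implementation; ParallelRunSketch, runAll and flatten are hypothetical names introduced for illustration, and the sketch does no timing or logging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration only; not part of Apache Drill.
final class ParallelRunSketch {

    // Runs every task on a pool of at most `parallelism` threads and
    // returns the results in the same order as the input tasks.
    static <T> List<T> runAll(List<? extends Callable<T>> tasks, int parallelism)
            throws InterruptedException, ExecutionException {
        if (tasks.isEmpty()) {
            return new ArrayList<>();
        }
        // Never start more threads than there are tasks.
        int threads = Math.max(1, Math.min(parallelism, tasks.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // invokeAll blocks until all tasks finish and keeps input order.
            List<Future<T>> futures = pool.invokeAll(tasks);
            List<T> results = new ArrayList<>(futures.size());
            for (Future<T> future : futures) {
                // get() rethrows a task failure as ExecutionException.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    // Concatenates the inner lists, mirroring the singleList loop above.
    static <T> List<T> flatten(List<List<T>> nested) {
        List<T> flat = new ArrayList<>();
        for (List<T> inner : nested) {
            flat.addAll(inner);
        }
        return flat;
    }
}
```

Calling flatten(runAll(tasks, 16)) on a list of tasks that each return a List would correspond roughly to the body of generateFileWork, under the assumptions stated above.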

Example 2 with TimedCallable

Use of org.apache.drill.exec.store.TimedCallable in the Apache Drill project.

Class FooterGatherer, method getFooters.

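/**
 * Collects the Parquet footers for the given file statuses. For a directory,
 * the summary file is used if it exists; otherwise each file in the directory
 * is read individually. Individual footer reads run in parallel.
 *
 * @param conf configuration used to resolve the file system
 * @param statuses list of file or directory statuses
 * @param parallelism maximum number of footer reads to run concurrently
 * @return a list of footers
 * @throws IOException if a file system operation or a footer read fails
 */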
public static List<Footer> getFooters(final Configuration conf, List<FileStatus> statuses, int parallelism) throws IOException {
    final List<TimedCallable<Footer>> readers = new ArrayList<>();
    final List<Footer> foundFooters = new ArrayList<>();
    for (FileStatus status : statuses) {
        if (status.isDirectory()) {
            // first we check for summary file.
            FileSystem fs = status.getPath().getFileSystem(conf);
            final Path summaryPath = new Path(status.getPath(), ParquetFileWriter.PARQUET_METADATA_FILE);
            if (fs.exists(summaryPath)) {
                FileStatus summaryStatus = fs.getFileStatus(summaryPath);
                foundFooters.addAll(ParquetFileReader.readSummaryFile(conf, summaryStatus));
                continue;
            }
            // else we handle as normal file.
            for (FileStatus inStatus : DrillFileSystemUtil.listFiles(fs, status.getPath(), false)) {
                readers.add(new FooterReader(conf, inStatus));
            }
        } else {
            readers.add(new FooterReader(conf, status));
        }
    }
    if (!readers.isEmpty()) {
        foundFooters.addAll(TimedCallable.run("Fetch Parquet Footers", logger, readers, parallelism));
    }
    return foundFooters;
}
Also used : Path(org.apache.hadoop.fs.Path) FileStatus(org.apache.hadoop.fs.FileStatus) FileSystem(org.apache.hadoop.fs.FileSystem) ArrayList(java.util.ArrayList) Footer(org.apache.parquet.hadoop.Footer) TimedCallable(org.apache.drill.exec.store.TimedCallable)
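
The control flow in getFooters can be summarised as: take cheap results directly when a directory has a summary file, and queue one task per file otherwise. The sketch below reproduces that split on the local file system with java.nio.file. Everything in it is an assumption made for illustration: the summary file name "_summary", the one-result-per-line summary format, and the names SummaryShortcutSketch, Plan, plan and describe are hypothetical and are not Drill or Parquet APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.stream.Stream;

// Hypothetical illustration of the "summary file first" pattern.
final class SummaryShortcutSketch {

    // Assumed summary file name; the real code uses a Parquet constant.
    static final String SUMMARY_FILE = "_summary";

    // Results that are already known, plus tasks that still have to run.
    static final class Plan {
        final List<String> found = new ArrayList<>();
        final List<Callable<String>> pending = new ArrayList<>();
    }

    static Plan plan(List<Path> inputs) throws IOException {
        Plan plan = new Plan();
        for (Path input : inputs) {
            if (Files.isDirectory(input)) {
                // First check for a summary file covering the whole directory.
                Path summary = input.resolve(SUMMARY_FILE);
                if (Files.exists(summary)) {
                    plan.found.addAll(Files.readAllLines(summary));
                    continue;
                }
                // No summary: queue one task per regular file (non-recursive).
                try (Stream<Path> children = Files.list(input)) {
                    children.filter(Files::isRegularFile)
                            .sorted()
                            .forEach(child -> plan.pending.add(() -> describe(child)));
                }
            } else {
                plan.pending.add(() -> describe(input));
            }
        }
        return plan;
    }

    // Stand-in for reading a footer: returns "<file name>:<size in bytes>".
    static String describe(Path file) throws IOException {
        return file.getFileName() + ":" + Files.size(file);
    }
}
```

The pending tasks would then be run only if the list is non-empty, as the original method does before calling TimedCallable.run.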

Aggregations

ArrayList (java.util.ArrayList): 2
TimedCallable (org.apache.drill.exec.store.TimedCallable): 2
FileStatus (org.apache.hadoop.fs.FileStatus): 2
List (java.util.List): 1
FileSystem (org.apache.hadoop.fs.FileSystem): 1
Path (org.apache.hadoop.fs.Path): 1
Footer (org.apache.parquet.hadoop.Footer): 1