
Example 51 with LocatedFileStatus

Uses org.apache.hadoop.fs.LocatedFileStatus in project druid by druid-io.

From the class HdfsDataSegmentPuller, method getSegmentFiles:

public FileUtils.FileCopyResult getSegmentFiles(final Path path, final File outDir) throws SegmentLoadingException {
    try {
        final FileSystem fs = path.getFileSystem(config);
        if (fs.isDirectory(path)) {
            try {
                return RetryUtils.retry(new Callable<FileUtils.FileCopyResult>() {

                    @Override
                    public FileUtils.FileCopyResult call() throws Exception {
                        if (!fs.exists(path)) {
                            throw new SegmentLoadingException("No files found at [%s]", path.toString());
                        }
                        final RemoteIterator<LocatedFileStatus> children = fs.listFiles(path, false);
                        final ArrayList<FileUtils.FileCopyResult> localChildren = new ArrayList<>();
                        final FileUtils.FileCopyResult result = new FileUtils.FileCopyResult();
                        while (children.hasNext()) {
                            final LocatedFileStatus child = children.next();
                            final Path childPath = child.getPath();
                            final String fname = childPath.getName();
                            if (fs.isDirectory(childPath)) {
                                log.warn("[%s] is a child directory, skipping", childPath.toString());
                            } else {
                                final File outFile = new File(outDir, fname);
                                // Actual copy
                                fs.copyToLocalFile(childPath, new Path(outFile.toURI()));
                                result.addFile(outFile);
                            }
                        }
                        log.info("Copied %d bytes from [%s] to [%s]", result.size(), path.toString(), outDir.getAbsolutePath());
                        return result;
                    }
                }, shouldRetryPredicate(), DEFAULT_RETRY_COUNT);
            } catch (Exception e) {
                throw Throwables.propagate(e);
            }
        } else if (CompressionUtils.isZip(path.getName())) {
            // --------    zip     ---------
            final FileUtils.FileCopyResult result = CompressionUtils.unzip(new ByteSource() {

                @Override
                public InputStream openStream() throws IOException {
                    return getInputStream(path);
                }
            }, outDir, shouldRetryPredicate(), false);
            log.info("Unzipped %d bytes from [%s] to [%s]", result.size(), path.toString(), outDir.getAbsolutePath());
            return result;
        } else if (CompressionUtils.isGz(path.getName())) {
            // --------    gzip     ---------
            final String fname = path.getName();
            final File outFile = new File(outDir, CompressionUtils.getGzBaseName(fname));
            final FileUtils.FileCopyResult result = CompressionUtils.gunzip(new ByteSource() {

                @Override
                public InputStream openStream() throws IOException {
                    return getInputStream(path);
                }
            }, outFile);
            log.info("Gunzipped %d bytes from [%s] to [%s]", result.size(), path.toString(), outFile.getAbsolutePath());
            return result;
        } else {
            throw new SegmentLoadingException("Do not know how to handle file type at [%s]", path.toString());
        }
    } catch (IOException e) {
        throw new SegmentLoadingException(e, "Error loading [%s]", path.toString());
    }
}
Also used: Path (org.apache.hadoop.fs.Path), SegmentLoadingException (io.druid.segment.loading.SegmentLoadingException), FileUtils (io.druid.java.util.common.FileUtils), InputStream (java.io.InputStream), ArrayList (java.util.ArrayList), LocatedFileStatus (org.apache.hadoop.fs.LocatedFileStatus), IOException (java.io.IOException), RemoteIterator (org.apache.hadoop.fs.RemoteIterator), FileSystem (org.apache.hadoop.fs.FileSystem), ByteSource (com.google.common.io.ByteSource), File (java.io.File)
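The directory branch above shows a common HDFS pattern: iterate the RemoteIterator<LocatedFileStatus> returned by listFiles and copy each regular file, skipping child directories. A minimal local-filesystem analogue of that skip-directories-and-copy loop, using only java.nio.file (the class and method names here are illustrative, not from the Druid source):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class FlatDirCopy {
    // Copies every regular file directly under srcDir into outDir,
    // skipping child directories -- mirroring the loop in getSegmentFiles.
    public static List<Path> copyFlat(Path srcDir, Path outDir) throws IOException {
        List<Path> copied = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(srcDir)) {
            for (Path child : children) {
                if (Files.isDirectory(child)) {
                    continue; // analogous to the "is a child directory, skipping" warning
                }
                Path target = outDir.resolve(child.getFileName().toString());
                Files.copy(child, target, StandardCopyOption.REPLACE_EXISTING);
                copied.add(target);
            }
        }
        return copied;
    }
}
```

Note that, like the Druid loop, this does not recurse; nested directories are passed over rather than descended into.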

Example 52 with LocatedFileStatus

Uses org.apache.hadoop.fs.LocatedFileStatus in project druid by druid-io.

From the class HdfsTaskLogs, method killOlderThan:

@Override
public void killOlderThan(long timestamp) throws IOException {
    Path taskLogDir = new Path(config.getDirectory());
    FileSystem fs = taskLogDir.getFileSystem(hadoopConfig);
    if (fs.exists(taskLogDir)) {
        if (!fs.isDirectory(taskLogDir)) {
            throw new IOException(String.format("taskLogDir [%s] must be a directory.", taskLogDir));
        }
        RemoteIterator<LocatedFileStatus> iter = fs.listLocatedStatus(taskLogDir);
        while (iter.hasNext()) {
            LocatedFileStatus file = iter.next();
            if (file.getModificationTime() < timestamp) {
                Path p = file.getPath();
                log.info("Deleting hdfs task log [%s].", p.toUri().toString());
                fs.delete(p, true);
            }
            if (Thread.currentThread().isInterrupted()) {
                throw new IOException(new InterruptedException("Thread interrupted. Couldn't delete all tasklogs."));
            }
        }
    }
}
Also used: Path (org.apache.hadoop.fs.Path), FileSystem (org.apache.hadoop.fs.FileSystem), LocatedFileStatus (org.apache.hadoop.fs.LocatedFileStatus), IOException (java.io.IOException)
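killOlderThan sweeps the log directory and deletes any entry whose modification time predates the cutoff, bailing out if the thread is interrupted. A hedged local-filesystem sketch of the same age-based sweep (class and method names are illustrative, not from the Druid source):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class LogSweeper {
    // Deletes every regular file under dir whose last-modified time is
    // strictly older than cutoffMillis (epoch millis); returns the count.
    public static int deleteOlderThan(Path dir, long cutoffMillis) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                // Mirror the interruption check in killOlderThan so a long
                // sweep can be cancelled between deletions.
                if (Thread.currentThread().isInterrupted()) {
                    throw new IOException("interrupted while sweeping " + dir);
                }
                if (Files.isRegularFile(entry)
                        && Files.getLastModifiedTime(entry).toMillis() < cutoffMillis) {
                    Files.delete(entry);
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```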

Example 53 with LocatedFileStatus

Uses org.apache.hadoop.fs.LocatedFileStatus in project hadoop by apache.

From the class ViewFileSystem, method listLocatedStatus:

@Override
public RemoteIterator<LocatedFileStatus> listLocatedStatus(final Path f, final PathFilter filter) throws FileNotFoundException, IOException {
    final InodeTree.ResolveResult<FileSystem> res = fsState.resolve(getUriPath(f), true);
    final RemoteIterator<LocatedFileStatus> statusIter = res.targetFileSystem.listLocatedStatus(res.remainingPath);
    if (res.isInternalDir()) {
        return statusIter;
    }
    return new RemoteIterator<LocatedFileStatus>() {

        @Override
        public boolean hasNext() throws IOException {
            return statusIter.hasNext();
        }

        @Override
        public LocatedFileStatus next() throws IOException {
            final LocatedFileStatus status = statusIter.next();
            return (LocatedFileStatus) fixFileStatus(status, getChrootedPath(res, status, f));
        }
    };
}
Also used: RemoteIterator (org.apache.hadoop.fs.RemoteIterator), FileSystem (org.apache.hadoop.fs.FileSystem), LocatedFileStatus (org.apache.hadoop.fs.LocatedFileStatus)
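ViewFileSystem decorates the underlying RemoteIterator so each LocatedFileStatus is rewritten (re-rooted into the view namespace) as it is returned. The same wrap-and-transform pattern can be sketched with a minimal stand-in interface; ThrowingIterator here is hypothetical, standing in for Hadoop's RemoteIterator, whose next() and hasNext() both declare IOException:

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.function.Function;

public class IteratorDecorator {
    // Minimal stand-in for Hadoop's RemoteIterator<E>.
    public interface ThrowingIterator<E> {
        boolean hasNext() throws IOException;
        E next() throws IOException;
    }

    // Wraps the source iterator so every element is transformed on the way
    // out, mirroring how listLocatedStatus fixes up each LocatedFileStatus.
    public static <A, B> ThrowingIterator<B> mapping(ThrowingIterator<A> source,
                                                     Function<A, B> fix) {
        return new ThrowingIterator<B>() {
            @Override
            public boolean hasNext() throws IOException {
                return source.hasNext();
            }

            @Override
            public B next() throws IOException {
                return fix.apply(source.next());
            }
        };
    }

    // Convenience: adapt a plain Iterator for demonstration purposes.
    public static <E> ThrowingIterator<E> from(Iterator<E> it) {
        return new ThrowingIterator<E>() {
            public boolean hasNext() { return it.hasNext(); }
            public E next() { return it.next(); }
        };
    }
}
```

The key property, shared with the ViewFileSystem code, is that the transformation is applied lazily per element; the wrapper never materializes the whole listing.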

Example 54 with LocatedFileStatus

Uses org.apache.hadoop.fs.LocatedFileStatus in project hadoop by apache.

From the class AbstractContractRootDirectoryTest, method testSimpleRootListing:

@Test
public void testSimpleRootListing() throws IOException {
    describe("test the nonrecursive root listing calls");
    FileSystem fs = getFileSystem();
    Path root = new Path("/");
    FileStatus[] statuses = fs.listStatus(root);
    List<LocatedFileStatus> locatedStatusList = toList(fs.listLocatedStatus(root));
    assertEquals(statuses.length, locatedStatusList.size());
    List<LocatedFileStatus> fileList = toList(fs.listFiles(root, false));
    assertTrue(fileList.size() <= statuses.length);
}
Also used: Path (org.apache.hadoop.fs.Path), FileStatus (org.apache.hadoop.fs.FileStatus), FileSystem (org.apache.hadoop.fs.FileSystem), LocatedFileStatus (org.apache.hadoop.fs.LocatedFileStatus), Test (org.junit.Test)

Example 55 with LocatedFileStatus

Uses org.apache.hadoop.fs.LocatedFileStatus in project hadoop by apache.

From the class AbstractContractGetFileStatusTest, method testListFilesFile:

@Test
public void testListFilesFile() throws Throwable {
    describe("test the listStatus(path) on a file");
    Path f = touchf("listfilesfile");
    List<LocatedFileStatus> statusList = toList(getFileSystem().listFiles(f, false));
    assertEquals("size of file list returned", 1, statusList.size());
    assertIsNamedFile(f, statusList.get(0));
    List<LocatedFileStatus> statusList2 = toListThroughNextCallsAlone(getFileSystem().listFiles(f, false));
    assertEquals("size of file list returned through next() calls", 1, statusList2.size());
    assertIsNamedFile(f, statusList2.get(0));
}
Also used: Path (org.apache.hadoop.fs.Path), LocatedFileStatus (org.apache.hadoop.fs.LocatedFileStatus), Test (org.junit.Test)

Aggregations

LocatedFileStatus (org.apache.hadoop.fs.LocatedFileStatus): 70
Path (org.apache.hadoop.fs.Path): 51
FileSystem (org.apache.hadoop.fs.FileSystem): 29
ArrayList (java.util.ArrayList): 24
Test (org.junit.Test): 20
FileStatus (org.apache.hadoop.fs.FileStatus): 18
IOException (java.io.IOException): 14
Configuration (org.apache.hadoop.conf.Configuration): 10
File (java.io.File): 8
FileNotFoundException (java.io.FileNotFoundException): 7
BlockLocation (org.apache.hadoop.fs.BlockLocation): 5
BufferedReader (java.io.BufferedReader): 4
InputStreamReader (java.io.InputStreamReader): 4
Matcher (java.util.regex.Matcher): 4
FSDataInputStream (org.apache.hadoop.fs.FSDataInputStream): 4
RemoteIterator (org.apache.hadoop.fs.RemoteIterator): 4
DistributedFileSystem (org.apache.hadoop.hdfs.DistributedFileSystem): 4
PrestoException (com.facebook.presto.spi.PrestoException): 3
DataSegment (io.druid.timeline.DataSegment): 3
Path (java.nio.file.Path): 3