
Example 21 with HdfsFileStatusWithId

Use of org.apache.hadoop.hive.shims.HadoopShims.HdfsFileStatusWithId in project hive by apache.

From the class AcidUtils, the method getAcidFilesForStats:

public static List<FileStatus> getAcidFilesForStats(Table table, Path dir, Configuration jc, FileSystem fs) throws IOException {
    List<FileStatus> fileList = new ArrayList<>();
    ValidWriteIdList idList = AcidUtils.getTableValidWriteIdList(jc, AcidUtils.getFullTableName(table.getDbName(), table.getTableName()));
    if (idList == null) {
        LOG.warn("Cannot get ACID state for " + table.getDbName() + "." + table.getTableName() + " from " + jc.get(ValidTxnWriteIdList.VALID_TABLES_WRITEIDS_KEY));
        return null;
    }
    if (fs == null) {
        fs = dir.getFileSystem(jc);
    }
    // Collect all of the files/dirs
    Map<Path, HdfsDirSnapshot> hdfsDirSnapshots = AcidUtils.getHdfsDirSnapshots(fs, dir);
    AcidDirectory acidInfo = AcidUtils.getAcidState(fs, dir, jc, idList, null, false, hdfsDirSnapshots);
    // Assume that for an MM table, or if there's only the base directory, we are good.
    if (!acidInfo.getCurrentDirectories().isEmpty() && AcidUtils.isFullAcidTable(table)) {
        Utilities.FILE_OP_LOGGER.warn("Computing stats for an ACID table; stats may be inaccurate");
    }
    for (HdfsFileStatusWithId hfs : acidInfo.getOriginalFiles()) {
        fileList.add(hfs.getFileStatus());
    }
    for (ParsedDelta delta : acidInfo.getCurrentDirectories()) {
        fileList.addAll(hdfsDirSnapshots.get(delta.getPath()).getFiles());
    }
    if (acidInfo.getBaseDirectory() != null) {
        fileList.addAll(hdfsDirSnapshots.get(acidInfo.getBaseDirectory()).getFiles());
    }
    return fileList;
}
Also used: Path (org.apache.hadoop.fs.Path), FileStatus (org.apache.hadoop.fs.FileStatus), LocatedFileStatus (org.apache.hadoop.fs.LocatedFileStatus), ValidWriteIdList (org.apache.hadoop.hive.common.ValidWriteIdList), HdfsFileStatusWithId (org.apache.hadoop.hive.shims.HadoopShims.HdfsFileStatusWithId), ArrayList (java.util.ArrayList)
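The tail of getAcidFilesForStats gathers files in three passes: the original files, then each current delta directory's snapshot, then the base directory's snapshot. That collection order can be sketched in isolation without Hive on the classpath; DirSnapshot below is a hypothetical stand-in for Hive's HdfsDirSnapshot, and all names are illustrative rather than Hive API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AcidFileListSketch {

    // Hypothetical stand-in for Hive's HdfsDirSnapshot: a directory's file list.
    static final class DirSnapshot {
        final List<String> files;
        DirSnapshot(List<String> files) { this.files = files; }
    }

    // Mirrors the collection order in getAcidFilesForStats:
    // originals first, then each current delta's files, then the base dir's files.
    static List<String> collect(List<String> originals,
                                List<String> deltaDirs,
                                String baseDir,
                                Map<String, DirSnapshot> snapshots) {
        List<String> fileList = new ArrayList<>(originals);
        for (String delta : deltaDirs) {
            fileList.addAll(snapshots.get(delta).files);
        }
        if (baseDir != null) {
            fileList.addAll(snapshots.get(baseDir).files);
        }
        return fileList;
    }

    public static void main(String[] args) {
        Map<String, DirSnapshot> snapshots = new HashMap<>();
        snapshots.put("delta_0000001_0000001", new DirSnapshot(List.of("bucket_00000")));
        snapshots.put("base_0000002", new DirSnapshot(List.of("bucket_00000", "bucket_00001")));
        System.out.println(collect(List.of("000000_0"),
                List.of("delta_0000001_0000001"), "base_0000002", snapshots));
    }
}
```

As in the real method, the base directory is optional (null when only originals and deltas exist), which is why it is appended last and guarded by a null check.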

Example 22 with HdfsFileStatusWithId

Use of org.apache.hadoop.hive.shims.HadoopShims.HdfsFileStatusWithId in project hive by apache.

From the class ExternalCache, the method generateTestFileId:

private Long generateTestFileId(final FileStatus fs, List<HdfsFileStatusWithId> files, int i) {
    final Long fileId = HdfsUtils.createTestFileId(fs.getPath().toUri().getPath(), fs, false, null);
    files.set(i, new HdfsFileStatusWithId() {

        @Override
        public FileStatus getFileStatus() {
            return fs;
        }

        @Override
        public Long getFileId() {
            return fileId;
        }
    });
    return fileId;
}
Also used: FileStatus (org.apache.hadoop.fs.FileStatus), HdfsFileStatusWithId (org.apache.hadoop.hive.shims.HadoopShims.HdfsFileStatusWithId)
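generateTestFileId pairs a FileStatus with a synthetic file ID by building an anonymous implementation of HdfsFileStatusWithId and overwriting a slot in the list. The same pattern can be sketched without Hive on the classpath; StatusWithId below is a simplified stand-in for the real shims interface (getPath() standing in for getFileStatus()), and the hash-based ID is purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class FileIdSketch {

    // Simplified stand-in for HadoopShims.HdfsFileStatusWithId (assumption:
    // the real interface pairs a FileStatus with an optional Long file ID).
    interface StatusWithId {
        String getPath();
        Long getFileId();
    }

    // Mirrors generateTestFileId: derive a synthetic ID, wrap both values in an
    // anonymous implementation, and overwrite slot i of the list.
    static Long setTestFileId(final String path, List<StatusWithId> files, int i) {
        final Long fileId = (long) path.hashCode(); // synthetic test ID, not a real HDFS file ID
        files.set(i, new StatusWithId() {
            @Override public String getPath() { return path; }
            @Override public Long getFileId() { return fileId; }
        });
        return fileId;
    }

    public static void main(String[] args) {
        List<StatusWithId> files = new ArrayList<>();
        files.add(null);
        Long id = setTestFileId("/warehouse/t/000000_0", files, 0);
        System.out.println(id.equals(files.get(0).getFileId()));
    }
}
```

Because both captured locals (path and fileId) are final, the anonymous class can close over them, which is exactly what makes this a convenient idiom for test doubles.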

Aggregations

HdfsFileStatusWithId (org.apache.hadoop.hive.shims.HadoopShims.HdfsFileStatusWithId): 22
Path (org.apache.hadoop.fs.Path): 12
FileStatus (org.apache.hadoop.fs.FileStatus): 10
ArrayList (java.util.ArrayList): 8
FileSystem (org.apache.hadoop.fs.FileSystem): 6
VisibleForTesting (com.google.common.annotations.VisibleForTesting): 5
IOException (java.io.IOException): 4
ValidReaderWriteIdList (org.apache.hadoop.hive.common.ValidReaderWriteIdList): 4
AcidUtils (org.apache.hadoop.hive.ql.io.AcidUtils): 4
Configuration (org.apache.hadoop.conf.Configuration): 3
ValidWriteIdList (org.apache.hadoop.hive.common.ValidWriteIdList): 3
JobConf (org.apache.hadoop.mapred.JobConf): 3
DistributedFileSystem (org.apache.hadoop.hdfs.DistributedFileSystem): 2
ValidReadTxnList (org.apache.hadoop.hive.common.ValidReadTxnList): 2
MockFile (org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.MockFile): 2
MockFileSystem (org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.MockFileSystem): 2
MockPath (org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.MockPath): 2
Test (org.junit.Test): 2
ObjectMapper (com.fasterxml.jackson.databind.ObjectMapper): 1
Preconditions (com.google.common.base.Preconditions): 1