Search in sources :

Example 61 with ContentSummary

use of org.apache.hadoop.fs.ContentSummary in project gatk by broadinstitute.

the class RunSGAViaProcessBuilderOnSpark method loadFASTQFiles.

/**
     * Load the FASTQ files in the user specified directory and returns an RDD that satisfies the same requirement
     * as described in {@link JavaSparkContext#wholeTextFiles(String, int)}.
     * @param ctx
     * @param pathToAllInterleavedFASTQFiles path to the directory where all FASTQ files to perform local assembly upon are located
     * @throws GATKException when getting the file count in the specified directory
     * @return
     */
private static JavaPairRDD<String, String> loadFASTQFiles(final JavaSparkContext ctx, final String pathToAllInterleavedFASTQFiles) {
    try {
        final FileSystem hadoopFileSystem = FileSystem.get(ctx.hadoopConfiguration());
        final ContentSummary cs = hadoopFileSystem.getContentSummary(new org.apache.hadoop.fs.Path(pathToAllInterleavedFASTQFiles));
        final int fileCount = (int) cs.getFileCount();
        return ctx.wholeTextFiles(pathToAllInterleavedFASTQFiles, fileCount);
    } catch (final IOException e) {
        throw new GATKException(e.getMessage());
    }
}
Also used : FileSystem(org.apache.hadoop.fs.FileSystem) ContentSummary(org.apache.hadoop.fs.ContentSummary) IOException(java.io.IOException) GATKException(org.broadinstitute.hellbender.exceptions.GATKException)

Aggregations

ContentSummary (org.apache.hadoop.fs.ContentSummary)61 Path (org.apache.hadoop.fs.Path)42 Test (org.junit.Test)38 FileSystem (org.apache.hadoop.fs.FileSystem)10 IOException (java.io.IOException)9 Configuration (org.apache.hadoop.conf.Configuration)8 ArrayList (java.util.ArrayList)6 OutputStream (java.io.OutputStream)5 URI (java.net.URI)5 DSQuotaExceededException (org.apache.hadoop.hdfs.protocol.DSQuotaExceededException)5 QuotaExceededException (org.apache.hadoop.hdfs.protocol.QuotaExceededException)5 WebHdfsFileSystem (org.apache.hadoop.hdfs.web.WebHdfsFileSystem)5 JobConf (org.apache.hadoop.mapred.JobConf)5 HttpURLConnection (java.net.HttpURLConnection)4 HashMap (java.util.HashMap)4 Properties (java.util.Properties)4 FSDataOutputStream (org.apache.hadoop.fs.FSDataOutputStream)4 NSQuotaExceededException (org.apache.hadoop.hdfs.protocol.NSQuotaExceededException)4 SemanticException (org.apache.hadoop.hive.ql.parse.SemanticException)4 FileNotFoundException (java.io.FileNotFoundException)3