
Example 1 with OpticalDuplicatesArgumentCollection

Use of org.broadinstitute.hellbender.cmdline.argumentcollections.OpticalDuplicatesArgumentCollection in the project gatk by broadinstitute.

From the class MarkDuplicatesSparkUnitTest, method markDupesTest.

@Test(dataProvider = "md", groups = "spark")
public void markDupesTest(final String input, final long totalExpected, final long dupsExpected) throws IOException {
    // Load the reads and the header from the input file.
    JavaSparkContext ctx = SparkContextFactory.getTestSparkContext();
    ReadsSparkSource readSource = new ReadsSparkSource(ctx);
    JavaRDD<GATKRead> reads = readSource.getParallelReads(input, null);
    Assert.assertEquals(reads.count(), totalExpected);
    SAMFileHeader header = readSource.getHeader(input, null);

    // Build an optical duplicate finder from the argument collection's default values;
    // skip it when no read-name regex is configured.
    OpticalDuplicatesArgumentCollection opticalDuplicatesArgumentCollection = new OpticalDuplicatesArgumentCollection();
    final OpticalDuplicateFinder finder = opticalDuplicatesArgumentCollection.READ_NAME_REGEX != null
            ? new OpticalDuplicateFinder(
                    opticalDuplicatesArgumentCollection.READ_NAME_REGEX,
                    opticalDuplicatesArgumentCollection.OPTICAL_DUPLICATE_PIXEL_DISTANCE,
                    null)
            : null;

    // Marking must flag duplicates without dropping or adding reads.
    JavaRDD<GATKRead> markedReads = MarkDuplicatesSpark.mark(reads, header, MarkDuplicatesScoringStrategy.SUM_OF_BASE_QUALITIES, finder, 1);
    Assert.assertEquals(markedReads.count(), totalExpected);

    // Only the reads flagged as duplicates should match the expected duplicate count.
    JavaRDD<GATKRead> dupes = markedReads.filter(GATKRead::isDuplicate);
    Assert.assertEquals(dupes.count(), dupsExpected);
}
Also used:
GATKRead (org.broadinstitute.hellbender.utils.read.GATKRead)
ReadsSparkSource (org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSource)
OpticalDuplicatesArgumentCollection (org.broadinstitute.hellbender.cmdline.argumentcollections.OpticalDuplicatesArgumentCollection)
OpticalDuplicateFinder (org.broadinstitute.hellbender.utils.read.markduplicates.OpticalDuplicateFinder)
JavaSparkContext (org.apache.spark.api.java.JavaSparkContext)
SAMFileHeader (htsjdk.samtools.SAMFileHeader)
BaseTest (org.broadinstitute.hellbender.utils.test.BaseTest)
Test (org.testng.annotations.Test)
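
The test above checks two invariants: marking leaves the total read count unchanged, and the number of reads flagged as duplicates matches the expected value. The following is a minimal, Spark-free sketch of those same invariants. It is not GATK's implementation: SimpleRead, positionKey, mark and countDuplicates are hypothetical names introduced only for illustration, and the rule "keep the read with the highest sum of base qualities per position key" is an assumed simplification of what a sum-of-base-qualities scoring strategy does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DuplicateMarkingSketch {

    /** Hypothetical stand-in for a read: a grouping key plus base qualities. */
    static final class SimpleRead {
        final String name;
        final String positionKey;   // assumed grouping key, e.g. "chr1:100:+"
        final int[] baseQualities;
        boolean duplicate;

        SimpleRead(String name, String positionKey, int[] baseQualities) {
            this.name = name;
            this.positionKey = positionKey;
            this.baseQualities = baseQualities;
        }

        int sumOfBaseQualities() {
            int sum = 0;
            for (int q : baseQualities) {
                sum += q;
            }
            return sum;
        }
    }

    /** Flags every read except the best-scoring one per position key. No read is removed. */
    static List<SimpleRead> mark(List<SimpleRead> reads) {
        Map<String, SimpleRead> best = new HashMap<>();
        for (SimpleRead read : reads) {
            SimpleRead current = best.get(read.positionKey);
            if (current == null || read.sumOfBaseQualities() > current.sumOfBaseQualities()) {
                best.put(read.positionKey, read);
            }
        }
        for (SimpleRead read : reads) {
            read.duplicate = best.get(read.positionKey) != read;
        }
        return reads;
    }

    /** Counts the reads currently flagged as duplicates. */
    static long countDuplicates(List<SimpleRead> reads) {
        return reads.stream().filter(r -> r.duplicate).count();
    }

    public static void main(String[] args) {
        List<SimpleRead> reads = new ArrayList<>();
        reads.add(new SimpleRead("a", "chr1:100:+", new int[] {30, 30}));
        reads.add(new SimpleRead("b", "chr1:100:+", new int[] {20, 20}));
        reads.add(new SimpleRead("c", "chr1:250:-", new int[] {25, 25}));

        int totalBefore = reads.size();
        List<SimpleRead> marked = mark(reads);

        System.out.println("total before: " + totalBefore + ", after: " + marked.size());
        System.out.println("duplicates: " + countDuplicates(marked));
    }
}
```

As in the test, the sketch flags reads rather than dropping them, so a caller can still count all reads after marking and filter on the duplicate flag separately.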
