Search in sources :

Example 26 with AllelicCountCollection

use of org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCountCollection in project gatk by broadinstitute.

the class SNPSegmenterUnitTest method testAllelicFractionBasedSegmentation.

/**
     * Tests that segments are correctly determined using allelic counts from SNP sites.
     * Segment-mean and target-number columns from expected segment file are not checked.
     */
@Test
public void testAllelicFractionBasedSegmentation() {
    final String sampleName = "test";
    final File snpFile = new File(TEST_SUB_DIR, "snps-simplified-for-allelic-fraction-segmentation.tsv");
    final List<AllelicCount> snpCounts = new AllelicCountCollection(snpFile).getCounts();
    final TargetCollection<AllelicCount> snps = new HashedListTargetCollection<>(snpCounts);
    final File resultFile = createTempFile("snp-segmenter-test-result", ".seg");
    SNPSegmenter.writeSegmentFile(snps, sampleName, resultFile);
    final File expectedFile = new File(TEST_SUB_DIR, "snp-segmenter-test-expected.seg");
    Assert.assertTrue(resultFile.exists(), "SNPSegmenterTest output was not written to temp file: " + resultFile);
    final List<SimpleInterval> result = SegmentUtils.readIntervalsFromSegmentFile(resultFile);
    final List<SimpleInterval> expected = SegmentUtils.readIntervalsFromSegmentFile(expectedFile);
    Assert.assertEquals(result, expected);
}
Also used : AllelicCountCollection(org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCountCollection) SimpleInterval(org.broadinstitute.hellbender.utils.SimpleInterval) File(java.io.File) AllelicCount(org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCount) BaseTest(org.broadinstitute.hellbender.utils.test.BaseTest) Test(org.testng.annotations.Test)

Example 27 with AllelicCountCollection

use of org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCountCollection in project gatk by broadinstitute.

the class AlleleFractionModellerUnitTest method testMCMCWithAllelicPoN.

/**
     * Test MCMC inference on simulated data using an allelic PoN.  Note that these MCMC tests were written to use
     * simulated hets before the allelic PoN was introduced.  Rather than generate a simulated PoN on the fly,
     * we simply use a fixed simulated PoN loaded from a file and check that its MLE hyperparameters are "sampled"
     * correctly by simply taking the MLE PoN values---i.e., the PoN does not actually cover the simulated sites and
     * hence is not used to correct reference bias in the simulated data in any way.
     * This latter functionality is tested on fixed data loaded from files in
     * {@link AlleleFractionModellerUnitTest#testBiasCorrection} instead.
     */
@Test
public void testMCMCWithAllelicPoN() {
    final double meanBiasSimulated = 1.2;
    final double biasVarianceSimulated = 0.04;
    // PoN generated with alpha = 65
    final double meanBiasOfPoN = 1.083;
    // PoN generated with beta = 60
    final double biasVarianceOfPoN = 0.0181;
    final AllelicPanelOfNormals allelicPoN = new AllelicPanelOfNormals(new AllelicCountCollection(ALLELIC_PON_NORMAL_COUNTS_FILE));
    testMCMC(meanBiasSimulated, biasVarianceSimulated, meanBiasOfPoN, biasVarianceOfPoN, allelicPoN);
}
Also used : AllelicPanelOfNormals(org.broadinstitute.hellbender.tools.pon.allelic.AllelicPanelOfNormals) AllelicCountCollection(org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCountCollection) BaseTest(org.broadinstitute.hellbender.utils.test.BaseTest) Test(org.testng.annotations.Test)

Example 28 with AllelicCountCollection

use of org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCountCollection in project gatk by broadinstitute.

the class AlleleFractionModellerUnitTest method dataBiasCorrection.

@DataProvider(name = "biasCorrection")
public Object[][] dataBiasCorrection() {
    LoggingUtils.setLoggingLevel(Log.LogLevel.INFO);
    final AllelicCountCollection sampleNormal = new AllelicCountCollection(SAMPLE_NORMAL_FILE);
    final AllelicCountCollection sampleWithBadSNPs = new AllelicCountCollection(SAMPLE_WITH_BAD_SNPS_FILE);
    final AllelicCountCollection sampleWithEvent = new AllelicCountCollection(SAMPLE_WITH_EVENT_FILE);
    final AllelicPanelOfNormals allelicPoNNormal = new AllelicPanelOfNormals(new AllelicCountCollection(ALLELIC_PON_NORMAL_COUNTS_FILE));
    final AllelicPanelOfNormals allelicPoNWithBadSNPs = new AllelicPanelOfNormals(new AllelicCountCollection(ALLELIC_PON_WITH_BAD_SNPS_COUNTS_FILE));
    final double minorFractionExpectedInMiddleSegmentNormal = 0.5;
    final double minorFractionExpectedInMiddleSegmentWithBadSNPsAndNormalPoN = 0.4;
    final double minorFractionExpectedInMiddleSegmentWithEvent = 0.33;
    return new Object[][] { { sampleNormal, allelicPoNNormal, minorFractionExpectedInMiddleSegmentNormal }, { sampleWithBadSNPs, allelicPoNNormal, minorFractionExpectedInMiddleSegmentWithBadSNPsAndNormalPoN }, { sampleWithEvent, allelicPoNNormal, minorFractionExpectedInMiddleSegmentWithEvent }, { sampleWithBadSNPs, allelicPoNWithBadSNPs, minorFractionExpectedInMiddleSegmentNormal } };
}
Also used : AllelicPanelOfNormals(org.broadinstitute.hellbender.tools.pon.allelic.AllelicPanelOfNormals) AllelicCountCollection(org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCountCollection) DataProvider(org.testng.annotations.DataProvider)

Example 29 with AllelicCountCollection

use of org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCountCollection in project gatk-protected by broadinstitute.

the class AllelicPanelOfNormalsCreator method create.

/**
     * Creates an {@link AllelicPanelOfNormals} given a site-frequency threshold;
     * sites appearing in strictly less than this fraction of samples will not be included in the panel of normals.
     * @param siteFrequencyThreshold    site-frequency threshold
     * @return                          an {@link AllelicPanelOfNormals} containing sites
     *                                  above the site-frequency threshold
     */
public AllelicPanelOfNormals create(final double siteFrequencyThreshold) {
    logger.info("Creating allelic panel of normals...");
    //used to filter on frequency
    final Map<SimpleInterval, MutableInt> numberOfSamplesMap = new HashMap<>();
    //store only the total counts (smaller memory footprint)
    final Map<SimpleInterval, AllelicCount> totalCountsMap = new HashMap<>();
    int pulldownFileCounter = 1;
    final int totalNumberOfSamples = pulldownFiles.size();
    for (final File pulldownFile : pulldownFiles) {
        logger.info("Processing pulldown file " + pulldownFileCounter++ + "/" + totalNumberOfSamples + " (" + pulldownFile + ")...");
        final AllelicCountCollection pulldownCounts = new AllelicCountCollection(pulldownFile);
        for (final AllelicCount count : pulldownCounts.getCounts()) {
            //update the sum of ref and alt counts at each site
            final SimpleInterval site = count.getInterval();
            final AllelicCount currentCountAtSite = totalCountsMap.getOrDefault(site, new AllelicCount(site, 0, 0));
            final AllelicCount updatedCountAtSite = new AllelicCount(site, currentCountAtSite.getRefReadCount() + count.getRefReadCount(), currentCountAtSite.getAltReadCount() + count.getAltReadCount());
            totalCountsMap.put(site, updatedCountAtSite);
            //update the number of samples seen possessing each site
            final MutableInt numberOfSamplesAtSite = numberOfSamplesMap.get(site);
            if (numberOfSamplesAtSite == null) {
                numberOfSamplesMap.put(site, new MutableInt(1));
            } else {
                numberOfSamplesAtSite.increment();
            }
        }
    }
    logger.info("Total number of unique sites present in samples: " + totalCountsMap.size());
    //filter out sites that appear at a frequency strictly less than the provided threshold
    final AllelicCountCollection totalCounts = new AllelicCountCollection();
    numberOfSamplesMap.entrySet().stream().filter(e -> e.getValue().doubleValue() / totalNumberOfSamples >= siteFrequencyThreshold).map(e -> totalCountsMap.get(e.getKey())).forEach(totalCounts::add);
    logger.info(String.format("Number of unique sites present in samples above site frequency = %4.3f: %d", siteFrequencyThreshold, totalCounts.getCounts().size()));
    return new AllelicPanelOfNormals(totalCounts);
}
Also used : MutableInt(org.apache.commons.lang3.mutable.MutableInt) IOUtils(org.broadinstitute.hellbender.utils.io.IOUtils) AllelicCount(org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCount) HashMap(java.util.HashMap) SimpleInterval(org.broadinstitute.hellbender.utils.SimpleInterval) ParamUtils(org.broadinstitute.hellbender.utils.param.ParamUtils) File(java.io.File) ArrayList(java.util.ArrayList) List(java.util.List) Logger(org.apache.logging.log4j.Logger) Map(java.util.Map) Utils(org.broadinstitute.hellbender.utils.Utils) LogManager(org.apache.logging.log4j.LogManager) AllelicCountCollection(org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCountCollection) HashMap(java.util.HashMap) MutableInt(org.apache.commons.lang3.mutable.MutableInt) AllelicCountCollection(org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCountCollection) SimpleInterval(org.broadinstitute.hellbender.utils.SimpleInterval) File(java.io.File) AllelicCount(org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCount)

Example 30 with AllelicCountCollection

use of org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCountCollection in project gatk-protected by broadinstitute.

the class PerformJointSegmentation method doWork.

@Override
public Object doWork() {
    ParamUtils.isPositive(initialNumAFStates, "Must have at least one allele-fraction state.");
    ParamUtils.isPositive(initialNumCRStates, "Must have at least one copy-ratio state.");
    final AllelicPanelOfNormals allelicPoN = allelicPoNFile != null ? AllelicPanelOfNormals.read(allelicPoNFile) : AllelicPanelOfNormals.EMPTY_PON;
    final AllelicCountCollection acc = new AllelicCountCollection(snpCountsFile);
    final ReadCountCollection rcc;
    try {
        rcc = ReadCountCollectionUtils.parse(new File(coverageFile));
    } catch (final IOException ex) {
        throw new UserException.BadInput("could not read input file");
    }
    final JointAFCRSegmenter jointSegmenter = JointAFCRSegmenter.createJointSegmenter(initialNumCRStates, rcc, initialNumAFStates, acc, allelicPoN);
    final List<Pair<SimpleInterval, AFCRHiddenState>> segmentation = jointSegmenter.findSegments();
    final List<ACNVModeledSegment> segments = segmentation.stream().map(pair -> new ACNVModeledSegment(pair.getLeft(), errorlessPosterior(pair.getRight().getLog2CopyRatio()), errorlessPosterior(pair.getRight().getMinorAlleleFraction()))).collect(Collectors.toList());
    //TODO: make more reasonable output for ACNV 2.0
    SegmentUtils.writeACNVModeledSegmentFile(outputSegmentsFile, segments, new Genome(rcc, acc.getCounts()));
    return "SUCCESS";
}
Also used : DocumentedFeature(org.broadinstitute.barclay.help.DocumentedFeature) ExomeStandardArgumentDefinitions(org.broadinstitute.hellbender.cmdline.ExomeStandardArgumentDefinitions) CommandLineProgramProperties(org.broadinstitute.barclay.argparser.CommandLineProgramProperties) Arrays(java.util.Arrays) CommandLineProgram(org.broadinstitute.hellbender.cmdline.CommandLineProgram) CopyNumberProgramGroup(org.broadinstitute.hellbender.cmdline.programgroups.CopyNumberProgramGroup) Argument(org.broadinstitute.barclay.argparser.Argument) org.broadinstitute.hellbender.tools.exome(org.broadinstitute.hellbender.tools.exome) IOException(java.io.IOException) SimpleInterval(org.broadinstitute.hellbender.utils.SimpleInterval) Collectors(java.util.stream.Collectors) ParamUtils(org.broadinstitute.hellbender.utils.param.ParamUtils) File(java.io.File) List(java.util.List) Pair(org.apache.commons.lang3.tuple.Pair) UserException(org.broadinstitute.hellbender.exceptions.UserException) PosteriorSummary(org.broadinstitute.hellbender.utils.mcmc.PosteriorSummary) AllelicPanelOfNormals(org.broadinstitute.hellbender.tools.pon.allelic.AllelicPanelOfNormals) VisibleForTesting(com.google.common.annotations.VisibleForTesting) AllelicCountCollection(org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCountCollection) DecileCollection(org.broadinstitute.hellbender.utils.mcmc.DecileCollection) IOException(java.io.IOException) AllelicPanelOfNormals(org.broadinstitute.hellbender.tools.pon.allelic.AllelicPanelOfNormals) AllelicCountCollection(org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCountCollection) UserException(org.broadinstitute.hellbender.exceptions.UserException) File(java.io.File) Pair(org.apache.commons.lang3.tuple.Pair)

Aggregations

AllelicCountCollection (org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCountCollection)30 SimpleInterval (org.broadinstitute.hellbender.utils.SimpleInterval)16 AllelicPanelOfNormals (org.broadinstitute.hellbender.tools.pon.allelic.AllelicPanelOfNormals)14 Test (org.testng.annotations.Test)14 File (java.io.File)8 Collectors (java.util.stream.Collectors)8 AllelicCount (org.broadinstitute.hellbender.tools.exome.alleliccount.AllelicCount)8 IOException (java.io.IOException)6 List (java.util.List)6 GammaDistribution (org.apache.commons.math3.distribution.GammaDistribution)6 RandomGenerator (org.apache.commons.math3.random.RandomGenerator)6 UserException (org.broadinstitute.hellbender.exceptions.UserException)6 org.broadinstitute.hellbender.tools.exome (org.broadinstitute.hellbender.tools.exome)6 ModeledSegment (org.broadinstitute.hellbender.tools.exome.ModeledSegment)6 BaseTest (org.broadinstitute.hellbender.utils.test.BaseTest)6 AlleleFractionGlobalParameters (org.broadinstitute.hellbender.tools.exome.allelefraction.AlleleFractionGlobalParameters)5 java.util (java.util)4 Arrays (java.util.Arrays)4 Random (java.util.Random)4 IntStream (java.util.stream.IntStream)4