Search in sources :

Example 16 with NormalDistribution

use of org.apache.commons.math3.distribution.NormalDistribution in project gatk-protected by broadinstitute.

the class CoverageDropoutDetector method retrieveGaussianMixtureModelForFilteredTargets.

/** <p>Produces a Gaussian mixture model based on the difference between targets and segment means.</p>
     * <p>Filters targets to populations where more than the minProportion lie in a single segment.</p>
     * <p>Returns null if no pass filtering.  Please note that in these cases,
     * in the rest of this class, we use this to assume that a GMM is not a good model.</p>
     *
     * @param segments  -- segments with segment mean in log2 copy ratio space
     * @param targets -- targets with a log2 copy ratio estimate
     * @param minProportion -- minimum proportion of all targets that a given segment must have in order to be used
     *                      in the evaluation
     * @param numComponents -- number of components to use in the GMM.  Usually, this is 2.
     * @return  never {@code null}.  Fitting result with indications whether it converged or was even attempted.
     */
private MixtureMultivariateNormalFitResult retrieveGaussianMixtureModelForFilteredTargets(final List<ModeledSegment> segments, final TargetCollection<ReadCountRecord.SingleSampleRecord> targets, double minProportion, int numComponents) {
    // For each target in a segment that contains enough targets, normalize the difference against the segment mean
    //  and collapse the filtered targets into the copy ratio estimates.
    final List<Double> filteredTargetsSegDiff = getNumProbeFilteredTargetList(segments, targets, minProportion);
    if (filteredTargetsSegDiff.size() < numComponents) {
        return new MixtureMultivariateNormalFitResult(null, false, false);
    }
    // Assume that Apache Commons wants data points in the first dimension.
    // Note that second dimension of length 2 (instead of 1) is to wrok around funny Apache commons API.
    final double[][] filteredTargetsSegDiff2d = new double[filteredTargetsSegDiff.size()][2];
    // Convert the filtered targets into 2d array (even if second dimension is length 1).  The second dimension is
    //  uncorrelated Gaussian.  This is only to get around funny API in Apache Commons, which will throw an
    //  exception if the length of the second dimension is < 2
    final RandomGenerator rng = RandomGeneratorFactory.createRandomGenerator(new Random(RANDOM_SEED));
    final NormalDistribution nd = new NormalDistribution(rng, 0, .1);
    for (int i = 0; i < filteredTargetsSegDiff.size(); i++) {
        filteredTargetsSegDiff2d[i][0] = filteredTargetsSegDiff.get(i);
        filteredTargetsSegDiff2d[i][1] = nd.sample();
    }
    final MixtureMultivariateNormalDistribution estimateEM0 = MultivariateNormalMixtureExpectationMaximization.estimate(filteredTargetsSegDiff2d, numComponents);
    final MultivariateNormalMixtureExpectationMaximization multivariateNormalMixtureExpectationMaximization = new MultivariateNormalMixtureExpectationMaximization(filteredTargetsSegDiff2d);
    try {
        multivariateNormalMixtureExpectationMaximization.fit(estimateEM0);
    } catch (final MaxCountExceededException | ConvergenceException | SingularMatrixException e) {
        //  did not converge.  Include the model as it was when the exception was thrown.
        return new MixtureMultivariateNormalFitResult(multivariateNormalMixtureExpectationMaximization.getFittedModel(), false, true);
    }
    return new MixtureMultivariateNormalFitResult(multivariateNormalMixtureExpectationMaximization.getFittedModel(), true, true);
}
Also used : RandomGenerator(org.apache.commons.math3.random.RandomGenerator) MaxCountExceededException(org.apache.commons.math3.exception.MaxCountExceededException) MixtureMultivariateNormalDistribution(org.apache.commons.math3.distribution.MixtureMultivariateNormalDistribution) Random(java.util.Random) NormalDistribution(org.apache.commons.math3.distribution.NormalDistribution) MixtureMultivariateNormalDistribution(org.apache.commons.math3.distribution.MixtureMultivariateNormalDistribution) ConvergenceException(org.apache.commons.math3.exception.ConvergenceException) SingularMatrixException(org.apache.commons.math3.linear.SingularMatrixException) MultivariateNormalMixtureExpectationMaximization(org.apache.commons.math3.distribution.fitting.MultivariateNormalMixtureExpectationMaximization)

Example 17 with NormalDistribution

use of org.apache.commons.math3.distribution.NormalDistribution in project gatk-protected by broadinstitute.

the class CNLOHCaller method calculateDoubleGaussian.

/**
     * This function takes two half-Gaussian distributions and uses one for the values below the mode and the other for values above the
     *  mode.
     *
     *  If the mode is outside the credible interval, the calculation is done as if the interval boundary is
     *   {@link CNLOHCaller#MIN_DIST_FROM_MODE} above/below the mode.
     *
     *
     * @param val value to calculate the "pdf"
     * @param credibleMode mode of the presumed distribution
     * @param credibleLow 95% confidence interval on the low tail
     * @param credibleHigh 95% confidence interval on the high tail
     * @return pdf with a minimum value of {@link CNLOHCaller#MIN_L}
     */
@VisibleForTesting
static double calculateDoubleGaussian(final double val, final double credibleMode, final double credibleLow, final double credibleHigh) {
    final double hiDist = Math.max(credibleHigh - credibleMode, MIN_DIST_FROM_MODE);
    final double loDist = Math.max(credibleMode - credibleLow, MIN_DIST_FROM_MODE);
    return new NormalDistribution(null, credibleMode, (val >= credibleMode ? hiDist : loDist) / NUM_STD_95_CONF_INTERVAL_GAUSSIAN).density(val);
}
Also used : NormalDistribution(org.apache.commons.math3.distribution.NormalDistribution) VisibleForTesting(com.google.common.annotations.VisibleForTesting)

Example 18 with NormalDistribution

use of org.apache.commons.math3.distribution.NormalDistribution in project gatk-protected by broadinstitute.

the class SliceSamplerUnitTest method testInitialPointOutOfRange.

@Test(expectedExceptions = IllegalArgumentException.class)
public void testInitialPointOutOfRange() {
    rng.setSeed(RANDOM_SEED);
    final double mean = 5.;
    final double standardDeviation = 0.75;
    final NormalDistribution normalDistribution = new NormalDistribution(mean, standardDeviation);
    final Function<Double, Double> normalLogPDF = normalDistribution::logDensity;
    final double xInitial = -10.;
    final double xMin = 0.;
    final double xMax = 1.;
    final double width = 0.5;
    final SliceSampler normalSampler = new SliceSampler(rng, normalLogPDF, xMin, xMax, width);
    normalSampler.sample(xInitial);
}
Also used : NormalDistribution(org.apache.commons.math3.distribution.NormalDistribution) Test(org.testng.annotations.Test)

Example 19 with NormalDistribution

use of org.apache.commons.math3.distribution.NormalDistribution in project gatk-protected by broadinstitute.

the class SliceSamplerUnitTest method testSliceSamplingOfNormalDistribution.

/**
     * Test slice sampling of a normal distribution.  Checks that input mean and standard deviation are recovered
     * by 10000 samples to a relative error of 0.5% and 2%, respectively.
     */
@Test
public void testSliceSamplingOfNormalDistribution() {
    rng.setSeed(RANDOM_SEED);
    final double mean = 5.;
    final double standardDeviation = 0.75;
    final NormalDistribution normalDistribution = new NormalDistribution(mean, standardDeviation);
    final Function<Double, Double> normalLogPDF = normalDistribution::logDensity;
    final double xInitial = 1.;
    final double xMin = Double.NEGATIVE_INFINITY;
    final double xMax = Double.POSITIVE_INFINITY;
    final double width = 0.5;
    final int numSamples = 10000;
    final SliceSampler normalSampler = new SliceSampler(rng, normalLogPDF, xMin, xMax, width);
    final double[] samples = Doubles.toArray(normalSampler.sample(xInitial, numSamples));
    final double sampleMean = new Mean().evaluate(samples);
    final double sampleStandardDeviation = new StandardDeviation().evaluate(samples);
    Assert.assertEquals(relativeError(sampleMean, mean), 0., 0.005);
    Assert.assertEquals(relativeError(sampleStandardDeviation, standardDeviation), 0., 0.02);
}
Also used : Mean(org.apache.commons.math3.stat.descriptive.moment.Mean) NormalDistribution(org.apache.commons.math3.distribution.NormalDistribution) StandardDeviation(org.apache.commons.math3.stat.descriptive.moment.StandardDeviation) Test(org.testng.annotations.Test)

Example 20 with NormalDistribution

use of org.apache.commons.math3.distribution.NormalDistribution in project gatk by broadinstitute.

the class CoverageDropoutDetector method retrieveGaussianMixtureModelForFilteredTargets.

/** <p>Produces a Gaussian mixture model based on the difference between targets and segment means.</p>
     * <p>Filters targets to populations where more than the minProportion lie in a single segment.</p>
     * <p>Returns null if no pass filtering.  Please note that in these cases,
     * in the rest of this class, we use this to assume that a GMM is not a good model.</p>
     *
     * @param segments  -- segments with segment mean in log2 copy ratio space
     * @param targets -- targets with a log2 copy ratio estimate
     * @param minProportion -- minimum proportion of all targets that a given segment must have in order to be used
     *                      in the evaluation
     * @param numComponents -- number of components to use in the GMM.  Usually, this is 2.
     * @return  never {@code null}.  Fitting result with indications whether it converged or was even attempted.
     */
private MixtureMultivariateNormalFitResult retrieveGaussianMixtureModelForFilteredTargets(final List<ModeledSegment> segments, final TargetCollection<ReadCountRecord.SingleSampleRecord> targets, double minProportion, int numComponents) {
    // For each target in a segment that contains enough targets, normalize the difference against the segment mean
    //  and collapse the filtered targets into the copy ratio estimates.
    final List<Double> filteredTargetsSegDiff = getNumProbeFilteredTargetList(segments, targets, minProportion);
    if (filteredTargetsSegDiff.size() < numComponents) {
        return new MixtureMultivariateNormalFitResult(null, false, false);
    }
    // Assume that Apache Commons wants data points in the first dimension.
    // Note that second dimension of length 2 (instead of 1) is to wrok around funny Apache commons API.
    final double[][] filteredTargetsSegDiff2d = new double[filteredTargetsSegDiff.size()][2];
    // Convert the filtered targets into 2d array (even if second dimension is length 1).  The second dimension is
    //  uncorrelated Gaussian.  This is only to get around funny API in Apache Commons, which will throw an
    //  exception if the length of the second dimension is < 2
    final RandomGenerator rng = RandomGeneratorFactory.createRandomGenerator(new Random(RANDOM_SEED));
    final NormalDistribution nd = new NormalDistribution(rng, 0, .1);
    for (int i = 0; i < filteredTargetsSegDiff.size(); i++) {
        filteredTargetsSegDiff2d[i][0] = filteredTargetsSegDiff.get(i);
        filteredTargetsSegDiff2d[i][1] = nd.sample();
    }
    final MixtureMultivariateNormalDistribution estimateEM0 = MultivariateNormalMixtureExpectationMaximization.estimate(filteredTargetsSegDiff2d, numComponents);
    final MultivariateNormalMixtureExpectationMaximization multivariateNormalMixtureExpectationMaximization = new MultivariateNormalMixtureExpectationMaximization(filteredTargetsSegDiff2d);
    try {
        multivariateNormalMixtureExpectationMaximization.fit(estimateEM0);
    } catch (final MaxCountExceededException | ConvergenceException | SingularMatrixException e) {
        //  did not converge.  Include the model as it was when the exception was thrown.
        return new MixtureMultivariateNormalFitResult(multivariateNormalMixtureExpectationMaximization.getFittedModel(), false, true);
    }
    return new MixtureMultivariateNormalFitResult(multivariateNormalMixtureExpectationMaximization.getFittedModel(), true, true);
}
Also used : RandomGenerator(org.apache.commons.math3.random.RandomGenerator) MaxCountExceededException(org.apache.commons.math3.exception.MaxCountExceededException) MixtureMultivariateNormalDistribution(org.apache.commons.math3.distribution.MixtureMultivariateNormalDistribution) Random(java.util.Random) NormalDistribution(org.apache.commons.math3.distribution.NormalDistribution) MixtureMultivariateNormalDistribution(org.apache.commons.math3.distribution.MixtureMultivariateNormalDistribution) ConvergenceException(org.apache.commons.math3.exception.ConvergenceException) SingularMatrixException(org.apache.commons.math3.linear.SingularMatrixException) MultivariateNormalMixtureExpectationMaximization(org.apache.commons.math3.distribution.fitting.MultivariateNormalMixtureExpectationMaximization)

Aggregations

NormalDistribution (org.apache.commons.math3.distribution.NormalDistribution)25 Random (java.util.Random)6 RandomGenerator (org.apache.commons.math3.random.RandomGenerator)6 ArrayList (java.util.ArrayList)5 Test (org.testng.annotations.Test)5 AbstractRealDistribution (org.apache.commons.math3.distribution.AbstractRealDistribution)4 SimpleInterval (org.broadinstitute.hellbender.utils.SimpleInterval)4 RegDataSet (edu.neu.ccs.pyramid.dataset.RegDataSet)3 UniformRealDistribution (org.apache.commons.math3.distribution.UniformRealDistribution)3 VisibleForTesting (com.google.common.annotations.VisibleForTesting)2 MixtureMultivariateNormalDistribution (org.apache.commons.math3.distribution.MixtureMultivariateNormalDistribution)2 MultivariateNormalMixtureExpectationMaximization (org.apache.commons.math3.distribution.fitting.MultivariateNormalMixtureExpectationMaximization)2 ConvergenceException (org.apache.commons.math3.exception.ConvergenceException)2 MaxCountExceededException (org.apache.commons.math3.exception.MaxCountExceededException)2 SingularMatrixException (org.apache.commons.math3.linear.SingularMatrixException)2 Mean (org.apache.commons.math3.stat.descriptive.moment.Mean)2 StandardDeviation (org.apache.commons.math3.stat.descriptive.moment.StandardDeviation)2 MultiLabelClfDataSet (edu.neu.ccs.pyramid.dataset.MultiLabelClfDataSet)1 ValueType (io.druid.segment.column.ValueType)1 AbstractIntegerDistribution (org.apache.commons.math3.distribution.AbstractIntegerDistribution)1