
Example 56 with Variance

use of org.apache.commons.math3.stat.descriptive.moment.Variance in project gatk-protected by broadinstitute.

the class CopyRatioModellerUnitTest method testRunMCMCOnCopyRatioSegmentedGenome.

/**
     * Tests Bayesian inference of the copy-ratio model via MCMC.
     * <p>
     *     Recovery of input values for the variance and outlier-probability global parameters is checked.
     *     In particular, the true input value of the variance must fall within
     *     {@link CopyRatioModellerUnitTest#MULTIPLES_OF_SD_THRESHOLD}
     *     standard deviations of the posterior mean and the standard deviation of the posterior must agree
     *     with the analytic value to within a relative error of
     *     {@link CopyRatioModellerUnitTest#RELATIVE_ERROR_THRESHOLD} for 250 samples
     *     (after 250 burn-in samples have been discarded).  Similar criteria are applied
     *     to the recovery of the true input value for the outlier probability.
     * </p>
     * <p>
     *     Furthermore, the number of truth values for the segment-level means falling outside confidence intervals of
     *     1-sigma, 2-sigma, and 3-sigma given by the posteriors in each segment should be roughly consistent with
     *     a normal distribution (i.e., ~32, ~5, and ~0, respectively; we allow for errors of
     *     {@link CopyRatioModellerUnitTest#DELTA_NUMBER_OF_MEANS_ALLOWED_OUTSIDE_1_SIGMA},
     *     {@link CopyRatioModellerUnitTest#DELTA_NUMBER_OF_MEANS_ALLOWED_OUTSIDE_2_SIGMA}, and
     *     {@link CopyRatioModellerUnitTest#DELTA_NUMBER_OF_MEANS_ALLOWED_OUTSIDE_3_SIGMA}, respectively).
     *     The mean of the standard deviations of the posteriors for the segment-level means should also be
     *     recovered to within a relative error of {@link CopyRatioModellerUnitTest#RELATIVE_ERROR_THRESHOLD}.
     * </p>
     * <p>
     *     Finally, the recovered values for the latent outlier-indicator parameters should agree with those used to
     *     generate the data.  For each indicator, the recovered value (i.e., outlier or non-outlier) is taken to be
     *     that given by the majority of posterior samples.  We require that at least
     *     {@link CopyRatioModellerUnitTest#FRACTION_OF_OUTLIER_INDICATORS_CORRECT_THRESHOLD}
     *     of the 10000 indicators are recovered correctly.
     * </p>
     * <p>
     *     With these specifications, this unit test is not overly brittle (i.e., it should pass for a large majority
     *     of randomly generated data sets), but it is still brittle enough to check for correctness of the sampling
     *     (for example, specifying a sufficiently incorrect likelihood will cause the test to fail).
     * </p>
     */
@Test
public void testRunMCMCOnCopyRatioSegmentedGenome() throws IOException {
    final JavaSparkContext ctx = SparkContextFactory.getTestSparkContext();
    LoggingUtils.setLoggingLevel(Log.LogLevel.INFO);
    //load data (coverages and number of targets in each segment)
    final ReadCountCollection coverage = ReadCountCollectionUtils.parse(COVERAGES_FILE);
    //Genome with no SNPs
    final Genome genome = new Genome(coverage, Collections.emptyList());
    final SegmentedGenome segmentedGenome = new SegmentedGenome(SEGMENT_FILE, genome);
    //run MCMC
    final CopyRatioModeller modeller = new CopyRatioModeller(segmentedGenome);
    modeller.fitMCMC(NUM_SAMPLES, NUM_BURN_IN);
    //check statistics of global-parameter posterior samples (i.e., posterior mode and standard deviation)
    final Map<CopyRatioParameter, PosteriorSummary> globalParameterPosteriorSummaries = modeller.getGlobalParameterPosteriorSummaries(CREDIBLE_INTERVAL_ALPHA, ctx);
    final PosteriorSummary variancePosteriorSummary = globalParameterPosteriorSummaries.get(CopyRatioParameter.VARIANCE);
    final double variancePosteriorCenter = variancePosteriorSummary.getCenter();
    final double variancePosteriorStandardDeviation = (variancePosteriorSummary.getUpper() - variancePosteriorSummary.getLower()) / 2;
    Assert.assertEquals(Math.abs(variancePosteriorCenter - VARIANCE_TRUTH), 0., MULTIPLES_OF_SD_THRESHOLD * VARIANCE_POSTERIOR_STANDARD_DEVIATION_TRUTH);
    Assert.assertEquals(relativeError(variancePosteriorStandardDeviation, VARIANCE_POSTERIOR_STANDARD_DEVIATION_TRUTH), 0., RELATIVE_ERROR_THRESHOLD);
    final PosteriorSummary outlierProbabilityPosteriorSummary = globalParameterPosteriorSummaries.get(CopyRatioParameter.OUTLIER_PROBABILITY);
    final double outlierProbabilityPosteriorCenter = outlierProbabilityPosteriorSummary.getCenter();
    final double outlierProbabilityPosteriorStandardDeviation = (outlierProbabilityPosteriorSummary.getUpper() - outlierProbabilityPosteriorSummary.getLower()) / 2;
    Assert.assertEquals(Math.abs(outlierProbabilityPosteriorCenter - OUTLIER_PROBABILITY_TRUTH), 0., MULTIPLES_OF_SD_THRESHOLD * OUTLIER_PROBABILITY_POSTERIOR_STANDARD_DEVIATION_TRUTH);
    Assert.assertEquals(relativeError(outlierProbabilityPosteriorStandardDeviation, OUTLIER_PROBABILITY_POSTERIOR_STANDARD_DEVIATION_TRUTH), 0., RELATIVE_ERROR_THRESHOLD);
    //check statistics of segment-mean posterior samples (i.e., posterior means and standard deviations)
    final List<Double> meansTruth = loadList(MEANS_TRUTH_FILE, Double::parseDouble);
    int numMeansOutsideOneSigma = 0;
    int numMeansOutsideTwoSigma = 0;
    int numMeansOutsideThreeSigma = 0;
    final int numSegments = meansTruth.size();
    //segment-mean posteriors are expected to be Gaussian, so PosteriorSummary for
    // {@link CopyRatioModellerUnitTest#CREDIBLE_INTERVAL_ALPHA}=0.32 is
    //(posterior mean, posterior mean - posterior standard deviation, posterior mean + posterior standard deviation)
    final List<PosteriorSummary> meanPosteriorSummaries = modeller.getSegmentMeansPosteriorSummaries(CREDIBLE_INTERVAL_ALPHA, ctx);
    final double[] meanPosteriorStandardDeviations = new double[numSegments];
    for (int segment = 0; segment < numSegments; segment++) {
        final double meanPosteriorCenter = meanPosteriorSummaries.get(segment).getCenter();
        final double meanPosteriorStandardDeviation = (meanPosteriorSummaries.get(segment).getUpper() - meanPosteriorSummaries.get(segment).getLower()) / 2.;
        meanPosteriorStandardDeviations[segment] = meanPosteriorStandardDeviation;
        final double absoluteDifferenceFromTruth = Math.abs(meanPosteriorCenter - meansTruth.get(segment));
        if (absoluteDifferenceFromTruth > meanPosteriorStandardDeviation) {
            numMeansOutsideOneSigma++;
        }
        if (absoluteDifferenceFromTruth > 2 * meanPosteriorStandardDeviation) {
            numMeansOutsideTwoSigma++;
        }
        if (absoluteDifferenceFromTruth > 3 * meanPosteriorStandardDeviation) {
            numMeansOutsideThreeSigma++;
        }
    }
    final double meanPosteriorStandardDeviationsMean = new Mean().evaluate(meanPosteriorStandardDeviations);
    Assert.assertEquals(numMeansOutsideOneSigma, 100 - 68, DELTA_NUMBER_OF_MEANS_ALLOWED_OUTSIDE_1_SIGMA);
    Assert.assertEquals(numMeansOutsideTwoSigma, 100 - 95, DELTA_NUMBER_OF_MEANS_ALLOWED_OUTSIDE_2_SIGMA);
    Assert.assertTrue(numMeansOutsideThreeSigma <= DELTA_NUMBER_OF_MEANS_ALLOWED_OUTSIDE_3_SIGMA);
    Assert.assertEquals(relativeError(meanPosteriorStandardDeviationsMean, MEAN_POSTERIOR_STANDARD_DEVIATION_MEAN_TRUTH), 0., RELATIVE_ERROR_THRESHOLD);
    //check accuracy of latent outlier-indicator posterior samples
    final List<CopyRatioState.OutlierIndicators> outlierIndicatorSamples = modeller.getOutlierIndicatorsSamples();
    int numIndicatorsCorrect = 0;
    final int numIndicatorSamples = outlierIndicatorSamples.size();
    final List<Integer> outlierIndicatorsTruthAsInt = loadList(OUTLIER_INDICATORS_TRUTH_FILE, Integer::parseInt);
    final List<Boolean> outlierIndicatorsTruth = outlierIndicatorsTruthAsInt.stream().map(i -> i == 1).collect(Collectors.toList());
    for (int target = 0; target < coverage.targets().size(); target++) {
        int numSamplesOutliers = 0;
        for (final CopyRatioState.OutlierIndicators sample : outlierIndicatorSamples) {
            if (sample.get(target)) {
                numSamplesOutliers++;
            }
        }
        //take predicted state of indicator to be given by the majority of samples
        if ((numSamplesOutliers >= numIndicatorSamples / 2.) == outlierIndicatorsTruth.get(target)) {
            numIndicatorsCorrect++;
        }
    }
    final double fractionOfOutlierIndicatorsCorrect = (double) numIndicatorsCorrect / coverage.targets().size();
    Assert.assertTrue(fractionOfOutlierIndicatorsCorrect >= FRACTION_OF_OUTLIER_INDICATORS_CORRECT_THRESHOLD);
}
Also used : BaseTest(org.broadinstitute.hellbender.utils.test.BaseTest) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) Genome(org.broadinstitute.hellbender.tools.exome.Genome) FileUtils(org.apache.commons.io.FileUtils) Test(org.testng.annotations.Test) IOException(java.io.IOException) Function(java.util.function.Function) Collectors(java.util.stream.Collectors) File(java.io.File) Mean(org.apache.commons.math3.stat.descriptive.moment.Mean) List(java.util.List) Log(htsjdk.samtools.util.Log) ReadCountCollection(org.broadinstitute.hellbender.tools.exome.ReadCountCollection) UserException(org.broadinstitute.hellbender.exceptions.UserException) Assert(org.testng.Assert) PosteriorSummary(org.broadinstitute.hellbender.utils.mcmc.PosteriorSummary) ReadCountCollectionUtils(org.broadinstitute.hellbender.tools.exome.ReadCountCollectionUtils) Map(java.util.Map) SparkContextFactory(org.broadinstitute.hellbender.engine.spark.SparkContextFactory) SegmentedGenome(org.broadinstitute.hellbender.tools.exome.SegmentedGenome) LoggingUtils(org.broadinstitute.hellbender.utils.LoggingUtils) Collections(java.util.Collections)
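
The test above calls a relativeError helper that is not shown, and it counts how many truth values fall outside 1, 2, and 3 posterior standard deviations. The following is a minimal plain-Java sketch of those two ideas. The class name, method names, and the definition of relativeError are assumptions made for illustration and are not taken from gatk-protected.

```java
/** Illustrative helpers only; none of these names come from gatk-protected. */
final class SigmaCoverageSketch {

    /** Assumed definition: absolute deviation from a nonzero truth value, divided by the truth value. */
    static double relativeError(final double estimate, final double truth) {
        return Math.abs((estimate - truth) / truth);
    }

    /**
     * Counts the entries whose truth value lies more than k posterior standard deviations
     * away from the posterior center. The standard deviation is taken to be the half-width
     * of the credible interval, as in the test above (alpha = 0.32, Gaussian posterior).
     */
    static int countOutside(final double[] centers, final double[] lowers, final double[] uppers,
                            final double[] truths, final double k) {
        int count = 0;
        for (int i = 0; i < truths.length; i++) {
            final double standardDeviation = (uppers[i] - lowers[i]) / 2.;
            if (Math.abs(centers[i] - truths[i]) > k * standardDeviation) {
                count++;
            }
        }
        return count;
    }

    public static void main(final String[] args) {
        // Four toy segments, each with posterior center 0 and interval [-0.5, 0.5]
        final double[] centers = {0., 0., 0., 0.};
        final double[] lowers = {-0.5, -0.5, -0.5, -0.5};
        final double[] uppers = {0.5, 0.5, 0.5, 0.5};
        final double[] truths = {0.25, 0.75, 1.25, 1.75};
        for (int k = 1; k <= 3; k++) {
            System.out.println("outside " + k + " sigma: " + countOutside(centers, lowers, uppers, truths, k));
        }
    }
}
```

With 100 segments and well-calibrated Gaussian posteriors, the three counts would be expected near 32, 5, and 0, which is what the assertions in the test check up to the allowed deltas.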

Example 57 with Variance

use of org.apache.commons.math3.stat.descriptive.moment.Variance in project uPortal by Jasig.

the class JpaStatisticalSummary method getPopulationVariance.

/**
 * Returns the <a href="http://en.wikibooks.org/wiki/Statistics/Summary/Variance">population
 * variance</a> of the values that have been added.
 *
 * <p>Double.NaN is returned if no values have been added.
 *
 * @return the population variance
 */
@Override
public double getPopulationVariance() {
    Variance populationVariance = new Variance(_getSecondMoment());
    populationVariance.setBiasCorrected(false);
    return populationVariance.getResult();
}
Also used : Variance(org.apache.commons.math3.stat.descriptive.moment.Variance)
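
To make the effect of setBiasCorrected(false) concrete, here is a small library-free sketch that computes both estimators from the sum of squared deviations. The class and method names are hypothetical. The handling of empty and single-value input follows the documented behaviour of Commons Math's Variance (NaN for no values, 0 for one value).

```java
/** Illustrative only: a plain-Java comparison of population and sample variance. */
final class VarianceSketch {

    /** Sum of squared deviations from the mean. */
    static double sumOfSquaredDeviations(final double[] values) {
        double sum = 0.;
        for (final double v : values) {
            sum += v;
        }
        final double mean = sum / values.length;
        double m2 = 0.;
        for (final double v : values) {
            m2 += (v - mean) * (v - mean);
        }
        return m2;
    }

    /** Divides by n, which is what a non-bias-corrected variance does. */
    static double populationVariance(final double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        return sumOfSquaredDeviations(values) / values.length;
    }

    /** Divides by n - 1, which is what a bias-corrected variance does. */
    static double sampleVariance(final double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        if (values.length == 1) {
            return 0.;
        }
        return sumOfSquaredDeviations(values) / (values.length - 1);
    }

    public static void main(final String[] args) {
        final double[] values = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println("population variance = " + populationVariance(values));
        System.out.println("sample variance     = " + sampleVariance(values));
    }
}
```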

Example 58 with Variance

use of org.apache.commons.math3.stat.descriptive.moment.Variance in project knime-core by knime.

the class CovarianceMatrixCalculator method calculateCovarianceMatrix.

/**
 * Computes the covariance matrix and puts the result in the given (optional) data container and additionally
 * returns an in-memory representation. The data container is expected to have the data table spec returned at
 * {@link #getResultSpec()}. The implementation traverses the data once.
 *
 * @param exec the execution container
 * @param inTable input data
 * @param tableSize the data table size
 * @param resultDataContainer optional result data container
 * @return the covariance matrix
 * @throws CanceledExecutionException if the user canceled the execution
 */
public RealMatrix calculateCovarianceMatrix(final ExecutionMonitor exec, final DataTable inTable, final long tableSize, final DataContainer resultDataContainer) throws CanceledExecutionException {
    checkArgument(m_targetSpec.equalStructure(inTable.getDataTableSpec()), "Target tables spec is different from the one given in the constructor!");
    if (resultDataContainer != null) {
        checkArgument(m_resultSpec.equalStructure(resultDataContainer.getTableSpec()), "Result tables spec is invalid!");
    }
    final ExecutionMonitor computingProgress = exec.createSubProgress(resultDataContainer != null ? 0.8 : 1);
    List<StorelessCovariance> covariancesList = new ArrayList<>();
    // create covariance pairs
    for (int i = 0; i < m_indexes.length; i++) {
        for (int j = i; j < m_indexes.length; j++) {
            covariancesList.add(new StorelessCovariance(2));
        }
    }
    // compute rest of co-variance matrix
    int rowCount = 0;
    double[] buffer = new double[2];
    for (DataRow dataRow : inTable) {
        for (int i = 0; i < m_indexes.length; i++) {
            final int outerIndex = m_indexes[i];
            final DataCell outerCell = dataRow.getCell(outerIndex);
            if (outerCell.isMissing()) {
                // skip missing values
                continue;
            }
            final double outerDouble = ((DoubleValue) outerCell).getDoubleValue();
            for (int j = i; j < m_indexes.length; j++) {
                final int innerIndex = m_indexes[j];
                final DataCell innerCell = dataRow.getCell(innerIndex);
                if (innerCell.isMissing()) {
                    // skip missing values
                    continue;
                }
                final double innerDouble = ((DoubleValue) innerCell).getDoubleValue();
                buffer[0] = outerDouble;
                buffer[1] = innerDouble;
                int covListIndex = index(m_indexes.length, i, j);
                covariancesList.get(covListIndex).increment(buffer);
            }
        }
        computingProgress.setProgress(rowCount++ / (double) tableSize, "Calculate covariance values, processing row: '" + dataRow.getKey() + "'");
        computingProgress.checkCanceled();
    }
    // Copy the storeless covariances to a real matrix
    RealMatrix covMatrix = new Array2DRowRealMatrix(m_indexes.length, m_indexes.length);
    for (int i = 0; i < m_indexes.length; i++) {
        for (int j = i; j < m_indexes.length; j++) {
            int covListIndex = index(m_indexes.length, i, j);
            double covValue;
            try {
                covValue = i == j ? covariancesList.get(covListIndex).getCovariance(1, 1) : covariancesList.get(covListIndex).getCovariance(0, 1);
            } catch (NumberIsTooSmallException e) {
                throw new IllegalArgumentException(String.format("There were not enough valid values to " + "compute covariance between columns: '%s' and '%s'.", inTable.getDataTableSpec().getColumnSpec(m_indexes[i]).getName(), inTable.getDataTableSpec().getColumnSpec(m_indexes[j]).getName()), e);
            }
            covMatrix.setEntry(i, j, covValue);
            covMatrix.setEntry(j, i, covValue);
        }
    }
    if (resultDataContainer != null) {
        exec.setProgress("Writing matrix to data table");
        final ExecutionMonitor writingProgress = exec.createSubProgress(0.2);
        for (int i = 0; i < covMatrix.getRowDimension(); i++) {
            resultDataContainer.addRowToTable(new DefaultRow(RowKey.toRowKeys(resultDataContainer.getTableSpec().getColumnSpec(i).getName())[0], covMatrix.getRow(i)));
            exec.checkCanceled();
            writingProgress.setProgress((double) i / covMatrix.getRowDimension(), "Writing row: " + resultDataContainer.getTableSpec().getColumnSpec(i).getName());
        }
    }
    return covMatrix;
}
Also used : ArrayList(java.util.ArrayList) NumberIsTooSmallException(org.apache.commons.math3.exception.NumberIsTooSmallException) StorelessCovariance(org.apache.commons.math3.stat.correlation.StorelessCovariance) DataRow(org.knime.core.data.DataRow) Array2DRowRealMatrix(org.apache.commons.math3.linear.Array2DRowRealMatrix) RealMatrix(org.apache.commons.math3.linear.RealMatrix) DoubleValue(org.knime.core.data.DoubleValue) DataCell(org.knime.core.data.DataCell) ExecutionMonitor(org.knime.core.node.ExecutionMonitor) DefaultRow(org.knime.core.data.def.DefaultRow)
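
The method above stores one accumulator per column pair (i, j) with j >= i in a flat list and looks it up through an index(n, i, j) helper that is not shown. One plausible mapping, consistent with the order in which the nested loops fill the list, is sketched below. This is an assumption about how such a helper could work, not the knime-core implementation.

```java
/** Illustrative only: row-major index into the upper triangle (diagonal included) of an n x n matrix. */
final class TriangularIndexSketch {

    /**
     * Position of pair (i, j), with 0 <= i <= j < n, when the pairs are enumerated as
     * (0,0), (0,1), ..., (0,n-1), (1,1), ..., (n-1,n-1).
     */
    static int index(final int n, final int i, final int j) {
        if (i < 0 || j < i || j >= n) {
            throw new IllegalArgumentException("require 0 <= i <= j < n");
        }
        // Rows 0..i-1 contribute n, n-1, ..., n-i+1 entries, i.e. i*n - i*(i-1)/2 in total.
        return i * n - i * (i - 1) / 2 + (j - i);
    }

    public static void main(final String[] args) {
        final int n = 4;
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                System.out.println("(" + i + "," + j + ") -> " + index(n, i, j));
            }
        }
    }
}
```

Storing only the upper triangle needs n(n+1)/2 accumulators instead of n*n, and the symmetric entry is filled in when the result matrix is written.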

Example 59 with Variance

use of org.apache.commons.math3.stat.descriptive.moment.Variance in project knime-core by knime.

the class CovarianceMatrixCalculatorTest method computeCovarianceOfRandomDataWithMissingValues.

/**
 * Tests the covariance computation on data with missing values
 *
 * @throws InvalidSettingsException
 * @throws CanceledExecutionException
 */
@Test
public void computeCovarianceOfRandomDataWithMissingValues() throws InvalidSettingsException, CanceledExecutionException {
    long currentTimeMillis = System.currentTimeMillis();
    System.out.println("Mahalanobis test random seed: " + currentTimeMillis);
    final Random random = new Random(47);
    double[][] data = new double[10][];
    BufferedDataContainer inTableCont = generateData(random, data, SPEC_2);
    // add two rows with missing values, at the end both should be ignored
    DataCell[] row = new DataCell[2];
    row[0] = new DoubleCell(random.nextDouble());
    row[1] = DataType.getMissingCell();
    inTableCont.addRowToTable(new DefaultRow(new RowKey("Missing!1"), row));
    row[1] = new DoubleCell(random.nextDouble());
    row[0] = DataType.getMissingCell();
    inTableCont.addRowToTable(new DefaultRow(new RowKey("Missing!2"), row));
    inTableCont.close();
    BufferedDataTable inTable = inTableCont.getTable();
    // As the rows with missing values should be ignored, the computed covariance matrix should be the same
    CovarianceMatrixCalculator covMatrixCalculator = new CovarianceMatrixCalculator(SPEC_2, SPEC_2.getColumnNames());
    BufferedDataContainer covDataContainer = m_exec.createDataContainer(covMatrixCalculator.getResultSpec());
    RealMatrix covMatrixUnderTest = covMatrixCalculator.computeCovarianceMatrix(m_exec, inTable, covDataContainer);
    covDataContainer.close();
    Covariance covariance = new Covariance(data);
    RealMatrix referenceCovarianceMatrix = covariance.getCovarianceMatrix();
    BufferedDataTable covTableUnderTest = covDataContainer.getTable();
    // The diagonal is the variance which also changes considering missing values...
    // but we check only the part of the covariance matrix at the top right triangle.
    assertCovarianceMatrixEquality(covMatrixUnderTest, referenceCovarianceMatrix, covTableUnderTest, SPEC_2, false);
}
Also used : BufferedDataContainer(org.knime.core.node.BufferedDataContainer) RowKey(org.knime.core.data.RowKey) DoubleCell(org.knime.core.data.def.DoubleCell) Random(java.util.Random) RealMatrix(org.apache.commons.math3.linear.RealMatrix) Covariance(org.apache.commons.math3.stat.correlation.Covariance) BufferedDataTable(org.knime.core.node.BufferedDataTable) DataCell(org.knime.core.data.DataCell) DefaultRow(org.knime.core.data.def.DefaultRow) Test(org.junit.Test)
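
The behaviour under test, skipping any row in which one of the two values is missing, can be illustrated with a small single-pass covariance accumulator. This is a library-free sketch under stated assumptions: Double.NaN stands in for a KNIME missing cell, and the class name and methods are hypothetical.

```java
/** Illustrative only: single-pass sample covariance that ignores incomplete pairs. */
final class PairwiseCovarianceSketch {
    private long n;
    private double meanX;
    private double meanY;
    private double coMoment;

    /** Adds one observation; pairs containing NaN (our stand-in for "missing") are skipped. */
    void increment(final double x, final double y) {
        if (Double.isNaN(x) || Double.isNaN(y)) {
            return;
        }
        n++;
        final double deltaX = x - meanX; // deviation from the previous mean of x
        meanX += deltaX / n;
        meanY += (y - meanY) / n;
        coMoment += deltaX * (y - meanY); // uses the updated mean of y
    }

    long getN() {
        return n;
    }

    /** Bias-corrected covariance; needs at least two complete pairs. */
    double getCovariance() {
        if (n < 2) {
            throw new IllegalStateException("at least two complete pairs are required");
        }
        return coMoment / (n - 1);
    }

    public static void main(final String[] args) {
        final double[] xs = {1, 2, 3, 4, Double.NaN, 7};
        final double[] ys = {2, 4, 6, 8, 5, Double.NaN};
        final PairwiseCovarianceSketch cov = new PairwiseCovarianceSketch();
        for (int k = 0; k < xs.length; k++) {
            cov.increment(xs[k], ys[k]);
        }
        System.out.println("complete pairs = " + cov.getN() + ", covariance = " + cov.getCovariance());
    }
}
```

Because only complete pairs are accumulated, the off-diagonal entries match a reference computed on the rows without missing values, while a per-column variance would still see the extra non-missing value, as the comment in the test notes.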

Example 60 with Variance

use of org.apache.commons.math3.stat.descriptive.moment.Variance in project vcell by virtualcell.

the class FluorescenceNoiseTest method doit.

private void doit() throws ImageException {
    int[] sizes = new int[] { 1, 2, 4, 8, 16, 32, 64, 128, 256 };
    for (int imageSize : sizes) {
        ISize size = new ISize(imageSize, imageSize, 1);
        Extent extent = new Extent(1, 1, 1);
        Origin origin = new Origin(0, 0, 0);
        // NormalizedSampleFunction sampleFunction = NormalizedSampleFunction.createUniform("uniformROI", origin, extent, size);
        NormalizedSampleFunction sampleFunction = NormalizedSampleFunction.fromGaussian("testGaussian", origin, extent, size, 0.5, 0.2, 0.1);
        SampleStatistics[] samples = new SampleStatistics[NUM_TRIALS];
        for (int i = 0; i < NUM_TRIALS; i++) {
            UShortImage rawImage = getUniformFluorescenceImage(size, extent, origin, MEAN_INTENSITY);
            samples[i] = sampleFunction.sample(rawImage);
        }
        Mean mean = new Mean();
        Variance var = new Variance();
        double[] weightedMeans = getWeightedMeans(samples);
        double[] weightedVariances = getWeightedVariances(samples);
        double weightedMeansVariance = var.evaluate(weightedMeans);
        double weightedMeansMean = mean.evaluate(weightedMeans);
        double weightedVarVariance = var.evaluate(weightedVariances);
        double weightedVarMean = mean.evaluate(weightedVariances);
        double V1 = samples[0].sumOfWeights;
        double V2 = samples[0].sumOfWeightsSquared;
        System.out.println("image is " + imageSize + "x" + imageSize + ", V1=" + V1 + ", V2=" + V2 + ", numTrials=" + NUM_TRIALS + ", sample means (mu=" + weightedMeansMean + ",s=" + weightedMeansVariance + "), sample variances (mu=" + weightedVarMean + ",s=" + weightedVarVariance + ")");
    }
}
Also used : Origin(org.vcell.util.Origin) Mean(org.apache.commons.math3.stat.descriptive.moment.Mean) Extent(org.vcell.util.Extent) ISize(org.vcell.util.ISize) SampleStatistics(org.vcell.vmicro.workflow.data.NormalizedSampleFunction.SampleStatistics) UShortImage(cbit.vcell.VirtualMicroscopy.UShortImage) Variance(org.apache.commons.math3.stat.descriptive.moment.Variance) NormalizedSampleFunction(org.vcell.vmicro.workflow.data.NormalizedSampleFunction)
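
The snippet above reports V1 (sum of weights) and V2 (sum of squared weights) alongside weighted means and variances, but getWeightedMeans and getWeightedVariances are not shown. The sketch below uses one common pair of estimators built from those two sums, the weighted mean and the reliability-weights variance. Whether vcell's SampleStatistics uses exactly these formulas is an assumption, and all names here are hypothetical.

```java
/** Illustrative only: weighted mean and a bias-corrected weighted variance. */
final class WeightedStatsSketch {

    /** Weighted mean: sum(w_i * x_i) / V1, where V1 = sum(w_i). */
    static double weightedMean(final double[] x, final double[] w) {
        double v1 = 0.;
        double weightedSum = 0.;
        for (int i = 0; i < x.length; i++) {
            v1 += w[i];
            weightedSum += w[i] * x[i];
        }
        return weightedSum / v1;
    }

    /**
     * Weighted variance with reliability weights:
     * sum(w_i * (x_i - mu)^2) / (V1 - V2 / V1), where V2 = sum(w_i^2).
     * With all weights equal to 1 this reduces to the usual n - 1 sample variance.
     */
    static double weightedVariance(final double[] x, final double[] w) {
        final double mu = weightedMean(x, w);
        double v1 = 0.;
        double v2 = 0.;
        double weightedSquares = 0.;
        for (int i = 0; i < x.length; i++) {
            v1 += w[i];
            v2 += w[i] * w[i];
            weightedSquares += w[i] * (x[i] - mu) * (x[i] - mu);
        }
        return weightedSquares / (v1 - v2 / v1);
    }

    public static void main(final String[] args) {
        final double[] x = {1, 3};
        final double[] w = {3, 1};
        System.out.println("weighted mean = " + weightedMean(x, w));
        System.out.println("weighted variance = " + weightedVariance(x, w));
    }
}
```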

Aggregations

Collectors (java.util.stream.Collectors) 24
IntStream (java.util.stream.IntStream) 24
ParamUtils (org.broadinstitute.hellbender.utils.param.ParamUtils) 22
Nonnull (javax.annotation.Nonnull) 20
RandomGenerator (org.apache.commons.math3.random.RandomGenerator) 18
Variance (org.apache.commons.math3.stat.descriptive.moment.Variance) 18
List (java.util.List) 16
FastMath (org.apache.commons.math3.util.FastMath) 16
Utils (org.broadinstitute.hellbender.utils.Utils) 16
INDArray (org.nd4j.linalg.api.ndarray.INDArray) 16
Function (java.util.function.Function) 15
Arrays (java.util.Arrays) 14
Nullable (javax.annotation.Nullable) 14
ImmutablePair (org.apache.commons.lang3.tuple.ImmutablePair) 14
RealMatrix (org.apache.commons.math3.linear.RealMatrix) 14
Logger (org.apache.logging.log4j.Logger) 14
GATKException (org.broadinstitute.hellbender.exceptions.GATKException) 14
UserException (org.broadinstitute.hellbender.exceptions.UserException) 14
Nd4j (org.nd4j.linalg.factory.Nd4j) 14
NDArrayIndex (org.nd4j.linalg.indexing.NDArrayIndex) 14