Search in sources :

Example 11 with Percentile

use of org.apache.commons.math3.stat.descriptive.rank.Percentile in project gatk-protected by broadinstitute.

the class HDF5PCACoveragePoNCreationUtils method subsetReadCountsToUsableTargets.

/**
     * Subsets targets in the input count to the usable ones based on the percentile threshold indicated
     * by the user.
     *
     * <p>
     *     It returns a pair of object, where the left one is the updated read-counts with only the usable
     *     targets, and the right one is the corresponding target factors.
     * </p>
     *
     * @param readCounts the input read-counts.
     * @param targetFactorPercentileThreshold the minimum median count percentile under which targets are not considered useful.
     * @return never {@code null}.
     */
@VisibleForTesting
static Pair<ReadCountCollection, double[]> subsetReadCountsToUsableTargets(final ReadCountCollection readCounts, final double targetFactorPercentileThreshold, final Logger logger) {
    final double[] targetFactors = calculateTargetFactors(readCounts);
    final double threshold = new Percentile(targetFactorPercentileThreshold).evaluate(targetFactors);
    final List<Target> targetByIndex = readCounts.targets();
    final Set<Target> result = IntStream.range(0, targetFactors.length).filter(i -> targetFactors[i] >= threshold).mapToObj(targetByIndex::get).collect(Collectors.toCollection(LinkedHashSet::new));
    if (result.size() == targetByIndex.size()) {
        logger.info(String.format("All %d targets are kept", targetByIndex.size()));
        return new ImmutablePair<>(readCounts, targetFactors);
    } else {
        final int discardedCount = targetFactors.length - result.size();
        logger.info(String.format("Discarded %d target(s) out of %d with factors below %.2g (%.2f percentile)", discardedCount, targetFactors.length, threshold, targetFactorPercentileThreshold));
        final double[] targetFactorSubset = DoubleStream.of(targetFactors).filter(i -> i >= threshold).toArray();
        return new ImmutablePair<>(readCounts.subsetTargets(result), targetFactorSubset);
    }
}
Also used : IntStream(java.util.stream.IntStream) DefaultRealMatrixChangingVisitor(org.apache.commons.math3.linear.DefaultRealMatrixChangingVisitor) SVD(org.broadinstitute.hellbender.utils.svd.SVD) java.util(java.util) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) MatrixSummaryUtils(org.broadinstitute.hellbender.utils.MatrixSummaryUtils) ParamUtils(org.broadinstitute.hellbender.utils.param.ParamUtils) Pair(org.apache.commons.lang3.tuple.Pair) Median(org.apache.commons.math3.stat.descriptive.rank.Median) HDF5File(org.broadinstitute.hdf5.HDF5File) IOUtils(org.broadinstitute.hellbender.utils.io.IOUtils) org.broadinstitute.hellbender.tools.exome(org.broadinstitute.hellbender.tools.exome) IOException(java.io.IOException) Collectors(java.util.stream.Collectors) Sets(com.google.common.collect.Sets) ImmutablePair(org.apache.commons.lang3.tuple.ImmutablePair) File(java.io.File) DoubleStream(java.util.stream.DoubleStream) Percentile(org.apache.commons.math3.stat.descriptive.rank.Percentile) Logger(org.apache.logging.log4j.Logger) MathUtils(org.broadinstitute.hellbender.utils.MathUtils) UserException(org.broadinstitute.hellbender.exceptions.UserException) SVDFactory(org.broadinstitute.hellbender.utils.svd.SVDFactory) Utils(org.broadinstitute.hellbender.utils.Utils) RealMatrix(org.apache.commons.math3.linear.RealMatrix) VisibleForTesting(com.google.common.annotations.VisibleForTesting) LogManager(org.apache.logging.log4j.LogManager) Percentile(org.apache.commons.math3.stat.descriptive.rank.Percentile) ImmutablePair(org.apache.commons.lang3.tuple.ImmutablePair) VisibleForTesting(com.google.common.annotations.VisibleForTesting)

Example 12 with Percentile

use of org.apache.commons.math3.stat.descriptive.rank.Percentile in project gatk-protected by broadinstitute.

the class ReadCountCollectionUtilsUnitTest method testTruncateExtremeCounts.

@Test(dataProvider = "readCountAndPercentileData")
public void testTruncateExtremeCounts(final ReadCountCollection readCount, final double percentile) {
    final RealMatrix counts = readCount.counts();
    final double[] allCounts = Stream.of(counts.getData()).flatMap(row -> DoubleStream.of(row).boxed()).mapToDouble(Double::doubleValue).toArray();
    final double bottom = new Percentile(percentile).evaluate(allCounts);
    final double top = new Percentile(100 - percentile).evaluate(allCounts);
    final double[][] expected = new double[counts.getRowDimension()][];
    for (int i = 0; i < expected.length; i++) {
        expected[i] = DoubleStream.of(counts.getRow(i)).map(d -> d < bottom ? bottom : (d > top) ? top : d).toArray();
    }
    ReadCountCollectionUtils.truncateExtremeCounts(readCount, percentile, NULL_LOGGER);
    final RealMatrix newCounts = readCount.counts();
    Assert.assertEquals(newCounts.getRowDimension(), newCounts.getRowDimension());
    Assert.assertEquals(newCounts.getColumnDimension(), newCounts.getColumnDimension());
    for (int i = 0; i < expected.length; i++) {
        for (int j = 0; j < expected[i].length; j++) {
            Assert.assertEquals(newCounts.getEntry(i, j), expected[i][j]);
        }
    }
}
Also used : Percentile(org.apache.commons.math3.stat.descriptive.rank.Percentile) Array2DRowRealMatrix(org.apache.commons.math3.linear.Array2DRowRealMatrix) RealMatrix(org.apache.commons.math3.linear.RealMatrix) Test(org.testng.annotations.Test)

Example 13 with Percentile

use of org.apache.commons.math3.stat.descriptive.rank.Percentile in project gatk-protected by broadinstitute.

the class ReadCountCollectionUtilsUnitTest method testExtremeMedianColumnsData.

@Test(dataProvider = "readCountAndPercentileData")
public void testExtremeMedianColumnsData(final ReadCountCollection readCount, final double percentile) {
    final Median median = new Median();
    final RealMatrix counts = readCount.counts();
    final double[] columnMedians = IntStream.range(0, counts.getColumnDimension()).mapToDouble(i -> median.evaluate(counts.getColumn(i))).toArray();
    final double top = new Percentile(100 - percentile).evaluate(columnMedians);
    final double bottom = new Percentile(percentile).evaluate(columnMedians);
    final Boolean[] toBeKept = DoubleStream.of(columnMedians).mapToObj(d -> d <= top && d >= bottom).toArray(Boolean[]::new);
    final int toBeKeptCount = (int) Stream.of(toBeKept).filter(b -> b).count();
    final ReadCountCollection result = ReadCountCollectionUtils.removeColumnsWithExtremeMedianCounts(readCount, percentile, NULL_LOGGER);
    Assert.assertEquals(result.columnNames().size(), toBeKeptCount);
    int nextIndex = 0;
    for (int i = 0; i < toBeKept.length; i++) {
        if (toBeKept[i]) {
            int index = result.columnNames().indexOf(readCount.columnNames().get(i));
            Assert.assertEquals(index, nextIndex++);
            Assert.assertEquals(counts.getColumn(i), result.counts().getColumn(index));
        } else {
            Assert.assertEquals(result.columnNames().indexOf(readCount.columnNames().get(i)), -1);
        }
    }
}
Also used : IntStream(java.util.stream.IntStream) Arrays(java.util.Arrays) DataProvider(org.testng.annotations.DataProvider) Level(org.apache.logging.log4j.Level) Test(org.testng.annotations.Test) Random(java.util.Random) ArrayList(java.util.ArrayList) Message(org.apache.logging.log4j.message.Message) Assert(org.testng.Assert) Median(org.apache.commons.math3.stat.descriptive.rank.Median) Marker(org.apache.logging.log4j.Marker) AbstractLogger(org.apache.logging.log4j.spi.AbstractLogger) PrintWriter(java.io.PrintWriter) Array2DRowRealMatrix(org.apache.commons.math3.linear.Array2DRowRealMatrix) IOException(java.io.IOException) SimpleInterval(org.broadinstitute.hellbender.utils.SimpleInterval) Collectors(java.util.stream.Collectors) File(java.io.File) DoubleStream(java.util.stream.DoubleStream) List(java.util.List) Percentile(org.apache.commons.math3.stat.descriptive.rank.Percentile) Logger(org.apache.logging.log4j.Logger) Stream(java.util.stream.Stream) UserException(org.broadinstitute.hellbender.exceptions.UserException) RealMatrix(org.apache.commons.math3.linear.RealMatrix) Percentile(org.apache.commons.math3.stat.descriptive.rank.Percentile) Array2DRowRealMatrix(org.apache.commons.math3.linear.Array2DRowRealMatrix) RealMatrix(org.apache.commons.math3.linear.RealMatrix) Median(org.apache.commons.math3.stat.descriptive.rank.Median) Test(org.testng.annotations.Test)

Example 14 with Percentile

use of org.apache.commons.math3.stat.descriptive.rank.Percentile in project lightning by automatictester.

the class RespTimeNthPercentileTest method execute.

@Override
public void execute(ArrayList<String[]> originalJMeterTransactions) {
    try {
        JMeterTransactions transactions = filterTransactions((JMeterTransactions) originalJMeterTransactions);
        transactionCount = transactions.getTransactionCount();
        DescriptiveStatistics ds = new DescriptiveStatistics();
        ds.setPercentileImpl(new Percentile().withEstimationType(Percentile.EstimationType.R_3));
        for (String[] transaction : transactions) {
            String elapsed = transaction[1];
            ds.addValue(Double.parseDouble(elapsed));
        }
        longestTransactions = transactions.getLongestTransactions();
        actualResult = (int) ds.getPercentile((double) percentile);
        actualResultDescription = String.format(ACTUAL_RESULT_MESSAGE, new IntToOrdConverter().convert(percentile), actualResult);
        if (actualResult > maxRespTime) {
            result = TestResult.FAIL;
        } else {
            result = TestResult.PASS;
        }
    } catch (Exception e) {
        result = TestResult.ERROR;
        actualResultDescription = e.getMessage();
    }
}
Also used : DescriptiveStatistics(org.apache.commons.math3.stat.descriptive.DescriptiveStatistics) Percentile(org.apache.commons.math3.stat.descriptive.rank.Percentile) IntToOrdConverter(uk.co.automatictester.lightning.utils.IntToOrdConverter) JMeterTransactions(uk.co.automatictester.lightning.data.JMeterTransactions)

Example 15 with Percentile

use of org.apache.commons.math3.stat.descriptive.rank.Percentile in project nd4j by deeplearning4j.

the class Nd4jTestsC method testPercentile2.

@Test
public void testPercentile2() throws Exception {
    INDArray array = Nd4j.linspace(1, 9, 9);
    Percentile percentile = new Percentile(50);
    double exp = percentile.evaluate(array.data().asDouble());
    assertEquals(exp, array.percentileNumber(50));
}
Also used : Percentile(org.apache.commons.math3.stat.descriptive.rank.Percentile) INDArray(org.nd4j.linalg.api.ndarray.INDArray) Test(org.junit.Test)

Aggregations

Percentile (org.apache.commons.math3.stat.descriptive.rank.Percentile)31 ArrayList (java.util.ArrayList)16 RealMatrix (org.apache.commons.math3.linear.RealMatrix)16 Array2DRowRealMatrix (org.apache.commons.math3.linear.Array2DRowRealMatrix)14 List (java.util.List)11 Collectors (java.util.stream.Collectors)11 IntStream (java.util.stream.IntStream)11 File (java.io.File)10 DoubleStream (java.util.stream.DoubleStream)10 Median (org.apache.commons.math3.stat.descriptive.rank.Median)10 Logger (org.apache.logging.log4j.Logger)10 Test (org.testng.annotations.Test)10 Random (java.util.Random)9 Stream (java.util.stream.Stream)9 DescriptiveStatistics (org.apache.commons.math3.stat.descriptive.DescriptiveStatistics)9 Level (org.apache.logging.log4j.Level)8 Marker (org.apache.logging.log4j.Marker)8 Message (org.apache.logging.log4j.message.Message)8 AbstractLogger (org.apache.logging.log4j.spi.AbstractLogger)8 SimpleInterval (org.broadinstitute.hellbender.utils.SimpleInterval)8