
Example 1 with BernoulliDistribution

Use of edu.neu.ccs.pyramid.util.BernoulliDistribution in project pyramid by cheng-li.

Class BM, method computeLogClusterConditionalForEmpty.

// Log-probability of the all-zero vector under the given cluster:
// the sum of log P(x_l = 0) over every dimension l.
private double computeLogClusterConditionalForEmpty(int clusterIndex) {
    double logProb = 0.0;
    for (int l = 0; l < dimension; l++) {
        BernoulliDistribution distribution = distributions[clusterIndex][l];
        logProb += distribution.logProbability(0);
    }
    return logProb;
}
Also used : BernoulliDistribution(edu.neu.ccs.pyramid.util.BernoulliDistribution)

Example 2 with BernoulliDistribution

Use of edu.neu.ccs.pyramid.util.BernoulliDistribution in project pyramid by cheng-li.

Class ClusterLabels, method getCluster.

private static List<WordFrequency> getCluster(BM bm, int k) throws Exception {
    BernoulliDistribution[][] distributions = bm.getDistributions();
    // Pair each feature name with its Bernoulli parameter p in cluster k.
    List<Pair<String, Double>> pairs = new ArrayList<>();
    for (int d = 0; d < bm.getDimension(); d++) {
        Pair<String, Double> pair = new Pair<>(bm.getNames().get(d), distributions[k][d].getP());
        pairs.add(pair);
    }
    // Keep the 20 features with the largest positive p, in descending order.
    Comparator<Pair<String, Double>> comparator = Comparator.comparing(Pair::getSecond);
    List<Pair<String, Double>> top = pairs.stream().filter(pair -> pair.getSecond() > 0).sorted(comparator.reversed()).limit(20).collect(Collectors.toList());
    // Rescale so that the truncated integer frequencies sum to at most 200.
    double sum = top.stream().mapToDouble(Pair::getSecond).sum();
    List<WordFrequency> frequencies = new ArrayList<>();
    for (Pair<String, Double> pair : top) {
        frequencies.add(new WordFrequency(pair.getFirst(), (int) (pair.getSecond() * 200 / sum)));
    }
    return frequencies;
}
Also used : edu.neu.ccs.pyramid.util(edu.neu.ccs.pyramid.util) java.util(java.util) ArgSort(edu.neu.ccs.pyramid.util.ArgSort) CollisionMode(com.kennycason.kumo.CollisionMode) CenterWordStart(com.kennycason.kumo.wordstart.CenterWordStart) Random(java.util.Random) BMTrainer(edu.neu.ccs.pyramid.clustering.bm.BMTrainer) ArrayList(java.util.ArrayList) LinearFontScalar(com.kennycason.kumo.font.scale.LinearFontScalar) RectangleBackground(com.kennycason.kumo.bg.RectangleBackground) WordCloud(com.kennycason.kumo.WordCloud) Pair(edu.neu.ccs.pyramid.util.Pair) Config(edu.neu.ccs.pyramid.configuration.Config) BernoulliDistribution(edu.neu.ccs.pyramid.util.BernoulliDistribution) AngleGenerator(com.kennycason.kumo.image.AngleGenerator) FileUtils(org.apache.commons.io.FileUtils) Collectors(java.util.stream.Collectors) ColorPalette(com.kennycason.kumo.palette.ColorPalette) File(java.io.File) java.awt(java.awt) List(java.util.List) Serialization(edu.neu.ccs.pyramid.util.Serialization) Paths(java.nio.file.Paths) edu.neu.ccs.pyramid.dataset(edu.neu.ccs.pyramid.dataset) WordFrequency(com.kennycason.kumo.WordFrequency) Comparator(java.util.Comparator) BM(edu.neu.ccs.pyramid.clustering.bm.BM)

Example 3 with BernoulliDistribution

Use of edu.neu.ccs.pyramid.util.BernoulliDistribution in project pyramid by cheng-li.

Class RegressionSynthesizer, method linear.

public static RegDataSet linear() {
    int numData = 50;
    int numFeatures = 16000;
    RegDataSet dataSet = RegDataSetBuilder.getBuilder().numDataPoints(numData).numFeatures(numFeatures).dense(true).missingValue(false).build();
    // Only the first four features carry signal; all other weights stay 0.
    Vector weights = new DenseVector(numFeatures);
    for (int j = 0; j < 4; j++) {
        weights.set(j, 0.001);
    }
    // A fair coin decides whether each feature value is -1 or +1.
    BernoulliDistribution bernoulliDistribution = new BernoulliDistribution(0.5);
    for (int i = 0; i < numData; i++) {
        for (int j = 0; j < numFeatures; j++) {
            int sample = bernoulliDistribution.sample();
            dataSet.setFeatureValue(i, j, sample == 0 ? -1 : 1);
        }
        // The label is the noise-free linear response: weights dot row i.
        double label = weights.dot(dataSet.getRow(i));
        dataSet.setLabel(i, label);
    }
    return dataSet;
}
Also used : BernoulliDistribution(edu.neu.ccs.pyramid.util.BernoulliDistribution) RegDataSet(edu.neu.ccs.pyramid.dataset.RegDataSet) DenseVector(org.apache.mahout.math.DenseVector) Vector(org.apache.mahout.math.Vector)

Example 4 with BernoulliDistribution

Use of edu.neu.ccs.pyramid.util.BernoulliDistribution in project pyramid by cheng-li.

Class BMTrainer, method updateCluster.

/**
 * Re-estimates the Bernoulli parameters and the mixture coefficient of one cluster.
 *
 * @param k cluster index
 */
private void updateCluster(int k) {
    // Sum of gammas[i][k] over all data points: the effective size of cluster k.
    final double effectiveTotal = IntStream.range(0, dataSet.getNumDataPoints()).parallel().mapToDouble(i -> gammas[i][k]).sum();
    IntStream.range(0, dataSet.getNumFeatures()).parallel().forEach(d -> {
        double sum = weightedSum(k, d);
        double average = sum / effectiveTotal;
        // Cap the parameter so that it stays strictly below 1.
        if (average >= 1) {
            average = 0.9999;
        }
        bm.distributions[k][d] = new BernoulliDistribution(average);
    });
    bm.mixtureCoefficients[k] = effectiveTotal / dataSet.getNumDataPoints();
    bm.logMixtureCoefficients[k] = Math.log(bm.mixtureCoefficients[k]);
}
Also used : MathUtil(edu.neu.ccs.pyramid.util.MathUtil) IntStream(java.util.stream.IntStream) KLDivergence(edu.neu.ccs.pyramid.eval.KLDivergence) edu.neu.ccs.pyramid.optimization(edu.neu.ccs.pyramid.optimization) Logger(org.apache.logging.log4j.Logger) DenseVector(org.apache.mahout.math.DenseVector) BernoulliDistribution(edu.neu.ccs.pyramid.util.BernoulliDistribution) Entropy(edu.neu.ccs.pyramid.eval.Entropy) Vector(org.apache.mahout.math.Vector) DataSet(edu.neu.ccs.pyramid.dataset.DataSet) LogManager(org.apache.logging.log4j.LogManager)
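
The update in Example 4 reads like the M-step of EM for a Bernoulli mixture: each parameter becomes the gamma-weighted mean of a binary feature, capped below 1. The following is a minimal sketch of that single-parameter update using plain arrays. The class BernoulliMStepSketch, the method updateP, and the array layout are hypothetical, and it is an assumption that weightedSum in the original computes the same gamma-weighted feature sum.

```java
final class BernoulliMStepSketch {
    // Cap taken from Example 4; keeps the parameter strictly below 1.
    static final double MAX_P = 0.9999;

    // gammaK[i]: weight of data point i for one cluster (assumed layout).
    // featureD[i]: binary value (0 or 1) of one feature in data point i.
    // Note: if all weights are 0 the division yields NaN, as in the original.
    static double updateP(double[] gammaK, int[] featureD) {
        double effectiveTotal = 0.0;
        double weightedSum = 0.0;
        for (int i = 0; i < gammaK.length; i++) {
            effectiveTotal += gammaK[i];
            weightedSum += gammaK[i] * featureD[i];
        }
        double average = weightedSum / effectiveTotal;
        return average >= 1 ? MAX_P : average;
    }

    public static void main(String[] args) {
        double[] gammaK = {1.0, 1.0, 2.0};
        // Feature present in points 0 and 2: weighted mean is 3 / 4.
        System.out.println(updateP(gammaK, new int[] {1, 0, 1}));
        // Feature present everywhere: the mean of 1 is capped.
        System.out.println(updateP(gammaK, new int[] {1, 1, 1}));
    }
}
```

Only the upper bound is capped here, mirroring the original; a parameter of exactly 0 is left as-is.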

Example 5 with BernoulliDistribution

Use of edu.neu.ccs.pyramid.util.BernoulliDistribution in project pyramid by cheng-li.

Class BM, method clusterConditionalLogProb.

public double clusterConditionalLogProb(Vector vector, int clusterIndex) {
    // Start from the cached log-probability of the all-zero vector.
    double logProb = logClusterConditioinalForEmpty[clusterIndex];
    // For each nonzero feature, swap its log P(0) term for log P(1).
    for (Vector.Element nonzero : vector.nonZeroes()) {
        int l = nonzero.index();
        BernoulliDistribution distribution = distributions[clusterIndex][l];
        logProb -= distribution.logProbability(0);
        logProb += distribution.logProbability(1);
    }
    return logProb;
}
Also used : BernoulliDistribution(edu.neu.ccs.pyramid.util.BernoulliDistribution) DenseVector(org.apache.mahout.math.DenseVector) Vector(org.apache.mahout.math.Vector)
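
Examples 1 and 5 appear to work as a pair: the log-probability of the all-zero vector is computed once per cluster, and each nonzero feature then replaces its log P(0) term with log P(1). The sketch below checks that this sparse evaluation agrees with a dense pass over every dimension. SparseBernoulliSketch and all of its methods are hypothetical stand-ins that use plain arrays instead of the pyramid BernoulliDistribution and Mahout Vector types.

```java
final class SparseBernoulliSketch {
    // Log-probability of outcome x (0 or 1) under Bernoulli(p).
    static double logProbability(double p, int x) {
        return x == 1 ? Math.log(p) : Math.log(1.0 - p);
    }

    // Dense evaluation: visits every dimension of the binary vector x.
    static double denseLogProb(double[] p, int[] x) {
        double logProb = 0.0;
        for (int l = 0; l < p.length; l++) {
            logProb += logProbability(p[l], x[l]);
        }
        return logProb;
    }

    // Log-probability of the all-zero vector; computed once and reused.
    static double logProbForEmpty(double[] p) {
        double logProb = 0.0;
        for (double pl : p) {
            logProb += logProbability(pl, 0);
        }
        return logProb;
    }

    // Sparse evaluation: visits only the indices where x is 1.
    static double sparseLogProb(double[] p, int[] nonZeroIndices, double logProbEmpty) {
        double logProb = logProbEmpty;
        for (int l : nonZeroIndices) {
            logProb -= logProbability(p[l], 0);
            logProb += logProbability(p[l], 1);
        }
        return logProb;
    }

    public static void main(String[] args) {
        double[] p = {0.2, 0.7, 0.5, 0.1};
        int[] x = {0, 1, 0, 1};
        int[] nonZeroes = {1, 3};
        double dense = denseLogProb(p, x);
        double sparse = sparseLogProb(p, nonZeroes, logProbForEmpty(p));
        // The two values should agree up to floating-point rounding.
        System.out.println(dense + " vs " + sparse);
    }
}
```

The sparse form costs time proportional to the number of nonzero entries rather than the full dimension, which is presumably why the all-zero term is cached per cluster.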

Aggregations

BernoulliDistribution (edu.neu.ccs.pyramid.util.BernoulliDistribution): 7
DenseVector (org.apache.mahout.math.DenseVector): 3
Vector (org.apache.mahout.math.Vector): 3
ArrayList (java.util.ArrayList): 2
CollisionMode (com.kennycason.kumo.CollisionMode): 1
WordCloud (com.kennycason.kumo.WordCloud): 1
WordFrequency (com.kennycason.kumo.WordFrequency): 1
RectangleBackground (com.kennycason.kumo.bg.RectangleBackground): 1
LinearFontScalar (com.kennycason.kumo.font.scale.LinearFontScalar): 1
AngleGenerator (com.kennycason.kumo.image.AngleGenerator): 1
ColorPalette (com.kennycason.kumo.palette.ColorPalette): 1
CenterWordStart (com.kennycason.kumo.wordstart.CenterWordStart): 1
BM (edu.neu.ccs.pyramid.clustering.bm.BM): 1
BMTrainer (edu.neu.ccs.pyramid.clustering.bm.BMTrainer): 1
Config (edu.neu.ccs.pyramid.configuration.Config): 1
edu.neu.ccs.pyramid.dataset (edu.neu.ccs.pyramid.dataset): 1
DataSet (edu.neu.ccs.pyramid.dataset.DataSet): 1
MultiLabel (edu.neu.ccs.pyramid.dataset.MultiLabel): 1
MultiLabelClfDataSet (edu.neu.ccs.pyramid.dataset.MultiLabelClfDataSet): 1
RegDataSet (edu.neu.ccs.pyramid.dataset.RegDataSet): 1