Example 56 with MultiLabel

use of edu.neu.ccs.pyramid.dataset.MultiLabel in project pyramid by cheng-li.

the class MultiLabelSynthesizer method flipOneNonUniform.

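/**
 * Generates a synthetic data set with 4 classes and 2 features drawn
 * uniformly from [-1, 1], then flips one label per data point.
 * Class k is assigned to x when w_k . x >= 0, with weights:
 * y0: w=(0,1)
 * y1: w=(1,1)
 * y2: w=(1,0)
 * y3: w=(1,-1)
 * The label to flip is class 0 with probability 0.4 and each of
 * classes 1-3 with probability 0.2.
 * @param numData number of data points to generate
 * @return the generated data set, with one label flipped per data point
 */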
public static MultiLabelClfDataSet flipOneNonUniform(int numData) {
    int numClass = 4;
    int numFeature = 2;
    MultiLabelClfDataSet dataSet = MLClfDataSetBuilder.getBuilder().numFeatures(numFeature).numClasses(numClass).numDataPoints(numData).build();
    // generate weights
    Vector[] weights = new Vector[numClass];
    for (int k = 0; k < numClass; k++) {
        Vector vector = new DenseVector(numFeature);
        weights[k] = vector;
    }
    weights[0].set(0, 0);
    weights[0].set(1, 1);
    weights[1].set(0, 1);
    weights[1].set(1, 1);
    weights[2].set(0, 1);
    weights[2].set(1, 0);
    weights[3].set(0, 1);
    weights[3].set(1, -1);
    // generate features
    for (int i = 0; i < numData; i++) {
        for (int j = 0; j < numFeature; j++) {
            dataSet.setFeatureValue(i, j, Sampling.doubleUniform(-1, 1));
        }
    }
    // assign labels
    for (int i = 0; i < numData; i++) {
        for (int k = 0; k < numClass; k++) {
            double dot = weights[k].dot(dataSet.getRow(i));
            if (dot >= 0) {
                dataSet.addLabel(i, k);
            }
        }
    }
    int[] indices = { 0, 1, 2, 3 };
    double[] probs = { 0.4, 0.2, 0.2, 0.2 };
    IntegerDistribution distribution = new EnumeratedIntegerDistribution(indices, probs);
    // flip
    for (int i = 0; i < numData; i++) {
        int toChange = distribution.sample();
        MultiLabel label = dataSet.getMultiLabels()[i];
        if (label.matchClass(toChange)) {
            label.removeLabel(toChange);
        } else {
            label.addLabel(toChange);
        }
    }
    return dataSet;
}
Also used : EnumeratedIntegerDistribution(org.apache.commons.math3.distribution.EnumeratedIntegerDistribution) MultiLabel(edu.neu.ccs.pyramid.dataset.MultiLabel) IntegerDistribution(org.apache.commons.math3.distribution.IntegerDistribution) DenseVector(org.apache.mahout.math.DenseVector) Vector(org.apache.mahout.math.Vector) MultiLabelClfDataSet(edu.neu.ccs.pyramid.dataset.MultiLabelClfDataSet)
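
The flipping step above can be illustrated without the pyramid or Commons Math dependencies. The following is a minimal sketch, not the project's code: the class name FlipOneSketch, the use of java.util.BitSet as a stand-in for MultiLabel, and the hand-written cumulative-sum sampler (in place of EnumeratedIntegerDistribution) are all assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.Random;

// Hypothetical stand-in for the "flip" loop: a BitSet plays the role of MultiLabel.
class FlipOneSketch {

    /** Draws an index from a discrete distribution by walking the cumulative sum. */
    static int sampleIndex(double[] probs, Random rng) {
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (int k = 0; k < probs.length; k++) {
            cumulative += probs[k];
            if (u < cumulative) {
                return k;
            }
        }
        // Guard against rounding error when probs sum to slightly less than 1.
        return probs.length - 1;
    }

    /** Toggles exactly one label: removes it if present, adds it otherwise. */
    static int flipOne(BitSet labels, double[] probs, Random rng) {
        int toChange = sampleIndex(probs, rng);
        labels.flip(toChange);
        return toChange;
    }
}
```

With probabilities { 0.4, 0.2, 0.2, 0.2 }, class 0 is flipped twice as often as any other class, which is what makes the noise non-uniform.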

Example 57 with MultiLabel

use of edu.neu.ccs.pyramid.dataset.MultiLabel in project pyramid by cheng-li.

the class BMSelector method selectGammas.

public static double[][] selectGammas(int numClasses, MultiLabel[] multiLabels, int numClusters) {
    DataSet dataSet = DataSetBuilder.getBuilder().numDataPoints(multiLabels.length).numFeatures(numClasses).density(Density.SPARSE_RANDOM).build();
    for (int i = 0; i < multiLabels.length; i++) {
        MultiLabel multiLabel = multiLabels[i];
        for (int label : multiLabel.getMatchedLabels()) {
            dataSet.setFeatureValue(i, label, 1);
        }
    }
    BMTrainer trainer = BMSelector.selectTrainer(dataSet, numClusters, 10);
    //        System.out.println("gamma = "+ Arrays.deepToString(trainer.gammas));
    return trainer.gammas;
}
Also used : MultiLabel(edu.neu.ccs.pyramid.dataset.MultiLabel) DataSet(edu.neu.ccs.pyramid.dataset.DataSet)
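
The encoding loop in selectGammas turns each label set into a binary indicator row before clustering. A minimal sketch of just that encoding step, assuming label sets are given as plain int arrays and using a dense double[][] instead of the sparse pyramid DataSet (the class and method names are hypothetical):

```java
// Hypothetical illustration of the label-to-feature encoding; not pyramid code.
class LabelIndicatorSketch {

    /** Row i has a 1.0 in column c exactly when label c is in labelSets[i]. */
    static double[][] toIndicatorRows(int[][] labelSets, int numClasses) {
        double[][] rows = new double[labelSets.length][numClasses];
        for (int i = 0; i < labelSets.length; i++) {
            for (int label : labelSets[i]) {
                rows[i][label] = 1.0;
            }
        }
        return rows;
    }
}
```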

Example 58 with MultiLabel

use of edu.neu.ccs.pyramid.dataset.MultiLabel in project pyramid by cheng-li.

the class HMLGradientBoosting method predictClassProbs.

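/**
 * Expensive operation: iterates over every candidate assignment and
 * adds its probability to each class it contains.
 * @param vector feature vector of the instance
 * @return the marginal probability of each class
 */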
public double[] predictClassProbs(Vector vector) {
    double[] assignmentProbs = predictAssignmentProbs(vector);
    double[] classProbs = new double[numClasses];
    for (int a = 0; a < assignments.size(); a++) {
        MultiLabel assignment = assignments.get(a);
        double prob = assignmentProbs[a];
        for (int label : assignment.getMatchedLabels()) {
            classProbs[label] += prob;
        }
    }
    return classProbs;
}
Also used : MultiLabel(edu.neu.ccs.pyramid.dataset.MultiLabel)
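
The marginalization in predictClassProbs can be tried in isolation. This sketch assumes assignments are given as int arrays of label indices and that the assignment probabilities are already computed; ClassMarginalsSketch and its method are hypothetical names, not part of pyramid.

```java
// Hypothetical illustration of summing assignment probabilities into class marginals.
class ClassMarginalsSketch {

    /** classProbs[c] is the total probability of all assignments that contain label c. */
    static double[] classProbs(int[][] assignments, double[] assignmentProbs, int numClasses) {
        double[] classProbs = new double[numClasses];
        for (int a = 0; a < assignments.length; a++) {
            for (int label : assignments[a]) {
                classProbs[label] += assignmentProbs[a];
            }
        }
        return classProbs;
    }
}
```

For example, with assignments {}, {0} and {0, 1} carrying probabilities 0.2, 0.5 and 0.3, class 0 receives 0.5 + 0.3 and class 1 receives 0.3.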

Example 59 with MultiLabel

use of edu.neu.ccs.pyramid.dataset.MultiLabel in project pyramid by cheng-li.

the class HMLGradientBoosting method predict.

public MultiLabel predict(Vector vector) {
    double maxScore = Double.NEGATIVE_INFINITY;
    MultiLabel prediction = null;
    double[] classScores = predictClassScores(vector);
    for (MultiLabel assignment : this.assignments) {
        double score = this.calAssignmentScore(assignment, classScores);
        if (score > maxScore) {
            maxScore = score;
            prediction = assignment;
        }
    }
    return prediction;
}
Also used : MultiLabel(edu.neu.ccs.pyramid.dataset.MultiLabel)
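
The predict method is an argmax over a fixed list of candidate assignments. The listing does not show calAssignmentScore, so the sketch below assumes, purely for illustration, that an assignment's score is the sum of the class scores of its labels. The class name and the int-array representation are likewise hypothetical.

```java
// Hypothetical argmax over candidate assignments; the scoring rule is an assumption.
class AssignmentArgmaxSketch {

    /** Assumed scoring rule: sum of the class scores of the labels in the assignment. */
    static double score(int[] assignment, double[] classScores) {
        double total = 0.0;
        for (int label : assignment) {
            total += classScores[label];
        }
        return total;
    }

    /** Returns the index of the highest-scoring assignment, or -1 if there are none. */
    static int bestAssignment(int[][] assignments, double[] classScores) {
        double maxScore = Double.NEGATIVE_INFINITY;
        int best = -1;
        for (int a = 0; a < assignments.length; a++) {
            double s = score(assignments[a], classScores);
            if (s > maxScore) { // strict comparison keeps the first of any tied assignments
                maxScore = s;
                best = a;
            }
        }
        return best;
    }
}
```

As in the original, an empty candidate list yields no prediction (here -1, there null), so callers may want to handle that case explicitly.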

Example 60 with MultiLabel

use of edu.neu.ccs.pyramid.dataset.MultiLabel in project pyramid by cheng-li.

the class NoiseOptimizerLR method updateTransformProb.

private void updateTransformProb(int dataPoint, int comIndex) {
    MultiLabel labels = dataSet.getMultiLabels()[dataPoint];
    MultiLabel candidate = combinations.get(comIndex);
    // offset vector: subtracting 0.5 from every entry of the candidate's vector
    // (maps 0/1 entries to -0.5/+0.5, assuming toVector returns a 0/1 indicator vector)
    Vector toMinus = new DenseVector(dataSet.getNumClasses());
    for (int i = 0; i < dataSet.getNumClasses(); i++) {
        toMinus.set(i, 0.5);
    }
    // the shifted input is the same for every label, so compute it once
    Vector input = candidate.toVector(dataSet.getNumClasses()).minus(toMinus);
    double prod = 1;
    for (int l = 0; l < dataSet.getNumClasses(); l++) {
        // probability that label l takes its observed value (1 if present, 0 if absent)
        int observed = labels.matchClass(l) ? 1 : 0;
        prod *= this.lrTransforms.get(l).predictClassProb(input, observed);
    }
    transformProbs[dataPoint][comIndex] = prod;
}
Also used : MultiLabel(edu.neu.ccs.pyramid.dataset.MultiLabel) DenseVector(org.apache.mahout.math.DenseVector) Vector(org.apache.mahout.math.Vector)
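
The product computed in updateTransformProb can be sketched with plain arrays. Two assumptions are made here and are not confirmed by the listing: that the candidate's vector is a 0/1 indicator (so subtracting 0.5 centers it at -0.5/+0.5), and that each per-label model is binary, so the probability of value 0 is one minus the probability of value 1. The names below are hypothetical.

```java
// Hypothetical illustration of the per-label probability product; not pyramid code.
class TransformProbSketch {

    /** Maps a 0/1 label indicator to -0.5/+0.5, mirroring the "minus 0.5" step. */
    static double[] center(boolean[] candidate) {
        double[] centered = new double[candidate.length];
        for (int i = 0; i < candidate.length; i++) {
            centered[i] = (candidate[i] ? 1.0 : 0.0) - 0.5;
        }
        return centered;
    }

    /**
     * Multiplies, over all labels, the probability of the observed value:
     * probOn[l] if label l is observed, 1 - probOn[l] otherwise
     * (assumes binary per-label models).
     */
    static double jointProb(boolean[] observed, double[] probOn) {
        double prod = 1.0;
        for (int l = 0; l < observed.length; l++) {
            prod *= observed[l] ? probOn[l] : 1.0 - probOn[l];
        }
        return prod;
    }
}
```

In this sketch probOn[l] stands for what the l-th model would return for class 1 given the centered candidate vector.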

Aggregations

MultiLabel (edu.neu.ccs.pyramid.dataset.MultiLabel) 101
Vector (org.apache.mahout.math.Vector) 22
MultiLabelClfDataSet (edu.neu.ccs.pyramid.dataset.MultiLabelClfDataSet) 21
File (java.io.File) 14
DenseVector (org.apache.mahout.math.DenseVector) 13
CMLCRF (edu.neu.ccs.pyramid.multilabel_classification.crf.CMLCRF) 12
Pair (edu.neu.ccs.pyramid.util.Pair) 8
LBFGS (edu.neu.ccs.pyramid.optimization.LBFGS) 7
ArrayList (java.util.ArrayList) 7
MLMeasures (edu.neu.ccs.pyramid.eval.MLMeasures) 6
CRFLoss (edu.neu.ccs.pyramid.multilabel_classification.crf.CRFLoss) 6
MultiLabelClassifier (edu.neu.ccs.pyramid.multilabel_classification.MultiLabelClassifier) 5
GeneralF1Predictor (edu.neu.ccs.pyramid.multilabel_classification.plugin_rule.GeneralF1Predictor) 5
Collectors (java.util.stream.Collectors) 5
EarlyStopper (edu.neu.ccs.pyramid.optimization.EarlyStopper) 4
java.util (java.util) 4
StopWatch (org.apache.commons.lang3.time.StopWatch) 4
Config (edu.neu.ccs.pyramid.configuration.Config) 3
DataSetUtil (edu.neu.ccs.pyramid.dataset.DataSetUtil) 3
TRECFormat (edu.neu.ccs.pyramid.dataset.TRECFormat) 3