Example 11 with RegDataSet

Use of edu.neu.ccs.pyramid.dataset.RegDataSet in project pyramid by cheng-li.

From the class RegressionSynthesizer, method gaussianMixture.

public RegDataSet gaussianMixture() {
    // NormalDistribution takes (mean, standard deviation).
    NormalDistribution leftGaussian = new NormalDistribution(0.2, 0.01);
    NormalDistribution rightGaussian = new NormalDistribution(0.7, 0.1);
    RegDataSet dataSet = RegDataSetBuilder.getBuilder().numDataPoints(numDataPoints).numFeatures(1).dense(true).missingValue(false).build();
    for (int i = 0; i < numDataPoints; i++) {
        double featureValue = Sampling.doubleUniform(0, 1);
        double label;
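        // Features above 0.5 draw their label from leftGaussian (mean 0.2);
        // all other features draw from rightGaussian (mean 0.7).
        if (featureValue > 0.5) {
            label = leftGaussian.sample();
        } else {
            label = rightGaussian.sample();
        }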
        dataSet.setFeatureValue(i, 0, featureValue);
        dataSet.setLabel(i, label);
    }
    return dataSet;
}
Also used : NormalDistribution(org.apache.commons.math3.distribution.NormalDistribution) RegDataSet(edu.neu.ccs.pyramid.dataset.RegDataSet)
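
To try the labelling rule without Pyramid or Commons Math on the classpath, the following is a minimal stand-alone sketch. It assumes java.util.Random in place of Sampling.doubleUniform and NormalDistribution, and plain arrays in place of RegDataSet. The class name GaussianMixtureSketch, the generate method, and its seed parameter are hypothetical and not part of the pyramid project.

```java
import java.util.Random;

/** Stand-alone sketch of the labelling rule in gaussianMixture(); not pyramid code. */
class GaussianMixtureSketch {

    /** Returns {features, labels}; each inner array has n entries. */
    static double[][] generate(int n, long seed) {
        Random random = new Random(seed);
        double[] features = new double[n];
        double[] labels = new double[n];
        for (int i = 0; i < n; i++) {
            double x = random.nextDouble(); // uniform on [0, 1)
            // x > 0.5: mean 0.2, sd 0.01; otherwise: mean 0.7, sd 0.1
            double mean = x > 0.5 ? 0.2 : 0.7;
            double sd = x > 0.5 ? 0.01 : 0.1;
            features[i] = x;
            labels[i] = mean + sd * random.nextGaussian();
        }
        return new double[][] {features, labels};
    }

    public static void main(String[] args) {
        double[][] data = generate(5, 42L);
        for (int i = 0; i < 5; i++) {
            System.out.printf("x=%.3f y=%.3f%n", data[0][i], data[1][i]);
        }
    }
}
```

Fixing the seed makes the sample reproducible, which the original method does not appear to offer in this excerpt.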

Example 12 with RegDataSet

Use of edu.neu.ccs.pyramid.dataset.RegDataSet in project pyramid by cheng-li.

From the class RegressionSynthesizer, method univarBeta.

public RegDataSet univarBeta() {
    BetaDistribution betaDistribution = new BetaDistribution(2, 5);
    RegDataSet dataSet = RegDataSetBuilder.getBuilder().numDataPoints(numDataPoints).numFeatures(1).dense(true).missingValue(false).build();
    for (int i = 0; i < numDataPoints; i++) {
        double featureValue = Sampling.doubleUniform(0, 1);
        // Label = Beta(2, 5) density at the feature value, plus noise.
        // `noise` is defined outside this excerpt.
        double label = betaDistribution.density(featureValue) + noise.sample();
        dataSet.setFeatureValue(i, 0, featureValue);
        dataSet.setLabel(i, label);
    }
    return dataSet;
}
Also used : BetaDistribution(org.apache.commons.math3.distribution.BetaDistribution) RegDataSet(edu.neu.ccs.pyramid.dataset.RegDataSet)
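
The noise-free part of the label in univarBeta is the Beta(2, 5) density at the feature value. Since B(2, 5) = Γ(2)Γ(5)/Γ(7) = 24/720 = 1/30, that density has the closed form 30 * x * (1 - x)^4 on [0, 1]. The sketch below evaluates this form directly, as a way to sanity-check generated labels. The class name BetaDensitySketch and its density method are hypothetical helpers, not part of pyramid or Commons Math.

```java
/** Closed-form Beta(2, 5) density; a stand-alone sketch, not pyramid code. */
class BetaDensitySketch {

    /** Returns 30 * x * (1 - x)^4 for x in [0, 1], and 0 outside that interval. */
    static double density(double x) {
        if (x < 0.0 || x > 1.0) {
            return 0.0;
        }
        return 30.0 * x * Math.pow(1.0 - x, 4);
    }

    public static void main(String[] args) {
        // Print the noise-free label at a few feature values.
        for (double x = 0.0; x <= 1.0; x += 0.25) {
            System.out.println("x=" + x + " density=" + density(x));
        }
    }
}
```

The density peaks at x = 0.2, the mode (α - 1)/(α + β - 2) of Beta(2, 5), so noise-free labels range from 0 up to roughly 2.46.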

Example 13 with RegDataSet

Use of edu.neu.ccs.pyramid.dataset.RegDataSet in project pyramid by cheng-li.

From the class RegressionSynthesizer, method multivarLine.

public RegDataSet multivarLine() {
    int numFeatures = 2;
    // Use the numFeatures variable so the builder and the loops below cannot drift apart.
    RegDataSet dataSet = RegDataSetBuilder.getBuilder().numDataPoints(numDataPoints).numFeatures(numFeatures).dense(true).missingValue(false).build();
    for (int i = 0; i < numDataPoints; i++) {
        for (int j = 0; j < numFeatures; j++) {
            double featureValue = Sampling.doubleUniform(0, 1);
            dataSet.setFeatureValue(i, j, featureValue);
        }
        double label = 0;
        for (int j = 0; j < numFeatures; j++) {
            label += dataSet.getRow(i).get(j);
        }
        label += noise.sample();
        dataSet.setLabel(i, label);
    }
    return dataSet;
}
Also used : RegDataSet(edu.neu.ccs.pyramid.dataset.RegDataSet)

Example 14 with RegDataSet

Use of edu.neu.ccs.pyramid.dataset.RegDataSet in project pyramid by cheng-li.

From the class RegressionSynthesizer, method linear.

public static RegDataSet linear() {
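    int numData = 50;
    int numFeatures = 16000;
    RegDataSet dataSet = RegDataSetBuilder.getBuilder().numDataPoints(numData).numFeatures(numFeatures).dense(true).missingValue(false).build();
    // Only the first four weights are non-zero, so each label depends on four features.
    Vector weights = new DenseVector(numFeatures);
    weights.set(0, 0.001);
    weights.set(1, 0.001);
    weights.set(2, 0.001);
    weights.set(3, 0.001);
    // One distribution with p = 0.5 is reused for every feature value.
    BernoulliDistribution bernoulliDistribution = new BernoulliDistribution(0.5);
    for (int i = 0; i < numData; i++) {
        for (int j = 0; j < numFeatures; j++) {
            // Each feature is set to -1 or 1 depending on the sample.
            int sample = bernoulliDistribution.sample();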
            if (sample == 0) {
                dataSet.setFeatureValue(i, j, -1);
            } else {
                dataSet.setFeatureValue(i, j, 1);
            }
        }
        double label = weights.dot(dataSet.getRow(i));
        dataSet.setLabel(i, label);
    }
    return dataSet;
}
Also used : BernoulliDistribution(edu.neu.ccs.pyramid.util.BernoulliDistribution) RegDataSet(edu.neu.ccs.pyramid.dataset.RegDataSet) DenseVector(org.apache.mahout.math.DenseVector) Vector(org.apache.mahout.math.Vector)
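
Because only the first four weights are non-zero and every feature is -1 or 1, each label produced by linear() is a multiple of 0.002 between -0.004 and 0.004, up to floating-point rounding. The sketch below reproduces that structure with plain arrays and java.util.Random. The class SparseLinearSketch and its methods are hypothetical stand-ins for the Mahout vectors and pyramid's BernoulliDistribution used above.

```java
import java.util.Random;

/** Stand-alone sketch of the sparse linear labelling in linear(); not pyramid code. */
class SparseLinearSketch {

    /** Weight vector with 0.001 in (up to) the first four positions and 0 elsewhere. */
    static double[] weights(int numFeatures) {
        double[] w = new double[numFeatures];
        for (int j = 0; j < Math.min(4, numFeatures); j++) {
            w[j] = 0.001;
        }
        return w;
    }

    /** Row of features, each -1 or 1 with equal probability. */
    static int[] randomSignRow(int numFeatures, Random random) {
        int[] row = new int[numFeatures];
        for (int j = 0; j < numFeatures; j++) {
            row[j] = random.nextBoolean() ? 1 : -1;
        }
        return row;
    }

    /** Label = dot product of the weights and the feature row. */
    static double label(double[] weights, int[] row) {
        double sum = 0.0;
        for (int j = 0; j < weights.length; j++) {
            sum += weights[j] * row[j];
        }
        return sum;
    }

    public static void main(String[] args) {
        Random random = new Random(7L);
        double[] w = weights(16000);
        for (int i = 0; i < 5; i++) {
            System.out.println(label(w, randomSignRow(16000, random)));
        }
    }
}
```

With 50 data points and 16000 features, almost all features are irrelevant to the label, which makes this kind of data set a natural test for sparse linear models.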

Example 15 with RegDataSet

Use of edu.neu.ccs.pyramid.dataset.RegDataSet in project pyramid by cheng-li.

From the class RegressionSynthesizer, method univarSine.

public RegDataSet univarSine() {
    RegDataSet dataSet = RegDataSetBuilder.getBuilder().numDataPoints(numDataPoints).numFeatures(1).dense(true).missingValue(false).build();
    for (int i = 0; i < numDataPoints; i++) {
        double featureValue = Sampling.doubleUniform(-Math.PI, Math.PI);
        // Label = sin(feature) plus noise; `noise` is defined outside this excerpt.
        double label = Math.sin(featureValue) + noise.sample();
        dataSet.setFeatureValue(i, 0, featureValue);
        dataSet.setLabel(i, label);
    }
    return dataSet;
}
Also used : RegDataSet(edu.neu.ccs.pyramid.dataset.RegDataSet)

Aggregations

RegDataSet (edu.neu.ccs.pyramid.dataset.RegDataSet): 21
File (java.io.File): 9
DataSetType (edu.neu.ccs.pyramid.dataset.DataSetType): 4
RegTreeConfig (edu.neu.ccs.pyramid.regression.regression_tree.RegTreeConfig): 3
NormalDistribution (org.apache.commons.math3.distribution.NormalDistribution): 3
Vector (org.apache.mahout.math.Vector): 3
Config (edu.neu.ccs.pyramid.configuration.Config): 2
LSBoost (edu.neu.ccs.pyramid.regression.least_squares_boost.LSBoost): 2
RegTreeFactory (edu.neu.ccs.pyramid.regression.regression_tree.RegTreeFactory): 2
Pair (edu.neu.ccs.pyramid.util.Pair): 2
ArrayList (java.util.ArrayList): 2
StopWatch (org.apache.commons.lang3.time.StopWatch): 2
ObjectMapper (com.fasterxml.jackson.databind.ObjectMapper): 1
StandardFormat (edu.neu.ccs.pyramid.dataset.StandardFormat): 1
RMSE (edu.neu.ccs.pyramid.eval.RMSE): 1
LSBoostOptimizer (edu.neu.ccs.pyramid.regression.least_squares_boost.LSBoostOptimizer): 1
ElasticNetLinearRegOptimizer (edu.neu.ccs.pyramid.regression.linear_regression.ElasticNetLinearRegOptimizer): 1
LinearRegression (edu.neu.ccs.pyramid.regression.linear_regression.LinearRegression): 1
RegressionTree (edu.neu.ccs.pyramid.regression.regression_tree.RegressionTree): 1
TreeRule (edu.neu.ccs.pyramid.regression.regression_tree.TreeRule): 1