Search in sources :

Example 26 with SparseDistributedMatrix

use of org.apache.ignite.ml.math.impls.matrix.SparseDistributedMatrix in project ignite by apache.

the class ColumnDecisionTreeTrainerBenchmark method tstF1.

/**
 * Test decision tree regression.
 * To run this test rename this method so it starts from 'test'.
 */
public void tstF1() {
    IgniteUtils.setCurrentIgniteName(ignite.configuration().getIgniteInstanceName());
    int ptsCnt = 10000;
    Map<Integer, double[]> ranges = new HashMap<>();
    ranges.put(0, new double[] { -100.0, 100.0 });
    ranges.put(1, new double[] { -100.0, 100.0 });
    ranges.put(2, new double[] { -100.0, 100.0 });
    int featCnt = 100;
    double[] defRng = { -1.0, 1.0 };
    Vector[] trainVectors = vecsFromRanges(ranges, featCnt, defRng, new Random(123L), ptsCnt, f1);
    SparseDistributedMatrix m = new SparseDistributedMatrix(ptsCnt, featCnt + 1, StorageConstants.COLUMN_STORAGE_MODE, StorageConstants.RANDOM_ACCESS_MODE);
    SparseDistributedMatrixStorage sto = (SparseDistributedMatrixStorage) m.getStorage();
    loadVectorsIntoSparseDistributedMatrixCache(sto.cache().getName(), sto.getUUID(), Arrays.stream(trainVectors).iterator(), featCnt + 1);
    IgniteFunction<DoubleStream, Double> regCalc = s -> s.average().orElse(0.0);
    ColumnDecisionTreeTrainer<VarianceSplitCalculator.VarianceData> trainer = new ColumnDecisionTreeTrainer<>(10, ContinuousSplitCalculators.VARIANCE, RegionCalculators.VARIANCE, regCalc, ignite);
    X.println("Training started.");
    long before = System.currentTimeMillis();
    DecisionTreeModel mdl = trainer.train(new MatrixColumnDecisionTreeTrainerInput(m, new HashMap<>()));
    X.println("Training finished in: " + (System.currentTimeMillis() - before) + " ms.");
    Vector[] testVectors = vecsFromRanges(ranges, featCnt, defRng, new Random(123L), 20, f1);
    IgniteTriFunction<Model<Vector, Double>, Stream<IgniteBiTuple<Vector, Double>>, Function<Double, Double>, Double> mse = Estimators.MSE();
    Double accuracy = mse.apply(mdl, Arrays.stream(testVectors).map(v -> new IgniteBiTuple<>(v.viewPart(0, featCnt), v.getX(featCnt))), Function.identity());
    X.println("MSE: " + accuracy);
}
Also used : CacheAtomicityMode(org.apache.ignite.cache.CacheAtomicityMode) Arrays(java.util.Arrays) FeaturesCache(org.apache.ignite.ml.trees.trainers.columnbased.caches.FeaturesCache) IgniteTestResources(org.apache.ignite.testframework.junits.IgniteTestResources) Random(java.util.Random) BiIndex(org.apache.ignite.ml.trees.trainers.columnbased.BiIndex) SparseDistributedMatrix(org.apache.ignite.ml.math.impls.matrix.SparseDistributedMatrix) SparseDistributedMatrixStorage(org.apache.ignite.ml.math.impls.storage.matrix.SparseDistributedMatrixStorage) VarianceSplitCalculator(org.apache.ignite.ml.trees.trainers.columnbased.contsplitcalcs.VarianceSplitCalculator) Vector(org.apache.ignite.ml.math.Vector) Estimators(org.apache.ignite.ml.estimators.Estimators) Map(java.util.Map) X(org.apache.ignite.internal.util.typedef.X) Level(org.apache.log4j.Level) DenseLocalOnHeapVector(org.apache.ignite.ml.math.impls.vector.DenseLocalOnHeapVector) MatrixColumnDecisionTreeTrainerInput(org.apache.ignite.ml.trees.trainers.columnbased.MatrixColumnDecisionTreeTrainerInput) LabeledVectorDouble(org.apache.ignite.ml.structures.LabeledVectorDouble) BaseDecisionTreeTest(org.apache.ignite.ml.trees.BaseDecisionTreeTest) IgniteTriFunction(org.apache.ignite.ml.math.functions.IgniteTriFunction) ProjectionsCache(org.apache.ignite.ml.trees.trainers.columnbased.caches.ProjectionsCache) UUID(java.util.UUID) StreamTransformer(org.apache.ignite.stream.StreamTransformer) Collectors(java.util.stream.Collectors) IgniteCache(org.apache.ignite.IgniteCache) ContextCache(org.apache.ignite.ml.trees.trainers.columnbased.caches.ContextCache) DoubleStream(java.util.stream.DoubleStream) IgniteBiTuple(org.apache.ignite.lang.IgniteBiTuple) List(java.util.List) IgniteConfiguration(org.apache.ignite.configuration.IgniteConfiguration) Stream(java.util.stream.Stream) SparseMatrixKey(org.apache.ignite.ml.math.distributed.keys.impl.SparseMatrixKey) SplitCache(org.apache.ignite.ml.trees.trainers.columnbased.caches.SplitCache) RegionCalculators(org.apache.ignite.ml.trees.trainers.columnbased.regcalcs.RegionCalculators) IntStream(java.util.stream.IntStream) DecisionTreeModel(org.apache.ignite.ml.trees.models.DecisionTreeModel) IgniteFunction(org.apache.ignite.ml.math.functions.IgniteFunction) Model(org.apache.ignite.ml.Model) HashMap(java.util.HashMap) Function(java.util.function.Function) GiniSplitCalculator(org.apache.ignite.ml.trees.trainers.columnbased.contsplitcalcs.GiniSplitCalculator) BiIndexedCacheColumnDecisionTreeTrainerInput(org.apache.ignite.ml.trees.trainers.columnbased.BiIndexedCacheColumnDecisionTreeTrainerInput) CacheWriteSynchronizationMode(org.apache.ignite.cache.CacheWriteSynchronizationMode) IgniteUtils(org.apache.ignite.internal.util.IgniteUtils) MnistUtils(org.apache.ignite.ml.util.MnistUtils) LinkedList(java.util.LinkedList) Properties(java.util.Properties) Iterator(java.util.Iterator) ContinuousSplitCalculators(org.apache.ignite.ml.trees.trainers.columnbased.contsplitcalcs.ContinuousSplitCalculators) IOException(java.io.IOException) SplitDataGenerator(org.apache.ignite.ml.trees.SplitDataGenerator) Int2DoubleOpenHashMap(it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap) Ignition(org.apache.ignite.Ignition) CacheConfiguration(org.apache.ignite.configuration.CacheConfiguration) IgniteDataStreamer(org.apache.ignite.IgniteDataStreamer) Tracer(org.apache.ignite.ml.math.Tracer) StorageConstants(org.apache.ignite.ml.math.StorageConstants) Assert(org.junit.Assert) Collections(java.util.Collections) ColumnDecisionTreeTrainer(org.apache.ignite.ml.trees.trainers.columnbased.ColumnDecisionTreeTrainer) GridCacheProcessor(org.apache.ignite.internal.processors.cache.GridCacheProcessor) InputStream(java.io.InputStream) CacheMode(org.apache.ignite.cache.CacheMode) HashMap(java.util.HashMap) Int2DoubleOpenHashMap(it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap) MatrixColumnDecisionTreeTrainerInput(org.apache.ignite.ml.trees.trainers.columnbased.MatrixColumnDecisionTreeTrainerInput) IgniteBiTuple(org.apache.ignite.lang.IgniteBiTuple) DecisionTreeModel(org.apache.ignite.ml.trees.models.DecisionTreeModel) IgniteTriFunction(org.apache.ignite.ml.math.functions.IgniteTriFunction) IgniteFunction(org.apache.ignite.ml.math.functions.IgniteFunction) Function(java.util.function.Function) Random(java.util.Random) DoubleStream(java.util.stream.DoubleStream) Stream(java.util.stream.Stream) IntStream(java.util.stream.IntStream) InputStream(java.io.InputStream) Vector(org.apache.ignite.ml.math.Vector) DenseLocalOnHeapVector(org.apache.ignite.ml.math.impls.vector.DenseLocalOnHeapVector) ColumnDecisionTreeTrainer(org.apache.ignite.ml.trees.trainers.columnbased.ColumnDecisionTreeTrainer) SparseDistributedMatrix(org.apache.ignite.ml.math.impls.matrix.SparseDistributedMatrix) SparseDistributedMatrixStorage(org.apache.ignite.ml.math.impls.storage.matrix.SparseDistributedMatrixStorage) LabeledVectorDouble(org.apache.ignite.ml.structures.LabeledVectorDouble) DecisionTreeModel(org.apache.ignite.ml.trees.models.DecisionTreeModel) Model(org.apache.ignite.ml.Model) DoubleStream(java.util.stream.DoubleStream)

Example 27 with SparseDistributedMatrix

use of org.apache.ignite.ml.math.impls.matrix.SparseDistributedMatrix in project ignite by apache.

the class GradientDescent method calculateDistributedGradient.

/**
 * Calculates gradient based in distributed matrix using {@link SparseDistributedMatrixMapReducer}.
 *
 * @param data Distributed matrix
 * @param weights Point to calculate gradient
 * @return Gradient
 */
private Vector calculateDistributedGradient(SparseDistributedMatrix data, Vector weights) {
    SparseDistributedMatrixMapReducer mapReducer = new SparseDistributedMatrixMapReducer(data);
    return mapReducer.mapReduce((matrix, args) -> {
        Matrix inputs = extractInputs(matrix);
        Vector groundTruth = extractGroundTruth(matrix);
        return lossGradient.compute(inputs, groundTruth, args);
    }, gradients -> {
        int cnt = 0;
        Vector resGradient = new DenseLocalOnHeapVector(data.columnSize());
        for (Vector gradient : gradients) {
            if (gradient != null) {
                resGradient = resGradient.plus(gradient);
                cnt++;
            }
        }
        return resGradient.divide(cnt);
    }, weights);
}
Also used : Matrix(org.apache.ignite.ml.math.Matrix) SparseDistributedMatrix(org.apache.ignite.ml.math.impls.matrix.SparseDistributedMatrix) DenseLocalOnHeapVector(org.apache.ignite.ml.math.impls.vector.DenseLocalOnHeapVector) FunctionVector(org.apache.ignite.ml.math.impls.vector.FunctionVector) Vector(org.apache.ignite.ml.math.Vector) DenseLocalOnHeapVector(org.apache.ignite.ml.math.impls.vector.DenseLocalOnHeapVector) SparseDistributedMatrixMapReducer(org.apache.ignite.ml.optimization.util.SparseDistributedMatrixMapReducer)

Example 28 with SparseDistributedMatrix

use of org.apache.ignite.ml.math.impls.matrix.SparseDistributedMatrix in project ignite by apache.

the class KMeansDistributedClustererTestMultiNode method testClusterizationOnDatasetWithObviousStructure.

/**
 */
public void testClusterizationOnDatasetWithObviousStructure() throws IOException {
    IgniteUtils.setCurrentIgniteName(ignite.configuration().getIgniteInstanceName());
    int ptsCnt = 10000;
    int squareSideLen = 10000;
    Random rnd = new Random(123456L);
    // Let centers be in the vertices of square.
    Map<Integer, Vector> centers = new HashMap<>();
    centers.put(100, new DenseLocalOnHeapVector(new double[] { 0.0, 0.0 }));
    centers.put(900, new DenseLocalOnHeapVector(new double[] { squareSideLen, 0.0 }));
    centers.put(3000, new DenseLocalOnHeapVector(new double[] { 0.0, squareSideLen }));
    centers.put(6000, new DenseLocalOnHeapVector(new double[] { squareSideLen, squareSideLen }));
    SparseDistributedMatrix points = new SparseDistributedMatrix(ptsCnt, 2, StorageConstants.ROW_STORAGE_MODE, StorageConstants.RANDOM_ACCESS_MODE);
    List<Integer> permutation = IntStream.range(0, ptsCnt).boxed().collect(Collectors.toList());
    Collections.shuffle(permutation, rnd);
    int totalCnt = 0;
    for (Integer count : centers.keySet()) {
        for (int i = 0; i < count; i++) {
            Vector pnt = new DenseLocalOnHeapVector(2).assign(centers.get(count));
            // Perturbate point on random value.
            pnt.map(val -> val + rnd.nextDouble() * squareSideLen / 100);
            points.assignRow(permutation.get(totalCnt), pnt);
            totalCnt++;
        }
    }
    EuclideanDistance dist = new EuclideanDistance();
    KMeansDistributedClusterer clusterer = new KMeansDistributedClusterer(dist, 3, 100, 1L);
    clusterer.cluster(points, 4);
    points.destroy();
}
Also used : SparseDistributedMatrix(org.apache.ignite.ml.math.impls.matrix.SparseDistributedMatrix) EuclideanDistance(org.apache.ignite.ml.math.distances.EuclideanDistance) Random(java.util.Random) HashMap(java.util.HashMap) DenseLocalOnHeapVector(org.apache.ignite.ml.math.impls.vector.DenseLocalOnHeapVector) Vector(org.apache.ignite.ml.math.Vector) DenseLocalOnHeapVector(org.apache.ignite.ml.math.impls.vector.DenseLocalOnHeapVector)

Example 29 with SparseDistributedMatrix

use of org.apache.ignite.ml.math.impls.matrix.SparseDistributedMatrix in project ignite by apache.

the class KMeansDistributedClustererTestMultiNode method testPerformClusterAnalysisDegenerate.

/**
 */
public void testPerformClusterAnalysisDegenerate() {
    IgniteUtils.setCurrentIgniteName(ignite.configuration().getIgniteInstanceName());
    KMeansDistributedClusterer clusterer = new KMeansDistributedClusterer(new EuclideanDistance(), 1, 1, 1L);
    double[] v1 = new double[] { 1959, 325100 };
    double[] v2 = new double[] { 1960, 373200 };
    SparseDistributedMatrix points = new SparseDistributedMatrix(2, 2, StorageConstants.ROW_STORAGE_MODE, StorageConstants.RANDOM_ACCESS_MODE);
    points.setRow(0, v1);
    points.setRow(1, v2);
    clusterer.cluster(points, 1);
    points.destroy();
}
Also used : EuclideanDistance(org.apache.ignite.ml.math.distances.EuclideanDistance) SparseDistributedMatrix(org.apache.ignite.ml.math.impls.matrix.SparseDistributedMatrix)

Aggregations

SparseDistributedMatrix (org.apache.ignite.ml.math.impls.matrix.SparseDistributedMatrix)29 Vector (org.apache.ignite.ml.math.Vector)18 DenseLocalOnHeapVector (org.apache.ignite.ml.math.impls.vector.DenseLocalOnHeapVector)14 Random (java.util.Random)11 EuclideanDistance (org.apache.ignite.ml.math.distances.EuclideanDistance)9 Ignite (org.apache.ignite.Ignite)8 IgniteThread (org.apache.ignite.thread.IgniteThread)8 HashMap (java.util.HashMap)7 List (java.util.List)7 Map (java.util.Map)7 Collectors (java.util.stream.Collectors)7 StorageConstants (org.apache.ignite.ml.math.StorageConstants)7 UUID (java.util.UUID)6 DistanceMeasure (org.apache.ignite.ml.math.distances.DistanceMeasure)6 IgniteFunction (org.apache.ignite.ml.math.functions.IgniteFunction)6 SparseDistributedMatrixStorage (org.apache.ignite.ml.math.impls.storage.matrix.SparseDistributedMatrixStorage)6 Collections (java.util.Collections)5 LinkedList (java.util.LinkedList)5 DoubleStream (java.util.stream.DoubleStream)5 IgniteUtils (org.apache.ignite.internal.util.IgniteUtils)5