
Example 1 with ProxyDatabase

Use of de.lmu.ifi.dbs.elki.database.ProxyDatabase in project elki by elki-project.

From class CenterOfMassMetaClustering, method runClusteringAlgorithm.

/**
 * Run a clustering algorithm on a single instance.
 *
 * @param hierarchy Result hierarchy to attach the results to
 * @param parent Parent result to attach to
 * @param ids Object IDs to process
 * @param store Input data
 * @param dim Dimensionality
 * @param title Title of relation
 * @return Clustering result
 */
protected C runClusteringAlgorithm(ResultHierarchy hierarchy, Result parent, DBIDs ids, DataStore<DoubleVector> store, int dim, String title) {
    SimpleTypeInformation<DoubleVector> t = new VectorFieldTypeInformation<>(DoubleVector.FACTORY, dim);
    Relation<DoubleVector> sample = new MaterializedRelation<>(t, ids, title, store);
    ProxyDatabase d = new ProxyDatabase(ids, sample);
    C clusterResult = inner.run(d);
    d.getHierarchy().remove(sample);
    d.getHierarchy().remove(clusterResult);
    hierarchy.add(parent, sample);
    hierarchy.add(sample, clusterResult);
    return clusterResult;
}
Also used: VectorFieldTypeInformation (de.lmu.ifi.dbs.elki.data.type.VectorFieldTypeInformation), ProxyDatabase (de.lmu.ifi.dbs.elki.database.ProxyDatabase), DoubleVector (de.lmu.ifi.dbs.elki.data.DoubleVector), MaterializedRelation (de.lmu.ifi.dbs.elki.database.relation.MaterializedRelation)
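The pattern above (wrap the sample in a throwaway ProxyDatabase, run the inner algorithm, detach the results from the temporary hierarchy, then attach them under the caller's parent) can be sketched without ELKI. This is a minimal illustration under stated assumptions: `Hierarchy` is a hypothetical stand-in for ELKI's ResultHierarchy, results are plain strings, and the clustering step is faked. It is not the ELKI API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReparentSketch {
  /** Hypothetical parent -> children tree; NOT ELKI's ResultHierarchy. */
  static final class Hierarchy {
    private final Map<String, List<String>> children = new HashMap<>();
    private final Map<String, String> parents = new HashMap<>();

    void add(String parent, String child) {
      children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
      parents.put(child, parent);
    }

    /** Detaches a node from its parent (if it has one). */
    void remove(String child) {
      String p = parents.remove(child);
      if (p != null) {
        children.get(p).remove(child);
      }
    }

    String parentOf(String child) {
      return parents.get(child);
    }

    List<String> childrenOf(String parent) {
      return children.getOrDefault(parent, new ArrayList<>());
    }
  }

  /** Same shape as runClusteringAlgorithm, with the clustering faked. */
  static String runOnSample(Hierarchy caller, String parent, String title) {
    // A fresh hierarchy stands in for the temporary ProxyDatabase.
    Hierarchy temp = new Hierarchy();
    temp.add("tempDB", title);
    // Stand-in for inner.run(d): a real algorithm would cluster the sample.
    String clusterResult = "Clustering of " + title;
    temp.add(title, clusterResult);
    // Detach both results from the temporary hierarchy...
    temp.remove(title);
    temp.remove(clusterResult);
    // ...and re-attach them below the caller's parent result.
    caller.add(parent, title);
    caller.add(title, clusterResult);
    return clusterResult;
  }

  public static void main(String[] args) {
    Hierarchy h = new Hierarchy();
    for (int i = 0; i < 3; i++) {
      runOnSample(h, "Samples", "Sample " + i);
    }
    System.out.println(h.childrenOf("Samples")); // [Sample 0, Sample 1, Sample 2]
  }
}
```

Keeping the temporary container separate means nothing from the per-sample run leaks into the caller's hierarchy except the two results that are explicitly re-attached.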

Example 2 with ProxyDatabase

Use of de.lmu.ifi.dbs.elki.database.ProxyDatabase in project elki by elki-project.

From class KMeansBisecting, method run.

@Override
public Clustering<M> run(Database database, Relation<V> relation) {
    ProxyDatabase proxyDB = new ProxyDatabase(relation.getDBIDs(), database);
    // A linked list is preferable for scratch, as we will A) not need that many
    // clusters and B) be doing random removals of the largest cluster (often at
    // the head)
    LinkedList<Cluster<M>> currentClusterList = new LinkedList<>();
    FiniteProgress prog = LOG.isVerbose() ? new FiniteProgress("Bisecting k-means", k - 1, LOG) : null;
    for (int j = 0; j < this.k - 1; j++) {
        // Choose a cluster to split and project database to cluster
        if (currentClusterList.isEmpty()) {
            proxyDB = new ProxyDatabase(relation.getDBIDs(), database);
        } else {
            Cluster<M> largestCluster = null;
            for (Cluster<M> cluster : currentClusterList) {
                if (largestCluster == null || cluster.size() > largestCluster.size()) {
                    largestCluster = cluster;
                }
            }
            currentClusterList.remove(largestCluster);
            proxyDB.setDBIDs(largestCluster.getIDs());
        }
        // Run the inner k-means algorithm:
        // FIXME: ensure we run on the correct relation in a multirelational
        // setting!
        Clustering<M> innerResult = innerkMeans.run(proxyDB);
        // Add resulting clusters to current result.
        currentClusterList.addAll(innerResult.getAllClusters());
        LOG.incrementProcessed(prog);
        if (LOG.isVerbose()) {
            LOG.verbose("Iteration " + j);
        }
    }
    LOG.ensureCompleted(prog);
    // add all current clusters to the result
    Clustering<M> result = new Clustering<>("Bisecting k-Means Result", "Bisecting-k-means");
    for (Cluster<M> cluster : currentClusterList) {
        result.addToplevelCluster(cluster);
    }
    return result;
}
Also used: FiniteProgress (de.lmu.ifi.dbs.elki.logging.progress.FiniteProgress), ProxyDatabase (de.lmu.ifi.dbs.elki.database.ProxyDatabase), Cluster (de.lmu.ifi.dbs.elki.data.Cluster), Clustering (de.lmu.ifi.dbs.elki.data.Clustering), LinkedList (java.util.LinkedList)
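Bisecting k-means, as in the method above, repeatedly takes the largest current cluster, splits it with an inner k-means, and keeps the pieces until k clusters exist. Below is a minimal, self-contained sketch on one-dimensional data. It assumes a plain 2-means (Lloyd iterations, centers initialized at the minimum and maximum value) in place of ELKI's configurable `innerkMeans`; all names are hypothetical and not part of ELKI.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

public class BisectingKMeansSketch {
  /** Mean of xs, or fallback if xs is empty. */
  static double mean(List<Double> xs, double fallback) {
    if (xs.isEmpty()) {
      return fallback;
    }
    double sum = 0.;
    for (double x : xs) {
      sum += x;
    }
    return sum / xs.size();
  }

  /** Plain 1-D 2-means; a hypothetical stand-in for the inner k-means. */
  static List<List<Double>> twoMeans(List<Double> values, int maxIter) {
    List<Double> a = new ArrayList<>(), b = new ArrayList<>();
    if (values.size() < 2) {
      a.addAll(values);
      return Arrays.asList(a, b);
    }
    double c0 = Collections.min(values), c1 = Collections.max(values);
    for (int it = 0; it < maxIter; it++) {
      a.clear();
      b.clear();
      // Assign every value to the nearer of the two centers.
      for (double v : values) {
        if (Math.abs(v - c0) <= Math.abs(v - c1)) {
          a.add(v);
        } else {
          b.add(v);
        }
      }
      // Recompute the centers; stop once they no longer move.
      double n0 = mean(a, c0), n1 = mean(b, c1);
      if (n0 == c0 && n1 == c1) {
        break;
      }
      c0 = n0;
      c1 = n1;
    }
    return Arrays.asList(a, b);
  }

  /** Repeatedly splits the largest cluster until k clusters exist. */
  static List<List<Double>> bisect(List<Double> data, int k) {
    LinkedList<List<Double>> clusters = new LinkedList<>();
    clusters.add(new ArrayList<>(data));
    for (int j = 0; j < k - 1; j++) {
      // Choose the largest cluster to split, as KMeansBisecting does.
      List<Double> largest = null;
      for (List<Double> c : clusters) {
        if (largest == null || c.size() > largest.size()) {
          largest = c;
        }
      }
      clusters.remove(largest);
      clusters.addAll(twoMeans(largest, 100));
    }
    return clusters;
  }

  public static void main(String[] args) {
    List<Double> data = Arrays.asList(1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0);
    System.out.println(bisect(data, 3));
  }
}
```

As in the ELKI comment, a linked list suits the scratch list because only k clusters are ever held and the largest one is removed by value each round.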

Example 3 with ProxyDatabase

Use of de.lmu.ifi.dbs.elki.database.ProxyDatabase in project elki by elki-project.

From class CASH, method buildDB.

/**
 * Builds a (dim-1)-dimensional database where the objects are projected into
 * the specified subspace.
 *
 * @param dim the dimensionality of the database
 * @param basis the basis defining the subspace
 * @param ids the ids for the new database
 * @param relation the relation storing the parameterization functions
 * @return the relation of projected objects, attached to a new
 *         (dim-1)-dimensional ProxyDatabase
 */
private MaterializedRelation<ParameterizationFunction> buildDB(int dim, double[][] basis, DBIDs ids, Relation<ParameterizationFunction> relation) {
    ProxyDatabase proxy = new ProxyDatabase(ids);
    SimpleTypeInformation<ParameterizationFunction> type = new SimpleTypeInformation<>(ParameterizationFunction.class);
    WritableDataStore<ParameterizationFunction> prep = DataStoreUtil.makeStorage(ids, DataStoreFactory.HINT_HOT, ParameterizationFunction.class);
    // Project
    for (DBIDIter iter = ids.iter(); iter.valid(); iter.advance()) {
        prep.put(iter, project(basis, relation.get(iter)));
    }
    if (LOG.isDebugging()) {
        LOG.debugFine("db for dim " + (dim - 1) + ": " + ids.size());
    }
    MaterializedRelation<ParameterizationFunction> prel = new MaterializedRelation<>(type, ids, null, prep);
    proxy.addRelation(prel);
    return prel;
}
Also used: ProxyDatabase (de.lmu.ifi.dbs.elki.database.ProxyDatabase), ParameterizationFunction (de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.cash.ParameterizationFunction), SimpleTypeInformation (de.lmu.ifi.dbs.elki.data.type.SimpleTypeInformation), DBIDIter (de.lmu.ifi.dbs.elki.database.ids.DBIDIter), MaterializedRelation (de.lmu.ifi.dbs.elki.database.relation.MaterializedRelation)
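The loop in `buildDB` maps each object into a lower-dimensional space before wrapping the results in a new relation. The sketch below shows a generic linear projection onto the row vectors of a basis matrix, with integer keys standing in for DBIDs. It is an assumption-laden illustration only: CASH's own `project(basis, ...)` operates on ParameterizationFunction objects, and its exact math is not reproduced here; `project` and `projectAll` are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class ProjectionSketch {
  /** Coordinates of x w.r.t. the rows of basis: y[i] = <basis[i], x>. */
  static double[] project(double[][] basis, double[] x) {
    double[] y = new double[basis.length];
    for (int i = 0; i < basis.length; i++) {
      if (basis[i].length != x.length) {
        throw new IllegalArgumentException("basis vector " + i + " has wrong dimensionality");
      }
      double dot = 0.;
      for (int j = 0; j < x.length; j++) {
        dot += basis[i][j] * x[j];
      }
      y[i] = dot;
    }
    return y;
  }

  /** Projects every object, mirroring the loop that fills "prep"; keys stand in for DBIDs. */
  static Map<Integer, double[]> projectAll(double[][] basis, Map<Integer, double[]> objects) {
    Map<Integer, double[]> projected = new LinkedHashMap<>();
    for (Map.Entry<Integer, double[]> e : objects.entrySet()) {
      projected.put(e.getKey(), project(basis, e.getValue()));
    }
    return projected;
  }

  public static void main(String[] args) {
    // Orthonormal basis of the xy-plane in 3-D, so 3-D points become 2-D.
    double[][] basis = { { 1, 0, 0 }, { 0, 1, 0 } };
    Map<Integer, double[]> objects = new LinkedHashMap<>();
    objects.put(1, new double[] { 3, 4, 5 });
    objects.put(2, new double[] { -1, 2, 7 });
    for (Map.Entry<Integer, double[]> e : projectAll(basis, objects).entrySet()) {
      System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
    }
  }
}
```

With an orthonormal basis of m rows, each projected object has exactly m coordinates, which is how a d-dimensional input becomes a lower-dimensional relation.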

Example 4 with ProxyDatabase

Use of de.lmu.ifi.dbs.elki.database.ProxyDatabase in project elki by elki-project.

From class RepresentativeUncertainClustering, method run.

/**
 * This run method will do the wrapping.
 *
 * It is called from {@link AbstractAlgorithm#run(Database)} and performs the
 * call to the algorithm's particular run method as well as the storing and
 * comparison of the resulting clusterings.
 *
 * @param database Database
 * @param relation Data relation of uncertain objects
 * @return Clustering result
 */
public Clustering<?> run(Database database, Relation<? extends UncertainObject> relation) {
    ResultHierarchy hierarchy = database.getHierarchy();
    ArrayList<Clustering<?>> clusterings = new ArrayList<>();
    final int dim = RelationUtil.dimensionality(relation);
    DBIDs ids = relation.getDBIDs();
    // To collect samples
    Result samples = new BasicResult("Samples", "samples");
    // Step 1: Cluster sampled possible worlds:
    Random rand = random.getSingleThreadedRandom();
    FiniteProgress sampleP = LOG.isVerbose() ? new FiniteProgress("Clustering samples", numsamples, LOG) : null;
    for (int i = 0; i < numsamples; i++) {
        WritableDataStore<DoubleVector> store = DataStoreUtil.makeStorage(ids, DataStoreFactory.HINT_DB, DoubleVector.class);
        for (DBIDIter iter = ids.iter(); iter.valid(); iter.advance()) {
            store.put(iter, relation.get(iter).drawSample(rand));
        }
        clusterings.add(runClusteringAlgorithm(hierarchy, samples, ids, store, dim, "Sample " + i));
        LOG.incrementProcessed(sampleP);
    }
    LOG.ensureCompleted(sampleP);
    // Step 2: perform the meta clustering (on samples only).
    DBIDRange rids = DBIDFactory.FACTORY.generateStaticDBIDRange(clusterings.size());
    WritableDataStore<Clustering<?>> datastore = DataStoreUtil.makeStorage(rids, DataStoreFactory.HINT_DB, Clustering.class);
    {
        Iterator<Clustering<?>> it2 = clusterings.iterator();
        for (DBIDIter iter = rids.iter(); iter.valid(); iter.advance()) {
            datastore.put(iter, it2.next());
        }
    }
    assert (rids.size() == clusterings.size());
    // Build a relation, and a distance matrix.
    Relation<Clustering<?>> crel = new MaterializedRelation<Clustering<?>>(Clustering.TYPE, rids, "Clusterings", datastore);
    PrecomputedDistanceMatrix<Clustering<?>> mat = new PrecomputedDistanceMatrix<>(crel, rids, distance);
    mat.initialize();
    ProxyDatabase d = new ProxyDatabase(rids, crel);
    d.getHierarchy().add(crel, mat);
    Clustering<?> c = metaAlgorithm.run(d);
    // Detach from database
    d.getHierarchy().remove(d, c);
    // Evaluation
    Result reps = new BasicResult("Representants", "representative");
    hierarchy.add(relation, reps);
    DistanceQuery<Clustering<?>> dq = mat.getDistanceQuery(distance);
    List<? extends Cluster<?>> cl = c.getAllClusters();
    List<DoubleObjPair<Clustering<?>>> evaluated = new ArrayList<>(cl.size());
    for (Cluster<?> clus : cl) {
        double besttau = Double.POSITIVE_INFINITY;
        Clustering<?> bestc = null;
        for (DBIDIter it1 = clus.getIDs().iter(); it1.valid(); it1.advance()) {
            double tau = 0.;
            Clustering<?> curc = crel.get(it1);
            for (DBIDIter it2 = clus.getIDs().iter(); it2.valid(); it2.advance()) {
                if (DBIDUtil.equal(it1, it2)) {
                    continue;
                }
                double di = dq.distance(curc, it2);
                tau = di > tau ? di : tau;
            }
            // Cluster member with the least maximum distance.
            if (tau < besttau) {
                besttau = tau;
                bestc = curc;
            }
        }
        if (bestc == null) {
            // E.g. degenerate empty clusters
            continue;
        }
        // Global tau:
        double gtau = 0.;
        for (DBIDIter it2 = crel.iterDBIDs(); it2.valid(); it2.advance()) {
            double di = dq.distance(bestc, it2);
            gtau = di > gtau ? di : gtau;
        }
        final double cprob = computeConfidence(clus.size(), crel.size());
        // Build an evaluation result
        hierarchy.add(bestc, new RepresentativenessEvaluation(gtau, besttau, cprob));
        evaluated.add(new DoubleObjPair<Clustering<?>>(cprob, bestc));
    }
    // Sort evaluated results by confidence:
    Collections.sort(evaluated, Collections.reverseOrder());
    for (DoubleObjPair<Clustering<?>> pair : evaluated) {
        // Attach parent relation (= sample) to the representative samples.
        for (It<Relation<?>> it = hierarchy.iterParents(pair.second).filter(Relation.class); it.valid(); it.advance()) {
            hierarchy.add(reps, it.get());
        }
    }
    // Add the random samples below the representative results only:
    if (keep) {
        hierarchy.add(relation, samples);
    } else {
        hierarchy.removeSubtree(samples);
    }
    return c;
}
Also used: ArrayList (java.util.ArrayList), Result (de.lmu.ifi.dbs.elki.result.Result), EvaluationResult (de.lmu.ifi.dbs.elki.result.EvaluationResult), BasicResult (de.lmu.ifi.dbs.elki.result.BasicResult), DBIDIter (de.lmu.ifi.dbs.elki.database.ids.DBIDIter), MaterializedRelation (de.lmu.ifi.dbs.elki.database.relation.MaterializedRelation), Relation (de.lmu.ifi.dbs.elki.database.relation.Relation), Random (java.util.Random), Iterator (java.util.Iterator), ResultHierarchy (de.lmu.ifi.dbs.elki.result.ResultHierarchy), DBIDs (de.lmu.ifi.dbs.elki.database.ids.DBIDs), FiniteProgress (de.lmu.ifi.dbs.elki.logging.progress.FiniteProgress), ProxyDatabase (de.lmu.ifi.dbs.elki.database.ProxyDatabase), PrecomputedDistanceMatrix (de.lmu.ifi.dbs.elki.index.distancematrix.PrecomputedDistanceMatrix), Clustering (de.lmu.ifi.dbs.elki.data.Clustering), DoubleObjPair (de.lmu.ifi.dbs.elki.utilities.pairs.DoubleObjPair), DBIDRange (de.lmu.ifi.dbs.elki.database.ids.DBIDRange), DoubleVector (de.lmu.ifi.dbs.elki.data.DoubleVector)
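The evaluation loop in `run` picks, within each meta-cluster, the member whose maximum distance to the other members is smallest (`besttau`), then measures that member's maximum distance to all sampled clusterings (`gtau`). A minimal sketch of just that selection step follows, assuming a precomputed symmetric distance matrix indexed by integers instead of ELKI's DistanceQuery and DBIDs. `Choice`, `pick`, and `lineDistances` are hypothetical names, and the confidence computation (`computeConfidence`) is not reproduced.

```java
public class RepresentativeSketch {
  /** Chosen representative with its cluster-local and global max distance. */
  static final class Choice {
    final int member;
    final double tau;
    final double globalTau;

    Choice(int member, double tau, double globalTau) {
      this.member = member;
      this.tau = tau;
      this.globalTau = globalTau;
    }
  }

  /** Member with the least maximum distance to the other members, or null if empty. */
  static Choice pick(double[][] dist, int[] members) {
    double bestTau = Double.POSITIVE_INFINITY;
    int best = -1;
    for (int a : members) {
      double tau = 0.;
      for (int b : members) {
        if (a != b) {
          tau = Math.max(tau, dist[a][b]);
        }
      }
      if (tau < bestTau) {
        bestTau = tau;
        best = a;
      }
    }
    if (best < 0) {
      return null; // e.g. a degenerate empty cluster
    }
    // Global tau: max distance from the representative to every item.
    double globalTau = 0.;
    for (int b = 0; b < dist.length; b++) {
      globalTau = Math.max(globalTau, dist[best][b]);
    }
    return new Choice(best, bestTau, globalTau);
  }

  /** Distance matrix |p[i] - p[j]| for points on a line (demo helper). */
  static double[][] lineDistances(double[] p) {
    double[][] d = new double[p.length][p.length];
    for (int i = 0; i < p.length; i++) {
      for (int j = 0; j < p.length; j++) {
        d[i][j] = Math.abs(p[i] - p[j]);
      }
    }
    return d;
  }

  public static void main(String[] args) {
    double[][] dist = lineDistances(new double[] { 0, 1, 2, 10 });
    Choice c = pick(dist, new int[] { 0, 1, 2 });
    System.out.println(c.member + " " + c.tau + " " + c.globalTau); // 1 1.0 9.0
  }
}
```

Choosing the member with the smallest maximum distance (a minimax center) makes the representative the one that is guaranteed to be within `tau` of every other clustering in its group.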

Example 5 with ProxyDatabase

Use of de.lmu.ifi.dbs.elki.database.ProxyDatabase in project elki by elki-project.

From class RepresentativeUncertainClustering, method runClusteringAlgorithm.

/**
 * Run a clustering algorithm on a single instance.
 *
 * @param hierarchy Result hierarchy to attach the results to
 * @param parent Parent result to attach to
 * @param ids Object IDs to process
 * @param store Input data
 * @param dim Dimensionality
 * @param title Title of relation
 * @return Clustering result
 */
protected Clustering<?> runClusteringAlgorithm(ResultHierarchy hierarchy, Result parent, DBIDs ids, DataStore<DoubleVector> store, int dim, String title) {
    SimpleTypeInformation<DoubleVector> t = new VectorFieldTypeInformation<>(DoubleVector.FACTORY, dim);
    Relation<DoubleVector> sample = new MaterializedRelation<>(t, ids, title, store);
    ProxyDatabase d = new ProxyDatabase(ids, sample);
    Clustering<?> clusterResult = samplesAlgorithm.run(d);
    d.getHierarchy().remove(sample);
    d.getHierarchy().remove(clusterResult);
    hierarchy.add(parent, sample);
    hierarchy.add(sample, clusterResult);
    return clusterResult;
}
Also used: VectorFieldTypeInformation (de.lmu.ifi.dbs.elki.data.type.VectorFieldTypeInformation), ProxyDatabase (de.lmu.ifi.dbs.elki.database.ProxyDatabase), DoubleVector (de.lmu.ifi.dbs.elki.data.DoubleVector), MaterializedRelation (de.lmu.ifi.dbs.elki.database.relation.MaterializedRelation)

Aggregations

ProxyDatabase (de.lmu.ifi.dbs.elki.database.ProxyDatabase): 11
MaterializedRelation (de.lmu.ifi.dbs.elki.database.relation.MaterializedRelation): 6
DBIDIter (de.lmu.ifi.dbs.elki.database.ids.DBIDIter): 5
VectorFieldTypeInformation (de.lmu.ifi.dbs.elki.data.type.VectorFieldTypeInformation): 4
DBIDs (de.lmu.ifi.dbs.elki.database.ids.DBIDs): 4
ArrayList (java.util.ArrayList): 4
Cluster (de.lmu.ifi.dbs.elki.data.Cluster): 3
Clustering (de.lmu.ifi.dbs.elki.data.Clustering): 3
DoubleVector (de.lmu.ifi.dbs.elki.data.DoubleVector): 3
FiniteProgress (de.lmu.ifi.dbs.elki.logging.progress.FiniteProgress): 3
Relation (de.lmu.ifi.dbs.elki.database.relation.Relation): 2
DBSCAN (de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN): 1
ParameterizationFunction (de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.cash.ParameterizationFunction): 1
Model (de.lmu.ifi.dbs.elki.data.model.Model): 1
SubspaceModel (de.lmu.ifi.dbs.elki.data.model.SubspaceModel): 1
NumericalFeatureSelection (de.lmu.ifi.dbs.elki.data.projection.NumericalFeatureSelection): 1
SimpleTypeInformation (de.lmu.ifi.dbs.elki.data.type.SimpleTypeInformation): 1
WritableDoubleDataStore (de.lmu.ifi.dbs.elki.database.datastore.WritableDoubleDataStore): 1
ArrayDBIDs (de.lmu.ifi.dbs.elki.database.ids.ArrayDBIDs): 1
ArrayModifiableDBIDs (de.lmu.ifi.dbs.elki.database.ids.ArrayModifiableDBIDs): 1