Example 1 with Cluster

Use of org.carrot2.core.Cluster in project lucene-solr by Apache.

Class CarrotClusteringEngine, method cluster.

@Override
public Object cluster(Query query, SolrDocumentList solrDocList, Map<SolrDocument, Integer> docIds, SolrQueryRequest sreq) {
    try {
        // Prepare attributes for Carrot2 clustering call
        Map<String, Object> attributes = new HashMap<>();
        List<Document> documents = getDocuments(solrDocList, docIds, query, sreq);
        attributes.put(AttributeNames.DOCUMENTS, documents);
        attributes.put(AttributeNames.QUERY, query.toString());
        // Pass the fields on which clustering runs.
        attributes.put("solrFieldNames", getFieldsForClustering(sreq));
        // Pass extra overriding attributes from the request, if any
        extractCarrotAttributes(sreq.getParams(), attributes);
        // Perform clustering and convert to an output structure of clusters.
        //
        // Carrot2 uses current thread's context class loader to get
        // certain classes (e.g. custom tokenizer/stemmer) at runtime.
        // To make sure classes from contrib JARs are available,
        // we swap the context class loader for the time of clustering.
        Thread ct = Thread.currentThread();
        ClassLoader prev = ct.getContextClassLoader();
        try {
            ct.setContextClassLoader(core.getResourceLoader().getClassLoader());
            return clustersToNamedList(controller.process(attributes, clusteringAlgorithmClass).getClusters(), sreq.getParams());
        } finally {
            ct.setContextClassLoader(prev);
        }
    } catch (Exception e) {
        log.error("Carrot2 clustering failed", e);
        throw new SolrException(ErrorCode.SERVER_ERROR, "Carrot2 clustering failed", e);
    }
}
Also used : HashMap(java.util.HashMap) Document(org.carrot2.core.Document) SolrDocument(org.apache.solr.common.SolrDocument) SolrException(org.apache.solr.common.SolrException) IOException(java.io.IOException)
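
The context class loader swap in the method above is a general pattern that can be tried outside Solr. The following is a minimal sketch only: `ContextClassLoaderSwap` and `callWith` are hypothetical names introduced for illustration and are not part of Solr or Carrot2. It uses only `Thread.getContextClassLoader` and `Thread.setContextClassLoader` from the JDK.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Solr or Carrot2.
final class ContextClassLoaderSwap {
    private ContextClassLoaderSwap() {}

    /**
     * Runs the task with the given loader installed as the current thread's
     * context class loader, then restores the previous loader.
     */
    static <T> T callWith(ClassLoader loader, Supplier<T> task) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        try {
            current.setContextClassLoader(loader);
            return task.get();
        } finally {
            // Restore the original loader even if the task throws.
            current.setContextClassLoader(previous);
        }
    }
}
```

The `finally` block matters here: without it, an exception thrown by the task would leave the thread with the temporary loader installed for whatever runs on it next.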

Example 2 with Cluster

Use of org.carrot2.core.Cluster in project lucene-solr by Apache.

Class EchoClusteringAlgorithm, method process.

@Override
public void process() throws ProcessingException {
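    clusters = new ArrayList<>();
    // Echo each document back as its own cluster; the phrases carry its
    // title, summary, language (if set) and any requested custom fields.
    for (Document document : documents) {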
        final Cluster cluster = new Cluster();
        cluster.addPhrases(document.getTitle(), document.getSummary());
        if (document.getLanguage() != null) {
            cluster.addPhrases(document.getLanguage().name());
        }
        for (String field : customFields.split(",")) {
            Object value = document.getField(field);
            if (value != null) {
                cluster.addPhrases(value.toString());
            }
        }
        cluster.addDocuments(document);
        clusters.add(cluster);
    }
}
Also used : Cluster(org.carrot2.core.Cluster) Document(org.carrot2.core.Document)

Example 3 with Cluster

Use of org.carrot2.core.Cluster in project lucene-solr by Apache.

Class EchoTokensClusteringAlgorithm, method process.

@Override
public void process() throws ProcessingException {
    final PreprocessingContext preprocessingContext = preprocessing.preprocess(documents, "", LanguageCode.ENGLISH);
    clusters = new ArrayList<>();
    // Emit one cluster per non-null token image, labelled with the token text.
    for (char[] token : preprocessingContext.allTokens.image) {
        if (token != null) {
            clusters.add(new Cluster(new String(token)));
        }
    }
}
Also used : PreprocessingContext(org.carrot2.text.preprocessing.PreprocessingContext) Cluster(org.carrot2.core.Cluster)

Example 4 with Cluster

Use of org.carrot2.core.Cluster in project lucene-solr by Apache.

Class EchoStemsClusteringAlgorithm, method process.

@Override
public void process() throws ProcessingException {
    final PreprocessingContext preprocessingContext = preprocessing.preprocess(documents, "", LanguageCode.ENGLISH);
    final AllTokens allTokens = preprocessingContext.allTokens;
    final AllWords allWords = preprocessingContext.allWords;
    final AllStems allStems = preprocessingContext.allStems;
    clusters = new ArrayList<>();
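    for (int i = 0; i < allTokens.image.length; i++) {
        // Tokens with a negative word index are skipped.
        final int wordIndex = allTokens.wordIndex[i];
        if (wordIndex >= 0) {
            // Token -> word -> stem: label the cluster with the token's stem.
            final int stemIndex = allWords.stemIndex[wordIndex];
            clusters.add(new Cluster(new String(allStems.image[stemIndex])));
        }
    }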
}
Also used : PreprocessingContext(org.carrot2.text.preprocessing.PreprocessingContext) AllStems(org.carrot2.text.preprocessing.PreprocessingContext.AllStems) Cluster(org.carrot2.core.Cluster) AllTokens(org.carrot2.text.preprocessing.PreprocessingContext.AllTokens) AllWords(org.carrot2.text.preprocessing.PreprocessingContext.AllWords)
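
The token to word to stem lookup in the method above is a chain of parallel-array indirections. The sketch below reproduces only that indexing logic on plain arrays so it can be run without Carrot2. `StemLookupSketch`, `stemsForTokens`, and the parameter names are hypothetical; the arrays merely mirror the shape of `allTokens.wordIndex`, `allWords.stemIndex`, and `allStems.image` and are not the Carrot2 types themselves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the index chain; not Carrot2 code.
final class StemLookupSketch {
    private StemLookupSketch() {}

    /**
     * For each token, follows token -> word -> stem and collects the stem text.
     * Tokens whose word index is negative are skipped, as in the example.
     */
    static List<String> stemsForTokens(int[] tokenWordIndex, int[] wordStemIndex, char[][] stemImages) {
        List<String> stems = new ArrayList<>();
        for (int wordIndex : tokenWordIndex) {
            if (wordIndex >= 0) {
                int stemIndex = wordStemIndex[wordIndex];
                stems.add(new String(stemImages[stemIndex]));
            }
        }
        return stems;
    }
}
```

Because the lookup runs once per token rather than once per distinct word, a stem that occurs several times in the input appears several times in the result.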

Example 5 with Cluster

Use of org.carrot2.core.Cluster in project lucene-solr by Apache.

Class CarrotClusteringEngine, method clustersToNamedList.

private void clustersToNamedList(List<Cluster> outputClusters, List<NamedList<Object>> parent, boolean outputSubClusters, int maxLabels) {
    for (Cluster outCluster : outputClusters) {
        NamedList<Object> cluster = new SimpleOrderedMap<>();
        parent.add(cluster);
        // Add labels
        List<String> labels = outCluster.getPhrases();
        if (labels.size() > maxLabels) {
            labels = labels.subList(0, maxLabels);
        }
        cluster.add("labels", labels);
        // Add cluster score
        final Double score = outCluster.getScore();
        if (score != null) {
            cluster.add("score", score);
        }
        // Add other topics marker
        if (outCluster.isOtherTopics()) {
            cluster.add("other-topics", outCluster.isOtherTopics());
        }
        // Add documents
        List<Document> docs = outputSubClusters ? outCluster.getDocuments() : outCluster.getAllDocuments();
        List<Object> docList = new ArrayList<>();
        cluster.add("docs", docList);
        for (Document doc : docs) {
            docList.add(doc.getField(SOLR_DOCUMENT_ID));
        }
        // Add subclusters
        if (outputSubClusters && !outCluster.getSubclusters().isEmpty()) {
            List<NamedList<Object>> subclusters = new ArrayList<>();
            cluster.add("clusters", subclusters);
            clustersToNamedList(outCluster.getSubclusters(), subclusters, outputSubClusters, maxLabels);
        }
    }
}
Also used : NamedList(org.apache.solr.common.util.NamedList) ArrayList(java.util.ArrayList) Cluster(org.carrot2.core.Cluster) Document(org.carrot2.core.Document) SolrDocument(org.apache.solr.common.SolrDocument) SimpleOrderedMap(org.apache.solr.common.util.SimpleOrderedMap)
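
The method above walks a cluster tree recursively, capping the number of labels per cluster and nesting subclusters under a "clusters" key. The sketch below shows the same traversal shape with JDK collections only. `ClusterTreeSketch`, `Node`, and `toMaps` are hypothetical stand-ins for illustration; they do not reproduce the Carrot2 `Cluster` or Solr `NamedList` APIs, and scores and document ids are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the recursive conversion; not Solr or Carrot2 code.
final class ClusterTreeSketch {
    private ClusterTreeSketch() {}

    /** Hypothetical stand-in for a cluster: labels plus child clusters. */
    static final class Node {
        final List<String> labels;
        final List<Node> children;

        Node(List<String> labels, List<Node> children) {
            this.labels = labels;
            this.children = children;
        }
    }

    /** Converts a list of nodes into nested maps, keeping at most maxLabels labels per node. */
    static List<Map<String, Object>> toMaps(List<Node> nodes, int maxLabels) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Node node : nodes) {
            Map<String, Object> entry = new LinkedHashMap<>();
            List<String> labels = node.labels.size() > maxLabels
                    ? node.labels.subList(0, maxLabels)
                    : node.labels;
            // Copy so the output does not alias the input list.
            entry.put("labels", new ArrayList<>(labels));
            // Recurse only when there are children, so leaves carry no "clusters" key.
            if (!node.children.isEmpty()) {
                entry.put("clusters", toMaps(node.children, maxLabels));
            }
            out.add(entry);
        }
        return out;
    }
}
```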

Aggregations

Cluster (org.carrot2.core.Cluster) 5
Document (org.carrot2.core.Document) 3
SolrDocument (org.apache.solr.common.SolrDocument) 2
PreprocessingContext (org.carrot2.text.preprocessing.PreprocessingContext) 2
IOException (java.io.IOException) 1
ArrayList (java.util.ArrayList) 1
HashMap (java.util.HashMap) 1
SolrException (org.apache.solr.common.SolrException) 1
NamedList (org.apache.solr.common.util.NamedList) 1
SimpleOrderedMap (org.apache.solr.common.util.SimpleOrderedMap) 1
ILexicalData (org.carrot2.text.linguistic.ILexicalData) 1
AllStems (org.carrot2.text.preprocessing.PreprocessingContext.AllStems) 1
AllTokens (org.carrot2.text.preprocessing.PreprocessingContext.AllTokens) 1
AllWords (org.carrot2.text.preprocessing.PreprocessingContext.AllWords) 1
MutableCharArray (org.carrot2.text.util.MutableCharArray) 1