Example 6 with Cluster

Use of org.carrot2.core.Cluster in the project lucene-solr by apache.

From the class LexicalResourcesCheckClusteringAlgorithm, method process:

@Override
public void process() throws ProcessingException {
    clusters = new ArrayList<>();
    if (wordsToCheck == null) {
        return;
    }
    // Test with Maltese so that the English clustering performed in other tests
    // is not affected by the test stopwords and stoplabels.
    ILexicalData lexicalData = preprocessing.lexicalDataFactory.getLexicalData(LanguageCode.MALTESE);
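    // Emit one single-label cluster per word that the lexical resources
    // flag neither as a common word (stopword) nor as a stop label.
    for (String word : wordsToCheck.split(",")) {
        if (!lexicalData.isCommonWord(new MutableCharArray(word)) && !lexicalData.isStopLabel(word)) {
            clusters.add(new Cluster(word));
        }
    }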
}
Also used: ILexicalData (org.carrot2.text.linguistic.ILexicalData), MutableCharArray (org.carrot2.text.util.MutableCharArray), Cluster (org.carrot2.core.Cluster)
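
The filtering step in process can be tried without Carrot2 on the classpath. The sketch below is only an illustration under stated assumptions: LexicalFilterSketch and filterWords are hypothetical names, a plain Set<String> stands in for ILexicalData.isCommonWord, a Predicate<String> stands in for ILexicalData.isStopLabel, and the result is a list of strings rather than Cluster objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in for the lexical filtering shown above; not Carrot2 or Solr API.
final class LexicalFilterSketch {

    // Returns the comma-separated words that are neither common words nor stop labels.
    // commonWords and isStopLabel are assumptions replacing ILexicalData.
    static List<String> filterWords(String wordsToCheck,
                                    Set<String> commonWords,
                                    Predicate<String> isStopLabel) {
        List<String> kept = new ArrayList<>();
        if (wordsToCheck == null) {
            return kept; // mirrors the early return in process()
        }
        for (String word : wordsToCheck.split(",")) {
            if (!commonWords.contains(word) && !isStopLabel.test(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> kept = filterWords(
                "alpha,the,beta,spam",
                Set.of("the"),            // assumed stopword list
                w -> w.equals("spam"));   // assumed stop-label rule
        System.out.println(kept); // prints [alpha, beta]
    }
}
```

As in the original method, split(",") does not trim whitespace, so an input such as "alpha, the" would yield " the" with a leading space, which would not match the stopword "the".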

Aggregations

Cluster (org.carrot2.core.Cluster): 5
Document (org.carrot2.core.Document): 3
SolrDocument (org.apache.solr.common.SolrDocument): 2
PreprocessingContext (org.carrot2.text.preprocessing.PreprocessingContext): 2
IOException (java.io.IOException): 1
ArrayList (java.util.ArrayList): 1
HashMap (java.util.HashMap): 1
SolrException (org.apache.solr.common.SolrException): 1
NamedList (org.apache.solr.common.util.NamedList): 1
SimpleOrderedMap (org.apache.solr.common.util.SimpleOrderedMap): 1
ILexicalData (org.carrot2.text.linguistic.ILexicalData): 1
AllStems (org.carrot2.text.preprocessing.PreprocessingContext.AllStems): 1
AllTokens (org.carrot2.text.preprocessing.PreprocessingContext.AllTokens): 1
AllWords (org.carrot2.text.preprocessing.PreprocessingContext.AllWords): 1
MutableCharArray (org.carrot2.text.util.MutableCharArray): 1