
Example 1 with ILexicalData

Use of org.carrot2.text.linguistic.ILexicalData in the lucene-solr project by Apache.

Class LexicalResourcesCheckClusteringAlgorithm, method process():

@Override
public void process() throws ProcessingException {
    clusters = new ArrayList<>();
    if (wordsToCheck == null) {
        return;
    }
    // Test with Maltese so that the English clustering performed in other tests
    // is not affected by the test stopwords and stoplabels.
    ILexicalData lexicalData = preprocessing.lexicalDataFactory.getLexicalData(LanguageCode.MALTESE);
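    // Add a cluster labeled with the word only if the word is neither a common word (stopword) nor a stop label.
    for (String word : wordsToCheck.split(",")) {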
        if (!lexicalData.isCommonWord(new MutableCharArray(word)) && !lexicalData.isStopLabel(word)) {
            clusters.add(new Cluster(word));
        }
    }
}
Also used: ILexicalData (org.carrot2.text.linguistic.ILexicalData), MutableCharArray (org.carrot2.text.util.MutableCharArray), Cluster (org.carrot2.core.Cluster)
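To see the filtering rule of process() in isolation, here is a minimal, self-contained sketch. It is not Carrot2 code: the `LexicalCheck` interface, `fromSets`, and `filter` are hypothetical stand-ins for `ILexicalData` and the method body, and both checks are plain exact-match lookups in in-memory sets. The real Carrot2 lexical resources may match differently (stop labels, for instance, may be patterns rather than literal strings), and the sketch does no language selection such as the Maltese lookup above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the filtering step in process(): a word is kept only if it is
 * neither a common word (stopword) nor a stop label. Not the Carrot2 API.
 */
public class StopwordFilterSketch {
    // Hypothetical stand-in for ILexicalData, reduced to the two checks used above.
    interface LexicalCheck {
        boolean isCommonWord(String word);
        boolean isStopLabel(String label);
    }

    // Builds a LexicalCheck backed by exact-match sets (an assumption of this sketch).
    static LexicalCheck fromSets(Set<String> stopwords, Set<String> stopLabels) {
        return new LexicalCheck() {
            @Override
            public boolean isCommonWord(String word) {
                return stopwords.contains(word);
            }

            @Override
            public boolean isStopLabel(String label) {
                return stopLabels.contains(label);
            }
        };
    }

    // Mirrors the loop in process(): split on commas and keep words that pass both checks.
    static List<String> filter(String wordsToCheck, LexicalCheck lexical) {
        List<String> kept = new ArrayList<>();
        if (wordsToCheck == null) {
            return kept; // no input means no clusters, as in the early return above
        }
        for (String word : wordsToCheck.split(",")) {
            if (!lexical.isCommonWord(word) && !lexical.isStopLabel(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        LexicalCheck lexical = fromSets(Set.of("the", "and"), Set.of("search results"));
        // "the" is a common word and "search results" is a stop label, so only two words remain.
        System.out.println(filter("the,cluster,search results,data", lexical)); // prints [cluster, data]
    }
}
```

Note that `split(",")` does not trim whitespace, so an input such as `"the, data"` would test `" data"` with its leading space; the original method behaves the same way.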
