
Example 1 with AllWords

Use of org.carrot2.text.preprocessing.PreprocessingContext.AllWords in the lucene-solr project by apache.

From the process method of the EchoStemsClusteringAlgorithm class.

@Override
public void process() throws ProcessingException {
    // Run the preprocessing pipeline over the documents with an empty query string.
    final PreprocessingContext preprocessingContext = preprocessing.preprocess(documents, "", LanguageCode.ENGLISH);
    final AllTokens allTokens = preprocessingContext.allTokens;
    final AllWords allWords = preprocessingContext.allWords;
    final AllStems allStems = preprocessingContext.allStems;
    clusters = new ArrayList<>();
    for (int i = 0; i < allTokens.image.length; i++) {
        // Skip tokens that have no entry in allWords (negative word index).
        if (allTokens.wordIndex[i] >= 0) {
            // Follow token -> word -> stem and emit one cluster labeled with the stem image.
            clusters.add(new Cluster(new String(allStems.image[allWords.stemIndex[allTokens.wordIndex[i]]])));
        }
    }
}
Also used: PreprocessingContext (org.carrot2.text.preprocessing.PreprocessingContext), AllStems (org.carrot2.text.preprocessing.PreprocessingContext.AllStems), Cluster (org.carrot2.core.Cluster), AllTokens (org.carrot2.text.preprocessing.PreprocessingContext.AllTokens), AllWords (org.carrot2.text.preprocessing.PreprocessingContext.AllWords)
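
The lookup in the loop above chains three parallel arrays: token to word, word to stem, stem to its character image. The following is a minimal standalone sketch of that indirection using plain arrays, so it runs without Carrot2 on the classpath. The class StemIndexSketch, the method stemsPerToken, and the sample data are hypothetical and only mirror the shape of allTokens.wordIndex, allWords.stemIndex, and allStems.image as they are used in the snippet.

```java
import java.util.ArrayList;
import java.util.List;

public class StemIndexSketch {

    // Hypothetical helper: resolves each token to its stem through two index arrays.
    // tokenWordIndex plays the role of allTokens.wordIndex,
    // wordStemIndex plays the role of allWords.stemIndex,
    // stemImage plays the role of allStems.image.
    static List<String> stemsPerToken(int[] tokenWordIndex, int[] wordStemIndex, char[][] stemImage) {
        List<String> stems = new ArrayList<>();
        for (int i = 0; i < tokenWordIndex.length; i++) {
            int word = tokenWordIndex[i];
            // As in the original loop, a negative index means "no word for this token".
            if (word >= 0) {
                stems.add(new String(stemImage[wordStemIndex[word]]));
            }
        }
        return stems;
    }

    public static void main(String[] args) {
        // Made-up token stream: "running", ".", "runs", "dogs"
        int[] tokenWordIndex = {0, -1, 1, 2};   // "." has no word entry
        int[] wordStemIndex = {0, 0, 1};        // running -> run, runs -> run, dogs -> dog
        char[][] stemImage = {"run".toCharArray(), "dog".toCharArray()};

        System.out.println(stemsPerToken(tokenWordIndex, wordStemIndex, stemImage));
    }
}
```

Like the original method, the sketch emits one entry per token rather than per unique stem, so a stem shared by several tokens appears several times.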

Aggregations

Cluster (org.carrot2.core.Cluster): 1
PreprocessingContext (org.carrot2.text.preprocessing.PreprocessingContext): 1
AllStems (org.carrot2.text.preprocessing.PreprocessingContext.AllStems): 1
AllTokens (org.carrot2.text.preprocessing.PreprocessingContext.AllTokens): 1
AllWords (org.carrot2.text.preprocessing.PreprocessingContext.AllWords): 1