Use of org.carrot2.text.preprocessing.PreprocessingContext in project lucene-solr by Apache.
From class EchoTokensClusteringAlgorithm, method process():
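@Override
public void process() throws ProcessingException {
    // Preprocess the input documents (empty query string, English language resources).
    final PreprocessingContext preprocessingContext = preprocessing.preprocess(documents, "", LanguageCode.ENGLISH);
    clusters = new ArrayList<>();
    // Emit one single-label cluster per token, in token order. Null images are skipped:
    // they mark positions that are not real tokens (e.g. document or field separators).
    for (char[] token : preprocessingContext.allTokens.image) {
        if (token != null) {
            clusters.add(new Cluster(new String(token)));
        }
    }
}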
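The loop above only walks a flat array of token images and skips nulls. A minimal stand-alone sketch of that logic follows, with plain Strings in place of Carrot2 Cluster objects. EchoTokensSketch and echoTokens are hypothetical names. The assumption that null entries mark separator positions comes from the null check in the listing, not from this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for PreprocessingContext.allTokens.image: one char[] per
// token position, with null assumed to mark a non-token (separator) position.
final class EchoTokensSketch {
    // Returns one label per non-null token image, preserving token order.
    static List<String> echoTokens(char[][] tokenImages) {
        List<String> labels = new ArrayList<>();
        for (char[] token : tokenImages) {
            if (token != null) { // skip separator positions
                labels.add(new String(token));
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        char[][] images = {
            "data".toCharArray(), "mining".toCharArray(), null, "clustering".toCharArray()
        };
        System.out.println(echoTokens(images)); // [data, mining, clustering]
    }
}
```

Because every occurrence yields its own label, a repeated token produces repeated clusters. That is acceptable for a test algorithm whose purpose is to echo the tokenizer's output.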
From class EchoStemsClusteringAlgorithm, method process():
@Override
public void process() throws ProcessingException {
    // Run the same preprocessing, then read its parallel token, word and stem tables.
    final PreprocessingContext preprocessingContext = preprocessing.preprocess(documents, "", LanguageCode.ENGLISH);
    final AllTokens allTokens = preprocessingContext.allTokens;
    final AllWords allWords = preprocessingContext.allWords;
    final AllStems allStems = preprocessingContext.allStems;
    clusters = new ArrayList<>();
    for (int i = 0; i < allTokens.image.length; i++) {
        // A negative wordIndex means the token is not a word (e.g. punctuation, separator).
        if (allTokens.wordIndex[i] >= 0) {
            // Follow token -> word -> stem and label the cluster with the stem's image.
            final int stemIndex = allWords.stemIndex[allTokens.wordIndex[i]];
            clusters.add(new Cluster(new String(allStems.image[stemIndex])));
        }
    }
}
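The stem variant resolves each token through two index arrays before it reaches a stem image. The sketch below models that double indirection with plain arrays. EchoStemsSketch, echoStems and the sample data are hypothetical. The array names only mirror the fields used in the listing; they are not Carrot2's classes. The stems in the example are made-up illustration data, not the output of a real stemmer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the parallel arrays read by the listing:
//   tokenWordIndex[t] -> word index of token t (negative if the token is not a word)
//   wordStemIndex[w]  -> stem index of word w
//   stemImages[s]     -> characters of stem s
final class EchoStemsSketch {
    static List<String> echoStems(int[] tokenWordIndex, int[] wordStemIndex, char[][] stemImages) {
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < tokenWordIndex.length; i++) {
            int word = tokenWordIndex[i];
            if (word >= 0) { // skip non-word tokens
                int stem = wordStemIndex[word];
                labels.add(new String(stemImages[stem]));
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Tokens: "clusters", ",", "clustering", separator
        int[] tokenWordIndex = {0, -1, 1, -1};
        // Words: 0 = "clusters", 1 = "clustering"; both assumed to share stem 0
        int[] wordStemIndex = {0, 0};
        char[][] stemImages = {"cluster".toCharArray()};
        System.out.println(echoStems(tokenWordIndex, wordStemIndex, stemImages)); // [cluster, cluster]
    }
}
```

Storing word and stem data once and referencing it by index keeps per-token storage to a few ints. That design is why the listing indexes `allStems.image` through `allWords.stemIndex` rather than reading a stem directly off each token.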