Use of org.apache.lucene.analysis.core.LowerCaseFilter in project cogcomp-nlp by CogComp.
Class WikiURLAnalyzer, method createComponents:
@Override
protected TokenStreamComponents createComponents(final String fieldName) {
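    // KeywordTokenizer emits the entire input as a single token
    final Tokenizer source = new KeywordTokenizer();
    // StandardFilter is effectively a no-op in modern Lucene (deprecated in 7.5, removed in 8.0)
    TokenStream result = new StandardFilter(source);
    // CharacterFilter is not a Lucene class; presumably defined in cogcomp-nlp itself
    result = new CharacterFilter(result);
    // Fold accented and other non-ASCII characters to their ASCII equivalents
    result = new ASCIIFoldingFilter(result);
    // Lowercase the folded token
    result = new LowerCaseFilter(result);
    return new TokenStreamComponents(source, result);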
}
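Because KeywordTokenizer never splits its input, this chain normalizes a whole URL title as one key. The plain-Java sketch below approximates that effect without Lucene, under stated assumptions: it skips StandardFilter (a no-op here) and the project-specific CharacterFilter (whose behavior is not shown), and it approximates ASCIIFoldingFilter with NFD decomposition plus removal of combining marks. That approximation handles accents but not every folding the real filter performs (for example, ligatures such as "æ"). The class name WikiKeyNormalizerSketch is hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical stdlib-only approximation of KeywordTokenizer -> ASCIIFoldingFilter -> LowerCaseFilter
class WikiKeyNormalizerSketch {

    static String normalize(String input) {
        // KeywordTokenizer: the whole input is one token, so spaces and underscores survive
        String token = input;
        // Approximate ASCIIFoldingFilter: decompose characters, then drop combining marks (accents)
        String folded = Normalizer.normalize(token, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        // LowerCaseFilter: lowercase without locale-specific rules
        return folded.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("Gödel_Escher_Bach")); // godel_escher_bach
        System.out.println(normalize("São Paulo"));         // sao paulo
    }
}
```

Folding before lowercasing, as the original chain does, means differently accented spellings of the same title map to one lowercase key.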
Use of org.apache.lucene.analysis.core.LowerCaseFilter in project vertigo by KleeGroup.
Class DefaultAnalyzer, method createComponents:
/**
 * Creates the TokenStreamComponents used to tokenize the text of a field.
 *
 * @return TokenStreamComponents built from a StandardTokenizer filtered with
 * ElisionFilter, StopFilter, ASCIIFoldingFilter and LowerCaseFilter
 */
@Override
protected TokenStreamComponents createComponents(final String fieldName) {
    /* initialize the tokenizer */
    final Tokenizer source = new StandardTokenizer();
    // -----
    /* remove elisions */
    final CharArraySet elisionSet = new CharArraySet(Arrays.asList(LuceneConstants.ELISION_ARTICLES), true);
    TokenStream filter = new ElisionFilter(source, elisionSet);
    /* remove articles and adjectives (stop words) */
    filter = new StopFilter(filter, stopWords);
    /* remove accents */
    filter = new ASCIIFoldingFilter(filter);
    /* convert to lowercase */
    filter = new LowerCaseFilter(filter);
    return new TokenStreamComponents(source, filter);
}
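Filter order matters in this chain: StopFilter runs before ASCIIFoldingFilter and LowerCaseFilter, so if `stopWords` is a case-sensitive set, a capitalized stop word such as "Le" reaches the index. The stdlib-only sketch below simulates the chain to make that visible. It is an approximation, not the vertigo code: the tokenizer is a crude regex stand-in for StandardTokenizer, folding uses NFD plus mark removal, and both word sets are hypothetical (the real contents of `LuceneConstants.ELISION_ARTICLES` and `stopWords` are not shown). The elision step mirrors how Lucene's ElisionFilter strips a listed prefix before the first apostrophe, matched case-insensitively as with `ignoreCase = true`.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical simulation of Elision -> Stop -> ASCIIFolding -> LowerCase on French text
class FrenchChainSketch {
    // Hypothetical elision prefixes, compared case-insensitively
    static final Set<String> ELISIONS = Set.of("l", "d", "j", "qu", "s", "c", "n", "m", "t");
    // Hypothetical stop words, compared case-sensitively
    static final Set<String> STOP_WORDS = Set.of("le", "la", "les", "de", "un", "une");

    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        // Crude tokenizer: keep letters, digits and apostrophes together (so "l'été" stays one token)
        for (String tok : text.split("[^\\p{L}\\p{N}'\u2019]+")) {
            if (tok.isEmpty()) continue;
            // Elision: find the first apostrophe and strip the prefix if it is a listed article
            int idx = -1;
            for (int i = 0; i < tok.length(); i++) {
                char ch = tok.charAt(i);
                if (ch == '\'' || ch == '\u2019') { idx = i; break; }
            }
            if (idx >= 0 && ELISIONS.contains(tok.substring(0, idx).toLowerCase(Locale.ROOT))) {
                tok = tok.substring(idx + 1);
            }
            // Stop words: exact lookup, applied before lowercasing, so "Le" is kept
            if (STOP_WORDS.contains(tok)) continue;
            // Accent removal (approximation of ASCIIFoldingFilter)
            tok = Normalizer.normalize(tok, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
            // Lowercasing
            out.add(tok.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [le, train, ete, arrive, a, gare]
        System.out.println(analyze("Le train de l'été arrive à la gare"));
    }
}
```

Building the stop set with `ignoreCase = true`, or moving LowerCaseFilter ahead of StopFilter, would remove the leading "Le" as well.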