Example 6 with ASCIIFoldingFilter

Use of org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter in project cogcomp-nlp by CogComp.

Source: class WikiURLAnalyzer, method createComponents.

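@Override
protected TokenStreamComponents createComponents(final String fieldName) {
    // Emit the entire field value as a single token.
    final Tokenizer source = new KeywordTokenizer();
    TokenStream result = new StandardFilter(source);
    // CharacterFilter is not a Lucene class; presumably defined in cogcomp-nlp.
    result = new CharacterFilter(result);
    // Fold accented and other non-ASCII characters to ASCII equivalents (é -> e).
    result = new ASCIIFoldingFilter(result);
    result = new LowerCaseFilter(result);
    return new TokenStreamComponents(source, result);
}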
Also used: TokenStream (org.apache.lucene.analysis.TokenStream), StandardFilter (org.apache.lucene.analysis.standard.StandardFilter), ASCIIFoldingFilter (org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter), KeywordTokenizer (org.apache.lucene.analysis.core.KeywordTokenizer), Tokenizer (org.apache.lucene.analysis.Tokenizer), LowerCaseFilter (org.apache.lucene.analysis.core.LowerCaseFilter)
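To see what the chain does to a single input string, it can be approximated without Lucene. This is a minimal, stdlib-only sketch, not the project's or Lucene's implementation. The class and method names (WikiUrlNormalizerSketch, analyze, and so on) are hypothetical. The project-specific CharacterFilter is omitted because its behavior is not shown here, and StandardFilter is treated as a pass-through. ASCII folding is approximated with Unicode NFD decomposition plus removal of combining marks. That covers accented letters such as é, but not every mapping Lucene's ASCIIFoldingFilter performs (for example, ß to "ss").

```java
import java.text.Normalizer;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical sketch; not Lucene or cogcomp-nlp code.
public class WikiUrlNormalizerSketch {

    // Mimics KeywordTokenizer: the entire input becomes one token.
    static List<String> keywordTokenize(String input) {
        return List.of(input);
    }

    // Approximates ASCIIFoldingFilter for accented Latin letters only:
    // decompose to NFD (e-acute -> e + combining acute), then drop all marks.
    static String asciiFoldApprox(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Approximates LowerCaseFilter; Locale.ROOT avoids locale-specific rules.
    static String lowerCase(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Same order as the analyzer above: tokenize, fold, then lowercase.
    static List<String> analyze(String input) {
        return keywordTokenize(input).stream()
                .map(WikiUrlNormalizerSketch::asciiFoldApprox)
                .map(WikiUrlNormalizerSketch::lowerCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "Café_Müller" written with Unicode escapes
        System.out.println(analyze("Caf\u00e9_M\u00fcller"));
    }
}
```

Running main prints [cafe_muller]. Because the tokenizer keeps the whole input as one token, an input containing spaces still yields a single normalized token.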

Aggregations

Tokenizer (org.apache.lucene.analysis.Tokenizer): 6
ASCIIFoldingFilter (org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter): 6
TokenStream (org.apache.lucene.analysis.TokenStream): 5
LowerCaseFilter (org.apache.lucene.analysis.core.LowerCaseFilter): 4
StandardFilter (org.apache.lucene.analysis.standard.StandardFilter): 4
StandardTokenizer (org.apache.lucene.analysis.standard.StandardTokenizer): 3
KeywordTokenizer (org.apache.lucene.analysis.core.KeywordTokenizer): 2
StopFilter (org.apache.lucene.analysis.core.StopFilter): 2
EnglishPossessiveFilter (org.apache.lucene.analysis.en.EnglishPossessiveFilter): 2
PorterStemFilter (org.apache.lucene.analysis.en.PorterStemFilter): 2
WordDelimiterFilter (org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter): 2
Analyzer (org.apache.lucene.analysis.Analyzer): 1
LowerCaseFilter (org.apache.lucene.analysis.LowerCaseFilter): 1
MockTokenizer (org.apache.lucene.analysis.MockTokenizer): 1
StopFilter (org.apache.lucene.analysis.StopFilter): 1
TokenFilter (org.apache.lucene.analysis.TokenFilter): 1
WhitespaceTokenizer (org.apache.lucene.analysis.core.WhitespaceTokenizer): 1
FingerprintFilter (org.apache.lucene.analysis.miscellaneous.FingerprintFilter): 1
ShingleFilter (org.apache.lucene.analysis.shingle.ShingleFilter): 1
CharTokenizer (org.apache.lucene.analysis.util.CharTokenizer): 1