
Example 1 with TweetLowerCaseEntityPreservingFilter

Use of io.anserini.analysis.TweetLowerCaseEntityPreservingFilter in the project Anserini by castorini.

From the class TRECAnalyzer, method createComponents.

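@Override
protected TokenStreamComponents createComponents(String fieldName) {
    // Split the field text on whitespace only.
    Tokenizer source = new WhitespaceTokenizer();
    // Wrap the tokenizer in the tweet-specific lowercasing filter.
    TokenStream filter = new TweetLowerCaseEntityPreservingFilter(source);
    // Expose the tokenizer as the source and the filter as the end of the chain.
    return new TokenStreamComponents(source, filter);
}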
Also used: WhitespaceTokenizer (org.apache.lucene.analysis.core.WhitespaceTokenizer), TweetLowerCaseEntityPreservingFilter (io.anserini.analysis.TweetLowerCaseEntityPreservingFilter), TokenStream (org.apache.lucene.analysis.TokenStream), Tokenizer (org.apache.lucene.analysis.Tokenizer)
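
The method above builds a two-stage chain: a tokenizer that splits on whitespace, then a filter applied to each token. A minimal plain-Java sketch of that shape follows. It uses no Lucene classes, so it runs on its own. The class name TokenChainSketch and its methods are hypothetical. The filter rule (lowercase everything except tokens starting with @, #, http:// or https://) is only a guess from the filter's class name, and the real Anserini filter may behave differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a tokenizer -> filter chain; not Lucene or Anserini code.
public class TokenChainSketch {

    // Stage 1: split on runs of whitespace, dropping empty pieces.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // ASSUMPTION: an "entity" is a mention, hashtag, or URL, detected by prefix.
    static boolean looksLikeEntity(String token) {
        return token.startsWith("@")
                || token.startsWith("#")
                || token.startsWith("http://")
                || token.startsWith("https://");
    }

    // Stage 2 (assumed rule): lowercase ordinary tokens, leave entities untouched.
    static String lowerCaseUnlessEntity(String token) {
        return looksLikeEntity(token) ? token : token.toLowerCase(Locale.ROOT);
    }

    // Run both stages in order, mirroring source -> filter in createComponents.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : whitespaceTokenize(text)) {
            out.add(lowerCaseUnlessEntity(token));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [reading, @Castorini, docs, on, #Lucene, today]
        System.out.println(analyze("Reading @Castorini docs on #Lucene TODAY"));
    }
}
```

In Lucene itself the two stages are pulled lazily through a TokenStream and are not materialized as lists. The sketch only shows the order in which the stages are applied.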

Aggregations

TweetLowerCaseEntityPreservingFilter (io.anserini.analysis.TweetLowerCaseEntityPreservingFilter): 1
TokenStream (org.apache.lucene.analysis.TokenStream): 1
Tokenizer (org.apache.lucene.analysis.Tokenizer): 1
WhitespaceTokenizer (org.apache.lucene.analysis.core.WhitespaceTokenizer): 1