Example 6 with TokenStream

Use of org.apache.lucene.analysis.TokenStream in project elasticsearch by elastic.

Class CustomAnalyzer, method createComponents.

@Override
protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer tokenizer = tokenizerFactory.create();
    TokenStream tokenStream = tokenizer;
    for (TokenFilterFactory tokenFilter : tokenFilters) {
        tokenStream = tokenFilter.create(tokenStream);
    }
    return new TokenStreamComponents(tokenizer, tokenStream);
}
Also used: TokenStream (org.apache.lucene.analysis.TokenStream), Tokenizer (org.apache.lucene.analysis.Tokenizer)
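
The loop in createComponents wraps each filter around the stream produced by the previous one, so filters run in list order. The following is a minimal sketch of that folding pattern using plain Java lists in place of Lucene token streams. The class FilterChainSketch and its methods applyChain, lowercase, dropThe and exampleFilters are hypothetical names made up for illustration; they are not part of Lucene or Elasticsearch.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical stand-in for an analysis chain; no Lucene dependency.
class FilterChainSketch {

    // Mirrors the loop in createComponents: each filter consumes the
    // output of the previous one, starting from the tokenizer output.
    static List<String> applyChain(List<String> tokens,
                                   List<UnaryOperator<List<String>>> filters) {
        List<String> stream = tokens;
        for (UnaryOperator<List<String>> filter : filters) {
            stream = filter.apply(stream);
        }
        return stream;
    }

    // Illustrative filter: lowercases every token.
    static List<String> lowercase(List<String> tokens) {
        return tokens.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Illustrative filter: drops the single stop word "the".
    static List<String> dropThe(List<String> tokens) {
        return tokens.stream()
                .filter(t -> !t.equals("the"))
                .collect(Collectors.toList());
    }

    // Example filter list: lowercase first, then stop-word removal.
    static List<UnaryOperator<List<String>>> exampleFilters() {
        List<UnaryOperator<List<String>>> filters =
                List.of(FilterChainSketch::lowercase, FilterChainSketch::dropThe);
        return filters;
    }

    public static void main(String[] args) {
        // prints [quick, fox]
        System.out.println(applyChain(List.of("The", "Quick", "Fox"), exampleFilters()));
    }
}
```

Order matters in this sketch: with the two filters swapped, "The" would pass the stop-word check before being lowercased and would survive as "the".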

Example 7 with TokenStream

Use of org.apache.lucene.analysis.TokenStream in project elasticsearch by elastic.

Class EdgeNGramTokenFilterFactory, method create.

@Override
public TokenStream create(TokenStream tokenStream) {
    TokenStream result = tokenStream;
    // side=BACK is no longer supported directly; reversing each token before and after the edge n-gram filter has the same effect
    if (side == SIDE_BACK) {
        result = new ReverseStringFilter(result);
    }
    result = new EdgeNGramTokenFilter(result, minGram, maxGram);
    // for side=BACK, reverse again so each emitted gram is back in its original character order
    if (side == SIDE_BACK) {
        result = new ReverseStringFilter(result);
    }
    return result;
}
Also used: TokenStream (org.apache.lucene.analysis.TokenStream), EdgeNGramTokenFilter (org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter), ReverseStringFilter (org.apache.lucene.analysis.reverse.ReverseStringFilter)
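
The comment in this method claims that reversing before and after a front edge n-gram filter is equivalent to taking n-grams from the back of the token. A small sketch on plain strings shows why. It uses no Lucene classes; BackEdgeNGramSketch, reverse, frontEdgeNGrams and backEdgeNGrams are hypothetical helpers that only imitate what the real filters do to a single token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-string model of the reverse / edge n-gram / reverse trick.
class BackEdgeNGramSketch {

    // Stand-in for ReverseStringFilter applied to one token.
    static String reverse(String token) {
        return new StringBuilder(token).reverse().toString();
    }

    // Stand-in for a front edge n-gram filter: prefixes of length
    // minGram..maxGram, capped at the token length.
    static List<String> frontEdgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int limit = Math.min(maxGram, token.length());
        for (int len = minGram; len <= limit; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Reverse, take prefixes, reverse each gram: the result is the
    // set of suffixes of the original token.
    static List<String> backEdgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (String gram : frontEdgeNGrams(reverse(token), minGram, maxGram)) {
            grams.add(reverse(gram));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [e, ne, ene]
        System.out.println(backEdgeNGrams("lucene", 1, 3));
    }
}
```

Here "lucene" is reversed to "enecul", its prefixes "e", "en", "ene" are taken, and each is reversed back, giving the suffixes "e", "ne", "ene".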

Example 8 with TokenStream

Use of org.apache.lucene.analysis.TokenStream in project elasticsearch by elastic.

Class FingerprintTokenFilterFactory, method create.

@Override
public TokenStream create(TokenStream tokenStream) {
    TokenStream result = tokenStream;
    result = new FingerprintFilter(result, maxOutputSize, separator);
    return result;
}
Also used: FingerprintFilter (org.apache.lucene.analysis.miscellaneous.FingerprintFilter), TokenStream (org.apache.lucene.analysis.TokenStream)

Example 9 with TokenStream

Use of org.apache.lucene.analysis.TokenStream in project elasticsearch by elastic.

Class PatternAnalyzer, method createComponents.

@Override
protected TokenStreamComponents createComponents(String s) {
    final Tokenizer tokenizer = new PatternTokenizer(pattern, -1);
    TokenStream stream = tokenizer;
    if (lowercase) {
        stream = new LowerCaseFilter(stream);
    }
    if (stopWords != null) {
        stream = new StopFilter(stream, stopWords);
    }
    return new TokenStreamComponents(tokenizer, stream);
}
Also used: TokenStream (org.apache.lucene.analysis.TokenStream), StopFilter (org.apache.lucene.analysis.StopFilter), PatternTokenizer (org.apache.lucene.analysis.pattern.PatternTokenizer), Tokenizer (org.apache.lucene.analysis.Tokenizer), LowerCaseFilter (org.apache.lucene.analysis.LowerCaseFilter)

Example 10 with TokenStream

Use of org.apache.lucene.analysis.TokenStream in project elasticsearch by elastic.

Class SimpleIcuCollationTokenFilterTests, method assertCollation.

private void assertCollation(TokenFilterFactory factory, String string1, String string2, int comparison) throws IOException {
    Tokenizer tokenizer = new KeywordTokenizer();
    tokenizer.setReader(new StringReader(string1));
    TokenStream stream1 = factory.create(tokenizer);
    tokenizer = new KeywordTokenizer();
    tokenizer.setReader(new StringReader(string2));
    TokenStream stream2 = factory.create(tokenizer);
    assertCollation(stream1, stream2, comparison);
}
Also used: TokenStream (org.apache.lucene.analysis.TokenStream), StringReader (java.io.StringReader), KeywordTokenizer (org.apache.lucene.analysis.core.KeywordTokenizer), Tokenizer (org.apache.lucene.analysis.Tokenizer)

Aggregations

TokenStream (org.apache.lucene.analysis.TokenStream) 848
StringReader (java.io.StringReader) 336
Tokenizer (org.apache.lucene.analysis.Tokenizer) 244
Reader (java.io.Reader) 175
CharTermAttribute (org.apache.lucene.analysis.tokenattributes.CharTermAttribute) 140
MockTokenizer (org.apache.lucene.analysis.MockTokenizer) 128
Analyzer (org.apache.lucene.analysis.Analyzer) 121
CannedTokenStream (org.apache.lucene.analysis.CannedTokenStream) 94
LowerCaseFilter (org.apache.lucene.analysis.LowerCaseFilter) 88
IOException (java.io.IOException) 85
StandardFilter (org.apache.lucene.analysis.standard.StandardFilter) 73
Term (org.apache.lucene.index.Term) 66
Document (org.apache.lucene.document.Document) 64
StandardTokenizer (org.apache.lucene.analysis.standard.StandardTokenizer) 59
ArrayList (java.util.ArrayList) 58
StopFilter (org.apache.lucene.analysis.StopFilter) 58
KeywordTokenizer (org.apache.lucene.analysis.core.KeywordTokenizer) 57
SetKeywordMarkerFilter (org.apache.lucene.analysis.miscellaneous.SetKeywordMarkerFilter) 53
Test (org.junit.Test) 53
OffsetAttribute (org.apache.lucene.analysis.tokenattributes.OffsetAttribute) 46