
Example 1 with EdgeNGramTokenizer

Use of org.apache.lucene.analysis.ngram.EdgeNGramTokenizer in project lucene-solr by apache.

From the class TestBugInSomething, method testUnicodeShinglesAndNgrams.

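// LUCENE-5269
@Slow
public void testUnicodeShinglesAndNgrams() throws Exception {
    // Chain under test: EdgeNGramTokenizer -> ShingleFilter -> NGramTokenFilter,
    // exercised with randomly generated Unicode text.
    Analyzer analyzer = new Analyzer() {

        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            // Leading-edge grams of length 2 to 94.
            Tokenizer tokenizer = new EdgeNGramTokenizer(2, 94);
            // The commented-out SopTokenFilter lines are debug hooks that print each stage.
            //TokenStream stream = new SopTokenFilter(tokenizer);
            // Join up to 5 adjacent tokens into shingles.
            TokenStream stream = new ShingleFilter(tokenizer, 5);
            //stream = new SopTokenFilter(stream);
            // Split every token into n-grams of length 55 to 83.
            stream = new NGramTokenFilter(stream, 55, 83);
            //stream = new SopTokenFilter(stream);
            return new TokenStreamComponents(tokenizer, stream);
        }
    };
    // Run 2000 random strings through the chain with the base class's consistency checks.
    checkRandomData(random(), analyzer, 2000);
    analyzer.close();
}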
Also used: TokenStream(org.apache.lucene.analysis.TokenStream) ShingleFilter(org.apache.lucene.analysis.shingle.ShingleFilter) EdgeNGramTokenizer(org.apache.lucene.analysis.ngram.EdgeNGramTokenizer) NGramTokenFilter(org.apache.lucene.analysis.ngram.NGramTokenFilter) Analyzer(org.apache.lucene.analysis.Analyzer) WikipediaTokenizer(org.apache.lucene.analysis.wikipedia.WikipediaTokenizer) Tokenizer(org.apache.lucene.analysis.Tokenizer) MockTokenizer(org.apache.lucene.analysis.MockTokenizer)
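To make the three-stage chain in the test easier to picture, here is a toy, dependency-free Java model of it: edge n-grams, then word shingles, then n-grams. It is not Lucene's code. The class NGramChainSketch and its methods (cpSub, edgeNGrams, shingles, nGrams) are hypothetical; the model ignores positions, offsets and the real filters' emission order, and it assumes ShingleFilter's defaults of also emitting unigrams and joining tokens with a single space. Lengths are counted in Unicode code points, on the assumption that the "Unicode" part of the test is about supplementary characters: a naive String.substring on UTF-16 indices could split a surrogate pair. Small parameters replace the test's 2-94, 5 and 55-83 so the output stays short.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model of EdgeNGram -> Shingle -> NGram.
 * Lengths are in code points; positions, offsets and order are NOT modeled.
 */
public class NGramChainSketch {

    // Substring of s between code point indices [from, to).
    static String cpSub(String s, int from, int to) {
        int start = s.offsetByCodePoints(0, from);
        int end = s.offsetByCodePoints(start, to - from);
        return s.substring(start, end);
    }

    // Prefixes of s whose length is between min and max code points.
    static List<String> edgeNGrams(String s, int min, int max) {
        List<String> out = new ArrayList<>();
        int n = s.codePointCount(0, s.length());
        for (int len = min; len <= Math.min(max, n); len++) {
            out.add(cpSub(s, 0, len));
        }
        return out;
    }

    // Each token as a unigram, plus shingles of 2..maxSize consecutive tokens joined by ' '.
    static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            StringBuilder sb = new StringBuilder(tokens.get(i));
            out.add(sb.toString());
            for (int size = 2; size <= maxSize && i + size <= tokens.size(); size++) {
                sb.append(' ').append(tokens.get(i + size - 1));
                out.add(sb.toString());
            }
        }
        return out;
    }

    // Every substring of every token with length between min and max code points.
    static List<String> nGrams(List<String> tokens, int min, int max) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            int n = t.codePointCount(0, t.length());
            for (int start = 0; start < n; start++) {
                for (int len = min; len <= max && start + len <= n; len++) {
                    out.add(cpSub(t, start, start + len));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "ab" followed by U+1F600: 3 code points, 4 UTF-16 chars.
        String input = "ab\uD83D\uDE00";
        List<String> edges = edgeNGrams(input, 2, 3);
        List<String> shingled = shingles(edges, 2);
        List<String> grams = nGrams(shingled, 3, 3);
        System.out.println(edges.size() + " " + shingled.size() + " " + grams.size());
    }
}
```

Running main prints `2 3 5`: two edge grams ("ab" and "ab" plus the emoji), three shingle-stage tokens, and five 3-grams, none of which splits the emoji's surrogate pair. With the test's real parameters the final stage keeps only tokens of at least 55 code points, which is why the test relies on long random input.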
