Example 1 with CommonGramsFilter

Use of org.apache.lucene.analysis.commongrams.CommonGramsFilter in project lucene-solr by apache.

From the class TestBugInSomething, method test.

public void test() throws Exception {
    // Case-sensitive set of "common" words handed to CommonGramsFilter below.
    final CharArraySet cas = new CharArraySet(3, false);
    cas.add("jjp");
    cas.add("wlmwoknt");
    cas.add("tcgyreo");
    // Character-level replacements applied to the input before tokenization.
    final NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
    builder.add("mtqlpi", "");
    builder.add("mwoknt", "jjp");
    builder.add("tcgyreo", "zpfpajyws");
    final NormalizeCharMap map = builder.build();
    Analyzer a = new Analyzer() {

        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            // Analysis chain: MockTokenizer followed by CommonGramsFilter.
            Tokenizer t = new MockTokenizer(MockTokenFilter.ENGLISH_STOPSET, false, -65);
            TokenFilter f = new CommonGramsFilter(t, cas);
            return new TokenStreamComponents(t, f);
        }

        @Override
        protected Reader initReader(String fieldName, Reader reader) {
            // Each char filter wraps the previous reader; the tokenizer reads from the outermost one.
            reader = new MockCharFilter(reader, 0);
            reader = new MappingCharFilter(map, reader);
            reader = new TestRandomChains.CheckThatYouDidntReadAnythingReaderWrapper(reader);
            return reader;
        }
    };
    // Run the analyzer over a fixed input string and verify the token stream is consistent.
    checkAnalysisConsistency(random(), a, false, "wmgddzunizdomqyj");
    a.close();
}
Also used: CharArraySet (org.apache.lucene.analysis.CharArraySet), MockCharFilter (org.apache.lucene.analysis.MockCharFilter), Reader (java.io.Reader), StringReader (java.io.StringReader), Analyzer (org.apache.lucene.analysis.Analyzer), MockTokenizer (org.apache.lucene.analysis.MockTokenizer), CommonGramsFilter (org.apache.lucene.analysis.commongrams.CommonGramsFilter), MappingCharFilter (org.apache.lucene.analysis.charfilter.MappingCharFilter), NormalizeCharMap (org.apache.lucene.analysis.charfilter.NormalizeCharMap), WikipediaTokenizer (org.apache.lucene.analysis.wikipedia.WikipediaTokenizer), Tokenizer (org.apache.lucene.analysis.Tokenizer), EdgeNGramTokenizer (org.apache.lucene.analysis.ngram.EdgeNGramTokenizer), NGramTokenFilter (org.apache.lucene.analysis.ngram.NGramTokenFilter), MockTokenFilter (org.apache.lucene.analysis.MockTokenFilter), TokenFilter (org.apache.lucene.analysis.TokenFilter)
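
For readers unfamiliar with the filter exercised in Example 1, the idea behind common grams can be sketched in plain Java without Lucene on the classpath. This is a simplified illustration, not Lucene's implementation: the class CommonGramsSketch and the method commonGrams are hypothetical names, and the sketch ignores position increments, offsets, and token types. It assumes only the general behavior of the filter: every single term is kept, and a bigram joined with "_" is added wherever either neighbour is in the common-word set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CommonGramsSketch {

    // Hypothetical helper (not a Lucene API): returns every unigram, plus a
    // "_"-joined bigram after each token whose pair with the next token
    // contains at least one common word.
    public static List<String> commonGrams(List<String> tokens, Set<String> commonWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current); // the single term is always kept
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (commonWords.contains(current) || commonWords.contains(next)) {
                    out.add(current + "_" + next); // bigram overlaid on the unigram
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "quick", "brown", "fox");
        // prints [the, the_quick, quick, brown, fox]
        System.out.println(commonGrams(tokens, Set.of("the")));
    }
}
```

In the test above, the common-word set plays the role of cas; with the random-looking words it contains, a bigram would only be formed if one of them appeared as a token in the analyzed text.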

Aggregations

Reader (java.io.Reader): 1
StringReader (java.io.StringReader): 1
Analyzer (org.apache.lucene.analysis.Analyzer): 1
CharArraySet (org.apache.lucene.analysis.CharArraySet): 1
MockCharFilter (org.apache.lucene.analysis.MockCharFilter): 1
MockTokenFilter (org.apache.lucene.analysis.MockTokenFilter): 1
MockTokenizer (org.apache.lucene.analysis.MockTokenizer): 1
TokenFilter (org.apache.lucene.analysis.TokenFilter): 1
Tokenizer (org.apache.lucene.analysis.Tokenizer): 1
MappingCharFilter (org.apache.lucene.analysis.charfilter.MappingCharFilter): 1
NormalizeCharMap (org.apache.lucene.analysis.charfilter.NormalizeCharMap): 1
CommonGramsFilter (org.apache.lucene.analysis.commongrams.CommonGramsFilter): 1
EdgeNGramTokenizer (org.apache.lucene.analysis.ngram.EdgeNGramTokenizer): 1
NGramTokenFilter (org.apache.lucene.analysis.ngram.NGramTokenFilter): 1
WikipediaTokenizer (org.apache.lucene.analysis.wikipedia.WikipediaTokenizer): 1