
Example 1 with MockCharFilter

Use of org.apache.lucene.analysis.MockCharFilter in the lucene-solr project by Apache.

From class TestPerFieldAnalyzerWrapper, method testCharFilters.

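public void testCharFilters() throws Exception {
    // Analyzer whose only customization is a char filter installed in initReader
    Analyzer a = new Analyzer() {

        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            return new TokenStreamComponents(new MockTokenizer());
        }

        @Override
        protected Reader initReader(String fieldName, Reader reader) {
            // remainder 7: every char whose code % 10 == 7 is doubled ('a' is 97)
            return new MockCharFilter(reader, 7);
        }
    };
    // "ab" is filtered to "aab"; offsets are corrected back to the input (0..2)
    assertAnalyzesTo(a, "ab", new String[] { "aab" }, new int[] { 0 }, new int[] { 2 });
    // now wrap in PFAW (PerFieldAnalyzerWrapper) with no per-field analyzers;
    // the default analyzer's char filter must still be applied
    PerFieldAnalyzerWrapper p = new PerFieldAnalyzerWrapper(a, Collections.<String, Analyzer>emptyMap());
    assertAnalyzesTo(p, "ab", new String[] { "aab" }, new int[] { 0 }, new int[] { 2 });
    p.close();
    // TODO: fix this about PFAW, it's a trap: the wrapped analyzer is closed separately
    a.close();
}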
Also used: MockTokenizer (org.apache.lucene.analysis.MockTokenizer), MockCharFilter (org.apache.lucene.analysis.MockCharFilter), Reader (java.io.Reader), Analyzer (org.apache.lucene.analysis.Analyzer), SimpleAnalyzer (org.apache.lucene.analysis.core.SimpleAnalyzer), WhitespaceAnalyzer (org.apache.lucene.analysis.core.WhitespaceAnalyzer)
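The assertion that "ab" becomes the single token "aab" with offsets 0 to 2 depends on how MockCharFilter rewrites text and corrects offsets. Below is a simplified, dependency-free sketch of that behavior, assuming MockCharFilter doubles every char whose code % 10 equals the remainder (surrogates excluded) and maps each offset in the filtered text back to the input char it came from. DoublingCharFilterSketch, Result, filter and correctOffset are hypothetical names for illustration, not Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the doubling behavior assumed for MockCharFilter:
 * each char whose code % 10 == remainder is emitted twice, and every offset
 * in the filtered text can be mapped back to an offset in the original input.
 */
public class DoublingCharFilterSketch {

    /** Filtered text plus, for each filtered offset, the matching input offset. */
    public static final class Result {
        public final String text;
        private final int[] toOriginal; // length is text.length() + 1

        Result(String text, int[] toOriginal) {
            this.text = text;
            this.toOriginal = toOriginal;
        }

        /** Maps an offset in the filtered text back to the original input. */
        public int correctOffset(int filteredOffset) {
            return toOriginal[filteredOffset];
        }
    }

    public static Result filter(String input, int remainder) {
        if (remainder < 0 || remainder > 9) {
            throw new IllegalArgumentException("remainder must be 0..9: " + remainder);
        }
        StringBuilder out = new StringBuilder();
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            out.append(ch);
            offsets.add(i);
            // Surrogates are never doubled, so surrogate pairs stay intact.
            if (ch % 10 == remainder && !Character.isSurrogate(ch)) {
                out.append(ch); // the duplicate...
                offsets.add(i); // ...still points at the same input char
            }
        }
        offsets.add(input.length()); // end-of-text offset
        int[] map = new int[offsets.size()];
        for (int i = 0; i < map.length; i++) {
            map[i] = offsets.get(i);
        }
        return new Result(out.toString(), map);
    }

    public static void main(String[] args) {
        Result r = filter("ab", 7); // 'a' is 97, and 97 % 10 == 7
        System.out.println(r.text);             // aab
        System.out.println(r.correctOffset(0)); // 0
        System.out.println(r.correctOffset(3)); // 2
    }
}
```

The token "aab" spans filtered offsets 0 to 3, and mapping the end offset back through the duplicate gives 2, which matches the end offset the test expects.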

Example 2 with MockCharFilter

Use of org.apache.lucene.analysis.MockCharFilter in the lucene-solr project by Apache.

From class TestBugInSomething, method test.

public void test() throws Exception {
    // common words for CommonGramsFilter (ignoreCase = false)
    final CharArraySet cas = new CharArraySet(3, false);
    cas.add("jjp");
    cas.add("wlmwoknt");
    cas.add("tcgyreo");
    // mappings for MappingCharFilter; "mtqlpi" maps to the empty string (deleted)
    final NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
    builder.add("mtqlpi", "");
    builder.add("mwoknt", "jjp");
    builder.add("tcgyreo", "zpfpajyws");
    final NormalizeCharMap map = builder.build();
    Analyzer a = new Analyzer() {

        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            // unusual arguments (stop-word automaton as the tokenizer pattern,
            // negative max token length) presumably reproduce a randomized failure
            Tokenizer t = new MockTokenizer(MockTokenFilter.ENGLISH_STOPSET, false, -65);
            TokenFilter f = new CommonGramsFilter(t, cas);
            return new TokenStreamComponents(t, f);
        }

        @Override
        protected Reader initReader(String fieldName, Reader reader) {
            // each reader wraps the previous one, so MockCharFilter reads the raw
            // input first and MappingCharFilter sees the doubled text
            reader = new MockCharFilter(reader, 0);
            reader = new MappingCharFilter(map, reader);
            reader = new TestRandomChains.CheckThatYouDidntReadAnythingReaderWrapper(reader);
            return reader;
        }
    };
    // analyze the text several times and check that the results are consistent
    checkAnalysisConsistency(random(), a, false, "wmgddzunizdomqyj");
    a.close();
}
Also used: CharArraySet (org.apache.lucene.analysis.CharArraySet), MockCharFilter (org.apache.lucene.analysis.MockCharFilter), Reader (java.io.Reader), StringReader (java.io.StringReader), Analyzer (org.apache.lucene.analysis.Analyzer), MockTokenizer (org.apache.lucene.analysis.MockTokenizer), CommonGramsFilter (org.apache.lucene.analysis.commongrams.CommonGramsFilter), MappingCharFilter (org.apache.lucene.analysis.charfilter.MappingCharFilter), NormalizeCharMap (org.apache.lucene.analysis.charfilter.NormalizeCharMap), WikipediaTokenizer (org.apache.lucene.analysis.wikipedia.WikipediaTokenizer), Tokenizer (org.apache.lucene.analysis.Tokenizer), EdgeNGramTokenizer (org.apache.lucene.analysis.ngram.EdgeNGramTokenizer), NGramTokenFilter (org.apache.lucene.analysis.ngram.NGramTokenFilter), MockTokenFilter (org.apache.lucene.analysis.MockTokenFilter), TokenFilter (org.apache.lucene.analysis.TokenFilter)
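The initReader chain in this test wraps MockCharFilter inside MappingCharFilter, so characters are doubled first and mapped second. The sketch below simulates that order on plain strings. It assumes MockCharFilter doubles every char whose code % 10 equals the remainder, and that MappingCharFilter replaces the longest matching key at each position (greedy matching, as its documentation describes); offset correction is left out. CharFilterChainSketch, doubleChars and applyMap are hypothetical names, not Lucene's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical string-level simulation of the char-filter chain:
 * doubling (assumed MockCharFilter rule) followed by greedy
 * longest-match replacement (assumed MappingCharFilter rule).
 * Offsets are not tracked.
 */
public class CharFilterChainSketch {

    /** Emits each char twice when its code % 10 == remainder (surrogates excluded). */
    public static String doubleChars(String s, int remainder) {
        StringBuilder out = new StringBuilder();
        for (char ch : s.toCharArray()) {
            out.append(ch);
            if (ch % 10 == remainder && !Character.isSurrogate(ch)) {
                out.append(ch);
            }
        }
        return out.toString();
    }

    /** At each position, replaces the longest matching key; otherwise copies one char. */
    public static String applyMap(String s, Map<String, String> map) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            String bestKey = null;
            for (String key : map.keySet()) {
                if (!key.isEmpty() && s.startsWith(key, i)
                        && (bestKey == null || key.length() > bestKey.length())) {
                    bestKey = key;
                }
            }
            if (bestKey == null) {
                out.append(s.charAt(i));
                i++;
            } else {
                out.append(map.get(bestKey)); // replacement is not re-scanned
                i += bestKey.length();
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("mtqlpi", "");
        map.put("mwoknt", "jjp");
        map.put("tcgyreo", "zpfpajyws");

        // Innermost filter runs first: doubling, then mapping.
        String doubled = doubleChars("wmgddzunizdomqyj", 0);
        System.out.println(doubled);                   // wmgddddzunnizddomqyj
        System.out.println(applyMap(doubled, map));    // wmgddddzunnizddomqyj
        System.out.println(applyMap("xmtqlpiy", map)); // xy
    }
}
```

With remainder 0, 'd' (100) and 'n' (110) are doubled, and none of the three mapping keys occurs in the doubled text, so under these assumptions the mapping step leaves this particular input unchanged.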
