Example 31 with CharFilter

Use of org.apache.lucene.analysis.CharFilter in the lucene-solr project by Apache.

Class TestMappingCharFilter, method test4to2.

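public void test4to2() throws Exception {
    // normMap is built elsewhere in the test class; the expected token implies it maps "cccc" to "cc".
    CharFilter cs = new MappingCharFilter(normMap, new StringReader("cccc"));
    TokenStream ts = whitespaceMockTokenizer(cs);
    // Expect one token "cc" whose offsets (0 to 4) refer to the original four-character input; final offset is 4.
    assertTokenStreamContents(ts, new String[] { "cc" }, new int[] { 0 }, new int[] { 4 }, 4);
}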
Also used: TokenStream (org.apache.lucene.analysis.TokenStream), CharFilter (org.apache.lucene.analysis.CharFilter), StringReader (java.io.StringReader)
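
The expected offsets in test4to2 (start 0, end 4 for the two-character token "cc") indicate that offsets are reported against the original input rather than the filtered text. The following is a minimal, Lucene-free sketch of that idea, assuming a greedy longest-match rule table. The names MappingOffsetSketch, apply, and correctOffset are hypothetical and chosen for illustration; the code does not reproduce MappingCharFilter's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative stand-in for a mapping char filter; not Lucene's implementation. */
final class MappingOffsetSketch {
    final String output;
    // originalOffsets[i] is the input offset that corresponds to output boundary i.
    private final int[] originalOffsets;

    private MappingOffsetSketch(String output, int[] originalOffsets) {
        this.output = output;
        this.originalOffsets = originalOffsets;
    }

    /** Applies the rules left to right, always preferring the longest matching key. */
    static MappingOffsetSketch apply(Map<String, String> rules, String input) {
        StringBuilder out = new StringBuilder();
        List<Integer> offsets = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String match = longestMatch(rules, input, pos);
            if (match == null) {
                // No rule applies: copy one char and record where it came from.
                offsets.add(pos);
                out.append(input.charAt(pos));
                pos++;
            } else {
                String replacement = rules.get(match);
                for (int k = 0; k < replacement.length(); k++) {
                    // Interior boundaries are clamped to the end of the matched span.
                    offsets.add(Math.min(pos + k, pos + match.length()));
                }
                out.append(replacement);
                pos += match.length();
            }
        }
        // Boundary after the last output char maps to the end of the input.
        offsets.add(input.length());
        int[] table = offsets.stream().mapToInt(Integer::intValue).toArray();
        return new MappingOffsetSketch(out.toString(), table);
    }

    private static String longestMatch(Map<String, String> rules, String input, int pos) {
        String best = null;
        for (String key : rules.keySet()) {
            if (!key.isEmpty() && input.startsWith(key, pos)
                    && (best == null || key.length() > best.length())) {
                best = key;
            }
        }
        return best;
    }

    /** Maps an offset in the filtered output back to an offset in the original input. */
    int correctOffset(int outputOffset) {
        return originalOffsets[outputOffset];
    }

    public static void main(String[] args) {
        MappingOffsetSketch r = apply(Map.of("cccc", "cc"), "cccc");
        // prints "cc 0 4"
        System.out.println(r.output + " " + r.correctOffset(0) + " " + r.correctOffset(r.output.length()));
    }
}
```

With the rule "cccc" to "cc", the filtered text is two characters long, yet the corrected end offset is 4, which mirrors the values asserted in the test.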

Example 32 with CharFilter

Use of org.apache.lucene.analysis.CharFilter in the lucene-solr project by Apache.

Class TestMappingCharFilter, method testReaderReset.

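public void testReaderReset() throws Exception {
    CharFilter cs = new MappingCharFilter(normMap, new StringReader("x"));
    char[] buf = new char[10];
    // The first read returns the single character; the assertion shows "x" passes through unchanged.
    int len = cs.read(buf, 0, 10);
    assertEquals(1, len);
    assertEquals('x', buf[0]);
    // The input is now exhausted, so the next read signals end of stream.
    len = cs.read(buf, 0, 10);
    assertEquals(-1, len);
    // Rewind to the start of the input; the same character must be readable again.
    cs.reset();
    len = cs.read(buf, 0, 10);
    assertEquals(1, len);
    assertEquals('x', buf[0]);
}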
Also used: CharFilter (org.apache.lucene.analysis.CharFilter), StringReader (java.io.StringReader)
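
The read, exhaust, reset, read-again sequence in testReaderReset can be reproduced with the underlying java.io.StringReader alone. The sketch below shows only the plain JDK reader; it assumes nothing about how a char filter forwards reset() to its input, and ReaderResetSketch and readTwiceThenRewind are hypothetical names for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Shows that a StringReader with no mark set rewinds to the start on reset(). */
final class ReaderResetSketch {

    /** Returns the results of: first read, read at end of stream, read after reset(). */
    static int[] readTwiceThenRewind(String text) {
        try (StringReader reader = new StringReader(text)) {
            // The buffer is larger than the text, so one read consumes everything.
            char[] buf = new char[text.length() + 10];
            int first = reader.read(buf, 0, buf.length);
            int second = reader.read(buf, 0, buf.length);
            // No mark was set, so reset() returns to the beginning of the string.
            reader.reset();
            int third = reader.read(buf, 0, buf.length);
            return new int[] { first, second, third };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = readTwiceThenRewind("x");
        // prints "1 -1 1"
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```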

Example 33 with CharFilter

Use of org.apache.lucene.analysis.CharFilter in the lucene-solr project by Apache.

Class TestMappingCharFilter, method test2to1.

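public void test2to1() throws Exception {
    // The expected token implies that normMap (built elsewhere in the test class) maps "aa" to "a".
    CharFilter cs = new MappingCharFilter(normMap, new StringReader("aa"));
    TokenStream ts = whitespaceMockTokenizer(cs);
    // Expect one token "a" spanning offsets 0 to 2 of the original input; final offset is 2.
    assertTokenStreamContents(ts, new String[] { "a" }, new int[] { 0 }, new int[] { 2 }, 2);
}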
Also used: TokenStream (org.apache.lucene.analysis.TokenStream), CharFilter (org.apache.lucene.analysis.CharFilter), StringReader (java.io.StringReader)

Example 34 with CharFilter

Use of org.apache.lucene.analysis.CharFilter in the lucene-solr project by Apache.

Class TestMappingCharFilter, method testNonBMPChar.

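public void testNonBMPChar() throws Exception {
    // 0x1D122 lies outside the BMP, so the input is one code point stored as two UTF-16 chars.
    // UnicodeUtil here is org.apache.lucene.util.UnicodeUtil.
    CharFilter cs = new MappingCharFilter(normMap, new StringReader(UnicodeUtil.newString(new int[] { 0x1D122 }, 0, 1)));
    TokenStream ts = whitespaceMockTokenizer(cs);
    // Expect the single token "fclef"; the end offset and final offset of 2 count UTF-16 chars of the input.
    assertTokenStreamContents(ts, new String[] { "fclef" }, new int[] { 0 }, new int[] { 2 }, 2);
}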
Also used: TokenStream (org.apache.lucene.analysis.TokenStream), CharFilter (org.apache.lucene.analysis.CharFilter), StringReader (java.io.StringReader)
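
The end offset of 2 in testNonBMPChar follows from how Java stores code points above U+FFFF: each takes two UTF-16 chars (a surrogate pair). The sketch below checks this with JDK calls only; SurrogatePairSketch and fromCodePoint are hypothetical names for illustration, and Character.toChars stands in for the UnicodeUtil.newString call used in the test.

```java
/** Shows that U+1D122 is one code point but two UTF-16 chars in a Java String. */
final class SurrogatePairSketch {

    /** Builds a String holding exactly one code point. */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String clef = fromCodePoint(0x1D122);
        // length() counts UTF-16 chars, codePointCount() counts code points.
        // prints "2 1"
        System.out.println(clef.length() + " " + clef.codePointCount(0, clef.length()));
    }
}
```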

Example 35 with CharFilter

Use of org.apache.lucene.analysis.CharFilter in the lucene-solr project by Apache.

Class TestMappingCharFilter, method test3to1.

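public void test3to1() throws Exception {
    // The expected token implies that normMap (built elsewhere in the test class) maps "bbb" to "b".
    CharFilter cs = new MappingCharFilter(normMap, new StringReader("bbb"));
    TokenStream ts = whitespaceMockTokenizer(cs);
    // Expect one token "b" spanning offsets 0 to 3 of the original input; final offset is 3.
    assertTokenStreamContents(ts, new String[] { "b" }, new int[] { 0 }, new int[] { 3 }, 3);
}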
Also used: TokenStream (org.apache.lucene.analysis.TokenStream), CharFilter (org.apache.lucene.analysis.CharFilter), StringReader (java.io.StringReader)

Aggregations

CharFilter (org.apache.lucene.analysis.CharFilter): 41
StringReader (java.io.StringReader): 40
TokenStream (org.apache.lucene.analysis.TokenStream): 26
Tokenizer (org.apache.lucene.analysis.Tokenizer): 10
MockTokenizer (org.apache.lucene.analysis.MockTokenizer): 7
MappingCharFilter (org.apache.lucene.analysis.charfilter.MappingCharFilter): 4
Normalizer2 (com.ibm.icu.text.Normalizer2): 3
ArrayList (java.util.ArrayList): 3
NormalizeCharMap (org.apache.lucene.analysis.charfilter.NormalizeCharMap): 3
NGramTokenizer (org.apache.lucene.analysis.ngram.NGramTokenizer): 3
HashMap (java.util.HashMap): 2
Settings (org.elasticsearch.common.settings.Settings): 2
Index (org.elasticsearch.index.Index): 2
AnalysisICUPlugin (org.elasticsearch.plugin.analysis.icu.AnalysisICUPlugin): 2
IOException (java.io.IOException): 1
MockCharFilter (org.apache.lucene.analysis.MockCharFilter): 1