Search in sources :

Example 26 with CharFilter

use of org.apache.lucene.analysis.CharFilter in project lucene-solr by apache.

the class TestICUNormalizer2CharFilter method testMassiveLigature.

public void testMassiveLigature() throws IOException {
    String input = "ﷺ";
    CharFilter reader = new ICUNormalizer2CharFilter(new StringReader(input), Normalizer2.getInstance(null, "nfkc_cf", Normalizer2.Mode.COMPOSE));
    Tokenizer tokenStream = new MockTokenizer(MockTokenizer.WHITESPACE, false);
    tokenStream.setReader(reader);
    assertTokenStreamContents(tokenStream, new String[] { "صلى", "الله", "عليه", "وسلم" }, new int[] { 0, 0, 0, 0 }, new int[] { 0, 0, 0, 1 }, input.length());
}
Also used : MockTokenizer(org.apache.lucene.analysis.MockTokenizer) CharFilter(org.apache.lucene.analysis.CharFilter) StringReader(java.io.StringReader) Tokenizer(org.apache.lucene.analysis.Tokenizer) MockTokenizer(org.apache.lucene.analysis.MockTokenizer) NGramTokenizer(org.apache.lucene.analysis.ngram.NGramTokenizer)

Example 27 with CharFilter

use of org.apache.lucene.analysis.CharFilter in project lucene-solr by apache.

the class TestMappingCharFilter method test1to1.

public void test1to1() throws Exception {
    CharFilter cs = new MappingCharFilter(normMap, new StringReader("h"));
    TokenStream ts = whitespaceMockTokenizer(cs);
    assertTokenStreamContents(ts, new String[] { "i" }, new int[] { 0 }, new int[] { 1 }, 1);
}
Also used : TokenStream(org.apache.lucene.analysis.TokenStream) CharFilter(org.apache.lucene.analysis.CharFilter) StringReader(java.io.StringReader)

Example 28 with CharFilter

use of org.apache.lucene.analysis.CharFilter in project lucene-solr by apache.

the class TestMappingCharFilter method test5to0.

public void test5to0() throws Exception {
    CharFilter cs = new MappingCharFilter(normMap, new StringReader("empty"));
    TokenStream ts = whitespaceMockTokenizer(cs);
    assertTokenStreamContents(ts, new String[0], new int[] {}, new int[] {}, 5);
}
Also used : TokenStream(org.apache.lucene.analysis.TokenStream) CharFilter(org.apache.lucene.analysis.CharFilter) StringReader(java.io.StringReader)

Example 29 with CharFilter

use of org.apache.lucene.analysis.CharFilter in project lucene-solr by apache.

the class TestMappingCharFilter method testNothingChange.

public void testNothingChange() throws Exception {
    CharFilter cs = new MappingCharFilter(normMap, new StringReader("x"));
    TokenStream ts = whitespaceMockTokenizer(cs);
    assertTokenStreamContents(ts, new String[] { "x" }, new int[] { 0 }, new int[] { 1 }, 1);
}
Also used : TokenStream(org.apache.lucene.analysis.TokenStream) CharFilter(org.apache.lucene.analysis.CharFilter) StringReader(java.io.StringReader)

Example 30 with CharFilter

use of org.apache.lucene.analysis.CharFilter in project lucene-solr by apache.

the class TestMappingCharFilter method test2to4.

public void test2to4() throws Exception {
    CharFilter cs = new MappingCharFilter(normMap, new StringReader("ll"));
    TokenStream ts = whitespaceMockTokenizer(cs);
    assertTokenStreamContents(ts, new String[] { "llll" }, new int[] { 0 }, new int[] { 2 }, 2);
}
Also used : TokenStream(org.apache.lucene.analysis.TokenStream) CharFilter(org.apache.lucene.analysis.CharFilter) StringReader(java.io.StringReader)

Aggregations

CharFilter (org.apache.lucene.analysis.CharFilter)41 StringReader (java.io.StringReader)40 TokenStream (org.apache.lucene.analysis.TokenStream)26 Tokenizer (org.apache.lucene.analysis.Tokenizer)10 MockTokenizer (org.apache.lucene.analysis.MockTokenizer)7 MappingCharFilter (org.apache.lucene.analysis.charfilter.MappingCharFilter)4 Normalizer2 (com.ibm.icu.text.Normalizer2)3 ArrayList (java.util.ArrayList)3 NormalizeCharMap (org.apache.lucene.analysis.charfilter.NormalizeCharMap)3 NGramTokenizer (org.apache.lucene.analysis.ngram.NGramTokenizer)3 HashMap (java.util.HashMap)2 Settings (org.elasticsearch.common.settings.Settings)2 Index (org.elasticsearch.index.Index)2 AnalysisICUPlugin (org.elasticsearch.plugin.analysis.icu.AnalysisICUPlugin)2 IOException (java.io.IOException)1 MockCharFilter (org.apache.lucene.analysis.MockCharFilter)1