Example 6 with HyphenationTree

Use of org.apache.lucene.analysis.compound.hyphenation.HyphenationTree in the lucene-solr project by Apache.

From the class TestCompoundWordTokenFilter, method testEmptyTerm.

public void testEmptyTerm() throws Exception {
    // Dictionary of known sub-words for the dictionary-based filter.
    final CharArraySet dict = makeDictionary("a", "e", "i", "o", "u", "y", "bc", "def");
    Analyzer a = new Analyzer() {

        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            // KeywordTokenizer emits the whole input as a single token.
            Tokenizer tokenizer = new KeywordTokenizer();
            return new TokenStreamComponents(tokenizer, new DictionaryCompoundWordTokenFilter(tokenizer, dict));
        }
    };
    // An empty input must come back as exactly one empty term.
    checkOneTerm(a, "", "");
    a.close();

    // Load hyphenation patterns from the da_UTF8.xml test resource.
    InputSource is = new InputSource(getClass().getResource("da_UTF8.xml").toExternalForm());
    final HyphenationTree hyphenator = HyphenationCompoundWordTokenFilter.getHyphenationTree(is);
    Analyzer b = new Analyzer() {

        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer tokenizer = new KeywordTokenizer();
            TokenFilter filter = new HyphenationCompoundWordTokenFilter(tokenizer, hyphenator);
            return new TokenStreamComponents(tokenizer, filter);
        }
    };
    // Same check for the hyphenation-based filter.
    checkOneTerm(b, "", "");
    b.close();
}
Also used: CharArraySet (org.apache.lucene.analysis.CharArraySet), InputSource (org.xml.sax.InputSource), HyphenationTree (org.apache.lucene.analysis.compound.hyphenation.HyphenationTree), Analyzer (org.apache.lucene.analysis.Analyzer), KeywordTokenizer (org.apache.lucene.analysis.core.KeywordTokenizer), Tokenizer (org.apache.lucene.analysis.Tokenizer), MockTokenizer (org.apache.lucene.analysis.MockTokenizer), TokenFilter (org.apache.lucene.analysis.TokenFilter)
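
The test above checks that both compound-word filters pass an empty term through unchanged. As a rough illustration of why that is the natural outcome for a dictionary-based approach, here is a minimal plain-Java sketch that scans a term for dictionary sub-words. It is not Lucene's implementation: the class name CompoundSketch, the method decompose, and the minSubwordSize/maxSubwordSize parameters are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Lucene.
final class CompoundSketch {

    /**
     * Returns the original term followed by every dictionary word found
     * inside it, scanning start positions left to right.
     */
    static List<String> decompose(String term, Set<String> dictionary,
                                  int minSubwordSize, int maxSubwordSize) {
        List<String> tokens = new ArrayList<>();
        tokens.add(term); // the original term is always kept
        for (int start = 0; start < term.length(); start++) {
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && start + len <= term.length();
                 len++) {
                String candidate = term.substring(start, start + len);
                if (dictionary.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }
}
```

With an empty term the outer loop never runs, so the result contains only the empty term itself, which mirrors the single-term expectation in the test.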

Aggregations

HyphenationTree (org.apache.lucene.analysis.compound.hyphenation.HyphenationTree): 6
InputSource (org.xml.sax.InputSource): 5
CharArraySet (org.apache.lucene.analysis.CharArraySet): 4
Analyzer (org.apache.lucene.analysis.Analyzer): 2
MockTokenizer (org.apache.lucene.analysis.MockTokenizer): 2
TokenFilter (org.apache.lucene.analysis.TokenFilter): 2
Tokenizer (org.apache.lucene.analysis.Tokenizer): 2
KeywordTokenizer (org.apache.lucene.analysis.core.KeywordTokenizer): 2