Example 6 with JapaneseTokenizer

Use of org.apache.lucene.analysis.ja.JapaneseTokenizer in project elasticsearch by elastic.

Class KuromojiAnalysisTests, method testNumberFilterFactory.

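public void testNumberFilterFactory() throws Exception {
    TestAnalysis analysis = createTestAnalysis();
    // Look up the filter registered as "kuromoji_number" and check its factory type.
    TokenFilterFactory tokenFilter = analysis.tokenFilter.get("kuromoji_number");
    assertThat(tokenFilter, instanceOf(KuromojiNumberFilterFactory.class));
    // The kanji numeral 十万二千五百 is expected to come out as the single token "102500".
    String source = "本日十万二千五百円のワインを買った";
    String[] expected = new String[] { "本日", "102500", "円", "の", "ワイン", "を", "買っ", "た" };
    // No user dictionary (null), discard punctuation (true), SEARCH segmentation mode.
    Tokenizer tokenizer = new JapaneseTokenizer(null, true, JapaneseTokenizer.Mode.SEARCH);
    tokenizer.setReader(new StringReader(source));
    // Wrap the tokenizer with the filter and compare the resulting token stream.
    assertSimpleTSOutput(tokenFilter.create(tokenizer), expected);
}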
Also used : StringReader(java.io.StringReader) JapaneseTokenizer(org.apache.lucene.analysis.ja.JapaneseTokenizer) Tokenizer(org.apache.lucene.analysis.Tokenizer)
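
The expected tokens in Example 6 show the kanji numeral 十万二千五百 becoming "102500". The following is a simplified, self-contained sketch of that kind of conversion. It is not Lucene's implementation and does not use the Lucene API: the class name KanjiNumberSketch and its parse method are hypothetical, and the sketch assumes the input contains only kanji digits and the multipliers 十, 百, 千, 万, 億 and 兆.

```java
public class KanjiNumberSketch {
    // Kanji digits; the index of each character is its numeric value (assumption: no other digit forms).
    private static final String DIGITS = "〇一二三四五六七八九";

    /** Converts a kanji numeral such as 十万二千五百 to a long (illustrative only). */
    public static long parse(String s) {
        long total = 0;   // sum of the completed large groups (万, 億, 兆)
        long section = 0; // value accumulated inside the current group (below 10,000)
        long digit = 0;   // digits read so far that still wait for a multiplier
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int d = DIGITS.indexOf(c);
            if (d >= 0) {
                // Consecutive digits are read positionally, e.g. 一〇二 -> 102.
                digit = digit * 10 + d;
            } else if (c == '十' || c == '百' || c == '千') {
                long unit = c == '十' ? 10 : c == '百' ? 100 : 1000;
                // A bare multiplier counts as one unit, e.g. 十 -> 10.
                section += (digit == 0 ? 1 : digit) * unit;
                digit = 0;
            } else if (c == '万' || c == '億' || c == '兆') {
                long unit = c == '万' ? 10_000L : c == '億' ? 100_000_000L : 1_000_000_000_000L;
                // A large multiplier closes the current group and scales it.
                total += (section + digit) * unit;
                section = 0;
                digit = 0;
            } else {
                throw new IllegalArgumentException("Not a kanji numeral character: " + c);
            }
        }
        return total + section + digit;
    }

    public static void main(String[] args) {
        System.out.println(parse("十万二千五百")); // prints 102500
    }
}
```

The sketch works on a whole string at once. A token filter would additionally have to decide which tokens are numerals and merge adjacent ones, which is left out here.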

Example 7 with JapaneseTokenizer

Use of org.apache.lucene.analysis.ja.JapaneseTokenizer in project elasticsearch by elastic.

Class KuromojiTokenizerFactory, method create.

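@Override
public Tokenizer create() {
    // Build the tokenizer from the factory's configured user dictionary, punctuation flag and mode.
    JapaneseTokenizer t = new JapaneseTokenizer(userDictionary, discartPunctuation, mode);
    int nBestCost = this.nBestCost;
    if (nBestExamples != null) {
        // Use the larger of the configured cost and the cost computed from the examples.
        nBestCost = Math.max(nBestCost, t.calcNBestCost(nBestExamples));
    }
    t.setNBestCost(nBestCost);
    return t;
}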
Also used : JapaneseTokenizer(org.apache.lucene.analysis.ja.JapaneseTokenizer)

Aggregations

JapaneseTokenizer (org.apache.lucene.analysis.ja.JapaneseTokenizer): 7
StringReader (java.io.StringReader): 6
Tokenizer (org.apache.lucene.analysis.Tokenizer): 5
JapaneseAnalyzer (org.apache.lucene.analysis.ja.JapaneseAnalyzer): 1
CharArraySet (org.apache.lucene.analysis.util.CharArraySet): 1