Use of org.apache.lucene.analysis.ja.JapaneseTokenizer in project elasticsearch by elastic.
The class KuromojiAnalysisTests, method testNumberFilterFactory.
public void testNumberFilterFactory() throws Exception {
    TestAnalysis analysis = createTestAnalysis();
    // Look up the filter registered under the name "kuromoji_number".
    TokenFilterFactory tokenFilter = analysis.tokenFilter.get("kuromoji_number");
    assertThat(tokenFilter, instanceOf(KuromojiNumberFilterFactory.class));
    String source = "本日十万二千五百円のワインを買った";
    // The kanji numeral 十万二千五百 is expected to be normalized to "102500".
    String[] expected = new String[] { "本日", "102500", "円", "の", "ワイン", "を", "買っ", "た" };
    // No user dictionary, discard punctuation, SEARCH segmentation mode.
    Tokenizer tokenizer = new JapaneseTokenizer(null, true, JapaneseTokenizer.Mode.SEARCH);
    tokenizer.setReader(new StringReader(source));
    assertSimpleTSOutput(tokenFilter.create(tokenizer), expected);
}
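The test above expects the kanji numeral 十万二千五百 to come out as the token "102500". The following is a minimal, standard-library-only sketch of that kind of normalization. It is not Lucene's or Elasticsearch's implementation; the class name KanjiNumberSketch and the method parse are hypothetical names chosen for illustration, and the sketch handles only plain integers built from 〇 to 九, 十, 百, 千, 万 and 億.

```java
import java.util.Map;

// Hypothetical illustration only; not the code behind the kuromoji_number filter.
final class KanjiNumberSketch {
    private static final Map<Character, Integer> DIGITS = Map.of(
            '〇', 0, '一', 1, '二', 2, '三', 3, '四', 4,
            '五', 5, '六', 6, '七', 7, '八', 8, '九', 9);

    static long parse(String kanji) {
        long total = 0;   // sum of completed 万 / 億 groups
        long section = 0; // value accumulated inside the current group (below 10,000)
        long digit = 0;   // pending digit waiting for a multiplier
        for (char c : kanji.toCharArray()) {
            Integer d = DIGITS.get(c);
            if (d != null) {
                digit = d;
            } else if (c == '十' || c == '百' || c == '千') {
                long unit = (c == '十') ? 10 : (c == '百') ? 100 : 1000;
                // A bare multiplier such as 十 means 1 x 10.
                section += (digit == 0 ? 1 : digit) * unit;
                digit = 0;
            } else if (c == '万' || c == '億') {
                long unit = (c == '万') ? 10_000L : 100_000_000L;
                long group = section + digit;
                total += (group == 0 ? 1 : group) * unit;
                section = 0;
                digit = 0;
            } else {
                throw new IllegalArgumentException("Not a kanji numeral: " + c);
            }
        }
        return total + section + digit;
    }

    public static void main(String[] args) {
        System.out.println(parse("十万二千五百")); // prints 102500
    }
}
```

Tracing the input: 十 gives a section of 10, 万 closes that group as 100,000, 二千 adds 2,000 and 五百 adds 500, for a total of 102,500.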
Use of org.apache.lucene.analysis.ja.JapaneseTokenizer in project elasticsearch by elastic.
The class KuromojiTokenizerFactory, method create.
@Override
public Tokenizer create() {
    JapaneseTokenizer t = new JapaneseTokenizer(userDictionary, discartPunctuation, mode);
    // Start from the configured cost; example text may only raise it, never lower it.
    int nBestCost = this.nBestCost;
    if (nBestExamples != null) {
        nBestCost = Math.max(nBestCost, t.calcNBestCost(nBestExamples));
    }
    t.setNBestCost(nBestCost);
    return t;
}
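The cost selection in create() can be isolated as a small standalone sketch. The names NBestCostSketch, resolveNBestCost and costFromExamples are hypothetical; the ToIntFunction parameter stands in for the tokenizer's calcNBestCost call so the logic can run without Lucene on the classpath, and the example string and the cost of 2000 are made-up values.

```java
import java.util.function.ToIntFunction;

// Hypothetical illustration of the max-of-configured-and-derived cost rule.
final class NBestCostSketch {
    static int resolveNBestCost(int configuredCost, String examples,
                                ToIntFunction<String> costFromExamples) {
        if (examples == null) {
            return configuredCost; // no examples: keep the configured value
        }
        // With examples, use whichever cost is larger.
        return Math.max(configuredCost, costFromExamples.applyAsInt(examples));
    }

    public static void main(String[] args) {
        // Stand-in for calcNBestCost; always reports a cost of 2000 (assumed value).
        ToIntFunction<String> fakeCost = s -> 2000;
        System.out.println(resolveNBestCost(1000, "/箱根山-箱根", fakeCost)); // prints 2000
        System.out.println(resolveNBestCost(3000, "/箱根山-箱根", fakeCost)); // prints 3000
        System.out.println(resolveNBestCost(1000, null, fakeCost));          // prints 1000
    }
}
```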