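Use of org.apache.lucene.analysis.util.CharArraySet in project omegat by omegat-org.
Class LucenePersianTokenizer, method getTokenStream. When stemming is allowed, the Persian analyzer is built with either PersianAnalyzer's default stop set or CharArraySet.EMPTY_SET, depending on stopWordsAllowed. Otherwise a standard token stream is returned.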
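import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.fa.PersianAnalyzer;
import org.apache.lucene.analysis.util.CharArraySet;
// The analyzer is not closed here, which is why the "resource" warning is suppressed.
@SuppressWarnings("resource")
@Override
protected TokenStream getTokenStream(final String strOrig, final boolean stemsAllowed,
        final boolean stopWordsAllowed) throws IOException {
    if (stemsAllowed) {
        // Use Persian's default stop words, or an empty set (no stop-word removal).
        CharArraySet stopWords = stopWordsAllowed ? PersianAnalyzer.getDefaultStopSet()
                : CharArraySet.EMPTY_SET;
        PersianAnalyzer analyzer = new PersianAnalyzer(stopWords);
        return analyzer.tokenStream("", new StringReader(strOrig));
    } else {
        // getStandardTokenStream is defined elsewhere in omegat (not shown).
        return getStandardTokenStream(strOrig);
    }
}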
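The stopWordsAllowed toggle above comes down to choosing between a populated stop set and an empty one. The sketch below imitates that choice in plain Java so it runs without Lucene on the classpath. It is not Lucene's API: `StopSetToggleSketch`, `tokensAfterStopping` and the three English stand-in words are hypothetical (the real default comes from `PersianAnalyzer.getDefaultStopSet()`), and tokenization is reduced to splitting on whitespace, which is far simpler than the analyzer chain.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Plain-Java sketch (not Lucene): mirrors picking either a default stop set or
// an empty one, as getTokenStream does with CharArraySet.EMPTY_SET.
final class StopSetToggleSketch {

    // Hypothetical stand-in for PersianAnalyzer.getDefaultStopSet().
    static final Set<String> DEFAULT_STOP_SET = Set.of("the", "and", "of");

    // Splits text on whitespace and drops tokens found in the chosen stop set.
    // With stopWordsAllowed == false the set is empty, so every token survives.
    static List<String> tokensAfterStopping(String text, boolean stopWordsAllowed) {
        Set<String> stopWords = stopWordsAllowed ? DEFAULT_STOP_SET : Collections.<String>emptySet();
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(tokensAfterStopping("the cat and the hat", true));  // [cat, hat]
        System.out.println(tokensAfterStopping("the cat and the hat", false)); // [the, cat, and, the, hat]
    }
}
```

Using an empty set rather than skipping the filter keeps a single code path, which is the same design the snippet above follows with `CharArraySet.EMPTY_SET`.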
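Use of org.apache.lucene.analysis.util.CharArraySet in project Vidyavana by borsosl.
Class QueryAnalyzer, method createComponents. A StopFilter backed by a case-sensitive CharArraySet removes the Hungarian stop words "a", "az" and "és" from the query token stream.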
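@Override
protected TokenStreamComponents createComponents(String fieldName) {
    // QueryTokenizer and SeparateQueryOperatorFilter are project-specific classes.
    Tokenizer tokenizer = new QueryTokenizer();
    TokenStream filter = new SeparateQueryOperatorFilter(tokenizer);
    // Drop the Hungarian stop words "a", "az" (the) and "és" (and);
    // the second CharArraySet argument (false) makes matching case-sensitive.
    filter = new StopFilter(filter, new CharArraySet(Arrays.asList("a", "az", "és"), false));
    return new TokenStreamComponents(tokenizer, filter);
}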
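Because the second `CharArraySet` argument is `false`, the lookup is case-sensitive: the filter above drops `az` but would keep `Az` unless something earlier in the chain lower-cases tokens (whether `QueryTokenizer` or `SeparateQueryOperatorFilter` does so is not shown here). The following plain-Java sketch stands in for Lucene rather than calling it, to make that difference visible. `CaseSensitiveStopSketch` and `removeStopWords` are hypothetical names, and the sample tokens are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java sketch (not Lucene's StopFilter/CharArraySet) of stop-word removal
// with an ignoreCase switch like CharArraySet's second constructor argument.
final class CaseSensitiveStopSketch {

    // Same words as the QueryAnalyzer example; "\u00e9s" is the Hungarian word for "and".
    static final List<String> HUNGARIAN_STOP_WORDS = Arrays.asList("a", "az", "\u00e9s");

    // Drops tokens contained in stopWords. When ignoreCase is true, both the stop
    // words and each token are lower-cased before comparison; otherwise matching is exact.
    static List<String> removeStopWords(List<String> tokens, List<String> stopWords, boolean ignoreCase) {
        Set<String> set = new HashSet<>();
        for (String w : stopWords) {
            set.add(ignoreCase ? w.toLowerCase(Locale.ROOT) : w);
        }
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String key = ignoreCase ? token.toLowerCase(Locale.ROOT) : token;
            if (!set.contains(key)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Az", "alma", "\u00e9s", "a", "szilva");
        // Case-sensitive: the capitalized "Az" survives.
        System.out.println(removeStopWords(tokens, HUNGARIAN_STOP_WORDS, false));
        // Case-insensitive: "Az" is removed as well.
        System.out.println(removeStopWords(tokens, HUNGARIAN_STOP_WORDS, true));
    }
}
```

If queries can start with a capitalized article, either lower-casing earlier in the chain or passing `true` as the second `CharArraySet` argument would catch it.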