
Example 11 with CharArraySet

Use of org.apache.lucene.analysis.util.CharArraySet in project omegat by omegat-org.

Class LucenePersianTokenizer, method getTokenStream.

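@SuppressWarnings("resource")
@Override
protected TokenStream getTokenStream(final String strOrig, final boolean stemsAllowed, final boolean stopWordsAllowed) throws IOException {
    if (stemsAllowed) {
        // Pass an empty set instead of the default Persian stop words when stop-word removal is off.
        CharArraySet stopWords = stopWordsAllowed ? PersianAnalyzer.getDefaultStopSet() : CharArraySet.EMPTY_SET;
        // The analyzer is not closed here because the caller still has to consume the returned stream;
        // @SuppressWarnings("resource") silences the corresponding resource-leak warning.
        PersianAnalyzer analyzer = new PersianAnalyzer(stopWords);
        return analyzer.tokenStream("", new StringReader(strOrig));
    } else {
        // Otherwise delegate to getStandardTokenStream, which is defined outside this excerpt.
        return getStandardTokenStream(strOrig);
    }
}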
Also used: CharArraySet (org.apache.lucene.analysis.util.CharArraySet), StringReader (java.io.StringReader), PersianAnalyzer (org.apache.lucene.analysis.fa.PersianAnalyzer)
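Example 11 passes either PersianAnalyzer's default stop set or CharArraySet.EMPTY_SET, so the analyzer is always constructed the same way and stop-word removal is switched off simply by giving it nothing to remove. The plain-Java sketch below mirrors that pattern without any Lucene dependency. ConditionalStopWords, its whitespace tokenizer and its three English stop words are hypothetical placeholders, not omegat's code and not PersianAnalyzer's actual stop list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical plain-Java analogue of the stopWordsAllowed switch in Example 11.
// No Lucene classes are used; the stop words are illustrative placeholders only.
final class ConditionalStopWords {
    private static final Set<String> DEFAULT_STOP_SET = Set.of("the", "and", "of");

    // Mirrors `stopWordsAllowed ? PersianAnalyzer.getDefaultStopSet() : CharArraySet.EMPTY_SET`:
    // returning an empty set keeps a single filtering path instead of branching on null.
    static Set<String> stopSet(boolean stopWordsAllowed) {
        return stopWordsAllowed ? DEFAULT_STOP_SET : Collections.emptySet();
    }

    // Splits on whitespace and drops every token found in the chosen stop set.
    static List<String> tokens(String text, boolean stopWordsAllowed) {
        Set<String> stopWords = stopSet(stopWordsAllowed);
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

Here tokens("the cat and the dog", true) yields [cat, dog], while passing false returns all five tokens unchanged.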

Example 12 with CharArraySet

Use of org.apache.lucene.analysis.util.CharArraySet in project Vidyavana by borsosl.

Class QueryAnalyzer, method createComponents.

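@Override
protected TokenStreamComponents createComponents(String fieldName) {
    // QueryTokenizer and SeparateQueryOperatorFilter are project classes, not part of Lucene.
    Tokenizer tokenizer = new QueryTokenizer();
    TokenStream filter = new SeparateQueryOperatorFilter(tokenizer);
    // Remove the Hungarian stop words "a", "az" ("the") and "és" ("and").
    // ignoreCase = false makes matching case-sensitive, so "Az" at the start of a sentence
    // is kept unless an earlier stage lowercases it.
    filter = new StopFilter(filter, new CharArraySet(Arrays.asList("a", "az", "és"), false));
    // The tokenizer is the source; the last filter in the chain produces the output.
    return new TokenStreamComponents(tokenizer, filter);
}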
Also used: CharArraySet (org.apache.lucene.analysis.util.CharArraySet), TokenStream (org.apache.lucene.analysis.TokenStream), StopFilter (org.apache.lucene.analysis.core.StopFilter), Tokenizer (org.apache.lucene.analysis.Tokenizer)
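Example 12 builds its stop set with ignoreCase = false, so lookups are case-sensitive. To show what a CharArraySet-style lookup involves, here is a plain-Java sketch with no Lucene dependency: a hypothetical SimpleCharArraySet that stores char[] keys in an open-addressing hash table and can be probed with a slice of a larger char buffer (Lucene's CharArraySet offers a similar contains(char[], int, int) overload), plus a hypothetical whitespace-based StopFilterDemo. Both are simplified illustrations, not Lucene's implementation or the Vidyavana analyzer, and the sample sentence "Az alma és a szilva" ("The apple and the plum") is made up for the demonstration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified stand-in for Lucene's CharArraySet: an open-addressing hash set
// of char[] keys that can be probed with a slice of a larger buffer, so a token filter can
// test a term without first allocating a String.
final class SimpleCharArraySet {
    private final char[][] keys;
    private final boolean ignoreCase;
    private int count;

    SimpleCharArraySet(Collection<String> words, boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
        // Power-of-two table at least twice the word count: always has free slots, no resizing needed.
        int size = 8;
        while (size < words.size() * 2) size <<= 1;
        keys = new char[size][];
        for (String w : words) add(w.toCharArray());
    }

    // Case folding is applied only when the set was built with ignoreCase = true.
    private char fold(char c) {
        return ignoreCase ? Character.toLowerCase(c) : c;
    }

    private int hash(char[] text, int off, int len) {
        int h = 0;
        for (int i = off; i < off + len; i++) h = 31 * h + fold(text[i]);
        return h;
    }

    private boolean sameChars(char[] key, char[] text, int off, int len) {
        if (key.length != len) return false;
        for (int i = 0; i < len; i++) {
            if (fold(key[i]) != fold(text[off + i])) return false;
        }
        return true;
    }

    private void add(char[] word) {
        int mask = keys.length - 1;
        int slot = hash(word, 0, word.length) & mask;
        while (keys[slot] != null) {
            if (sameChars(keys[slot], word, 0, word.length)) return; // duplicate
            slot = (slot + 1) & mask; // linear probing
        }
        keys[slot] = word;
        count++;
    }

    // Looks up text[off .. off + len) directly in the table.
    boolean contains(char[] text, int off, int len) {
        int mask = keys.length - 1;
        int slot = hash(text, off, len) & mask;
        while (keys[slot] != null) {
            if (sameChars(keys[slot], text, off, len)) return true;
            slot = (slot + 1) & mask;
        }
        return false;
    }

    int size() {
        return count;
    }
}

final class StopFilterDemo {
    // Splits on whitespace and keeps tokens that are not in the stop set.
    // Each token is checked as a slice of the input buffer; a String is created only for kept tokens.
    static List<String> filter(String text, SimpleCharArraySet stopWords) {
        char[] buf = text.toCharArray();
        List<String> kept = new ArrayList<>();
        int i = 0;
        while (i < buf.length) {
            while (i < buf.length && Character.isWhitespace(buf[i])) i++; // skip separators
            int start = i;
            while (i < buf.length && !Character.isWhitespace(buf[i])) i++; // scan one token
            int len = i - start;
            if (len > 0 && !stopWords.contains(buf, start, len)) {
                kept.add(new String(buf, start, len));
            }
        }
        return kept;
    }
}
```

With a case-sensitive set of "a", "az" and "és", StopFilterDemo.filter("Az alma és a szilva", set) keeps "Az", "alma" and "szilva"; built with ignoreCase = true, the same call also drops "Az".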

Aggregations

CharArraySet (org.apache.lucene.analysis.util.CharArraySet): 12
StringReader (java.io.StringReader): 6
TokenStream (org.apache.lucene.analysis.TokenStream): 5
Analyzer (org.apache.lucene.analysis.Analyzer): 4
StandardAnalyzer (org.apache.lucene.analysis.standard.StandardAnalyzer): 4
IOException (java.io.IOException): 3
Tokenizer (org.apache.lucene.analysis.Tokenizer): 3
StopFilter (org.apache.lucene.analysis.core.StopFilter): 3
CharTermAttribute (org.apache.lucene.analysis.tokenattributes.CharTermAttribute): 2
OException (com.orientechnologies.common.exception.OException): 1
OIndexException (com.orientechnologies.orient.core.index.OIndexException): 1
DataflowException (edu.uci.ics.texera.api.exception.DataflowException): 1
TexeraException (edu.uci.ics.texera.api.exception.TexeraException): 1
DataFlowException (edu.uci.ics.textdb.api.exception.DataFlowException): 1
File (java.io.File): 1
Constructor (java.lang.reflect.Constructor): 1
InvocationTargetException (java.lang.reflect.InvocationTargetException): 1
ArrayList (java.util.ArrayList): 1
ArabicAnalyzer (org.apache.lucene.analysis.ar.ArabicAnalyzer): 1
LowerCaseFilter (org.apache.lucene.analysis.core.LowerCaseFilter): 1