Example 1 with CharFilterFactory

Use of org.apache.lucene.analysis.util.CharFilterFactory in the jackrabbit-oak project by apache.

Class TokenizerChain, method toString.

@Override
public String toString() {
    StringBuilder sb = new StringBuilder("TokenizerChain(");
    for (CharFilterFactory filter : charFilters) {
        sb.append(filter);
        sb.append(", ");
    }
    sb.append(tokenizer);
    for (TokenFilterFactory filter : filters) {
        sb.append(", ");
        sb.append(filter);
    }
    sb.append(')');
    return sb.toString();
}
Also used: CharFilterFactory (org.apache.lucene.analysis.util.CharFilterFactory), TokenFilterFactory (org.apache.lucene.analysis.util.TokenFilterFactory)
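
To make the string layout produced by the toString method above easier to see, here is a minimal standalone sketch. It does not use Lucene: the class name ChainDescriptionSketch and the method describe are hypothetical, the components are plain strings, and the null checks are an added assumption that the original method does not make.

```java
import java.util.List;

class ChainDescriptionSketch {

    // Mirrors the layout of the toString above: char filters first, then the
    // tokenizer, then the token filters, all comma-separated.
    // The null checks are an assumption of this sketch, not part of the original.
    static String describe(List<?> charFilters, Object tokenizer, List<?> filters) {
        StringBuilder sb = new StringBuilder("TokenizerChain(");
        if (charFilters != null) {
            for (Object cf : charFilters) {
                sb.append(cf).append(", ");
            }
        }
        sb.append(tokenizer);
        if (filters != null) {
            for (Object f : filters) {
                sb.append(", ").append(f);
            }
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // prints "TokenizerChain(htmlStrip, standard, lowercase, stop)"
        System.out.println(describe(List.of("htmlStrip"), "standard", List.of("lowercase", "stop")));
    }
}
```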

Example 2 with CharFilterFactory

Use of org.apache.lucene.analysis.util.CharFilterFactory in the jackrabbit-oak project by apache.

Class TokenizerChain, method initReader.

@Override
public Reader initReader(String fieldName, Reader reader) {
    if (charFilters != null && charFilters.length > 0) {
        Reader cs = reader;
        for (CharFilterFactory charFilter : charFilters) {
            cs = charFilter.create(cs);
        }
        reader = cs;
    }
    return reader;
}
Also used: CharFilterFactory (org.apache.lucene.analysis.util.CharFilterFactory), Reader (java.io.Reader)
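
The initReader method above wraps the incoming Reader once per char filter, in array order, so the first filter sits closest to the raw input. The following sketch illustrates that wrapping pattern with java.io only. Everything in it is hypothetical: ReaderChainSketch, replace, wrapAll and drain are illustration names, and UnaryOperator<Reader> merely stands in for CharFilterFactory.create(Reader).

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.function.UnaryOperator;

class ReaderChainSketch {

    // Hypothetical stand-in for a char filter factory: wraps a Reader so that
    // every occurrence of 'from' is read back as 'to'.
    static UnaryOperator<Reader> replace(char from, char to) {
        return in -> new FilterReader(in) {
            @Override
            public int read() throws IOException {
                int c = super.read();
                return c == from ? to : c;
            }

            @Override
            public int read(char[] buf, int off, int len) throws IOException {
                int n = super.read(buf, off, len);
                for (int i = off; i < off + n; i++) {
                    if (buf[i] == from) {
                        buf[i] = to;
                    }
                }
                return n;
            }
        };
    }

    // Same loop shape as initReader: each factory wraps the result of the previous one.
    static Reader wrapAll(Reader reader, List<UnaryOperator<Reader>> factories) {
        Reader current = reader;
        for (UnaryOperator<Reader> factory : factories) {
            current = factory.apply(current);
        }
        return current;
    }

    // Reads the whole stream into a String and closes it.
    static String drain(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = reader) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        UnaryOperator<Reader> aToB = replace('a', 'b');
        UnaryOperator<Reader> bToC = replace('b', 'c');
        // a->b runs first, then b->c sees its output: prints "cc"
        System.out.println(drain(wrapAll(new StringReader("ab"), List.of(aToB, bToC))));
        // b->c runs first, then a->b: prints "bc"
        System.out.println(drain(wrapAll(new StringReader("ab"), List.of(bToC, aToB))));
    }
}
```

The two calls in main differ only in the order of the list, which is why the order of the charFilters array matters for the analyzer's output.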

Example 3 with CharFilterFactory

Use of org.apache.lucene.analysis.util.CharFilterFactory in the jackrabbit-oak project by apache.

Class NodeStateAnalyzerFactory, method composeAnalyzer.

private Analyzer composeAnalyzer(NodeState state) {
    TokenizerFactory tf = loadTokenizer(state.getChildNode(LuceneIndexConstants.ANL_TOKENIZER));
    CharFilterFactory[] cfs = loadCharFilterFactories(state.getChildNode(LuceneIndexConstants.ANL_CHAR_FILTERS));
    TokenFilterFactory[] tffs = loadTokenFilterFactories(state.getChildNode(LuceneIndexConstants.ANL_FILTERS));
    return new TokenizerChain(cfs, tf, tffs);
}
Also used: TokenizerFactory (org.apache.lucene.analysis.util.TokenizerFactory), TokenizerChain (org.apache.jackrabbit.oak.plugins.index.lucene.util.TokenizerChain), CharFilterFactory (org.apache.lucene.analysis.util.CharFilterFactory), TokenFilterFactory (org.apache.lucene.analysis.util.TokenFilterFactory)

Example 4 with CharFilterFactory

Use of org.apache.lucene.analysis.util.CharFilterFactory in the lucene-solr project by apache.

Class CustomAnalyzer, method initReaderForNormalization.

@Override
protected Reader initReaderForNormalization(String fieldName, Reader reader) {
    for (CharFilterFactory charFilter : charFilters) {
        if (charFilter instanceof MultiTermAwareComponent) {
            charFilter = (CharFilterFactory) ((MultiTermAwareComponent) charFilter).getMultiTermComponent();
            reader = charFilter.create(reader);
        }
    }
    return reader;
}
Also used: MultiTermAwareComponent (org.apache.lucene.analysis.util.MultiTermAwareComponent), CharFilterFactory (org.apache.lucene.analysis.util.CharFilterFactory)
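
The initReaderForNormalization method above applies only those char filters that declare themselves multi-term aware, and it uses the component each one returns rather than the filter itself. Below is a simplified sketch of that selection logic. It is not the Lucene API: it works on strings instead of Readers, and the names NormalizationSketch, Step, MultiTermAware, LowerCase and StripComment are all hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

class NormalizationSketch {

    // Hypothetical stand-in for a char filter factory: a text transformation.
    interface Step extends UnaryOperator<String> {}

    // Hypothetical stand-in for MultiTermAwareComponent: exposes the
    // transformation that is safe to apply during normalization.
    interface MultiTermAware {
        Step multiTermStep();
    }

    // Lower-casing is safe for normalization, so it returns itself.
    static final class LowerCase implements Step, MultiTermAware {
        @Override
        public String apply(String s) {
            return s.toLowerCase(Locale.ROOT);
        }

        @Override
        public Step multiTermStep() {
            return this;
        }
    }

    // Drops everything from '#' onward; not multi-term aware, so normalization skips it.
    static final class StripComment implements Step {
        @Override
        public String apply(String s) {
            int i = s.indexOf('#');
            return i < 0 ? s : s.substring(0, i);
        }
    }

    // Same shape as the loop above: skip steps that are not multi-term aware,
    // and apply the component returned by the ones that are.
    static String normalize(String text, List<Step> steps) {
        String out = text;
        for (Step step : steps) {
            if (step instanceof MultiTermAware) {
                out = ((MultiTermAware) step).multiTermStep().apply(out);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // StripComment is skipped, LowerCase is applied: prints "foo#bar"
        System.out.println(normalize("Foo#Bar", List.of(new StripComment(), new LowerCase())));
    }
}
```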

Example 5 with CharFilterFactory

Use of org.apache.lucene.analysis.util.CharFilterFactory in the lucene-solr project by apache.

Class TestFactories, method doTestTokenizer.

private void doTestTokenizer(String tokenizer) throws IOException {
    Class<? extends TokenizerFactory> factoryClazz = TokenizerFactory.lookupClass(tokenizer);
    TokenizerFactory factory = (TokenizerFactory) initialize(factoryClazz);
    if (factory != null) {
        // if it implements MultiTermAware, sanity check its impl
        if (factory instanceof MultiTermAwareComponent) {
            AbstractAnalysisFactory mtc = ((MultiTermAwareComponent) factory).getMultiTermComponent();
            assertNotNull(mtc);
            // it's not ok to return e.g. a charfilter here: but a tokenizer could wrap a filter around it
            assertFalse(mtc instanceof CharFilterFactory);
        }
        // beast it just a little; it shouldn't throw exceptions here
        // (it should have thrown them in initialize)
        Analyzer a = new FactoryAnalyzer(factory, null, null);
        checkRandomData(random(), a, 20, 20, false, false);
        a.close();
    }
}
Also used: MultiTermAwareComponent (org.apache.lucene.analysis.util.MultiTermAwareComponent), TokenizerFactory (org.apache.lucene.analysis.util.TokenizerFactory), CharFilterFactory (org.apache.lucene.analysis.util.CharFilterFactory), AbstractAnalysisFactory (org.apache.lucene.analysis.util.AbstractAnalysisFactory), Analyzer (org.apache.lucene.analysis.Analyzer)

Aggregations

CharFilterFactory (org.apache.lucene.analysis.util.CharFilterFactory): 26
TokenFilterFactory (org.apache.lucene.analysis.util.TokenFilterFactory): 16
TokenizerFactory (org.apache.lucene.analysis.util.TokenizerFactory): 12
Analyzer (org.apache.lucene.analysis.Analyzer): 7
MultiTermAwareComponent (org.apache.lucene.analysis.util.MultiTermAwareComponent): 6
TokenizerChain (org.apache.solr.analysis.TokenizerChain): 5
Reader (java.io.Reader): 4
ArrayList (java.util.ArrayList): 4
HashMap (java.util.HashMap): 4
AbstractAnalysisFactory (org.apache.lucene.analysis.util.AbstractAnalysisFactory): 4
StringReader (java.io.StringReader): 3
Map (java.util.Map): 3
TokenStream (org.apache.lucene.analysis.TokenStream): 3
Tokenizer (org.apache.lucene.analysis.Tokenizer): 3
ResourceLoaderAware (org.apache.lucene.analysis.util.ResourceLoaderAware): 3
SolrException (org.apache.solr.common.SolrException): 3
JsonElement (com.google.gson.JsonElement): 2
JsonObject (com.google.gson.JsonObject): 2
IOException (java.io.IOException): 2
KeywordAnalyzer (org.apache.lucene.analysis.core.KeywordAnalyzer): 2