Example 1 with JapaneseTokenizerFactory

Use of org.apache.lucene.analysis.ja.JapaneseTokenizerFactory in the Apache Stanbol project.

Source: the activate method of the class KuromojiNlpEngine.

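/**
 * Activates the engine and reads its properties. Initialises the Kuromoji
 * {@link JapaneseTokenizerFactory} and the token filter factories (base form,
 * part-of-speech stop filter and katakana stemming) used to analyse Japanese text.
 *
 * @param ce the {@link org.osgi.service.component.ComponentContext}
 */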
@Activate
protected void activate(ComponentContext ce) throws ConfigurationException, IOException {
    log.info("activating Kuromoji tokenizing engine");
    super.activate(ce);
    // Initialise the Solr ResourceLoader used to initialise the components. Resources are
    // looked up first via the classloader of KuromojiNlpEngine, then via the
    // commons.solr.core classloader and finally via the parentResourceLoader (if present).
    resourceLoader = new StanbolResourceLoader(KuromojiNlpEngine.class.getClassLoader(), new StanbolResourceLoader(parentResourceLoader));
    tokenizerFactory = new JapaneseTokenizerFactory(TOKENIZER_FACTORY_CONFIG);
    ((ResourceLoaderAware) tokenizerFactory).inform(resourceLoader);
    // base form filter
    TokenFilterFactory baseFormFilterFactory = new JapaneseBaseFormFilterFactory(BASE_FORM_FILTER_CONFIG);
    filterFactories.add(baseFormFilterFactory);
    // POS filter
    TokenFilterFactory posFilterFactory = new JapanesePartOfSpeechStopFilterFactory(POS_FILTER_CONFIG);
    ((ResourceLoaderAware) posFilterFactory).inform(resourceLoader);
    filterFactories.add(posFilterFactory);
    // Stemming
    TokenFilterFactory stemmFilterFactory = new JapaneseKatakanaStemFilterFactory(STEMM_FILTER_CONFIG);
    filterFactories.add(stemmFilterFactory);
}
Also used:
  StanbolResourceLoader (org.apache.stanbol.commons.solr.utils.StanbolResourceLoader)
  JapaneseTokenizerFactory (org.apache.lucene.analysis.ja.JapaneseTokenizerFactory)
  JapanesePartOfSpeechStopFilterFactory (org.apache.lucene.analysis.ja.JapanesePartOfSpeechStopFilterFactory)
  JapaneseKatakanaStemFilterFactory (org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilterFactory)
  ResourceLoaderAware (org.apache.lucene.analysis.util.ResourceLoaderAware)
  JapaneseBaseFormFilterFactory (org.apache.lucene.analysis.ja.JapaneseBaseFormFilterFactory)
  TokenFilterFactory (org.apache.lucene.analysis.util.TokenFilterFactory)
  Activate (org.apache.felix.scr.annotations.Activate)
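
The activate method relies on a two-step pattern: each factory is first constructed from a configuration map, and the ones that implement ResourceLoaderAware (here the tokenizer factory and the part-of-speech stop filter factory) are then informed with a resource loader that falls back from one classloader to the next. The sketch below imitates that pattern using only the Java standard library. SimpleResourceLoader, DelegatingLoader, LoaderAware, StopTagsFactory, the file name hypothetical-stoptags.txt and the ASCII tag names are hypothetical stand-ins, not the Lucene or Stanbol API, and the fallback order is assumed from the comments in activate().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of the "construct, then inform with a resource loader" set-up used in activate().
// All types below are hypothetical stand-ins, NOT the Lucene/Stanbol classes.
public class ResourceLoaderSketch {

    interface SimpleResourceLoader {
        InputStream openResource(String name) throws IOException;
    }

    // Tries its own classloader first and falls back to an optional parent loader.
    static final class DelegatingLoader implements SimpleResourceLoader {
        private final ClassLoader classLoader;
        private final SimpleResourceLoader parent; // null if there is no parent

        DelegatingLoader(ClassLoader classLoader, SimpleResourceLoader parent) {
            this.classLoader = classLoader;
            this.parent = parent;
        }

        @Override
        public InputStream openResource(String name) throws IOException {
            InputStream in = classLoader.getResourceAsStream(name);
            if (in != null) {
                return in;
            }
            if (parent != null) {
                return parent.openResource(name);
            }
            throw new IOException("resource not found: " + name);
        }
    }

    // Hypothetical counterpart of ResourceLoaderAware: called once after construction.
    interface LoaderAware {
        void inform(SimpleResourceLoader loader) throws IOException;
    }

    // Loads one stop tag per line, skipping blank lines and '#' comments.
    static final class StopTagsFactory implements LoaderAware {
        private final String tagsFile;
        private final Set<String> stopTags = new LinkedHashSet<>();

        StopTagsFactory(String tagsFile) {
            this.tagsFile = tagsFile; // the constructor only records configuration
        }

        @Override
        public void inform(SimpleResourceLoader loader) throws IOException {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(loader.openResource(tagsFile), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    line = line.trim();
                    if (!line.isEmpty() && !line.startsWith("#")) {
                        stopTags.add(line);
                    }
                }
            }
        }

        Set<String> stopTags() {
            return stopTags;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for parentResourceLoader: serves one in-memory file.
        SimpleResourceLoader parent = name -> {
            if (name.equals("hypothetical-stoptags.txt")) {
                return new ByteArrayInputStream(
                        "# hypothetical POS stop tags\nparticle-case\nparticle-binding\n"
                                .getBytes(StandardCharsets.UTF_8));
            }
            throw new IOException("resource not found: " + name);
        };
        // The classloader does not contain the file, so the lookup falls back to the parent.
        SimpleResourceLoader loader =
                new DelegatingLoader(ResourceLoaderSketch.class.getClassLoader(), parent);

        StopTagsFactory factory = new StopTagsFactory("hypothetical-stoptags.txt");
        factory.inform(loader);
        System.out.println(factory.stopTags()); // prints [particle-case, particle-binding]
    }
}
```

Keeping file loading out of the constructor is what allows the factories to be created first and handed an environment-specific (here, OSGi-aware) loader afterwards.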

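activate() only registers the filter factories in filterFactories; the analysis itself happens elsewhere in the engine. Assuming the filters are applied in the order they were added (base form, then part-of-speech stop, then katakana stemming), the effect of the chain can be sketched as below. Token, BASE_FORM, posStop, katakanaStem and analyze are hypothetical, list-based simplifications: Lucene's real filters wrap a streaming TokenStream rather than copying lists, the tags "noun", "particle" and "verb" are ASCII placeholders for Kuromoji's Japanese part-of-speech tags, and the katakana rule (drop a trailing U+30FC from katakana words longer than a minimum length) is only loosely modelled on JapaneseKatakanaStemFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical, list-based sketch of applying token filters in registration order.
public class FilterChainSketch {

    static final class Token {
        final String surface;
        final String baseForm; // null when the base form equals the surface form
        final String pos;      // ASCII placeholder for a Kuromoji part-of-speech tag

        Token(String surface, String baseForm, String pos) {
            this.surface = surface;
            this.baseForm = baseForm;
            this.pos = pos;
        }
    }

    // Replaces an inflected surface form with its dictionary base form.
    static final UnaryOperator<List<Token>> BASE_FORM = in -> {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            out.add(t.baseForm == null ? t : new Token(t.baseForm, null, t.pos));
        }
        return out;
    };

    // Drops tokens whose part-of-speech tag is in the stop set.
    static UnaryOperator<List<Token>> posStop(Set<String> stopTags) {
        return in -> {
            List<Token> out = new ArrayList<>();
            for (Token t : in) {
                if (!stopTags.contains(t.pos)) {
                    out.add(t);
                }
            }
            return out;
        };
    }

    // Strips a trailing prolonged sound mark (U+30FC) from katakana words longer than minLength.
    static UnaryOperator<List<Token>> katakanaStem(int minLength) {
        return in -> {
            List<Token> out = new ArrayList<>();
            for (Token t : in) {
                String s = t.surface;
                boolean stem = s.length() > minLength && s.endsWith("\u30FC") && isKatakana(s);
                out.add(stem ? new Token(s.substring(0, s.length() - 1), null, t.pos) : t);
            }
            return out;
        };
    }

    static boolean isKatakana(String s) {
        for (char c : s.toCharArray()) {
            if (Character.UnicodeBlock.of(c) != Character.UnicodeBlock.KATAKANA) {
                return false;
            }
        }
        return true;
    }

    // Feeds the output of each filter into the next one, in list order.
    static List<Token> analyze(List<Token> tokens, List<UnaryOperator<List<Token>>> filters) {
        List<Token> current = tokens;
        for (UnaryOperator<List<Token>> filter : filters) {
            current = filter.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
                // "konpyuuta" in katakana, ending in the long-vowel mark
                new Token("\u30b3\u30f3\u30d4\u30e5\u30fc\u30bf\u30fc", null, "noun"),
                // the object particle "wo"
                new Token("\u3092", null, "particle"),
                // inflected verb "tsukat" with base form "tsukau"
                new Token("\u4f7f\u3063", "\u4f7f\u3046", "verb"));
        List<UnaryOperator<List<Token>>> filters =
                Arrays.asList(BASE_FORM, posStop(Set.of("particle")), katakanaStem(4));
        for (Token t : analyze(tokens, filters)) {
            System.out.println(t.surface);
        }
    }
}
```

Registration order matters in such a chain: the base-form filter has to run before any filter that inspects surface forms, which is consistent with the order used in activate().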