Use of opennlp.tools.tokenize.SimpleTokenizer in the Apache Stanbol project.
Class OpenNLP, method getTokenizer:
/**
 * Getter for the Tokenizer of a given language. This first tries to
 * create a {@link TokenizerME} instance if the required
 * {@link TokenizerModel} for the passed language is available. If such a
 * model is not available (or cannot be loaded), it returns the
 * {@link SimpleTokenizer} instance.
 * @param language the language, or <code>null</code> to get the
 * {@link SimpleTokenizer}
 * @return the {@link Tokenizer} for the passed language.
 */
public Tokenizer getTokenizer(String language) {
    Tokenizer tokenizer = null;
    if (language != null) {
        try {
            TokenizerModel model = getTokenizerModel(language);
            if (model != null) {
                // a model for this language exists: use the maximum-entropy tokenizer
                tokenizer = new TokenizerME(model);
            }
        } catch (IOException e) {
            // also covers InvalidFormatException, which extends IOException
            log.warn("Unable to load Tokenizer Model for " + language
                    + ": will use Simple Tokenizer instead", e);
        }
    }
    if (tokenizer == null) {
        // no language, no model, or loading failed: fall back to the shared rule-based tokenizer
        log.debug("Use Simple Tokenizer for language {}", language);
        tokenizer = SimpleTokenizer.INSTANCE;
    } else {
        log.debug("Use ME Tokenizer for language {}", language);
    }
    return tokenizer;
}
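The fallback logic above can be tried without OpenNLP on the classpath. The following is a minimal, dependency-free sketch of the same "model if available, otherwise a shared simple tokenizer" pattern. All names here (`TokenizerFallbackSketch`, the nested `Tokenizer` interface, `SIMPLE`, `registerModel`, `loadModel`) are hypothetical stand-ins, not OpenNLP or Stanbol APIs: a map plays the role of `getTokenizerModel`, and a missing entry plays the role of an unavailable model.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, dependency-free sketch of a "model if available, otherwise a
 * shared simple tokenizer" lookup. None of these types are OpenNLP classes.
 */
public class TokenizerFallbackSketch {

    /** Stand-in for a tokenizer interface (hypothetical). */
    interface Tokenizer {
        String[] tokenize(String text);
    }

    /** Shared, stateless fallback that splits on runs of whitespace. */
    static final Tokenizer SIMPLE = text -> {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    };

    /** Hypothetical registry standing in for model lookup: language -> tokenizer. */
    private final Map<String, Tokenizer> models = new HashMap<>();

    void registerModel(String language, Tokenizer tokenizer) {
        models.put(language, tokenizer);
    }

    /**
     * Hypothetical loader. Returns null when no model exists; a real loader
     * reading model files could also throw IOException.
     */
    private Tokenizer loadModel(String language) throws IOException {
        return models.get(language);
    }

    Tokenizer getTokenizer(String language) {
        Tokenizer tokenizer = null;
        if (language != null) {
            try {
                tokenizer = loadModel(language);
            } catch (IOException e) {
                // Loading failed: leave tokenizer null so the fallback below applies.
                System.err.println("Unable to load model for " + language + ": " + e.getMessage());
            }
        }
        // No language, no model, or a failed load: use the shared fallback.
        return tokenizer != null ? tokenizer : SIMPLE;
    }

    public static void main(String[] args) {
        TokenizerFallbackSketch nlp = new TokenizerFallbackSketch();
        // Hypothetical "model-backed" tokenizer, registered for English only.
        nlp.registerModel("en", text -> text.toLowerCase().split("\\s+"));

        System.out.println(nlp.getTokenizer("ru") == SIMPLE); // true: no "ru" model registered
        System.out.println(nlp.getTokenizer("en") == SIMPLE); // false
        System.out.println(String.join("|", nlp.getTokenizer("ru").tokenize(" Hello  world "))); // Hello|world
    }
}
```

Returning one shared, stateless instance for the fallback mirrors the use of `SimpleTokenizer.INSTANCE`, and it lets callers check which tokenizer they received by identity.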
Class OpenNLPTest, method testFallbackToSimpleTokenizer:
@Test
public void testFallbackToSimpleTokenizer() throws IOException {
    // For the tokenizer, a fallback to the SimpleTokenizer is expected
    // when no usable tokenizer model is available for the language ("ru")
    Tokenizer tokenizer = openNLP.getTokenizer("ru");
    Assert.assertNotNull(tokenizer);
    // SimpleTokenizer.INSTANCE is a shared singleton, so check identity
    Assert.assertSame(SimpleTokenizer.INSTANCE, tokenizer);
}
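To see roughly what the fallback does to text, here is a simplified character-class tokenizer in plain Java. OpenNLP describes `SimpleTokenizer` as tokenizing by character classes; this sketch only illustrates that idea. It is not OpenNLP's implementation, its output may differ from `SimpleTokenizer` in edge cases, and the names `CharClassTokenizerSketch`, `CharClass`, `classify` and `tokenize` are hypothetical. Because `Character.isLetter` accepts Cyrillic letters, Russian input such as the "ru" case in the test is split the same way as Latin input.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical character-class tokenizer illustrating the idea behind a
 * rule-based fallback. Not OpenNLP's SimpleTokenizer implementation.
 */
public class CharClassTokenizerSketch {

    enum CharClass { WHITESPACE, LETTER, DIGIT, OTHER }

    static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;   // includes Cyrillic, Greek, ...
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;                               // punctuation and symbols
    }

    /**
     * Starts a new token whenever the character class changes and drops whitespace.
     * Runs of the same punctuation character (e.g. "...") stay together, while
     * different punctuation characters become separate tokens.
     */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1;                       // start of the current token, -1 inside whitespace
        CharClass previous = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            CharClass current = classify(c);
            // i > 0 is guaranteed in the second clause, since previous starts as WHITESPACE
            boolean boundary = current != previous
                    || (current == CharClass.OTHER && c != text.charAt(i - 1));
            if (boundary) {
                if (start >= 0) {
                    tokens.add(text.substring(start, i));   // close the previous token
                }
                start = (current == CharClass.WHITESPACE) ? -1 : i;
            }
            previous = current;
        }
        if (start >= 0) {
            tokens.add(text.substring(start));              // close the final token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", tokenize("Hello, world 42..."))); // Hello | , | world | 42 | ...
    }
}
```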