
Example 1 with SimpleTokenizer

Use of opennlp.tools.tokenize.SimpleTokenizer in the Apache Stanbol project.

Source: class OpenNLP, method getTokenizer.

/**
 * Getter for the Tokenizer of a given language. This first tries to
 * create a {@link TokenizerME} instance if the required
 * {@link TokenizerModel} for the passed language is available. If such a
 * model is not available, it returns the {@link SimpleTokenizer} instance.
 * @param language the language, or <code>null</code> to get the
 * {@link SimpleTokenizer}
 * @return the {@link Tokenizer} for the passed language.
 */
public Tokenizer getTokenizer(String language) {
    Tokenizer tokenizer = null;
    if (language != null) {
        try {
            // getTokenizerModel(..) and log are provided by the enclosing OpenNLP class
            TokenizerModel model = getTokenizerModel(language);
            if (model != null) {
                tokenizer = new TokenizerME(model);
            }
        } catch (InvalidFormatException e) {
            // InvalidFormatException is a subclass of IOException, so it must be caught first
            log.warn("Unable to load Tokenizer Model for " + language
                    + ": Will use Simple Tokenizer instead", e);
        } catch (IOException e) {
            log.warn("Unable to load Tokenizer Model for " + language
                    + ": Will use Simple Tokenizer instead", e);
        }
    }
    if (tokenizer == null) {
        // No model (or no language): fall back to the shared rule-based tokenizer
        log.debug("Use Simple Tokenizer for language {}", language);
        tokenizer = SimpleTokenizer.INSTANCE;
    } else {
        log.debug("Use ME Tokenizer for language {}", language);
    }
    return tokenizer;
}
Also used: TokenizerME (opennlp.tools.tokenize.TokenizerME), IOException (java.io.IOException), Tokenizer (opennlp.tools.tokenize.Tokenizer), SimpleTokenizer (opennlp.tools.tokenize.SimpleTokenizer), TokenizerModel (opennlp.tools.tokenize.TokenizerModel), InvalidFormatException (opennlp.tools.util.InvalidFormatException)
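When no model can be loaded, getTokenizer returns SimpleTokenizer.INSTANCE, a rule-based tokenizer that needs no training data. As a rough illustration of that kind of rule-based splitting, the sketch below breaks text wherever the character class (whitespace, letter, digit, or other) changes, and keeps runs of the same punctuation character together. It is a simplified, dependency-free re-implementation written under those assumptions, not OpenNLP's source; the class name CharClassTokenizer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of character-class tokenization; not OpenNLP's SimpleTokenizer source.
public class CharClassTokenizer {

    // The four character classes this sketch distinguishes.
    private enum CharClass { WHITESPACE, LETTER, DIGIT, OTHER }

    private static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    /** Splits the input wherever the character class changes; whitespace is dropped. */
    public static String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1;                      // start of the current token, -1 if none
        CharClass current = CharClass.WHITESPACE;
        char previous = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            CharClass cls = classify(c);
            // A token ends when the class changes, or when a different
            // punctuation/symbol character follows another one.
            boolean boundary = cls != current
                    || (cls == CharClass.OTHER && c != previous);
            if (boundary) {
                if (start >= 0) {
                    tokens.add(text.substring(start, i));
                }
                // Whitespace never starts a token.
                start = (cls == CharClass.WHITESPACE) ? -1 : i;
            }
            current = cls;
            previous = c;
        }
        if (start >= 0) {
            tokens.add(text.substring(start));
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints: Mr | . | Smith | paid | $ | 42 | . | 50 | !!
        System.out.println(String.join(" | ", tokenize("Mr. Smith paid $42.50!!")));
    }
}
```

Because the rules are fixed, this approach works for any language but cannot learn language-specific cases such as abbreviations: "Mr." is split into "Mr" and ".", the kind of case a trained TokenizerME model may handle better.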

Example 2 with SimpleTokenizer

Use of opennlp.tools.tokenize.SimpleTokenizer in the Apache Stanbol project.

Source: class OpenNLPTest, method testFallbackToSimpleTokenizer.

@Test
public void testFallbackToSimpleTokenizer() throws IOException {
    // Unlike other components, the tokenizer is expected to fall back to
    // the SimpleTokenizer when no model is available for the language
    Tokenizer tokenizer = openNLP.getTokenizer("ru");
    Assert.assertNotNull(tokenizer);
    Assert.assertEquals(SimpleTokenizer.INSTANCE, tokenizer);
}
Also used: Tokenizer (opennlp.tools.tokenize.Tokenizer), SimpleTokenizer (opennlp.tools.tokenize.SimpleTokenizer), Test (org.junit.Test)
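The behaviour this test checks (a missing model leads to the shared fallback instance) can be reproduced without Stanbol or OpenNLP. The sketch below is a hypothetical, dependency-free stand-in: TextTokenizer, loadModel, EN_MODEL and FALLBACK are illustrative names, not Stanbol or OpenNLP APIs. It assumes that only "en" has a model, and uses the made-up code "xx" to simulate a model that fails to load.

```java
import java.io.IOException;
import java.util.Arrays;

// Hypothetical sketch of the "model if available, else shared fallback"
// pattern; none of these names are Stanbol or OpenNLP APIs.
public class FallbackSketch {

    /** Hypothetical minimal tokenizer interface. */
    public interface TextTokenizer {
        String[] tokenize(String text);
    }

    /** Shared, stateless fallback: splits on runs of whitespace. */
    public static final TextTokenizer FALLBACK = text ->
            text.trim().isEmpty() ? new String[0] : text.trim().split("\\s+");

    /** Stand-in for a trained model: returns the whole text as one token. */
    public static final TextTokenizer EN_MODEL = text -> new String[] { text };

    /** Hypothetical stand-in for getTokenizerModel: null means "no model". */
    static TextTokenizer loadModel(String language) throws IOException {
        if ("en".equals(language)) {
            return EN_MODEL;
        }
        if ("xx".equals(language)) {
            // Simulates a model file that exists but cannot be read
            throw new IOException("simulated corrupt model for " + language);
        }
        return null;
    }

    public static TextTokenizer getTokenizer(String language) {
        TextTokenizer tokenizer = null;
        if (language != null) {
            try {
                tokenizer = loadModel(language);
            } catch (IOException e) {
                System.err.println("Unable to load model for " + language
                        + ": will use fallback instead (" + e.getMessage() + ")");
            }
        }
        // Missing model, failed load, or null language all end up here
        return tokenizer != null ? tokenizer : FALLBACK;
    }

    public static void main(String[] args) {
        for (String lang : new String[] { "en", "ru", "xx", null }) {
            TextTokenizer t = getTokenizer(lang);
            System.out.println(lang + " -> " + (t == FALLBACK ? "fallback" : "model")
                    + " " + Arrays.toString(t.tokenize("two tokens")));
        }
    }
}
```

Returning one shared, stateless instance, as getTokenizer does with SimpleTokenizer.INSTANCE, is what lets a test compare the result directly with the singleton.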

Aggregations (number of examples above that use each type)

SimpleTokenizer (opennlp.tools.tokenize.SimpleTokenizer): 2
Tokenizer (opennlp.tools.tokenize.Tokenizer): 2
IOException (java.io.IOException): 1
TokenizerME (opennlp.tools.tokenize.TokenizerME): 1
TokenizerModel (opennlp.tools.tokenize.TokenizerModel): 1
InvalidFormatException (opennlp.tools.util.InvalidFormatException): 1
Test (org.junit.Test): 1