Example 6 with TokenizerModel

Use of opennlp.tools.tokenize.TokenizerModel in project textdb by TextDB.

The class NameFinderExample, method Tokenize.

public static String[] Tokenize(String sentence) throws InvalidFormatException, IOException {
    // try-with-resources closes the model stream even if loading or tokenizing throws
    try (InputStream is = new FileInputStream("./src/main/java/edu/uci/ics/textdb/sandbox/OpenNLPexample/en-token.bin")) {
        // Load the pre-trained English tokenizer model and wrap it in a maximum-entropy tokenizer
        TokenizerModel model = new TokenizerModel(is);
        Tokenizer tokenizer = new TokenizerME(model);
        return tokenizer.tokenize(sentence);
    }
}
Also used : FileInputStream(java.io.FileInputStream) InputStream(java.io.InputStream) TokenizerME(opennlp.tools.tokenize.TokenizerME) TokenizerModel(opennlp.tools.tokenize.TokenizerModel) Tokenizer(opennlp.tools.tokenize.Tokenizer)

Example 7 with TokenizerModel

Use of opennlp.tools.tokenize.TokenizerModel in project stanbol by apache.

The class OpenNLP, method getTokenizer.

/**
 * Getter for the Tokenizer of a given language. This first tries to
 * create a {@link TokenizerME} instance if the required
 * {@link TokenizerModel} for the passed language is available. If such a
 * model is not available, it returns the {@link SimpleTokenizer} instance.
 * @param language the language or <code>null</code> to use the
 * {@link SimpleTokenizer}
 * @return the {@link Tokenizer} for the passed language.
 */
public Tokenizer getTokenizer(String language) {
    Tokenizer tokenizer = null;
    if (language != null) {
        try {
            TokenizerModel model = getTokenizerModel(language);
            if (model != null) {
                tokenizer = new TokenizerME(model);
            }
        } catch (InvalidFormatException e) {
            log.warn("Unable to load Tokenizer Model for " + language + ": " + "Will use Simple Tokenizer instead", e);
        } catch (IOException e) {
            log.warn("Unable to load Tokenizer Model for " + language + ": " + "Will use Simple Tokenizer instead", e);
        }
    }
    if (tokenizer == null) {
        log.debug("Use Simple Tokenizer for language {}", language);
        tokenizer = SimpleTokenizer.INSTANCE;
    } else {
        log.debug("Use ME Tokenizer for language {}", language);
    }
    return tokenizer;
}
Also used : TokenizerME(opennlp.tools.tokenize.TokenizerME) IOException(java.io.IOException) Tokenizer(opennlp.tools.tokenize.Tokenizer) SimpleTokenizer(opennlp.tools.tokenize.SimpleTokenizer) TokenizerModel(opennlp.tools.tokenize.TokenizerModel) InvalidFormatException(opennlp.tools.util.InvalidFormatException)
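
The fallback behaviour of getTokenizer can be tried without OpenNLP on the classpath. The following is only a minimal sketch of the same pattern (prefer a model-backed tokenizer, fall back to a simple one when the model is missing or fails to load). Everything in it is a hypothetical stand-in and not part of OpenNLP or Stanbol: SketchTokenizer replaces Tokenizer, SIMPLE replaces SimpleTokenizer.INSTANCE, and loadModelTokenizer with its "en" and "broken" cases replaces the real model lookup.

```java
import java.io.IOException;

public class TokenizerFallbackSketch {

    /** Hypothetical stand-in for opennlp.tools.tokenize.Tokenizer. */
    interface SketchTokenizer {
        String[] tokenize(String text);
    }

    /** Hypothetical fallback tokenizer: splits on runs of whitespace only. */
    static final SketchTokenizer SIMPLE = text -> {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    };

    /**
     * Hypothetical model lookup (assumption, not a real API): returns null when
     * no model exists for the language and throws when the model is unreadable.
     */
    static SketchTokenizer loadModelTokenizer(String language) throws IOException {
        if ("en".equals(language)) {
            // Pretend "model-backed" tokenizer: also splits off . , ! ? as separate tokens
            return text -> SIMPLE.tokenize(text.replaceAll("([.,!?])", " $1"));
        }
        if ("broken".equals(language)) {
            throw new IOException("corrupt model for " + language);
        }
        return null;
    }

    /** Same control flow as getTokenizer: try the model first, then fall back. */
    static SketchTokenizer getTokenizer(String language) {
        SketchTokenizer tokenizer = null;
        if (language != null) {
            try {
                tokenizer = loadModelTokenizer(language);
            } catch (IOException e) {
                System.err.println("Unable to load model for " + language
                        + ", using the simple tokenizer instead: " + e.getMessage());
            }
        }
        return tokenizer != null ? tokenizer : SIMPLE;
    }

    public static void main(String[] args) {
        // Model-backed path: punctuation becomes its own token
        System.out.println(String.join("|", getTokenizer("en").tokenize("Hello, world.")));
        // Fallback path: whitespace split only
        System.out.println(String.join("|", getTokenizer(null).tokenize("Hello, world.")));
    }
}
```

Callers never have to handle a missing model themselves: they always receive a usable tokenizer, and only the token quality differs between the two paths.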

Example 8 with TokenizerModel

Use of opennlp.tools.tokenize.TokenizerModel in project stanbol by apache.

The class OpenNLPTest, method testLoadEnTokenizer.

@Test
public void testLoadEnTokenizer() throws IOException {
    TokenizerModel model = openNLP.getTokenizerModel("en");
    Assert.assertNotNull(model);
    Tokenizer tokenizer = openNLP.getTokenizer("en");
    Assert.assertNotNull(tokenizer);
}
Also used : TokenizerModel(opennlp.tools.tokenize.TokenizerModel) Tokenizer(opennlp.tools.tokenize.Tokenizer) SimpleTokenizer(opennlp.tools.tokenize.SimpleTokenizer) Test(org.junit.Test)

Example 9 with TokenizerModel

Use of opennlp.tools.tokenize.TokenizerModel in project textdb by TextDB.

The class POSTagexample, method Tokenize.

public static String[] Tokenize(String sentence) throws InvalidFormatException, IOException {
    // try-with-resources closes the model stream even if loading or tokenizing throws
    try (InputStream is = new FileInputStream("./src/main/java/edu/uci/ics/texera/sandbox/OpenNLPexample/en-token.bin")) {
        // Load the pre-trained English tokenizer model and wrap it in a maximum-entropy tokenizer
        TokenizerModel model = new TokenizerModel(is);
        Tokenizer tokenizer = new TokenizerME(model);
        return tokenizer.tokenize(sentence);
    }
}
Also used : FileInputStream(java.io.FileInputStream) InputStream(java.io.InputStream) TokenizerME(opennlp.tools.tokenize.TokenizerME) TokenizerModel(opennlp.tools.tokenize.TokenizerModel) Tokenizer(opennlp.tools.tokenize.Tokenizer)

Aggregations

TokenizerModel (opennlp.tools.tokenize.TokenizerModel): 9
Tokenizer (opennlp.tools.tokenize.Tokenizer): 6
TokenizerME (opennlp.tools.tokenize.TokenizerME): 6
FileInputStream (java.io.FileInputStream): 4
InputStream (java.io.InputStream): 4
Test (org.junit.Test): 3
SimpleTokenizer (opennlp.tools.tokenize.SimpleTokenizer): 2
IOException (java.io.IOException): 1
ChunkerModel (opennlp.tools.chunker.ChunkerModel): 1
TokenNameFinderModel (opennlp.tools.namefind.TokenNameFinderModel): 1
POSModel (opennlp.tools.postag.POSModel): 1
SentenceModel (opennlp.tools.sentdetect.SentenceModel): 1
InvalidFormatException (opennlp.tools.util.InvalidFormatException): 1
TokenizerModelResource (opennlp.uima.tokenize.TokenizerModelResource): 1
ResourceAccessException (org.apache.uima.resource.ResourceAccessException): 1
ResourceInitializationException (org.apache.uima.resource.ResourceInitializationException): 1