Example 1 with TokenizerModel

Use of opennlp.tools.tokenize.TokenizerModel in the project textdb by TextDB.

From the class POSTagexample, method Tokenize.

// Loads the English tokenizer model from disk and splits the sentence into tokens.
public static String[] Tokenize(String sentence) throws InvalidFormatException, IOException {
    // try-with-resources closes the stream even if model loading or tokenization throws
    try (InputStream is = new FileInputStream("./src/main/java/edu/uci/ics/textdb/sandbox/OpenNLPexample/en-token.bin")) {
        TokenizerModel model = new TokenizerModel(is);
        Tokenizer tokenizer = new TokenizerME(model);
        return tokenizer.tokenize(sentence);
    }
}
Also used: FileInputStream(java.io.FileInputStream) InputStream(java.io.InputStream) TokenizerME(opennlp.tools.tokenize.TokenizerME) TokenizerModel(opennlp.tools.tokenize.TokenizerModel) Tokenizer(opennlp.tools.tokenize.Tokenizer)

Example 2 with TokenizerModel

Use of opennlp.tools.tokenize.TokenizerModel in the project stanbol by apache.

From the class OpenNLPTest, method testLoadModelByName.

@Test
public void testLoadModelByName() throws IOException {
    TokenizerModel tokenModel = openNLP.getModel(TokenizerModel.class, "en-token.bin", null);
    Assert.assertNotNull(tokenModel);
    SentenceModel sentModel = openNLP.getModel(SentenceModel.class, "en-sent.bin", null);
    Assert.assertNotNull(sentModel);
    POSModel posModel = openNLP.getModel(POSModel.class, "en-pos-maxent.bin", null);
    Assert.assertNotNull(posModel);
    ChunkerModel chunkModel = openNLP.getModel(ChunkerModel.class, "en-chunker.bin", null);
    Assert.assertNotNull(chunkModel);
    TokenNameFinderModel nerModel = openNLP.getModel(TokenNameFinderModel.class, "en-ner-person.bin", null);
    Assert.assertNotNull(nerModel);
    // unavailable model: the lookup is expected to return null
    tokenModel = openNLP.getModel(TokenizerModel.class, "ru-token.bin", null);
    Assert.assertNull(tokenModel);
}
Also used: TokenNameFinderModel(opennlp.tools.namefind.TokenNameFinderModel) ChunkerModel(opennlp.tools.chunker.ChunkerModel) SentenceModel(opennlp.tools.sentdetect.SentenceModel) POSModel(opennlp.tools.postag.POSModel) TokenizerModel(opennlp.tools.tokenize.TokenizerModel) Test(org.junit.Test)

Example 3 with TokenizerModel

Use of opennlp.tools.tokenize.TokenizerModel in the project stanbol by apache.

From the class OpenNLPTest, method testLoadMissingTokenizerModel.

@Test
public void testLoadMissingTokenizerModel() throws IOException {
    TokenizerModel model = openNLP.getTokenizerModel("ru");
    // there is no Russian model,
    // so the returned model is expected to be null
    Assert.assertNull(model);
}
Also used: TokenizerModel(opennlp.tools.tokenize.TokenizerModel) Test(org.junit.Test)

Example 4 with TokenizerModel

Use of opennlp.tools.tokenize.TokenizerModel in the project deeplearning4j by deeplearning4j.

From the class ConcurrentTokenizer, method initialize.

/**
 * Initializes the current instance with the given context.
 *
 * Note: Do all initialization in this method; do not use the constructor.
 */
public void initialize(UimaContext context) throws ResourceInitializationException {
    super.initialize(context);
    TokenizerModel model;
    try {
        TokenizerModelResource modelResource = (TokenizerModelResource) context.getResourceObject(UimaUtil.MODEL_PARAMETER);
        model = modelResource.getModel();
    } catch (ResourceAccessException e) {
        throw new ResourceInitializationException(e);
    }
    tokenizer = new TokenizerME(model);
}
Also used: ResourceInitializationException(org.apache.uima.resource.ResourceInitializationException) TokenizerModelResource(opennlp.uima.tokenize.TokenizerModelResource) TokenizerME(opennlp.tools.tokenize.TokenizerME) TokenizerModel(opennlp.tools.tokenize.TokenizerModel) ResourceAccessException(org.apache.uima.resource.ResourceAccessException)

Example 5 with TokenizerModel

Use of opennlp.tools.tokenize.TokenizerModel in the project textdb by TextDB.

From the class NameFinderExample, method Tokenize.

// Loads the English tokenizer model from disk and splits the sentence into tokens.
public static String[] Tokenize(String sentence) throws InvalidFormatException, IOException {
    // try-with-resources closes the stream even if model loading or tokenization throws
    try (InputStream is = new FileInputStream("./src/main/java/edu/uci/ics/textdb/sandbox/OpenNLPexample/en-token.bin")) {
        TokenizerModel model = new TokenizerModel(is);
        Tokenizer tokenizer = new TokenizerME(model);
        return tokenizer.tokenize(sentence);
    }
}
Also used: FileInputStream(java.io.FileInputStream) InputStream(java.io.InputStream) TokenizerME(opennlp.tools.tokenize.TokenizerME) TokenizerModel(opennlp.tools.tokenize.TokenizerModel) Tokenizer(opennlp.tools.tokenize.Tokenizer)

Aggregations

TokenizerModel (opennlp.tools.tokenize.TokenizerModel): 7
Tokenizer (opennlp.tools.tokenize.Tokenizer): 4
TokenizerME (opennlp.tools.tokenize.TokenizerME): 4
Test (org.junit.Test): 3
FileInputStream (java.io.FileInputStream): 2
InputStream (java.io.InputStream): 2
SimpleTokenizer (opennlp.tools.tokenize.SimpleTokenizer): 2
IOException (java.io.IOException): 1
ChunkerModel (opennlp.tools.chunker.ChunkerModel): 1
TokenNameFinderModel (opennlp.tools.namefind.TokenNameFinderModel): 1
POSModel (opennlp.tools.postag.POSModel): 1
SentenceModel (opennlp.tools.sentdetect.SentenceModel): 1
InvalidFormatException (opennlp.tools.util.InvalidFormatException): 1
TokenizerModelResource (opennlp.uima.tokenize.TokenizerModelResource): 1
ResourceAccessException (org.apache.uima.resource.ResourceAccessException): 1
ResourceInitializationException (org.apache.uima.resource.ResourceInitializationException): 1