Example 6 with NameFinderME

Use of opennlp.tools.namefind.NameFinderME in project textdb by TextDB.

Class NameFinderExample, method main.

public static void main(String[] args) throws IOException {
    String dataFile = "./src/main/resources/abstract_100.txt";
    Scanner scan = new Scanner(new File(dataFile));
    InputStream is = new FileInputStream("./src/main/java/edu/uci/ics/textdb/sandbox/OpenNLPexample/en-ner-location.bin");
    TokenNameFinderModel model = new TokenNameFinderModel(is);
    is.close();
    NameFinderME nameFinder = new NameFinderME(model);
    int counter = 0;
    PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
    perfMon.start();
    while (scan.hasNextLine()) {
        String[] sentence = Tokenize(scan.nextLine());
        Span[] spans = nameFinder.find(sentence);
        perfMon.incrementCounter();
        //Print out the tokens of the sentence
        if (spans.length != 0) {
            for (String s : sentence) {
                System.out.print("[" + s + "] ");
            }
            System.out.println("\n");
        }
        //Print out each span, then the tokens it covers
        for (Span s : spans) {
            System.out.println(s.toString());
            for (int i = s.getStart(); i < s.getEnd(); i++) {
                System.out.println(sentence[i]);
                counter++;
            }
        }
        if (spans.length != 0)
            System.out.println();
    }
    perfMon.stopAndPrintFinalResult();
    System.out.println("Number of Results: " + counter);
    scan.close();
}
Also used: Scanner (java.util.Scanner), TokenNameFinderModel (opennlp.tools.namefind.TokenNameFinderModel), FileInputStream (java.io.FileInputStream), InputStream (java.io.InputStream), NameFinderME (opennlp.tools.namefind.NameFinderME), PerformanceMonitor (opennlp.tools.cmdline.PerformanceMonitor), File (java.io.File), Span (opennlp.tools.util.Span)
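
The listing above calls a Tokenize helper that is not shown, and it reads each span as a range of token indices whose end is exclusive (the loop runs while i < s.getEnd()). The following is a minimal sketch of those two pieces using only the JDK. The class SpanText and the methods tokenize and spanText are hypothetical names, whitespace splitting is an assumption about what Tokenize does, and a plain start/end pair stands in for opennlp.tools.util.Span.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration; not part of textdb or OpenNLP.
final class SpanText {

    // Whitespace tokenizer standing in for the Tokenize helper (an assumption).
    static String[] tokenize(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    // Joins the tokens in [start, end), mirroring the getStart()/getEnd() loop.
    static String spanText(String[] tokens, int start, int end) {
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("Flights from New York to Paris");
        // Token indices 2 and 3 are covered; the end index 4 is exclusive.
        System.out.println(spanText(tokens, 2, 4)); // prints "New York"
    }
}
```

Joining the covered tokens into one string is often more convenient than printing them one per line, for example when the entity text is stored or compared later.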

Example 7 with NameFinderME

Use of opennlp.tools.namefind.NameFinderME in project tika by apache.

Class GeoParser, method initialize.

/**
 * Initializes this parser
 * @param modelUrl the URL to NER model
 */
public void initialize(URL modelUrl) {
    try {
        if (this.modelUrl != null && this.modelUrl.toURI().equals(modelUrl.toURI())) {
            return;
        }
    } catch (URISyntaxException e1) {
        throw new RuntimeException(e1.getMessage());
    }
    this.modelUrl = modelUrl;
    gazetteerClient = new GeoGazetteerClient(config);
    // Check if the NER model is available, and if the
    //  lucene-geo-gazetteer is available
    this.available = modelUrl != null && gazetteerClient.checkAvail();
    if (this.available) {
        try {
            TokenNameFinderModel model = new TokenNameFinderModel(modelUrl);
            this.nameFinder = new NameFinderME(model);
        } catch (Exception e) {
            LOG.warn("Named Entity Extractor setup failed: {}", e.getMessage(), e);
            this.available = false;
        }
    }
    initialized = true;
}
Also used: TokenNameFinderModel (opennlp.tools.namefind.TokenNameFinderModel), GeoGazetteerClient (org.apache.tika.parser.geo.topic.gazetteer.GeoGazetteerClient), NameFinderME (opennlp.tools.namefind.NameFinderME), URISyntaxException (java.net.URISyntaxException), IOException (java.io.IOException), TikaException (org.apache.tika.exception.TikaException), SAXException (org.xml.sax.SAXException)
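
The guard at the top of initialize skips reloading when the same model URL is passed again, and it compares the URLs through toURI(). One plausible reason is that java.net.URL.equals may resolve host names, which blocks, whereas URI.equals does not. The sketch below isolates that pattern with JDK classes only. ModelHolder, url and loadCount are hypothetical names, and the counter is a placeholder for the actual model loading; this is not Tika's GeoParser.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical stand-in for a component that loads a model lazily.
final class ModelHolder {
    private URL modelUrl;
    private int loadCount;

    // Reloads only when the URL differs from the one already loaded.
    void initialize(URL newUrl) {
        try {
            if (modelUrl != null && modelUrl.toURI().equals(newUrl.toURI())) {
                return; // same model as before, nothing to do
            }
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
        modelUrl = newUrl;
        loadCount++; // placeholder for loading the model (an assumption)
    }

    int loadCount() {
        return loadCount;
    }

    // Convenience factory that wraps the checked exception.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        ModelHolder holder = new ModelHolder();
        holder.initialize(url("http://example.com/en-ner-location.bin"));
        holder.initialize(url("http://example.com/en-ner-location.bin")); // skipped
        holder.initialize(url("http://example.com/other.bin"));           // reloaded
        System.out.println(holder.loadCount()); // prints 2
    }
}
```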

Aggregations

NameFinderME (opennlp.tools.namefind.NameFinderME) 7
Span (opennlp.tools.util.Span) 5
TokenNameFinderModel (opennlp.tools.namefind.TokenNameFinderModel) 4
File (java.io.File) 3
FileInputStream (java.io.FileInputStream) 3
InputStream (java.io.InputStream) 2
ArrayList (java.util.ArrayList) 2
LinkedHashMap (java.util.LinkedHashMap) 2
List (java.util.List) 2
Scanner (java.util.Scanner) 2
PerformanceMonitor (opennlp.tools.cmdline.PerformanceMonitor) 2
NerTag (org.apache.stanbol.enhancer.nlp.ner.NerTag) 2
PooledTokenNameFinderModel (org.elasticsearch.service.opennlp.models.PooledTokenNameFinderModel) 2
IOException (java.io.IOException) 1
URISyntaxException (java.net.URISyntaxException) 1
SentenceDetectorME (opennlp.tools.sentdetect.SentenceDetectorME) 1
Tokenizer (opennlp.tools.tokenize.Tokenizer) 1
Chunk (org.apache.stanbol.enhancer.nlp.model.Chunk) 1
Section (org.apache.stanbol.enhancer.nlp.model.Section) 1
Token (org.apache.stanbol.enhancer.nlp.model.Token) 1