Examples with Tokenizer - edu.stanford.nlp.process.Tokenizer

Example 1 with Tokenizer

Use of edu.stanford.nlp.process.Tokenizer in project lucida by claritylab.

Source: class StanfordParser, method getPCFGScore.

/**
 * Parses a sentence and returns the PCFG score as a confidence measure.
 *
 * @param sentence a sentence
 * @return PCFG score
 */
@SuppressWarnings("unchecked")
public static double getPCFGScore(String sentence) {
    if (tlp == null || parser == null)
        throw new RuntimeException("Parser has not been initialized");
    // parse the sentence to produce PCFG score
    log.debug("Parsing sentence");
    double score;
    synchronized (parser) {
        Tokenizer tokenizer = tlp.getTokenizerFactory().getTokenizer(new StringReader(sentence));
        List<Word> words = tokenizer.tokenize();
        log.debug("Tokenization: " + words);
        parser.parse(new Sentence(words));
        score = parser.getPCFGScore();
    }
    return score;
}

Also used: Word (edu.stanford.nlp.ling.Word), StringReader (java.io.StringReader), Tokenizer (edu.stanford.nlp.process.Tokenizer), Sentence (edu.stanford.nlp.ling.Sentence)
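The method calls `parser.parse(...)` and then reads the result back from the same object with `getPCFGScore()`. That parse-then-read pair is presumably why both calls sit inside `synchronized (parser)`: another thread's `parse` must not run in between. The sketch below illustrates the locking pattern in plain Java. `StatefulParser`, its toy scoring rule (minus the token count) and `score` are hypothetical stand-ins, not Stanford parser API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedParserDemo {
    // Hypothetical stand-in for a stateful parser: parse() stores its result
    // in a field and getScore() reads it back, so the two calls must not be
    // interleaved with another thread's parse().
    static class StatefulParser {
        private double lastScore;

        void parse(List<String> words) {
            lastScore = -words.size(); // toy "score": minus the token count
        }

        double getScore() {
            return lastScore;
        }
    }

    private static final StatefulParser PARSER = new StatefulParser();

    // Same shape as getPCFGScore: hold the lock on the shared parser for the
    // whole tokenize, parse, read-score sequence.
    static double score(String sentence) {
        synchronized (PARSER) {
            List<String> words = List.of(sentence.trim().split("\\s+"));
            PARSER.parse(words);
            return PARSER.getScore();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Double>> results = new ArrayList<>();
        for (int n = 1; n <= 20; n++) {
            String sentence = "w ".repeat(n); // a sentence of n tokens
            results.add(pool.submit(() -> score(sentence)));
        }
        for (int n = 1; n <= 20; n++) {
            System.out.println(n + " tokens -> " + results.get(n - 1).get());
        }
        pool.shutdown();
    }
}
```

Locking on the shared instance keeps each parse-then-read pair atomic; the trade-off is that all parsing through that instance is serialized.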

Example 2 with Tokenizer

Use of edu.stanford.nlp.process.Tokenizer in project lucida by claritylab.

Source: class StanfordParser, method parse.

/**
 * Parses a sentence and returns a string representation of the parse tree.
 *
 * @param sentence a sentence
 * @return the best parse tree as a string, with every space-preceded
 * bracketed annotation of the form " [...]" removed
 */
@SuppressWarnings("unchecked")
public static String parse(String sentence) {
    if (tlp == null || parser == null)
        throw new RuntimeException("Parser has not been initialized");
    // parse the sentence to produce stanford Tree
    log.debug("Parsing sentence");
    Tree tree = null;
    synchronized (parser) {
        Tokenizer tokenizer = tlp.getTokenizerFactory().getTokenizer(new StringReader(sentence));
        List<Word> words = tokenizer.tokenize();
        log.debug("Tokenization: " + words);
        parser.parse(new Sentence(words));
        tree = parser.getBestParse();
    }
    return tree.toString().replaceAll(" \\[[\\S]+\\]", "");
}

Also used: Word (edu.stanford.nlp.ling.Word), StringReader (java.io.StringReader), Tree (edu.stanford.nlp.trees.Tree), Tokenizer (edu.stanford.nlp.process.Tokenizer), Sentence (edu.stanford.nlp.ling.Sentence)
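The `replaceAll` call in `parse` deletes every substring made of a space, `[`, one or more non-whitespace characters and `]` (`[\\S]+` is equivalent to `\\S+`). The snippet applies the same pattern to a made-up tree string; the numeric annotations in the input are hypothetical and only illustrate what the pattern matches.

```java
public class StripAnnotationsDemo {
    // Same pattern as in parse(): a space, '[', one or more
    // non-whitespace characters, then ']'.
    static String strip(String treeString) {
        return treeString.replaceAll(" \\[[\\S]+\\]", "");
    }

    public static void main(String[] args) {
        // Hypothetical tree string carrying bracketed numeric annotations.
        String raw = "(ROOT [24.1] (S [20.3] (NP [5.2] (PRP I)) (VP [9.8] (VBP run))))";
        System.out.println(strip(raw)); // prints (ROOT (S (NP (PRP I)) (VP (VBP run))))
    }
}
```

Because `\\S` also matches `]`, the quantifier first runs to the next whitespace and then backtracks to the last `]`, so an annotation followed directly by more non-whitespace text (for example ` [a]b]`) is removed as a whole.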

Example 3 with Tokenizer

Use of edu.stanford.nlp.process.Tokenizer in project CoreNLP by stanfordnlp.

Source: class NegraPennTokenizer, method main.

public static void main(String[] args) throws IOException {
    // args[0]: path of the file to tokenize; the reader is closed on exit
    try (Reader in = new FileReader(args[0])) {
        Tokenizer<String> st = new NegraPennTokenizer(in);
        // print one token per line
        while (st.hasNext()) {
            System.out.println(st.next());
        }
    }
}

Also used: FileReader (java.io.FileReader), Reader (java.io.Reader), Tokenizer (edu.stanford.nlp.process.Tokenizer), LexerTokenizer (edu.stanford.nlp.process.LexerTokenizer)
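Examples 1 and 2 pull all tokens at once with `tokenize()`, while Example 3 walks the stream with `hasNext()`/`next()`. The class below is a hypothetical whitespace tokenizer, built on `java.util.Scanner` rather than on CoreNLP, that supports both styles to show how they relate: `tokenize()` simply drains the remaining iterator. CoreNLP's real `Tokenizer` interface declares further methods such as `peek()`, which this sketch omits.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Scanner;

// Hypothetical tokenizer mimicking the usage shape of
// edu.stanford.nlp.process.Tokenizer: an Iterator plus tokenize().
public class WhitespaceTokenizer implements Iterator<String> {
    private final Scanner scanner;

    public WhitespaceTokenizer(Reader in) {
        this.scanner = new Scanner(in); // default delimiter: runs of whitespace
    }

    @Override
    public boolean hasNext() {
        return scanner.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException("no more tokens");
        }
        return scanner.next();
    }

    // Collects all remaining tokens into a list (batch style).
    public List<String> tokenize() {
        List<String> tokens = new ArrayList<>();
        while (hasNext()) {
            tokens.add(next());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Iterator style, as in Example 3
        WhitespaceTokenizer st = new WhitespaceTokenizer(new StringReader("Das ist ein Satz ."));
        while (st.hasNext()) {
            System.out.println(st.next());
        }
        // Batch style, as in Examples 1 and 2
        List<String> words = new WhitespaceTokenizer(new StringReader("a b  c")).tokenize();
        System.out.println(words); // prints [a, b, c]
    }
}
```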

Aggregations (number of examples above that use each class)

Tokenizer (edu.stanford.nlp.process.Tokenizer): 3
Sentence (edu.stanford.nlp.ling.Sentence): 2
Word (edu.stanford.nlp.ling.Word): 2
StringReader (java.io.StringReader): 2
LexerTokenizer (edu.stanford.nlp.process.LexerTokenizer): 1
Tree (edu.stanford.nlp.trees.Tree): 1
FileReader (java.io.FileReader): 1
Reader (java.io.Reader): 1