Example usage of edu.stanford.nlp.process.PTBTokenizer from the CoreNLP project by stanfordnlp:
the main method of the SemanticGraphPrinter class.
/**
 * Command-line demo that builds a {@link SemanticGraph} for each tree in a treebank.
 *
 * <p>Trees come from one of three sources, in priority order:
 * <ol>
 *   <li>no arguments: a single hard-coded example tree (usage message is printed);</li>
 *   <li>{@code -treeFile file}: trees loaded directly from the file;</li>
 *   <li>{@code -sentFile file}: one sentence per line, tokenized with PTBTokenizer
 *       and parsed with the English PCFG model.</li>
 * </ol>
 * For each tree the uncollapsed dependencies are printed; with {@code -testGraph true}
 * the collapsed graph is additionally dumped in several output formats.
 *
 * @param args command-line flags: -treeFile, -sentFile, -testGraph, -load, -save
 */
public static void main(String[] args) {
  Treebank tb = new MemoryTreebank();
  Properties props = StringUtils.argsToProperties(args);
  String treeFileName = props.getProperty("treeFile");
  String sentFileName = props.getProperty("sentFile");
  // Defaults to false when the flag is absent or not "true" (case-insensitive).
  boolean testGraph = Boolean.parseBoolean(props.getProperty("testGraph", "false"));
  String load = props.getProperty("load");
  String save = props.getProperty("save");
  if (load != null) {
    log.info("Load not implemented!");
    return;
  }
  if (sentFileName == null && treeFileName == null) {
    log.info("Usage: java SemanticGraph [-sentFile file|-treeFile file] [-testGraph]");
    Tree t = Tree.valueOf("(ROOT (S (NP (NP (DT An) (NN attempt)) (PP (IN on) (NP (NP (NNP Andres) (NNP Pastrana) (POS 's)) (NN life)))) (VP (VBD was) (VP (VBN carried) (PP (IN out) (S (VP (VBG using) (NP (DT a) (JJ powerful) (NN bomb))))))) (. .)))");
    tb.add(t);
  } else if (treeFileName != null) {
    tb.loadPath(treeFileName);
  } else {
    String[] options = { "-retainNPTmpSubcategories" };
    LexicalizedParser lp = LexicalizedParser.loadModel("/u/nlp/data/lexparser/englishPCFG.ser.gz", options);
    // Open separately from reading so open failures keep their distinct exception type.
    BufferedReader reader;
    try {
      reader = IOUtils.readerFromString(sentFileName);
    } catch (IOException e) {
      throw new RuntimeIOException("Cannot find or open " + sentFileName, e);
    }
    // try-with-resources guarantees the reader is closed even if parsing throws;
    // the original only closed it on the success path.
    try (BufferedReader r = reader) {
      System.out.println("Processing sentence file " + sentFileName);
      for (String line; (line = r.readLine()) != null; ) {
        System.out.println("Processing sentence: " + line);
        PTBTokenizer<Word> ptb = PTBTokenizer.newPTBTokenizer(new StringReader(line));
        List<Word> words = ptb.tokenize();
        Tree parseTree = lp.parseTree(words);
        tb.add(parseTree);
      }
    } catch (Exception e) {
      throw new RuntimeException("Exception reading key file " + sentFileName, e);
    }
  }
  for (Tree t : tb) {
    SemanticGraph sg = SemanticGraphFactory.generateUncollapsedDependencies(t);
    System.out.println(sg.toString());
    System.out.println(sg.toCompactString());
    if (testGraph) {
      SemanticGraph g1 = SemanticGraphFactory.generateCollapsedDependencies(t);
      System.out.println("TEST SEMANTIC GRAPH - graph ----------------------------");
      System.out.println(g1.toString());
      System.out.println("readable ----------------------------");
      System.out.println(g1.toString(SemanticGraph.OutputFormat.READABLE));
      System.out.println("List of dependencies ----------------------------");
      System.out.println(g1.toList());
      System.out.println("xml ----------------------------");
      System.out.println(g1.toString(SemanticGraph.OutputFormat.XML));
      System.out.println("dot ----------------------------");
      System.out.println(g1.toDotFormat());
      System.out.println("dot (simple) ----------------------------");
      System.out.println(g1.toDotFormat("Simple", CoreLabel.OutputFormat.VALUE));
    }
  }
  if (save != null) {
    log.info("Save not implemented!");
  }
}
Example usage of edu.stanford.nlp.process.PTBTokenizer from the varaha project by thedatachef:
the exec method of the StanfordTokenize class.
/**
 * Pig UDF body: tokenizes the text in the first field of the input tuple using
 * Stanford's PTBTokenizer and returns one single-field tuple per token.
 *
 * @param input tuple whose field 0 holds the text to tokenize
 * @return a bag of one-field tuples (one per token), or {@code null} when the
 *         input is null, empty, or has a null first field
 * @throws IOException propagated from Tuple field access
 */
public DataBag exec(Tuple input) throws IOException {
  if (input == null || input.size() < 1 || input.isNull(0)) {
    return null;
  }
  // Output bag
  DataBag bagOfTokens = bagFactory.newDefaultBag();
  StringReader textInput = new StringReader(input.get(0).toString());
  // Parameterize the tokenizer so the raw type and the (CoreLabel) cast go away.
  PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(textInput, new CoreLabelTokenFactory(), "");
  while (ptbt.hasNext()) {
    CoreLabel label = ptbt.next();
    // NOTE(review): CoreLabel.toString() output depends on the label's format
    // settings; if plain token text is wanted, label.word() may be the intent — confirm.
    Tuple termText = tupleFactory.newTuple(label.toString());
    bagOfTokens.add(termText);
  }
  return bagOfTokens;
}
Aggregations