
Example 6 with FrenchTreeReaderFactory

Use of edu.stanford.nlp.trees.international.french.FrenchTreeReaderFactory in project CoreNLP by stanfordnlp.

Class FTBCorrector, method main.

/**
   * Reads French Treebank trees from the file named in args[0], discards
   * punctuation-only sentences, applies FTBCorrector to the rest, and prints
   * the corrected trees to stdout.
   *
   * @param args a single argument: the path of the treebank file
   */
public static void main(String[] args) {
    if (args.length != 1) {
        log.info("Usage: java " + FTBCorrector.class.getName() + " filename\n");
        System.exit(-1);
    }
    TreeTransformer tt = new FTBCorrector();
    File f = new File(args[0]);
    try {
        // These bad trees in the Candito training set should be thrown out:
        //   (ROOT (SENT (" ") (. .)))
        //   (ROOT (SENT (. .)))
        // A SENT whose only child is punctuation:
        TregexPattern pBadTree = TregexPattern.compile("@SENT <: @PUNC");
        // A SENT with exactly two children, both punctuation:
        TregexPattern pBadTree2 = TregexPattern.compile("@SENT <1 @PUNC <2 @PUNC !<3 __");
        BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(f), "UTF-8"));
        TreeReaderFactory trf = new FrenchTreeReaderFactory();
        TreeReader tr = trf.newTreeReader(br);
        int nTrees = 0;
        int nWritten = 0;
        for (Tree t; (t = tr.readTree()) != null; nTrees++) {
            TregexMatcher m = pBadTree.matcher(t);
            TregexMatcher m2 = pBadTree2.matcher(t);
            if (m.find() || m2.find()) {
                log.info("Discarding tree: " + t.toString());
            } else {
                Tree fixedT = tt.transformTree(t);
                System.out.println(fixedT.toString());
                nWritten++;
            }
        }
        tr.close();
        // nTrees counts every tree read, including discarded ones
        System.err.printf("Read %d trees, wrote %d%n", nTrees, nWritten);
    } catch (IOException e) {
        // Also covers UnsupportedEncodingException and FileNotFoundException
        e.printStackTrace();
    } catch (TregexParseException e) {
        e.printStackTrace();
    }
}
Also used: TregexParseException (edu.stanford.nlp.trees.tregex.TregexParseException), TregexPattern (edu.stanford.nlp.trees.tregex.TregexPattern), TreeReader (edu.stanford.nlp.trees.TreeReader), FrenchTreeReaderFactory (edu.stanford.nlp.trees.international.french.FrenchTreeReaderFactory), Tree (edu.stanford.nlp.trees.Tree), TreeReaderFactory (edu.stanford.nlp.trees.TreeReaderFactory), TregexMatcher (edu.stanford.nlp.trees.tregex.TregexMatcher), TreeTransformer (edu.stanford.nlp.trees.TreeTransformer)
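The two Tregex patterns in FTBCorrector.main reject any SENT node whose children are all punctuation: `<:` requires a PUNC node to be the only child, and `<1 @PUNC <2 @PUNC !<3 __` requires exactly two children, both PUNC. Because `TregexMatcher.find()` reports a match at any node, a tree is discarded if such a SENT appears anywhere in it. The dependency-free sketch below hand-codes the same rule on a hypothetical `Node` class (not CoreNLP's `Tree`). It assumes the basic category is the label up to the first '-', which only approximates how Tregex's `@` operator strips annotations; real code should keep using the Tregex patterns as the example does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BadTreeFilter {

    /** Hypothetical minimal tree node; a stand-in for CoreNLP's Tree. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Assumption: approximates Tregex "@" by cutting the label at the first '-'. */
    static String basicCategory(String label) {
        int i = label.indexOf('-');
        return i > 0 ? label.substring(0, i) : label;
    }

    static boolean isPunc(Node n) {
        return basicCategory(n.label).equals("PUNC");
    }

    /** Mirrors "@SENT <: @PUNC" (one PUNC child) or "@SENT <1 @PUNC <2 @PUNC !<3 __" (exactly two). */
    static boolean isPunctuationOnlySent(Node n) {
        if (!basicCategory(n.label).equals("SENT")) return false;
        int k = n.children.size();
        if (k == 1) return isPunc(n.children.get(0));
        if (k == 2) return isPunc(n.children.get(0)) && isPunc(n.children.get(1));
        return false;
    }

    /** Like TregexMatcher.find(): true if any node in the tree matches. */
    static boolean isBadTree(Node root) {
        if (isPunctuationOnlySent(root)) return true;
        for (Node child : root.children) {
            if (isBadTree(child)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Node bad = new Node("ROOT",
            new Node("SENT", new Node("PUNC", new Node("\"")), new Node("PUNC", new Node("."))));
        Node good = new Node("ROOT",
            new Node("SENT", new Node("NP", new Node("NC", new Node("chat"))), new Node("PUNC", new Node("."))));
        List<Node> kept = new ArrayList<>();
        for (Node t : List.of(bad, good)) {
            if (!isBadTree(t)) kept.add(t);  // keep only trees with no punctuation-only SENT
        }
        System.out.println("Kept " + kept.size() + " of 2 trees");
    }
}
```

Recursing over every node, rather than checking only the root's child, matches the search behavior of `find()`, so a punctuation-only SENT nested deeper in a tree would also cause it to be dropped.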

Example 7 with FrenchTreeReaderFactory

Use of edu.stanford.nlp.trees.international.french.FrenchTreeReaderFactory in project CoreNLP by stanfordnlp.

Class MWEPreprocessor, method resolveDummyTags.

private static void resolveDummyTags(File treeFile, TwoDimensionalCounter<String, String> pretermLabel, TwoDimensionalCounter<String, String> unigramTagger) {
    try {
        BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(treeFile), "UTF-8"));
        TreeReaderFactory trf = new FrenchTreeReaderFactory();
        TreeReader tr = trf.newTreeReader(br);
        // Corrected trees go to a sibling file with a ".fixed" suffix
        PrintWriter pw = new PrintWriter(new PrintStream(new FileOutputStream(new File(treeFile + ".fixed")), false, "UTF-8"));
        int nTrees = 0;
        for (Tree t; (t = tr.readTree()) != null; nTrees++) {
            // Fix the tree in place using the two tag counters, then write it out
            traverseAndFix(t, pretermLabel, unigramTagger);
            pw.println(t.toString());
        }
        pw.close();
        tr.close();
        System.out.println("Processed " + nTrees + " trees");
    } catch (IOException e) {
        // Also covers UnsupportedEncodingException and FileNotFoundException
        e.printStackTrace();
    }
}
Also used: TreeReader (edu.stanford.nlp.trees.TreeReader), FrenchXMLTreeReader (edu.stanford.nlp.trees.international.french.FrenchXMLTreeReader), FrenchTreeReaderFactory (edu.stanford.nlp.trees.international.french.FrenchTreeReaderFactory), Tree (edu.stanford.nlp.trees.Tree), TreeReaderFactory (edu.stanford.nlp.trees.TreeReaderFactory)
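resolveDummyTags streams trees from one file into a sibling `.fixed` file, but if an exception is thrown inside the loop, neither the reader nor the writer is closed. Below is a minimal stdlib-only sketch of the same read, fix, write loop using try-with-resources. It assumes one bracketed tree per line (a real `TreeReader` has no such restriction); `rewriteToFixed` and the string-rewriting fix are hypothetical stand-ins for the TreeReader loop and `traverseAndFix`.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.function.UnaryOperator;

public class FixedFileWriter {

    /**
     * Hypothetical helper: reads one tree per line from treeFile, applies fix to each,
     * writes the results to treeFile + ".fixed" in UTF-8, and returns the tree count.
     */
    static int rewriteToFixed(Path treeFile, UnaryOperator<String> fix) throws IOException {
        Path out = Paths.get(treeFile.toString() + ".fixed");
        int nTrees = 0;
        // try-with-resources closes both streams even if reading or writing fails
        try (BufferedReader br = Files.newBufferedReader(treeFile, StandardCharsets.UTF_8);
             BufferedWriter bw = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (String line; (line = br.readLine()) != null; ) {
                if (line.trim().isEmpty()) continue;  // skip blank lines between trees
                bw.write(fix.apply(line));
                bw.newLine();
                nTrees++;
            }
        }
        return nTrees;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("trees", ".txt");
        Files.write(tmp, Arrays.asList("(ROOT (SENT (X chat) (PUNC .)))", "", "(ROOT (SENT (X .)))"),
                StandardCharsets.UTF_8);
        // Hypothetical fix: relabel the placeholder tag X as NC
        int n = rewriteToFixed(tmp, t -> t.replace("(X ", "(NC "));
        System.out.println("Processed " + n + " trees");
    }
}
```

Opening both streams in one try-with-resources header means they are closed in reverse order on every exit path, which is the main thing the original's explicit `pw.close()` and `tr.close()` calls miss.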

Aggregations

TreeReaderFactory (edu.stanford.nlp.trees.TreeReaderFactory): 7
FrenchTreeReaderFactory (edu.stanford.nlp.trees.international.french.FrenchTreeReaderFactory): 7
Tree (edu.stanford.nlp.trees.Tree): 6
TreeReader (edu.stanford.nlp.trees.TreeReader): 6
CoreLabel (edu.stanford.nlp.ling.CoreLabel): 3
BufferedReader (java.io.BufferedReader): 3
FileInputStream (java.io.FileInputStream): 3
FileNotFoundException (java.io.FileNotFoundException): 3
IOException (java.io.IOException): 3
InputStreamReader (java.io.InputStreamReader): 3
UnsupportedEncodingException (java.io.UnsupportedEncodingException): 3
Label (edu.stanford.nlp.ling.Label): 2
TwoDimensionalCounter (edu.stanford.nlp.stats.TwoDimensionalCounter): 2
FrenchXMLTreeReader (edu.stanford.nlp.trees.international.french.FrenchXMLTreeReader): 2
TregexMatcher (edu.stanford.nlp.trees.tregex.TregexMatcher): 2
TregexParseException (edu.stanford.nlp.trees.tregex.TregexParseException): 2
TregexPattern (edu.stanford.nlp.trees.tregex.TregexPattern): 2
LabeledScoredTreeReaderFactory (edu.stanford.nlp.trees.LabeledScoredTreeReaderFactory): 1
PennTreeReaderFactory (edu.stanford.nlp.trees.PennTreeReaderFactory): 1
StringLabeledScoredTreeReaderFactory (edu.stanford.nlp.trees.StringLabeledScoredTreeReaderFactory): 1