Example 11 with Treebank

use of edu.stanford.nlp.trees.Treebank in project CoreNLP by stanfordnlp.

the class ChineseMaxentLexicon method main.

public static void main(String[] args) {
    TreebankLangParserParams tlpParams = new ChineseTreebankParserParams();
    TreebankLanguagePack ctlp = tlpParams.treebankLanguagePack();
    Options op = new Options(tlpParams);
    TreeAnnotator ta = new TreeAnnotator(tlpParams.headFinder(), tlpParams, op);
    log.info("Reading Trees...");
    FileFilter trainFilter = new NumberRangesFileFilter(args[1], true);
    Treebank trainTreebank = tlpParams.memoryTreebank();
    trainTreebank.loadPath(args[0], trainFilter);
    log.info("Annotating trees...");
    Collection<Tree> trainTrees = new ArrayList<>();
    for (Tree tree : trainTreebank) {
        trainTrees.add(ta.transformTree(tree));
    }
    // saves memory
    trainTreebank = null;
    log.info("Training lexicon...");
    Index<String> wordIndex = new HashIndex<>();
    Index<String> tagIndex = new HashIndex<>();
    int featureLevel = DEFAULT_FEATURE_LEVEL;
    if (args.length > 3) {
        featureLevel = Integer.parseInt(args[3]);
    }
    ChineseMaxentLexicon lex = new ChineseMaxentLexicon(op, wordIndex, tagIndex, featureLevel);
    lex.initializeTraining(trainTrees.size());
    lex.train(trainTrees);
    lex.finishTraining();
    log.info("Testing");
    FileFilter testFilter = new NumberRangesFileFilter(args[2], true);
    Treebank testTreebank = tlpParams.memoryTreebank();
    testTreebank.loadPath(args[0], testFilter);
    List<TaggedWord> testWords = new ArrayList<>();
    for (Tree t : testTreebank) {
        for (TaggedWord tw : t.taggedYield()) {
            testWords.add(tw);
        }
        // testWords.addAll(t.taggedYield());
    }
    int[] totalAndCorrect = lex.testOnTreebank(testWords);
    log.info("done.");
    System.out.println(totalAndCorrect[1] + " correct out of " + totalAndCorrect[0] + " -- ACC: " + ((double) totalAndCorrect[1]) / totalAndCorrect[0]);
}
Also used : NumberRangesFileFilter(edu.stanford.nlp.io.NumberRangesFileFilter) Treebank(edu.stanford.nlp.trees.Treebank) TaggedWord(edu.stanford.nlp.ling.TaggedWord) Tree(edu.stanford.nlp.trees.Tree) TreebankLanguagePack(edu.stanford.nlp.trees.TreebankLanguagePack)
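
The last statement of the main method above divides the correct count by the total as a double, which evaluates to NaN when the test file range matches no trees. The following is a minimal, stdlib-only sketch of a guarded accuracy computation. The class AccuracyReport and its methods are hypothetical names for illustration and are not part of CoreNLP; the array layout (index 0 = total, index 1 = correct) is assumed from the print statement above.

```java
public class AccuracyReport {

    // Assumed layout, mirroring the print statement in the example above:
    // counts[0] = total tokens tested, counts[1] = correctly tagged tokens.
    static double accuracy(int[] counts) {
        if (counts == null || counts.length < 2) {
            throw new IllegalArgumentException("expected {total, correct}");
        }
        int total = counts[0];
        int correct = counts[1];
        if (total == 0) {
            // Nothing was evaluated; report 0.0 instead of NaN from 0.0 / 0.
            return 0.0;
        }
        return (double) correct / total;
    }

    // Builds the same report line as the example, using the guarded ratio.
    static String format(int[] counts) {
        return counts[1] + " correct out of " + counts[0] + " -- ACC: " + accuracy(counts);
    }

    public static void main(String[] args) {
        System.out.println(format(new int[] {200, 150}));
        System.out.println(format(new int[] {0, 0}));
    }
}
```

Returning 0.0 for an empty test set is one possible choice; throwing an exception or logging a warning may be more appropriate when an empty test range indicates a wrong file filter.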

Example 12 with Treebank

use of edu.stanford.nlp.trees.Treebank in project CoreNLP by stanfordnlp.

the class ShiftReduceParser method readBinarizedTreebank.

public List<Tree> readBinarizedTreebank(String treebankPath, FileFilter treebankFilter) {
    Treebank treebank = readTreebank(treebankPath, treebankFilter);
    List<Tree> binarized = binarizeTreebank(treebank, op);
    log.info("Converted trees to binarized format");
    return binarized;
}
Also used : Treebank(edu.stanford.nlp.trees.Treebank) EvaluateTreebank(edu.stanford.nlp.parser.lexparser.EvaluateTreebank) Tree(edu.stanford.nlp.trees.Tree)

Example 13 with Treebank

use of edu.stanford.nlp.trees.Treebank in project CoreNLP by stanfordnlp.

the class ShiftReduceParser method main.

public static void main(String[] args) {
    List<String> remainingArgs = Generics.newArrayList();
    List<Pair<String, FileFilter>> trainTreebankPath = null;
    Pair<String, FileFilter> testTreebankPath = null;
    Pair<String, FileFilter> devTreebankPath = null;
    String serializedPath = null;
    String tlppClass = null;
    String continueTraining = null;
    for (int argIndex = 0; argIndex < args.length; ) {
        if (args[argIndex].equalsIgnoreCase("-trainTreebank")) {
            if (trainTreebankPath == null) {
                trainTreebankPath = Generics.newArrayList();
            }
            trainTreebankPath.add(ArgUtils.getTreebankDescription(args, argIndex, "-trainTreebank"));
            argIndex = argIndex + ArgUtils.numSubArgs(args, argIndex) + 1;
        } else if (args[argIndex].equalsIgnoreCase("-testTreebank")) {
            testTreebankPath = ArgUtils.getTreebankDescription(args, argIndex, "-testTreebank");
            argIndex = argIndex + ArgUtils.numSubArgs(args, argIndex) + 1;
        } else if (args[argIndex].equalsIgnoreCase("-devTreebank")) {
            devTreebankPath = ArgUtils.getTreebankDescription(args, argIndex, "-devTreebank");
            argIndex = argIndex + ArgUtils.numSubArgs(args, argIndex) + 1;
        } else if (args[argIndex].equalsIgnoreCase("-serializedPath") || args[argIndex].equalsIgnoreCase("-model")) {
            serializedPath = args[argIndex + 1];
            argIndex += 2;
        } else if (args[argIndex].equalsIgnoreCase("-tlpp")) {
            tlppClass = args[argIndex + 1];
            argIndex += 2;
        } else if (args[argIndex].equalsIgnoreCase("-continueTraining")) {
            continueTraining = args[argIndex + 1];
            argIndex += 2;
        } else {
            remainingArgs.add(args[argIndex]);
            ++argIndex;
        }
    }
    String[] newArgs = new String[remainingArgs.size()];
    newArgs = remainingArgs.toArray(newArgs);
    if (trainTreebankPath == null && serializedPath == null) {
        throw new IllegalArgumentException("Must specify a treebank to train from with -trainTreebank or a parser to load with -serializedPath");
    }
    ShiftReduceParser parser = null;
    if (trainTreebankPath != null) {
        log.info("Training ShiftReduceParser");
        log.info("Initial arguments:");
        log.info("   " + StringUtils.join(args));
        if (continueTraining != null) {
            parser = ShiftReduceParser.loadModel(continueTraining, ArrayUtils.concatenate(FORCE_TAGS, newArgs));
        } else {
            ShiftReduceOptions op = buildTrainingOptions(tlppClass, newArgs);
            parser = new ShiftReduceParser(op);
        }
        parser.train(trainTreebankPath, devTreebankPath, serializedPath);
        parser.saveModel(serializedPath);
    }
    if (serializedPath != null && parser == null) {
        parser = ShiftReduceParser.loadModel(serializedPath, ArrayUtils.concatenate(FORCE_TAGS, newArgs));
    }
    if (testTreebankPath != null) {
        log.info("Loading test trees from " + testTreebankPath.first());
        Treebank testTreebank = parser.op.tlpParams.memoryTreebank();
        testTreebank.loadPath(testTreebankPath.first(), testTreebankPath.second());
        log.info("Loaded " + testTreebank.size() + " trees");
        EvaluateTreebank evaluator = new EvaluateTreebank(parser.op, null, parser);
        evaluator.testOnTreebank(testTreebank);
        // log.info("Input tree: " + tree);
        // log.info("Debinarized tree: " + query.getBestParse());
        // log.info("Parsed binarized tree: " + query.getBestBinarizedParse());
        // log.info("Predicted transition sequence: " + query.getBestTransitionSequence());
    }
}
Also used : Treebank(edu.stanford.nlp.trees.Treebank) EvaluateTreebank(edu.stanford.nlp.parser.lexparser.EvaluateTreebank) FileFilter(java.io.FileFilter) Pair(edu.stanford.nlp.util.Pair)
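
The argument loop in the main method above has no increment clause: each branch advances argIndex by the number of tokens it consumed, and unrecognized tokens are passed through for later option parsing. The sketch below reproduces that pattern with the standard library only. FlagLoopSketch, countSubArgs and parse are hypothetical names, and countSubArgs is an assumed stand-in (it counts following tokens that do not start with '-'), not the actual ArgUtils.numSubArgs implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FlagLoopSketch {

    // Hypothetical helper: counts the tokens after args[index] that do not start with '-'.
    static int countSubArgs(String[] args, int index) {
        int count = 0;
        for (int i = index + 1; i < args.length && !args[i].startsWith("-"); i++) {
            count++;
        }
        return count;
    }

    // Collects the sub-arguments of each known flag (matched case-insensitively,
    // 'known' must hold lower-case names); all other tokens go to 'remaining'.
    static Map<String, List<String>> parse(String[] args, List<String> known, List<String> remaining) {
        Map<String, List<String>> parsed = new HashMap<>();
        for (int argIndex = 0; argIndex < args.length; ) {
            String flag = args[argIndex].toLowerCase(Locale.ROOT);
            if (known.contains(flag)) {
                int n = countSubArgs(args, argIndex);
                List<String> values = parsed.computeIfAbsent(flag, k -> new ArrayList<>());
                for (int i = 1; i <= n; i++) {
                    values.add(args[argIndex + i]);
                }
                argIndex += n + 1; // skip the flag and everything it consumed
            } else {
                remaining.add(args[argIndex]);
                ++argIndex;
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<String> remaining = new ArrayList<>();
        List<String> known = List.of("-traintreebank", "-model");
        System.out.println(parse(args, known, remaining));
        System.out.println("remaining: " + remaining);
    }
}
```

Because the sketch counts sub-arguments before reading them, a flag given as the last token yields an empty value list, whereas a fixed args[argIndex + 1] lookup at that position would throw ArrayIndexOutOfBoundsException.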

Example 14 with Treebank

use of edu.stanford.nlp.trees.Treebank in project CoreNLP by stanfordnlp.

the class ShiftReduceParser method readTreebank.

public Treebank readTreebank(String treebankPath, FileFilter treebankFilter) {
    log.info("Loading trees from " + treebankPath);
    Treebank treebank = op.tlpParams.memoryTreebank();
    treebank.loadPath(treebankPath, treebankFilter);
    log.info("Read in " + treebank.size() + " trees from " + treebankPath);
    return treebank;
}
Also used : Treebank(edu.stanford.nlp.trees.Treebank) EvaluateTreebank(edu.stanford.nlp.parser.lexparser.EvaluateTreebank)

Example 15 with Treebank

use of edu.stanford.nlp.trees.Treebank in project CoreNLP by stanfordnlp.

the class ReorderingOracleTest method setUp.

public void setUp() {
    Options op = new Options();
    Treebank treebank = op.tlpParams.memoryTreebank();
    treebank.addAll(Arrays.asList(correctTrees));
    binarizedTrees = ShiftReduceParser.binarizeTreebank(treebank, op);
}
Also used : Options(edu.stanford.nlp.parser.lexparser.Options) Treebank(edu.stanford.nlp.trees.Treebank)

Aggregations

Treebank (edu.stanford.nlp.trees.Treebank) 27
Tree (edu.stanford.nlp.trees.Tree) 16
TreeTransformer (edu.stanford.nlp.trees.TreeTransformer) 10
ArrayList (java.util.ArrayList) 8
Language (edu.stanford.nlp.international.Language) 7
EvaluateTreebank (edu.stanford.nlp.parser.lexparser.EvaluateTreebank) 7
TreebankLangParserParams (edu.stanford.nlp.parser.lexparser.TreebankLangParserParams) 7
Pair (edu.stanford.nlp.util.Pair) 7
PrintWriter (java.io.PrintWriter) 7
Label (edu.stanford.nlp.ling.Label) 6
LexicalizedParser (edu.stanford.nlp.parser.lexparser.LexicalizedParser) 6
FileFilter (java.io.FileFilter) 6
Map (java.util.Map) 4
CoreLabel (edu.stanford.nlp.ling.CoreLabel) 3
EnglishTreebankParserParams (edu.stanford.nlp.parser.lexparser.EnglishTreebankParserParams) 3
DiskTreebank (edu.stanford.nlp.trees.DiskTreebank) 3
MemoryTreebank (edu.stanford.nlp.trees.MemoryTreebank) 3
ArabicMorphoFeatureSpecification (edu.stanford.nlp.international.arabic.ArabicMorphoFeatureSpecification) 2
FrenchMorphoFeatureSpecification (edu.stanford.nlp.international.french.FrenchMorphoFeatureSpecification) 2
MorphoFeatureSpecification (edu.stanford.nlp.international.morph.MorphoFeatureSpecification) 2