Search in sources :

Example 1 with SpanishTreeNormalizer

use of edu.stanford.nlp.trees.international.spanish.SpanishTreeNormalizer in project CoreNLP by stanfordnlp.

the class MultiWordPreprocessor method main.

/**
   *
   * @param args
   */
public static void main(String[] args) {
    Properties options = StringUtils.argsToProperties(args, argOptionDefs);
    if (!options.containsKey("") || options.containsKey("help")) {
        log.info(usage());
        return;
    }
    boolean retainNER = PropertiesUtils.getBool(options, "ner", false);
    boolean normalize = PropertiesUtils.getBool(options, "normalize", true);
    final File treeFile = new File(options.getProperty(""));
    TwoDimensionalCounter<String, String> labelTerm = new TwoDimensionalCounter<>();
    TwoDimensionalCounter<String, String> termLabel = new TwoDimensionalCounter<>();
    TwoDimensionalCounter<String, String> labelPreterm = new TwoDimensionalCounter<>();
    TwoDimensionalCounter<String, String> pretermLabel = new TwoDimensionalCounter<>();
    TwoDimensionalCounter<String, String> unigramTagger = new TwoDimensionalCounter<>();
    try {
        BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(treeFile), "UTF-8"));
        TreeReaderFactory trf = new SpanishTreeReaderFactory();
        TreeReader tr = trf.newTreeReader(br);
        for (Tree t; (t = tr.readTree()) != null; ) {
            updateTagger(unigramTagger, t);
        }
        //Closes the underlying reader
        tr.close();
        System.out.println("Resolving DUMMY tags");
        resolveDummyTags(treeFile, unigramTagger, retainNER, normalize ? new SpanishTreeNormalizer(true, false, false) : null);
        System.out.println("#Unknown Word Types: " + ManualUWModel.nUnknownWordTypes);
        System.out.println(String.format("#Missing POS: %d (fixed: %d, %.2f%%)", nMissingPOS, nFixedPOS, (double) nFixedPOS / nMissingPOS * 100));
        System.out.println(String.format("#Missing Phrasal: %d (fixed: %d, %.2f%%)", nMissingPhrasal, nFixedPhrasal, (double) nFixedPhrasal / nMissingPhrasal * 100));
        System.out.println("Done!");
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Also used : TwoDimensionalCounter(edu.stanford.nlp.stats.TwoDimensionalCounter) TreeReader(edu.stanford.nlp.trees.TreeReader) SpanishTreeReaderFactory(edu.stanford.nlp.trees.international.spanish.SpanishTreeReaderFactory) Tree(edu.stanford.nlp.trees.Tree) SpanishTreeReaderFactory(edu.stanford.nlp.trees.international.spanish.SpanishTreeReaderFactory) TreeReaderFactory(edu.stanford.nlp.trees.TreeReaderFactory) SpanishTreeNormalizer(edu.stanford.nlp.trees.international.spanish.SpanishTreeNormalizer)

Aggregations

TwoDimensionalCounter (edu.stanford.nlp.stats.TwoDimensionalCounter)1 Tree (edu.stanford.nlp.trees.Tree)1 TreeReader (edu.stanford.nlp.trees.TreeReader)1 TreeReaderFactory (edu.stanford.nlp.trees.TreeReaderFactory)1 SpanishTreeNormalizer (edu.stanford.nlp.trees.international.spanish.SpanishTreeNormalizer)1 SpanishTreeReaderFactory (edu.stanford.nlp.trees.international.spanish.SpanishTreeReaderFactory)1