Examples with BasicDocument - edu.stanford.nlp.ling.BasicDocument

Example 1 with BasicDocument

Use of edu.stanford.nlp.ling.BasicDocument in the CoreNLP project by stanfordnlp.

Source: the main method of class WordToTaggedWordProcessor.

/**
 * This will print out some text, recognizing tags.  It can be used to
 * test tag breaking.  <br>  Usage: <code>
 * java edu.stanford.nlp.process.WordToTaggedWordProcessor fileOrUrl
 * </code>
 *
 * @param args Command line argument: a file or URL
 */
public static void main(String[] args) {
    if (args.length != 1) {
        // Wrong number of arguments: report usage on stderr and exit with a non-zero status
        System.err.println("usage: java edu.stanford.nlp.process.WordToTaggedWordProcessor fileOrUrl");
        System.exit(1);
    }
    String filename = args[0];
    try {
        Document<HasWord, Word, Word> d;
        if (filename.startsWith("http://") || filename.startsWith("https://")) {
            // Remote input: load the page, then strip SGML/HTML tags from the token stream
            Document<HasWord, Word, Word> dpre = new BasicDocument<HasWord>().init(new URL(filename));
            DocumentProcessor<Word, Word, HasWord, Word> notags = new StripTagsProcessor<>();
            d = notags.processDocument(dpre);
        } else {
            // Local input: read and tokenize the file directly
            d = new BasicDocument<HasWord>().init(new File(filename));
        }
        // Break word/TAG tokens into tagged words
        DocumentProcessor<Word, HasWord, HasWord, Word> proc = new WordToTaggedWordProcessor<>();
        Document<HasWord, Word, HasWord> sentd = proc.processDocument(d);
        // Print each resulting token with its position
        int i = 0;
        for (HasWord w : sentd) {
            System.out.println(i + ": " + w);
            i++;
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Also used: HasWord (edu.stanford.nlp.ling.HasWord), Word (edu.stanford.nlp.ling.Word), TaggedWord (edu.stanford.nlp.ling.TaggedWord), BasicDocument (edu.stanford.nlp.ling.BasicDocument), URL (java.net.URL), File (java.io.File)
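The main method above chains two steps: removing markup tags from the token stream, then breaking tokens into word and tag. The following is a minimal, dependency-free sketch of that idea, not CoreNLP's implementation. It assumes that StripTagsProcessor drops tokens that look like SGML/XML tags (approximated here with a simple regular expression) and that WordToTaggedWordProcessor splits a token of the form word/TAG at the last '/' while leaving tokens without a '/' untagged. The class TagPipelineSketch and its methods stripTags and splitTag are hypothetical names invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, dependency-free sketch of the tag-stripping and tag-breaking
// pipeline; it is NOT CoreNLP code and only approximates the two processors.
class TagPipelineSketch {

    // Rough stand-in for a markup-tag test: matches "<p>", "</p>", "<br/>", "<a href=x>".
    // (Assumption: the real StripTagsProcessor uses its own, more complete check.)
    private static final Pattern XML_TAG = Pattern.compile("</?[A-Za-z][^<>]*/?>");

    // Keep only the tokens that do not look like markup tags.
    static List<String> stripTags(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!XML_TAG.matcher(t).matches()) {
                out.add(t);
            }
        }
        return out;
    }

    // Split "word/TAG" at the LAST separator, so "1/2/CD" gives word "1/2" and tag "CD".
    // Tokens without the separator are returned with a null tag.
    static String[] splitTag(String token, char splitChar) {
        int idx = token.lastIndexOf(splitChar);
        if (idx < 0) {
            return new String[] {token, null};
        }
        return new String[] {token.substring(0, idx), token.substring(idx + 1)};
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("<p>", "The/DT", "dog/NN", "barked/VBD", "</p>");
        int i = 0;
        for (String t : stripTags(tokens)) {
            String[] wt = splitTag(t, '/');
            System.out.println(i + ": " + wt[0] + " [" + wt[1] + "]");
            i++;
        }
    }
}
```

Splitting at the last separator rather than the first keeps words that themselves contain a slash, such as fractions, intact.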

Example 2 with BasicDocument

Use of edu.stanford.nlp.ling.BasicDocument in the CoreNLP project by stanfordnlp.

Source: the main method of class PTBEscapingProcessor.

/**
 * This will do the escaping on an input file. Input file should already be tokenized,
 * with tokens separated by whitespace. <br>
 * Usage: java edu.stanford.nlp.process.PTBEscapingProcessor fileOrUrl
 *
 * @param args Command line argument: a file or URL
 */
public static void main(String[] args) {
    if (args.length != 1) {
        // Wrong number of arguments: report usage on stderr and exit with a non-zero status
        System.err.println("usage: java edu.stanford.nlp.process.PTBEscapingProcessor fileOrUrl");
        System.exit(1);
    }
    String filename = args[0];
    try {
        // Assigned in one of the two branches below
        Document<String, Word, Word> d;
        if (filename.startsWith("http://") || filename.startsWith("https://")) {
            // Remote input: split the page on whitespace, then strip SGML/HTML tags
            Document<String, Word, Word> dpre = new BasicDocument<String>(WhitespaceTokenizer.factory()).init(new URL(filename));
            DocumentProcessor<Word, Word, String, Word> notags = new StripTagsProcessor<>();
            d = notags.processDocument(dpre);
        } else {
            // Local input: the file is already tokenized, so split it on whitespace only
            d = new BasicDocument<String>(WhitespaceTokenizer.factory()).init(new File(filename));
        }
        // Apply Penn Treebank escaping to every token and print the results
        DocumentProcessor<Word, HasWord, String, Word> proc = new PTBEscapingProcessor<>();
        Document<String, Word, HasWord> newD = proc.processDocument(d);
        for (HasWord word : newD) {
            System.out.println(word);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Also used: HasWord (edu.stanford.nlp.ling.HasWord), Word (edu.stanford.nlp.ling.Word), BasicDocument (edu.stanford.nlp.ling.BasicDocument), URL (java.net.URL), File (java.io.File)
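PTBEscapingProcessor expects input that is already tokenized, with tokens separated by whitespace. As a rough, hypothetical illustration (not CoreNLP's code), the sketch below splits a line on whitespace and replaces bracket characters with the Penn Treebank escape symbols -LRB-, -RRB-, -LSB-, -RSB-, -LCB- and -RCB-. This is an assumed subset: the real processor may escape additional characters or handle quotes differently. PtbEscapeSketch, tokenize and escape are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of whitespace tokenization followed by Penn Treebank
// bracket escaping; NOT the CoreNLP PTBEscapingProcessor implementation.
class PtbEscapeSketch {

    // Standard PTB bracket escapes (assumed to be a subset of what the real class does).
    private static final Map<Character, String> ESCAPES = Map.of(
            '(', "-LRB-", ')', "-RRB-",
            '[', "-LSB-", ']', "-RSB-",
            '{', "-LCB-", '}', "-RCB-");

    // The input is assumed to be pre-tokenized, so splitting on whitespace is enough.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String t : line.trim().split("\\s+")) {
            if (!t.isEmpty()) { // "".split(...) yields one empty string; skip it
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Replace every bracket character inside a token with its PTB escape.
    static String escape(String token) {
        StringBuilder sb = new StringBuilder();
        for (char c : token.toCharArray()) {
            String esc = ESCAPES.get(c);
            sb.append(esc != null ? esc : String.valueOf(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String token : tokenize("He left ( quietly ) .")) {
            System.out.println(escape(token));
        }
    }
}
```

Escaping character by character also covers tokens where a bracket is attached to a word, although in properly tokenized input brackets usually stand alone.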

Aggregations (number of examples above that use each class)

BasicDocument (edu.stanford.nlp.ling.BasicDocument): 2, HasWord (edu.stanford.nlp.ling.HasWord): 2, Word (edu.stanford.nlp.ling.Word): 2, File (java.io.File): 2, URL (java.net.URL): 2, TaggedWord (edu.stanford.nlp.ling.TaggedWord): 1