
Example 56 with HasWord

Use of edu.stanford.nlp.ling.HasWord in project CoreNLP by stanfordnlp.

From the class DocumentPreprocessorTest, method compareXMLResults.

private static void compareXMLResults(String input, String element, String... expectedResults) {
    ArrayList<String> results = new ArrayList<>();
    // Read the input in XML mode; only text inside the given element is split into sentences.
    DocumentPreprocessor document = new DocumentPreprocessor(new BufferedReader(new StringReader(input)), DocumentPreprocessor.DocType.XML);
    document.setElementDelimiter(element);
    // Each iteration yields one sentence as a list of tokens; join the tokens back into a string.
    for (List<HasWord> sentence : document) {
        results.add(SentenceUtils.listToString(sentence));
    }
    // The number of sentences and each sentence string must match the expectations, in order.
    assertEquals(expectedResults.length, results.size());
    for (int i = 0; i < results.size(); ++i) {
        assertEquals(expectedResults[i], results.get(i));
    }
}
Also used: HasWord(edu.stanford.nlp.ling.HasWord) ArrayList(java.util.ArrayList) BufferedReader(java.io.BufferedReader) StringReader(java.io.StringReader)
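For readers without CoreNLP on the classpath, the following stdlib-only sketch approximates the idea this test exercises: keep only the text inside a named XML element, then split that text into sentences. It is not DocumentPreprocessor's implementation. It swaps in javax.xml DOM parsing (which requires well-formed XML with a single root element) and java.text.BreakIterator for sentence boundaries, and it returns untokenized sentence strings, so its output will not match CoreNLP's token-joined sentences. The names XmlSentenceSketch and sentencesInElement are hypothetical.

```java
import java.io.StringReader;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stdlib-only sketch; not CoreNLP's DocumentPreprocessor.
class XmlSentenceSketch {

    // Returns the sentences found inside every element named `element`, in document order.
    // Assumes `xml` is well-formed with a single root element.
    static List<String> sentencesInElement(String xml, String element) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        NodeList nodes = doc.getElementsByTagName(element);
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.US);
        List<String> sentences = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // All text under this element, including text of nested children.
            String text = nodes.item(i).getTextContent();
            boundaries.setText(text);
            int start = boundaries.first();
            for (int end = boundaries.next(); end != BreakIterator.DONE;
                    start = end, end = boundaries.next()) {
                // Drop the whitespace BreakIterator leaves attached to each sentence.
                String sentence = text.substring(start, end).trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
            }
        }
        return sentences;
    }
}
```

For example, sentencesInElement("<doc><p>Hello world. Bye now.</p><q>Skip me.</q><p>Again.</p></doc>", "p") yields the three sentences from the two p elements and ignores the q element, mirroring the element filtering that setElementDelimiter requests in the test.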

Example 57 with HasWord

Use of edu.stanford.nlp.ling.HasWord in project textdb by TextDB.

From the class NlpSplitOperator, method computeSentenceList.

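private List<Span> computeSentenceList(Tuple inputTuple) {
    String inputText = inputTuple.<IField>getField(predicate.getInputAttributeName()).getValue().toString();
    Reader reader = new StringReader(inputText);
    // Tokenizes the text and groups the tokens into sentences (plain-text mode by default).
    DocumentPreprocessor documentPreprocessor = new DocumentPreprocessor(reader);
    List<Span> sentenceList = new ArrayList<>();
    int start = 0;
    int end = 0;
    String key = PropertyNameConstants.NLP_SPLIT_KEY;
    String attributeName = predicate.getInputAttributeName();
    for (List<HasWord> sentence : documentPreprocessor) {
        // Re-joins the sentence's tokens with single spaces (older CoreNLP API;
        // later releases provide this helper as SentenceUtils.listToString).
        String sentenceText = Sentence.listToString(sentence);
        // Make the span. start/end are computed from the lengths of the re-joined
        // strings, assuming one separator character between sentences, so they are
        // only approximate offsets into inputText.
        end = start + sentenceText.length();
        Span span = new Span(attributeName, start, end, key, sentenceText);
        sentenceList.add(span);
        start = end + 1;
    }
    return sentenceList;
}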
Also used: HasWord(edu.stanford.nlp.ling.HasWord) StringReader(java.io.StringReader) ArrayList(java.util.ArrayList) Reader(java.io.Reader) IField(edu.uci.ics.textdb.api.field.IField) DocumentPreprocessor(edu.stanford.nlp.process.DocumentPreprocessor) Span(edu.uci.ics.textdb.api.span.Span)
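The span loop in computeSentenceList derives offsets from the lengths of re-joined token strings, so they can drift from the real positions in inputText. As a point of comparison, here is a minimal stdlib-only sketch that keeps offsets anchored to the original text. It swaps in java.text.BreakIterator for CoreNLP's sentence splitting, so its sentence boundaries may differ from DocumentPreprocessor's; SentenceSpanSketch and SimpleSpan are hypothetical stand-ins, not textdb's Span class.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: sentence spans whose offsets point into the original text.
class SentenceSpanSketch {

    // Minimal stand-in for a span: [start, end) character offsets plus the covered text.
    static final class SimpleSpan {
        final int start;
        final int end;
        final String text;

        SimpleSpan(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    static List<SimpleSpan> sentenceSpans(String input) {
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.US);
        boundaries.setText(input);
        List<SimpleSpan> spans = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            // Shrink the boundary range past surrounding whitespace, keeping offsets
            // relative to the original input rather than to a re-joined string.
            int s = start;
            int e = end;
            while (s < e && Character.isWhitespace(input.charAt(s))) {
                s++;
            }
            while (e > s && Character.isWhitespace(input.charAt(e - 1))) {
                e--;
            }
            if (s < e) {
                spans.add(new SimpleSpan(s, e, input.substring(s, e)));
            }
        }
        return spans;
    }
}
```

For the input "Hi there.  It works (mostly).", this yields spans [0, 9) and [11, 29). Because each span's offsets index the original string, input.substring(span.start, span.end) always equals span.text, even when sentences are separated by more than one space.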

Aggregations

HasWord (edu.stanford.nlp.ling.HasWord): 57
CoreLabel (edu.stanford.nlp.ling.CoreLabel): 17
TaggedWord (edu.stanford.nlp.ling.TaggedWord): 15
ArrayList (java.util.ArrayList): 14
HasTag (edu.stanford.nlp.ling.HasTag): 13
Tree (edu.stanford.nlp.trees.Tree): 13
DocumentPreprocessor (edu.stanford.nlp.process.DocumentPreprocessor): 11
StringReader (java.io.StringReader): 11
Label (edu.stanford.nlp.ling.Label): 10
Word (edu.stanford.nlp.ling.Word): 10
List (java.util.List): 8
BufferedReader (java.io.BufferedReader): 6
MaxentTagger (edu.stanford.nlp.tagger.maxent.MaxentTagger): 5
File (java.io.File): 5
PrintWriter (java.io.PrintWriter): 5
ParserConstraint (edu.stanford.nlp.parser.common.ParserConstraint): 4
Pair (edu.stanford.nlp.util.Pair): 4
CoreAnnotations (edu.stanford.nlp.ling.CoreAnnotations): 3
HasIndex (edu.stanford.nlp.ling.HasIndex): 3
Sentence (edu.stanford.nlp.ling.Sentence): 3