
Example 1 with WordTokenFactory

Use of edu.stanford.nlp.process.WordTokenFactory in project uuusa by aghie.

From the class Processor, method process.

public List<SentimentDependencyGraph> process(String text) {
    // HashMap<String, String> emoLookupTable = new HashMap<String,String>();
    // for (String emoticon : emoticons){
    // System.out.println(emoticon);
    // String emouuid = UUID.randomUUID().toString();
    // text.replaceAll(emoticon, emouuid);
    // emoLookupTable.put(emouuid, emoticon);
    // }
    // One dependency graph is collected per sentence found in the text.
    List<SentimentDependencyGraph> sdgs = new ArrayList<>();
    // A trailing space is appended to the input before sentence splitting.
    DocumentPreprocessor dp = new DocumentPreprocessor(new StringReader(text.concat(" ")));
    // Produce plain Word tokens, with PTB3 escaping turned off.
    dp.setTokenizerFactory(PTBTokenizer.factory(new WordTokenFactory(), "ptb3Escaping=false"));
    for (List<HasWord> sentence : dp) {
        List<String> words = sentence.stream().map(w -> w.toString()).collect(Collectors.toList());
        // Re-tokenize the sentence with the project's own tokenizer, then tag and parse it.
        List<String> tokens = this.tokenizer.tokenize(String.join(" ", words));
        List<TaggedTokenInformation> ttis = this.tagger.tag(tokens);
        sdgs.add(this.parser.parse(ttis));
    }
    return sdgs;
}
Also used: HasWord(edu.stanford.nlp.ling.HasWord), PTBTokenizer(edu.stanford.nlp.process.PTBTokenizer), TreeTokenizerFactory(edu.stanford.nlp.trees.TreeTokenizerFactory), HashMap(java.util.HashMap), LexedTokenFactory(edu.stanford.nlp.process.LexedTokenFactory), ParserI(org.grupolys.samulan.processor.parser.ParserI), ArrayList(java.util.ArrayList), TokenizeI(org.grupolys.samulan.processor.tokenizer.TokenizeI), Twokenize(cmu.arktweetnlp.Twokenize), CoreLabelTokenFactory(edu.stanford.nlp.process.CoreLabelTokenFactory), DocumentPreprocessor(edu.stanford.nlp.process.DocumentPreprocessor), TokenizerFactory(edu.stanford.nlp.process.TokenizerFactory), WordTokenFactory(edu.stanford.nlp.process.WordTokenFactory), WhitespaceTokenizerFactory(edu.stanford.nlp.process.WhitespaceTokenizer.WhitespaceTokenizerFactory), Set(java.util.Set), UUID(java.util.UUID), LexerTokenizer(edu.stanford.nlp.process.LexerTokenizer), MaxentTagger(edu.stanford.nlp.tagger.maxent.MaxentTagger), Collectors(java.util.stream.Collectors), List(java.util.List), TaggerI(org.grupolys.samulan.processor.tagger.TaggerI), Stream(java.util.stream.Stream), StringReader(java.io.StringReader), SentimentDependencyGraph(org.grupolys.samulan.util.SentimentDependencyGraph), TaggedTokenInformation(org.grupolys.samulan.util.TaggedTokenInformation)
