
Example 1 with TaggedTokenInformation

Use of org.grupolys.samulan.util.TaggedTokenInformation in project uuusa by aghie.

In the class MaltParserWrapper, method parse.

public SentimentDependencyGraph parse(List<TaggedTokenInformation> ttis) {
    SentimentDependencyGraph sdg = null;
    // Convert each tagged token into one CoNLL-formatted input line for MaltParser
    String[] tokens = new String[ttis.size()];
    int i = 0;
    for (TaggedTokenInformation tti : ttis) {
        tokens[i] = tti.toConll();
        i += 1;
    }
    try {
        // Parse the sentence; each output line gains the head index and dependency type
        String[] outputTokens = this.parser.parseTokens(tokens);
        sdg = new SentimentDependencyGraph(String.join("\n", outputTokens));
    } catch (MaltChainedException e1) {
        // Parsing failed: log the error; the method then returns null
        e1.printStackTrace();
    }
    // Terminate the parser model
    try {
        this.parser.terminateParserModel();
    } catch (MaltChainedException e) {
        // Log failures to release the parser model
        e.printStackTrace();
    }
    return sdg;
}
Also used : MaltChainedException(org.maltparser.core.exception.MaltChainedException) SentimentDependencyGraph(org.grupolys.samulan.util.SentimentDependencyGraph) TaggedTokenInformation(org.grupolys.samulan.util.TaggedTokenInformation)
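MaltParser's `parseTokens` consumes one CoNLL-formatted line per token and typically returns the same lines extended with the head index and dependency type. The sketch below illustrates that round trip in plain Java. It assumes the six-column CoNLL-X input layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, with `_` for empty fields) and HEAD/DEPREL as the 7th and 8th output columns; the `Token` record and `heads` helper are hypothetical stand-ins, not the real `TaggedTokenInformation.toConll()` implementation.

```java
import java.util.Arrays;

public class ConllSketch {

    // Hypothetical stand-in for TaggedTokenInformation; the field order
    // (id, form, lemma, cpostag, postag, feats) is an assumption.
    record Token(int id, String form, String lemma, String cpostag, String postag, String feats) {

        // Builds one CoNLL-X input line: six tab-separated columns, "_" for missing values.
        String toConll() {
            return String.join("\t", Integer.toString(id), orBlank(form), orBlank(lemma),
                    orBlank(cpostag), orBlank(postag), orBlank(feats));
        }

        private static String orBlank(String value) {
            return (value == null || value.isEmpty()) ? "_" : value;
        }
    }

    // Reads the HEAD column (7th, 1-based) from parsed CoNLL-X lines; 0 marks the root.
    static int[] heads(String[] parsedLines) {
        return Arrays.stream(parsedLines)
                .mapToInt(line -> Integer.parseInt(line.split("\t")[6]))
                .toArray();
    }

    public static void main(String[] args) {
        // Fills the fields the way the tagger in Example 2 does: no lemma, no features.
        Token t = new Token(1, "smile", null, "VB", "VB", null);
        System.out.println(t.toConll());

        // Made-up parser output for a two-token sentence, for illustration only.
        String[] parsed = {
            "1\tI\t_\tPRP\tPRP\t_\t2\tnsubj",
            "2\tsmile\t_\tVBP\tVBP\t_\t0\troot"
        };
        System.out.println(Arrays.toString(heads(parsed)));
    }
}
```

Keeping empty fields as `_` rather than empty strings matters because a tab-split of an empty column would otherwise shift or drop columns on the parser side.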

Example 2 with TaggedTokenInformation

Use of org.grupolys.samulan.util.TaggedTokenInformation in project uuusa by aghie.

In the class MaxentStanfordTagger, method tag.

@Override
public List<TaggedTokenInformation> tag(List<String> tokens) {
    List<TaggedTokenInformation> ttis = new ArrayList<>();
    // The Stanford tagger returns space-separated "word<SEPARATOR>TAG" items
    String taggedText = this.tagger.tagTokenizedString(String.join(" ", tokens));
    short i = 1; // CoNLL token IDs are 1-based
    for (String tagToken : taggedText.split(" ")) {
        // Split at the last separator so words that contain it stay intact
        int sep = tagToken.lastIndexOf(STANFORD_SEPARATOR);
        String token = tagToken.substring(0, sep);
        String tag = tagToken.substring(sep + 1);
        ttis.add(new TaggedTokenInformation(i, token, null, tag, tag, null));
        i += 1;
    }
    return ttis;
}
Also used : ArrayList(java.util.ArrayList) TaggedTokenInformation(org.grupolys.samulan.util.TaggedTokenInformation)
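The tagger loop relies on `lastIndexOf`, so only the final separator divides word from tag. A standalone sketch of that step follows. It assumes `STANFORD_SEPARATOR` is `"_"` (the Stanford tagger's usual default); the `WordTag` record and `split` method are hypothetical. Unlike the loop above, it also tolerates an item with no separator instead of throwing `StringIndexOutOfBoundsException` from `substring(0, -1)`.

```java
import java.util.ArrayList;
import java.util.List;

public class TagSplitSketch {

    // Assumed value of STANFORD_SEPARATOR.
    static final String SEPARATOR = "_";

    // Hypothetical (word, tag) pair.
    record WordTag(String word, String tag) {}

    // Splits "word_TAG word_TAG ..." at the LAST separator of each item,
    // so words that themselves contain "_" keep their full form.
    static List<WordTag> split(String taggedText) {
        List<WordTag> out = new ArrayList<>();
        for (String item : taggedText.trim().split("\\s+")) {
            if (item.isEmpty()) {
                continue; // only happens for blank input
            }
            int cut = item.lastIndexOf(SEPARATOR);
            if (cut < 0) {
                // No separator: keep the token and leave the tag unknown.
                out.add(new WordTag(item, null));
            } else {
                out.add(new WordTag(item.substring(0, cut),
                        item.substring(cut + SEPARATOR.length())));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("New_York_NNP is_VBZ big_JJ"));
    }
}
```

Using `SEPARATOR.length()` instead of a hard-coded `+ 1` keeps the split correct if a multi-character separator is ever configured.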

Example 3 with TaggedTokenInformation

Use of org.grupolys.samulan.util.TaggedTokenInformation in project uuusa by aghie.

In the class Processor, method process.

public List<SentimentDependencyGraph> process(String text) {
    List<SentimentDependencyGraph> sdgs = new ArrayList<>();
    // Split the text into sentences with Stanford's PTB tokenizer (PTB3 escaping disabled)
    DocumentPreprocessor dp = new DocumentPreprocessor(new StringReader(text.concat(" ")));
    dp.setTokenizerFactory(PTBTokenizer.factory(new WordTokenFactory(), "ptb3Escaping=false"));
    for (List<HasWord> sentence : dp) {
        List<String> words = sentence.stream().map(w -> w.toString()).collect(Collectors.toList());
        // Re-tokenize, POS-tag and dependency-parse each sentence
        List<String> tokens = this.tokenizer.tokenize(String.join(" ", words));
        List<TaggedTokenInformation> ttis = this.tagger.tag(tokens);
        sdgs.add(this.parser.parse(ttis));
    }
    return sdgs;
}
Also used : HasWord(edu.stanford.nlp.ling.HasWord) PTBTokenizer(edu.stanford.nlp.process.PTBTokenizer) TreeTokenizerFactory(edu.stanford.nlp.trees.TreeTokenizerFactory) HashMap(java.util.HashMap) LexedTokenFactory(edu.stanford.nlp.process.LexedTokenFactory) ParserI(org.grupolys.samulan.processor.parser.ParserI) ArrayList(java.util.ArrayList) TokenizeI(org.grupolys.samulan.processor.tokenizer.TokenizeI) Twokenize(cmu.arktweetnlp.Twokenize) CoreLabelTokenFactory(edu.stanford.nlp.process.CoreLabelTokenFactory) DocumentPreprocessor(edu.stanford.nlp.process.DocumentPreprocessor) TokenizerFactory(edu.stanford.nlp.process.TokenizerFactory) WordTokenFactory(edu.stanford.nlp.process.WordTokenFactory) WhitespaceTokenizerFactory(edu.stanford.nlp.process.WhitespaceTokenizer.WhitespaceTokenizerFactory) Set(java.util.Set) UUID(java.util.UUID) LexerTokenizer(edu.stanford.nlp.process.LexerTokenizer) MaxentTagger(edu.stanford.nlp.tagger.maxent.MaxentTagger) Collectors(java.util.stream.Collectors) List(java.util.List) TaggerI(org.grupolys.samulan.processor.tagger.TaggerI) Stream(java.util.stream.Stream) StringReader(java.io.StringReader) SentimentDependencyGraph(org.grupolys.samulan.util.SentimentDependencyGraph) TaggedTokenInformation(org.grupolys.samulan.util.TaggedTokenInformation)
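`process` chains three pluggable stages for each sentence: tokenizer, tagger and parser, presumably held in fields typed by the `TokenizeI`, `TaggerI` and `ParserI` interfaces listed among the imports. The sketch below shows the same composition with hypothetical functional interfaces and toy stages, so it runs without CoreNLP or MaltParser. Sentence splitting, which `DocumentPreprocessor` performs in the original, is assumed to have happened already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PipelineSketch {

    // Hypothetical stage interfaces, loosely modelled on TokenizeI, TaggerI and ParserI.
    interface Tokenizer { List<String> tokenize(String sentence); }
    interface Tagger { List<String> tag(List<String> tokens); }
    interface Parser<G> { G parse(List<String> taggedTokens); }

    // One result per sentence: tokenize, then tag, then parse.
    static <G> List<G> process(List<String> sentences, Tokenizer tokenizer,
                               Tagger tagger, Parser<G> parser) {
        List<G> graphs = new ArrayList<>();
        for (String sentence : sentences) {
            List<String> tokens = tokenizer.tokenize(sentence);
            graphs.add(parser.parse(tagger.tag(tokens)));
        }
        return graphs;
    }

    public static void main(String[] args) {
        // Toy stages: whitespace tokenizer, a tagger that tags every token "X",
        // and a "parser" that merely counts the tagged tokens.
        Tokenizer tokenizer = s -> Arrays.asList(s.trim().split("\\s+"));
        Tagger tagger = tokens -> tokens.stream().map(t -> t + "_X").collect(Collectors.toList());
        Parser<Integer> parser = List::size;
        System.out.println(process(List.of("I like it .", "Great !"), tokenizer, tagger, parser));
    }
}
```

Because each stage is only an interface, a different tokenizer (such as the Twokenize class in the imports) or tagger can be swapped in without touching the loop.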

Aggregations

TaggedTokenInformation (org.grupolys.samulan.util.TaggedTokenInformation): 3
ArrayList (java.util.ArrayList): 2
SentimentDependencyGraph (org.grupolys.samulan.util.SentimentDependencyGraph): 2
Twokenize (cmu.arktweetnlp.Twokenize): 1
HasWord (edu.stanford.nlp.ling.HasWord): 1
CoreLabelTokenFactory (edu.stanford.nlp.process.CoreLabelTokenFactory): 1
DocumentPreprocessor (edu.stanford.nlp.process.DocumentPreprocessor): 1
LexedTokenFactory (edu.stanford.nlp.process.LexedTokenFactory): 1
LexerTokenizer (edu.stanford.nlp.process.LexerTokenizer): 1
PTBTokenizer (edu.stanford.nlp.process.PTBTokenizer): 1
TokenizerFactory (edu.stanford.nlp.process.TokenizerFactory): 1
WhitespaceTokenizerFactory (edu.stanford.nlp.process.WhitespaceTokenizer.WhitespaceTokenizerFactory): 1
WordTokenFactory (edu.stanford.nlp.process.WordTokenFactory): 1
MaxentTagger (edu.stanford.nlp.tagger.maxent.MaxentTagger): 1
TreeTokenizerFactory (edu.stanford.nlp.trees.TreeTokenizerFactory): 1
StringReader (java.io.StringReader): 1
HashMap (java.util.HashMap): 1
List (java.util.List): 1
Set (java.util.Set): 1
UUID (java.util.UUID): 1