Examples with TextAnnotation - edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation

Example 76 with TextAnnotation

Use of edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation in project cogcomp-nlp by CogComp.

Class StanfordOpenIEHandler, method addView.

@Override
protected void addView(TextAnnotation ta) throws AnnotatorException {
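    // Run the Stanford CoreNLP pipeline over the raw text of the TextAnnotation.
    Annotation document = new Annotation(ta.text);
    pipeline.annotate(document);
    // The OpenIE triples are collected in a span-based view attached to the same TextAnnotation.
    SpanLabelView vu = new SpanLabelView(viewName, ta);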
    for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
        Collection<RelationTriple> triples = sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class);
        for (RelationTriple triple : triples) {
            Constituent subject = getConstituent(triple.subjectGloss(), triple.subjectTokenSpan(), sentence, ta);
            subject.addAttribute("subjectGloss", triple.subjectGloss());
            subject.addAttribute("subjectLemmaGloss", triple.subjectLemmaGloss());
            subject.addAttribute("subjectLink", triple.subjectLink());
            Constituent object = getConstituent(triple.objectGloss(), triple.objectTokenSpan(), sentence, ta);
            object.addAttribute("objectGloss", triple.objectGloss());
            object.addAttribute("objectLemmaGloss", triple.objectLemmaGloss());
            object.addAttribute("objectLink", triple.objectLink());
            Constituent relation = getConstituent(triple.relationGloss(), triple.relationTokenSpan(), sentence, ta);
            relation.addAttribute("relationGloss", triple.relationGloss());
            relation.addAttribute("relationLemmaGloss", triple.relationLemmaGloss());
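            // Link the relation phrase (source) to its subject and object (targets), scored by the triple's confidence.
            Relation subj = new Relation("subj", relation, subject, triple.confidence);
            Relation obj = new Relation("obj", relation, object, triple.confidence);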
            vu.addRelation(subj);
            vu.addRelation(obj);
            vu.addConstituent(subject);
            vu.addConstituent(object);
            vu.addConstituent(relation);
        }
    }
    ta.addView(viewName, vu);
}

Also used: Relation (edu.illinois.cs.cogcomp.core.datastructures.textannotation.Relation), RelationTriple (edu.stanford.nlp.ie.util.RelationTriple), CoreAnnotations (edu.stanford.nlp.ling.CoreAnnotations), SpanLabelView (edu.illinois.cs.cogcomp.core.datastructures.textannotation.SpanLabelView), NaturalLogicAnnotations (edu.stanford.nlp.naturalli.NaturalLogicAnnotations), CoreMap (edu.stanford.nlp.util.CoreMap), TextAnnotation (edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation), Annotation (edu.stanford.nlp.pipeline.Annotation), Constituent (edu.illinois.cs.cogcomp.core.datastructures.textannotation.Constituent)

Example 77 with TextAnnotation

Use of edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation in project cogcomp-nlp by CogComp.

Class StanfordTrueCaseHandler, method addView.

@Override
protected void addView(TextAnnotation ta) throws AnnotatorException {
    Annotation document = new Annotation(ta.text);
    pipeline.annotate(document);
    TokenLabelView vu = new TokenLabelView(viewName, ta);
    for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
        for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
            String trueCase = token.get(CoreAnnotations.TrueCaseTextAnnotation.class);
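            // CoreNLP's endPosition() is exclusive, so subtract 1 to address the token's last character.
            int beginCharOffset = token.beginPosition();
            int endCharOffset = token.endPosition() - 1;
            List<Constituent> overlappingCons = ta.getView(ViewNames.TOKENS).getConstituentsOverlappingCharSpan(beginCharOffset, endCharOffset);
            // Take the largest end token index among the overlapping TOKENS constituents.
            // Note: get() throws NoSuchElementException if no constituent overlaps the span.
            int endIndex = overlappingCons.stream().max(Comparator.comparing(Constituent::getEndSpan)).get().getEndSpan();
            // Attach the true-case label to the single token [endIndex - 1, endIndex).
            Constituent c = new Constituent(trueCase, viewName, ta, endIndex - 1, endIndex);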
            vu.addConstituent(c);
        }
    }
    ta.addView(viewName, vu);
}

Also used: CoreLabel (edu.stanford.nlp.ling.CoreLabel), TokenLabelView (edu.illinois.cs.cogcomp.core.datastructures.textannotation.TokenLabelView), CoreAnnotations (edu.stanford.nlp.ling.CoreAnnotations), CoreMap (edu.stanford.nlp.util.CoreMap), TextAnnotation (edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation), Annotation (edu.stanford.nlp.pipeline.Annotation), Constituent (edu.illinois.cs.cogcomp.core.datastructures.textannotation.Constituent)
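
The handler above maps each CoreNLP token's character span back to a token index in the TextAnnotation. The following is a minimal, library-free sketch of that idea, not the project's implementation. The class CharSpanToToken, the method lastOverlappingToken, and the half-open [start, end) span convention are assumptions made for illustration; unlike the Optional.get() call in the original, it returns -1 instead of throwing when nothing overlaps.

```java
final class CharSpanToToken {
    /**
     * tokenOffsets[i] holds {startChar, endCharExclusive} for token i.
     * Returns the index of the last token that overlaps the half-open
     * character span [begin, end), or -1 if no token overlaps it.
     */
    static int lastOverlappingToken(int[][] tokenOffsets, int begin, int end) {
        int last = -1;
        for (int i = 0; i < tokenOffsets.length; i++) {
            // Two half-open intervals overlap when each starts before the other ends.
            boolean overlaps = tokenOffsets[i][0] < end && begin < tokenOffsets[i][1];
            if (overlaps) {
                last = i;
            }
        }
        return last;
    }

    public static void main(String[] args) {
        // Hypothetical offsets for the tokens "Hello", "world", "." in "Hello world."
        int[][] offsets = {{0, 5}, {6, 11}, {11, 12}};
        System.out.println(lastOverlappingToken(offsets, 6, 11));
    }
}
```

Returning a sentinel keeps the caller in control of what to do with a token that cannot be aligned, for example skipping it rather than aborting the whole view.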

Example 78 with TextAnnotation

Use of edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation in project cogcomp-nlp by CogComp.

Class ThaiTokenizer, method main.

public static void main(String[] args) {
    String text = "สตาร์คราฟต์   เป็นวิดีโอเกมประเภทวางแผนเรียลไทม์และบันเทิงคดีวิทยาศาสตร์การทหาร พัฒนาและจัดจำหน่ายโดยบลิซซาร์ด เอ็นเตอร์เทนเมนต์ ออกบนระบบปฏิบัติการไมโครซอฟท์ วินโดวส์เมื่อวันที่ 31 มีนาคม 2541 ต่อมา เกมขยายเป็นแฟรนไชส์ และเป็นเกมแรกของซีรีส์สตาร์คราฟต์ รุ่นแมคโอเอสออกในเดือนมีนาคม 2542 และรุ่นดัดแปลงนินเทนโด 64 ซึ่งพัฒนาร่วมกับแมสมีเดีย ออกในวันที่ 13 มิถุนายน 2543 การพัฒนาเกมนี้เริ่มขึ้นไม่นานหลังวอร์คราฟต์ 2: ไทด์สออฟดาร์กเนส ออกในปี 2538 สตาร์คราฟต์เปิดตัวในงานอี3 ปี 2539 ซึ่งเป็นที่ชื่นชอบน้อยกว่าวอร์คราฟต์ 2 ฉะนั้น โครงการจึงถูกพลิกโฉมทั้งหมดแล้วแสดงต่อสาธารณะในต้นปี 2540 ซึ่งได้รับการตอบรับดีกว่ามาก";
    // Note: the assignment below overwrites the sample above, so only the second string is tokenized.
    text = "    2507  การสืบสวนของคณะกรรมการสมาชิกผู้แทนราษฎรสหรัฐว่าด้วยการลอบสังหารประธานาธิบดี (hsca) ระหว่าง - พศ 2522  และการสืบสวนของรัฐบาล สรุปว่าประธานาธิบดีถูกลอบสังหารโดยลี ฮาร์วีย์ ออสวอลด์ ซึ่งในเวล\n";
    ThaiTokenizer token = new ThaiTokenizer();
    TextAnnotation ta = token.getTextAnnotation(text);
    for (Sentence sen : ta.sentences()) {
        System.out.println(sen.getTokenizedText());
    }
}

Also used: TextAnnotation (edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation), Sentence (edu.illinois.cs.cogcomp.core.datastructures.textannotation.Sentence)

Example 79 with TextAnnotation

Use of edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation in project cogcomp-nlp by CogComp.

Class ThaiTokenizer, method getTextAnnotation.

public TextAnnotation getTextAnnotation(String text) {
    List<IntPair> offsets = new ArrayList<>();
    List<String> surfaces = new ArrayList<>();
    List<Integer> sen_ends = new ArrayList<>();
    // Thai is written without spaces between words, so a locale-aware BreakIterator performs the word segmentation.
    BreakIterator boundary = BreakIterator.getWordInstance(new Locale("th", "TH", "TH"));
    boundary.setText(text);
    int start = boundary.first();
    // Walk consecutive boundaries; each [start, end) pair delimits one candidate token.
    for (int end = boundary.next(); end != BreakIterator.DONE; start = end, end = boundary.next()) {
        String sur = text.substring(start, end);
        if (sur.trim().isEmpty()) {
            //                    sen_ends.add(surfaces.size());
            continue;
        }
        surfaces.add(sur);
        offsets.add(new IntPair(start, end));
    }
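    // No sentence splitting is done: all tokens form a single sentence that ends after the last token.
    if (surfaces.size() > 0 && (sen_ends.size() == 0 || sen_ends.get(sen_ends.size() - 1) != surfaces.size()))
        sen_ends.add(surfaces.size());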
    IntPair[] offs = new IntPair[offsets.size()];
    offs = offsets.toArray(offs);
    String[] surfs = new String[surfaces.size()];
    surfs = surfaces.toArray(surfs);
    int[] ends = new int[sen_ends.size()];
    for (int i = 0; i < sen_ends.size(); i++) ends[i] = sen_ends.get(i);
    //        System.out.println(text);
    //        System.out.println(offsets);
    //        System.out.println(sen_ends);
    TextAnnotation ta = new TextAnnotation("", "", text, offs, surfs, ends);
    return ta;
}

Also used: Locale (java.util.Locale), ArrayList (java.util.ArrayList), IntPair (edu.illinois.cs.cogcomp.core.datastructures.IntPair), BreakIterator (java.text.BreakIterator), TextAnnotation (edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation)
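
The boundary-walking loop in getTextAnnotation can be tried in isolation with only the JDK. The sketch below is illustrative rather than the project's code: the class BreakIteratorSketch and the method wordOffsets are hypothetical names, plain int[] pairs stand in for IntPair, and English text is used because Thai segmentation results depend on the JDK's dictionary data.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class BreakIteratorSketch {
    /** Returns the [start, end) character offsets of every non-blank segment. */
    static List<int[]> wordOffsets(String text, Locale locale) {
        BreakIterator boundary = BreakIterator.getWordInstance(locale);
        boundary.setText(text);
        List<int[]> offsets = new ArrayList<>();
        int start = boundary.first();
        // Each pair of consecutive boundaries delimits one segment of the text.
        for (int end = boundary.next(); end != BreakIterator.DONE; start = end, end = boundary.next()) {
            // Whitespace-only segments are skipped, as in the tokenizer above.
            if (!text.substring(start, end).trim().isEmpty()) {
                offsets.add(new int[] {start, end});
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "Hello world.";
        for (int[] span : wordOffsets(text, Locale.ENGLISH)) {
            System.out.println(span[0] + "-" + span[1] + " " + text.substring(span[0], span[1]));
        }
    }
}
```

Because the offsets index into the original string, the surface form of each token can always be recovered with text.substring(start, end).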

Example 80 with TextAnnotation

Use of edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation in project cogcomp-nlp by CogComp.

Class WhiteSpaceTokenizer, method getTextAnnotation.

public TextAnnotation getTextAnnotation(String text) {
    List<IntPair> offsets = new ArrayList<>();
    List<String> surfaces = new ArrayList<>();
    List<Integer> sen_ends = new ArrayList<>();
    String t = "";
    int t_start = -1;
    int i;
    for (i = 0; i < text.length(); i++) {
        String c = text.substring(i, i + 1);
        if (c.trim().isEmpty()) {
            if (!t.isEmpty()) {
                surfaces.add(t);
                offsets.add(new IntPair(t_start, i));
                t = "";
            }
        } else if (c.equals(".") || c.equals("\n")) {
            // Note: "\n" never reaches this branch, because the whitespace check above already matches it; only "." ends a sentence here.
            if (!t.isEmpty()) {
                surfaces.add(t);
                offsets.add(new IntPair(t_start, i));
            }
            surfaces.add(c);
            offsets.add(new IntPair(i, i + 1));
            t = "";
            sen_ends.add(surfaces.size());
        } else {
            if (t.isEmpty())
                t_start = i;
            t += c;
        }
    }
    if (!t.isEmpty()) {
        surfaces.add(t);
        offsets.add(new IntPair(t_start, i));
        sen_ends.add(surfaces.size());
    }
    if (sen_ends.size() == 0 || sen_ends.get(sen_ends.size() - 1) != surfaces.size()) {
        sen_ends.add(surfaces.size());
    }
    IntPair[] offs = new IntPair[offsets.size()];
    offs = offsets.toArray(offs);
    String[] surfs = new String[surfaces.size()];
    surfs = surfaces.toArray(surfs);
    int[] ends = new int[sen_ends.size()];
    for (i = 0; i < sen_ends.size(); i++) ends[i] = sen_ends.get(i);
    // Sanity check: the last sentence must end at the last token.
    if (ends[ends.length - 1] != surfaces.size()) {
        throw new IllegalStateException("Last sentence end " + ends[ends.length - 1] + " does not match token count " + surfaces.size());
    }
    if (offs.length == 0 || surfs.length == 0)
        return null;
    TextAnnotation ta = new TextAnnotation("", "", text, offs, surfs, ends);
    return ta;
}

Also used: ArrayList (java.util.ArrayList), TextAnnotation (edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation), IntPair (edu.illinois.cs.cogcomp.core.datastructures.IntPair)
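
Both tokenizers hand three parallel structures to the TextAnnotation constructor: character offsets, surface strings, and sentence end positions expressed as token counts. The sketch below checks the consistency conditions that the code above appears to rely on. It is an assumption-laden illustration, not part of cogcomp-nlp: the class TokenizationCheck and the method isConsistent are hypothetical, and the exact requirements of the real constructor are not confirmed by this page.

```java
final class TokenizationCheck {
    /**
     * Returns true when every offset pair selects exactly its surface string
     * in the text, sentence ends are strictly increasing token counts, and the
     * last sentence ends at the last token.
     */
    static boolean isConsistent(String text, int[][] offsets, String[] surfaces, int[] sentenceEnds) {
        if (offsets.length != surfaces.length || sentenceEnds.length == 0) {
            return false;
        }
        for (int i = 0; i < offsets.length; i++) {
            int start = offsets[i][0];
            int end = offsets[i][1];
            if (start < 0 || end > text.length() || start >= end) {
                return false;
            }
            if (!text.substring(start, end).equals(surfaces[i])) {
                return false;
            }
        }
        int previous = 0;
        for (int end : sentenceEnds) {
            // Each sentence must contain at least one token.
            if (end <= previous) {
                return false;
            }
            previous = end;
        }
        return previous == surfaces.length;
    }

    public static void main(String[] args) {
        String text = "Hi there. Bye";
        int[][] offsets = {{0, 2}, {3, 8}, {8, 9}, {10, 13}};
        String[] surfaces = {"Hi", "there", ".", "Bye"};
        int[] sentenceEnds = {3, 4};
        System.out.println(isConsistent(text, offsets, surfaces, sentenceEnds));
    }
}
```

A check of this kind reports an inconsistent tokenization to the caller as a return value, which is easier to handle in library code than terminating the process.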

Aggregations

TextAnnotation (edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation): 218
Constituent (edu.illinois.cs.cogcomp.core.datastructures.textannotation.Constituent): 95
Test (org.junit.Test): 65
View (edu.illinois.cs.cogcomp.core.datastructures.textannotation.View): 49
Feature (edu.illinois.cs.cogcomp.edison.features.Feature): 48
AnnotatorException (edu.illinois.cs.cogcomp.annotation.AnnotatorException): 29
DiscreteFeature (edu.illinois.cs.cogcomp.edison.features.DiscreteFeature): 28
TreeView (edu.illinois.cs.cogcomp.core.datastructures.textannotation.TreeView): 25
ArrayList (java.util.ArrayList): 23
EdisonException (edu.illinois.cs.cogcomp.edison.utilities.EdisonException): 22
LinkedHashSet (java.util.LinkedHashSet): 21
IntPair (edu.illinois.cs.cogcomp.core.datastructures.IntPair): 16
Relation (edu.illinois.cs.cogcomp.core.datastructures.textannotation.Relation): 16
FeatureExtractor (edu.illinois.cs.cogcomp.edison.features.FeatureExtractor): 16
ProjectedPath (edu.illinois.cs.cogcomp.edison.features.lrec.ProjectedPath): 16
FeatureManifest (edu.illinois.cs.cogcomp.edison.features.manifest.FeatureManifest): 16
FileInputStream (java.io.FileInputStream): 16
TokenLabelView (edu.illinois.cs.cogcomp.core.datastructures.textannotation.TokenLabelView): 14
SpanLabelView (edu.illinois.cs.cogcomp.core.datastructures.textannotation.SpanLabelView): 12
PredicateArgumentView (edu.illinois.cs.cogcomp.core.datastructures.textannotation.PredicateArgumentView): 11