Example 41 with Mention

Use of edu.stanford.nlp.coref.data.Mention in the CoreNLP project by stanfordnlp.

From the class RuleBasedCorefMentionFinder, method findMentions.

/** Main method of mention detection.
   *  Extract all NP, PRP or NE, and filter out by manually written patterns.
   */
@Override
public List<List<Mention>> findMentions(Annotation doc, Dictionaries dict, Properties props) {
    List<List<Mention>> predictedMentions = new ArrayList<>();
    Set<String> neStrings = Generics.newHashSet();
    List<Set<IntPair>> mentionSpanSetList = Generics.newArrayList();
    List<CoreMap> sentences = doc.get(CoreAnnotations.SentencesAnnotation.class);
    // extract premarked mentions, NP/PRP, named entity, enumerations
    for (CoreMap s : sentences) {
        List<Mention> mentions = new ArrayList<>();
        predictedMentions.add(mentions);
        Set<IntPair> mentionSpanSet = Generics.newHashSet();
        Set<IntPair> namedEntitySpanSet = Generics.newHashSet();
        extractPremarkedEntityMentions(s, mentions, mentionSpanSet, namedEntitySpanSet);
        extractNamedEntityMentions(s, mentions, mentionSpanSet, namedEntitySpanSet);
        extractNPorPRP(s, mentions, mentionSpanSet, namedEntitySpanSet);
        extractEnumerations(s, mentions, mentionSpanSet, namedEntitySpanSet);
        addNamedEntityStrings(s, neStrings, namedEntitySpanSet);
        mentionSpanSetList.add(mentionSpanSet);
    }
    if (lang == Locale.CHINESE && CorefProperties.liberalChineseMD(props)) {
        extractNamedEntityModifiers(sentences, mentionSpanSetList, predictedMentions, neStrings);
    }
    // find head
    for (int i = 0, sz = sentences.size(); i < sz; i++) {
        findHead(sentences.get(i), predictedMentions.get(i));
        setBarePlural(predictedMentions.get(i));
    }
    // mention selection based on document-wise info
    if (lang == Locale.ENGLISH) {
        removeSpuriousMentionsEn(doc, predictedMentions, dict);
    } else if (lang == Locale.CHINESE) {
        if (CorefProperties.liberalChineseMD(props)) {
            removeSpuriousMentionsZhSimple(doc, predictedMentions, dict);
        } else {
            removeSpuriousMentionsZh(doc, predictedMentions, dict, CorefProperties.removeNestedMentions(props));
        }
    }
    return predictedMentions;
}
Also used: Set (java.util.Set), HashSet (java.util.HashSet), ArrayList (java.util.ArrayList), IntPair (edu.stanford.nlp.util.IntPair), Mention (edu.stanford.nlp.coref.data.Mention), TreeCoreAnnotations (edu.stanford.nlp.trees.TreeCoreAnnotations), CoreAnnotations (edu.stanford.nlp.ling.CoreAnnotations), SemanticGraphCoreAnnotations (edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations), List (java.util.List), CoreMap (edu.stanford.nlp.util.CoreMap)
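
For orientation, the sketch below (not part of the example above) shows the usual way these per-sentence Mention lists become visible to callers: run a pipeline that includes the coref annotator, which performs mention detection internally, and read CorefCoreAnnotations.CorefMentionsAnnotation from each sentence. The annotator list and the sample text are illustrative assumptions.

import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.coref.CorefCoreAnnotations;
import edu.stanford.nlp.coref.data.Mention;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class MentionListingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // requesting coref also runs mention detection on each sentence
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,coref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation doc = new Annotation("Barack Obama was born in Hawaii. He was elected president in 2008.");
        pipeline.annotate(doc);
        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            List<Mention> mentions = sentence.get(CorefCoreAnnotations.CorefMentionsAnnotation.class);
            if (mentions == null) {
                continue;
            }
            for (Mention m : mentions) {
                System.out.println(m.mentionID + "\t" + m.spanToString());
            }
        }
    }
}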

Example 42 with Mention

Use of edu.stanford.nlp.coref.data.Mention in the CoreNLP project by stanfordnlp.

From the class RuleBasedCorefMentionFinder, method removeSpuriousMentionsZhSimple.

private static void removeSpuriousMentionsZhSimple(Annotation doc, List<List<Mention>> predictedMentions, Dictionaries dict) {
    for (int i = 0; i < predictedMentions.size(); i++) {
        List<Mention> mentions = predictedMentions.get(i);
        Set<Mention> remove = Generics.newHashSet();
        for (Mention m : mentions) {
            if (m.originalSpan.size() == 1 && m.headWord.tag().equals("CD")) {
                remove.add(m);
            }
            if (m.spanToString().contains("quot")) {
                remove.add(m);
            }
        }
        mentions.removeAll(remove);
    }
}
Also used: Mention (edu.stanford.nlp.coref.data.Mention)
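
The collect-then-removeAll pattern above can also be written with Collection.removeIf; a behavior-equivalent sketch (same two predicates, applied per sentence) is shown for comparison. The class and method names here are hypothetical.

import java.util.List;

import edu.stanford.nlp.coref.data.Mention;

public class RemoveSpuriousSketch {

    // same two predicates as removeSpuriousMentionsZhSimple, expressed with removeIf
    static void removeSpurious(List<List<Mention>> predictedMentions) {
        for (List<Mention> mentions : predictedMentions) {
            mentions.removeIf(m ->
                    (m.originalSpan.size() == 1 && m.headWord.tag().equals("CD"))
                            || m.spanToString().contains("quot"));
        }
    }
}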

Example 43 with Mention

Use of edu.stanford.nlp.coref.data.Mention in the CoreNLP project by stanfordnlp.

From the class MentionDetectionEvaluator, method process.

@Override
public void process(int id, Document document) {
    for (CorefCluster gold : document.goldCorefClusters.values()) {
        for (Mention m : gold.corefMentions) {
            if (document.predictedMentionsByID.containsKey(m.mentionID)) {
                correctSystemMentions += 1;
            }
            goldMentions += 1;
        }
    }
    systemMentions += document.predictedMentionsByID.size();
    double precision = correctSystemMentions / (double) systemMentions;
    double recall = correctSystemMentions / (double) goldMentions;
    log.info("Precision: " + correctSystemMentions + " / " + systemMentions + " = " + String.format("%.4f", precision));
    log.info("Recall: " + correctSystemMentions + " / " + goldMentions + " = " + String.format("%.4f", recall));
    log.info(String.format("F1: %.4f", 2 * precision * recall / (precision + recall)));
}
Also used: CorefCluster (edu.stanford.nlp.coref.data.CorefCluster), Mention (edu.stanford.nlp.coref.data.Mention)
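
As a worked check of the arithmetic in process(), here is a small standalone sketch (the counts are invented for illustration) that computes the same precision, recall, and F1, with explicit guards against zero denominators that the method above leaves implicit.

public class MentionDetectionScoreSketch {
    public static void main(String[] args) {
        // invented running counts, standing in for the evaluator's accumulated fields
        int correctSystemMentions = 80;
        int systemMentions = 100;
        int goldMentions = 120;
        double precision = systemMentions == 0 ? 0.0 : correctSystemMentions / (double) systemMentions;
        double recall = goldMentions == 0 ? 0.0 : correctSystemMentions / (double) goldMentions;
        double f1 = (precision + recall) == 0.0 ? 0.0 : 2 * precision * recall / (precision + recall);
        // expected output: Precision: 0.8000  Recall: 0.6667  F1: 0.7273
        System.out.printf("Precision: %.4f  Recall: %.4f  F1: %.4f%n", precision, recall, f1);
    }
}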

Example 44 with Mention

Use of edu.stanford.nlp.coref.data.Mention in the CoreNLP project by stanfordnlp.

From the class ProtobufAnnotationSerializer, method toProtoBuilder.

/**
   * <p>
   *   The method to extend by subclasses of the Protobuf Annotator if custom additions are added to Tokens.
   *   In contrast to {@link ProtobufAnnotationSerializer#toProto(edu.stanford.nlp.ling.CoreLabel)}, this function
   *   returns a builder that can be extended.
   * </p>
   *
   * @param sentence The sentence to save to a protocol buffer
   * @param keysToSerialize A set tracking which keys have been saved. It's important to remove any keys added to the proto
   *                        from this set, as the code tracks annotations to ensure lossless serialization.
   */
@SuppressWarnings("deprecation")
protected CoreNLPProtos.Sentence.Builder toProtoBuilder(CoreMap sentence, Set<Class<?>> keysToSerialize) {
    // Error checks
    if (sentence instanceof CoreLabel) {
        throw new IllegalArgumentException("CoreMap is actually a CoreLabel");
    }
    CoreNLPProtos.Sentence.Builder builder = CoreNLPProtos.Sentence.newBuilder();
    // Remove items serialized elsewhere from the required list
    keysToSerialize.remove(TextAnnotation.class);
    keysToSerialize.remove(NumerizedTokensAnnotation.class);
    // Required fields
    builder.setTokenOffsetBegin(getAndRegister(sentence, keysToSerialize, TokenBeginAnnotation.class));
    builder.setTokenOffsetEnd(getAndRegister(sentence, keysToSerialize, TokenEndAnnotation.class));
    // Get key set of CoreMap
    Set<Class<?>> keySet;
    if (sentence instanceof ArrayCoreMap) {
        keySet = ((ArrayCoreMap) sentence).keySetNotNull();
    } else {
        keySet = new IdentityHashSet<>(sentence.keySet());
    }
    // Tokens
    if (sentence.containsKey(TokensAnnotation.class)) {
        for (CoreLabel tok : sentence.get(TokensAnnotation.class)) {
            builder.addToken(toProto(tok));
        }
        keysToSerialize.remove(TokensAnnotation.class);
    }
    // Characters
    if (sentence.containsKey(SegmenterCoreAnnotations.CharactersAnnotation.class)) {
        for (CoreLabel c : sentence.get(SegmenterCoreAnnotations.CharactersAnnotation.class)) {
            builder.addCharacter(toProto(c));
        }
        keysToSerialize.remove(SegmenterCoreAnnotations.CharactersAnnotation.class);
    }
    // Optional fields
    if (keySet.contains(SentenceIndexAnnotation.class)) {
        builder.setSentenceIndex(getAndRegister(sentence, keysToSerialize, SentenceIndexAnnotation.class));
    }
    if (keySet.contains(CharacterOffsetBeginAnnotation.class)) {
        builder.setCharacterOffsetBegin(getAndRegister(sentence, keysToSerialize, CharacterOffsetBeginAnnotation.class));
    }
    if (keySet.contains(CharacterOffsetEndAnnotation.class)) {
        builder.setCharacterOffsetEnd(getAndRegister(sentence, keysToSerialize, CharacterOffsetEndAnnotation.class));
    }
    if (keySet.contains(TreeAnnotation.class)) {
        builder.setParseTree(toProto(getAndRegister(sentence, keysToSerialize, TreeAnnotation.class)));
    }
    if (keySet.contains(BinarizedTreeAnnotation.class)) {
        builder.setBinarizedParseTree(toProto(getAndRegister(sentence, keysToSerialize, BinarizedTreeAnnotation.class)));
    }
    if (keySet.contains(KBestTreesAnnotation.class)) {
        for (Tree tree : sentence.get(KBestTreesAnnotation.class)) {
            builder.addKBestParseTrees(toProto(tree));
            keysToSerialize.remove(KBestTreesAnnotation.class);
        }
    }
    if (keySet.contains(SentimentCoreAnnotations.SentimentAnnotatedTree.class)) {
        builder.setAnnotatedParseTree(toProto(getAndRegister(sentence, keysToSerialize, SentimentCoreAnnotations.SentimentAnnotatedTree.class)));
    }
    if (keySet.contains(SentimentCoreAnnotations.SentimentClass.class)) {
        builder.setSentiment(getAndRegister(sentence, keysToSerialize, SentimentCoreAnnotations.SentimentClass.class));
    }
    if (keySet.contains(BasicDependenciesAnnotation.class)) {
        builder.setBasicDependencies(toProto(getAndRegister(sentence, keysToSerialize, BasicDependenciesAnnotation.class)));
    }
    if (keySet.contains(CollapsedDependenciesAnnotation.class)) {
        builder.setCollapsedDependencies(toProto(getAndRegister(sentence, keysToSerialize, CollapsedDependenciesAnnotation.class)));
    }
    if (keySet.contains(CollapsedCCProcessedDependenciesAnnotation.class)) {
        builder.setCollapsedCCProcessedDependencies(toProto(getAndRegister(sentence, keysToSerialize, CollapsedCCProcessedDependenciesAnnotation.class)));
    }
    if (keySet.contains(AlternativeDependenciesAnnotation.class)) {
        builder.setAlternativeDependencies(toProto(getAndRegister(sentence, keysToSerialize, AlternativeDependenciesAnnotation.class)));
    }
    if (keySet.contains(EnhancedDependenciesAnnotation.class)) {
        builder.setEnhancedDependencies(toProto(getAndRegister(sentence, keysToSerialize, EnhancedDependenciesAnnotation.class)));
    }
    if (keySet.contains(EnhancedPlusPlusDependenciesAnnotation.class)) {
        builder.setEnhancedPlusPlusDependencies(toProto(getAndRegister(sentence, keysToSerialize, EnhancedPlusPlusDependenciesAnnotation.class)));
    }
    if (keySet.contains(TokensAnnotation.class) && getAndRegister(sentence, keysToSerialize, TokensAnnotation.class).size() > 0 && getAndRegister(sentence, keysToSerialize, TokensAnnotation.class).get(0).containsKey(ParagraphAnnotation.class)) {
        builder.setParagraph(getAndRegister(sentence, keysToSerialize, TokensAnnotation.class).get(0).get(ParagraphAnnotation.class));
    }
    if (keySet.contains(NumerizedTokensAnnotation.class)) {
        builder.setHasNumerizedTokensAnnotation(true);
    } else {
        builder.setHasNumerizedTokensAnnotation(false);
    }
    if (keySet.contains(NaturalLogicAnnotations.EntailedSentencesAnnotation.class)) {
        for (SentenceFragment entailedSentence : getAndRegister(sentence, keysToSerialize, NaturalLogicAnnotations.EntailedSentencesAnnotation.class)) {
            builder.addEntailedSentence(toProto(entailedSentence));
        }
    }
    if (keySet.contains(NaturalLogicAnnotations.EntailedClausesAnnotation.class)) {
        for (SentenceFragment entailedClause : getAndRegister(sentence, keysToSerialize, NaturalLogicAnnotations.EntailedClausesAnnotation.class)) {
            builder.addEntailedClause(toProto(entailedClause));
        }
    }
    if (keySet.contains(NaturalLogicAnnotations.RelationTriplesAnnotation.class)) {
        for (RelationTriple triple : getAndRegister(sentence, keysToSerialize, NaturalLogicAnnotations.RelationTriplesAnnotation.class)) {
            builder.addOpenieTriple(toProto(triple));
        }
    }
    if (keySet.contains(KBPTriplesAnnotation.class)) {
        for (RelationTriple triple : getAndRegister(sentence, keysToSerialize, KBPTriplesAnnotation.class)) {
            builder.addKbpTriple(toProto(triple));
        }
    }
    // Non-default annotators
    if (keySet.contains(EntityMentionsAnnotation.class)) {
        builder.setHasRelationAnnotations(true);
        for (EntityMention entity : getAndRegister(sentence, keysToSerialize, EntityMentionsAnnotation.class)) {
            builder.addEntity(toProto(entity));
        }
    } else {
        builder.setHasRelationAnnotations(false);
    }
    if (keySet.contains(RelationMentionsAnnotation.class)) {
        if (!builder.getHasRelationAnnotations()) {
            throw new IllegalStateException("Registered entity mentions without relation mentions");
        }
        for (RelationMention relation : getAndRegister(sentence, keysToSerialize, RelationMentionsAnnotation.class)) {
            builder.addRelation(toProto(relation));
        }
    }
    // add each of the mentions in the List<Mentions> for this sentence
    if (keySet.contains(CorefMentionsAnnotation.class)) {
        builder.setHasCorefMentionsAnnotation(true);
        for (Mention m : sentence.get(CorefMentionsAnnotation.class)) {
            builder.addMentionsForCoref(toProto(m));
        }
        keysToSerialize.remove(CorefMentionsAnnotation.class);
    }
    // Entity mentions
    if (keySet.contains(MentionsAnnotation.class)) {
        for (CoreMap mention : sentence.get(MentionsAnnotation.class)) {
            builder.addMentions(toProtoMention(mention));
        }
        keysToSerialize.remove(MentionsAnnotation.class);
    }
    // add a sentence id if it exists
    if (keySet.contains(SentenceIDAnnotation.class))
        builder.setSentenceID(getAndRegister(sentence, keysToSerialize, SentenceIDAnnotation.class));
    // Return
    return builder;
}
Also used: RelationMention (edu.stanford.nlp.ie.machinereading.structure.RelationMention), EntityMention (edu.stanford.nlp.ie.machinereading.structure.EntityMention), RelationTriple (edu.stanford.nlp.ie.util.RelationTriple), Mention (edu.stanford.nlp.coref.data.Mention), Tree (edu.stanford.nlp.trees.Tree), SegmenterCoreAnnotations (edu.stanford.nlp.ling.SegmenterCoreAnnotations), SentimentCoreAnnotations (edu.stanford.nlp.sentiment.SentimentCoreAnnotations), CoreLabel (edu.stanford.nlp.ling.CoreLabel)
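
toProtoBuilder is normally not called directly; callers go through the serializer's public write/read methods, which ultimately reach this builder via toProto. A minimal usage sketch is below; the output path is a placeholder, and the boolean constructor argument (relaxing the lossless-serialization check) is an assumption about the most convenient entry point.

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer;
import edu.stanford.nlp.util.Pair;

public class ProtoRoundTripSketch {
    public static Annotation roundTrip(Annotation annotated) throws IOException, ClassNotFoundException {
        // false relaxes the lossless-serialization check for keys with no proto field
        ProtobufAnnotationSerializer serializer = new ProtobufAnnotationSerializer(false);
        try (OutputStream os = new FileOutputStream("document.proto")) {
            serializer.write(annotated, os);
        }
        try (InputStream is = new FileInputStream("document.proto")) {
            Pair<Annotation, InputStream> restored = serializer.read(is);
            Annotation doc = restored.first;
            System.out.println("Restored sentences: "
                    + doc.get(CoreAnnotations.SentencesAnnotation.class).size());
            return doc;
        }
    }
}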

Example 45 with Mention

Use of edu.stanford.nlp.coref.data.Mention in the CoreNLP project by stanfordnlp.

From the class ProtobufAnnotationSerializer, method loadSentenceMentions.

protected void loadSentenceMentions(CoreNLPProtos.Sentence proto, CoreMap sentence) {
    // add all Mentions for this sentence
    if (proto.getHasCorefMentionsAnnotation()) {
        sentence.set(CorefMentionsAnnotation.class, new ArrayList<>());
    }
    if (proto.getMentionsForCorefList().size() != 0) {
        HashMap<Integer, Mention> idToMention = new HashMap<>();
        List<Mention> sentenceMentions = sentence.get(CorefMentionsAnnotation.class);
        // initial set up of all mentions
        for (CoreNLPProtos.Mention protoMention : proto.getMentionsForCorefList()) {
            Mention m = fromProtoNoTokens(protoMention);
            sentenceMentions.add(m);
            idToMention.put(m.mentionID, m);
        }
        // populate sets of Mentions for each Mention
        for (CoreNLPProtos.Mention protoMention : proto.getMentionsForCorefList()) {
            Mention m = idToMention.get(protoMention.getMentionID());
            if (protoMention.getAppositionsList().size() != 0) {
                m.appositions = new HashSet<>();
                m.appositions.addAll(protoMention.getAppositionsList().stream().map(idToMention::get).collect(Collectors.toList()));
            }
            if (protoMention.getPredicateNominativesList().size() != 0) {
                m.predicateNominatives = new HashSet<>();
                m.predicateNominatives.addAll(protoMention.getPredicateNominativesList().stream().map(idToMention::get).collect(Collectors.toList()));
            }
            if (protoMention.getRelativePronounsList().size() != 0) {
                m.relativePronouns = new HashSet<>();
                m.relativePronouns.addAll(protoMention.getRelativePronounsList().stream().map(idToMention::get).collect(Collectors.toList()));
            }
            if (protoMention.getListMembersList().size() != 0) {
                m.listMembers = new HashSet<>();
                m.listMembers.addAll(protoMention.getListMembersList().stream().map(idToMention::get).collect(Collectors.toList()));
            }
            if (protoMention.getBelongToListsList().size() != 0) {
                m.belongToLists = new HashSet<>();
                m.belongToLists.addAll(protoMention.getBelongToListsList().stream().map(idToMention::get).collect(Collectors.toList()));
            }
        }
    }
}
Also used: RelationMention (edu.stanford.nlp.ie.machinereading.structure.RelationMention), Mention (edu.stanford.nlp.coref.data.Mention), EntityMention (edu.stanford.nlp.ie.machinereading.structure.EntityMention)
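
Once a document has been read back (for instance via the round-trip sketch after Example 44), the cross-mention links that loadSentenceMentions rebuilds through idToMention can be inspected directly on the restored Mention objects. A hedged sketch, using the apposition links as an example:

import java.util.List;

import edu.stanford.nlp.coref.CorefCoreAnnotations;
import edu.stanford.nlp.coref.data.Mention;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.util.CoreMap;

public class RestoredMentionLinksSketch {
    public static void printAppositionLinks(Annotation doc) {
        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            List<Mention> mentions = sentence.get(CorefCoreAnnotations.CorefMentionsAnnotation.class);
            if (mentions == null) {
                continue;
            }
            for (Mention m : mentions) {
                // link fields stay null when the proto carried no corresponding list
                if (m.appositions != null) {
                    for (Mention other : m.appositions) {
                        System.out.println(m.spanToString() + " <-> " + other.spanToString());
                    }
                }
            }
        }
    }
}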

Aggregations

Mention (edu.stanford.nlp.coref.data.Mention): 62 usages
CoreAnnotations (edu.stanford.nlp.ling.CoreAnnotations): 27 usages
CoreLabel (edu.stanford.nlp.ling.CoreLabel): 27 usages
SemanticGraphCoreAnnotations (edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations): 21 usages
ArrayList (java.util.ArrayList): 20 usages
TreeCoreAnnotations (edu.stanford.nlp.trees.TreeCoreAnnotations): 17 usages
CoreMap (edu.stanford.nlp.util.CoreMap): 17 usages
List (java.util.List): 15 usages
Tree (edu.stanford.nlp.trees.Tree): 14 usages
IntPair (edu.stanford.nlp.util.IntPair): 14 usages
CorefCluster (edu.stanford.nlp.coref.data.CorefCluster): 12 usages
SemanticGraph (edu.stanford.nlp.semgraph.SemanticGraph): 10 usages
ClassicCounter (edu.stanford.nlp.stats.ClassicCounter): 9 usages
EntityMention (edu.stanford.nlp.ie.machinereading.structure.EntityMention): 7 usages
RelationMention (edu.stanford.nlp.ie.machinereading.structure.RelationMention): 7 usages
ParserConstraint (edu.stanford.nlp.parser.common.ParserConstraint): 7 usages
HashMap (java.util.HashMap): 7 usages
HashSet (java.util.HashSet): 7 usages
SemanticGraphEdge (edu.stanford.nlp.semgraph.SemanticGraphEdge): 6 usages
Map (java.util.Map): 6 usages