
Example 11 with Feature

Use of edu.illinois.cs.cogcomp.edison.features.Feature in project cogcomp-nlp by CogComp, in the class ParsePhraseType, method getFeatures.

@Override
public Set<Feature> getFeatures(Constituent c) throws EdisonException {
    TextAnnotation ta = c.getTextAnnotation();
    TreeView tree = (TreeView) ta.getView(parseViewname);
    Constituent phrase;
    try {
        phrase = tree.getParsePhrase(c);
    } catch (Exception e) {
        throw new EdisonException(e);
    }
    Set<Feature> features = new LinkedHashSet<>();
    if (phrase != null) {
        features.add(DiscreteFeature.create(phrase.getLabel()));
        String parentLabel = "ROOT";
        if (phrase.getIncomingRelations().size() > 0) {
            Constituent parent = phrase.getIncomingRelations().get(0).getSource();
            parentLabel = parent.getLabel();
            int parentHead = CollinsHeadFinder.getInstance().getHeadWordPosition(parent);
            features.add(DiscreteFeature.create("pt:h:" + ta.getToken(parentHead).toLowerCase().trim()));
            features.add(DiscreteFeature.create("pt:h-pos:" + WordHelpers.getPOS(ta, parentHead)));
        }
        features.add(DiscreteFeature.create("pt:" + parentLabel));
    }
    return features;
}
Also used : LinkedHashSet(java.util.LinkedHashSet) TreeView(edu.illinois.cs.cogcomp.core.datastructures.textannotation.TreeView) TextAnnotation(edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation) EdisonException(edu.illinois.cs.cogcomp.edison.utilities.EdisonException) DiscreteFeature(edu.illinois.cs.cogcomp.edison.features.DiscreteFeature) Feature(edu.illinois.cs.cogcomp.edison.features.Feature) Constituent(edu.illinois.cs.cogcomp.core.datastructures.textannotation.Constituent)

Example 12 with Feature

Use of edu.illinois.cs.cogcomp.edison.features.Feature in project cogcomp-nlp by CogComp, in the class ParseSiblings, method getFeatures.

@Override
public Set<Feature> getFeatures(Constituent c) throws EdisonException {
    TextAnnotation ta = c.getTextAnnotation();
    TreeView parse = (TreeView) ta.getView(parseViewName);
    Constituent phrase;
    try {
        phrase = parse.getParsePhrase(c);
    } catch (Exception e) {
        throw new EdisonException(e);
    }
    Set<Feature> features = new LinkedHashSet<>();
    if (phrase.getIncomingRelations().size() == 0) {
        features.add(DiscreteFeature.create("ONLY_CHILD"));
    } else {
        Relation incomingEdge = phrase.getIncomingRelations().get(0);
        Constituent parent = incomingEdge.getSource();
        int position = -1;
        for (int i = 0; i < parent.getOutgoingRelations().size(); i++) {
            if (parent.getOutgoingRelations().get(i) == incomingEdge) {
                position = i;
                break;
            }
        }
        assert position >= 0;
        if (position == 0)
            features.add(DiscreteFeature.create("FIRST_CHILD"));
        else if (position == parent.getOutgoingRelations().size() - 1)
            features.add(DiscreteFeature.create("LAST_CHILD"));
        if (position != 0) {
            Constituent sibling = parent.getOutgoingRelations().get(position - 1).getTarget();
            String phraseType = sibling.getLabel();
            int headWord = CollinsHeadFinder.getInstance().getHeadWordPosition(sibling);
            String token = ta.getToken(headWord).toLowerCase().trim();
            String pos = WordHelpers.getPOS(ta, headWord);
            features.add(DiscreteFeature.create("lsis.pt:" + phraseType));
            features.add(DiscreteFeature.create("lsis.hw:" + token));
            features.add(DiscreteFeature.create("lsis.hw.pos:" + pos));
        }
        if (position != parent.getOutgoingRelations().size() - 1) {
            Constituent sibling = parent.getOutgoingRelations().get(position + 1).getTarget();
            String phraseType = sibling.getLabel();
            int headWord = CollinsHeadFinder.getInstance().getHeadWordPosition(sibling);
            String token = ta.getToken(headWord).toLowerCase().trim();
            String pos = WordHelpers.getPOS(ta, headWord);
            features.add(DiscreteFeature.create("rsis.pt:" + phraseType));
            features.add(DiscreteFeature.create("rsis.hw:" + token));
            features.add(DiscreteFeature.create("rsis.hw.pos:" + pos));
        }
    }
    return features;
}
Also used : LinkedHashSet(java.util.LinkedHashSet) Relation(edu.illinois.cs.cogcomp.core.datastructures.textannotation.Relation) TreeView(edu.illinois.cs.cogcomp.core.datastructures.textannotation.TreeView) TextAnnotation(edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation) EdisonException(edu.illinois.cs.cogcomp.edison.utilities.EdisonException) Feature(edu.illinois.cs.cogcomp.edison.features.Feature) DiscreteFeature(edu.illinois.cs.cogcomp.edison.features.DiscreteFeature) Constituent(edu.illinois.cs.cogcomp.core.datastructures.textannotation.Constituent)
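
The position logic in ParseSiblings can be tried out without the CogComp classes. The following is a minimal sketch, assuming the parent's children are given as a plain list of phrase labels; the class name SiblingPositionSketch and its method are hypothetical, and the head-word and POS features are left out because they need a real TextAnnotation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the sibling logic; not part of cogcomp-nlp.
class SiblingPositionSketch {

    /**
     * Returns the position and sibling-label features for the child at
     * index `position` among `childLabels` (the labels of the parent's
     * children, in order).
     */
    static List<String> siblingFeatures(List<String> childLabels, int position) {
        List<String> features = new ArrayList<>();
        int last = childLabels.size() - 1;

        // Mark the edges of the child list.
        if (position == 0) {
            features.add("FIRST_CHILD");
        } else if (position == last) {
            features.add("LAST_CHILD");
        }

        // Left sibling exists for every child except the first.
        if (position != 0) {
            features.add("lsis.pt:" + childLabels.get(position - 1));
        }
        // Right sibling exists for every child except the last.
        if (position != last) {
            features.add("rsis.pt:" + childLabels.get(position + 1));
        }
        return features;
    }
}
```

With children NP, VP and ".", the middle child gets lsis.pt:NP and rsis.pt:. but no position marker. As in the method above, a sole child is reported as FIRST_CHILD; the ONLY_CHILD feature is reserved for a phrase with no incoming relation.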

Example 13 with Feature

Use of edu.illinois.cs.cogcomp.edison.features.Feature in project cogcomp-nlp by CogComp, in the class RogetThesaurusFeatures, method getFeatures.

@Override
public Set<Feature> getFeatures(Constituent c) throws EdisonException {
    if (!loaded) {
        try {
            // Load the data from the datastore rather than from the classpath.
            // loadFromClassPath();
            loadFromDatastore();
        } catch (Exception e) {
            throw new EdisonException(e);
        }
    }
    String s = c.getTokenizedSurfaceForm().trim();
    Set<Feature> features = new LinkedHashSet<>();
    if (map.containsKey(s)) {
        for (int i : map.get(s)) {
            features.add(DiscreteFeature.create(this.id2ClassName.get(i)));
        }
    } else if (map.containsKey(s.toLowerCase())) {
        for (int i : map.get(s.toLowerCase())) {
            features.add(DiscreteFeature.create(this.id2ClassName.get(i)));
        }
    }
    return features;
}
Also used : EdisonException(edu.illinois.cs.cogcomp.edison.utilities.EdisonException) Feature(edu.illinois.cs.cogcomp.edison.features.Feature) DiscreteFeature(edu.illinois.cs.cogcomp.edison.features.DiscreteFeature)
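
The lookup in RogetThesaurusFeatures tries the surface form as written and falls back to its lowercased form only when the exact key is absent. A minimal sketch of that pattern, assuming a plain Map from surface string to class names (the real class maps to integer ids and resolves them through id2ClassName); CaseFallbackLookupSketch is a hypothetical name:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of exact-then-lowercase lookup; not part of cogcomp-nlp.
class CaseFallbackLookupSketch {

    /** Returns the entries for `surface`, preferring an exact-case match. */
    static List<String> lookup(Map<String, List<String>> map, String surface) {
        String s = surface.trim();
        if (map.containsKey(s)) {
            // Exact match wins; the lowercase entry is not consulted.
            return map.get(s);
        }
        String lower = s.toLowerCase();
        if (map.containsKey(lower)) {
            return map.get(lower);
        }
        return Collections.emptyList();
    }
}
```

One consequence of this ordering is that a capitalized entry and a lowercase entry for the same word are never merged: whichever key matches first supplies all the features.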

Example 14 with Feature

Use of edu.illinois.cs.cogcomp.edison.features.Feature in project cogcomp-nlp by CogComp, in the class DependencyPath, method getFeatures.

@Override
public Set<Feature> getFeatures(Constituent c) throws EdisonException {
    TextAnnotation ta = c.getTextAnnotation();
    TreeView parse = (TreeView) ta.getView(dependencyViewName);
    Constituent c1 = parse.getConstituentsCoveringToken(c.getIncomingRelations().get(0).getSource().getStartSpan()).get(0);
    Constituent c2 = parse.getConstituentsCoveringToken(c.getStartSpan()).get(0);
    Pair<List<Constituent>, List<Constituent>> paths = PathFeatureHelper.getPathsToCommonAncestor(c1, c2, 400);
    int length = paths.getFirst().size() + paths.getSecond().size() - 1;
    StringBuilder path = new StringBuilder();
    StringBuilder pos = new StringBuilder();
    for (int i = 0; i < paths.getFirst().size() - 1; i++) {
        Constituent cc = paths.getFirst().get(i);
        path.append(cc.getIncomingRelations().get(0).getRelationName()).append(PathFeatureHelper.PATH_UP_STRING);
        pos.append(WordHelpers.getPOS(ta, cc.getStartSpan()));
        pos.append(cc.getIncomingRelations().get(0).getRelationName()).append(PathFeatureHelper.PATH_UP_STRING);
    }
    Constituent top = paths.getFirst().get(paths.getFirst().size() - 1);
    pos.append(WordHelpers.getPOS(ta, top.getStartSpan()));
    pos.append("*");
    path.append("*");
    if (paths.getSecond().size() > 1) {
        for (int i = paths.getSecond().size() - 2; i >= 0; i--) {
            Constituent cc = paths.getSecond().get(i);
            pos.append(WordHelpers.getPOS(ta, cc.getStartSpan()));
            pos.append(PathFeatureHelper.PATH_DOWN_STRING);
            path.append(PathFeatureHelper.PATH_DOWN_STRING);
        }
    }
    Set<Feature> features = new LinkedHashSet<>();
    features.add(DiscreteFeature.create(path.toString()));
    features.add(DiscreteFeature.create("pos" + pos.toString()));
    features.add(RealFeature.create("l", length));
    return features;
}
Also used : LinkedHashSet(java.util.LinkedHashSet) TreeView(edu.illinois.cs.cogcomp.core.datastructures.textannotation.TreeView) List(java.util.List) TextAnnotation(edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation) Feature(edu.illinois.cs.cogcomp.edison.features.Feature) DiscreteFeature(edu.illinois.cs.cogcomp.edison.features.DiscreteFeature) RealFeature(edu.illinois.cs.cogcomp.edison.features.RealFeature) Constituent(edu.illinois.cs.cogcomp.core.datastructures.textannotation.Constituent)

Example 15 with Feature

Use of edu.illinois.cs.cogcomp.edison.features.Feature in project cogcomp-nlp by CogComp, in the class LinearDistance, method getFeatures.

@Override
public Set<Feature> getFeatures(Constituent c) throws EdisonException {
    Constituent predicate = c.getIncomingRelations().get(0).getSource();
    Set<Feature> features = new LinkedHashSet<>();
    if (Queries.before(predicate).transform(c)) {
        int first = c.getEndSpan() - 1;
        int second = predicate.getStartSpan();
        int diff = second - first;
        assert diff > 0;
        switch(diff) {
            case 1:
                features.add(MINUS_ONE);
                break;
            case 2:
                features.add(MINUS_TWO);
                break;
            case 3:
                features.add(MINUS_THREE);
                break;
            default:
                features.add(MINUS_MANY);
        }
    } else if (Queries.after(predicate).transform(c)) {
        int first = predicate.getEndSpan() - 1;
        int second = c.getStartSpan();
        int diff = second - first;
        assert diff > 0;
        switch(diff) {
            case 1:
                features.add(ONE);
                break;
            case 2:
                features.add(TWO);
                break;
            case 3:
                features.add(THREE);
                break;
            default:
                features.add(MANY);
        }
    } else
        features.add(ZERO);
    return features;
}
Also used : LinkedHashSet(java.util.LinkedHashSet) DiscreteFeature(edu.illinois.cs.cogcomp.edison.features.DiscreteFeature) Feature(edu.illinois.cs.cogcomp.edison.features.Feature) Constituent(edu.illinois.cs.cogcomp.core.datastructures.textannotation.Constituent)
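
The bucketing in LinearDistance can be reproduced on raw token offsets. This is a sketch under two assumptions: that spans are half-open [start, end) token ranges, and that Queries.before / Queries.after hold when the two spans do not overlap. The bucket strings are hypothetical placeholders for the MINUS_ONE ... MANY constants, whose actual names are not shown above.

```java
// Hypothetical illustration of the distance buckets; not part of cogcomp-nlp.
class LinearDistanceSketch {

    /**
     * Buckets the signed token distance from an argument span to a
     * predicate span. Negative buckets mean the argument precedes the
     * predicate; "0" means the spans overlap.
     */
    static String bucket(int argStart, int argEnd, int predStart, int predEnd) {
        if (argEnd <= predStart) {
            // Argument lies before the predicate: last argument token
            // to first predicate token.
            int diff = predStart - (argEnd - 1);
            return "-" + name(diff);
        } else if (argStart >= predEnd) {
            // Argument lies after the predicate: last predicate token
            // to first argument token.
            int diff = argStart - (predEnd - 1);
            return name(diff);
        }
        return "0";
    }

    /** Distances of 1, 2 and 3 get their own bucket; the rest share one. */
    private static String name(int diff) {
        switch (diff) {
            case 1:
                return "1";
            case 2:
                return "2";
            case 3:
                return "3";
            default:
                return "many";
        }
    }
}
```

Because the distance is measured between the nearest tokens of the two spans, adjacent spans fall in the "1" or "-1" bucket regardless of how long either span is.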

Aggregations

Feature (edu.illinois.cs.cogcomp.edison.features.Feature): 71
TextAnnotation (edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation): 48
Constituent (edu.illinois.cs.cogcomp.core.datastructures.textannotation.Constituent): 44
DiscreteFeature (edu.illinois.cs.cogcomp.edison.features.DiscreteFeature): 41
LinkedHashSet (java.util.LinkedHashSet): 24
View (edu.illinois.cs.cogcomp.core.datastructures.textannotation.View): 22
EdisonException (edu.illinois.cs.cogcomp.edison.utilities.EdisonException): 17
Test (org.junit.Test): 13
TreeView (edu.illinois.cs.cogcomp.core.datastructures.textannotation.TreeView): 12
HashSet (java.util.HashSet): 11
Relation (edu.illinois.cs.cogcomp.core.datastructures.textannotation.Relation): 10
ArrayList (java.util.ArrayList): 9
PredicateArgumentView (edu.illinois.cs.cogcomp.core.datastructures.textannotation.PredicateArgumentView): 8
RealFeature (edu.illinois.cs.cogcomp.edison.features.RealFeature): 8
Set (java.util.Set): 6
TokenLabelView (edu.illinois.cs.cogcomp.core.datastructures.textannotation.TokenLabelView): 5
POSBaseLineCounter (edu.illinois.cs.cogcomp.edison.utilities.POSBaseLineCounter): 5
POSMikheevCounter (edu.illinois.cs.cogcomp.edison.utilities.POSMikheevCounter): 5
ModelInfo (edu.illinois.cs.cogcomp.verbsense.core.ModelInfo): 3
List (java.util.List): 3