Example 1 with FeatureExtractor

Use of org.dkpro.tc.api.features.FeatureExtractor in project dkpro-tc by dkpro.

Class InstanceExtractor, method getUnitInstances.

public List<Instance> getUnitInstances(JCas jcas, boolean supportSparseFeatures) throws TextClassificationException {
    List<Instance> instances = new ArrayList<Instance>();
    // selectSingle expects exactly one JCasId annotation in the CAS
    int jcasId = JCasUtil.selectSingle(jcas, JCasId.class).getId();
    Collection<TextClassificationTarget> targets = JCasUtil.select(jcas, TextClassificationTarget.class);
    // build one Instance per classification target
    for (TextClassificationTarget aTarget : targets) {
        Instance instance = new Instance();
        if (addInstanceId) {
            Feature feat = InstanceIdFeature.retrieve(jcas, aTarget);
            instance.addFeature(feat);
        }
        for (FeatureExtractorResource_ImplBase featExt : featureExtractors) {
            // reject configured extractors that do not implement FeatureExtractor
            if (!(featExt instanceof FeatureExtractor)) {
                throw new TextClassificationException("Feature extractor does not implement interface [" + FeatureExtractor.class.getName() + "]: " + featExt.getResourceName());
            }
            // the caller chooses between the sparse and the dense feature set
            if (supportSparseFeatures) {
                instance.addFeatures(getSparse(jcas, aTarget, featExt));
            } else {
                instance.addFeatures(getDense(jcas, aTarget, featExt));
            }
        }
        // set and write outcome label(s)
        instance.setOutcomes(getOutcomes(jcas, aTarget));
        instance.setWeight(getWeight(jcas, aTarget));
        instance.setJcasId(jcasId);
        // instance.setSequenceId(sequenceId);
        // the target's id is used as its position in the sequence
        instance.setSequencePosition(aTarget.getId());
        instances.add(instance);
    }
    return instances;
}
Also used : JCasId(org.dkpro.tc.api.type.JCasId) FeatureExtractor(org.dkpro.tc.api.features.FeatureExtractor) PairFeatureExtractor(org.dkpro.tc.api.features.PairFeatureExtractor) Instance(org.dkpro.tc.api.features.Instance) TextClassificationException(org.dkpro.tc.api.exception.TextClassificationException) ArrayList(java.util.ArrayList) TextClassificationTarget(org.dkpro.tc.api.type.TextClassificationTarget) Feature(org.dkpro.tc.api.features.Feature) InstanceIdFeature(org.dkpro.tc.core.feature.InstanceIdFeature) FeatureExtractorResource_ImplBase(org.dkpro.tc.api.features.FeatureExtractorResource_ImplBase)
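
The guard-and-collect loop in getUnitInstances can be illustrated without UIMA on the classpath. The following is only a minimal sketch under stated assumptions: Resource, Extractor, LengthExtractor, NotAnExtractor and collect are hypothetical stand-ins and not part of dkpro-tc, a plain String replaces the JCas and TextClassificationTarget arguments, and IllegalStateException replaces TextClassificationException.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ExtractorDispatchSketch {

    // Hypothetical stand-in for FeatureExtractorResource_ImplBase.
    interface Resource {
        String name();
    }

    // Hypothetical stand-in for FeatureExtractor; the real interface
    // receives a JCas and a TextClassificationTarget, not a String.
    interface Extractor {
        Set<String> extract(String target);
    }

    // A resource that also implements the extractor interface.
    static final class LengthExtractor implements Resource, Extractor {
        public String name() {
            return "length";
        }

        public Set<String> extract(String target) {
            return Set.of("length=" + target.length());
        }
    }

    // A resource that is configured but cannot extract anything.
    static final class NotAnExtractor implements Resource {
        public String name() {
            return "broken";
        }
    }

    // Mirrors the inner loop of getUnitInstances: check the interface
    // first, then cast and merge the extracted features.
    static Set<String> collect(List<Resource> resources, String target) {
        Set<String> features = new LinkedHashSet<>();
        for (Resource r : resources) {
            if (!(r instanceof Extractor)) {
                throw new IllegalStateException("Feature extractor does not implement interface ["
                        + Extractor.class.getName() + "]: " + r.name());
            }
            features.addAll(((Extractor) r).extract(target));
        }
        return features;
    }

    public static void main(String[] args) {
        System.out.println(collect(List.of(new LengthExtractor()), "abc"));
    }
}
```

Checking the interface before the cast turns a misconfigured extractor into a descriptive error that names the offending resource, rather than a bare ClassCastException.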

Example 2 with FeatureExtractor

Use of org.dkpro.tc.api.features.FeatureExtractor in project dkpro-tc by dkpro.

Class InstanceExtractor, method getSparse.

private Set<Feature> getSparse(JCas jcas, TextClassificationTarget unit, FeatureExtractorResource_ImplBase featExt) throws TextClassificationException {
    // the cast relies on callers having checked instanceof FeatureExtractor beforehand
    Set<Feature> features = ((FeatureExtractor) featExt).extract(jcas, unit);
    Set<Feature> filtered = new HashSet<>();
    // keep only the features that are not flagged as holding their default value
    for (Feature f : features) {
        if (!f.isDefaultValue()) {
            filtered.add(f);
        }
    }
    return filtered;
}
Also used : FeatureExtractor(org.dkpro.tc.api.features.FeatureExtractor) PairFeatureExtractor(org.dkpro.tc.api.features.PairFeatureExtractor) Feature(org.dkpro.tc.api.features.Feature) InstanceIdFeature(org.dkpro.tc.core.feature.InstanceIdFeature) HashSet(java.util.HashSet)
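
The sparse filtering step in getSparse can be tried in isolation. This is a minimal sketch, assuming a simplified Feature class: the Feature type below and the keepNonDefault method are hypothetical stand-ins written for illustration, not the dkpro-tc classes, and only the isDefaultValue() check is taken from the method above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class SparseFilterSketch {

    // Hypothetical, simplified stand-in for org.dkpro.tc.api.features.Feature.
    static final class Feature {
        final String name;
        final double value;
        final boolean defaultValue;

        Feature(String name, double value, boolean defaultValue) {
            this.name = name;
            this.value = value;
            this.defaultValue = defaultValue;
        }

        boolean isDefaultValue() {
            return defaultValue;
        }
    }

    // Returns a new set holding only the features that are not flagged
    // as default; the input set is left unmodified.
    static Set<Feature> keepNonDefault(Set<Feature> features) {
        Set<Feature> filtered = new LinkedHashSet<>();
        for (Feature f : features) {
            if (!f.isDefaultValue()) {
                filtered.add(f);
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Set<Feature> all = new LinkedHashSet<>();
        all.add(new Feature("tokenCount", 12.0, false));
        all.add(new Feature("hasEmoji", 0.0, true));
        all.add(new Feature("hasUrl", 0.0, true));

        // Only the feature that is not flagged as default survives.
        for (Feature f : keepNonDefault(all)) {
            System.out.println(f.name + " = " + f.value);
        }
    }
}
```

Dropping default-valued entries keeps the per-instance feature set small when most features are absent for a given target, which is the usual motivation for a sparse representation.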

Example 3 with FeatureExtractor

Use of org.dkpro.tc.api.features.FeatureExtractor in project dkpro-tc by dkpro.

Class InstanceExtractor, method getSingleInstanceDocument.

private Instance getSingleInstanceDocument(Instance instance, JCas jcas, boolean supportSparseFeatures) throws TextClassificationException {
    int jcasId = JCasUtil.selectSingle(jcas, JCasId.class).getId();
    // document mode: selectSingle expects exactly one classification target in the CAS
    TextClassificationTarget documentTcu = JCasUtil.selectSingle(jcas, TextClassificationTarget.class);
    if (addInstanceId) {
        instance.addFeature(InstanceIdFeature.retrieve(jcas));
    }
    for (FeatureExtractorResource_ImplBase featExt : featureExtractors) {
        // reject configured extractors that do not implement FeatureExtractor
        if (!(featExt instanceof FeatureExtractor)) {
            throw new TextClassificationException("Using incompatible feature in document mode: " + featExt.getResourceName());
        }
        if (supportSparseFeatures) {
            instance.addFeatures(getSparse(jcas, documentTcu, featExt));
        } else {
            instance.addFeatures(getDense(jcas, documentTcu, featExt));
        }
        // note: outcomes, weight and JCas id are assigned inside the loop, so they are
        // re-set on every iteration and never set when featureExtractors is empty
        instance.setOutcomes(getOutcomes(jcas, null));
        instance.setWeight(getWeight(jcas, null));
        instance.setJcasId(jcasId);
    }
    return instance;
}
Also used : JCasId(org.dkpro.tc.api.type.JCasId) FeatureExtractor(org.dkpro.tc.api.features.FeatureExtractor) PairFeatureExtractor(org.dkpro.tc.api.features.PairFeatureExtractor) TextClassificationException(org.dkpro.tc.api.exception.TextClassificationException) TextClassificationTarget(org.dkpro.tc.api.type.TextClassificationTarget) FeatureExtractorResource_ImplBase(org.dkpro.tc.api.features.FeatureExtractorResource_ImplBase)

Aggregations

FeatureExtractor (org.dkpro.tc.api.features.FeatureExtractor): 3
PairFeatureExtractor (org.dkpro.tc.api.features.PairFeatureExtractor): 3
TextClassificationException (org.dkpro.tc.api.exception.TextClassificationException): 2
Feature (org.dkpro.tc.api.features.Feature): 2
FeatureExtractorResource_ImplBase (org.dkpro.tc.api.features.FeatureExtractorResource_ImplBase): 2
JCasId (org.dkpro.tc.api.type.JCasId): 2
TextClassificationTarget (org.dkpro.tc.api.type.TextClassificationTarget): 2
InstanceIdFeature (org.dkpro.tc.core.feature.InstanceIdFeature): 2
ArrayList (java.util.ArrayList): 1
HashSet (java.util.HashSet): 1
Instance (org.dkpro.tc.api.features.Instance): 1