Example 1 with POSRatioFeatureExtractor

Use of org.dkpro.tc.features.syntax.POSRatioFeatureExtractor in project dkpro-tc by dkpro.

From the class POSRatioTest, method posContextFeatureExtractorTest.

@Test
public void posContextFeatureExtractorTest() throws Exception {
    // Build an aggregate pipeline: segmentation first, then English POS tagging.
    // createEngineDescription and createEngine are the static uimaFIT factory methods (AnalysisEngineFactory).
    AnalysisEngineDescription desc = createEngineDescription(
            createEngineDescription(BreakIteratorSegmenter.class),
            createEngineDescription(OpenNlpPosTagger.class, OpenNlpPosTagger.PARAM_LANGUAGE, "en"));
    AnalysisEngine engine = createEngine(desc);
    JCas jcas = engine.newJCas();
    jcas.setDocumentLanguage("en");
    jcas.setDocumentText("As the emeritus pope leaves the Vatican for the papal residence of Castel Gandolfo – and becomes the first pontiff to resign in 600 years – the operation to choose his successor begins. With the throne of St Peter declared empty and the interregnum formally begun, as many of the 208 cardinals who can make the journey will be expected to travel to the Vatican to help run the church in the absence of a pope.");
    engine.process(jcas);
    // The classification target spans the whole document text.
    TextClassificationTarget aTarget = new TextClassificationTarget(jcas, 0, jcas.getDocumentText().length());
    aTarget.addToIndexes();
    POSRatioFeatureExtractor extractor = new POSRatioFeatureExtractor();
    List<Feature> features = new ArrayList<>(extractor.extract(jcas, aTarget));
    // The extractor is expected to return 11 features in total.
    Assert.assertEquals(11, features.size());
    // Only the noun and punctuation ratios are checked, each with a tolerance of 0.0001.
    for (Feature feature : features) {
        if (feature.getName().equals(FN_N_RATIO)) {
            assertFeature(FN_N_RATIO, 0.2658, feature, 0.0001);
        } else if (feature.getName().equals(FN_PUNC_RATIO)) {
            assertFeature(FN_PUNC_RATIO, 0.0380, feature, 0.0001);
        }
    }
}
Also used:
POSRatioFeatureExtractor (org.dkpro.tc.features.syntax.POSRatioFeatureExtractor)
BreakIteratorSegmenter (de.tudarmstadt.ukp.dkpro.core.tokit.BreakIteratorSegmenter)
AnalysisEngineDescription (org.apache.uima.analysis_engine.AnalysisEngineDescription)
TextClassificationTarget (org.dkpro.tc.api.type.TextClassificationTarget)
ArrayList (java.util.ArrayList)
JCas (org.apache.uima.jcas.JCas)
OpenNlpPosTagger (de.tudarmstadt.ukp.dkpro.core.opennlp.OpenNlpPosTagger)
FeatureTestUtil.assertFeature (org.dkpro.tc.testing.FeatureTestUtil.assertFeature)
Feature (org.dkpro.tc.api.features.Feature)
AnalysisEngine (org.apache.uima.analysis_engine.AnalysisEngine)
Test (org.junit.Test)
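
The test above checks ratio-valued features such as FN_N_RATIO and FN_PUNC_RATIO. As a rough illustration of what such a value could represent, the following is a minimal sketch that assumes a POS ratio is simply the count of one coarse tag divided by the total number of tags. The class PosRatioSketch, the method ratios, and the tag labels "N", "V" and "PUNC" are hypothetical names chosen for this sketch; they are not part of dkpro-tc, and the real extractor may compute its features differently (for instance, the test expects a fixed set of 11 features, whereas this sketch only reports tags that actually occur).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PosRatioSketch {

    // Hypothetical helper (not dkpro-tc API): maps each coarse tag to
    // (number of occurrences of that tag) / (total number of tags).
    static Map<String, Double> ratios(List<String> coarseTags) {
        Map<String, Double> result = new TreeMap<>();
        if (coarseTags.isEmpty()) {
            return result; // avoid dividing by zero on empty input
        }
        // Count how often each tag occurs.
        Map<String, Integer> counts = new TreeMap<>();
        for (String tag : coarseTags) {
            counts.merge(tag, 1, Integer::sum);
        }
        // Turn each count into a ratio of the total.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            result.put(e.getKey(), e.getValue() / (double) coarseTags.size());
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed toy input: two nouns, one verb, one punctuation mark.
        List<String> tags = List.of("N", "V", "N", "PUNC");
        System.out.println(ratios(tags)); // prints {N=0.5, PUNC=0.25, V=0.25}
    }
}
```

Under this assumption, the expected values in the test would be consistent with, for example, 21 nouns and 3 punctuation marks among 79 tokens (21/79 ≈ 0.2658, 3/79 ≈ 0.0380), although the source does not state the underlying counts.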
