Search in sources :

Example 6 with CrfSuiteAdapter

use of org.dkpro.tc.ml.crfsuite.CrfSuiteAdapter in project dkpro-tc by dkpro.

the class CRFSuiteSaveAndLoadModelTest method loadModelArow.

@Test
public void loadModelArow() throws Exception {
    Map<String, Object> config = new HashMap<>();
    config.put(DIM_CLASSIFICATION_ARGS, new Object[] { new CrfSuiteAdapter(), CrfSuiteAdapter.ALGORITHM_ADAPTIVE_REGULARIZATION_OF_WEIGHT_VECTOR, "-p", "max_iterations=2" });
    config.put(DIM_DATA_WRITER, new CrfSuiteAdapter().getDataWriterClass().getName());
    config.put(DIM_FEATURE_USE_SPARSE, new WekaAdapter().useSparseFeatures());
    Dimension<Map<String, Object>> mlas = Dimension.createBundle("config", config);
    // create a model
    File modelFolder = folder.newFolder();
    ParameterSpace pSpace = getParameterSpace(mlas);
    executeSaveModelIntoTemporyFolder(pSpace, modelFolder);
    JCas jcas = JCasFactory.createJCas();
    jcas.setDocumentText("This is an example text. It has 2 sentences.");
    jcas.setDocumentLanguage("en");
    AnalysisEngine tokenizer = AnalysisEngineFactory.createEngine(BreakIteratorSegmenter.class);
    AnalysisEngine tcAnno = AnalysisEngineFactory.createEngine(TcAnnotator.class, TcAnnotator.PARAM_TC_MODEL_LOCATION, modelFolder.getAbsolutePath(), TcAnnotator.PARAM_NAME_SEQUENCE_ANNOTATION, Sentence.class.getName(), TcAnnotator.PARAM_NAME_UNIT_ANNOTATION, Token.class.getName());
    tokenizer.process(jcas);
    tcAnno.process(jcas);
    List<TextClassificationOutcome> outcomes = new ArrayList<>(JCasUtil.select(jcas, TextClassificationOutcome.class));
    // 9 token + 2 punctuation marks
    assertEquals(11, outcomes.size());
    for (TextClassificationOutcome o : outcomes) {
        String label = o.getOutcome();
        assertTrue(postags.contains(label));
    }
}
Also used : HashMap(java.util.HashMap) ArrayList(java.util.ArrayList) JCas(org.apache.uima.jcas.JCas) Token(de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token) CrfSuiteAdapter(org.dkpro.tc.ml.crfsuite.CrfSuiteAdapter) WekaAdapter(org.dkpro.tc.ml.weka.WekaAdapter) ParameterSpace(org.dkpro.lab.task.ParameterSpace) TextClassificationOutcome(org.dkpro.tc.api.type.TextClassificationOutcome) HashMap(java.util.HashMap) Map(java.util.Map) File(java.io.File) Sentence(de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence) AnalysisEngine(org.apache.uima.analysis_engine.AnalysisEngine) Test(org.junit.Test)

Example 7 with CrfSuiteAdapter

use of org.dkpro.tc.ml.crfsuite.CrfSuiteAdapter in project dkpro-tc by dkpro.

the class CRFSuiteBrownPosDemoSimpleDkproReader method main.

public static void main(String[] args) throws Exception {
    // This is used to ensure that the required DKPRO_HOME environment
    // variable is set.
    // Ensures that people can run the experiments even if they haven't read
    // the setup
    // instructions first :)
    DemoUtils.setDkproHome(CRFSuiteBrownPosDemoSimpleDkproReader.class.getSimpleName());
    Map<String, Object> config = new HashMap<>();
    config.put(DIM_CLASSIFICATION_ARGS, new Object[] { new CrfSuiteAdapter() });
    config.put(DIM_DATA_WRITER, new CrfSuiteAdapter().getDataWriterClass().getName());
    config.put(DIM_FEATURE_USE_SPARSE, new CrfSuiteAdapter().useSparseFeatures());
    Dimension<Map<String, Object>> mlas = Dimension.createBundle("config", config);
    ParameterSpace pSpace = getParameterSpace(Constants.FM_SEQUENCE, Constants.LM_SINGLE_LABEL, mlas, null);
    CRFSuiteBrownPosDemoSimpleDkproReader experiment = new CRFSuiteBrownPosDemoSimpleDkproReader();
    experiment.runTrainTest(pSpace);
}
Also used : HashMap(java.util.HashMap) ParameterSpace(org.dkpro.lab.task.ParameterSpace) HashMap(java.util.HashMap) Map(java.util.Map) CrfSuiteAdapter(org.dkpro.tc.ml.crfsuite.CrfSuiteAdapter)

Aggregations

HashMap (java.util.HashMap)7 Map (java.util.Map)7 ParameterSpace (org.dkpro.lab.task.ParameterSpace)7 CrfSuiteAdapter (org.dkpro.tc.ml.crfsuite.CrfSuiteAdapter)7 Test (org.junit.Test)5 File (java.io.File)3 WekaAdapter (org.dkpro.tc.ml.weka.WekaAdapter)3 Sentence (de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence)2 Token (de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token)2 ArrayList (java.util.ArrayList)2 AnalysisEngine (org.apache.uima.analysis_engine.AnalysisEngine)2 JCas (org.apache.uima.jcas.JCas)2 TextClassificationOutcome (org.dkpro.tc.api.type.TextClassificationOutcome)2 Arrays.asList (java.util.Arrays.asList)1 List (java.util.List)1 CollectionReaderDescription (org.apache.uima.collection.CollectionReaderDescription)1 TcFeatureSet (org.dkpro.tc.api.features.TcFeatureSet)1