
Example 61 with AnalysisEngineDescription

Use of org.apache.uima.analysis_engine.AnalysisEngineDescription in project dkpro-tc by dkpro.

In class FeatureResourceLoader, method configureOverrides:

private void configureOverrides(File tcModelLocation, ExternalResourceDescription exRes, Map<String, String> overrides)
        throws IOException {
    // We assume for the moment that we only have primitive analysis engines for meta
    // collection, not aggregates. If there were aggregates, we'd have to do this recursively.
    ResourceSpecifier aDesc = exRes.getResourceSpecifier();
    if (aDesc instanceof AnalysisEngineDescription) {
        // Analysis engines are ok
        if (!((AnalysisEngineDescription) aDesc).isPrimitive()) {
            throw new IllegalArgumentException("Only primitive meta collectors currently supported.");
        }
    } else if (aDesc instanceof CustomResourceSpecifier_impl) {
        // Feature extractors are ok
    } else {
        throw new IllegalArgumentException("Descriptors of type " + aDesc.getClass() + " not supported.");
    }
    for (Entry<String, String> e : overrides.entrySet()) {
        // We generate a storage location from the feature extractor discriminator value
        // and the preferred value specified by the meta collector
        String parameterName = e.getKey();
        ConfigurationParameterFactory.setParameter(aDesc, parameterName,
                new File(tcModelLocation, e.getValue()).getAbsolutePath());
    }
}
Also used: AnalysisEngineDescription (org.apache.uima.analysis_engine.AnalysisEngineDescription), CustomResourceSpecifier_impl (org.apache.uima.resource.impl.CustomResourceSpecifier_impl), ResourceSpecifier (org.apache.uima.resource.ResourceSpecifier), File (java.io.File)
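For context, here is a minimal self-contained sketch (assuming only uimaFIT on the classpath, not taken from dkpro-tc) of the same pattern: checking that a descriptor is primitive and then redirecting one of its parameters with ConfigurationParameterFactory.setParameter. DemoMetaCollector and its PARAM_TARGET_LOCATION are hypothetical placeholders invented for the illustration.

import java.io.File;

import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.ConfigurationParameterFactory;
import org.apache.uima.jcas.JCas;

public class OverrideSketch {

    // Hypothetical primitive meta collector with a single overridable parameter
    public static class DemoMetaCollector extends JCasAnnotator_ImplBase {
        public static final String PARAM_TARGET_LOCATION = "targetLocation";

        @ConfigurationParameter(name = PARAM_TARGET_LOCATION, mandatory = true)
        private String targetLocation;

        @Override
        public void process(JCas aJCas) {
            // no-op; only the descriptor handling matters for this sketch
        }
    }

    public static void main(String[] args) throws Exception {
        AnalysisEngineDescription desc = AnalysisEngineFactory.createEngineDescription(
                DemoMetaCollector.class,
                DemoMetaCollector.PARAM_TARGET_LOCATION, "placeholder");

        // isPrimitive() is the same check configureOverrides performs before touching the descriptor
        System.out.println("primitive: " + desc.isPrimitive());

        // Point the parameter at a file below a (hypothetical) model directory,
        // mirroring the loop over the overrides map above
        File tcModelLocation = new File("target/tcModel");
        ConfigurationParameterFactory.setParameter(desc, DemoMetaCollector.PARAM_TARGET_LOCATION,
                new File(tcModelLocation, "ngrams.ser").getAbsolutePath());

        System.out.println(desc.getAnalysisEngineMetaData().getConfigurationParameterSettings()
                .getParameterValue(DemoMetaCollector.PARAM_TARGET_LOCATION));
    }
}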

Example 62 with AnalysisEngineDescription

Use of org.apache.uima.analysis_engine.AnalysisEngineDescription in project dkpro-tc by dkpro.

In class FoldUtil, method createMinimalSplit:

/**
 * Takes the available CASes and creates additional ones from them so that the minimal requested
 * number of CAS objects is available for running a cross-validation. Computes a rule-of-thumb
 * value for splitting each of the found CASes into N sub-CASes; at the end, the total number
 * created is compared to the requested number of CASes, and an exception is thrown if too few
 * were created.
 *
 * @param inputFolder
 *            the input folder
 * @param numFolds
 *            number of folds to create
 * @param numAvailableJCas
 *            number of available CASes
 * @param isSequence
 *            whether this is a sequence model
 * @return the folder containing sufficient folds
 * @throws Exception
 *             if not enough data is available for creating the required number of folds
 */
public static File createMinimalSplit(String inputFolder, int numFolds, int numAvailableJCas, boolean isSequence)
        throws Exception {
    File outputFolder = new File(inputFolder, "output");
    int splitNum = (int) Math.ceil(numFolds / (double) numAvailableJCas);
    CollectionReaderDescription createReader = CollectionReaderFactory.createReaderDescription(
            BinaryCasReader.class,
            BinaryCasReader.PARAM_SOURCE_LOCATION, inputFolder,
            BinaryCasReader.PARAM_PATTERNS, "*.bin",
            BinaryCasReader.PARAM_ADD_DOCUMENT_METADATA, false);
    AnalysisEngineDescription multiplier = AnalysisEngineFactory.createEngineDescription(
            FoldClassificationUnitCasMultiplier.class,
            FoldClassificationUnitCasMultiplier.PARAM_REQUESTED_SPLITS, splitNum,
            FoldClassificationUnitCasMultiplier.PARAM_USE_SEQUENCES, isSequence);
    AnalysisEngineDescription xmiWriter = AnalysisEngineFactory.createEngineDescription(
            BinaryCasWriter.class,
            BinaryCasWriter.PARAM_TARGET_LOCATION, outputFolder.getAbsolutePath(),
            BinaryCasWriter.PARAM_FORMAT, "6+");
    AnalysisEngineDescription both = AnalysisEngineFactory.createEngineDescription(multiplier, xmiWriter);
    SimplePipeline.runPipeline(createReader, both);
    // final check - do we have at least as many folds as requested by "numFolds"?
    isNumberOfCasCreatedLargerEqualNumFolds(outputFolder, numFolds);
    return outputFolder;
}
Also used: CollectionReaderDescription (org.apache.uima.collection.CollectionReaderDescription), AnalysisEngineDescription (org.apache.uima.analysis_engine.AnalysisEngineDescription), File (java.io.File)
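As a quick illustration of the rule-of-thumb above (not from the dkpro-tc sources), the split factor is simply the ceiling of numFolds divided by the number of available CASes, so each CAS is multiplied just enough to reach the requested fold count:

public class SplitMath {

    // Same formula as in createMinimalSplit: split each available CAS into splitNum
    // sub-CASes so that splitNum * numAvailableJCas >= numFolds
    static int splitNum(int numFolds, int numAvailableJCas) {
        return (int) Math.ceil(numFolds / (double) numAvailableJCas);
    }

    public static void main(String[] args) {
        // 10 folds requested but only 3 CASes on disk -> split each CAS into 4 parts (12 total)
        System.out.println(splitNum(10, 3)); // 4
        // already enough CASes -> no additional splitting needed (factor 1)
        System.out.println(splitNum(5, 8)); // 1
    }
}

With 10 folds requested and 3 CASes available, each CAS is split into 4 parts, yielding 12 candidate CASes; the final check in createMinimalSplit then verifies that at least numFolds CASes were actually produced.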

Aggregations

AnalysisEngineDescription (org.apache.uima.analysis_engine.AnalysisEngineDescription): 62
Test (org.junit.Test): 32
File (java.io.File): 27
CollectionReaderDescription (org.apache.uima.collection.CollectionReaderDescription): 25
ArrayList (java.util.ArrayList): 22
AnalysisEngine (org.apache.uima.analysis_engine.AnalysisEngine): 18
JCas (org.apache.uima.jcas.JCas): 16
Feature (org.dkpro.tc.api.features.Feature): 13
FeatureTestUtil.assertFeature (org.dkpro.tc.testing.FeatureTestUtil.assertFeature): 11
ExternalResourceDescription (org.apache.uima.resource.ExternalResourceDescription): 10
AggregateBuilder (org.apache.uima.fit.factory.AggregateBuilder): 8
ResourceInitializationException (org.apache.uima.resource.ResourceInitializationException): 8
JsonDataWriter (org.dkpro.tc.core.io.JsonDataWriter): 8
TextClassificationTarget (org.dkpro.tc.api.type.TextClassificationTarget): 7
Gson (com.google.gson.Gson): 6
IOException (java.io.IOException): 6
Instance (org.dkpro.tc.api.features.Instance): 6
OpenNlpPosTagger (de.tudarmstadt.ukp.dkpro.core.opennlp.OpenNlpPosTagger): 4
BreakIteratorSegmenter (de.tudarmstadt.ukp.dkpro.core.tokit.BreakIteratorSegmenter): 4
CAS (org.apache.uima.cas.CAS): 4