Example 11 with ExternalResourceDescription

Use of org.apache.uima.resource.ExternalResourceDescription in project dkpro-tc by dkpro.

From the class ExtractFeaturesConnectorTest, method extractFeaturesConnectorSingleLabelTest:

@Test
public void extractFeaturesConnectorSingleLabelTest() throws Exception {
    File outputPath = folder.newFolder();
    // we do not need parameters here, but in case we do :)
    Object[] parameters = new Object[] { NoopFeatureExtractor.PARAM_UNIQUE_EXTRACTOR_NAME, "123" };
    ExternalResourceDescription featureExtractor = ExternalResourceFactory.createExternalResourceDescription(NoopFeatureExtractor.class, parameters);
    List<ExternalResourceDescription> fes = new ArrayList<>();
    fes.add(featureExtractor);
    CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(TestReaderSingleLabel.class, TestReaderSingleLabel.PARAM_SOURCE_LOCATION, "src/test/resources/data/*.txt");
    AnalysisEngineDescription segmenter = AnalysisEngineFactory.createEngineDescription(BreakIteratorSegmenter.class);
    AnalysisEngineDescription doc = AnalysisEngineFactory.createEngineDescription(DocumentModeAnnotator.class, DocumentModeAnnotator.PARAM_FEATURE_MODE, Constants.FM_DOCUMENT);
    AnalysisEngineDescription featExtractorConnector = TaskUtils.getFeatureExtractorConnector(outputPath.getAbsolutePath(), JsonDataWriter.class.getName(), Constants.LM_REGRESSION, Constants.FM_DOCUMENT, false, false, false, false, Collections.emptyList(), fes, new String[] {});
    SimplePipeline.runPipeline(reader, segmenter, doc, featExtractorConnector);
    Gson gson = new Gson();
    System.out.println(FileUtils.readFileToString(new File(outputPath, JsonDataWriter.JSON_FILE_NAME), "utf-8"));
    List<String> lines = FileUtils.readLines(new File(outputPath, JsonDataWriter.JSON_FILE_NAME), "utf-8");
    List<Instance> instances = new ArrayList<>();
    for (String l : lines) {
        instances.add(gson.fromJson(l, Instance.class));
    }
    assertEquals(2, instances.size());
    assertEquals(1, getUniqueOutcomes(instances));
}
Also used: JsonDataWriter (org.dkpro.tc.core.io.JsonDataWriter), Instance (org.dkpro.tc.api.features.Instance), ArrayList (java.util.ArrayList), Gson (com.google.gson.Gson), CollectionReaderDescription (org.apache.uima.collection.CollectionReaderDescription), AnalysisEngineDescription (org.apache.uima.analysis_engine.AnalysisEngineDescription), File (java.io.File), ExternalResourceDescription (org.apache.uima.resource.ExternalResourceDescription), Test (org.junit.Test)

Example 12 with ExternalResourceDescription

Use of org.apache.uima.resource.ExternalResourceDescription in project dkpro-tc by dkpro.

From the class ExtractFeaturesConnectorTest, method extractFeaturesConnectorMultiLabelTest:

@Test
public void extractFeaturesConnectorMultiLabelTest() throws Exception {
    File outputPath = folder.newFolder();
    // we do not need parameters here, but in case we do :)
    Object[] parameters = new Object[] { NoopFeatureExtractor.PARAM_UNIQUE_EXTRACTOR_NAME, "123" };
    ExternalResourceDescription featureExtractor = ExternalResourceFactory.createExternalResourceDescription(NoopFeatureExtractor.class, parameters);
    List<ExternalResourceDescription> fes = new ArrayList<>();
    fes.add(featureExtractor);
    CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(TestReaderMultiLabel.class, TestReaderMultiLabel.PARAM_SOURCE_LOCATION, "src/test/resources/data/*.txt");
    AnalysisEngineDescription segmenter = AnalysisEngineFactory.createEngineDescription(BreakIteratorSegmenter.class);
    AnalysisEngineDescription doc = AnalysisEngineFactory.createEngineDescription(DocumentModeAnnotator.class, DocumentModeAnnotator.PARAM_FEATURE_MODE, Constants.FM_DOCUMENT);
    AnalysisEngineDescription featExtractorConnector = TaskUtils.getFeatureExtractorConnector(outputPath.getAbsolutePath(), JsonDataWriter.class.getName(), Constants.LM_REGRESSION, Constants.FM_DOCUMENT, false, false, false, false, Collections.emptyList(), fes, new String[] {});
    SimplePipeline.runPipeline(reader, segmenter, doc, featExtractorConnector);
    Gson gson = new Gson();
    List<String> lines = FileUtils.readLines(new File(outputPath, JsonDataWriter.JSON_FILE_NAME), "utf-8");
    List<Instance> instances = new ArrayList<>();
    for (String l : lines) {
        instances.add(gson.fromJson(l, Instance.class));
    }
    assertEquals(2, instances.size());
    assertEquals(3, getUniqueOutcomes(instances));
}
Also used: JsonDataWriter (org.dkpro.tc.core.io.JsonDataWriter), Instance (org.dkpro.tc.api.features.Instance), ArrayList (java.util.ArrayList), Gson (com.google.gson.Gson), CollectionReaderDescription (org.apache.uima.collection.CollectionReaderDescription), AnalysisEngineDescription (org.apache.uima.analysis_engine.AnalysisEngineDescription), File (java.io.File), ExternalResourceDescription (org.apache.uima.resource.ExternalResourceDescription), Test (org.junit.Test)
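
Both tests above call getUniqueOutcomes, which is not shown in this excerpt. The following is only a minimal sketch of what such a helper might do, assuming each instance exposes its outcome labels as a list of strings. The OutcomeHolder type and the countUniqueOutcomes name are hypothetical stand-ins and are not part of dkpro-tc.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UniqueOutcomesSketch {

    // Hypothetical stand-in for an instance; it only carries outcome labels.
    static final class OutcomeHolder {
        final List<String> outcomes;

        OutcomeHolder(String... outcomes) {
            this.outcomes = Arrays.asList(outcomes);
        }
    }

    // Counts the distinct outcome labels across all instances.
    static int countUniqueOutcomes(List<OutcomeHolder> instances) {
        Set<String> unique = new HashSet<>();
        for (OutcomeHolder instance : instances) {
            unique.addAll(instance.outcomes);
        }
        return unique.size();
    }
}
```

Under this assumption, two instances that together carry three different labels yield a count of 3, even if one label occurs on both instances.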

Example 13 with ExternalResourceDescription

Use of org.apache.uima.resource.ExternalResourceDescription in project dkpro-tc by dkpro.

From the class ModelSerializationTask, method copyParameters:

private StringBuilder copyParameters(TcFeature f, StringBuilder sb, File aOutputFolder) throws IOException {
    sb.append(f.getFeatureName()).append("\t");
    ExternalResourceDescription feDesc = f.getActualValue();
    Map<String, Object> parameterSettings = ConfigurationParameterFactory.getParameterSettings(feDesc.getResourceSpecifier());
    List<String> keySet = new ArrayList<>(parameterSettings.keySet());
    for (int i = 0; i < keySet.size(); i++) {
        String key = keySet.get(i);
        String value = parameterSettings.get(key).toString();
        if (valueExistAsFileOrFolderInTheFileSystem(value)) {
            String name = new File(value).getName();
            String destination = aOutputFolder + "/" + name;
            copyToTargetLocation(new File(value), new File(destination));
            sb = record(i, keySet, name, sb);
            continue;
        }
        sb = record(i, keySet, parameterSettings, sb);
    }
    sb.append("\n");
    return sb;
}
Also used: ArrayList (java.util.ArrayList), File (java.io.File), ExternalResourceDescription (org.apache.uima.resource.ExternalResourceDescription)

Example 14 with ExternalResourceDescription

Use of org.apache.uima.resource.ExternalResourceDescription in project dkpro-tc by dkpro.

From the class FeatureResourceLoader, method loadExternalResourceDescriptionOfFeatures:

public List<ExternalResourceDescription> loadExternalResourceDescriptionOfFeatures() throws Exception {
    List<ExternalResourceDescription> erd = new ArrayList<>();
    File file = new File(tcModelLocation, MODEL_FEATURE_EXTRACTOR_CONFIGURATION);
    assertModelFolderExists(file);
    try {
        for (String l : FileUtils.readLines(file, "utf-8")) {
            // Each line is tab-separated: the feature extractor class name followed by its parameters
            String[] split = l.split("\t");
            String name = split[0];
            Object[] parameters = getParameters(split);
            Class<? extends Resource> feClass = urlClassLoader.loadClass(name).asSubclass(Resource.class);
            List<Object> idRemovedParameters = filterId(parameters);
            String id = getId(parameters);
            idRemovedParameters = addModelPathAsPrefixIfParameterIsExistingFile(idRemovedParameters, tcModelLocation.getAbsolutePath());
            TcFeature feature = TcFeatureFactory.create(id, feClass, idRemovedParameters.toArray());
            ExternalResourceDescription exRes = feature.getActualValue();
            // Feature extractors that do not depend on meta collectors need no overrides
            if (!MetaDependent.class.isAssignableFrom(feClass)) {
                erd.add(exRes);
                continue;
            }
            Map<String, String> overrides = loadOverrides(tcModelLocation, META_COLLECTOR_OVERRIDE);
            configureOverrides(tcModelLocation, exRes, overrides);
            overrides = loadOverrides(tcModelLocation, META_EXTRACTOR_OVERRIDE);
            configureOverrides(tcModelLocation, exRes, overrides);
            erd.add(exRes);
        }
    } finally {
        // Release the class loader even if loading or configuring a feature fails
        urlClassLoader.close();
    }
    return erd;
}
Also used: TcFeature (org.dkpro.tc.api.features.TcFeature), ArrayList (java.util.ArrayList), MetaDependent (org.dkpro.tc.api.features.meta.MetaDependent), File (java.io.File), ExternalResourceDescription (org.apache.uima.resource.ExternalResourceDescription)
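
The loader above splits each configuration line on tabs and treats the first field as the class name, with the remaining fields passed on as parameters. The exact layout written by record(...) and read by getParameters(...) is not shown in this excerpt, so the following is only a rough sketch of such a round trip, assuming a flat sequence of tab-separated fields after the name. All names in it (FeatureConfigLineSketch, toLine, nameOf, parametersOf, org.example.MyExtractor) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FeatureConfigLineSketch {

    // Writes the class name followed by each parameter, separated by tabs.
    // Assumption: values contain no tab or newline characters.
    static String toLine(String className, Object... parameters) {
        StringBuilder sb = new StringBuilder(className);
        for (Object parameter : parameters) {
            sb.append('\t').append(parameter);
        }
        return sb.toString();
    }

    // The first tab-separated field is the class name.
    static String nameOf(String line) {
        return line.split("\t")[0];
    }

    // Every field after the first is returned as a parameter, in order.
    static List<String> parametersOf(String line) {
        String[] split = line.split("\t");
        return new ArrayList<>(Arrays.asList(split).subList(1, split.length));
    }

    public static void main(String[] args) {
        String line = toLine("org.example.MyExtractor", "paramA", "1", "paramB", "x");
        System.out.println(nameOf(line));
        System.out.println(parametersOf(line));
    }
}
```

A plain tab-separated line keeps the file easy to inspect by hand, at the cost of not supporting values that themselves contain tabs.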

Example 15 with ExternalResourceDescription

Use of org.apache.uima.resource.ExternalResourceDescription in project dkpro-tc by dkpro.

From the class MetaInfoTask, method getAnalysisEngineDescription:

@Override
public AnalysisEngineDescription getAnalysisEngineDescription(TaskContext aContext) throws ResourceInitializationException, IOException {
    featureExtractorNames = new HashSet<>();
    // check for error conditions
    if (featureExtractors == null) {
        throw new ResourceInitializationException(new TextClassificationException("No feature extractors have been added to the experiment."));
    }
    List<AnalysisEngineDescription> metaCollectors = new ArrayList<>();
    if (recordContext) {
        AnalysisEngineDescription aed = injectContextMetaCollector(aContext);
        if (aed == null) {
            throw new NullPointerException("Initializing a ContextMetaCollector returned an AnalysisEngineDescription which was [NULL]");
        }
        metaCollectors.add(aed);
    }
    try {
        // Configure the meta collectors for each feature extractor individually
        for (TcFeature feClosure : featureExtractors) {
            ExternalResourceDescription feDesc = feClosure.getActualValue();
            Class<?> feClass = getClass(feDesc);
            // Skip feature extractors that are not dependent on meta collectors
            if (!MetaDependent.class.isAssignableFrom(feClass)) {
                continue;
            }
            MetaDependent feInstance = (MetaDependent) feClass.newInstance();
            Map<String, Object> parameterSettings = ConfigurationParameterFactory.getParameterSettings(feDesc.getResourceSpecifier());
            validateUniqueFeatureExtractorNames(parameterSettings);
            // Tell the meta collectors where to store their data
            for (MetaCollectorConfiguration conf : feInstance.getMetaCollectorClasses(parameterSettings)) {
                configureStorageLocations(aContext, conf.descriptor, (String) feClosure.getId(), conf.collectorOverrides, AccessMode.READWRITE);
                metaCollectors.add(conf.descriptor);
            }
        }
    } catch (ClassNotFoundException | InstantiationException | IllegalAccessException e) {
        throw new ResourceInitializationException(e);
    }
    // make sure that the meta key import can be resolved (even when no meta features have been
    // extracted, as in the regression demo)
    aContext.getFolder(META_KEY, AccessMode.READONLY);
    AggregateBuilder builder = new AggregateBuilder();
    for (AnalysisEngineDescription metaCollector : metaCollectors) {
        if (operativeViews != null) {
            for (String viewName : operativeViews) {
                builder.add(metaCollector, CAS.NAME_DEFAULT_SOFA, viewName);
            }
        } else {
            builder.add(metaCollector);
        }
    }
    return builder.createAggregateDescription();
}
Also used: TcFeature (org.dkpro.tc.api.features.TcFeature), TextClassificationException (org.dkpro.tc.api.exception.TextClassificationException), ArrayList (java.util.ArrayList), MetaDependent (org.dkpro.tc.api.features.meta.MetaDependent), AggregateBuilder (org.apache.uima.fit.factory.AggregateBuilder), ResourceInitializationException (org.apache.uima.resource.ResourceInitializationException), AnalysisEngineDescription (org.apache.uima.analysis_engine.AnalysisEngineDescription), MetaCollectorConfiguration (org.dkpro.tc.api.features.meta.MetaCollectorConfiguration), ExternalResourceDescription (org.apache.uima.resource.ExternalResourceDescription)
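
Examples 14 and 15 both use Class.isAssignableFrom to decide whether a feature extractor class implements MetaDependent before doing any meta-collector work. The check itself can be illustrated in isolation with a small sketch. The NeedsMeta interface and the two extractor classes are hypothetical placeholders and do not exist in dkpro-tc.

```java
import java.util.ArrayList;
import java.util.List;

final class MarkerFilterSketch {

    // Hypothetical marker interface, playing the role of MetaDependent.
    interface NeedsMeta {
    }

    // Hypothetical extractor without the marker.
    static final class PlainExtractor {
    }

    // Hypothetical extractor carrying the marker.
    static final class MetaExtractor implements NeedsMeta {
    }

    // Keeps only the classes that implement the marker interface.
    static List<Class<?>> metaDependentOnly(List<Class<?>> candidates) {
        List<Class<?>> result = new ArrayList<>();
        for (Class<?> candidate : candidates) {
            // True when candidate is NeedsMeta itself or a subtype of it
            if (NeedsMeta.class.isAssignableFrom(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

Because the test runs on the Class object, no instance has to be created for classes that are skipped.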

Aggregations

ExternalResourceDescription (org.apache.uima.resource.ExternalResourceDescription): 27
ArrayList (java.util.ArrayList): 17
File (java.io.File): 10
AnalysisEngineDescription (org.apache.uima.analysis_engine.AnalysisEngineDescription): 10
JsonDataWriter (org.dkpro.tc.core.io.JsonDataWriter): 8
CollectionReaderDescription (org.apache.uima.collection.CollectionReaderDescription): 7
Instance (org.dkpro.tc.api.features.Instance): 5
Test (org.junit.Test): 5
Gson (com.google.gson.Gson): 4
MetaDependent (org.dkpro.tc.api.features.meta.MetaDependent): 4
CustomResourceSpecifier (org.apache.uima.resource.CustomResourceSpecifier): 3
TcFeature (org.dkpro.tc.api.features.TcFeature): 3
MetaCollectorConfiguration (org.dkpro.tc.api.features.meta.MetaCollectorConfiguration): 3
HashMap (java.util.HashMap): 2
UimaContextAdmin (org.apache.uima.UimaContextAdmin): 2
RootUimaContext_impl (org.apache.uima.impl.RootUimaContext_impl): 2
ResourceInitializationException (org.apache.uima.resource.ResourceInitializationException): 2
ResourceManager (org.apache.uima.resource.ResourceManager): 2
ResourceManager_impl (org.apache.uima.resource.impl.ResourceManager_impl): 2
ResourceManagerConfiguration (org.apache.uima.resource.metadata.ResourceManagerConfiguration): 2