Example 1 with TypeSystemDescription

Use of org.apache.uima.resource.metadata.TypeSystemDescription in project dkpro-lab by dkpro.

The class SimpleExecutionEngineTest, method testInit.

@Test
public void testInit() throws Exception {
    File repo = new File("target/repository");
    FileUtils.deleteDirectory(repo);
    ((FileSystemStorageService) storageService).setStorageRoot(repo);
    assertNotNull(executionService);
    assertNotNull(contextFactory);
    TypeSystemDescription tsd = createTypeSystemDescription(new String[0]);
    AnalysisEngineDescription desc = createEngineDescription(DummyAE.class, tsd);
    DefaultUimaTask cfg = new DefaultUimaTask();
    cfg.setReaderDescription(createReaderDescription(TestReader.class, tsd));
    cfg.setAnalysisEngineDescription(desc);
    TaskExecutionEngine runner = executionService.createEngine(cfg);
    String uuid = runner.run(cfg);
    System.out.println("=== Experiments in repository ===");
    List<TaskContextMetadata> experiments = storageService.getContexts();
    for (TaskContextMetadata e : experiments) {
        System.out.println(e);
    }
    final StringBuilder sb = new StringBuilder();
    storageService.retrieveBinary(uuid, "test", new StreamReader() {

        @Override
        public void read(InputStream aInputStream) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            Util.shoveAndClose(aInputStream, bos);
            sb.append(new String(bos.toByteArray(), "UTF-8"));
        }
    });
    assertEquals("works", sb.toString());
}
Also used : TypeSystemDescription(org.apache.uima.resource.metadata.TypeSystemDescription) TypeSystemDescriptionFactory.createTypeSystemDescription(org.apache.uima.fit.factory.TypeSystemDescriptionFactory.createTypeSystemDescription) ByteArrayInputStream(java.io.ByteArrayInputStream) InputStream(java.io.InputStream) TaskExecutionEngine(org.dkpro.lab.engine.TaskExecutionEngine) IOException(java.io.IOException) ByteArrayOutputStream(java.io.ByteArrayOutputStream) DefaultUimaTask(org.dkpro.lab.uima.task.impl.DefaultUimaTask) TaskContextMetadata(org.dkpro.lab.task.TaskContextMetadata) StreamReader(org.dkpro.lab.storage.StreamReader) AnalysisEngineDescription(org.apache.uima.analysis_engine.AnalysisEngineDescription) File(java.io.File) FileSystemStorageService(org.dkpro.lab.storage.filesystem.FileSystemStorageService) Test(org.junit.Test)
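
The callback at the end of Example 1 drains an InputStream into a buffer and decodes it as UTF-8. The following is a minimal JDK-only sketch of that pattern, not the dkpro-lab implementation: StreamCallback and retrieve are hypothetical stand-ins for StreamReader and storageService.retrieveBinary, and the manual copy loop stands in for Util.shoveAndClose.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamCallbackSketch {

    /** Hypothetical stand-in for a callback interface such as StreamReader. */
    public interface StreamCallback {
        void read(InputStream in) throws IOException;
    }

    /**
     * Hypothetical stand-in for a storage lookup: opens a stream over the
     * stored bytes, hands it to the callback, and closes it afterwards.
     */
    public static void retrieve(byte[] stored, StreamCallback callback) throws IOException {
        try (InputStream in = new ByteArrayInputStream(stored)) {
            callback.read(in);
        }
    }

    /** Copies the whole stream into memory and decodes the bytes as UTF-8. */
    public static String readUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            bos.write(buffer, 0, n);
        }
        return new String(bos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Collect the decoded content outside the callback, as the test does.
        StringBuilder sb = new StringBuilder();
        retrieve("works".getBytes(StandardCharsets.UTF_8), in -> sb.append(readUtf8(in)));
        System.out.println(sb);
    }
}
```

Passing a Charset constant instead of the string "UTF-8" avoids the checked UnsupportedEncodingException; the decoded result is the same.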

Example 2 with TypeSystemDescription

Use of org.apache.uima.resource.metadata.TypeSystemDescription in project webanno by webanno.

The class ImportExportServiceImpl, method importCasFromFile.

@Override
@SuppressWarnings({ "rawtypes", "unchecked" })
public JCas importCasFromFile(File aFile, Project aProject, String aFormat) throws UIMAException, IOException {
    Class readerClass = getReadableFormats().get(aFormat);
    if (readerClass == null) {
        throw new IOException("No reader available for format [" + aFormat + "]");
    }
    // Prepare a CAS with the project type system
    TypeSystemDescription builtInTypes = TypeSystemDescriptionFactory.createTypeSystemDescription();
    TypeSystemDescription projectTypes = annotationService.getProjectTypes(aProject);
    TypeSystemDescription allTypes = CasCreationUtils.mergeTypeSystems(asList(projectTypes, builtInTypes));
    CAS cas = JCasFactory.createJCas(allTypes).getCas();
    // Convert the source document to CAS
    CollectionReader reader = CollectionReaderFactory.createReader(readerClass, ResourceCollectionReaderBase.PARAM_SOURCE_LOCATION, aFile.getParentFile().getAbsolutePath(), ResourceCollectionReaderBase.PARAM_PATTERNS, new String[] { "[+]" + aFile.getName() });
    if (!reader.hasNext()) {
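        // Report the directory that was actually searched (the reader's source location)
        throw new FileNotFoundException("Source file [" + aFile.getName() + "] not found in [" + aFile.getParentFile().getAbsolutePath() + "]");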
    }
    reader.getNext(cas);
    JCas jCas = cas.getJCas();
    // Create sentence / token annotations if they are missing
    boolean hasTokens = JCasUtil.exists(jCas, Token.class);
    boolean hasSentences = JCasUtil.exists(jCas, Sentence.class);
    if (!hasSentences) {
        splitSentences(jCas);
    }
    if (!hasTokens) {
        tokenize(jCas);
    }
    if (!JCasUtil.exists(jCas, Token.class) || !JCasUtil.exists(jCas, Sentence.class)) {
        throw new IOException("The document appears to be empty. Unable to detect any " + "tokens or sentences. Empty documents cannot be imported.");
    }
    return jCas;
}
Also used : TypeSystemDescription(org.apache.uima.resource.metadata.TypeSystemDescription) CollectionReader(org.apache.uima.collection.CollectionReader) CAS(org.apache.uima.cas.CAS) FileNotFoundException(java.io.FileNotFoundException) JCas(org.apache.uima.jcas.JCas) IOException(java.io.IOException)

Example 3 with TypeSystemDescription

Use of org.apache.uima.resource.metadata.TypeSystemDescription in project webanno by webanno.

The class ConstraintsGeneratorTest, method makeJCasOneSentence.

private JCas makeJCasOneSentence() throws UIMAException {
    TypeSystemDescription global = TypeSystemDescriptionFactory.createTypeSystemDescription();
    TypeSystemDescription local = TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath("src/test/resources/desc/types/webannoTestTypes.xml");
    TypeSystemDescription merged = CasCreationUtils.mergeTypeSystems(asList(global, local));
    JCas jcas = JCasFactory.createJCas(merged);
    DocumentMetaData.create(jcas).setDocumentId("doc");
    TokenBuilder<Token, Sentence> tb = new TokenBuilder<>(Token.class, Sentence.class);
    tb.buildTokens(jcas, "This is a test .");
    return jcas;
}
Also used : TokenBuilder(org.apache.uima.fit.testing.factory.TokenBuilder) TypeSystemDescription(org.apache.uima.resource.metadata.TypeSystemDescription) JCas(org.apache.uima.jcas.JCas) Token(de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token) Sentence(de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence)

Example 4 with TypeSystemDescription

Use of org.apache.uima.resource.metadata.TypeSystemDescription in project webanno by webanno.

The class DiffUtils, method readWebAnnoTSV.

public static JCas readWebAnnoTSV(String aPath, TypeSystemDescription aType) throws UIMAException, IOException {
    CollectionReader reader = createReader(WebannoTsv2Reader.class, WebannoTsv2Reader.PARAM_SOURCE_LOCATION, "src/test/resources/" + aPath);
    JCas jcas;
    if (aType != null) {
        TypeSystemDescription builtInTypes = TypeSystemDescriptionFactory.createTypeSystemDescription();
        List<TypeSystemDescription> allTypes = new ArrayList<>();
        allTypes.add(builtInTypes);
        allTypes.add(aType);
        jcas = JCasFactory.createJCas(CasCreationUtils.mergeTypeSystems(allTypes));
    } else {
        jcas = JCasFactory.createJCas();
    }
    reader.getNext(jcas.getCas());
    return jcas;
}
Also used : CollectionReader(org.apache.uima.collection.CollectionReader) TypeSystemDescription(org.apache.uima.resource.metadata.TypeSystemDescription) ArrayList(java.util.ArrayList) JCas(org.apache.uima.jcas.JCas)

Example 5 with TypeSystemDescription

Use of org.apache.uima.resource.metadata.TypeSystemDescription in project webanno by webanno.

The class ComplexTypeTest, method testCountryType.

@Test
public void testCountryType() throws Exception {
    TypeSystemDescription tsd = TypeSystemDescriptionFactory.createTypeSystemDescription("desc.types.TestTypeSystemDescriptor");
    CAS cas = CasCreationUtils.createCas(tsd, null, null);
    cas.setDocumentText("Asia is the largest continent on Earth. Asia is subdivided into 48 countries, two of them (Russia and Turkey) having part of their land in Europe. The most active place on Earth for tropical cyclone activity lies northeast of the Philippines and south of Japan. The Gobi Desert is in Mongolia and the Arabian Desert stretches across much of the Middle East. The Yangtze River in China is the longest river in the continent. The Himalayas between Nepal and China is the tallest mountain range in the world. Tropical rainforests stretch across much of southern Asia and coniferous and deciduous forests lie farther north.");
    TypeSystem ts = cas.getTypeSystem();
    Type continentType = ts.getType("de.Continent");
    Feature continentName = continentType.getFeatureByBaseName("name");
    AnnotationFS asiaContinent = cas.createAnnotation(continentType, 0, 4);
    asiaContinent.setStringValue(continentName, "Asia");
    cas.addFsToIndexes(asiaContinent);
    Type countryType = ts.getType("de.Country");
    Feature countryName = countryType.getFeatureByBaseName("name");
    AnnotationFS russia = cas.createAnnotation(countryType, 56, 62);
    russia.setStringValue(countryName, "Russian Federation");
    Feature continentFeature = countryType.getFeatureByBaseName("continent");
    russia.setFeatureValue(continentFeature, asiaContinent);
    cas.addFsToIndexes(russia);
    ConstraintsGrammar parser = new ConstraintsGrammar(new FileInputStream("src/test/resources/rules/region.rules"));
    Parse p = parser.Parse();
    ParsedConstraints constraints = p.accept(new ParserVisitor());
    Evaluator constraintsEvaluator = new ValuesGenerator();
    List<PossibleValue> possibleValues = constraintsEvaluator.generatePossibleValues(russia, "regionType", constraints);
    List<PossibleValue> exValues = new LinkedList<>();
    exValues.add(new PossibleValue("cold", true));
    // JUnit convention: expected value first, then the actual value
    assertEquals(exValues, possibleValues);
}
Also used : TypeSystem(org.apache.uima.cas.TypeSystem) TypeSystemDescription(org.apache.uima.resource.metadata.TypeSystemDescription) Parse(de.tudarmstadt.ukp.clarin.webanno.constraints.grammar.syntaxtree.Parse) ParserVisitor(de.tudarmstadt.ukp.clarin.webanno.constraints.visitor.ParserVisitor) ParsedConstraints(de.tudarmstadt.ukp.clarin.webanno.constraints.model.ParsedConstraints) ValuesGenerator(de.tudarmstadt.ukp.clarin.webanno.constraints.evaluator.ValuesGenerator) Evaluator(de.tudarmstadt.ukp.clarin.webanno.constraints.evaluator.Evaluator) Feature(org.apache.uima.cas.Feature) FileInputStream(java.io.FileInputStream) LinkedList(java.util.LinkedList) AnnotationFS(org.apache.uima.cas.text.AnnotationFS) Type(org.apache.uima.cas.Type) CAS(org.apache.uima.cas.CAS) PossibleValue(de.tudarmstadt.ukp.clarin.webanno.constraints.evaluator.PossibleValue) ConstraintsGrammar(de.tudarmstadt.ukp.clarin.webanno.constraints.grammar.ConstraintsGrammar) Test(org.junit.Test)

Aggregations

TypeSystemDescription (org.apache.uima.resource.metadata.TypeSystemDescription): 34
Test (org.junit.Test): 23
JCas (org.apache.uima.jcas.JCas): 13
ArrayList (java.util.ArrayList): 11
TypeSystemDescriptionFactory.createTypeSystemDescription (org.apache.uima.fit.factory.TypeSystemDescriptionFactory.createTypeSystemDescription): 10
SoftAssertions (org.assertj.core.api.SoftAssertions): 9
CAS (org.apache.uima.cas.CAS): 8
Type (org.apache.uima.cas.Type): 7
Token (de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token): 6
AnnotationFS (org.apache.uima.cas.text.AnnotationFS): 6
DiffResult (de.tudarmstadt.ukp.clarin.webanno.curation.casdiff.CasDiff2.DiffResult): 5
AnnotationLayer (de.tudarmstadt.ukp.clarin.webanno.model.AnnotationLayer): 5
Arrays.asList (java.util.Arrays.asList): 5
List (java.util.List): 5
SpanDiffAdapter (de.tudarmstadt.ukp.clarin.webanno.curation.casdiff.CasDiff2.SpanDiffAdapter): 4
AnnotationFeature (de.tudarmstadt.ukp.clarin.webanno.model.AnnotationFeature): 4
Sentence (de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence): 4
TypeSystem (org.apache.uima.cas.TypeSystem): 4
TypeDescription (org.apache.uima.resource.metadata.TypeDescription): 4
ArcDiffAdapter (de.tudarmstadt.ukp.clarin.webanno.curation.casdiff.CasDiff2.ArcDiffAdapter): 3