
Example 6 with TsvDocument

Use of de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvDocument in project webanno by webanno.

Class WebannoTsv3XWriter, method process.

@Override
public void process(JCas aJCas) throws AnalysisEngineProcessException {
    TsvSchema schema = Tsv3XCasSchemaAnalyzer.analyze(aJCas.getTypeSystem());
    TsvDocument doc = Tsv3XCasDocumentBuilder.of(schema, aJCas);
    try (PrintWriter docOS = new PrintWriter(
            new OutputStreamWriter(getOutputStream(aJCas, filenameSuffix), encoding))) {
        new Tsv3XSerializer().write(docOS, doc);
    } catch (IOException e) {
        throw new AnalysisEngineProcessException(e);
    }
}
Also used: TsvDocument (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvDocument), TsvSchema (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvSchema), OutputStreamWriter (java.io.OutputStreamWriter), Tsv3XSerializer (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.Tsv3XSerializer), IOException (java.io.IOException), AnalysisEngineProcessException (org.apache.uima.analysis_engine.AnalysisEngineProcessException), PrintWriter (java.io.PrintWriter)
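
The writer above wraps its output stream in an OutputStreamWriter with an explicit encoding and relies on try-with-resources to flush and close it. The following is a minimal standalone sketch of that pattern using only the JDK. The class name EncodedWriterSketch and the helper writeEncoded are hypothetical and not part of WebAnno; a ByteArrayOutputStream stands in for the stream that getOutputStream would return.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

public class EncodedWriterSketch {

    // Hypothetical helper: writes the content using the named encoding and
    // returns the resulting bytes. An unknown encoding name surfaces as an
    // UnsupportedEncodingException, which is a subclass of IOException.
    static byte[] writeEncoded(String content, String encoding) throws IOException {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        // try-with-resources closes the PrintWriter, which flushes the
        // OutputStreamWriter and the underlying stream.
        try (PrintWriter out = new PrintWriter(new OutputStreamWriter(target, encoding))) {
            out.print(content);
        }
        return target.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // The same character occupies a different number of bytes per encoding.
        System.out.println(writeEncoded("\u00e9", "UTF-8").length);
        System.out.println(writeEncoded("\u00e9", "ISO-8859-1").length);
    }
}
```

Passing the encoding explicitly keeps the output independent of the platform default charset, which is presumably why the writer above takes it as a parameter.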

Example 7 with TsvDocument

Use of de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvDocument in project webanno by webanno.

Class Tsv3XSerializerTest, method testRelation.

@Test
public void testRelation() throws Exception {
    // Create test document
    JCas cas = makeJCasOneSentence("This is a test .");
    List<Token> tokens = new ArrayList<>(select(cas, Token.class));
    Dependency dep = new Dependency(cas);
    dep.setGovernor(tokens.get(0));
    dep.setDependent(tokens.get(1));
    dep.setDependencyType("dep");
    dep.setBegin(dep.getDependent().getBegin());
    dep.setEnd(dep.getDependent().getEnd());
    dep.addToIndexes();
    // Set up TSV schema
    TsvSchema schema = new TsvSchema();
    Type dependencyType = cas.getCasType(Dependency.type);
    schema.addColumn(new TsvColumn(dependencyType, LayerType.RELATION, "DependencyType", FeatureType.PRIMITIVE));
    schema.addColumn(new TsvColumn(dependencyType, LayerType.RELATION, "Governor", FeatureType.RELATION_REF));
    // Convert test document content to TSV model
    TsvDocument doc = Tsv3XCasDocumentBuilder.of(schema, cas);
    doc.getSentences().get(0).getTokens().get(1).addUimaAnnotation(dep, false);
    assertEquals(
            join(asList("1-1\t0-4\tThis\t_\t_\t", "1-2\t5-7\tis\tdep\t1-1\t"), "\n"),
            join(asList(doc.getToken(0, 0), doc.getToken(0, 1)), "\n"));
    String expectedSentence = "#Text=This is a test .\n"
            + "1-1\t0-4\tThis\t_\t_\t\n"
            + "1-2\t5-7\tis\tdep\t1-1\t\n"
            + "1-3\t8-9\ta\t_\t_\t\n"
            + "1-4\t10-14\ttest\t_\t_\t\n"
            + "1-5\t15-16\t.\t_\t_\t\n";
    assertEquals(expectedSentence, doc.getSentences().get(0).toString());
}
Also used: LayerType (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.LayerType), Type (org.apache.uima.cas.Type), FeatureType (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.FeatureType), TsvColumn (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvColumn), ArrayList (java.util.ArrayList), TsvDocument (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvDocument), JCas (org.apache.uima.jcas.JCas), TsvSchema (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvSchema), Token (de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token), Dependency (de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency), Test (org.junit.Test)

Example 8 with TsvDocument

Use of de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvDocument in project webanno by webanno.

Class Tsv3XSerializer, method write.

public void write(PrintWriter aOut, TsvUnit aUnit) {
    TsvDocument doc = aUnit.getDocument();
    // Write unit ID
    aOut.print(aUnit.getId());
    aOut.print(FIELD_SEPARATOR);
    // Write unit offset
    aOut.printf("%d-%d", aUnit.getBegin(), aUnit.getEnd());
    aOut.print(FIELD_SEPARATOR);
    // Write unit text
    aOut.print(doc.getJCas().getDocumentText().substring(aUnit.getBegin(), aUnit.getEnd()));
    aOut.print(FIELD_SEPARATOR);
    // Write the remaining columns according to the schema definition
    for (TsvColumn col : doc.getSchema().getHeaderColumns(doc.getActiveColumns())) {
        // Write all the values in this column - there could be multiple due to stacking
        writeValues(aOut, aUnit, col);
        aOut.print(FIELD_SEPARATOR);
    }
}
Also used: TsvColumn (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvColumn), TsvDocument (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvDocument)
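
The write method above emits one line per unit: the unit ID, the begin-end offsets, the covered text, and then one value per schema column, each field followed by a separator. The sketch below reproduces only that layout with plain JDK types so the resulting line can be inspected. The class TsvLineSketch and its formatUnit helper are hypothetical, the separator is assumed to be a tab (as the expected strings in Example 7 suggest), and pre-computed column values stand in for what writeValues would produce.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

public class TsvLineSketch {

    // Assumption: the field separator is a tab, as in the Example 7 test data.
    static final String FIELD_SEPARATOR = "\t";

    // Hypothetical helper mirroring the field order of the write method:
    // ID, offsets, covered text, then every column value, each followed by
    // a separator (so the line ends with a trailing tab).
    static String formatUnit(String id, int begin, int end, String documentText,
            List<String> columnValues) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        out.print(id);
        out.print(FIELD_SEPARATOR);
        out.printf("%d-%d", begin, end);
        out.print(FIELD_SEPARATOR);
        out.print(documentText.substring(begin, end));
        out.print(FIELD_SEPARATOR);
        for (String value : columnValues) {
            out.print(value);
            out.print(FIELD_SEPARATOR);
        }
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        String text = "This is a test .";
        // Second token of the test sentence, with two relation columns filled.
        System.out.println(formatUnit("1-2", 5, 7, text, Arrays.asList("dep", "1-1")));
    }
}
```

For the second token of "This is a test ." this yields the same line that testRelation in Example 7 expects, including the trailing separator after the last column.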

Aggregations

TsvDocument (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvDocument) 8
TsvColumn (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvColumn) 7
TsvSchema (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvSchema) 6
LayerType (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.LayerType) 5
Type (org.apache.uima.cas.Type) 5
FeatureType (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.FeatureType) 4
JCas (org.apache.uima.jcas.JCas) 4
Test (org.junit.Test) 4
TsvFormatHeader (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvFormatHeader) 2
Token (de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token) 2
ArrayList (java.util.ArrayList) 2
FeatureStructure (org.apache.uima.cas.FeatureStructure) 2
AnnotationFS (org.apache.uima.cas.text.AnnotationFS) 2
Tsv3XSerializer (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.Tsv3XSerializer) 1
TsvChain (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvChain) 1
TsvSentence (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvSentence) 1
TsvSubToken (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvSubToken) 1
TsvToken (de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.model.TsvToken) 1
NamedEntity (de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity) 1
Sentence (de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence) 1