
Example 1 with WLData

Use of eu.clarin.weblicht.wlfxb.xb.WLData in project webanno by webanno.

Source: class TcfReader, method getNext.

@Override
public void getNext(JCas aJCas) throws IOException, CollectionException {
    Resource res = nextFile();
    initCas(aJCas, res);
    InputStream is = null;
    try {
        is = new BufferedInputStream(res.getInputStream());
        WLData wLData = WLDObjector.read(is);
        TextCorpus aCorpusData = wLData.getTextCorpus();
        convertToCas(aJCas, aCorpusData);
    } catch (WLFormatException e) {
        throw new CollectionException(e);
    } finally {
        closeQuietly(is);
    }
}
Also used: BufferedInputStream (java.io.BufferedInputStream), InputStream (java.io.InputStream), CollectionException (org.apache.uima.collection.CollectionException), TextCorpus (eu.clarin.weblicht.wlfxb.tc.api.TextCorpus), WLData (eu.clarin.weblicht.wlfxb.xb.WLData), WLFormatException (eu.clarin.weblicht.wlfxb.io.WLFormatException)
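getNext follows a common read pattern: buffer the resource stream, parse it, translate the format-specific WLFormatException into UIMA's CollectionException, and close the stream whatever happens. The following is a minimal, stdlib-only sketch of that pattern using try-with-resources instead of the explicit finally block with closeQuietly. It is not webanno code: the JDK DOM parser stands in for WLDObjector.read, and ReadPatternSketch, readRootName, and DocumentReadException (a stand-in for CollectionException) are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class ReadPatternSketch {

    /** Hypothetical stand-in for org.apache.uima.collection.CollectionException. */
    static class DocumentReadException extends Exception {
        DocumentReadException(Throwable cause) {
            super(cause);
        }
    }

    /**
     * Parses an XML document from the stream and returns the root element name.
     * try-with-resources closes the stream whether or not parsing succeeds.
     */
    static String readRootName(InputStream source) throws IOException, DocumentReadException {
        try (InputStream is = new BufferedInputStream(source)) {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(is);
            return doc.getDocumentElement().getNodeName();
        } catch (SAXException | ParserConfigurationException e) {
            // Translate format-level errors into the caller's exception type,
            // as getNext does with WLFormatException -> CollectionException.
            // Plain I/O errors (IOException) propagate unchanged.
            throw new DocumentReadException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<D-Spin><TextCorpus/></D-Spin>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readRootName(new ByteArrayInputStream(xml))); // prints D-Spin
    }
}
```

One behavioral difference from the original: closeQuietly swallows errors raised while closing, whereas try-with-resources reports them, either as the thrown IOException or, if parsing already failed, as a suppressed exception.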

Example 2 with WLData

Use of eu.clarin.weblicht.wlfxb.xb.WLData in project webanno by webanno.

Source: class TcfWriter, method casToTcfWriter.

/**
 * Creates a TCF file from scratch.
 *
 * @param aJCas
 *            the JCas.
 * @param aOs
 *            the output stream.
 * @throws WLFormatException
 *             if a TCF problem occurs.
 */
public void casToTcfWriter(JCas aJCas, OutputStream aOs) throws WLFormatException {
    // create the TextCorpus object, taking its language from the aJCas object
    TextCorpusStored textCorpus = new TextCorpusStored(aJCas.getDocumentLanguage());
    // create the text annotation layer and add the document text to it
    textCorpus.createTextLayer().addText(aJCas.getDocumentText());
    write(aJCas, textCorpus);
    // write the annotated data object into the output stream
    WLData wldata = new WLData(textCorpus);
    WLDObjector.write(wldata, aOs);
}
Also used: TextCorpusStored (eu.clarin.weblicht.wlfxb.tc.xb.TextCorpusStored), WLData (eu.clarin.weblicht.wlfxb.xb.WLData)
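casToTcfWriter builds the document in a fixed order: a corpus object carrying the document language, a text layer holding the document text, the annotation layers added by write(aJCas, textCorpus), and finally serialization through WLDObjector.write. To show roughly what the first two steps amount to, here is a simplified, stdlib-only sketch that emits a TCF-shaped XML document containing only a text layer. The element names, the version attribute, and the namespace URIs are assumptions based on our reading of TCF 0.4, not output captured from WLDObjector; real code should keep using wlfxb, which handles the full format including annotation layers. TcfShapeSketch and writeTextOnly are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TcfShapeSketch {

    // Assumed TCF 0.4 namespace URIs; check them against the TCF specification.
    static final String NS_DATA = "http://www.dspin.de/data";
    static final String NS_TEXTCORPUS = "http://www.dspin.de/data/textcorpus";

    /** Writes a simplified, TCF-shaped document that holds only a text layer. */
    static void writeTextOnly(String language, String text, OutputStream out) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        Element root = doc.createElementNS(NS_DATA, "D-Spin");
        root.setAttribute("version", "0.4");
        doc.appendChild(root);

        // The corpus element records the document language,
        // analogous to new TextCorpusStored(aJCas.getDocumentLanguage())
        Element corpus = doc.createElementNS(NS_TEXTCORPUS, "TextCorpus");
        corpus.setAttribute("lang", language);
        root.appendChild(corpus);

        // The text layer holds the raw document text,
        // analogous to createTextLayer().addText(aJCas.getDocumentText())
        Element textLayer = doc.createElementNS(NS_TEXTCORPUS, "text");
        textLayer.setTextContent(text);
        corpus.appendChild(textLayer);

        // Serialize; the transformer escapes markup characters such as '&' and '<'
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.transform(new DOMSource(doc), new StreamResult(out));
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeTextOnly("de", "Karin fliegt nach New York.", out);
        System.out.println(out.toString(StandardCharsets.UTF_8.name()));
    }
}
```

Like casToTcfWriter, the sketch writes to a caller-supplied stream and does not close it, leaving stream ownership with the caller.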

Example 3 with WLData

Use of eu.clarin.weblicht.wlfxb.xb.WLData in project webanno by webanno.

Source: class TcfReaderWriterTest, method testOneWay.

public void testOneWay(String aInputFile, String aExpectedFile) throws Exception {
    CollectionReaderDescription reader = createReaderDescription(TcfReader.class,
            TcfReader.PARAM_SOURCE_LOCATION, "src/test/resources/",
            TcfReader.PARAM_PATTERNS, aInputFile);
    AnalysisEngineDescription writer = createEngineDescription(TcfWriter.class,
            TcfWriter.PARAM_TARGET_LOCATION, "target/test-output/oneway",
            TcfWriter.PARAM_FILENAME_SUFFIX, ".xml",
            TcfWriter.PARAM_STRIP_EXTENSION, true);
    AnalysisEngineDescription dumper = createEngineDescription(CasDumpWriter.class,
            CasDumpWriter.PARAM_OUTPUT_FILE, "target/test-output/oneway/dump.txt");
    runPipeline(reader, writer, dumper);
    WLData wLDataReference;
    WLData wLDataActual;
    // read both documents; try-with-resources closes the streams even if parsing fails
    try (InputStream isReference = new FileInputStream(new File("src/test/resources/" + aExpectedFile));
            InputStream isActual = new FileInputStream(new File("target/test-output/oneway/" + aInputFile))) {
        wLDataReference = WLDObjector.read(isReference);
        wLDataActual = WLDObjector.read(isActual);
    }
    TextCorpusStored aCorpusDataReference = wLDataReference.getTextCorpus();
    TextCorpusStored aCorpusDataActual = wLDataActual.getTextCorpus();
    // check that all layers were preserved
    assertEquals(aCorpusDataReference.getLayers().size(), aCorpusDataActual.getLayers().size());
    // check that every layer has the same number of annotations
    for (TextCorpusLayer layer : aCorpusDataReference.getLayers()) {
        assertEquals("Layer size mismatch in [" + layer.getClass().getName() + "]", layer.size(), getLayer(aCorpusDataActual, layer.getClass()).size());
    }
    XMLAssert.assertXMLEqual(new InputSource("src/test/resources/" + aExpectedFile), new InputSource(new File("target/test-output/oneway/" + aInputFile).getPath()));
}
Also used: CollectionReaderDescription (org.apache.uima.collection.CollectionReaderDescription), TextCorpusLayer (eu.clarin.weblicht.wlfxb.tc.api.TextCorpusLayer), InputSource (org.xml.sax.InputSource), FileInputStream (java.io.FileInputStream), InputStream (java.io.InputStream), TextCorpusStored (eu.clarin.weblicht.wlfxb.tc.xb.TextCorpusStored), AnalysisEngineDescription (org.apache.uima.analysis_engine.AnalysisEngineDescription), WLData (eu.clarin.weblicht.wlfxb.xb.WLData), File (java.io.File)
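The test checks round-trip fidelity in two steps: the reference and actual documents must have the same number of layers, and each reference layer must have the same number of annotations in the actual document. Below is a stdlib-only sketch of the same comparison, assuming each document has already been reduced to a map from a layer name to its annotation count. The maps, LayerComparisonSketch, and compareLayers are hypothetical; the real test works on TextCorpusLayer objects through its getLayer helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LayerComparisonSketch {

    /**
     * Compares per-layer annotation counts of a reference and an actual document.
     * Returns one message per mismatch; an empty list means the documents agree.
     */
    static List<String> compareLayers(Map<String, Integer> reference, Map<String, Integer> actual) {
        List<String> problems = new ArrayList<>();
        // Mirrors the first assertEquals: were all layers preserved?
        if (reference.size() != actual.size()) {
            problems.add("Layer count mismatch: expected " + reference.size()
                    + " but was " + actual.size());
        }
        // Mirrors the loop: does every reference layer keep its annotation count?
        for (Map.Entry<String, Integer> entry : reference.entrySet()) {
            Integer actualSize = actual.get(entry.getKey());
            if (actualSize == null) {
                problems.add("Missing layer [" + entry.getKey() + "]");
            } else if (!actualSize.equals(entry.getValue())) {
                problems.add("Layer size mismatch in [" + entry.getKey() + "]: expected "
                        + entry.getValue() + " but was " + actualSize);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Integer> reference = new TreeMap<>();
        reference.put("tokens", 5);
        reference.put("sentences", 1);
        Map<String, Integer> actual = new TreeMap<>();
        actual.put("tokens", 4);
        actual.put("sentences", 1);
        // prints [Layer size mismatch in [tokens]: expected 5 but was 4]
        System.out.println(compareLayers(reference, actual));
    }
}
```

Collecting every mismatch instead of stopping at the first failing assertEquals reports all broken layers in one run; the trade-off is that the caller must still assert that the returned list is empty.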

Aggregations

- WLData (eu.clarin.weblicht.wlfxb.xb.WLData): 3
- TextCorpusStored (eu.clarin.weblicht.wlfxb.tc.xb.TextCorpusStored): 2
- InputStream (java.io.InputStream): 2
- WLFormatException (eu.clarin.weblicht.wlfxb.io.WLFormatException): 1
- TextCorpus (eu.clarin.weblicht.wlfxb.tc.api.TextCorpus): 1
- TextCorpusLayer (eu.clarin.weblicht.wlfxb.tc.api.TextCorpusLayer): 1
- BufferedInputStream (java.io.BufferedInputStream): 1
- File (java.io.File): 1
- FileInputStream (java.io.FileInputStream): 1
- AnalysisEngineDescription (org.apache.uima.analysis_engine.AnalysisEngineDescription): 1
- CollectionException (org.apache.uima.collection.CollectionException): 1
- CollectionReaderDescription (org.apache.uima.collection.CollectionReaderDescription): 1
- InputSource (org.xml.sax.InputSource): 1