Example 1 with TokensLayer

Use of eu.clarin.weblicht.wlfxb.tc.api.TokensLayer in project webanno by webanno.

Class TcfWriter, method writeOrthograph.

private void writeOrthograph(JCas aJCas, TextCorpus aTextCorpus) {
    if (!JCasUtil.exists(aJCas, SofaChangeAnnotation.class)) {
        // Do nothing if the CAS contains no SofaChangeAnnotation annotations
        // (the equivalent of the orthography layer in TCF)
        getLogger().debug("Layer [" + TextCorpusLayerTag.ORTHOGRAPHY.getXmlName() + "]: empty");
        return;
    }
    // Tokens layer must already exist
    TokensLayer tokensLayer = aTextCorpus.getTokensLayer();
    // Create the orthography annotation layer
    OrthographyLayer orthographyLayer = aTextCorpus.createOrthographyLayer();
    getLogger().debug("Layer [" + TextCorpusLayerTag.ORTHOGRAPHY.getXmlName() + "]: created");
    // j indexes the TCF tokens layer; this assumes its tokens are in the same order as the CAS tokens
    int j = 0;
    for (Token token : select(aJCas, Token.class)) {
        List<SofaChangeAnnotation> scas = selectCovered(aJCas, SofaChangeAnnotation.class, token.getBegin(), token.getEnd());
        if (!scas.isEmpty() && orthographyLayer != null) {
            orthographyLayer.addCorrection(scas.get(0).getValue(), tokensLayer.getToken(j), CorrectionOperation.valueOf(scas.get(0).getOperation()));
        }
        j++;
    }
}
Also used: SofaChangeAnnotation (de.tudarmstadt.ukp.dkpro.core.api.transform.type.SofaChangeAnnotation), Token (de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token), TokensLayer (eu.clarin.weblicht.wlfxb.tc.api.TokensLayer), OrthographyLayer (eu.clarin.weblicht.wlfxb.tc.api.OrthographyLayer)
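
The call CorrectionOperation.valueOf(...) above will throw if the stored operation string does not exactly match a constant name, assuming CorrectionOperation is an ordinary Java enum. The sketch below shows one way to guard such a lookup. It is not taken from webanno; the class SafeEnumParse and the enum Op are hypothetical stand-ins, and the constant names are illustrative only.

```java
import java.util.Optional;

final class SafeEnumParse {
    /** Hypothetical stand-in enum; not the WLFXB CorrectionOperation. */
    enum Op { INSERT, DELETE, REPLACE }

    /** Returns the matching constant, or empty if the name is null or unknown. */
    static <E extends Enum<E>> Optional<E> parse(Class<E> type, String name) {
        if (name == null) {
            return Optional.empty();
        }
        try {
            // Enum.valueOf requires an exact, case-sensitive match
            return Optional.of(Enum.valueOf(type, name));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(Op.class, "DELETE").isPresent());
        System.out.println(parse(Op.class, "unknown").isPresent());
    }
}
```

A caller could then skip the correction, or log it, when the result is empty instead of letting the exception abort the whole export.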

Example 2 with TokensLayer

Use of eu.clarin.weblicht.wlfxb.tc.api.TokensLayer in project webanno by webanno.

Class TcfWriter, method writeTokens.

private Map<Integer, eu.clarin.weblicht.wlfxb.tc.api.Token> writeTokens(JCas aJCas, TextCorpus aTextCorpus) {
    boolean tokensLayerCreated = false;
    // Create tokens layer if it does not exist
    TokensLayer tokensLayer = aTextCorpus.getTokensLayer();
    if (tokensLayer == null) {
        tokensLayer = aTextCorpus.createTokensLayer();
        tokensLayerCreated = true;
        getLogger().debug("Layer [" + TextCorpusLayerTag.TOKENS.getXmlName() + "]: created");
    } else {
        getLogger().debug("Layer [" + TextCorpusLayerTag.TOKENS.getXmlName() + "]: found");
    }
    Map<Integer, eu.clarin.weblicht.wlfxb.tc.api.Token> tokensBeginPositionMap = new HashMap<>();
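    // j indexes the TCF tokens layer in step with the CAS tokens
    int j = 0;
    for (Token token : select(aJCas, Token.class)) {
        // Only add tokens if this method created the layer; an existing layer is
        // assumed to already hold one token per CAS token, in the same order
        if (tokensLayerCreated) {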
            tokensLayer.addToken(token.getCoveredText());
        }
        tokensBeginPositionMap.put(token.getBegin(), tokensLayer.getToken(j));
        j++;
    }
    return tokensBeginPositionMap;
}
Also used: HashMap (java.util.HashMap), Token (de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token), TokensLayer (eu.clarin.weblicht.wlfxb.tc.api.TokensLayer)
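
The map returned by writeTokens keys each token by its begin offset, so a later step can find a token from an annotation's offsets rather than from a running counter. The following is a minimal sketch of that idea using plain Java only. The class TokenOffsetIndex, the Span type, and the whitespace tokenizer are hypothetical stand-ins, not part of webanno, DKPro Core, or WLFXB.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TokenOffsetIndex {
    /** Hypothetical stand-in for a CAS token: covered text plus character offsets. */
    static final class Span {
        final int begin;
        final int end;
        final String text;

        Span(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    /** Splits on single spaces and records offsets; for illustration only. */
    static List<Span> tokenize(String text) {
        List<Span> spans = new ArrayList<>();
        int begin = 0;
        for (String part : text.split(" ")) {
            if (!part.isEmpty()) {
                spans.add(new Span(begin, begin + part.length(), part));
            }
            begin += part.length() + 1; // skip the separating space
        }
        return spans;
    }

    /** Maps each token's begin offset to its position in the token list. */
    static Map<Integer, Integer> indexByBegin(List<Span> tokens) {
        Map<Integer, Integer> byBegin = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            byBegin.put(tokens.get(i).begin, i);
        }
        return byBegin;
    }

    public static void main(String[] args) {
        List<Span> tokens = tokenize("The cat sat");
        Map<Integer, Integer> byBegin = indexByBegin(tokens);
        // Look up the token that starts at character offset 4
        System.out.println(tokens.get(byBegin.get(4)).text);
    }
}
```

Looking tokens up by offset avoids relying on two collections being iterated in exactly the same order, at the cost of building the map once.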

Example 3 with TokensLayer

Use of eu.clarin.weblicht.wlfxb.tc.api.TokensLayer in project webanno by webanno.

Class TcfWriter, method writeLemmas.

private void writeLemmas(JCas aJCas, TextCorpus aTextCorpus, Map<Integer, eu.clarin.weblicht.wlfxb.tc.api.Token> aTokensBeginPositionMap) {
    if (!JCasUtil.exists(aJCas, Lemma.class)) {
        // Do nothing if there are no lemmas in the CAS
        getLogger().debug("Layer [" + TextCorpusLayerTag.LEMMAS.getXmlName() + "]: empty");
        return;
    }
    // Tokens layer must already exist
    TokensLayer tokensLayer = aTextCorpus.getTokensLayer();
    // Create the lemma annotation layer
    LemmasLayer lemmasLayer = aTextCorpus.createLemmasLayer();
    getLogger().debug("Layer [" + TextCorpusLayerTag.LEMMAS.getXmlName() + "]: created");
    // j indexes the TCF tokens layer; this assumes its tokens are in the same order as the CAS tokens
    int j = 0;
    for (Token coveredToken : select(aJCas, Token.class)) {
        Lemma lemma = coveredToken.getLemma();
        if (lemma != null && lemmasLayer != null) {
            String lemmaValue = lemma.getValue();
            lemmasLayer.addLemma(lemmaValue, tokensLayer.getToken(j));
        }
        j++;
    }
}
Also used: Lemma (de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Lemma), Token (de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token), TokensLayer (eu.clarin.weblicht.wlfxb.tc.api.TokensLayer), LemmasLayer (eu.clarin.weblicht.wlfxb.tc.api.LemmasLayer)

Example 4 with TokensLayer

Use of eu.clarin.weblicht.wlfxb.tc.api.TokensLayer in project webanno by webanno.

Class TcfWriter, method writePosTags.

private void writePosTags(JCas aJCas, TextCorpus aTextCorpus, Map<Integer, eu.clarin.weblicht.wlfxb.tc.api.Token> aTokensBeginPositionMap) {
    if (!JCasUtil.exists(aJCas, POS.class)) {
        // Do nothing if there are no part-of-speech tags in the CAS
        getLogger().debug("Layer [" + TextCorpusLayerTag.POSTAGS.getXmlName() + "]: empty");
        return;
    }
    // Tokens layer must already exist
    TokensLayer tokensLayer = aTextCorpus.getTokensLayer();
    // Determine the POS tagset: use the first tagset declared for the POS layer
    // in the CAS, falling back to STTS if none is declared
    String posTagSet = "STTS";
    for (TagsetDescription tagSet : select(aJCas, TagsetDescription.class)) {
        if (tagSet.getLayer().equals(POS.class.getName())) {
            posTagSet = tagSet.getName();
            break;
        }
    }
    PosTagsLayer posLayer = aTextCorpus.createPosTagsLayer(posTagSet);
    getLogger().debug("Layer [" + TextCorpusLayerTag.POSTAGS.getXmlName() + "]: created");
    // j indexes the TCF tokens layer; this assumes its tokens are in the same order as the CAS tokens
    int j = 0;
    for (Token coveredToken : select(aJCas, Token.class)) {
        POS pos = coveredToken.getPos();
        if (pos != null && posLayer != null) {
            String posValue = pos.getPosValue();
            posLayer.addTag(posValue, tokensLayer.getToken(j));
        }
        j++;
    }
}
Also used: POS (de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS), PosTagsLayer (eu.clarin.weblicht.wlfxb.tc.api.PosTagsLayer), Token (de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token), TokensLayer (eu.clarin.weblicht.wlfxb.tc.api.TokensLayer), TagsetDescription (de.tudarmstadt.ukp.dkpro.core.api.metadata.type.TagsetDescription)
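
The tagset selection in writePosTags follows a "first match, else default" pattern: it takes the first tagset declared for the POS layer and otherwise keeps "STTS". The sketch below isolates that pattern in plain Java. The class TagsetFallback and its Tagset type are hypothetical stand-ins for illustration, not the DKPro Core TagsetDescription type.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class TagsetFallback {
    /** Hypothetical stand-in for a tagset description: the layer it applies to and its name. */
    static final class Tagset {
        final String layer;
        final String name;

        Tagset(String layer, String name) {
            this.layer = layer;
            this.name = name;
        }
    }

    /** Returns the name of the first tagset declared for the layer, or defaultName if none matches. */
    static String resolve(List<Tagset> declared, String layer, String defaultName) {
        return declared.stream()
                .filter(t -> layer.equals(t.layer))
                .map(t -> t.name)
                .findFirst()
                .orElse(defaultName);
    }

    public static void main(String[] args) {
        List<Tagset> declared = Arrays.asList(
                new Tagset("dependency", "stanford"),
                new Tagset("pos", "ptb"));
        System.out.println(resolve(declared, "pos", "STTS"));
        System.out.println(resolve(Collections.<Tagset>emptyList(), "pos", "STTS"));
    }
}
```

Keeping the default as an explicit argument makes the fallback visible at the call site rather than hidden in a hard-coded initial value.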

Aggregations

Token (de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token): 4
TokensLayer (eu.clarin.weblicht.wlfxb.tc.api.TokensLayer): 4
POS (de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS): 1
TagsetDescription (de.tudarmstadt.ukp.dkpro.core.api.metadata.type.TagsetDescription): 1
Lemma (de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Lemma): 1
SofaChangeAnnotation (de.tudarmstadt.ukp.dkpro.core.api.transform.type.SofaChangeAnnotation): 1
LemmasLayer (eu.clarin.weblicht.wlfxb.tc.api.LemmasLayer): 1
OrthographyLayer (eu.clarin.weblicht.wlfxb.tc.api.OrthographyLayer): 1
PosTagsLayer (eu.clarin.weblicht.wlfxb.tc.api.PosTagsLayer): 1
HashMap (java.util.HashMap): 1