
Example 36 with Annotation

Use of com.joliciel.talismane.Annotation in project talismane by joliciel-informatique.

From the class AbstractRegexAnnotator, method annotate.

@Override
public void annotate(Sentence annotatedText, String... labels) {
    List<Annotation<TokenPlaceholder>> placeholders = new ArrayList<>();
    List<Annotation<TokenAttribute<?>>> annotations = new ArrayList<>();
    Matcher matcher = this.getPattern().matcher(annotatedText.getText());
    // Group start of the previous match; a match is annotated only if its group starts later.
    int lastStart = -1;
    while (matcher.find()) {
        int start = matcher.start(groupIndex);
        if (start > lastStart) {
            int end = matcher.end(groupIndex);
            if (LOG.isTraceEnabled()) {
                LOG.trace("Regex: " + this.regex);
                LOG.trace("Next match: " + annotatedText.getText().subSequence(matcher.start(), matcher.end()).toString().replace('\n', '¶').replace('\r', '¶'));
                if (matcher.start() != start || matcher.end() != end) {
                    LOG.trace("But matching group: " + annotatedText.getText().subSequence(start, end).toString().replace('\n', '¶').replace('\r', '¶'));
                }
            }
            if (this.singleToken) {
                String replacement = this.findReplacement(annotatedText.getText(), matcher);
                TokenPlaceholder placeholder = new TokenPlaceholder(replacement, regex);
                Annotation<TokenPlaceholder> placeholderAnnotation = new Annotation<>(start, end, placeholder, labels);
                placeholders.add(placeholderAnnotation);
                if (LOG.isTraceEnabled())
                    LOG.trace("Added placeholder: " + placeholder.toString());
            }
            for (TokenAttribute<?> attribute : attributes.values()) {
                Annotation<TokenAttribute<?>> annotation = new Annotation<>(start, end, attribute, labels);
                annotations.add(annotation);
                if (LOG.isTraceEnabled())
                    LOG.trace("Added attribute: " + attribute.toString());
            }
        }
        lastStart = start;
    }
    annotatedText.addAnnotations(placeholders);
    annotatedText.addAnnotations(annotations);
}
Also used: Matcher (java.util.regex.Matcher), ArrayList (java.util.ArrayList), TokenAttribute (com.joliciel.talismane.tokeniser.TokenAttribute), Annotation (com.joliciel.talismane.Annotation)
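
The matching loop in annotate can be tried in isolation with nothing but java.util.regex. The sketch below is not Talismane code: GroupSpanDemo, Span and findSpans are hypothetical names introduced for illustration, and Span only stands in for Annotation as a labelled [start, end) range. It keeps the same guard as the method above: a match is recorded only when the start of the chosen capturing group lies after the group start of the previous match. Since Matcher.start(int) returns -1 for a group that did not take part in the match, such matches are skipped by the same test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupSpanDemo {

    /** Hypothetical stand-in for Annotation: a labelled [start, end) character range. */
    public static final class Span {
        public final int start;
        public final int end;
        public final String label;

        public Span(int start, int end, String label) {
            this.start = start;
            this.end = end;
            this.label = label;
        }

        @Override
        public String toString() {
            return label + "[" + start + "," + end + ")";
        }
    }

    /**
     * Collects one span per match, using the bounds of the given capturing group
     * rather than the bounds of the whole match.
     */
    public static List<Span> findSpans(Pattern pattern, int groupIndex, CharSequence text, String label) {
        List<Span> spans = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        int lastStart = -1;
        while (matcher.find()) {
            int start = matcher.start(groupIndex);
            // Same guard as in annotate: skip a match whose group does not start
            // after the previous one (this also skips a group start of -1).
            if (start > lastStart) {
                spans.add(new Span(start, matcher.end(groupIndex), label));
            }
            lastStart = start;
        }
        return spans;
    }

    public static void main(String[] args) {
        // Group 1 captures only the digits; the unit is matched but left outside the span.
        Pattern pattern = Pattern.compile("\\b(\\d+)\\s*km\\b");
        List<Span> spans = findSpans(pattern, 1, "Run 10 km, then 5km more.", "NUMBER");
        System.out.println(spans);
    }
}
```

Passing groupIndex 0 would record the bounds of the whole match instead, which corresponds to the case where the trace output in annotate does not print the "But matching group" line.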
