Examples with AnalyzedToken - org.languagetool.AnalyzedToken

Example 81 with AnalyzedToken

Use of org.languagetool.AnalyzedToken in project languagetool by languagetool-org.

Class GermanTaggerTest, method toSortedString.

/**
 * Returns a string representation like {@code toString()}, but with duplicate
 * readings removed and the remaining readings sorted alphabetically.
 */
private String toSortedString(AnalyzedTokenReadings tokenReadings) {
    StringBuilder sb = new StringBuilder(tokenReadings.getToken());
    // a TreeSet ignores duplicates on add() and keeps its elements in natural (alphabetical) order,
    // so no explicit contains() check is needed
    Set<String> elements = new TreeSet<>();
    for (AnalyzedToken reading : tokenReadings) {
        elements.add(reading.toString());
    }
    sb.append('[');
    sb.append(String.join(", ", elements));
    sb.append(']');
    return sb.toString();
}

Also used: AnalyzedToken(org.languagetool.AnalyzedToken)
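toSortedString collects the string form of every reading in a TreeSet, which both drops duplicates and iterates in natural String order, presumably so that test expectations do not depend on the order in which the tagger returns readings. The following self-contained sketch shows the same idea without LanguageTool on the classpath; the class name SortedReadingsDemo and the sample reading strings are hypothetical, and readings are modelled as plain strings rather than AnalyzedToken objects.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-alone demo: readings are plain strings, not AnalyzedToken objects.
class SortedReadingsDemo {

    // Builds "token[r1, r2, ...]" with duplicate readings removed and the rest sorted.
    static String toSortedString(String token, List<String> readings) {
        // TreeSet ignores duplicates on add() and iterates in natural (alphabetical) String order.
        Set<String> elements = new TreeSet<>(readings);
        return token + "[" + String.join(", ", elements) + "]";
    }

    public static void main(String[] args) {
        // Made-up reading strings; the first and third are identical on purpose.
        List<String> readings = List.of("Haus/SUB:NOM:SIN:NEU", "Haus/SUB:DAT:SIN:NEU", "Haus/SUB:NOM:SIN:NEU");
        System.out.println(toSortedString("Haus", readings));
        // prints: Haus[Haus/SUB:DAT:SIN:NEU, Haus/SUB:NOM:SIN:NEU]
    }
}
```

Building the set in one constructor call keeps the method short; a LinkedHashSet would be the choice if insertion order had to be preserved instead of sorted.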

Example 82 with AnalyzedToken

Use of org.languagetool.AnalyzedToken in project languagetool by languagetool-org.

Class GreekTagger, method additionalTags.

@Override
protected List<AnalyzedToken> additionalTags(String word, WordTagger wordTagger) {
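    List<AnalyzedToken> tokens = new ArrayList<>();
    // look up every lemma of the word with tagger (a field of GreekTagger);
    // the wordTagger parameter is not used in this method
    List<Lemma> lemmas = tagger.getLemma(word, false);
    for (Lemma lm : lemmas) {
        // one reading per lemma: same surface word, with that lemma's POS tag and base form
        tokens.add(new AnalyzedToken(word, lm.getTag(), lm.getLemma()));
    }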
    return tokens;
}

Also used: AnalyzedToken(org.languagetool.AnalyzedToken) Lemma(org.ioperm.morphology.el.Lemma) ArrayList(java.util.ArrayList)
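additionalTags turns every Lemma found by the external Greek morphology lookup into one AnalyzedToken reading, keeping the surface word and taking the POS tag and base form from the lemma. A rough, self-contained sketch of that mapping follows; Lemma and Token here are hypothetical stand-ins for org.ioperm.morphology.el.Lemma and AnalyzedToken, the lookup is a plain function instead of the real tagger, and the English word, lemmas, and tags are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical demo of the additionalTags pattern with stand-in types.
class AdditionalTagsDemo {

    // Stand-in for org.ioperm.morphology.el.Lemma: a base form plus its POS tag.
    static final class Lemma {
        final String lemma;
        final String tag;

        Lemma(String lemma, String tag) {
            this.lemma = lemma;
            this.tag = tag;
        }
    }

    // Stand-in for AnalyzedToken(token, posTag, lemma).
    static final class Token {
        final String token;
        final String posTag;
        final String lemma;

        Token(String token, String posTag, String lemma) {
            this.token = token;
            this.posTag = posTag;
            this.lemma = lemma;
        }

        @Override
        public String toString() {
            return token + "/" + posTag + "/" + lemma;
        }
    }

    // One reading per lemma returned by the lookup; the surface word is the same in every reading.
    static List<Token> additionalTags(String word, Function<String, List<Lemma>> lookup) {
        List<Token> tokens = new ArrayList<>();
        for (Lemma lm : lookup.apply(word)) {
            tokens.add(new Token(word, lm.tag, lm.lemma));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Made-up, ambiguous lexicon entry; real Greek lemmas and tags look different.
        Map<String, List<Lemma>> lexicon = Map.of(
                "leaves", List.of(new Lemma("leaf", "NOUN_PL"), new Lemma("leave", "VERB_3SG")));
        System.out.println(additionalTags("leaves", w -> lexicon.getOrDefault(w, List.of())));
        // prints: [leaves/NOUN_PL/leaf, leaves/VERB_3SG/leave]
    }
}
```

An ambiguous word simply yields several readings; choosing between them is left to later disambiguation.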

Example 83 with AnalyzedToken

Use of org.languagetool.AnalyzedToken in project languagetool by languagetool-org.

Class AbstractEnglishSpellerRule, method getIrregularFormsOrNull.

@Nullable
private IrregularForms getIrregularFormsOrNull(String word, String wordSuffix, List<String> suffixes, String posTag, String posName, String formName) {
    try {
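        for (String suffix : suffixes) {
            // wordSuffix is the ending the word must have; each entry in suffixes is a
            // candidate ending to strip off to get the base form (tried in order)
            if (word.endsWith(wordSuffix)) {
                String baseForm = word.substring(0, word.length() - suffix.length());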
                String[] forms = synthesizer.synthesize(new AnalyzedToken(word, null, baseForm), posTag);
                List<String> result = new ArrayList<>();
                for (String form : forms) {
                    if (!speller1.isMisspelled(form)) {
                        // only accept suggestions that the spellchecker will accept
                        result.add(form);
                    }
                }
                // the internal dict might contain forms that the spell checker doesn't accept (e.g. 'criterions'),
                // but we trust the spell checker in this case:
                result.remove(word);
                // non-standard usage
                result.remove("badder");
                // non-standard usage
                result.remove("baddest");
                // can be removed after dict update
                result.remove("spake");
                if (!result.isEmpty()) {
                    return new IrregularForms(baseForm, posName, formName, result);
                }
            }
        }
        return null;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

Also used: AnalyzedToken(org.languagetool.AnalyzedToken) IOException(java.io.IOException) Nullable(org.jetbrains.annotations.Nullable)
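In getIrregularFormsOrNull, wordSuffix only decides whether the method applies at all, while each entry in suffixes controls how many characters are cut off to form a candidate base form, which the synthesizer then inflects. The sketch below isolates that candidate generation. The arguments ("created", "ed", ["ed", "d"]) are illustrative assumptions, not values taken from the LanguageTool source; the length guard is an addition, and unlike the original method (which returns at the first candidate that yields accepted forms) the sketch returns every candidate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of the base-form candidate step in getIrregularFormsOrNull.
class BaseFormCandidatesDemo {

    // wordSuffix is only a guard; each suffix is a different ending to strip off.
    static List<String> candidateBaseForms(String word, String wordSuffix, List<String> suffixes) {
        List<String> candidates = new ArrayList<>();
        if (!word.endsWith(wordSuffix)) {
            return candidates; // the method does not apply to this word
        }
        for (String suffix : suffixes) {
            // guard (not in the original) so substring() never gets a negative or empty range
            if (suffix.length() < word.length()) {
                candidates.add(word.substring(0, word.length() - suffix.length()));
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Illustrative call: the word must end in "ed"; try stripping "ed", then "d".
        System.out.println(candidateBaseForms("created", "ed", List.of("ed", "d")));
        // prints: [creat, create]
    }
}
```

Generating several candidates and letting the synthesizer and spell checker reject the wrong ones ("creat") avoids hard-coding English spelling rules for the final "e".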

Example 84 with AnalyzedToken

Use of org.languagetool.AnalyzedToken in project languagetool by languagetool-org.

Class VerbAgreementRule, method getVerbSuggestions.

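/**
 * Returns the forms of {@code verb} whose POS tag matches {@code expectedVerbPOS}.
 *
 * @param verb the readings of the verb to inflect; the first reading with a "VER:" tag is used
 * @param expectedVerbPOS the expected person and number (person:number)
 * @param toUppercase true when the suggestions should be capitalized
 * @return the matching forms without duplicates, sorted alphabetically
 */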
private List<String> getVerbSuggestions(AnalyzedTokenReadings verb, String expectedVerbPOS, boolean toUppercase) {
    // find the first verb reading
    AnalyzedToken verbToken = new AnalyzedToken("", "", "");
    for (AnalyzedToken token : verb.getReadings()) {
        // getPOSTag() can be null, so check it before calling startsWith()
        if (token.getPOSTag() != null && token.getPOSTag().startsWith("VER:")) {
            verbToken = token;
            break;
        }
    }
    try {
        String[] synthesized = language.getSynthesizer().synthesize(verbToken, "VER.*:" + expectedVerbPOS + ".*", true);
        // remove duplicates
        Set<String> suggestionSet = new HashSet<>(Arrays.asList(synthesized));
        List<String> suggestions = new ArrayList<>(suggestionSet);
        if (toUppercase) {
            for (int i = 0; i < suggestions.size(); ++i) {
                suggestions.set(i, StringTools.uppercaseFirstChar(suggestions.get(i)));
            }
        }
        Collections.sort(suggestions);
        return suggestions;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

Also used: AnalyzedToken(org.languagetool.AnalyzedToken) ArrayList(java.util.ArrayList) IOException(java.io.IOException) HashSet(java.util.HashSet)
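getVerbSuggestions asks the synthesizer for all forms of the verb's lemma whose tag matches the regular expression "VER.*:" + expectedVerbPOS + ".*" (the final true argument switches on regex matching of the POS tag), then removes duplicates, optionally capitalizes, and sorts. The sketch below imitates only that filtering and post-processing, with a hypothetical map from POS tag to surface form in place of a real synthesizer; the tags are invented and only loosely resemble LanguageTool's German tag set. Unlike the original, it deduplicates after capitalizing, using one TreeSet for both deduplication and sorting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical demo of the tag filtering in getVerbSuggestions, without a real synthesizer.
class VerbSuggestionDemo {

    static String uppercaseFirst(String s) {
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    // tagToForm: made-up POS tag -> surface form pairs for one verb lemma.
    static List<String> suggestions(Map<String, String> tagToForm, String expectedVerbPOS, boolean toUppercase) {
        // Same pattern shape as in the rule: a verb tag containing ":" + expectedVerbPOS.
        Pattern tagPattern = Pattern.compile("VER.*:" + expectedVerbPOS + ".*");
        TreeSet<String> result = new TreeSet<>(); // removes duplicates and sorts in one step
        for (Map.Entry<String, String> entry : tagToForm.entrySet()) {
            if (tagPattern.matcher(entry.getKey()).matches()) {
                String form = entry.getValue();
                result.add(toUppercase ? uppercaseFirst(form) : form);
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        // Invented tags; two different tags map to the same form "geht".
        Map<String, String> tagToForm = Map.of(
                "VER:1:SIN:PRS:SFT", "gehe",
                "VER:3:SIN:PRS:SFT", "geht",
                "VER:3:SIN:PRS:NON", "geht",
                "VER:3:SIN:PRT:SFT", "ging");
        System.out.println(suggestions(tagToForm, "3:SIN", true));
        // prints: [Geht, Ging]
    }
}
```

Because expectedVerbPOS is pasted into the pattern unescaped, it must not contain regex metacharacters unless they are meant as such; Pattern.quote() would be needed for literal input.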

Example 85 with AnalyzedToken

Use of org.languagetool.AnalyzedToken in project languagetool by languagetool-org.

Class GermanSynthesizer, method getCompoundForms.

@NotNull
private String[] getCompoundForms(AnalyzedToken token, String posTag, boolean posTagRegExp) throws IOException {
    List<String> parts = splitter.tokenize(token.getToken());
    String firstPart = String.join("", parts.subList(0, parts.size() - 1));
    String lastPart = StringTools.uppercaseFirstChar(parts.get(parts.size() - 1));
    AnalyzedToken lastPartToken = new AnalyzedToken(lastPart, posTag, lastPart);
    String[] lastPartForms;
    if (posTagRegExp) {
        lastPartForms = super.synthesize(lastPartToken, posTag, true);
    } else {
        lastPartForms = super.synthesize(lastPartToken, posTag);
    }
    // avoid dupes
    Set<String> results = new LinkedHashSet<>();
    for (String part : lastPartForms) {
        results.add(firstPart + StringTools.lowercaseFirstChar(part));
    }
    return results.toArray(new String[0]);
}

Also used: AnalyzedToken(org.languagetool.AnalyzedToken) NotNull(org.jetbrains.annotations.NotNull)
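getCompoundForms inflects a German compound by inflecting only its last part: it splits the word, capitalizes the last part (presumably so it can be looked up as a standalone noun), and glues each resulting form back onto the unchanged first part with its first letter lowercased. A LinkedHashSet removes duplicate forms while keeping the synthesizer's order. The sketch below replays that recombination step; the splitter output is passed in directly, and the synthesizer is replaced by a hypothetical map of made-up forms.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical demo of the getCompoundForms recombination step with a fake synthesizer.
class CompoundFormsDemo {

    static String uppercaseFirst(String s) {
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    static String lowercaseFirst(String s) {
        return s.isEmpty() ? s : Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }

    // parts: output of a compound splitter, e.g. ["Haus", "boot"].
    // synthesized: made-up inflected forms, keyed by the capitalized last part.
    static List<String> compoundForms(List<String> parts, Map<String, List<String>> synthesized) {
        String firstPart = String.join("", parts.subList(0, parts.size() - 1));
        // capitalize the last part so it is looked up like a standalone noun
        String lastPart = uppercaseFirst(parts.get(parts.size() - 1));
        Set<String> results = new LinkedHashSet<>(); // drops duplicates, keeps first-seen order
        for (String form : synthesized.getOrDefault(lastPart, List.of())) {
            results.add(firstPart + lowercaseFirst(form));
        }
        return new ArrayList<>(results);
    }

    public static void main(String[] args) {
        Map<String, List<String>> synthesized = Map.of("Boot", List.of("Boote", "Booten", "Boote"));
        System.out.println(compoundForms(List.of("Haus", "boot"), synthesized));
        // prints: [Hausboote, Hausbooten]
    }
}
```

Only the head of the compound needs to be in the dictionary, which is why this approach can cover compounds the dictionary does not list.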

Aggregations

AnalyzedToken (org.languagetool.AnalyzedToken): 89
AnalyzedTokenReadings (org.languagetool.AnalyzedTokenReadings): 48
ArrayList (java.util.ArrayList): 43
Matcher (java.util.regex.Matcher): 16
Test (org.junit.Test): 16
IOException (java.io.IOException): 9
Pattern (java.util.regex.Pattern): 7
Nullable (org.jetbrains.annotations.Nullable): 6
TaggedWord (org.languagetool.tagging.TaggedWord): 6
RuleMatch (org.languagetool.rules.RuleMatch): 4
Synthesizer (org.languagetool.synthesis.Synthesizer): 4
InputStream (java.io.InputStream): 2
HashMap (java.util.HashMap): 2
LinkedHashSet (java.util.LinkedHashSet): 2
Scanner (java.util.Scanner): 2
TreeSet (java.util.TreeSet): 2
DictionaryLookup (morfologik.stemming.DictionaryLookup): 2
IStemmer (morfologik.stemming.IStemmer): 2
AnalyzedSentence (org.languagetool.AnalyzedSentence): 2
ChunkTag (org.languagetool.chunking.ChunkTag): 2