
Example 6 with WordData

Use of morfologik.stemming.WordData in the languagetool project by languagetool-org.

Class TestTools, method testDictionary.

public static void testDictionary(BaseTagger tagger, Language language) throws IOException {
    Dictionary dictionary = Dictionary.read(JLanguageTool.getDataBroker().getFromResourceDirAsUrl(tagger.getDictionaryPath()));
    DictionaryLookup lookup = new DictionaryLookup(dictionary);
    for (WordData wordData : lookup) {
        if (wordData.getTag() == null || wordData.getTag().length() == 0) {
            System.err.println("**** Warning: " + language + ": the word " + wordData.getWord() + "/" + wordData.getStem() + " lacks a POS tag in the dictionary.");
        }
    }
}
Also used: Dictionary (morfologik.stemming.Dictionary), WordData (morfologik.stemming.WordData), DictionaryLookup (morfologik.stemming.DictionaryLookup)

Example 7 with WordData

Use of morfologik.stemming.WordData in the languagetool project by languagetool-org.

Class PolishSynthesizer, method getWordForms.

private List<String> getWordForms(final AnalyzedToken token, final String posTag, final boolean isNegated, final IStemmer synthesizer) {
    final List<String> forms = new ArrayList<>();
    final List<WordData> wordForms;
    if (isNegated) {
        wordForms = synthesizer.lookup(token.getLemma() + "|" + posTag.replaceFirst(NEGATION_TAG, POTENTIAL_NEGATION_TAG));
        if (wordForms != null) {
            for (WordData wd : wordForms) {
                forms.add("nie" + wd.getStem());
            }
        }
    } else {
        wordForms = synthesizer.lookup(token.getLemma() + "|" + posTag);
        for (WordData wd : wordForms) {
            if (wd.getStem() != null) {
                forms.add(wd.getStem().toString());
            }
        }
    }
    return forms;
}
Also used: WordData (morfologik.stemming.WordData), ArrayList (java.util.ArrayList)
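
Note that getWordForms checks the lookup result for null only in the negated branch, and checks the stem for null only in the other branch. The following is a minimal, library-free sketch of the same key-building logic with both checks applied uniformly. Everything in it is an assumption made for illustration: the class name WordFormsSketch, the lookup function standing in for IStemmer.lookup, and the two tag constants, whose real values are not shown in the snippet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class WordFormsSketch {
    // Placeholder values (assumption): the real constants are not shown in the snippet.
    static final String NEGATION_TAG = ":neg";
    static final String POTENTIAL_NEGATION_TAG = ":aff";

    /**
     * Builds the "lemma|tag" key, looks it up, and collects the non-null forms.
     * The lookup function is a hypothetical stand-in for a stemmer.
     */
    public static List<String> getWordForms(String lemma, String posTag, boolean isNegated,
                                            Function<String, List<CharSequence>> lookup) {
        // For negated forms, query the affirmative tag and prepend "nie" to each result.
        String tag = isNegated ? posTag.replaceFirst(NEGATION_TAG, POTENTIAL_NEGATION_TAG) : posTag;
        String prefix = isNegated ? "nie" : "";

        List<String> forms = new ArrayList<>();
        List<CharSequence> stems = lookup.apply(lemma + "|" + tag);
        if (stems == null) {
            return forms; // nothing found: return an empty list in both branches
        }
        for (CharSequence stem : stems) {
            if (stem != null) {
                forms.add(prefix + stem);
            }
        }
        return forms;
    }

    public static void main(String[] args) {
        // Tiny in-memory "dictionary" used only for this demonstration.
        Map<String, List<CharSequence>> dict = new HashMap<>();
        List<CharSequence> entry = new ArrayList<>();
        entry.add("dobry");
        dict.put("dobry|adj:aff", entry);

        System.out.println(getWordForms("dobry", "adj:neg", true, dict::get));
        System.out.println(getWordForms("dobry", "adj:aff", false, dict::get));
        System.out.println(getWordForms("zly", "adj:aff", false, dict::get));
    }
}
```

Folding the two branches into one loop keeps the null handling identical for negated and non-negated lookups; whether the real stemmer can return null at all depends on the IStemmer implementation in use.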

Example 8 with WordData

Use of morfologik.stemming.WordData in the lucene-solr project by apache.

Class MorfologikFilter, method popNextLemma.

private void popNextLemma() {
    // One tag (concatenated) per lemma.
    final WordData lemma = lemmaList.get(lemmaListIndex++);
    termAtt.setEmpty().append(lemma.getStem());
    CharSequence tag = lemma.getTag();
    if (tag != null) {
        String[] tags = lemmaSplitter.split(tag.toString());
        for (int i = 0; i < tags.length; i++) {
            if (tagsList.size() <= i) {
                tagsList.add(new StringBuilder());
            }
            StringBuilder buffer = tagsList.get(i);
            buffer.setLength(0);
            buffer.append(tags[i]);
        }
        tagsAtt.setTags(tagsList.subList(0, tags.length));
    } else {
        tagsAtt.setTags(Collections.<StringBuilder>emptyList());
    }
}
Also used: WordData (morfologik.stemming.WordData)
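
The buffer-reuse pattern in popNextLemma can be tried in isolation, without Lucene or Morfologik. The sketch below is only an illustration: the class name TagBufferSketch and the method splitTags are hypothetical, and the separator pattern (tags joined with '+' or '|') is an assumption, since the definition of lemmaSplitter is not part of the snippet.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class TagBufferSketch {
    // Assumption: concatenated tags are separated by '+' or '|'.
    private static final Pattern TAG_SPLITTER = Pattern.compile("\\+|\\|");

    // Pool of buffers kept across calls; it only grows when a tag has more parts than before.
    private final List<StringBuilder> tagsList = new ArrayList<>();

    /** Splits a concatenated tag and returns a view over the reused buffers. */
    public List<StringBuilder> splitTags(CharSequence tag) {
        if (tag == null) {
            return Collections.emptyList();
        }
        String[] tags = TAG_SPLITTER.split(tag.toString());
        for (int i = 0; i < tags.length; i++) {
            if (tagsList.size() <= i) {
                tagsList.add(new StringBuilder()); // grow the pool on demand
            }
            StringBuilder buffer = tagsList.get(i);
            buffer.setLength(0);                   // clear old content, keep capacity
            buffer.append(tags[i]);
        }
        // Expose only the buffers filled in this call.
        return tagsList.subList(0, tags.length);
    }

    public static void main(String[] args) {
        TagBufferSketch sketch = new TagBufferSketch();
        System.out.println(sketch.splitTags("subst:sg:nom:m1+subst:sg:voc:m1"));
        System.out.println(sketch.splitTags("adj:sg:nom"));
    }
}
```

Because the returned list is a view over shared buffers, its contents are overwritten by the next call; a caller that needs to keep the tags should copy them to strings first.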

Aggregations

WordData (morfologik.stemming.WordData): 8
DictionaryLookup (morfologik.stemming.DictionaryLookup): 4
ArrayList (java.util.ArrayList): 3
Dictionary (morfologik.stemming.Dictionary): 3
IStemmer (morfologik.stemming.IStemmer): 2
IOException (java.io.IOException): 1
Matcher (java.util.regex.Matcher): 1
Test (org.junit.Test): 1
German (org.languagetool.language.German): 1
Tagger (org.languagetool.tagging.Tagger): 1