Example 1 with DictionaryLookup

Use of morfologik.stemming.DictionaryLookup in project languagetool by languagetool-org.

The class GermanTaggerEnhancer, method run.

private void run() throws IOException {
    final Dictionary dictionary = Dictionary.read(JLanguageTool.getDataBroker().getFromResourceDirAsUrl("/de/german.dict"));
    final DictionaryLookup dl = new DictionaryLookup(dictionary);
    Tagger tagger = new German().getTagger();
    String prev = null;
    for (WordData wd : dl) {
        String word = wd.getWord().toString();
        if (word.endsWith("er") && StringTools.startsWithUppercase(word)) {
            if (!hasAdjReading(tagger, word) && isEigenname(tagger, word.substring(0, word.length() - 2)) && !word.equals(prev)) {
                for (String newTags : ADJ_READINGS) {
                    System.out.println(word + "\t" + word + "\t" + newTags + ":DEF");
                    System.out.println(word + "\t" + word + "\t" + newTags + ":IND");
                    System.out.println(word + "\t" + word + "\t" + newTags + ":SOL");
                }
                prev = word;
            }
        }
    }
}
Also used : Dictionary(morfologik.stemming.Dictionary) Tagger(org.languagetool.tagging.Tagger) WordData(morfologik.stemming.WordData) German(org.languagetool.language.German) DictionaryLookup(morfologik.stemming.DictionaryLookup)
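
The output format in Example 1 can be tried without loading a dictionary. The following is a minimal sketch, not the project's code: the class name AdjectiveLineSketch, the helper linesFor, and the sample tag string are assumptions made for illustration (the real ADJ_READINGS values are not shown on this page). It only reproduces the tab-separated "word, lemma, tags" lines with the three suffixes DEF, IND and SOL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AdjectiveLineSketch {

    // Suffixes appended to every reading, as in the println calls of Example 1.
    private static final String[] SUFFIXES = {"DEF", "IND", "SOL"};

    // Hypothetical helper: builds one "form<TAB>lemma<TAB>tags" line per
    // reading and suffix. Form and lemma are the same word, as in Example 1.
    static List<String> linesFor(String word, List<String> readings) {
        List<String> lines = new ArrayList<>();
        for (String tags : readings) {
            for (String suffix : SUFFIXES) {
                lines.add(word + "\t" + word + "\t" + tags + ":" + suffix);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // The tag string below is a placeholder, not taken from ADJ_READINGS.
        List<String> readings = Arrays.asList("ADJ:NOM:SIN:MAS:GRU");
        for (String line : linesFor("Berliner", readings)) {
            System.out.println(line);
        }
    }
}
```

Each reading yields exactly three lines, so the output size is three times the number of readings.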

Example 2 with DictionaryLookup

Use of morfologik.stemming.DictionaryLookup in project languagetool by languagetool-org.

The class GermanTaggerTest, method testDictionary.

@Test
public void testDictionary() throws IOException {
    Dictionary dictionary = Dictionary.read(JLanguageTool.getDataBroker().getFromResourceDirAsUrl("/de/german.dict"));
    DictionaryLookup dl = new DictionaryLookup(dictionary);
    for (WordData wd : dl) {
        if (wd.getTag() == null || wd.getTag().length() == 0) {
            System.err.println("**** Warning: the word " + wd.getWord() + "/" + wd.getStem() + " lacks a POS tag in the dictionary.");
        }
    }
}
Also used : Dictionary(morfologik.stemming.Dictionary) WordData(morfologik.stemming.WordData) DictionaryLookup(morfologik.stemming.DictionaryLookup) Test(org.junit.Test)

Example 3 with DictionaryLookup

Use of morfologik.stemming.DictionaryLookup in project languagetool by languagetool-org.

The class CatalanTagger, method additionalTags.

@Nullable
protected List<AnalyzedToken> additionalTags(String word, IStemmer stemmer) {
    final IStemmer dictLookup = new DictionaryLookup(getDictionary());
    List<AnalyzedToken> additionalTaggedTokens = new ArrayList<>();
    // Feminine singular adjective or feminine singular participle + -ment
    if (word.endsWith("ment")) {
        final String lowerWord = word.toLowerCase(conversionLocale);
        final String possibleAdj = lowerWord.replaceAll("^(.+)ment$", "$1");
        List<AnalyzedToken> taggerTokens;
        taggerTokens = asAnalyzedTokenList(possibleAdj, dictLookup.lookup(possibleAdj));
        for (AnalyzedToken taggerToken : taggerTokens) {
            final String posTag = taggerToken.getPOSTag();
            if (posTag != null) {
                final Matcher m = ADJ_PART_FS.matcher(posTag);
                if (m.matches()) {
                    additionalTaggedTokens.add(new AnalyzedToken(word, "RG", lowerWord));
                    return additionalTaggedTokens;
                }
            }
        }
    }
    //Any well-formed verb with prefixes is tagged as a verb copying the original tags
    Matcher matcher = PREFIXES_FOR_VERBS.matcher(word);
    if (matcher.matches()) {
        final String possibleVerb = matcher.group(2).toLowerCase();
        List<AnalyzedToken> taggerTokens;
        taggerTokens = asAnalyzedTokenList(possibleVerb, dictLookup.lookup(possibleVerb));
        for (AnalyzedToken taggerToken : taggerTokens) {
            final String posTag = taggerToken.getPOSTag();
            if (posTag != null) {
                final Matcher m = VERB.matcher(posTag);
                if (m.matches()) {
                    String lemma = matcher.group(1).toLowerCase().concat(taggerToken.getLemma());
                    additionalTaggedTokens.add(new AnalyzedToken(word, posTag, lemma));
                }
            }
        }
        return additionalTaggedTokens;
    }
    // U+0140 LATIN SMALL LETTER L WITH MIDDLE DOT
    if (word.contains("ŀ") || word.contains("Ŀ")) {
        final String lowerWord = word.toLowerCase(conversionLocale);
        final String possibleWord = lowerWord.replaceAll("ŀ", "l·");
        List<AnalyzedToken> taggerTokens = asAnalyzedTokenList(word, dictLookup.lookup(possibleWord));
        return taggerTokens;
    }
    return null;
}
Also used : AnalyzedToken(org.languagetool.AnalyzedToken) Matcher(java.util.regex.Matcher) IStemmer(morfologik.stemming.IStemmer) ArrayList(java.util.ArrayList) DictionaryLookup(morfologik.stemming.DictionaryLookup) Nullable(org.jetbrains.annotations.Nullable)
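
The last branch of Example 3 rewrites the single-character "l with middle dot" into the two-character sequence "l" plus middle dot before the dictionary lookup. The following is a minimal sketch of that normalization step only, with no dictionary involved. The class name MiddleDotSketch, the helper normalize, and the use of Locale.forLanguageTag("ca") in place of the snippet's conversionLocale are assumptions made for illustration.

```java
import java.util.Locale;

public class MiddleDotSketch {

    // Assumption: a Catalan locale stands in for conversionLocale.
    private static final Locale CATALAN = Locale.forLanguageTag("ca");

    // Hypothetical helper: lowercases the word, then replaces
    // U+0140 (small l with middle dot) by "l" + U+00B7 (middle dot).
    // Lowercasing first also covers the capital form U+013F.
    static String normalize(String word) {
        return word.toLowerCase(CATALAN).replace("\u0140", "l\u00B7");
    }

    public static void main(String[] args) {
        // "CO" + U+013F + "LEGI" becomes "col" + U+00B7 + "legi".
        System.out.println(normalize("CO\u013FLEGI"));
    }
}
```

String.replace is used here because the replacement is a literal one and needs no regular expression.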

Example 4 with DictionaryLookup

Use of morfologik.stemming.DictionaryLookup in project languagetool by languagetool-org.

The class MorfologikTagger, method tag.

@Override
public List<TaggedWord> tag(String word) {
    List<TaggedWord> result = new ArrayList<>();
    try {
        IStemmer dictLookup = new DictionaryLookup(getDictionary());
        List<WordData> lookup = dictLookup.lookup(word);
        for (WordData wordData : lookup) {
            String tag = wordData.getTag() == null ? null : wordData.getTag().toString();
            // The frequency data is in the last byte (without a separator)
            if (dictionary.metadata.isFrequencyIncluded() && tag != null && tag.length() > 1) {
                tag = tag.substring(0, tag.length() - 1);
            }
            String stem = wordData.getStem() == null ? null : wordData.getStem().toString();
            TaggedWord taggedWord = new TaggedWord(stem, tag);
            result.add(taggedWord);
        }
    } catch (IOException e) {
        throw new RuntimeException("Could not tag word '" + word + "'", e);
    }
    return result;
}
Also used : IStemmer(morfologik.stemming.IStemmer) WordData(morfologik.stemming.WordData) ArrayList(java.util.ArrayList) IOException(java.io.IOException) DictionaryLookup(morfologik.stemming.DictionaryLookup)
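
The frequency handling in Example 4 is easy to isolate: when the dictionary stores frequency data, the last character of the raw tag is dropped. The following is a minimal sketch of that rule alone. The class name FrequencyTagSketch, the helper stripFrequency, and the sample tag strings are assumptions made for illustration, and the boolean parameter stands in for dictionary.metadata.isFrequencyIncluded().

```java
public class FrequencyTagSketch {

    // Hypothetical helper mirroring the condition in Example 4: strip the
    // trailing frequency character only if frequency data is included and
    // the tag is longer than one character.
    static String stripFrequency(String rawTag, boolean frequencyIncluded) {
        if (frequencyIncluded && rawTag != null && rawTag.length() > 1) {
            return rawTag.substring(0, rawTag.length() - 1);
        }
        return rawTag;
    }

    public static void main(String[] args) {
        // Sample tags are placeholders, not real dictionary entries.
        System.out.println(stripFrequency("NNSK", true));  // prints NNS
        System.out.println(stripFrequency("NNSK", false)); // prints NNSK
        System.out.println(stripFrequency("K", true));     // prints K
    }
}
```

Because of the length check, a one-character tag is returned unchanged even when frequency data is included.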

Example 5 with DictionaryLookup

Use of morfologik.stemming.DictionaryLookup in project languagetool by languagetool-org.

The class TestTools, method testDictionary.

public static void testDictionary(BaseTagger tagger, Language language) throws IOException {
    Dictionary dictionary = Dictionary.read(JLanguageTool.getDataBroker().getFromResourceDirAsUrl(tagger.getDictionaryPath()));
    DictionaryLookup lookup = new DictionaryLookup(dictionary);
    for (WordData wordData : lookup) {
        if (wordData.getTag() == null || wordData.getTag().length() == 0) {
            System.err.println("**** Warning: " + language + ": the word " + wordData.getWord() + "/" + wordData.getStem() + " lacks a POS tag in the dictionary.");
        }
    }
}
Also used : Dictionary(morfologik.stemming.Dictionary) WordData(morfologik.stemming.WordData) DictionaryLookup(morfologik.stemming.DictionaryLookup)

Aggregations

DictionaryLookup (morfologik.stemming.DictionaryLookup): 8
IStemmer (morfologik.stemming.IStemmer): 5
ArrayList (java.util.ArrayList): 4
WordData (morfologik.stemming.WordData): 4
Dictionary (morfologik.stemming.Dictionary): 3
Matcher (java.util.regex.Matcher): 2
AnalyzedToken (org.languagetool.AnalyzedToken): 2
IOException (java.io.IOException): 1
InputStream (java.io.InputStream): 1
HashSet (java.util.HashSet): 1
Pattern (java.util.regex.Pattern): 1
Nullable (org.jetbrains.annotations.Nullable): 1
Test (org.junit.Test): 1
AnalyzedTokenReadings (org.languagetool.AnalyzedTokenReadings): 1
ChunkTag (org.languagetool.chunking.ChunkTag): 1
German (org.languagetool.language.German): 1
Tagger (org.languagetool.tagging.Tagger): 1