Example 1 with Dictionary

Use of morfologik.stemming.Dictionary in project languagetool by languagetool-org.

Class GermanTaggerEnhancer, method run.

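private void run() throws IOException {
    // Load the binary German dictionary from the resource directory.
    final Dictionary dictionary = Dictionary.read(JLanguageTool.getDataBroker().getFromResourceDirAsUrl("/de/german.dict"));
    final DictionaryLookup dl = new DictionaryLookup(dictionary);
    Tagger tagger = new German().getTagger();
    // Last word that was printed, used to skip a repeat in the next iteration.
    String prev = null;
    // Iterate over every entry stored in the dictionary.
    for (WordData wd : dl) {
        String word = wd.getWord().toString();
        // Only capitalized words ending in "er" are candidates.
        if (word.endsWith("er") && StringTools.startsWithUppercase(word)) {
            // Keep the word if it has no adjective reading yet and the word without "er" is a proper noun.
            if (!hasAdjReading(tagger, word) && isEigenname(tagger, word.substring(0, word.length() - 2)) && !word.equals(prev)) {
                // Print three tab-separated lines (word, lemma, tag) per adjective reading.
                for (String newTags : ADJ_READINGS) {
                    System.out.println(word + "\t" + word + "\t" + newTags + ":DEF");
                    System.out.println(word + "\t" + word + "\t" + newTags + ":IND");
                    System.out.println(word + "\t" + word + "\t" + newTags + ":SOL");
                }
                prev = word;
            }
        }
    }
}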
Also used: Dictionary (morfologik.stemming.Dictionary), Tagger (org.languagetool.tagging.Tagger), WordData (morfologik.stemming.WordData), German (org.languagetool.language.German), DictionaryLookup (morfologik.stemming.DictionaryLookup)
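
The string handling in the method above (the capitalized "-er" filter, the removal of the suffix, and the three output lines per tag) can be tried without LanguageTool on the classpath. The following is a minimal sketch, not the project's code: the class ErSuffixSketch and its methods are hypothetical names, the first-character check is only assumed to be a reasonable stand-in for StringTools.startsWithUppercase, and the tag value passed in is a placeholder because the contents of ADJ_READINGS are not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of LanguageTool or Morfologik.
class ErSuffixSketch {
    // Suffixes appended to each tag, taken from the println calls in the example.
    private static final String[] SUFFIXES = {":DEF", ":IND", ":SOL"};

    // Returns the word without its trailing "er" if the word is capitalized
    // and ends in "er"; returns null otherwise.
    static String candidateBase(String word) {
        // endsWith("er") guarantees at least two characters, so charAt(0) is safe.
        if (word.endsWith("er") && Character.isUpperCase(word.charAt(0))) {
            return word.substring(0, word.length() - 2);
        }
        return null;
    }

    // Builds the tab-separated lines (word, lemma, tag) for every tag and suffix.
    static List<String> outputLines(String word, List<String> tags) {
        List<String> lines = new ArrayList<>();
        for (String tag : tags) {
            for (String suffix : SUFFIXES) {
                lines.add(word + "\t" + word + "\t" + tag + suffix);
            }
        }
        return lines;
    }
}
```

In the original method the base returned by the first step is additionally checked with isEigenname, and the word itself with hasAdjReading; both need a real tagger and are left out here.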

Example 2 with Dictionary

Use of morfologik.stemming.Dictionary in project languagetool by languagetool-org.

Class GermanTaggerTest, method testDictionary.

@Test
public void testDictionary() throws IOException {
    Dictionary dictionary = Dictionary.read(JLanguageTool.getDataBroker().getFromResourceDirAsUrl("/de/german.dict"));
    DictionaryLookup dl = new DictionaryLookup(dictionary);
    for (WordData wd : dl) {
        if (wd.getTag() == null || wd.getTag().length() == 0) {
            System.err.println("**** Warning: the word " + wd.getWord() + "/" + wd.getStem() + " lacks a POS tag in the dictionary.");
        }
    }
}
Also used: Dictionary (morfologik.stemming.Dictionary), WordData (morfologik.stemming.WordData), DictionaryLookup (morfologik.stemming.DictionaryLookup), Test (org.junit.Test)

Example 3 with Dictionary

Use of morfologik.stemming.Dictionary in project languagetool by languagetool-org.

Class MorfologikMultiSpeller, method getPlainTextDictSpellerOrNull.

@Nullable
private MorfologikSpeller getPlainTextDictSpellerOrNull(BufferedReader plainTextReader, String dictPath, int maxEditDistance) throws IOException {
    List<byte[]> lines = getLines(plainTextReader);
    if (lines.isEmpty()) {
        return null;
    }
    Dictionary dictionary = getDictionary(lines, dictPath);
    return new MorfologikSpeller(dictionary, maxEditDistance);
}
Also used: Dictionary (morfologik.stemming.Dictionary), Nullable (org.jetbrains.annotations.Nullable)

Example 4 with Dictionary

Use of morfologik.stemming.Dictionary in project languagetool by languagetool-org.

Class TestTools, method testDictionary.

public static void testDictionary(BaseTagger tagger, Language language) throws IOException {
    Dictionary dictionary = Dictionary.read(JLanguageTool.getDataBroker().getFromResourceDirAsUrl(tagger.getDictionaryPath()));
    DictionaryLookup lookup = new DictionaryLookup(dictionary);
    for (WordData wordData : lookup) {
        if (wordData.getTag() == null || wordData.getTag().length() == 0) {
            System.err.println("**** Warning: " + language + ": the word " + wordData.getWord() + "/" + wordData.getStem() + " lacks a POS tag in the dictionary.");
        }
    }
}
Also used: Dictionary (morfologik.stemming.Dictionary), WordData (morfologik.stemming.WordData), DictionaryLookup (morfologik.stemming.DictionaryLookup)
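
The loop above treats a tag as missing when it is null or has length zero, and only prints a warning. A minimal sketch of the same check as a reusable helper follows. It assumes a hypothetical Entry class as a stand-in for a dictionary entry (it is not a Morfologik type), and it collects the offending entries instead of printing them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of LanguageTool or Morfologik.
class MissingTagCheck {
    // Hypothetical stand-in for one dictionary entry.
    static final class Entry {
        final String word;
        final String stem;
        final CharSequence tag;

        Entry(String word, String stem, CharSequence tag) {
            this.word = word;
            this.stem = stem;
            this.tag = tag;
        }
    }

    // A tag counts as missing when it is null or empty, mirroring the condition in the example.
    static boolean lacksTag(Entry entry) {
        return entry.tag == null || entry.tag.length() == 0;
    }

    // Returns "word/stem" for every entry without a tag, in iteration order.
    static List<String> findUntagged(Iterable<Entry> entries) {
        List<String> untagged = new ArrayList<>();
        for (Entry entry : entries) {
            if (lacksTag(entry)) {
                untagged.add(entry.word + "/" + entry.stem);
            }
        }
        return untagged;
    }
}
```

Returning a list rather than writing to System.err is a design choice of this sketch: a caller could then assert that the list is empty, whereas the original method only logs and never fails.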

Example 5 with Dictionary

Use of morfologik.stemming.Dictionary in project languagetool by languagetool-org.

Class MorfologikMultiSpeller, method getDictionary.

private Dictionary getDictionary(List<byte[]> lines, String dictPath) throws IOException {
    Dictionary dictFromCache = dicPathToDict.get(dictPath);
    if (dictFromCache != null) {
        return dictFromCache;
    } else {
        // Creating the dictionary at runtime can easily take 50ms for spelling.txt files
        // that are ~50KB. We don't want that overhead for every check of a short sentence,
        // so we cache the result:
        Collections.sort(lines, FSABuilder.LEXICAL_ORDERING);
        FSA fsa = FSABuilder.build(lines);
        ByteArrayOutputStream fsaOutStream = new CFSA2Serializer().serialize(fsa, new ByteArrayOutputStream());
        ByteArrayInputStream fsaInStream = new ByteArrayInputStream(fsaOutStream.toByteArray());
        String infoFile = dictPath.replace(".dict", ".info");
        Dictionary dict = Dictionary.read(fsaInStream, JLanguageTool.getDataBroker().getFromResourceDirAsStream(infoFile));
        dicPathToDict.put(dictPath, dict);
        return dict;
    }
}
Also used: Dictionary (morfologik.stemming.Dictionary), CFSA2Serializer (morfologik.fsa.builders.CFSA2Serializer), FSA (morfologik.fsa.FSA)
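
The comment in the method above gives the reason for the cache: building the dictionary is expensive, so the result is stored under its path and reused. A minimal sketch of that build-once-per-path pattern follows, written without Morfologik so it runs on its own. PathKeyedCache, its builder function, and the build counter are hypothetical names introduced here. The sketch uses ConcurrentHashMap.computeIfAbsent, which is a substitution for the explicit get/put pair in the original, whose map type is not shown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache keyed by a path string; V stands in for an expensive object.
class PathKeyedCache<V> {
    private final Map<String, V> cache = new ConcurrentHashMap<>();
    private final AtomicInteger builds = new AtomicInteger();
    private final Function<String, V> builder;

    PathKeyedCache(Function<String, V> builder) {
        this.builder = builder;
    }

    // Returns the cached value for the path, building it on first access only.
    V get(String path) {
        return cache.computeIfAbsent(path, p -> {
            builds.incrementAndGet(); // counts how often the expensive step ran
            return builder.apply(p);
        });
    }

    // Number of times the builder was invoked; useful for checking cache hits.
    int buildCount() {
        return builds.get();
    }

    public static void main(String[] args) {
        PathKeyedCache<String> cache = new PathKeyedCache<>(p -> "built:" + p);
        cache.get("example.dict");
        cache.get("example.dict");
        System.out.println("builds: " + cache.buildCount());
    }
}
```

One difference from the original is that java.util.function.Function cannot throw the checked IOException that Dictionary.read declares, so a real adaptation would need to wrap the exception or keep the explicit get/put form.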

Aggregations

Dictionary (morfologik.stemming.Dictionary): 9
Test (org.junit.Test): 4
URL (java.net.URL): 3
Speller (morfologik.speller.Speller): 3
DictionaryLookup (morfologik.stemming.DictionaryLookup): 3
WordData (morfologik.stemming.WordData): 3
Ignore (org.junit.Ignore): 3
ByteArrayInputStream (java.io.ByteArrayInputStream): 1
FSA (morfologik.fsa.FSA): 1
CFSA2Serializer (morfologik.fsa.builders.CFSA2Serializer): 1
Nullable (org.jetbrains.annotations.Nullable): 1
German (org.languagetool.language.German): 1
Tagger (org.languagetool.tagging.Tagger): 1