
Example 6 with TurkishAlphabet

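Use of zemberek.core.turkish.TurkishAlphabet in project zemberek-nlp by ahmetaa.

The findAbbreviations method of the DictionaryOperations class. It scans the non-TDK dictionary for proper nouns that contain no vowel at all, or no vowel in their first three letters, and writes them to a file as possible abbreviations.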

public static void findAbbreviations() throws IOException {
    // TurkishMorphology morphology = TurkishMorphology.createWithDefaults();
    RootLexicon lexicon = TurkishDictionaryLoader.loadFromResources("tr/non-tdk.dict");
    // The alphabet is a shared singleton, so look it up once outside the loop.
    TurkishAlphabet alphabet = TurkishAlphabet.INSTANCE;
    Set<String> set = new HashSet<>();
    for (DictionaryItem item : lexicon) {
        String lemma = item.lemma;
        // Skip placeholder entries.
        if (item.attributes.contains(RootAttribute.Dummy)) {
            continue;
        }
        // Only proper nouns are considered as abbreviation candidates.
        if (item.secondaryPos != SecondaryPos.ProperNoun) {
            continue;
        }
        // Candidate if the lemma has no vowel at all, or if it is longer than
        // three letters and its first three letters contain no vowel.
        if (!alphabet.containsVowel(lemma) || (lemma.length() > 3 && !alphabet.containsVowel(lemma.substring(0, 3)))) {
            set.add(lemma + " [P:Abbrv]");
        }
    }
    // Sort with the Turkish-aware comparator before writing one entry per line.
    List<String> list = new ArrayList<>(set);
    list.sort(Turkish.STRING_COMPARATOR_ASC);
    Files.write(Paths.get("zemberek.possible.abbrv2"), list);
}
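
The vowel test in the method above can be tried without the zemberek-nlp dependency. The following is a minimal sketch only: the class AbbreviationHeuristic, its helper containsVowel and the hard-coded vowel set are assumptions made for illustration and are not the library's TurkishAlphabet implementation, which may treat some characters differently.

```java
import java.util.ArrayList;
import java.util.List;

public class AbbreviationHeuristic {

    // Assumed vowel set: a e ı i o ö u ü plus circumflexed â î û, in both cases.
    // Written with Unicode escapes so the file compiles under any source encoding.
    private static final String VOWELS =
            "aeio" + "u\u0131\u00f6\u00fc\u00e2\u00ee\u00fb"
          + "AEIO" + "U\u0130\u00d6\u00dc\u00c2\u00ce\u00db";

    // Hypothetical stand-in for TurkishAlphabet.containsVowel.
    static boolean containsVowel(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (VOWELS.indexOf(s.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Same condition as in findAbbreviations: no vowel anywhere, or a lemma
    // longer than three letters whose first three letters contain no vowel.
    static boolean looksLikeAbbreviation(String lemma) {
        return !containsVowel(lemma)
                || (lemma.length() > 3 && !containsVowel(lemma.substring(0, 3)));
    }

    // Keeps only the lemmas that pass the heuristic, preserving input order.
    static List<String> candidates(List<String> lemmas) {
        List<String> result = new ArrayList<>();
        for (String lemma : lemmas) {
            if (looksLikeAbbreviation(lemma)) {
                result.add(lemma);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [TBMM, TRT, Strabon]
        System.out.println(candidates(List.of("TBMM", "Ankara", "NATO", "TRT", "Strabon")));
    }
}
```

"Strabon" passes only because it starts with three consonants, which shows why the original method writes its output as possible abbreviations rather than confirmed ones; "NATO" is missed because it contains vowels in its first three letters.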
Also used: DictionaryItem (zemberek.morphology.lexicon.DictionaryItem), TurkishAlphabet (zemberek.core.turkish.TurkishAlphabet), ArrayList (java.util.ArrayList), RootLexicon (zemberek.morphology.lexicon.RootLexicon), HashSet (java.util.HashSet), LinkedHashSet (java.util.LinkedHashSet)
