
Example 1 with PrimaryPos

Use of zemberek.core.turkish.PrimaryPos in project zemberek-nlp by ahmetaa.

From class TurkishSuffixes, method defineSuccessorSuffixes:

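@Override
public SuffixData[] defineSuccessorSuffixes(DictionaryItem item) {
    // Successor suffixes for the original and the modified stem forms.
    SuffixData original = new SuffixData();
    SuffixData modified = new SuffixData();
    PrimaryPos primaryPos = item.primaryPos;
    switch (primaryPos) {
        case Verb:
            // Only verbs are handled here; getForVerb fills original and modified.
            getForVerb(item, original, modified);
            break;
        default:
            // Any other part of speech gets two empty SuffixData objects.
            break;
    }
    return new SuffixData[] { original, modified };
}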
Also used: PrimaryPos (zemberek.core.turkish.PrimaryPos), SuffixData (zemberek.morphology.lexicon.graph.SuffixData)

Example 2 with PrimaryPos

Use of zemberek.core.turkish.PrimaryPos in project zemberek-nlp by ahmetaa.

From class TurkishMorphology, method analyzeWordsWithApostrophe:

public List<SingleAnalysis> analyzeWordsWithApostrophe(String word) {
    int index = word.indexOf('\'');
    // Reject words with no apostrophe, or with one at the first or last position.
    if (index <= 0 || index == word.length() - 1) {
        return Collections.emptyList();
    }
    // Split at the apostrophe and normalize the part before it.
    StemAndEnding se = new StemAndEnding(word.substring(0, index), word.substring(index + 1));
    String stem = TurkishAlphabet.INSTANCE.normalize(se.stem);
    // Analyze the word with all apostrophes removed.
    String withoutQuote = word.replace("'", "");
    List<SingleAnalysis> noQuotesParses = analyzer.analyze(withoutQuote);
    if (noQuotesParses.isEmpty()) {
        return Collections.emptyList();
    }
    // words like "Hastanesi'ne". Should we accept Hastanesi or Hastane?
    // Keep noun analyses that contain the p3sg morpheme or whose stem
    // equals the normalized text before the apostrophe.
    return noQuotesParses.stream()
        .filter(a -> a.getDictionaryItem().primaryPos == PrimaryPos.Noun
            && (a.containsMorpheme(TurkishMorphotactics.p3sg) || a.getStem().equals(stem)))
        .collect(Collectors.toList());
}
Also used: StemAndEnding (zemberek.core.turkish.StemAndEnding), AmbiguityResolver (zemberek.morphology.ambiguity.AmbiguityResolver), TurkishMorphotactics (zemberek.morphology.morphotactics.TurkishMorphotactics), TextUtil (zemberek.core.text.TextUtil), Stopwatch (com.google.common.base.Stopwatch), SentenceAnalysis (zemberek.morphology.analysis.SentenceAnalysis), ArrayList (java.util.ArrayList), Turkish (zemberek.core.turkish.Turkish), Token (zemberek.tokenization.Token), SingleAnalysis (zemberek.morphology.analysis.SingleAnalysis), PrimaryPos (zemberek.core.turkish.PrimaryPos), TurkishTokenizer (zemberek.tokenization.TurkishTokenizer), AnalysisCache (zemberek.morphology.analysis.AnalysisCache), Log (zemberek.core.logging.Log), InformalTurkishMorphotactics (zemberek.morphology.morphotactics.InformalTurkishMorphotactics), RuleBasedAnalyzer (zemberek.morphology.analysis.RuleBasedAnalyzer), WordGenerator (zemberek.morphology.generator.WordGenerator), IOException (java.io.IOException), PerceptronAmbiguityResolver (zemberek.morphology.ambiguity.PerceptronAmbiguityResolver), Collectors (java.util.stream.Collectors), WordAnalysis (zemberek.morphology.analysis.WordAnalysis), TimeUnit (java.util.concurrent.TimeUnit), List (java.util.List), TurkishAlphabet (zemberek.core.turkish.TurkishAlphabet), RootLexicon (zemberek.morphology.lexicon.RootLexicon), Collections (java.util.Collections), UnidentifiedTokenAnalyzer (zemberek.morphology.analysis.UnidentifiedTokenAnalyzer)
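The guard at the top of analyzeWordsWithApostrophe decides which inputs get split at all. The stdlib-only sketch below isolates just that step so the boundary cases are easy to see. The class name ApostropheSplit and its nested Parts holder are hypothetical (Parts stands in for zemberek's StemAndEnding); stem normalization and the morphological analysis itself are deliberately left out.

```java
import java.util.Optional;

// Hypothetical helper: mirrors only the guard and split in analyzeWordsWithApostrophe.
final class ApostropheSplit {

    // Hypothetical stand-in for zemberek's StemAndEnding.
    static final class Parts {
        final String stem;
        final String ending;

        Parts(String stem, String ending) {
            this.stem = stem;
            this.ending = ending;
        }
    }

    private ApostropheSplit() { }

    // Empty when there is no apostrophe, or when it is the first or last character.
    // Otherwise splits at the first apostrophe.
    static Optional<Parts> split(String word) {
        int index = word.indexOf('\'');
        if (index <= 0 || index == word.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(new Parts(word.substring(0, index), word.substring(index + 1)));
    }

    // The form handed to the analyzer: every apostrophe removed.
    static String withoutQuote(String word) {
        return word.replace("'", "");
    }

    public static void main(String[] args) {
        split("Hastanesi'ne").ifPresent(p ->
            System.out.println(p.stem + " + " + p.ending)); // prints "Hastanesi + ne"
        System.out.println(split("Ankara'").isPresent());   // prints "false"
        System.out.println(withoutQuote("Hastanesi'ne"));   // prints "Hastanesine"
    }
}
```

Because `index <= 0` covers both "not found" (-1) and "apostrophe first" (0), a single comparison rejects two cases at once.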

Example 3 with PrimaryPos

Use of zemberek.core.turkish.PrimaryPos in project zemberek-nlp by ahmetaa.

From class TurkishStopWords, method generateFromDictionary:

static TurkishStopWords generateFromDictionary() throws IOException {
    // Parts of speech whose lemmas are taken as stop words.
    Set<PrimaryPos> pos = Sets.newHashSet(
        PrimaryPos.Adverb, PrimaryPos.Conjunction, PrimaryPos.Determiner,
        PrimaryPos.Interjection, PrimaryPos.PostPositive, PrimaryPos.Numeral,
        PrimaryPos.Pronoun, PrimaryPos.Question);
    TurkishMorphology morphology = TurkishMorphology.createWithDefaults();
    // Collect each matching lemma once.
    Set<String> set = new HashSet<>();
    RootLexicon lexicon = morphology.getLexicon();
    for (DictionaryItem item : lexicon) {
        if (pos.contains(item.primaryPos)) {
            set.add(item.lemma);
        }
    }
    // Sort ascending with the Turkish string comparator; LinkedHashSet keeps that order.
    List<String> str = new ArrayList<>(set);
    str.sort(Turkish.STRING_COMPARATOR_ASC);
    return new TurkishStopWords(new LinkedHashSet<>(str));
}
Also used: DictionaryItem (zemberek.morphology.lexicon.DictionaryItem), PrimaryPos (zemberek.core.turkish.PrimaryPos), ArrayList (java.util.ArrayList), RootLexicon (zemberek.morphology.lexicon.RootLexicon), TurkishMorphology (zemberek.morphology.TurkishMorphology), HashSet (java.util.HashSet), LinkedHashSet (java.util.LinkedHashSet)
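The stop-word builder boils down to a small pipeline: filter entries by a set of parts of speech, drop duplicate lemmas, sort, and freeze the order in a LinkedHashSet. The sketch below reproduces that pipeline in plain Java, under stated assumptions: the Pos enum and Entry class are hypothetical stand-ins for PrimaryPos and DictionaryItem, and a Turkish-locale java.text.Collator is swapped in for Turkish.STRING_COMPARATOR_ASC, so edge-case ordering may differ from zemberek's.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the generateFromDictionary pipeline with plain Java types.
final class StopWordSketch {

    // Hypothetical subset of tags; zemberek's real enum is PrimaryPos.
    enum Pos { Noun, Verb, Adverb, Conjunction, Pronoun }

    // Hypothetical stand-in for DictionaryItem: a lemma and its primary POS.
    static final class Entry {
        final String lemma;
        final Pos pos;

        Entry(String lemma, Pos pos) {
            this.lemma = lemma;
            this.pos = pos;
        }
    }

    private StopWordSketch() { }

    static Set<String> collect(List<Entry> lexicon, Set<Pos> wanted) {
        // A HashSet drops lemmas that occur in more than one entry.
        Set<String> lemmas = new HashSet<>();
        for (Entry e : lexicon) {
            if (wanted.contains(e.pos)) {
                lemmas.add(e.lemma);
            }
        }
        List<String> sorted = new ArrayList<>(lemmas);
        // Assumption: a Turkish Collator replaces Turkish.STRING_COMPARATOR_ASC.
        sorted.sort(Collator.getInstance(Locale.forLanguageTag("tr-TR")));
        // LinkedHashSet iterates in insertion order, i.e. the sorted order.
        return new LinkedHashSet<>(sorted);
    }

    public static void main(String[] args) {
        List<Entry> lexicon = Arrays.asList(
            new Entry("ve", Pos.Conjunction),
            new Entry("ev", Pos.Noun),
            new Entry("ama", Pos.Conjunction),
            new Entry("ben", Pos.Pronoun),
            new Entry("sonra", Pos.Adverb),
            new Entry("ve", Pos.Conjunction));
        Set<Pos> wanted = EnumSet.of(Pos.Adverb, Pos.Conjunction, Pos.Pronoun);
        System.out.println(collect(lexicon, wanted)); // prints "[ama, ben, sonra, ve]"
    }
}
```

Sorting a List and then copying it into a LinkedHashSet (rather than using a TreeSet) matches the original's choice: the result keeps set semantics for lookups while iterating in the comparator's order.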

Example 4 with PrimaryPos

Use of zemberek.core.turkish.PrimaryPos in project zemberek-nlp by ahmetaa.

From class FindPOS, method main:

public static void main(String[] args) {
    TurkishMorphology morphology = TurkishMorphology.createWithDefaults();
    String sentence = "Keşke yarın hava güzel olsa.";
    Log.info("Sentence  = " + sentence);
    // Analyze and disambiguate the whole sentence, then print each word
    // with the primary POS of its best analysis.
    SentenceAnalysis analysis = morphology.analyzeAndDisambiguate(sentence);
    for (SentenceWordAnalysis a : analysis) {
        PrimaryPos primaryPos = a.getBestAnalysis().getPos();
        Log.info("%s : %s ", a.getWordAnalysis().getInput(), primaryPos);
    }
}
Also used: PrimaryPos (zemberek.core.turkish.PrimaryPos), SentenceAnalysis (zemberek.morphology.analysis.SentenceAnalysis), TurkishMorphology (zemberek.morphology.TurkishMorphology), SentenceWordAnalysis (zemberek.morphology.analysis.SentenceWordAnalysis)

Aggregations

PrimaryPos (zemberek.core.turkish.PrimaryPos): 4
ArrayList (java.util.ArrayList): 2
TurkishMorphology (zemberek.morphology.TurkishMorphology): 2
SentenceAnalysis (zemberek.morphology.analysis.SentenceAnalysis): 2
RootLexicon (zemberek.morphology.lexicon.RootLexicon): 2
Stopwatch (com.google.common.base.Stopwatch): 1
IOException (java.io.IOException): 1
Collections (java.util.Collections): 1
HashSet (java.util.HashSet): 1
LinkedHashSet (java.util.LinkedHashSet): 1
List (java.util.List): 1
TimeUnit (java.util.concurrent.TimeUnit): 1
Collectors (java.util.stream.Collectors): 1
Log (zemberek.core.logging.Log): 1
TextUtil (zemberek.core.text.TextUtil): 1
StemAndEnding (zemberek.core.turkish.StemAndEnding): 1
Turkish (zemberek.core.turkish.Turkish): 1
TurkishAlphabet (zemberek.core.turkish.TurkishAlphabet): 1
AmbiguityResolver (zemberek.morphology.ambiguity.AmbiguityResolver): 1
PerceptronAmbiguityResolver (zemberek.morphology.ambiguity.PerceptronAmbiguityResolver): 1