Example 31 with SingleAnalysis

Use of zemberek.morphology.analysis.SingleAnalysis in project zemberek-nlp by ahmetaa.

From the class AddNewDictionaryItem, method printResults.

private void printResults(WordAnalysis results) {
    int i = 1;
    if (results.analysisCount() == 0) {
        Log.info("No Analysis.");
    }
    for (SingleAnalysis result : results) {
        String str = result.formatLong();
        if (result.getDictionaryItem().attributes.contains(RootAttribute.Runtime)) {
            str = str + " (Generated by UnidentifiedTokenParser)";
        }
        Log.info(i + " - " + str);
        i++;
    }
}
Also used : SingleAnalysis(zemberek.morphology.analysis.SingleAnalysis)

Example 32 with SingleAnalysis

Use of zemberek.morphology.analysis.SingleAnalysis in project zemberek-nlp by ahmetaa.

From the class StemmingAndLemmatization, method main.

public static void main(String[] args) {
    TurkishMorphology morphology = TurkishMorphology.createWithDefaults();
    String word = "kutucuğumuz";
    Log.info("Word = " + word);
    Log.info("Results: ");
    WordAnalysis results = morphology.analyze(word);
    for (SingleAnalysis result : results) {
        Log.info(result.formatLong());
        Log.info("\tStems = " + result.getStems());
        Log.info("\tLemmas = " + result.getLemmas());
    }
}
Also used : SingleAnalysis(zemberek.morphology.analysis.SingleAnalysis) WordAnalysis(zemberek.morphology.analysis.WordAnalysis) TurkishMorphology(zemberek.morphology.TurkishMorphology)

Example 33 with SingleAnalysis

Use of zemberek.morphology.analysis.SingleAnalysis in project zemberek-nlp by ahmetaa.

From the class TurkishMorphology, method analyzeWordsWithApostrophe.

public List<SingleAnalysis> analyzeWordsWithApostrophe(String word) {
    int index = word.indexOf('\'');
    if (index <= 0 || index == word.length() - 1) {
        return Collections.emptyList();
    }
    StemAndEnding se = new StemAndEnding(word.substring(0, index), word.substring(index + 1));
    String stem = TurkishAlphabet.INSTANCE.normalize(se.stem);
    String withoutQuote = word.replace("'", "");
    List<SingleAnalysis> noQuotesParses = analyzer.analyze(withoutQuote);
    if (noQuotesParses.isEmpty()) {
        return Collections.emptyList();
    }
    // Words like "Hastanesi'ne": should we accept Hastanesi or Hastane?
    // Keep noun analyses that either contain the p3sg morpheme
    // or whose stem equals the normalized text before the apostrophe.
    return noQuotesParses.stream()
        .filter(a -> a.getDictionaryItem().primaryPos == PrimaryPos.Noun
            && (a.containsMorpheme(TurkishMorphotactics.p3sg) || a.getStem().equals(stem)))
        .collect(Collectors.toList());
}
Also used : StemAndEnding(zemberek.core.turkish.StemAndEnding) AmbiguityResolver(zemberek.morphology.ambiguity.AmbiguityResolver) TurkishMorphotactics(zemberek.morphology.morphotactics.TurkishMorphotactics) TextUtil(zemberek.core.text.TextUtil) Stopwatch(com.google.common.base.Stopwatch) SentenceAnalysis(zemberek.morphology.analysis.SentenceAnalysis) ArrayList(java.util.ArrayList) Turkish(zemberek.core.turkish.Turkish) Token(zemberek.tokenization.Token) SingleAnalysis(zemberek.morphology.analysis.SingleAnalysis) PrimaryPos(zemberek.core.turkish.PrimaryPos) TurkishTokenizer(zemberek.tokenization.TurkishTokenizer) AnalysisCache(zemberek.morphology.analysis.AnalysisCache) Log(zemberek.core.logging.Log) InformalTurkishMorphotactics(zemberek.morphology.morphotactics.InformalTurkishMorphotactics) RuleBasedAnalyzer(zemberek.morphology.analysis.RuleBasedAnalyzer) WordGenerator(zemberek.morphology.generator.WordGenerator) IOException(java.io.IOException) PerceptronAmbiguityResolver(zemberek.morphology.ambiguity.PerceptronAmbiguityResolver) Collectors(java.util.stream.Collectors) WordAnalysis(zemberek.morphology.analysis.WordAnalysis) TimeUnit(java.util.concurrent.TimeUnit) List(java.util.List) TurkishAlphabet(zemberek.core.turkish.TurkishAlphabet) RootLexicon(zemberek.morphology.lexicon.RootLexicon) Collections(java.util.Collections) UnidentifiedTokenAnalyzer(zemberek.morphology.analysis.UnidentifiedTokenAnalyzer)
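
The guard at the top of analyzeWordsWithApostrophe can be tried in isolation. The following is a minimal sketch, not Zemberek code: the class ApostropheSplitSketch and its Parts holder are hypothetical names, and the sketch leaves out the TurkishAlphabet normalization and the morphological filtering that the real method performs. It only illustrates which inputs are rejected before any analysis happens: a word with no apostrophe, with a leading apostrophe, or with a trailing one.

```java
import java.util.Optional;

class ApostropheSplitSketch {

    /** Hypothetical holder for the two halves; this is not Zemberek's StemAndEnding. */
    static final class Parts {
        final String stem;
        final String ending;

        Parts(String stem, String ending) {
            this.stem = stem;
            this.ending = ending;
        }
    }

    /**
     * Splits a word at its first apostrophe, using the same index check as the
     * method above. Returns empty when there is no apostrophe, or when the
     * apostrophe is the first or the last character.
     */
    static Optional<Parts> split(String word) {
        int index = word.indexOf('\'');
        if (index <= 0 || index == word.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(new Parts(word.substring(0, index), word.substring(index + 1)));
    }

    public static void main(String[] args) {
        String[] samples = {"Hastanesi'ne", "'ne", "Ankara'", "kitap"};
        for (String w : samples) {
            String out = split(w).map(p -> p.stem + " + " + p.ending).orElse("(no split)");
            System.out.println(w + " -> " + out);
        }
    }
}
```

Returning an Optional here stands in for the empty list that the original method returns when the input cannot be split.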

Example 34 with SingleAnalysis

Use of zemberek.morphology.analysis.SingleAnalysis in project zemberek-nlp by ahmetaa.

From the class AmbiguityResolutionTests, method issue157ShouldNotThrowNPE.

@Test
public void issue157ShouldNotThrowNPE() {
    String input = "Yıldız Kızlar Dünya Şampiyonası FIVB'nin düzenlediği ve 18 " + "yaşının altındaki voleybolcuların katılabildiği bir şampiyonadır.";
    TurkishMorphology morphology = TurkishMorphology.createWithDefaults();
    SentenceAnalysis analysis = morphology.analyzeAndDisambiguate(input);
    Assert.assertEquals(TurkishTokenizer.DEFAULT.tokenize(input).size(), analysis.size());
    for (SentenceWordAnalysis sentenceWordAnalysis : analysis) {
        String token = sentenceWordAnalysis.getWordAnalysis().getInput();
        SingleAnalysis an = sentenceWordAnalysis.getBestAnalysis();
        System.out.println(token + " = " + an.formatLong());
    }
}
Also used : SingleAnalysis(zemberek.morphology.analysis.SingleAnalysis) SentenceAnalysis(zemberek.morphology.analysis.SentenceAnalysis) TurkishMorphology(zemberek.morphology.TurkishMorphology) SentenceWordAnalysis(zemberek.morphology.analysis.SentenceWordAnalysis) Test(org.junit.Test)

Example 35 with SingleAnalysis

Use of zemberek.morphology.analysis.SingleAnalysis in project zemberek-nlp by ahmetaa.

From the class PerceptronAmbiguityResolverTrainer, method test.

public static void test(DataSet set, PerceptronAmbiguityResolver resolver) {
    int hit = 0, total = 0;
    Stopwatch sw = Stopwatch.createStarted();
    for (SentenceAnalysis sentence : set.sentences) {
        DecodeResult result = resolver.getDecoder().bestPath(sentence.ambiguousAnalysis());
        int i = 0;
        List<SingleAnalysis> bestExpected = sentence.bestAnalysis();
        for (SingleAnalysis bestActual : result.bestParse) {
            if (bestExpected.get(i).equals(bestActual)) {
                hit++;
            }
            total++;
            i++;
        }
    }
    Log.info("Elapsed: " + sw.elapsed(TimeUnit.MILLISECONDS));
    Log.info("Word count:" + total + " hit=" + hit + String.format(Locale.ENGLISH, " Accuracy:%f", hit * 1.0 / total));
}
Also used : DecodeResult(zemberek.morphology.ambiguity.PerceptronAmbiguityResolver.DecodeResult) SingleAnalysis(zemberek.morphology.analysis.SingleAnalysis) Stopwatch(com.google.common.base.Stopwatch) SentenceAnalysis(zemberek.morphology.analysis.SentenceAnalysis)
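
The hit/total counting in the method above can be sketched without the Zemberek types. This is an illustrative stand-in under stated assumptions, not the project's code: AccuracySketch and accuracy are hypothetical names, the analyses are represented by made-up strings, and only a single pair of sequences is compared rather than a whole data set. Unlike the original expression `hit * 1.0 / total`, which evaluates to NaN when no words were counted, this sketch returns 0.0 for an empty comparison and compares only the overlapping prefix when the two lists differ in length.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class AccuracySketch {

    /**
     * Fraction of positions at which the two sequences hold equal elements.
     * Only the overlapping prefix is compared; an empty comparison yields 0.0.
     */
    static <T> double accuracy(List<T> expected, List<T> actual) {
        int total = Math.min(expected.size(), actual.size());
        if (total == 0) {
            return 0.0;
        }
        int hit = 0;
        for (int i = 0; i < total; i++) {
            if (expected.get(i).equals(actual.get(i))) {
                hit++;
            }
        }
        return (double) hit / total;
    }

    public static void main(String[] args) {
        // Made-up labels standing in for SingleAnalysis objects.
        List<String> expected = Arrays.asList("kutu:Noun", "ev:Noun", "gel:Verb", "ve:Conj");
        List<String> actual = Arrays.asList("kutu:Noun", "ev:Adj", "gel:Verb", "ve:Conj");
        System.out.println(String.format(Locale.ENGLISH, "Accuracy:%f", accuracy(expected, actual)));
    }
}
```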

Aggregations

SingleAnalysis (zemberek.morphology.analysis.SingleAnalysis) 55
WordAnalysis (zemberek.morphology.analysis.WordAnalysis) 38
ArrayList (java.util.ArrayList) 25
SentenceAnalysis (zemberek.morphology.analysis.SentenceAnalysis) 23
TurkishMorphology (zemberek.morphology.TurkishMorphology) 21
SentenceWordAnalysis (zemberek.morphology.analysis.SentenceWordAnalysis) 18
Test (org.junit.Test) 15
LinkedHashSet (java.util.LinkedHashSet) 13
PrintWriter (java.io.PrintWriter) 10
Path (java.nio.file.Path) 10
Histogram (zemberek.core.collections.Histogram) 10
Token (zemberek.tokenization.Token) 7
IOException (java.io.IOException) 6
Ignore (org.junit.Ignore) 6
Log (zemberek.core.logging.Log) 6
HashSet (java.util.HashSet) 5
List (java.util.List) 5
Collectors (java.util.stream.Collectors) 5
Paths (java.nio.file.Paths) 4
Files (java.nio.file.Files) 3