
Example 26 with SentenceAnalysis

Use of zemberek.morphology.analysis.SentenceAnalysis in project zemberek-nlp by ahmetaa.

From class AmbiguityResolutionTests, method issue157ShouldNotThrowNPE.

@Test
public void issue157ShouldNotThrowNPE() {
    String input = "Yıldız Kızlar Dünya Şampiyonası FIVB'nin düzenlediği ve 18 " + "yaşının altındaki voleybolcuların katılabildiği bir şampiyonadır.";
    TurkishMorphology morphology = TurkishMorphology.createWithDefaults();
    SentenceAnalysis analysis = morphology.analyzeAndDisambiguate(input);
    Assert.assertEquals(TurkishTokenizer.DEFAULT.tokenize(input).size(), analysis.size());
    for (SentenceWordAnalysis sentenceWordAnalysis : analysis) {
        String token = sentenceWordAnalysis.getWordAnalysis().getInput();
        SingleAnalysis an = sentenceWordAnalysis.getBestAnalysis();
        System.out.println(token + " = " + an.formatLong());
    }
}
Also used: SingleAnalysis (zemberek.morphology.analysis.SingleAnalysis), SentenceAnalysis (zemberek.morphology.analysis.SentenceAnalysis), TurkishMorphology (zemberek.morphology.TurkishMorphology), SentenceWordAnalysis (zemberek.morphology.analysis.SentenceWordAnalysis), Test (org.junit.Test)

Example 27 with SentenceAnalysis

Use of zemberek.morphology.analysis.SentenceAnalysis in project zemberek-nlp by ahmetaa.

From class PerceptronAmbiguityResolverTrainer, method test.

public static void test(DataSet set, PerceptronAmbiguityResolver resolver) {
    int hit = 0, total = 0;
    Stopwatch sw = Stopwatch.createStarted();
    for (SentenceAnalysis sentence : set.sentences) {
        DecodeResult result = resolver.getDecoder().bestPath(sentence.ambiguousAnalysis());
        int i = 0;
        List<SingleAnalysis> bestExpected = sentence.bestAnalysis();
        for (SingleAnalysis bestActual : result.bestParse) {
            if (bestExpected.get(i).equals(bestActual)) {
                hit++;
            }
            total++;
            i++;
        }
    }
    Log.info("Elapsed: " + sw.elapsed(TimeUnit.MILLISECONDS));
    Log.info("Word count:" + total + " hit=" + hit + String.format(Locale.ENGLISH, " Accuracy:%f", hit * 1.0 / total));
}
Also used: DecodeResult (zemberek.morphology.ambiguity.PerceptronAmbiguityResolver.DecodeResult), SingleAnalysis (zemberek.morphology.analysis.SingleAnalysis), Stopwatch (com.google.common.base.Stopwatch), SentenceAnalysis (zemberek.morphology.analysis.SentenceAnalysis)
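The accuracy logged by the test method above is plain token-level accuracy: the number of words whose predicted best analysis equals the expected one, divided by the number of words compared. The following is a minimal sketch of that computation under stated assumptions: analyses are represented as plain strings rather than SingleAnalysis objects, and the names AccuracySketch and tokenAccuracy are hypothetical. Unlike the original loop, it checks that sentence and token counts match and returns 0.0 instead of NaN when there are no words.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not part of zemberek-nlp: analyses are plain strings here.
class AccuracySketch {

    // Compares expected and predicted analyses position by position in every sentence
    // and returns hits / total, or 0.0 when there is nothing to compare.
    static double tokenAccuracy(List<List<String>> gold, List<List<String>> predicted) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("Sentence counts differ");
        }
        int hit = 0;
        int total = 0;
        for (int s = 0; s < gold.size(); s++) {
            List<String> expected = gold.get(s);
            List<String> actual = predicted.get(s);
            if (expected.size() != actual.size()) {
                throw new IllegalArgumentException("Token counts differ in sentence " + s);
            }
            for (int i = 0; i < expected.size(); i++) {
                if (expected.get(i).equals(actual.get(i))) {
                    hit++;
                }
                total++;
            }
        }
        return total == 0 ? 0.0 : (double) hit / total;
    }

    public static void main(String[] args) {
        // Placeholder analysis labels; two of the three words are resolved correctly.
        List<List<String>> gold = List.of(List.of("A", "B"), List.of("C"));
        List<List<String>> predicted = List.of(List.of("A", "X"), List.of("C"));
        System.out.println(String.format(Locale.ENGLISH, "Accuracy:%f", tokenAccuracy(gold, predicted)));
        // prints Accuracy:0.666667
    }
}
```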

Example 28 with SentenceAnalysis

Use of zemberek.morphology.analysis.SentenceAnalysis in project zemberek-nlp by ahmetaa.

From class MorphologyConsole, method run.

@Override
public void run() {
    Builder b = TurkishMorphology.builder().setLexicon(RootLexicon.getDefault());
    if (disableUnknownAnalysis) {
        b.disableUnidentifiedTokenAnalyzer();
    }
    if (enableInformalWordAnalysis) {
        b.useInformalAnalysis();
    }
    TurkishMorphology morphology = b.build();
    String input;
    System.out.println("Enter word or sentence. Type `quit` or `Ctrl+C` to exit:");
    Scanner sc = new Scanner(System.in);
    input = sc.nextLine();
    while (!input.equals("quit")) {
        if (input.trim().length() == 0) {
            System.out.println("Empty line cannot be processed.");
            input = sc.nextLine();
            continue;
        }
        SentenceAnalysis analysis = morphology.analyzeAndDisambiguate(input);
        System.out.format("%nS:%s%n", input);
        for (SentenceWordAnalysis sw : analysis) {
            WordAnalysis wa = sw.getWordAnalysis();
            System.out.println(wa.getInput());
            SingleAnalysis best = sw.getBestAnalysis();
            for (SingleAnalysis singleAnalysis : wa) {
                boolean isBest = singleAnalysis.equals(best);
                if (wa.analysisCount() == 1) {
                    System.out.println(singleAnalysis.formatLong());
                } else {
                    System.out.format("%s%s%n", singleAnalysis.formatLong(), isBest ? "*" : "");
                }
            }
        }
        System.out.println();
        input = sc.nextLine();
    }
}
Also used: Scanner (java.util.Scanner), SingleAnalysis (zemberek.morphology.analysis.SingleAnalysis), SentenceWordAnalysis (zemberek.morphology.analysis.SentenceWordAnalysis), WordAnalysis (zemberek.morphology.analysis.WordAnalysis), Builder (zemberek.morphology.TurkishMorphology.Builder), SentenceAnalysis (zemberek.morphology.analysis.SentenceAnalysis), TurkishMorphology (zemberek.morphology.TurkishMorphology)
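For each word, the inner loop of run() prints every candidate analysis and appends "*" to the one chosen by the disambiguator, except when the word has only one analysis. Below is a minimal sketch of just that output rule, assuming candidates are plain strings instead of SingleAnalysis objects; the class CandidateMarker and its format method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of zemberek-nlp: candidates are plain strings.
class CandidateMarker {

    // Returns one output line per candidate. The best candidate gets a trailing "*",
    // but only when the word has more than one candidate analysis.
    static List<String> format(List<String> candidates, String best) {
        List<String> lines = new ArrayList<>();
        boolean ambiguous = candidates.size() > 1;
        for (String candidate : candidates) {
            boolean isBest = candidate.equals(best);
            lines.add(ambiguous && isBest ? candidate + "*" : candidate);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(format(List.of("Noun reading", "Verb reading"), "Verb reading"));
        // prints [Noun reading, Verb reading*]
        System.out.println(format(List.of("Noun reading"), "Noun reading"));
        // prints [Noun reading]
    }
}
```

Hoisting the ambiguity check out of the loop gives the same result as the original's per-iteration analysisCount() test, since the candidate count does not change while iterating.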

Example 29 with SentenceAnalysis

Use of zemberek.morphology.analysis.SentenceAnalysis in project zemberek-nlp by ahmetaa.

From class PreprocessTurkishCorpus, method replaceWordsWithLemma.

private String replaceWordsWithLemma(String sentence) {
    SentenceAnalysis analysis = morphology.analyzeAndDisambiguate(sentence);
    List<String> res = new ArrayList<>();
    for (SentenceWordAnalysis e : analysis) {
        SingleAnalysis best = e.getBestAnalysis();
        if (best.isUnknown()) {
            res.add(e.getWordAnalysis().getInput());
            continue;
        }
        List<String> lemmas = best.getLemmas();
        res.add(lemmas.get(0));
    }
    return String.join(" ", res);
}
Also used: SingleAnalysis (zemberek.morphology.analysis.SingleAnalysis), ArrayList (java.util.ArrayList), SentenceAnalysis (zemberek.morphology.analysis.SentenceAnalysis), SentenceWordAnalysis (zemberek.morphology.analysis.SentenceWordAnalysis)
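replaceWordsWithLemma keeps unknown words exactly as they appear in the input and replaces every other word with a lemma of its best analysis: this method takes the first lemma, while Example 30 below takes the last. The following sketch shows that rule without the zemberek dependency. WordInfo is a hypothetical stand-in for a disambiguated word, and the lemma lists in main are hand-written illustrations, assumed to be ordered from the root to the most derived form; they are not actual analyzer output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of zemberek-nlp.
class LemmaReplacer {

    // Stand-in for one disambiguated word.
    static final class WordInfo {
        final String surface;      // the word as it appeared in the input
        final List<String> lemmas; // assumed ordered from root to most derived form
        final boolean unknown;     // true when no analysis was found

        WordInfo(String surface, List<String> lemmas, boolean unknown) {
            this.surface = surface;
            this.lemmas = lemmas;
            this.unknown = unknown;
        }
    }

    // useLast == false picks lemmas.get(0), as in Example 29;
    // useLast == true picks lemmas.get(lemmas.size() - 1), as in Example 30.
    static String replaceWithLemmas(List<WordInfo> words, boolean useLast) {
        List<String> res = new ArrayList<>();
        for (WordInfo w : words) {
            if (w.unknown || w.lemmas.isEmpty()) {
                res.add(w.surface); // keep unknown words unchanged
                continue;
            }
            res.add(useLast ? w.lemmas.get(w.lemmas.size() - 1) : w.lemmas.get(0));
        }
        return String.join(" ", res);
    }

    public static void main(String[] args) {
        List<WordInfo> words = List.of(
            new WordInfo("okullar", List.of("okul"), false),
            new WordInfo("yazarlar", List.of("yaz", "yazar"), false),
            new WordInfo("xqz", List.of(), true));
        System.out.println(replaceWithLemmas(words, false)); // prints okul yaz xqz
        System.out.println(replaceWithLemmas(words, true));  // prints okul yazar xqz
    }
}
```

Taking the first lemma collapses derived words onto their root, which shrinks the vocabulary more aggressively; taking the last keeps derivations such as "yazar" distinct from "yaz".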

Example 30 with SentenceAnalysis

Use of zemberek.morphology.analysis.SentenceAnalysis in project zemberek-nlp by ahmetaa.

From class QuestionClassifier, method splitWords.

private String splitWords(String sentence) {
    List<String> tokens = Splitter.on(" ").splitToList(sentence);
    // assume first is label. Remove label from sentence for morphological analysis.
    String label = tokens.get(0);
    tokens = tokens.subList(1, tokens.size());
    sentence = String.join(" ", tokens);
    if (sentence.length() == 0) {
        return sentence;
    }
    SentenceAnalysis analysis = morphology.analyzeAndDisambiguate(sentence);
    List<String> res = new ArrayList<>();
    // add label first.
    res.add(label);
    for (SentenceWordAnalysis e : analysis) {
        SingleAnalysis best = e.getBestAnalysis();
        if (best.isUnknown()) {
            res.add(e.getWordAnalysis().getInput());
            continue;
        }
        List<String> lemmas = best.getLemmas();
        res.add(lemmas.get(lemmas.size() - 1));
    }
    return String.join(" ", res);
}
Also used: SingleAnalysis (zemberek.morphology.analysis.SingleAnalysis), ArrayList (java.util.ArrayList), SentenceAnalysis (zemberek.morphology.analysis.SentenceAnalysis), SentenceWordAnalysis (zemberek.morphology.analysis.SentenceWordAnalysis)
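splitWords expects each line to begin with a class label followed by the question text. Only the text goes through morphological analysis, and the label is put back in front of the result. Below is a minimal sketch of the label split and rejoin with the analysis step left out. LabeledLine, parse and rebuild are hypothetical names, and the "__label__" prefix in main is an assumed fastText-style format, not one confirmed by the source. In the method above, a line containing only a label returns an empty string, so the label is dropped; this sketch deliberately keeps it instead.

```java
import java.util.List;

// Hypothetical sketch, not part of zemberek-nlp.
class LabeledLine {
    final String label; // first space-separated token
    final String text;  // everything after the first space (may be empty)

    LabeledLine(String label, String text) {
        this.label = label;
        this.text = text;
    }

    // Splits at the first space; equivalent to taking tokens.get(0) after splitting
    // on " " and joining the remaining tokens back with " ".
    static LabeledLine parse(String line) {
        int space = line.indexOf(' ');
        if (space < 0) {
            return new LabeledLine(line, "");
        }
        return new LabeledLine(line.substring(0, space), line.substring(space + 1));
    }

    // Puts the label back in front of the processed words.
    // Unlike splitWords, a label-only line keeps its label.
    static String rebuild(String label, List<String> processedWords) {
        if (processedWords.isEmpty()) {
            return label;
        }
        return label + " " + String.join(" ", processedWords);
    }

    public static void main(String[] args) {
        LabeledLine l = parse("__label__time ders ne zaman");
        System.out.println(l.label); // prints __label__time
        System.out.println(l.text);  // prints ders ne zaman
        System.out.println(rebuild(l.label, List.of("ders", "ne", "zaman")));
        // prints __label__time ders ne zaman
    }
}
```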

Aggregations

SentenceAnalysis (zemberek.morphology.analysis.SentenceAnalysis): 35
SingleAnalysis (zemberek.morphology.analysis.SingleAnalysis): 22
SentenceWordAnalysis (zemberek.morphology.analysis.SentenceWordAnalysis): 19
WordAnalysis (zemberek.morphology.analysis.WordAnalysis): 19
ArrayList (java.util.ArrayList): 13
TurkishMorphology (zemberek.morphology.TurkishMorphology): 10
PrintWriter (java.io.PrintWriter): 5
Histogram (zemberek.core.collections.Histogram): 5
Test (org.junit.Test): 4
Token (zemberek.tokenization.Token): 4
Stopwatch (com.google.common.base.Stopwatch): 3
IOException (java.io.IOException): 3
Path (java.nio.file.Path): 3
Ignore (org.junit.Ignore): 3
Log (zemberek.core.logging.Log): 3
Lists (com.google.common.collect.Lists): 2
Paths (java.nio.file.Paths): 2
Collections (java.util.Collections): 2
LinkedHashSet (java.util.LinkedHashSet): 2
List (java.util.List): 2