
Example 96 with WordAnalysis

Use of zemberek.morphology.analysis.WordAnalysis in project zemberek-nlp by ahmetaa.

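From class WordSimilarityConsole, method run. It loads word vectors and then reads words from standard input. For each word it prints the 30 nearest neighbours, and on a second line it prints the neighbours that TurkishMorphology cannot parse. Typing exit or quit ends the loop.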

void run(Path vectorFile, Path vocabFile) throws IOException {
    TurkishMorphology morphology = TurkishMorphology.createWithDefaults();
    System.out.println("Loading from " + vectorFile);
    WordVectorLookup lookup = WordVectorLookup.loadFromBinaryFast(vectorFile, vocabFile);
    WordVectorLookup.DistanceMatcher distanceMatcher = new WordVectorLookup.DistanceMatcher(lookup);
    System.out.println("Enter word:");
    Scanner sc = new Scanner(System.in);
    String input = sc.nextLine();
    // Read words until the user types "exit" or "quit".
    while (!input.equals("exit") && !input.equals("quit")) {
        if (!lookup.containsWord(input)) {
            Log.info(input + " cannot be found.");
            input = sc.nextLine();
            continue;
        }
        // The 30 words closest to the input in the embedding space.
        List<WordDistances.Distance> distances = distanceMatcher.nearestK(input, 30);
        List<String> dist = distances.stream().map(d -> d.word).collect(Collectors.toList());
        System.out.println(String.join(" ", dist));
        // Keep the neighbours the analyzer cannot parse: no valid analysis,
        // or a single analysis whose dictionary item has an Unknown POS.
        // (The original tested an.isCorrect() here, which collected parseable words instead.)
        List<String> noParse = new ArrayList<>();
        for (String s : dist) {
            WordAnalysis an = morphology.analyze(s);
            boolean unknownOnly = an.analysisCount() == 1
                && an.getAnalysisResults().get(0).getDictionaryItem().primaryPos == PrimaryPos.Unknown;
            if (!an.isCorrect() || unknownOnly) {
                noParse.add(s);
            }
        }
        System.out.println(String.join(" ", noParse));
        input = sc.nextLine();
    }
}
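The call distanceMatcher.nearestK(input, 30) returns the 30 words closest to the input in the embedding space. Its internals are not shown on this page, so the following is only a minimal brute-force sketch of the same idea. It assumes cosine similarity over an in-memory Map<String, float[]>, and both of these are assumptions. The class name NearestNeighbors is hypothetical, and the sketch does not use zemberek.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a nearest-neighbour lookup; NOT zemberek's implementation.
class NearestNeighbors {

    // Cosine similarity between two vectors of equal length.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) {
            return 0; // a zero vector has no direction
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns up to k words closest to `query` (most similar first),
    // excluding the query itself. Brute force: one similarity per vocabulary word.
    static List<String> nearestK(Map<String, float[]> vectors, String query, int k) {
        float[] q = vectors.get(query);
        if (q == null) {
            return new ArrayList<>(); // unknown word, like the containsWord() check in run()
        }
        List<String> candidates = new ArrayList<>();
        for (String w : vectors.keySet()) {
            if (!w.equals(query)) {
                candidates.add(w);
            }
        }
        candidates.sort(Comparator.comparingDouble((String w) -> cosine(q, vectors.get(w))).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional vectors, made up for illustration.
        Map<String, float[]> vectors = new LinkedHashMap<>();
        vectors.put("kedi", new float[] {1f, 0.9f, 0f});
        vectors.put("kuzu", new float[] {0.9f, 1f, 0.1f});
        vectors.put("araba", new float[] {0f, 0.1f, 1f});
        vectors.put("kediler", new float[] {1f, 0.8f, 0f});
        System.out.println(String.join(" ", nearestK(vectors, "kedi", 2))); // prints "kediler kuzu"
    }
}
```

A linear scan like this costs one similarity computation per vocabulary word per query. That is usually tolerable for an interactive console tool. A large vocabulary queried often would normally call for precomputed normalized vectors or an approximate index.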
Also used: Scanner (java.util.Scanner), WordAnalysis (zemberek.morphology.analysis.WordAnalysis), ArrayList (java.util.ArrayList), TurkishMorphology (zemberek.morphology.TurkishMorphology)

Aggregations

WordAnalysis (zemberek.morphology.analysis.WordAnalysis): 96
Test (org.junit.Test): 42
SingleAnalysis (zemberek.morphology.analysis.SingleAnalysis): 36
TurkishMorphology (zemberek.morphology.TurkishMorphology): 22
ArrayList (java.util.ArrayList): 21
SentenceAnalysis (zemberek.morphology.analysis.SentenceAnalysis): 19
LinkedHashSet (java.util.LinkedHashSet): 13
Ignore (org.junit.Ignore): 13
Histogram (zemberek.core.collections.Histogram): 12
Path (java.nio.file.Path): 11
PrintWriter (java.io.PrintWriter): 10
SentenceWordAnalysis (zemberek.morphology.analysis.SentenceWordAnalysis): 10
IOException (java.io.IOException): 6
HashSet (java.util.HashSet): 6
List (java.util.List): 6
WordAnalyzer (zemberek.morphology.analysis.WordAnalyzer): 6
SimpleGenerator (zemberek.morphology.generator.SimpleGenerator): 6
DictionaryItem (zemberek.morphology.lexicon.DictionaryItem): 6
DynamicLexiconGraph (zemberek.morphology.lexicon.graph.DynamicLexiconGraph): 6
Log (zemberek.core.logging.Log): 5