Example 11 with TurkishMorphology

Use of zemberek.morphology.analysis.tr.TurkishMorphology in the project zemberek-nlp by ahmetaa.

From the class Serializer, method serializeDeserializeTest.

private static void serializeDeserializeTest() throws IOException {
    TurkishMorphology morphology = TurkishMorphology.createWithDefaults();
    RootLexicon lexicon = morphology.getLexicon();
    Dictionary.Builder builder = Dictionary.newBuilder();
    for (DictionaryItem item : lexicon.getAllItems()) {
        builder.addItems(convertToProto(item));
    }
    Dictionary dictionary = builder.build();
    System.out.println("Total size of serialized dictionary: " + dictionary.getSerializedSize());
    File f = new File("lexicon.bin");
    // try-with-resources closes the stream even if the write fails.
    try (BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(f))) {
        bos.write(dictionary.toByteArray());
    }
    long start = System.currentTimeMillis();
    byte[] serialized = Files.readAllBytes(f.toPath());
    long end = System.currentTimeMillis();
    Log.info("Dictionary loaded in %d ms.", (end - start));
    start = System.currentTimeMillis();
    Dictionary readDictionary = Dictionary.parseFrom(serialized);
    end = System.currentTimeMillis();
    Log.info("Dictionary deserialized in %d ms.", (end - start));
    System.out.println("Total size of read dictionary: " + readDictionary.getSerializedSize());
    start = System.currentTimeMillis();
    RootLexicon loadedLexicon = new RootLexicon();
    for (LexiconProto.DictionaryItem item : readDictionary.getItemsList()) {
        loadedLexicon.add(convertToDictionaryItem(item));
    }
    end = System.currentTimeMillis();
    Log.info("RootLexicon generated in %d ms.", (end - start));
}
Also used: LexiconProto (zemberek.morphology.lexicon.proto.LexiconProto), Dictionary (zemberek.morphology.lexicon.proto.LexiconProto.Dictionary), FileOutputStream (java.io.FileOutputStream), TurkishMorphology (zemberek.morphology.analysis.tr.TurkishMorphology), File (java.io.File), BufferedOutputStream (java.io.BufferedOutputStream)
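
The write, read, and rebuild steps timed in the method above can be sketched without the zemberek or protobuf dependencies. This is only an illustration under stated assumptions: the class RoundTripSketch, its save and load helpers, and the length-prefixed string format are hypothetical stand-ins for the protobuf Dictionary message, and a plain list of strings stands in for the RootLexicon items.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the lexicon round trip; not part of zemberek.
class RoundTripSketch {

    // Writes the entry count, then each entry as a length-prefixed UTF string.
    static void save(List<String> entries, Path path) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(path)))) {
            out.writeInt(entries.size());
            for (String entry : entries) {
                out.writeUTF(entry);
            }
        }
    }

    // Reads the whole file into memory first, then decodes it,
    // mirroring the separate "load" and "deserialize" steps above.
    static List<String> load(Path path) throws IOException {
        byte[] serialized = Files.readAllBytes(path);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
            int count = in.readInt();
            List<String> entries = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                entries.add(in.readUTF());
            }
            return entries;
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("lexicon-sketch", ".bin");
        List<String> entries = Arrays.asList("elma", "kitap", "demek", "evet");

        long start = System.nanoTime();
        save(entries, path);
        List<String> loaded = load(path);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("Entries match: " + entries.equals(loaded));
        System.out.println("Round trip took " + elapsedMs + " ms");
        Files.delete(path);
    }
}
```

System.nanoTime is used here instead of System.currentTimeMillis because it is monotonic, which makes it the safer choice for measuring short intervals.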

Example 12 with TurkishMorphology

Use of zemberek.morphology.analysis.tr.TurkishMorphology in the project zemberek-nlp by ahmetaa.

From the class Serializer, method createDefaultDictionary.

public static void createDefaultDictionary(Path path) throws IOException {
    TurkishMorphology morphology = TurkishMorphology.builder().addDefaultDictionaries().build();
    save(morphology.getLexicon(), path);
}
Also used: TurkishMorphology (zemberek.morphology.analysis.tr.TurkishMorphology)

Example 13 with TurkishMorphology

Use of zemberek.morphology.analysis.tr.TurkishMorphology in the project zemberek-nlp by ahmetaa.

From the class WordAnalysisFormatterTest, method formatKnownProperNounsNoQuote.

@Test
public void formatKnownProperNounsNoQuote() throws IOException {
    TurkishMorphology morphology = TurkishMorphology.builder().addDictionaryLines("Blah [A:NoQuote]").build();
    String[] inputs = { "blaha", "Blahta" };
    String[] expected = { "Blaha", "Blahta" };
    check(morphology, inputs, expected);
}
Also used: TurkishMorphology (zemberek.morphology.analysis.tr.TurkishMorphology), Test (org.junit.Test)

Example 14 with TurkishMorphology

Use of zemberek.morphology.analysis.tr.TurkishMorphology in the project zemberek-nlp by ahmetaa.

From the class WordAnalysisFormatterTest, method formatKnownProperNouns.

@Test
public void formatKnownProperNouns() throws IOException {
    TurkishMorphology morphology = TurkishMorphology.builder().addDictionaryLines("Ankara", "Iphone [Pr:ayfon]", "Google [Pr:gugıl]").build();
    String[] inputs = { "ankarada", "ıphonumun", "googledan", "Iphone", "Google", "Googlesa" };
    String[] expected = { "Ankara'da", "Iphone'umun", "Google'dan", "Iphone", "Google", "Google'sa" };
    check(morphology, inputs, expected);
}
Also used: TurkishMorphology (zemberek.morphology.analysis.tr.TurkishMorphology), Test (org.junit.Test)
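
The input and expected pairs in the two proper-noun tests follow a visible pattern: the dictionary form of the noun is restored with its capitalization, and the suffixes are attached after an apostrophe (or directly, for the NoQuote entry). The sketch below reproduces only that pattern for the simple cases. It is not how WordAnalysisFormatter works internally: the class ProperNounFormatSketch and its format method are hypothetical, the lemma is passed in by hand instead of coming from a morphological analysis, and an empty apostrophe string stands in for the NoQuote attribute. Inputs whose stem changes on the surface, such as "ıphonumun", are out of scope and returned unchanged.

```java
import java.util.Locale;

// Hypothetical illustration of the expected test outputs; not zemberek code.
class ProperNounFormatSketch {

    // Turkish locale so that "I" lowercases to dotless "ı" and length is preserved.
    private static final Locale TR = Locale.forLanguageTag("tr");

    // Returns lemma + apostrophe + ending when the surface form starts with the lemma
    // (compared case-insensitively), the bare lemma when there is no ending,
    // and the unchanged surface form when the stem does not match.
    static String format(String lemma, String surface, String apostrophe) {
        String lowerLemma = lemma.toLowerCase(TR);
        String lowerSurface = surface.toLowerCase(TR);
        if (!lowerSurface.startsWith(lowerLemma)) {
            return surface;
        }
        String ending = lowerSurface.substring(lowerLemma.length());
        return ending.isEmpty() ? lemma : lemma + apostrophe + ending;
    }

    public static void main(String[] args) {
        System.out.println(format("Ankara", "ankarada", "'"));
        System.out.println(format("Google", "Googlesa", "'"));
        System.out.println(format("Blah", "blaha", ""));
    }
}
```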

Example 15 with TurkishMorphology

Use of zemberek.morphology.analysis.tr.TurkishMorphology in the project zemberek-nlp by ahmetaa.

From the class WordAnalysisFormatterTest, method formatNonProperNoun.

@Test
public void formatNonProperNoun() throws IOException {
    TurkishMorphology morphology = TurkishMorphology.builder().addDictionaryLines("elma", "kitap", "demek", "evet").build();
    String[] inputs = { "elmamadaki", "elma", "kitaplarımdan", "kitabımızsa", "diyebileceğimiz", "dedi", "evet" };
    WordAnalysisFormatter formatter = new WordAnalysisFormatter();
    for (String input : inputs) {
        List<WordAnalysis> results = morphology.analyze(input);
        for (WordAnalysis result : results) {
            Assert.assertEquals(input, formatter.format(result, "'"));
        }
    }
}
Also used: TurkishMorphology (zemberek.morphology.analysis.tr.TurkishMorphology), Test (org.junit.Test)

Aggregations

TurkishMorphology (zemberek.morphology.analysis.tr.TurkishMorphology): 26
Test (org.junit.Test): 13
Ignore (org.junit.Ignore): 5
Z3MarkovModelDisambiguator (zemberek.morphology.ambiguity.Z3MarkovModelDisambiguator): 5
WordAnalysis (zemberek.morphology.analysis.WordAnalysis): 5
TurkishSentenceAnalyzer (zemberek.morphology.analysis.tr.TurkishSentenceAnalyzer): 5
Path (java.nio.file.Path): 2
UnidentifiedTokenAnalyzer (zemberek.morphology.analysis.tr.UnidentifiedTokenAnalyzer): 2
DictionaryItem (zemberek.morphology.lexicon.DictionaryItem): 2
Stopwatch (com.google.common.base.Stopwatch): 1
BufferedOutputStream (java.io.BufferedOutputStream): 1
File (java.io.File): 1
FileOutputStream (java.io.FileOutputStream): 1
ArrayList (java.util.ArrayList): 1
LinkedHashSet (java.util.LinkedHashSet): 1
Before (org.junit.Before): 1
SentenceAnalysis (zemberek.morphology.analysis.SentenceAnalysis): 1
LexiconProto (zemberek.morphology.lexicon.proto.LexiconProto): 1
Dictionary (zemberek.morphology.lexicon.proto.LexiconProto.Dictionary): 1
TurkishSpellChecker (zemberek.normalization.TurkishSpellChecker): 1