
Example 1 with TurkishMorphology

Use of zemberek.morphology.TurkishMorphology in project zemberek-nlp by ahmetaa.

From the class DictionarySerializer, method serializeDeserializeTest.

private static void serializeDeserializeTest() throws IOException {
    // Convert every item of the default lexicon to its protobuf form.
    TurkishMorphology morphology = TurkishMorphology.createWithDefaults();
    RootLexicon lexicon = morphology.getLexicon();
    Dictionary.Builder builder = Dictionary.newBuilder();
    for (DictionaryItem item : lexicon.getAllItems()) {
        builder.addItems(convertToProto(item));
    }
    Dictionary dictionary = builder.build();
    System.out.println("Total size of serialized dictionary: " + dictionary.getSerializedSize());
    // Write the serialized dictionary to a temporary file.
    // try-with-resources closes the stream even if the write fails.
    Path f = Files.createTempFile("lexicon", ".bin");
    try (BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(f.toFile()))) {
        bos.write(dictionary.toByteArray());
    }
    // Time reading the raw bytes back from disk.
    long start = System.currentTimeMillis();
    byte[] serialized = Files.readAllBytes(f);
    long end = System.currentTimeMillis();
    Log.info("Dictionary loaded in %d ms.", (end - start));
    // Time parsing the bytes into the protobuf message.
    start = System.currentTimeMillis();
    Dictionary readDictionary = Dictionary.parseFrom(serialized);
    end = System.currentTimeMillis();
    Log.info("Dictionary deserialized in %d ms.", (end - start));
    System.out.println("Total size of read dictionary: " + readDictionary.getSerializedSize());
    // Time rebuilding a RootLexicon from the parsed items.
    start = System.currentTimeMillis();
    RootLexicon loadedLexicon = new RootLexicon();
    for (LexiconProto.DictionaryItem item : readDictionary.getItemsList()) {
        loadedLexicon.add(convertToDictionaryItem(item));
    }
    end = System.currentTimeMillis();
    Log.info("RootLexicon generated in %d ms.", (end - start));
}
Also used: Path (java.nio.file.Path), LexiconProto (zemberek.morphology.lexicon.proto.LexiconProto), Dictionary (zemberek.morphology.lexicon.proto.LexiconProto.Dictionary), FileOutputStream (java.io.FileOutputStream), TurkishMorphology (zemberek.morphology.TurkishMorphology), BufferedOutputStream (java.io.BufferedOutputStream)
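
The write-then-read timing pattern in Example 1 does not depend on Zemberek or protobuf. The following is a minimal standard-library sketch of that pattern, not code from zemberek-nlp: the class name RoundTripTiming and the method roundTrip are hypothetical, and an arbitrary byte array stands in for the serialized dictionary. It assumes that System.nanoTime() is acceptable for measuring elapsed time and that the temporary file can be deleted once it has been read.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of zemberek-nlp.
final class RoundTripTiming {

    // Writes the payload to a temporary file, reads it back, prints the
    // read time, and returns the bytes that were read.
    static byte[] roundTrip(byte[] payload) {
        try {
            Path file = Files.createTempFile("roundtrip", ".bin");
            try {
                // try-with-resources flushes and closes the stream on exit.
                try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(file))) {
                    out.write(payload);
                }
                long start = System.nanoTime();
                byte[] read = Files.readAllBytes(file);
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.println("Read " + read.length + " bytes in " + elapsedMs + " ms.");
                return read;
            } finally {
                // Remove the temporary file whether or not the read succeeded.
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[1 << 16];
        Arrays.fill(payload, (byte) 42);
        byte[] read = roundTrip(payload);
        System.out.println("Round trip intact: " + Arrays.equals(payload, read));
    }
}
```

Wrapping IOException in UncheckedIOException is a design choice made here to keep the call site short; the original method simply declares throws IOException.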

Example 2 with TurkishMorphology

Use of zemberek.morphology.TurkishMorphology in project zemberek-nlp by ahmetaa.

From the class AmbiguityResolutionTests, method shouldNotThrowException.

@Test
public void shouldNotThrowException() throws IOException {
    List<String> lines = TextIO.loadLinesFromResource("corpora/cnn-turk-10k");
    lines = lines.subList(0, 1000);
    TurkishMorphology morphology = TurkishMorphology.createWithDefaults();
    for (String line : lines) {
        List<String> sentences = TurkishSentenceExtractor.DEFAULT.fromParagraph(line);
        for (String sentence : sentences) {
            morphology.analyzeAndDisambiguate(sentence);
        }
    }
}
Also used: TurkishMorphology (zemberek.morphology.TurkishMorphology), Test (org.junit.Test)

Example 3 with TurkishMorphology

Use of zemberek.morphology.TurkishMorphology in project zemberek-nlp by ahmetaa.

From the class CharacterGraphDecoderTest, method stemEndingTest2.

@Test
public void stemEndingTest2() {
    TurkishMorphology morphology = TurkishMorphology.builder().setLexicon("üzmek", "yüz", "güz").build();
    List<String> endings = Lists.newArrayList("düm");
    StemEndingGraph graph = new StemEndingGraph(morphology, endings);
    CharacterGraphDecoder spellChecker = new CharacterGraphDecoder(graph.stemGraph);
    List<ScoredItem<String>> res = spellChecker.getSuggestionsWithScores("yüzdüm");
    Assert.assertEquals(3, res.size());
    assertContainsAll(res, "yüzdüm", "üzdüm", "güzdüm");
}
Also used: ScoredItem (zemberek.core.ScoredItem), TurkishMorphology (zemberek.morphology.TurkishMorphology), Test (org.junit.Test)

Example 4 with TurkishMorphology

Use of zemberek.morphology.TurkishMorphology in project zemberek-nlp by ahmetaa.

From the class CharacterGraphDecoderTest, method stemEndingTest3.

@Test
public void stemEndingTest3() {
    TurkishMorphology morphology = TurkishMorphology.builder().setLexicon("o", "ol", "ola").build();
    List<String> endings = Lists.newArrayList("arak", "acak");
    StemEndingGraph graph = new StemEndingGraph(morphology, endings);
    CharacterGraphDecoder spellChecker = new CharacterGraphDecoder(graph.stemGraph);
    List<ScoredItem<String>> res = spellChecker.getSuggestionsWithScores("olarak");
    assertContainsAll(res, "olarak", "olacak", "olaarak");
}
Also used: ScoredItem (zemberek.core.ScoredItem), TurkishMorphology (zemberek.morphology.TurkishMorphology), Test (org.junit.Test)

Example 5 with TurkishMorphology

Use of zemberek.morphology.TurkishMorphology in project zemberek-nlp by ahmetaa.

From the class CharacterGraphDecoderTest, method stemEndingTest.

@Test
public void stemEndingTest() {
    TurkishMorphology morphology = TurkishMorphology.builder().setLexicon("Türkiye", "Bayram").build();
    List<String> endings = Lists.newArrayList("ında", "de");
    StemEndingGraph graph = new StemEndingGraph(morphology, endings);
    CharacterGraphDecoder spellChecker = new CharacterGraphDecoder(graph.stemGraph);
    List<ScoredItem<String>> res = spellChecker.getSuggestionsWithScores("türkiyede");
    assertContainsAll(res, "türkiyede");
}
Also used: ScoredItem (zemberek.core.ScoredItem), TurkishMorphology (zemberek.morphology.TurkishMorphology), Test (org.junit.Test)
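
The three CharacterGraphDecoderTest examples above share one idea: combine a small set of stems with a small set of endings, then ask for words close to a given input. The sketch below illustrates that idea with the standard library only. It is not how zemberek-nlp implements it: CharacterGraphDecoder walks a character graph, whereas this hypothetical StemEndingSuggester class enumerates every stem + ending string and keeps those within a Levenshtein distance threshold. Passing stems as plain strings and using a threshold of 1 are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; zemberek-nlp's CharacterGraphDecoder works differently.
final class StemEndingSuggester {

    // Classic two-row Levenshtein distance (insert, delete, substitute cost 1).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns every distinct stem + ending string within maxDistance of input,
    // in stem-major iteration order.
    static List<String> suggest(List<String> stems, List<String> endings, String input, int maxDistance) {
        List<String> result = new ArrayList<>();
        for (String stem : stems) {
            for (String ending : endings) {
                String candidate = stem + ending;
                if (editDistance(candidate, input) <= maxDistance && !result.contains(candidate)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same stems, endings, and input as stemEndingTest3.
        List<String> stems = Arrays.asList("o", "ol", "ola");
        List<String> endings = Arrays.asList("arak", "acak");
        System.out.println(suggest(stems, endings, "olarak", 1));
    }
}
```

With the inputs from stemEndingTest3, this sketch returns the three strings that test checks for ("olarak", "olacak", "olaarak") plus "oarak", which is also one edit away from the input. The test uses assertContainsAll, so it does not say whether the real decoder returns further suggestions.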

Aggregations

TurkishMorphology (zemberek.morphology.TurkishMorphology): 87
Test (org.junit.Test): 38
Path (java.nio.file.Path): 34
ArrayList (java.util.ArrayList): 23
SingleAnalysis (zemberek.morphology.analysis.SingleAnalysis): 23
WordAnalysis (zemberek.morphology.analysis.WordAnalysis): 23
Ignore (org.junit.Ignore): 21
DictionaryItem (zemberek.morphology.lexicon.DictionaryItem): 15
LinkedHashSet (java.util.LinkedHashSet): 13
PrintWriter (java.io.PrintWriter): 10
SentenceAnalysis (zemberek.morphology.analysis.SentenceAnalysis): 10
Stopwatch (com.google.common.base.Stopwatch): 8
Histogram (zemberek.core.collections.Histogram): 8
Token (zemberek.tokenization.Token): 8
HashSet (java.util.HashSet): 7
SentenceWordAnalysis (zemberek.morphology.analysis.SentenceWordAnalysis): 7
TurkishTokenizer (zemberek.tokenization.TurkishTokenizer): 7
ScoredItem (zemberek.core.ScoredItem): 6
IOException (java.io.IOException): 5
BlockTextLoader (zemberek.core.text.BlockTextLoader): 5