Example 6 with FastTextTrainer

Use of zemberek.core.embeddings.FastTextTrainer in project zemberek-nlp by ahmetaa.

From the class FastTextTest, method skipgram.

/**
 * Generates word vectors using the skip-gram model. Run with -Xms8G or more.
 */
@Test
@Ignore("Not an actual Test.")
public void skipgram() throws Exception {
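    Args argz = Args.forWordVectors(Args.model_name.skipGram);
    // Training threads and number of passes over the corpus.
    argz.thread = 4;
    argz.epoch = 10;
    // Dimension of the word vectors.
    argz.dim = 100;
    // Number of hash buckets shared by the subword n-grams.
    argz.bucket = 2_000_000;
    // Minimum and maximum character n-gram length.
    argz.minn = 3;
    argz.maxn = 6;
    argz.subWordHashProvider = new EmbeddingHashProviders.CharacterNgramHashProvider(argz.minn, argz.maxn);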
    Path input = Paths.get("/home/ahmetaa/data/nlp/corpora/sentences.50k");
    Path outRoot = Paths.get("/home/ahmetaa/data/fasttext");
    FastText fastText = new FastTextTrainer(argz).train(input);
    Path vectorFile = outRoot.resolve("sentences-50k-skipgram.vec");
    Log.info("Saving vectors to %s", vectorFile);
    fastText.saveVectors(vectorFile);
}
Also used: Path (java.nio.file.Path), Args (zemberek.core.embeddings.Args), EmbeddingHashProviders (zemberek.core.embeddings.EmbeddingHashProviders), FastTextTrainer (zemberek.core.embeddings.FastTextTrainer), FastText (zemberek.core.embeddings.FastText), Ignore (org.junit.Ignore), Test (org.junit.Test)
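
The minn, maxn and bucket settings above control character n-gram (subword) features. The following is a minimal standalone sketch of the general fastText-style idea, not zemberek's CharacterNgramHashProvider: the word is wrapped in "<" and ">", every substring whose length lies between minn and maxn is collected, and each n-gram is hashed into one of bucket slots. The class CharNgramSketch and its methods are hypothetical names, and the plain 32-bit FNV-1a hash over UTF-8 bytes is an assumption chosen for illustration; the library's actual hash may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of zemberek-nlp. */
final class CharNgramSketch {

    /** Returns all character n-grams of "<word>" with length in [minn, maxn]. */
    static List<String> charNgrams(String word, int minn, int maxn) {
        // Boundary markers let prefixes and suffixes get distinct n-grams.
        String wrapped = "<" + word + ">";
        List<String> result = new ArrayList<>();
        for (int start = 0; start < wrapped.length(); start++) {
            for (int n = minn; n <= maxn && start + n <= wrapped.length(); n++) {
                result.add(wrapped.substring(start, start + n));
            }
        }
        return result;
    }

    /** Maps an n-gram to a bucket index using 32-bit FNV-1a (assumed hash). */
    static int bucketIndex(String ngram, int bucket) {
        int h = 0x811C9DC5; // FNV offset basis
        for (byte b : ngram.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 16777619; // FNV prime; int overflow gives arithmetic mod 2^32
        }
        // Treat the hash as unsigned before reducing it to [0, bucket).
        return (int) (Integer.toUnsignedLong(h) % bucket);
    }

    public static void main(String[] args) {
        // Same minn, maxn and bucket values as in the example above.
        for (String ngram : charNgrams("kitap", 3, 6)) {
            System.out.println(ngram + " -> " + bucketIndex(ngram, 2_000_000));
        }
    }
}
```

Because different n-grams can land in the same bucket, a larger bucket value reduces collisions at the cost of a larger embedding matrix.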

Example 7 with FastTextTrainer

Use of zemberek.core.embeddings.FastTextTrainer in project zemberek-nlp by ahmetaa.

From the class FastTextTest, method dbpediaClassificationTest.

/**
 * Runs the dbpedia classification task. Run with -Xms8G or more.
 */
@Test
@Ignore("Not an actual Test.")
public void dbpediaClassificationTest() throws Exception {
    Path inputRoot = Paths.get("/media/aaa/3t/aaa/fasttext");
    Path trainFile = inputRoot.resolve("dbpedia.train");
    Path modelPath = inputRoot.resolve("dbpedia.model.bin");
    FastText fastText;
    // Reuse a previously saved model if one exists; otherwise train and save it.
    if (modelPath.toFile().exists()) {
        fastText = FastText.load(modelPath);
    } else {
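        Args argz = Args.forSupervised();
        // Training threads and number of passes over the training file.
        argz.thread = 4;
        argz.epoch = 5;
        // Use word bigrams in addition to single words.
        argz.wordNgrams = 2;
        // Keep every word, even those seen only once.
        argz.minCount = 1;
        // Learning rate.
        argz.lr = 0.1;
        // Dimension of the embedding vectors.
        argz.dim = 32;
        // Number of hash buckets for the n-gram features.
        argz.bucket = 5_000_000;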
        fastText = new FastTextTrainer(argz).train(trainFile);
        fastText.saveModel(modelPath);
    }
    Path testFile = inputRoot.resolve("dbpedia.test");
    Log.info("Testing started.");
    EvaluationResult result = fastText.test(testFile, 1);
    Log.info(result.toString());
}
Also used: Path (java.nio.file.Path), Args (zemberek.core.embeddings.Args), FastTextTrainer (zemberek.core.embeddings.FastTextTrainer), EvaluationResult (zemberek.core.embeddings.FastText.EvaluationResult), FastText (zemberek.core.embeddings.FastText), Ignore (org.junit.Ignore), Test (org.junit.Test)
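
The call fastText.test(testFile, 1) above evaluates the classifier on the test file. Assuming the second argument is the number of top-ranked predictions k considered per example, as in the original fastText tool, the reported figures would be precision and recall at k. The sketch below shows how these two metrics are commonly computed; PrecisionAtKSketch and its methods are hypothetical names and do not reflect how zemberek's EvaluationResult is implemented.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of zemberek-nlp. */
final class PrecisionAtKSketch {

    /** Counts top-k predictions that match a gold label, over all examples. */
    private static long countCorrect(List<List<String>> ranked, List<Set<String>> gold, int k) {
        long correct = 0;
        for (int i = 0; i < ranked.size(); i++) {
            List<String> labels = ranked.get(i);
            int limit = Math.min(k, labels.size());
            for (int j = 0; j < limit; j++) {
                if (gold.get(i).contains(labels.get(j))) {
                    correct++;
                }
            }
        }
        return correct;
    }

    /** Precision at k: correct predictions divided by predictions made. */
    static double precisionAtK(List<List<String>> ranked, List<Set<String>> gold, int k) {
        long predicted = 0;
        for (List<String> labels : ranked) {
            predicted += Math.min(k, labels.size());
        }
        return predicted == 0 ? 0.0 : (double) countCorrect(ranked, gold, k) / predicted;
    }

    /** Recall at k: correct predictions divided by the number of gold labels. */
    static double recallAtK(List<List<String>> ranked, List<Set<String>> gold, int k) {
        long total = 0;
        for (Set<String> labels : gold) {
            total += labels.size();
        }
        return total == 0 ? 0.0 : (double) countCorrect(ranked, gold, k) / total;
    }
}
```

When every example has exactly one gold label and k is 1, as is typical for a single-label task, precision and recall at k coincide with plain accuracy.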

Aggregations

Args (zemberek.core.embeddings.Args): 7, FastTextTrainer (zemberek.core.embeddings.FastTextTrainer): 7, FastText (zemberek.core.embeddings.FastText): 6, Path (java.nio.file.Path): 4, Ignore (org.junit.Ignore): 3, Test (org.junit.Test): 3, EmbeddingHashProviders (zemberek.core.embeddings.EmbeddingHashProviders): 2, EvaluationResult (zemberek.core.embeddings.FastText.EvaluationResult): 2, PrintWriter (java.io.PrintWriter): 1, ArrayList (java.util.ArrayList): 1, ScoredItem (zemberek.core.ScoredItem): 1, SubWordHashProvider (zemberek.core.embeddings.SubWordHashProvider): 1, WebCorpus (zemberek.corpus.WebCorpus): 1, WebDocument (zemberek.corpus.WebDocument): 1