Example 1 with TokenizerFunction

Use of org.deeplearning4j.spark.models.sequencevectors.functions.TokenizerFunction in project deeplearning4j by deeplearning4j.

Class SparkWord2Vec, method fitSentences.

public void fitSentences(JavaRDD<String> sentences) {
    // All we need here is tokenization: turn the Strings into a
    // JavaRDD<Sequence<VocabWord>>, then hand it off to SeqVec.
    validateConfiguration();
    final JavaSparkContext context = new JavaSparkContext(sentences.context());
    broadcastEnvironment(context);
    JavaRDD<Sequence<VocabWord>> seqRdd = sentences.map(new TokenizerFunction(configurationBroadcast));
    // Now that we have the new RDD, pass it to SeqVec.
    super.fitSequences(seqRdd);
}
Also used: TokenizerFunction (org.deeplearning4j.spark.models.sequencevectors.functions.TokenizerFunction), JavaSparkContext (org.apache.spark.api.java.JavaSparkContext), Sequence (org.deeplearning4j.models.sequencevectors.sequence.Sequence)
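
The map step above (one String in, one token sequence out) can be illustrated without Spark or deeplearning4j on the classpath. The following is only a minimal sketch under stated assumptions: the class TokenizeSketch and its methods are hypothetical, whitespace splitting stands in for whatever tokenizer the broadcast configuration actually selects, and List<String> stands in for Sequence<VocabWord>. It is not the implementation of TokenizerFunction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in, not part of deeplearning4j.
final class TokenizeSketch {

    // Assumption: plain whitespace tokenization. The real TokenizerFunction
    // takes its tokenizer settings from the broadcast configuration.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            if (!token.isEmpty()) {   // a blank sentence yields no tokens
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Mirrors sentences.map(...): each sentence becomes one token list.
    static List<List<String>> tokenizeAll(List<String> sentences) {
        return sentences.stream()
                .map(TokenizeSketch::tokenize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("the quick brown fox", "  jumps over  ");
        // prints [[the, quick, brown, fox], [jumps, over]]
        System.out.println(tokenizeAll(sentences));
    }
}
```

In the Spark version the same per-element function runs on the executors, which is presumably why fitSentences broadcasts the configuration before calling map.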

Aggregations

JavaSparkContext (org.apache.spark.api.java.JavaSparkContext): 1
Sequence (org.deeplearning4j.models.sequencevectors.sequence.Sequence): 1
TokenizerFunction (org.deeplearning4j.spark.models.sequencevectors.functions.TokenizerFunction): 1