Use of org.deeplearning4j.spark.models.sequencevectors.functions.TokenizerFunction in project deeplearning4j by deeplearning4j.
The example below is taken from the class SparkWord2Vec, method fitSentences.
public void fitSentences(JavaRDD<String> sentences) {
    /**
     * Basically all we want here is tokenization, to get JavaRDD<Sequence<VocabWord>> out of Strings,
     * and then we just go for SeqVec
     */
    validateConfiguration();

    final JavaSparkContext context = new JavaSparkContext(sentences.context());
    broadcastEnvironment(context);

    // Tokenize each String into a Sequence<VocabWord>, using the broadcast configuration
    JavaRDD<Sequence<VocabWord>> seqRdd = sentences.map(new TokenizerFunction(configurationBroadcast));

    // now that we have the new RDD, just pass it on to SeqVec
    super.fitSequences(seqRdd);
}
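For context, the snippet below is a minimal sketch of how fitSentences might be invoked from a driver program. The Spark setup, the import path, and the no-arg SparkWord2Vec constructor are assumptions made for illustration rather than details taken from the excerpt above; consult the deeplearning4j-spark sources for the exact configuration options.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.deeplearning4j.spark.models.word2vec.SparkWord2Vec; // assumed package path

public class FitSentencesExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SparkWord2VecExample").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Raw sentences; fitSentences tokenizes them into Sequence<VocabWord> internally
        JavaRDD<String> sentences = sc.parallelize(Arrays.asList(
                "the quick brown fox jumps over the lazy dog",
                "deeplearning4j runs word2vec on spark"));

        SparkWord2Vec word2Vec = new SparkWord2Vec(); // assumed no-arg constructor
        word2Vec.fitSentences(sentences);

        sc.stop();
    }
}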