Example 1 with LogisticRegressionWithSGD

Use of org.apache.spark.mllib.classification.LogisticRegressionWithSGD in project spring-boot-quick by vector4wang.

Class EmailFilter, method main.

public static void main(String[] args) {
    SparkConf conf = new SparkConf().setMaster("local").setAppName("Spam email classification");
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<String> ham = sc.textFile("D:\\githubspace\\springbootquick\\src\\main\\resources\\ham.txt");
    JavaRDD<String> spam = sc.textFile("D:\\githubspace\\springbootquick\\src\\main\\resources\\spam.txt");
    final HashingTF tf = new HashingTF(10000);
    JavaRDD<LabeledPoint> posExamples = spam.map(h -> new LabeledPoint(1, tf.transform(Arrays.asList(h.split(" ")))));
    JavaRDD<LabeledPoint> negExamples = ham.map(s -> new LabeledPoint(0, tf.transform(Arrays.asList(s.split(" ")))));
    JavaRDD<LabeledPoint> trainingData = posExamples.union(negExamples);
    trainingData.cache();
    LogisticRegressionWithSGD lrLearner = new LogisticRegressionWithSGD();
    LogisticRegressionModel model = lrLearner.run(trainingData.rdd());
    Vector posTestExample = tf.transform(Arrays.asList("O M G GET cheap stuff by sending money to ...".split(" ")));
    System.out.println(posTestExample.toJson());
    Vector negTestExample = tf.transform(Arrays.asList("Hi Dad, I started studying Spark the other ...".split(" ")));
    System.out.println("Prediction for positive test example: " + model.predict(posTestExample));
    System.out.println("Prediction for negative test example: " + model.predict(negTestExample));
}
Also used : HashingTF(org.apache.spark.mllib.feature.HashingTF) LogisticRegressionWithSGD(org.apache.spark.mllib.classification.LogisticRegressionWithSGD) LogisticRegressionModel(org.apache.spark.mllib.classification.LogisticRegressionModel) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) LabeledPoint(org.apache.spark.mllib.regression.LabeledPoint) SparkConf(org.apache.spark.SparkConf) Vector(org.apache.spark.mllib.linalg.Vector)
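
The example above ends by calling model.predict on a feature vector. As a rough illustration of what a binary logistic regression model does at that step, the following plain-Java sketch computes a sigmoid score from a weighted sum and compares it to a threshold. The class name, helper names, weights, and the 0.5 threshold are all assumptions made for illustration; this is not Spark code and the numbers are not taken from any trained model.

```java
public class LogisticPredictSketch {

    // Hypothetical helper: sigmoid of (intercept + weights . features).
    static double score(double[] weights, double intercept, double[] features) {
        double margin = intercept;
        for (int i = 0; i < weights.length; i++) {
            margin += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-margin));
    }

    // Hypothetical helper: 1.0 (spam) if the score exceeds the threshold, else 0.0 (ham).
    static double predict(double[] weights, double intercept, double[] features, double threshold) {
        return score(weights, intercept, features) > threshold ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        // Made-up weights and feature vectors, for illustration only.
        double[] weights = {2.0, -1.5, 0.0};
        double[] spamLike = {3.0, 0.0, 1.0};
        double[] hamLike = {0.0, 2.0, 1.0};
        System.out.println("spam-like example: " + predict(weights, 0.0, spamLike, 0.5));
        System.out.println("ham-like example: " + predict(weights, 0.0, hamLike, 0.5));
    }
}
```

In the Spark example the weights come from training rather than being written by hand, and the feature vectors are the sparse vectors produced by HashingTF.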

Example 2 with LogisticRegressionWithSGD

Use of org.apache.spark.mllib.classification.LogisticRegressionWithSGD in project learning-spark by databricks.

Class MLlib, method main.

public static void main(String[] args) {
    SparkConf sparkConf = new SparkConf().setAppName("JavaBookExample");
    JavaSparkContext sc = new JavaSparkContext(sparkConf);
    // Load 2 types of emails from text files: spam and ham (non-spam).
    // Each line has text from one email.
    JavaRDD<String> spam = sc.textFile("files/spam.txt");
    JavaRDD<String> ham = sc.textFile("files/ham.txt");
    // Create a HashingTF instance to map email text to vectors of 100 features.
    final HashingTF tf = new HashingTF(100);
    // Each email is split into words, and each word is mapped to one feature.
    // Create LabeledPoint datasets for positive (spam) and negative (ham) examples.
    JavaRDD<LabeledPoint> positiveExamples = spam.map(new Function<String, LabeledPoint>() {

        @Override
        public LabeledPoint call(String email) {
            return new LabeledPoint(1, tf.transform(Arrays.asList(email.split(" "))));
        }
    });
    JavaRDD<LabeledPoint> negativeExamples = ham.map(new Function<String, LabeledPoint>() {

        @Override
        public LabeledPoint call(String email) {
            return new LabeledPoint(0, tf.transform(Arrays.asList(email.split(" "))));
        }
    });
    JavaRDD<LabeledPoint> trainingData = positiveExamples.union(negativeExamples);
    // Cache data since Logistic Regression is an iterative algorithm.
    trainingData.cache();
    // Create a Logistic Regression learner which uses the SGD optimizer.
    LogisticRegressionWithSGD lrLearner = new LogisticRegressionWithSGD();
    // Run the actual learning algorithm on the training data.
    LogisticRegressionModel model = lrLearner.run(trainingData.rdd());
    // Test on a positive example (spam) and a negative one (ham).
    // First apply the same HashingTF feature transformation used on the training data.
    Vector posTestExample = tf.transform(Arrays.asList("O M G GET cheap stuff by sending money to ...".split(" ")));
    Vector negTestExample = tf.transform(Arrays.asList("Hi Dad, I started studying Spark the other ...".split(" ")));
    // Now use the learned model to predict spam/ham for new emails.
    System.out.println("Prediction for positive test example: " + model.predict(posTestExample));
    System.out.println("Prediction for negative test example: " + model.predict(negTestExample));
    sc.stop();
}
Also used : HashingTF(org.apache.spark.mllib.feature.HashingTF) LogisticRegressionWithSGD(org.apache.spark.mllib.classification.LogisticRegressionWithSGD) LogisticRegressionModel(org.apache.spark.mllib.classification.LogisticRegressionModel) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) LabeledPoint(org.apache.spark.mllib.regression.LabeledPoint) SparkConf(org.apache.spark.SparkConf) Vector(org.apache.spark.mllib.linalg.Vector)
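
The comments in the example above say that each email is split into words and each word is mapped to one feature. A minimal plain-Java sketch of that hashing idea follows: each word is hashed into one of numFeatures buckets and the bucket count is incremented. The class and helper names are hypothetical, and String.hashCode is used as a stand-in hash function; Spark's HashingTF may use a different hash, so the indices here should not be expected to match Spark's output.

```java
import java.util.Arrays;
import java.util.List;

public class HashingTrickSketch {

    // Hypothetical helper: map a term to a bucket index in [0, numFeatures).
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    static int indexOf(String term, int numFeatures) {
        return Math.floorMod(term.hashCode(), numFeatures);
    }

    // Hypothetical helper: count how many terms fall into each bucket.
    static double[] transform(List<String> terms, int numFeatures) {
        double[] counts = new double[numFeatures];
        for (String term : terms) {
            counts[indexOf(term, numFeatures)] += 1.0;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Split on single spaces, as the example above does.
        List<String> words = Arrays.asList("cheap stuff cheap money".split(" "));
        double[] features = transform(words, 100);
        int bucket = indexOf("cheap", 100);
        System.out.println("bucket for \"cheap\": " + bucket);
        System.out.println("count in that bucket: " + features[bucket]);
    }
}
```

With a small feature count such as 100, different words can collide in the same bucket, which is one reason Example 1 uses 10000 features instead.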

Aggregations

SparkConf (org.apache.spark.SparkConf): 2
JavaSparkContext (org.apache.spark.api.java.JavaSparkContext): 2
LogisticRegressionModel (org.apache.spark.mllib.classification.LogisticRegressionModel): 2
LogisticRegressionWithSGD (org.apache.spark.mllib.classification.LogisticRegressionWithSGD): 2
HashingTF (org.apache.spark.mllib.feature.HashingTF): 2
Vector (org.apache.spark.mllib.linalg.Vector): 2
LabeledPoint (org.apache.spark.mllib.regression.LabeledPoint): 2