
Example 1 with SparkConf

Use of org.apache.spark.SparkConf in project hbase by apache.

The class JavaHBaseBulkPutExample, method main:

public static void main(String[] args) {
    if (args.length < 2) {
        System.out.println("JavaHBaseBulkPutExample  " + "{tableName} {columnFamily}");
        return;
    }
    String tableName = args[0];
    String columnFamily = args[1];
    SparkConf sparkConf = new SparkConf().setAppName("JavaHBaseBulkPutExample " + tableName);
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);
    try {
        // Each string encodes one row as "rowKey,columnFamily,qualifier,value".
        List<String> list = new ArrayList<>(5);
        list.add("1," + columnFamily + ",a,1");
        list.add("2," + columnFamily + ",a,2");
        list.add("3," + columnFamily + ",a,3");
        list.add("4," + columnFamily + ",a,4");
        list.add("5," + columnFamily + ",a,5");
        JavaRDD<String> rdd = jsc.parallelize(list);
        Configuration conf = HBaseConfiguration.create();
        JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf);
        hbaseContext.bulkPut(rdd, TableName.valueOf(tableName), new PutFunction());
    } finally {
        jsc.stop();
    }
}
Also used: HBaseConfiguration (org.apache.hadoop.hbase.HBaseConfiguration), Configuration (org.apache.hadoop.conf.Configuration), ArrayList (java.util.ArrayList), JavaHBaseContext (org.apache.hadoop.hbase.spark.JavaHBaseContext), JavaSparkContext (org.apache.spark.api.java.JavaSparkContext), SparkConf (org.apache.spark.SparkConf)
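
The bulkPut call above hands each RDD element to a PutFunction that is not shown in this snippet. A minimal sketch of what such a converter could look like, assuming each input string has the form rowKey,columnFamily,qualifier,value as built above (an illustration, not a verified copy of the HBase example):

// Additional imports this sketch needs at the top of the file:
// import org.apache.hadoop.hbase.client.Put;
// import org.apache.hadoop.hbase.util.Bytes;
// import org.apache.spark.api.java.function.Function;
public static class PutFunction implements Function<String, Put> {

    @Override
    public Put call(String v) throws Exception {
        // Split "rowKey,columnFamily,qualifier,value" into its four parts.
        String[] cells = v.split(",");
        Put put = new Put(Bytes.toBytes(cells[0]));
        // One cell: family = cells[1], qualifier = cells[2], value = cells[3].
        put.addColumn(Bytes.toBytes(cells[1]), Bytes.toBytes(cells[2]), Bytes.toBytes(cells[3]));
        return put;
    }
}

bulkPut applies this function to every element of the RDD and writes the resulting Puts to the named table.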

Example 2 with SparkConf

Use of org.apache.spark.SparkConf in project zeppelin by apache.

The class SparkInterpreter, method createHttpServer:

private Object createHttpServer(File outputDir) {
    SparkConf conf = new SparkConf();
    try {
        // Try the newer constructor first: HttpServer(SparkConf, File, SecurityManager, int, String).
        Constructor<?> constructor = getClass().getClassLoader().loadClass("org.apache.spark.HttpServer").getConstructor(new Class[] { SparkConf.class, File.class, SecurityManager.class, int.class, String.class });
        Object securityManager = createSecurityManager(conf);
        return constructor.newInstance(new Object[] { conf, outputDir, securityManager, 0, "HTTP Server" });
    } catch (ClassNotFoundException | NoSuchMethodException | IllegalAccessException | InstantiationException | InvocationTargetException e) {
        // Fall back to the older constructor: HttpServer(File, SecurityManager, int, String).
        Constructor<?> constructor = null;
        try {
            constructor = getClass().getClassLoader().loadClass("org.apache.spark.HttpServer").getConstructor(new Class[] { File.class, SecurityManager.class, int.class, String.class });
            return constructor.newInstance(new Object[] { outputDir, createSecurityManager(conf), 0, "HTTP Server" });
        } catch (ClassNotFoundException | NoSuchMethodException | IllegalAccessException | InstantiationException | InvocationTargetException e1) {
            logger.error(e1.getMessage(), e1);
            return null;
        }
    }
}
Also used: Constructor (java.lang.reflect.Constructor), SparkConf (org.apache.spark.SparkConf), InvocationTargetException (java.lang.reflect.InvocationTargetException)
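
createSecurityManager is called twice above but is not part of this excerpt. A plausible sketch, assuming it uses the same reflective-fallback pattern because Spark's SecurityManager constructor also changed across versions (an illustration, not the verbatim Zeppelin method):

private Object createSecurityManager(SparkConf conf) throws ClassNotFoundException,
        NoSuchMethodException, InstantiationException, IllegalAccessException,
        InvocationTargetException {
    Class<?> smClass = getClass().getClassLoader().loadClass("org.apache.spark.SecurityManager");
    try {
        // Newer Spark: SecurityManager(SparkConf, Option<byte[]> ioEncryptionKey)
        Constructor<?> constructor = smClass.getConstructor(new Class[] { SparkConf.class, scala.Option.class });
        return constructor.newInstance(new Object[] { conf, scala.Option.empty() });
    } catch (NoSuchMethodException e) {
        // Older Spark: SecurityManager(SparkConf)
        Constructor<?> constructor = smClass.getConstructor(new Class[] { SparkConf.class });
        return constructor.newInstance(new Object[] { conf });
    }
}

Both createHttpServer and this sketch return Object because the Spark classes are only resolved at runtime.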

Example 3 with SparkConf

Use of org.apache.spark.SparkConf in project learning-spark by databricks.

The class LogAnalyzerAppMain, method main:

public static void main(String[] args) throws IOException {
    Flags.setFromCommandLineArgs(THE_OPTIONS, args);
    // Startup the Spark Conf.
    SparkConf conf = new SparkConf().setAppName("A Databricks Reference Application: Logs Analysis with Spark");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Flags.getInstance().getSlideInterval());
    // Checkpointing must be enabled to use the updateStateByKey function & windowed operations.
    jssc.checkpoint(Flags.getInstance().getCheckpointDirectory());
    // This method monitors a directory for new files to read in for streaming.
    JavaDStream<String> logData = jssc.textFileStream(Flags.getInstance().getLogsDirectory());
    JavaDStream<ApacheAccessLog> accessLogsDStream = logData.map(new Functions.ParseFromLogLine()).cache();
    final LogAnalyzerTotal logAnalyzerTotal = new LogAnalyzerTotal();
    final LogAnalyzerWindowed logAnalyzerWindowed = new LogAnalyzerWindowed();
    // Process the DStream which gathers stats for all of time.
    logAnalyzerTotal.processAccessLogs(Flags.getInstance().getOutputDirectory(), accessLogsDStream);
    // Calculate statistics for the last time interval.
    logAnalyzerWindowed.processAccessLogs(Flags.getInstance().getOutputDirectory(), accessLogsDStream);
    // Render the output each time there is a new RDD in the accessLogsDStream.
    final Renderer renderer = new Renderer();
    accessLogsDStream.foreachRDD(new Function<JavaRDD<ApacheAccessLog>, Void>() {

        public Void call(JavaRDD<ApacheAccessLog> rdd) {
            // Call this to output the stats.
            try {
                renderer.render(logAnalyzerTotal.getLogStatistics(), logAnalyzerWindowed.getLogStatistics());
            } catch (Exception e) {
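                // Rendering errors are ignored here (empty catch), so a failed render does not stop the stream.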
            }
            return null;
        }
    });
    // Start the streaming computation.
    jssc.start();
    // Wait for the computation to terminate
    jssc.awaitTermination();
}
Also used: IOException (java.io.IOException), JavaRDD (org.apache.spark.api.java.JavaRDD), JavaStreamingContext (org.apache.spark.streaming.api.java.JavaStreamingContext), SparkConf (org.apache.spark.SparkConf)
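
The map step above relies on Functions.ParseFromLogLine, which lives elsewhere in the reference application and is not shown here. A minimal sketch of such a parser, assuming ApacheAccessLog exposes a static parseFromLogLine(String) factory (an assumption based on the call site, not a verified copy):

// Hypothetical sketch; assumes the surrounding Functions holder class and
// import org.apache.spark.api.java.function.Function;
public static class ParseFromLogLine implements Function<String, ApacheAccessLog> {

    @Override
    public ApacheAccessLog call(String logLine) {
        // Delegate to a static factory that parses one Apache access-log line.
        return ApacheAccessLog.parseFromLogLine(logLine);
    }
}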

Example 4 with SparkConf

Use of org.apache.spark.SparkConf in project learning-spark by databricks.

The class KafkaInput, method main:

public static void main(String[] args) throws Exception {
    String zkQuorum = args[0];
    String group = args[1];
    SparkConf conf = new SparkConf().setAppName("KafkaInput");
    // Create a StreamingContext with a 1 second batch size
    JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(1000));
    Map<String, Integer> topics = new HashMap<String, Integer>();
    topics.put("pandas", 1);
    JavaPairDStream<String, String> input = KafkaUtils.createStream(jssc, zkQuorum, group, topics);
    input.print();
    // start our streaming context and wait for it to "finish"
    jssc.start();
    // Wait for 10 seconds, then exit. To run forever, call awaitTermination() without a timeout.
    jssc.awaitTermination(10000);
    // Stop the streaming context
    jssc.stop();
}
Also used: JavaStreamingContext (org.apache.spark.streaming.api.java.JavaStreamingContext), HashMap (java.util.HashMap), Duration (org.apache.spark.streaming.Duration), SparkConf (org.apache.spark.SparkConf)
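
The example only prints the raw (key, message) pairs. As a small illustrative extension that is not part of the original example, the message bodies can be pulled out of the pair DStream with a plain map before further processing:

// Assumes the same context as above plus:
// import org.apache.spark.api.java.function.Function;
// import org.apache.spark.streaming.api.java.JavaDStream;
// import scala.Tuple2;
JavaDStream<String> messages = input.map(new Function<Tuple2<String, String>, String>() {

    @Override
    public String call(Tuple2<String, String> record) {
        // Kafka delivers (key, value) pairs; keep only the message body.
        return record._2();
    }
});
// Print how many messages arrived in each one-second batch.
messages.count().print();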

Example 5 with SparkConf

Use of org.apache.spark.SparkConf in project learning-spark by databricks.

The class MLlib, method main:

public static void main(String[] args) {
    SparkConf sparkConf = new SparkConf().setAppName("JavaBookExample");
    JavaSparkContext sc = new JavaSparkContext(sparkConf);
    // Load 2 types of emails from text files: spam and ham (non-spam).
    // Each line has text from one email.
    JavaRDD<String> spam = sc.textFile("files/spam.txt");
    JavaRDD<String> ham = sc.textFile("files/ham.txt");
    // Create a HashingTF instance to map email text to vectors of 100 features.
    final HashingTF tf = new HashingTF(100);
    // Each email is split into words, and each word is mapped to one feature.
    // Create LabeledPoint datasets for positive (spam) and negative (ham) examples.
    JavaRDD<LabeledPoint> positiveExamples = spam.map(new Function<String, LabeledPoint>() {

        @Override
        public LabeledPoint call(String email) {
            return new LabeledPoint(1, tf.transform(Arrays.asList(email.split(" "))));
        }
    });
    JavaRDD<LabeledPoint> negativeExamples = ham.map(new Function<String, LabeledPoint>() {

        @Override
        public LabeledPoint call(String email) {
            return new LabeledPoint(0, tf.transform(Arrays.asList(email.split(" "))));
        }
    });
    JavaRDD<LabeledPoint> trainingData = positiveExamples.union(negativeExamples);
    // Cache data since Logistic Regression is an iterative algorithm.
    trainingData.cache();
    // Create a Logistic Regression learner which uses the SGD optimizer.
    LogisticRegressionWithSGD lrLearner = new LogisticRegressionWithSGD();
    // Run the actual learning algorithm on the training data.
    LogisticRegressionModel model = lrLearner.run(trainingData.rdd());
    // Test on a positive example (spam) and a negative one (ham).
    // First apply the same HashingTF feature transformation used on the training data.
    Vector posTestExample = tf.transform(Arrays.asList("O M G GET cheap stuff by sending money to ...".split(" ")));
    Vector negTestExample = tf.transform(Arrays.asList("Hi Dad, I started studying Spark the other ...".split(" ")));
    // Now use the learned model to predict spam/ham for new emails.
    System.out.println("Prediction for positive test example: " + model.predict(posTestExample));
    System.out.println("Prediction for negative test example: " + model.predict(negTestExample));
    sc.stop();
}
Also used: HashingTF (org.apache.spark.mllib.feature.HashingTF), LogisticRegressionWithSGD (org.apache.spark.mllib.classification.LogisticRegressionWithSGD), LogisticRegressionModel (org.apache.spark.mllib.classification.LogisticRegressionModel), JavaSparkContext (org.apache.spark.api.java.JavaSparkContext), LabeledPoint (org.apache.spark.mllib.regression.LabeledPoint), SparkConf (org.apache.spark.SparkConf), Vector (org.apache.spark.mllib.linalg.Vector)
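
As a short follow-up to the training step, MLlib linear models can be persisted and reloaded. A minimal sketch, assuming it is placed before sc.stop() and that the path is just a placeholder:

// Save the trained model and load it back; the path is a placeholder.
// Requires the underlying SparkContext, reachable via JavaSparkContext.sc().
model.save(sc.sc(), "target/tmp/spamClassifierModel");
LogisticRegressionModel sameModel =
    LogisticRegressionModel.load(sc.sc(), "target/tmp/spamClassifierModel");
System.out.println("Reloaded model weights: " + sameModel.weights());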

Aggregations

SparkConf (org.apache.spark.SparkConf): 83
JavaSparkContext (org.apache.spark.api.java.JavaSparkContext): 46
Test (org.junit.Test): 21
ArrayList (java.util.ArrayList): 20
Configuration (org.apache.hadoop.conf.Configuration): 20
Tuple2 (scala.Tuple2): 15
Graph (uk.gov.gchq.gaffer.graph.Graph): 13
DataOutputStream (java.io.DataOutputStream): 11
File (java.io.File): 10
HashSet (java.util.HashSet): 10
ByteArrayOutputStream (org.apache.commons.io.output.ByteArrayOutputStream): 10
Edge (uk.gov.gchq.gaffer.data.element.Edge): 10
Element (uk.gov.gchq.gaffer.data.element.Element): 10
Entity (uk.gov.gchq.gaffer.data.element.Entity): 10
User (uk.gov.gchq.gaffer.user.User): 10
Ignore (org.junit.Ignore): 6
HBaseConfiguration (org.apache.hadoop.hbase.HBaseConfiguration): 5
JavaHBaseContext (org.apache.hadoop.hbase.spark.JavaHBaseContext): 5
Test (org.testng.annotations.Test): 5
AddElements (uk.gov.gchq.gaffer.operation.impl.add.AddElements): 5