
Example 51 with SparkSession

use of org.apache.spark.sql.SparkSession in project net.jgp.labs.spark by jgperrin.

the class FirstPrediction method start.

private void start() {
    SparkSession spark = SparkSession.builder().appName("First Prediction").master("local").getOrCreate();
    StructType schema = new StructType(new StructField[] {
        new StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
        new StructField("features", new VectorUDT(), false, Metadata.empty())
    });
    // TODO this example is not working yet
}
Also used: VectorUDT (org.apache.spark.mllib.linalg.VectorUDT), SparkSession (org.apache.spark.sql.SparkSession), StructField (org.apache.spark.sql.types.StructField), StructType (org.apache.spark.sql.types.StructType)

Example 52 with SparkSession

use of org.apache.spark.sql.SparkSession in project net.jgp.labs.spark by jgperrin.

the class Loader method start.

private void start() {
    SparkConf conf = new SparkConf().setAppName("Concurrency Lab 001").setMaster(Config.MASTER).set("hello", "world");
    JavaSparkContext sc = new JavaSparkContext(conf);
    SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
    String filename = "data/tuple-data-file.csv";
    Dataset<Row> df = spark.read().format("csv").option("inferSchema", "true").option("header", "false").load(filename);
    df.show();
    try {
        df.createGlobalTempView("myView");
    } catch (AnalysisException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    try {
        Thread.sleep(10000);
    } catch (InterruptedException e) {
        System.out.println("Hmmm... Something interrupted the thread: " + e.getMessage());
    }
}
Also used: SparkSession (org.apache.spark.sql.SparkSession), AnalysisException (org.apache.spark.sql.AnalysisException), JavaSparkContext (org.apache.spark.api.java.JavaSparkContext), Row (org.apache.spark.sql.Row), SparkConf (org.apache.spark.SparkConf)

Example 53 with SparkSession

use of org.apache.spark.sql.SparkSession in project net.jgp.labs.spark by jgperrin.

the class ListNCSchoolDistricts method main.

/**
 * @param args
 */
public static void main(String[] args) {
    String filename = "/tmp/" + System.currentTimeMillis() + ".json";
    try {
        FileUtils.copyURLToFile(new URL("https://opendurham.nc.gov/explore/dataset/north-carolina-school-performance-data/download/?format=json&timezone=America/New_York"), new File(filename));
    } catch (MalformedURLException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    System.out.println("File " + filename + " downloaded");
    SparkSession spark = SparkSession.builder().appName("NC Schools").master("local").getOrCreate();
    // filename already starts with "/tmp/", so it is used as-is
    String fileToAnalyze = filename;
    System.out.println("File to analyze: " + fileToAnalyze);
    Dataset<Row> df;
    // Uppercase MM is month-of-year; lowercase mm would mean minutes
    df = spark.read().option("dateFormat", "yyyy-MM-dd").json(fileToAnalyze);
    df = df.withColumn("district", df.col("fields.district"));
    df = df.groupBy("district").count().orderBy(df.col("district"));
    df.show(150, false);
}
Also used: MalformedURLException (java.net.MalformedURLException), SparkSession (org.apache.spark.sql.SparkSession), IOException (java.io.IOException), Row (org.apache.spark.sql.Row), File (java.io.File), URL (java.net.URL)
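
The dateFormat option in Example 53 takes a date pattern string. Assuming Spark interprets it with Java-style pattern letters (as java.text.SimpleDateFormat does), the letters are case-sensitive: MM is month-of-year and mm is minute-of-hour. The sketch below uses only the JDK, not Spark, to show the difference. The class name DatePatternDemo, the helper monthOf, and the sample value "2017-03-15" are hypothetical and not part of the original project.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DatePatternDemo {

    // Parses text with the given pattern and returns the calendar month (1-12).
    // UTC and Locale.ROOT are fixed so the result does not depend on the machine.
    public static int monthOf(String pattern, String text) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(utc);
        try {
            Date parsed = format.parse(text);
            Calendar calendar = Calendar.getInstance(utc, Locale.ROOT);
            calendar.setTime(parsed);
            return calendar.get(Calendar.MONTH) + 1; // Calendar months are 0-based
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse '" + text + "' with " + pattern, e);
        }
    }

    public static void main(String[] args) {
        String sample = "2017-03-15"; // hypothetical sample value

        // MM reads "03" as the month.
        System.out.println("yyyy-MM-dd -> month " + monthOf("yyyy-MM-dd", sample));

        // mm reads "03" as minutes, so the month silently falls back to January.
        System.out.println("yyyy-mm-dd -> month " + monthOf("yyyy-mm-dd", sample));
    }
}
```

With the lowercase pattern the parse does not fail; it yields a date in January, which makes this kind of mistake easy to miss in query results.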

Aggregations

SparkSession (org.apache.spark.sql.SparkSession): 53
Row (org.apache.spark.sql.Row): 43
StructType (org.apache.spark.sql.types.StructType): 11
ArrayList (java.util.ArrayList): 6
JavaSparkContext (org.apache.spark.api.java.JavaSparkContext): 6
StructField (org.apache.spark.sql.types.StructField): 6
SparkConf (org.apache.spark.SparkConf): 4
JavaRDD (org.apache.spark.api.java.JavaRDD): 3
Script (org.apache.sysml.api.mlcontext.Script): 3
Test (org.junit.Test): 3
Dataset (org.apache.spark.sql.Dataset): 2
StreamingQuery (org.apache.spark.sql.streaming.StreamingQuery): 2
StreamingQueryException (org.apache.spark.sql.streaming.StreamingQueryException): 2
DMLScript (org.apache.sysml.api.DMLScript): 2
RUNTIME_PLATFORM (org.apache.sysml.api.DMLScript.RUNTIME_PLATFORM): 2
MLContext (org.apache.sysml.api.mlcontext.MLContext): 2
Matrix (org.apache.sysml.api.mlcontext.Matrix): 2
MatrixBlock (org.apache.sysml.runtime.matrix.data.MatrixBlock): 2
MatrixIndexes (org.apache.sysml.runtime.matrix.data.MatrixIndexes): 2
File (java.io.File): 1