Example 1 with Multiplier2

Use of net.jgp.labs.spark.x.udf.Multiplier2 in project net.jgp.labs.spark by jgperrin.

From the class BasicExternalUdfFromTextFile, method start.

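// Requires: import static org.apache.spark.sql.functions.callUDF;
private void start() {
    SparkSession spark = SparkSession.builder().appName("CSV to Dataset").master("local").getOrCreate();
    // Register the external UDF under the name "x2Multiplier"; its declared return type is integer.
    spark.udf().register("x2Multiplier", new Multiplier2(), DataTypes.IntegerType);
    String filename = "data/tuple-data-file.csv";
    // The file is read without a header row, so Spark names the columns _c0, _c1, ...
    Dataset<Row> df = spark.read().format("csv").option("inferSchema", "true").option("header", "false").load(filename);
    // Copy _c0 to "label" and _c1 to "value", dropping the generated names.
    df = df.withColumn("label", df.col("_c0")).drop("_c0");
    df = df.withColumn("value", df.col("_c1")).drop("_c1");
    // Call the registered UDF on the "value" column, cast to integer first.
    df = df.withColumn("x2", callUDF("x2Multiplier", df.col("value").cast(DataTypes.IntegerType)));
    df.show();
}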
Also used: Multiplier2 (net.jgp.labs.spark.x.udf.Multiplier2), SparkSession (org.apache.spark.sql.SparkSession), Row (org.apache.spark.sql.Row)
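
The example registers a Multiplier2 instance but does not show the class itself. The sketch below is a guess at its shape, not the project's actual source: it assumes the UDF takes one Integer and returns it doubled, which is inferred only from the registered name "x2Multiplier" and the IntegerType return type. To stay runnable without Spark on the classpath, it uses a local stand-in interface (Udf1Like, a hypothetical name) that mirrors the single call method of org.apache.spark.sql.api.java.UDF1. Multiplier2Sketch and Multiplier2Demo are likewise illustrative names.

```java
import java.io.Serializable;

// Stand-in for org.apache.spark.sql.api.java.UDF1<T1, R> (assumption: used here
// only so the sketch compiles without Spark). The real interface is also
// Serializable and declares a single call method.
interface Udf1Like<T, R> extends Serializable {
    R call(T value) throws Exception;
}

// Hypothetical reconstruction of Multiplier2: doubles its integer input.
class Multiplier2Sketch implements Udf1Like<Integer, Integer> {
    private static final long serialVersionUID = 1L;

    @Override
    public Integer call(Integer value) {
        // A SQL NULL reaches a Java UDF as null, so pass it through
        // instead of unboxing it and throwing a NullPointerException.
        return value == null ? null : value * 2;
    }
}

public class Multiplier2Demo {
    public static void main(String[] args) {
        Multiplier2Sketch udf = new Multiplier2Sketch();
        System.out.println(udf.call(21)); // prints 42
    }
}
```

With Spark available, the same class body would implement UDF1<Integer, Integer> directly and could then be passed to spark.udf().register as in the example above.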
