
Example 1 with LocalSparkJobRef

Use of org.apache.hadoop.hive.ql.exec.spark.status.impl.LocalSparkJobRef in project hive by apache.

From class LocalHiveSparkClient, method execute.

@Override
public SparkJobRef execute(DriverContext driverContext, SparkWork sparkWork) throws Exception {
    Context ctx = driverContext.getCtx();
    HiveConf hiveConf = (HiveConf) ctx.getConf();
    refreshLocalResources(sparkWork, hiveConf);
    JobConf jobConf = new JobConf(hiveConf);
    // Create temporary scratch dir
    Path emptyScratchDir;
    emptyScratchDir = ctx.getMRTmpPath();
    FileSystem fs = emptyScratchDir.getFileSystem(jobConf);
    fs.mkdirs(emptyScratchDir);
    // Update credential provider location.
    // The password to the credential provider is already set in the sparkConf
    // in HiveSparkClientFactory.
    HiveConfUtil.updateJobCredentialProviders(jobConf);
    SparkCounters sparkCounters = new SparkCounters(sc);
    Map<String, List<String>> prefixes = sparkWork.getRequiredCounterPrefix();
    if (prefixes != null) {
        for (String group : prefixes.keySet()) {
            for (String counterName : prefixes.get(group)) {
                sparkCounters.createCounter(group, counterName);
            }
        }
    }
    SparkReporter sparkReporter = new SparkReporter(sparkCounters);
    // Generate Spark plan
    SparkPlanGenerator gen = new SparkPlanGenerator(sc, ctx, jobConf, emptyScratchDir, sparkReporter);
    SparkPlan plan = gen.generate(sparkWork);
    if (driverContext.isShutdown()) {
        throw new HiveException("Operation is cancelled.");
    }
    // Execute generated plan.
    JavaPairRDD<HiveKey, BytesWritable> finalRDD = plan.generateGraph();
    sc.setJobGroup("queryId = " + sparkWork.getQueryId(), DagUtils.getQueryName(jobConf));
    // We use Spark RDD async action to submit job as it's the only way to get jobId now.
    JavaFutureAction<Void> future = finalRDD.foreachAsync(HiveVoidFunction.getInstance());
    // As we always use a foreach action to submit the RDD graph, it triggers only one job.
    int jobId = future.jobIds().get(0);
    LocalSparkJobStatus sparkJobStatus = new LocalSparkJobStatus(sc, jobId, jobMetricsListener, sparkCounters, plan.getCachedRDDIds(), future);
    return new LocalSparkJobRef(Integer.toString(jobId), hiveConf, sparkJobStatus, sc);
}
Also used: JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) Context(org.apache.hadoop.hive.ql.Context) DriverContext(org.apache.hadoop.hive.ql.DriverContext) Path(org.apache.hadoop.fs.Path) SparkCounters(org.apache.hive.spark.counter.SparkCounters) HiveException(org.apache.hadoop.hive.ql.metadata.HiveException) BytesWritable(org.apache.hadoop.io.BytesWritable) LocalSparkJobStatus(org.apache.hadoop.hive.ql.exec.spark.status.impl.LocalSparkJobStatus) HiveKey(org.apache.hadoop.hive.ql.io.HiveKey) FileSystem(org.apache.hadoop.fs.FileSystem) HiveConf(org.apache.hadoop.hive.conf.HiveConf) ArrayList(java.util.ArrayList) List(java.util.List) LocalSparkJobRef(org.apache.hadoop.hive.ql.exec.spark.status.impl.LocalSparkJobRef) JobConf(org.apache.hadoop.mapred.JobConf)
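In `execute`, counters are registered up front from `sparkWork.getRequiredCounterPrefix()`, a map from counter group to the counter names required in that group; a `null` map means no counters were requested. The sketch below reproduces that nested loop using only the standard library, under stated assumptions: `CounterRegistry` is a hypothetical stand-in for Hive's `SparkCounters` (whose constructor takes the Spark context), and the group and counter names in `main` are illustrative only, not names Hive necessarily uses.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CounterRegistrationSketch {

    // Hypothetical stand-in for SparkCounters: group -> (counter name -> value).
    static final class CounterRegistry {
        private final Map<String, Map<String, Long>> groups = new LinkedHashMap<>();

        // Create a counter initialised to zero; re-creating an existing one keeps its value.
        void createCounter(String group, String name) {
            groups.computeIfAbsent(group, g -> new LinkedHashMap<>()).putIfAbsent(name, 0L);
        }

        boolean hasCounter(String group, String name) {
            Map<String, Long> counters = groups.get(group);
            return counters != null && counters.containsKey(name);
        }

        // Total number of counters across all groups.
        int size() {
            int total = 0;
            for (Map<String, Long> counters : groups.values()) {
                total += counters.size();
            }
            return total;
        }
    }

    // Mirrors the loop in execute(): one counter per (group, name) pair; null means none requested.
    static CounterRegistry register(Map<String, List<String>> prefixes) {
        CounterRegistry registry = new CounterRegistry();
        if (prefixes != null) {
            for (Map.Entry<String, List<String>> entry : prefixes.entrySet()) {
                for (String counterName : entry.getValue()) {
                    registry.createCounter(entry.getKey(), counterName);
                }
            }
        }
        return registry;
    }

    public static void main(String[] args) {
        // Illustrative group and counter names (hypothetical).
        Map<String, List<String>> prefixes = new LinkedHashMap<>();
        prefixes.put("HIVE", List.of("RECORDS_IN", "RECORDS_OUT"));
        prefixes.put("SHUFFLE", List.of("BYTES_WRITTEN"));

        CounterRegistry registry = register(prefixes);
        System.out.println(registry.size());                            // 3
        System.out.println(registry.hasCounter("HIVE", "RECORDS_OUT")); // true
        System.out.println(register(null).size());                      // 0
    }
}
```

Iterating `entrySet()` avoids the second map lookup that `keySet()` plus `get(group)` performs, though the result is the same.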

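`execute` reads the job id with `future.jobIds().get(0)`, relying on the assumption stated in its comment that a single foreach action triggers exactly one Spark job. A minimal sketch of making that assumption explicit follows; `onlyJobId` is a hypothetical helper, not part of Hive or Spark, and it accepts the `List<Integer>` that `JavaFutureAction.jobIds()` returns.

```java
import java.util.List;

public class SingleJobIdSketch {

    // Hypothetical helper: return the single job id, or fail fast if the
    // one-job-per-foreach assumption does not hold.
    static int onlyJobId(List<Integer> jobIds) {
        if (jobIds == null || jobIds.size() != 1) {
            throw new IllegalStateException(
                "Expected exactly one Spark job, got " + (jobIds == null ? "none" : jobIds.size()));
        }
        return jobIds.get(0);
    }

    public static void main(String[] args) {
        System.out.println(onlyJobId(List.of(7))); // 7
        try {
            onlyJobId(List.of(3, 4));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Expected exactly one Spark job, got 2
        }
    }
}
```

Failing fast would surface an unexpected multi-job submission instead of silently tracking only the first job; whether that trade-off suits Hive's local client is a judgment call.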