Search in sources:

Example 1 with RemoteSparkJobRef

Use of org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobRef in project hive by apache.

From the class RemoteHiveSparkClient, method submit:

private SparkJobRef submit(final DriverContext driverContext, final SparkWork sparkWork) throws Exception {
    final Context ctx = driverContext.getCtx();
    final HiveConf hiveConf = (HiveConf) ctx.getConf();
    refreshLocalResources(sparkWork, hiveConf);
    final JobConf jobConf = new JobConf(hiveConf);
    // update the credential provider location in the jobConf
    HiveConfUtil.updateJobCredentialProviders(jobConf);
    // Create temporary scratch dir
    final Path emptyScratchDir = ctx.getMRTmpPath();
    FileSystem fs = emptyScratchDir.getFileSystem(jobConf);
    fs.mkdirs(emptyScratchDir);
    // make sure NullScanFileSystem can be loaded - HIVE-18442
    jobConf.set("fs." + NullScanFileSystem.getBaseScheme() + ".impl", NullScanFileSystem.class.getCanonicalName());
    // Serialize everything the remote driver needs to rebuild the job: the
    // JobConf, the scratch directory path, and the SparkWork plan.
    byte[] jobConfBytes = KryoSerializer.serializeJobConf(jobConf);
    byte[] scratchDirBytes = KryoSerializer.serialize(emptyScratchDir);
    byte[] sparkWorkBytes = KryoSerializer.serialize(sparkWork);
    JobStatusJob job = new JobStatusJob(jobConfBytes, scratchDirBytes, sparkWorkBytes);
    // Bail out early if the operation was cancelled while the job was being prepared.
    if (driverContext.isShutdown()) {
        throw new HiveException("Operation is cancelled.");
    }
    // Submit asynchronously; the JobHandle plus RemoteSparkJobStatus let the
    // caller poll progress or cancel the remote job.
    JobHandle<Serializable> jobHandle = remoteClient.submit(job);
    RemoteSparkJobStatus sparkJobStatus = new RemoteSparkJobStatus(remoteClient, jobHandle, sparkClientTimtout);
    return new RemoteSparkJobRef(hiveConf, jobHandle, sparkJobStatus);
}
Also used: Context (org.apache.hadoop.hive.ql.Context), DriverContext (org.apache.hadoop.hive.ql.DriverContext), JobContext (org.apache.hive.spark.client.JobContext), Path (org.apache.hadoop.fs.Path), Serializable (java.io.Serializable), HiveException (org.apache.hadoop.hive.ql.metadata.HiveException), RemoteSparkJobRef (org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobRef), RemoteSparkJobStatus (org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus), FileSystem (org.apache.hadoop.fs.FileSystem), NullScanFileSystem (org.apache.hadoop.hive.ql.io.NullScanFileSystem), HiveConf (org.apache.hadoop.hive.conf.HiveConf), JobConf (org.apache.hadoop.mapred.JobConf)
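
The three byte arrays exist because the JobConf, the scratch-dir Path, and the SparkWork must cross the process boundary to the remote Spark driver, where JobStatusJob rebuilds them before running. The body of JobStatusJob is not shown in this listing; the following is a minimal sketch of what its call method plausibly looks like, assuming KryoSerializer exposes deserializeJobConf and deserialize counterparts to the serialize calls above:

static class JobStatusJob implements Job<Serializable> {

    private final byte[] jobConfBytes;
    private final byte[] scratchDirBytes;
    private final byte[] sparkWorkBytes;

    JobStatusJob(byte[] jobConfBytes, byte[] scratchDirBytes, byte[] sparkWorkBytes) {
        this.jobConfBytes = jobConfBytes;
        this.scratchDirBytes = scratchDirBytes;
        this.sparkWorkBytes = sparkWorkBytes;
    }

    @Override
    public Serializable call(JobContext jc) throws Exception {
        // Rebuild on the remote driver what submit() serialized on the client.
        // deserializeJobConf/deserialize are assumed counterparts of the
        // serializeJobConf/serialize calls in submit() above.
        JobConf localJobConf = KryoSerializer.deserializeJobConf(jobConfBytes);
        Path localScratchDir = KryoSerializer.deserialize(scratchDirBytes, Path.class);
        SparkWork localSparkWork = KryoSerializer.deserialize(sparkWorkBytes, SparkWork.class);
        // The real implementation then plans and runs localSparkWork through
        // the Spark context held by jc; that part is elided here.
        return null;
    }
}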

Example 2 with RemoteSparkJobRef

Use of org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobRef in project hive by apache.

From the class RemoteHiveSparkClient, a later variant of method submit in which the single DriverContext parameter has been split into separate TaskQueue and Context arguments; the body is otherwise the same as Example 1:

private SparkJobRef submit(TaskQueue taskQueue, Context context, SparkWork sparkWork) throws Exception {
    final HiveConf hiveConf = (HiveConf) context.getConf();
    refreshLocalResources(sparkWork, hiveConf);
    final JobConf jobConf = new JobConf(hiveConf);
    // update the credential provider location in the jobConf
    HiveConfUtil.updateJobCredentialProviders(jobConf);
    // Create temporary scratch dir
    final Path emptyScratchDir = context.getMRTmpPath();
    FileSystem fs = emptyScratchDir.getFileSystem(jobConf);
    fs.mkdirs(emptyScratchDir);
    // make sure NullScanFileSystem can be loaded - HIVE-18442
    jobConf.set("fs." + NullScanFileSystem.getBaseScheme() + ".impl", NullScanFileSystem.class.getCanonicalName());
    byte[] jobConfBytes = KryoSerializer.serializeJobConf(jobConf);
    byte[] scratchDirBytes = KryoSerializer.serialize(emptyScratchDir);
    byte[] sparkWorkBytes = KryoSerializer.serialize(sparkWork);
    JobStatusJob job = new JobStatusJob(jobConfBytes, scratchDirBytes, sparkWorkBytes);
    if (taskQueue.isShutdown()) {
        throw new HiveException("Operation is cancelled.");
    }
    JobHandle<Serializable> jobHandle = remoteClient.submit(job);
    RemoteSparkJobStatus sparkJobStatus = new RemoteSparkJobStatus(remoteClient, jobHandle, sparkClientTimtout);
    return new RemoteSparkJobRef(hiveConf, jobHandle, sparkJobStatus);
}
Also used: Path (org.apache.hadoop.fs.Path), Serializable (java.io.Serializable), HiveException (org.apache.hadoop.hive.ql.metadata.HiveException), RemoteSparkJobRef (org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobRef), FileSystem (org.apache.hadoop.fs.FileSystem), NullScanFileSystem (org.apache.hadoop.hive.ql.io.NullScanFileSystem), RemoteSparkJobStatus (org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus), HiveConf (org.apache.hadoop.hive.conf.HiveConf), JobConf (org.apache.hadoop.mapred.JobConf)
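
On the caller's side, the returned SparkJobRef is what the rest of Hive uses to track the asynchronous job. A minimal usage sketch, assuming the SparkJobRef interface exposes getJobId(), a blocking monitorJob() that returns a status code, and cancelJob():

// Hedged usage sketch; submit() is the method shown above, and the rc semantics
// are assumed to follow the usual Hive convention where 0 means success.
SparkJobRef jobRef = submit(taskQueue, context, sparkWork);
// Block until the remote job reaches a terminal state.
int rc = jobRef.monitorJob();
if (rc != 0) {
    // Best-effort cancellation of the remote job before reporting failure.
    jobRef.cancelJob();
    throw new HiveException("Spark job " + jobRef.getJobId() + " failed with code " + rc);
}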

Aggregations

Serializable (java.io.Serializable): 2
FileSystem (org.apache.hadoop.fs.FileSystem): 2
Path (org.apache.hadoop.fs.Path): 2
HiveConf (org.apache.hadoop.hive.conf.HiveConf): 2
RemoteSparkJobRef (org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobRef): 2
RemoteSparkJobStatus (org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus): 2
NullScanFileSystem (org.apache.hadoop.hive.ql.io.NullScanFileSystem): 2
HiveException (org.apache.hadoop.hive.ql.metadata.HiveException): 2
JobConf (org.apache.hadoop.mapred.JobConf): 2
Context (org.apache.hadoop.hive.ql.Context): 1
DriverContext (org.apache.hadoop.hive.ql.DriverContext): 1
JobContext (org.apache.hive.spark.client.JobContext): 1