Example 1 with ActiveJob

Use of org.apache.spark.scheduler.ActiveJob in project zeppelin by apache.

The class SparkInterpreter, method getProgress. It reports paragraph progress as a percentage by summing completed and total task counts over the active Spark jobs that belong to the paragraph's job group:

@Override
public int getProgress(InterpreterContext context) {
    // Jobs launched for this paragraph share a job group id, so progress is
    // aggregated only over active jobs whose group matches.
    String jobGroup = Utils.buildJobGroupId(context);
    int completedTasks = 0;
    int totalTasks = 0;
    DAGScheduler scheduler = sc.dagScheduler();
    if (scheduler == null) {
        return 0;
    }
    // activeJobs() exposes the DAGScheduler's Scala mutable HashSet to Java.
    HashSet<ActiveJob> jobs = scheduler.activeJobs();
    if (jobs == null || jobs.isEmpty()) {
        return 0;
    }
    Iterator<ActiveJob> it = jobs.iterator();
    while (it.hasNext()) {
        ActiveJob job = it.next();
        String g = (String) job.properties().get("spark.jobGroup.id");
        if (jobGroup.equals(g)) {
            int[] progressInfo = null;
            try {
                // finalStage is read reflectively because its type changed
                // between Spark releases.
                Object finalStage = job.getClass().getMethod("finalStage").invoke(job);
                if (sparkVersion.getProgress1_0()) {
                    progressInfo = getProgressFromStage_1_0x(sparkListener, finalStage);
                } else {
                    progressInfo = getProgressFromStage_1_1x(sparkListener, finalStage);
                }
            } catch (IllegalAccessException | IllegalArgumentException
                    | InvocationTargetException | NoSuchMethodException | SecurityException e) {
                logger.error("Can't get progress info", e);
                return 0;
            }
            // progressInfo = {totalTasks, completedTasks} for the matched job
            totalTasks += progressInfo[0];
            completedTasks += progressInfo[1];
        }
    }
    if (totalTasks == 0) {
        return 0;
    }
    return completedTasks * 100 / totalTasks;
}
Also used: DAGScheduler (org.apache.spark.scheduler.DAGScheduler), InvocationTargetException (java.lang.reflect.InvocationTargetException), ActiveJob (org.apache.spark.scheduler.ActiveJob)
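
The matching in getProgress only works because whatever runs the paragraph first tags the SparkContext with the same group id: Spark copies the id passed to SparkContext.setJobGroup into each submitted job's properties under the key "spark.jobGroup.id". A minimal sketch of that other half of the contract, assuming a hypothetical group-id format and helper name (Zeppelin's real Utils.buildJobGroupId derives the id from the interpreter context):

import org.apache.spark.SparkContext;

public class JobGroupTaggingSketch {
    // Tag all jobs triggered after this call with a group id, so a progress
    // poller can later find them among DAGScheduler.activeJobs(). The
    // "zeppelin-<note>-<paragraph>" format here is hypothetical.
    static void tagJobGroup(SparkContext sc, String noteId, String paragraphId) {
        String jobGroup = "zeppelin-" + noteId + "-" + paragraphId;
        // interruptOnCancel = false: cancelling the group won't interrupt running tasks
        sc.setJobGroup(jobGroup, "paragraph " + paragraphId, false);
        // ...run Spark actions here; each resulting ActiveJob will then return
        // jobGroup from job.properties().get("spark.jobGroup.id")
    }
}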

Example 2 with ActiveJob

Use of org.apache.spark.scheduler.ActiveJob in project OpenLineage by OpenLineage.

The class OpenLineageSparkListener, method onJobStart. It resolves the ActiveJob and SQL execution id for the starting job and registers an ExecutionContext so the matching job-end event can be correlated later:

/**
 * Called by Spark's listener bus when a job starts.
 */
@Override
public void onJobStart(SparkListenerJobStart jobStart) {
    // Resolve the ActiveJob for this job id from the DAGScheduler of the
    // default session's SparkContext (or the active context as a fallback).
    Optional<ActiveJob> activeJob =
        asJavaOptional(
                SparkSession.getDefaultSession().map(sparkContextFromSession).orElse(activeSparkContext))
            .flatMap(ctx -> Optional.ofNullable(ctx.dagScheduler())
                .map(ds -> ds.jobIdToActiveJob().get(jobStart.jobId())))
            .flatMap(ScalaConversionUtils::asJavaOptional);
    // Record the job's stages so task metrics can be aggregated per job.
    Set<Integer> stages = ScalaConversionUtils.fromSeq(jobStart.stageIds()).stream()
        .map(Integer.class::cast)
        .collect(Collectors.toSet());
    jobMetrics.addJobStages(jobStart.jobId(), stages);
    // Use the SQL execution id from the jobStart properties, else from the
    // ActiveJob's properties; with no id, fall back to a plain RDD context.
    ExecutionContext context =
        Optional.ofNullable(getSqlExecutionId(jobStart.properties()))
            .map(Optional::of)
            .orElseGet(() -> asJavaOptional(
                    SparkSession.getDefaultSession().map(sparkContextFromSession).orElse(activeSparkContext))
                .flatMap(ctx -> Optional.ofNullable(ctx.dagScheduler())
                    .map(ds -> ds.jobIdToActiveJob().get(jobStart.jobId()))
                    .flatMap(ScalaConversionUtils::asJavaOptional))
                .map(job -> getSqlExecutionId(job.properties())))
            .map(id -> getExecutionContext(jobStart.jobId(), Long.parseLong(id)))
            .orElseGet(() -> getExecutionContext(jobStart.jobId()));
    // Register the context so onJobEnd can find it for this job id.
    rddExecutionRegistry.put(jobStart.jobId(), context);
    activeJob.ifPresent(context::setActiveJob);
    context.start(jobStart);
}
Also used: OpenLineageClient (io.openlineage.spark.agent.client.OpenLineageClient), SparkListenerApplicationStart (org.apache.spark.scheduler.SparkListenerApplicationStart), DEFAULTS (io.openlineage.spark.agent.ArgumentParser.DEFAULTS), URISyntaxException (java.net.URISyntaxException), ZonedDateTime (java.time.ZonedDateTime), Function0 (scala.Function0), Function1 (scala.Function1), SparkConfUtils.findSparkConfigKey (io.openlineage.spark.agent.util.SparkConfUtils.findSparkConfigKey), HashMap (java.util.HashMap), ScalaConversionUtils.asJavaOptional (io.openlineage.spark.agent.util.ScalaConversionUtils.asJavaOptional), Map (java.util.Map), Configuration (org.apache.hadoop.conf.Configuration), SparkListenerTaskEnd (org.apache.spark.scheduler.SparkListenerTaskEnd), SparkListenerSQLExecutionStart (org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart), SparkContext$ (org.apache.spark.SparkContext$), ContextFactory (io.openlineage.spark.agent.lifecycle.ContextFactory), SparkListenerApplicationEnd (org.apache.spark.scheduler.SparkListenerApplicationEnd), SparkEnv (org.apache.spark.SparkEnv), WeakHashMap (java.util.WeakHashMap), SparkListenerJobEnd (org.apache.spark.scheduler.SparkListenerJobEnd), SparkSession (org.apache.spark.sql.SparkSession), SparkListenerSQLExecutionEnd (org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionEnd), PrintWriter (java.io.PrintWriter), Properties (java.util.Properties), ActiveJob (org.apache.spark.scheduler.ActiveJob), SparkListenerJobStart (org.apache.spark.scheduler.SparkListenerJobStart), ByteArrayOutputStream (org.apache.commons.io.output.ByteArrayOutputStream), SparkConf (org.apache.spark.SparkConf), SparkContext (org.apache.spark.SparkContext), Set (java.util.Set), ScalaConversionUtils (io.openlineage.spark.agent.util.ScalaConversionUtils), Field (java.lang.reflect.Field), Option (scala.Option), Collectors (java.util.stream.Collectors), SparkListenerEvent (org.apache.spark.scheduler.SparkListenerEvent), Slf4j (lombok.extern.slf4j.Slf4j), SparkConfUtils.findSparkUrlParams (io.openlineage.spark.agent.util.SparkConfUtils.findSparkUrlParams), Optional (java.util.Optional), PairRDDFunctions (org.apache.spark.rdd.PairRDDFunctions), ExecutionContext (io.openlineage.spark.agent.lifecycle.ExecutionContext), PairRDDFunctionsTransformer (io.openlineage.spark.agent.transformers.PairRDDFunctionsTransformer), OpenLineage (io.openlineage.client.OpenLineage), Collections (java.util.Collections), RDD (org.apache.spark.rdd.RDD), SparkEnv$ (org.apache.spark.SparkEnv$)
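
Both lookups above lean on ScalaConversionUtils.asJavaOptional to bridge the scala.Option returned by jobIdToActiveJob().get(...) into a java.util.Optional. A plausible sketch of such a helper, under the hypothetical class name ScalaOptionBridge (the real OpenLineage utility may differ in details):

import java.util.Optional;
import scala.Option;

public final class ScalaOptionBridge {
    // Sketch of a scala.Option -> java.util.Optional bridge, in the spirit of
    // ScalaConversionUtils.asJavaOptional; not the actual OpenLineage code.
    public static <T> Optional<T> asJavaOptional(Option<T> option) {
        return (option != null && option.isDefined())
                ? Optional.ofNullable(option.get())
                : Optional.empty();
    }
}

For onJobStart to fire at all, the listener has to be registered with Spark, typically through the spark.extraListeners configuration key, e.g. --conf spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener.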

Aggregations

ActiveJob (org.apache.spark.scheduler.ActiveJob): 2
OpenLineage (io.openlineage.client.OpenLineage): 1
DEFAULTS (io.openlineage.spark.agent.ArgumentParser.DEFAULTS): 1
OpenLineageClient (io.openlineage.spark.agent.client.OpenLineageClient): 1
ContextFactory (io.openlineage.spark.agent.lifecycle.ContextFactory): 1
ExecutionContext (io.openlineage.spark.agent.lifecycle.ExecutionContext): 1
PairRDDFunctionsTransformer (io.openlineage.spark.agent.transformers.PairRDDFunctionsTransformer): 1
ScalaConversionUtils (io.openlineage.spark.agent.util.ScalaConversionUtils): 1
ScalaConversionUtils.asJavaOptional (io.openlineage.spark.agent.util.ScalaConversionUtils.asJavaOptional): 1
SparkConfUtils.findSparkConfigKey (io.openlineage.spark.agent.util.SparkConfUtils.findSparkConfigKey): 1
SparkConfUtils.findSparkUrlParams (io.openlineage.spark.agent.util.SparkConfUtils.findSparkUrlParams): 1
PrintWriter (java.io.PrintWriter): 1
Field (java.lang.reflect.Field): 1
InvocationTargetException (java.lang.reflect.InvocationTargetException): 1
URISyntaxException (java.net.URISyntaxException): 1
ZonedDateTime (java.time.ZonedDateTime): 1
Collections (java.util.Collections): 1
HashMap (java.util.HashMap): 1
Map (java.util.Map): 1
Optional (java.util.Optional): 1