
Example 1 with ResultStage

Use of org.apache.spark.scheduler.ResultStage in the OpenLineage project.

From class RddExecutionContext, method printStages:

private void printStages(String prefix, Stage stage) {
    if (stage instanceof ResultStage) {
        // The final stage of a job; the cast is currently unused but marks
        // where ResultStage-specific handling would go.
        ResultStage resultStage = (ResultStage) stage;
    }
    printRDDs(
        prefix + "(stageId:" + stage.id() + ")-(" + stage.getClass().getSimpleName() + ")- RDD: ",
        stage.rdd());
    // Recurse into parent stages to print the full stage DAG.
    Collection<Stage> parents = asJavaCollection(stage.parents());
    for (Stage parent : parents) {
        printStages(prefix + " \\ ", parent);
    }
}
Also used : Stage(org.apache.spark.scheduler.Stage) ResultStage(org.apache.spark.scheduler.ResultStage)
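
The printRDDs helper called above is defined elsewhere in RddExecutionContext and is not part of this excerpt. A minimal, hypothetical sketch of what such a helper might look like, assuming it simply logs each RDD and recurses through the lineage via RDD.dependencies() (the actual OpenLineage implementation may differ):

// Hypothetical sketch; not the actual OpenLineage implementation.
private void printRDDs(String prefix, RDD<?> rdd) {
    log.info("{}{}", prefix, rdd);
    // Each Dependency wraps a parent RDD; recurse to print the full lineage.
    for (Dependency<?> dep : asJavaCollection(rdd.dependencies())) {
        printRDDs(prefix + " \\ ", dep.rdd());
    }
}

Here Dependency is org.apache.spark.Dependency, and asJavaCollection is the same Scala-to-Java conversion already used in printStages.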

Example 2 with ResultStage

Use of org.apache.spark.scheduler.ResultStage in the OpenLineage project.

From class RddExecutionContext, method setActiveJob:

@Override
public void setActiveJob(ActiveJob activeJob) {
    RDD<?> finalRDD = activeJob.finalStage().rdd();
    this.jobSuffix = nameRDD(finalRDD);
    Set<RDD<?>> rdds = Rdds.flattenRDDs(finalRDD);
    this.inputs = findInputs(rdds);
    Configuration jc = new JobConf();
    if (activeJob.finalStage() instanceof ResultStage) {
        // The ResultStage's result function is a closure that captures the
        // Hadoop write configuration; extract it via reflection.
        Function2<TaskContext, Iterator<?>, ?> fn = ((ResultStage) activeJob.finalStage()).func();
        try {
            Field f = getConfigField(fn);
            f.setAccessible(true);
            HadoopMapRedWriteConfigUtil configUtil =
                Optional.of(f.get(fn))
                    .filter(HadoopMapRedWriteConfigUtil.class::isInstance)
                    .map(HadoopMapRedWriteConfigUtil.class::cast)
                    .orElseThrow(
                        () -> new NoSuchFieldException("Field is not instance of HadoopMapRedWriteConfigUtil"));
            // HadoopMapRedWriteConfigUtil keeps the JobConf in a private
            // "conf" field, wrapped in a SerializableJobConf.
            Field confField = HadoopMapRedWriteConfigUtil.class.getDeclaredField("conf");
            confField.setAccessible(true);
            SerializableJobConf conf = (SerializableJobConf) confField.get(configUtil);
            jc = conf.value();
        } catch (IllegalAccessException | NoSuchFieldException nfe) {
            log.warn("Unable to access job conf from RDD", nfe);
        }
        log.info("Found job conf from RDD {}", jc);
    } else {
        jc = OpenLineageSparkListener.getConfigForRDD(finalRDD);
    }
    this.outputs = findOutputs(finalRDD, jc);
}
Also used : TaskContext(org.apache.spark.TaskContext) Configuration(org.apache.hadoop.conf.Configuration) Field(java.lang.reflect.Field) HadoopRDD(org.apache.spark.rdd.HadoopRDD) MapPartitionsRDD(org.apache.spark.rdd.MapPartitionsRDD) RDD(org.apache.spark.rdd.RDD) NewHadoopRDD(org.apache.spark.rdd.NewHadoopRDD) HadoopMapRedWriteConfigUtil(org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil) SerializableJobConf(org.apache.spark.util.SerializableJobConf) Iterator(scala.collection.Iterator) ResultStage(org.apache.spark.scheduler.ResultStage) JobConf(org.apache.hadoop.mapred.JobConf)
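
The getConfigField helper above is likewise defined elsewhere in RddExecutionContext. A hedged sketch of one way such a helper could work, assuming it scans the result function's captured fields for one whose declared type is compatible with HadoopMapRedWriteConfigUtil (the real helper may locate the field by name instead, which would explain why the caller still checks the instance type at runtime):

// Hypothetical sketch; not the actual OpenLineage implementation.
private Field getConfigField(Function2<TaskContext, Iterator<?>, ?> fn) throws NoSuchFieldException {
    for (Field field : fn.getClass().getDeclaredFields()) {
        // The ResultStage function is a closure; one of its captured fields
        // holds the write configuration used to derive the JobConf.
        if (HadoopMapRedWriteConfigUtil.class.isAssignableFrom(field.getType())) {
            return field;
        }
    }
    throw new NoSuchFieldException("No HadoopMapRedWriteConfigUtil field on " + fn.getClass().getName());
}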

Aggregations

ResultStage (org.apache.spark.scheduler.ResultStage)2
Field (java.lang.reflect.Field)1
Configuration (org.apache.hadoop.conf.Configuration)1
JobConf (org.apache.hadoop.mapred.JobConf)1
TaskContext (org.apache.spark.TaskContext)1
HadoopMapRedWriteConfigUtil (org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil)1
HadoopRDD (org.apache.spark.rdd.HadoopRDD)1
MapPartitionsRDD (org.apache.spark.rdd.MapPartitionsRDD)1
NewHadoopRDD (org.apache.spark.rdd.NewHadoopRDD)1
RDD (org.apache.spark.rdd.RDD)1
Stage (org.apache.spark.scheduler.Stage)1
SerializableJobConf (org.apache.spark.util.SerializableJobConf)1
Iterator (scala.collection.Iterator)1