Example 11 with SparkTask

Use of org.apache.hadoop.hive.ql.exec.spark.SparkTask in project hive by apache.

From the class SparkCompiler, the setInputFormat method:

@Override
protected void setInputFormat(Task<?> task) {
    if (task instanceof SparkTask) {
        SparkWork work = ((SparkTask) task).getWork();
        List<BaseWork> all = work.getAllWork();
        for (BaseWork w : all) {
            if (w instanceof MapWork) {
                MapWork mapWork = (MapWork) w;
                Map<String, Operator<? extends OperatorDesc>> opMap = mapWork.getAliasToWork();
                if (!opMap.isEmpty()) {
                    for (Operator<? extends OperatorDesc> op : opMap.values()) {
                        setInputFormat(mapWork, op);
                    }
                }
            }
        }
    } else if (task instanceof ConditionalTask) {
        List<Task<?>> listTasks = ((ConditionalTask) task).getListTasks();
        for (Task<?> tsk : listTasks) {
            setInputFormat(tsk);
        }
    }
    if (task.getChildTasks() != null) {
        for (Task<?> childTask : task.getChildTasks()) {
            setInputFormat(childTask);
        }
    }
}
Also used:
ReduceSinkOperator (org.apache.hadoop.hive.ql.exec.ReduceSinkOperator)
MapJoinOperator (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
UnionOperator (org.apache.hadoop.hive.ql.exec.UnionOperator)
FileSinkOperator (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
FilterOperator (org.apache.hadoop.hive.ql.exec.FilterOperator)
SMBMapJoinOperator (org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator)
JoinOperator (org.apache.hadoop.hive.ql.exec.JoinOperator)
TableScanOperator (org.apache.hadoop.hive.ql.exec.TableScanOperator)
Operator (org.apache.hadoop.hive.ql.exec.Operator)
DummyStoreOperator (org.apache.hadoop.hive.ql.exec.DummyStoreOperator)
SparkTask (org.apache.hadoop.hive.ql.exec.spark.SparkTask)
ConditionalTask (org.apache.hadoop.hive.ql.exec.ConditionalTask)
Task (org.apache.hadoop.hive.ql.exec.Task)
SparkWork (org.apache.hadoop.hive.ql.plan.SparkWork)
MapWork (org.apache.hadoop.hive.ql.plan.MapWork)
List (java.util.List)
ArrayList (java.util.ArrayList)
BaseWork (org.apache.hadoop.hive.ql.plan.BaseWork)
OperatorDesc (org.apache.hadoop.hive.ql.plan.OperatorDesc)
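The traversal pattern above — process the current task, recurse into the sub-tasks of a ConditionalTask, then recurse into every child task — can be sketched with a simplified, hypothetical task type. The `MiniTask` class and `TaskWalk.visit` below are illustrations only, not part of Hive's API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hive's Task hierarchy, for illustration only.
class MiniTask {
    final String name;
    final List<MiniTask> children = new ArrayList<>();
    MiniTask(String name) { this.name = name; }
    MiniTask addChild(MiniTask c) { children.add(c); return c; }
}

public class TaskWalk {
    // Recursive visit mirroring the shape of setInputFormat:
    // handle the current task, then descend into its children.
    static void visit(MiniTask task, List<String> out) {
        out.add(task.name);
        for (MiniTask child : task.children) {
            visit(child, out);
        }
    }

    public static void main(String[] args) {
        MiniTask root = new MiniTask("root");
        MiniTask spark = root.addChild(new MiniTask("spark"));
        spark.addChild(new MiniTask("move"));
        List<String> seen = new ArrayList<>();
        visit(root, seen);
        System.out.println(seen); // prints [root, spark, move]
    }
}
```

Note that in a task DAG (rather than a tree) this kind of plain recursion revisits any node reachable by more than one path, which is exactly the inefficiency Example 12's comment describes.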

Example 12 with SparkTask

Use of org.apache.hadoop.hive.ql.exec.spark.SparkTask in project hive by apache.

From the class GenMapRedUtils, the finalMapWorkChores method:

/**
 * Called at the end of TaskCompiler::compile.
 * For each MapWork, this currently does the following:
 *   1.  Interns the table descriptors of the partitions.
 *   2.  Derives the final explain attributes based on the previous compilation.
 *
 * The original implementation had two functions, internTableDesc and deriveFinalExplainAttributes,
 * implementing 1 and 2 respectively.  This was done using recursion over the
 * task graph.  The recursion was inefficient in a couple of ways:
 *   - For large graphs the recursion was filling up the stack.
 *   - Instead of finding the MapWorks directly, it was walking all possible paths from the root,
 *     causing a huge performance problem.
 *
 * This implementation combines internTableDesc and deriveFinalExplainAttributes into one call.
 * This can be done because each refers to information within a MapWork and performs a specific
 * action.
 *
 * The revised implementation gathers all the MapWorks from all MapReduce tasks (getMRTasks),
 * Spark tasks (getSparkTasks) and Tez tasks (getTezTasks), then invokes the respective call
 * for each of those MapWorks.  getMRTasks, getSparkTasks and getTezTasks iteratively walk
 * the task graph to find the respective MapWorks.
 *
 * The iterative implementation of these functions was done as part of HIVE-17195.  Before
 * HIVE-17195, these functions were recursive and had the same issue.  So, picking this patch
 * for an older release will also require picking HIVE-17195 at the least.
 */
public static void finalMapWorkChores(List<Task<?>> tasks, Configuration conf, Interner<TableDesc> interner) {
    List<ExecDriver> mrTasks = Utilities.getMRTasks(tasks);
    if (!mrTasks.isEmpty()) {
        for (ExecDriver execDriver : mrTasks) {
            execDriver.getWork().getMapWork().internTable(interner);
            execDriver.getWork().getMapWork().deriveLlap(conf, true);
        }
    }
    List<TezTask> tezTasks = Utilities.getTezTasks(tasks);
    if (!tezTasks.isEmpty()) {
        for (TezTask tezTask : tezTasks) {
            if (tezTask.getWork() instanceof TezWork) {
                TezWork work = tezTask.getWork();
                for (BaseWork w : work.getAllWorkUnsorted()) {
                    if (w instanceof MapWork) {
                        ((MapWork) w).internTable(interner);
                        ((MapWork) w).deriveLlap(conf, false);
                    }
                }
            }
        }
    }
    List<SparkTask> sparkTasks = Utilities.getSparkTasks(tasks);
    if (!sparkTasks.isEmpty()) {
        for (SparkTask sparkTask : sparkTasks) {
            SparkWork work = sparkTask.getWork();
            for (BaseWork w : work.getAllWorkUnsorted()) {
                if (w instanceof MapWork) {
                    ((MapWork) w).internTable(interner);
                    ((MapWork) w).deriveLlap(conf, false);
                }
            }
        }
    }
}
Also used:
MapWork (org.apache.hadoop.hive.ql.plan.MapWork)
SparkTask (org.apache.hadoop.hive.ql.exec.spark.SparkTask)
ExecDriver (org.apache.hadoop.hive.ql.exec.mr.ExecDriver)
SparkWork (org.apache.hadoop.hive.ql.plan.SparkWork)
TezTask (org.apache.hadoop.hive.ql.exec.tez.TezTask)
BaseWork (org.apache.hadoop.hive.ql.plan.BaseWork)
TezWork (org.apache.hadoop.hive.ql.plan.TezWork)
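The iterative task-graph walk that the comment credits to getMRTasks/getSparkTasks/getTezTasks (HIVE-17195) can be sketched as a queue-based traversal with a visited set: each task in the DAG is processed exactly once, and no call stack is consumed. This is a minimal sketch under simplified assumptions; `SimpleTask` and `findSparkTasks` are hypothetical stand-ins, not Hive's actual types or method signatures:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for Hive's Task hierarchy, for illustration only.
class SimpleTask {
    final String name;
    final boolean isSpark;
    final List<SimpleTask> children = new ArrayList<>();
    SimpleTask(String name, boolean isSpark) { this.name = name; this.isSpark = isSpark; }
}

public class IterativeWalk {
    // Iterative walk over the task DAG: the visited set ensures each task is
    // processed once even when paths converge, and the explicit queue avoids
    // the deep recursion that was filling up the stack.
    static List<String> findSparkTasks(List<SimpleTask> roots) {
        List<String> found = new ArrayList<>();
        Set<SimpleTask> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<SimpleTask> queue = new ArrayDeque<>(roots);
        while (!queue.isEmpty()) {
            SimpleTask t = queue.poll();
            if (!visited.add(t)) {
                continue; // already reached via another path
            }
            if (t.isSpark) {
                found.add(t.name);
            }
            queue.addAll(t.children);
        }
        return found;
    }

    public static void main(String[] args) {
        // Diamond-shaped DAG: both branches lead to the same Spark task,
        // but the visited set ensures it is reported only once.
        SimpleTask shared = new SimpleTask("spark-join", true);
        SimpleTask left = new SimpleTask("left", false);
        SimpleTask right = new SimpleTask("right", false);
        left.children.add(shared);
        right.children.add(shared);
        SimpleTask root = new SimpleTask("root", false);
        root.children.add(left);
        root.children.add(right);
        System.out.println(findSparkTasks(List.of(root))); // prints [spark-join]
    }
}
```

With plain recursion, the shared node in the diamond would be visited twice (and, in larger graphs, exponentially often), which is the path-explosion problem the comment describes.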

Aggregations

SparkTask (org.apache.hadoop.hive.ql.exec.spark.SparkTask): 12
SparkWork (org.apache.hadoop.hive.ql.plan.SparkWork): 9
Task (org.apache.hadoop.hive.ql.exec.Task): 8
BaseWork (org.apache.hadoop.hive.ql.plan.BaseWork): 8
ArrayList (java.util.ArrayList): 7
MapWork (org.apache.hadoop.hive.ql.plan.MapWork): 6
List (java.util.List): 5
ConditionalTask (org.apache.hadoop.hive.ql.exec.ConditionalTask): 5
JoinOperator (org.apache.hadoop.hive.ql.exec.JoinOperator): 5
MapJoinOperator (org.apache.hadoop.hive.ql.exec.MapJoinOperator): 5
Operator (org.apache.hadoop.hive.ql.exec.Operator): 5
TableScanOperator (org.apache.hadoop.hive.ql.exec.TableScanOperator): 5
Serializable (java.io.Serializable): 4
FileSinkOperator (org.apache.hadoop.hive.ql.exec.FileSinkOperator): 4
OperatorDesc (org.apache.hadoop.hive.ql.plan.OperatorDesc): 4
Path (org.apache.hadoop.fs.Path): 3
CommonJoinOperator (org.apache.hadoop.hive.ql.exec.CommonJoinOperator): 3
ReduceSinkOperator (org.apache.hadoop.hive.ql.exec.ReduceSinkOperator): 3
SMBMapJoinOperator (org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator): 3
ParseContext (org.apache.hadoop.hive.ql.parse.ParseContext): 3