Use of org.apache.hadoop.hive.ql.exec.spark.SparkTask in project hive by apache.
The class SparkCompiler, method setInputFormat.
@Override
protected void setInputFormat(Task<?> task) {
  if (task instanceof SparkTask) {
    SparkWork work = ((SparkTask) task).getWork();
    List<BaseWork> all = work.getAllWork();
    for (BaseWork w : all) {
      if (w instanceof MapWork) {
        MapWork mapWork = (MapWork) w;
        Map<String, Operator<? extends OperatorDesc>> opMap = mapWork.getAliasToWork();
        if (!opMap.isEmpty()) {
          for (Operator<? extends OperatorDesc> op : opMap.values()) {
            setInputFormat(mapWork, op);
          }
        }
      }
    }
  } else if (task instanceof ConditionalTask) {
    List<Task<?>> listTasks = ((ConditionalTask) task).getListTasks();
    for (Task<?> tsk : listTasks) {
      setInputFormat(tsk);
    }
  }
  if (task.getChildTasks() != null) {
    for (Task<?> childTask : task.getChildTasks()) {
      setInputFormat(childTask);
    }
  }
}
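For reference, the same traversal pattern can be written without recursion by keeping an explicit work list of tasks to visit. The sketch below is not part of the Hive code base: the class name SparkMapWorkCollector and its method are hypothetical, and it only reuses the Task, ConditionalTask, SparkTask, SparkWork, BaseWork and MapWork APIs shown above to collect the reachable MapWork instances.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.hive.ql.exec.ConditionalTask;
import org.apache.hadoop.hive.ql.exec.Task;
import org.apache.hadoop.hive.ql.exec.spark.SparkTask;
import org.apache.hadoop.hive.ql.plan.BaseWork;
import org.apache.hadoop.hive.ql.plan.MapWork;
import org.apache.hadoop.hive.ql.plan.SparkWork;

// Hypothetical helper, not part of Hive: collects every MapWork reachable
// from a root task using an explicit work list instead of recursion.
public final class SparkMapWorkCollector {

  public static List<MapWork> collectMapWorks(Task<?> root) {
    List<MapWork> result = new ArrayList<>();
    // Identity-based visited set guards against shared child tasks in the graph.
    Set<Task<?>> visited = Collections.newSetFromMap(new IdentityHashMap<>());
    Deque<Task<?>> pending = new ArrayDeque<>();
    pending.add(root);
    while (!pending.isEmpty()) {
      Task<?> task = pending.poll();
      if (!visited.add(task)) {
        continue; // already handled
      }
      if (task instanceof SparkTask) {
        // A SparkTask wraps a SparkWork DAG; only its MapWork vertices matter here.
        SparkWork work = ((SparkTask) task).getWork();
        for (BaseWork w : work.getAllWork()) {
          if (w instanceof MapWork) {
            result.add((MapWork) w);
          }
        }
      } else if (task instanceof ConditionalTask) {
        // Conditional tasks hold alternative branches rather than child tasks.
        pending.addAll(((ConditionalTask) task).getListTasks());
      }
      if (task.getChildTasks() != null) {
        pending.addAll(task.getChildTasks());
      }
    }
    return result;
  }
}

Using a deque bounds the walk by heap rather than call-stack depth, which is the same concern raised for the recursive helpers discussed in the next snippet.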
Use of org.apache.hadoop.hive.ql.exec.spark.SparkTask in project hive by apache.
The class GenMapRedUtils, method finalMapWorkChores.
/**
 * Called at the end of TaskCompiler::compile.
 * This currently does the following for each map work:
 *   1. Intern the table descriptors of the partitions.
 *   2. Derive final explain attributes based on previous compilation.
 *
 * The original implementation had two functions, internTableDesc and deriveFinalExplainAttributes,
 * respectively implementing 1 and 2 above. Both used recursion over the
 * task graph. The recursion was inefficient in a couple of ways:
 *   - For large graphs the recursion was filling up the stack.
 *   - Instead of finding the map works directly, it walked all possible paths from the root,
 *     causing a huge performance problem.
 *
 * This implementation combines internTableDesc and deriveFinalExplainAttributes into one call.
 * This can be done because each refers to information within a MapWork and performs a specific
 * action on it.
 *
 * The revised implementation gathers all the map works from all MapReduce tasks (getMRTasks),
 * Spark tasks (getSparkTasks) and Tez tasks (getTezTasks), then invokes the respective calls
 * on each of those map works. getMRTasks, getSparkTasks and getTezTasks iteratively walk
 * the task graph to find the respective map works.
 *
 * The iterative implementation of these functions was done as part of HIVE-17195. Before
 * HIVE-17195, these functions were recursive and had the same issue. So, picking this patch
 * for an older release will also require picking HIVE-17195 at the least.
 */
public static void finalMapWorkChores(List<Task<?>> tasks, Configuration conf, Interner<TableDesc> interner) {
  List<ExecDriver> mrTasks = Utilities.getMRTasks(tasks);
  if (!mrTasks.isEmpty()) {
    for (ExecDriver execDriver : mrTasks) {
      execDriver.getWork().getMapWork().internTable(interner);
      execDriver.getWork().getMapWork().deriveLlap(conf, true);
    }
  }
  List<TezTask> tezTasks = Utilities.getTezTasks(tasks);
  if (!tezTasks.isEmpty()) {
    for (TezTask tezTask : tezTasks) {
      if (tezTask.getWork() instanceof TezWork) {
        TezWork work = tezTask.getWork();
        for (BaseWork w : work.getAllWorkUnsorted()) {
          if (w instanceof MapWork) {
            ((MapWork) w).internTable(interner);
            ((MapWork) w).deriveLlap(conf, false);
          }
        }
      }
    }
  }
  List<SparkTask> sparkTasks = Utilities.getSparkTasks(tasks);
  if (!sparkTasks.isEmpty()) {
    for (SparkTask sparkTask : sparkTasks) {
      SparkWork work = sparkTask.getWork();
      for (BaseWork w : work.getAllWorkUnsorted()) {
        if (w instanceof MapWork) {
          ((MapWork) w).internTable(interner);
          ((MapWork) w).deriveLlap(conf, false);
        }
      }
    }
  }
}
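As the javadoc notes, this method is invoked at the end of TaskCompiler::compile. A hypothetical call site might look like the sketch below; the class name FinalMapWorkChoresExample, the variable names rootTasks and conf, and the choice of a strong Guava interner are assumptions for illustration, not taken from the Hive source.

import java.util.List;

import com.google.common.collect.Interner;
import com.google.common.collect.Interners;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.ql.exec.Task;
import org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils;
import org.apache.hadoop.hive.ql.plan.TableDesc;

// Hypothetical call site, not part of Hive.
public final class FinalMapWorkChoresExample {

  static void runFinalChores(List<Task<?>> rootTasks, Configuration conf) {
    // The interner deduplicates identical TableDesc instances across the
    // partitions referenced by the generated map works.
    Interner<TableDesc> interner = Interners.newStrongInterner();
    GenMapRedUtils.finalMapWorkChores(rootTasks, conf, interner);
  }
}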