
Example 11 with Flatten

Use of com.tdunning.plume.local.lazy.op.Flatten in project Plume by tdunning.

Source: class Optimizer, method fuseSiblingParallelDos.

/**
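 * Fuses sibling ParallelDos (those that read the same PCollection) into a single
 * multiple-output {@link MultipleParallelDo}, then continues toward the sources.
 * @param arg  The collection to start from; the method walks upstream from it
 *             looking for sibling ParallelDo chains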
 */
@SuppressWarnings({ "unchecked", "rawtypes" })
<T> void fuseSiblingParallelDos(PCollection<T> arg) {
    LazyCollection<T> output = (LazyCollection<T>) arg;
    if (output.isMaterialized()) {
        // stop condition for recursive algorithm
        return;
    }
    DeferredOp dOp = output.getDeferredOp();
    if (!(dOp instanceof ParallelDo)) {
        // not a ParallelDo
        if (dOp instanceof OneToOneOp) {
            // Recursively apply this function to parent
            fuseSiblingParallelDos(((OneToOneOp) dOp).getOrigin());
            return;
        }
        if (dOp instanceof Flatten) {
            Flatten<T> flatten = (Flatten) dOp;
            // Recursively apply this function to all parents
            for (PCollection<T> col : flatten.getOrigins()) {
                fuseSiblingParallelDos(col);
            }
            return;
        }
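        if (dOp instanceof MultipleParallelDo) {
            // Already fused
            return;
        }
        // Any other operation has nothing to fuse; returning here avoids a
        // ClassCastException in the ParallelDo cast below
        return;
    }
    ParallelDo pDo = (ParallelDo) dOp;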
    LazyCollection<T> orig = (LazyCollection<T>) pDo.getOrigin();
    // Count the ParallelDos reading from the same parent (pDo itself included)
    int willAdd = 0;
    for (DeferredOp op : orig.getDownOps()) {
        if (op instanceof ParallelDo) {
            willAdd++;
        }
    }
    if (willAdd == 1) {
        // Parent doesn't have more ParallelDos to fuse
        // Recursively apply this function to parent
        fuseSiblingParallelDos(orig);
        return;
    }
    // MultipleParallelDo is viable, create it
    MultipleParallelDo<T> mPDo = new MultipleParallelDo<T>(orig);
    mPDo.addDest(pDo.getFunction(), output);
    orig.downOps.remove(pDo);
    output.deferredOp = mPDo;
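    // Point every remaining sibling ParallelDo's output at the fused op and
    // keep all other downstream ops unchanged
    List<DeferredOp> newList = new ArrayList<DeferredOp>();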
    for (DeferredOp op : orig.getDownOps()) {
        if (op instanceof ParallelDo) {
            ParallelDo thisPDo = (ParallelDo) op;
            mPDo.addDest(thisPDo.getFunction(), thisPDo.getDest());
            LazyCollection thisDest = (LazyCollection) thisPDo.getDest();
            thisDest.deferredOp = mPDo;
        } else {
            newList.add(op);
        }
    }
    newList.add(mPDo);
    orig.downOps = newList;
    // Recursively apply this function to parent
    fuseSiblingParallelDos(orig);
}
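Because the method rewires Plume's graph in place, the idea can be easier to see on a toy graph. The following is a minimal, self-contained sketch under stated assumptions: `Node`, `Do`, `MultiDo`, `Flat`, `fuseSiblings`, and `run` are hypothetical names standing in for `LazyCollection`, `ParallelDo`, `MultipleParallelDo`, `Flatten`, the optimizer method, and execution; they are not Plume's API. Like the method, it walks upstream through every flatten input, and when one collection feeds more than one element-wise op, it replaces those siblings with a single multiple-output op.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model of sibling-ParallelDo fusion. All names are hypothetical
// stand-ins, not Plume's API.
public class SiblingFusionSketch {

    /** A deferred collection: at most one producer, any number of consumers. */
    static class Node {
        Op producer;                          // null = already materialized (a source)
        final List<Op> consumers = new ArrayList<>();
    }

    interface Op {}

    /** Single-output element-wise op, standing in for ParallelDo. */
    static class Do implements Op {
        final Node origin;
        final Function<Integer, Integer> fn;
        final Node dest;

        Do(Node origin, Function<Integer, Integer> fn, Node dest) {
            this.origin = origin;
            this.fn = fn;
            this.dest = dest;
            origin.consumers.add(this);
            dest.producer = this;
        }
    }

    /** Multiple-output op, standing in for MultipleParallelDo. */
    static class MultiDo implements Op {
        final List<Function<Integer, Integer>> fns = new ArrayList<>();
        final List<Node> dests = new ArrayList<>();

        void addDest(Function<Integer, Integer> fn, Node dest) {
            fns.add(fn);
            dests.add(dest);
            dest.producer = this;             // dest is now produced by the fused op
        }
    }

    /** Concatenation of several inputs, standing in for Flatten. */
    static class Flat implements Op {
        final List<Node> origins;

        Flat(List<Node> origins, Node dest) {
            this.origins = origins;
            for (Node o : origins) o.consumers.add(this);
            dest.producer = this;
        }
    }

    /** Walks upstream from output, fusing sibling Do ops into one MultiDo. */
    static void fuseSiblings(Node output) {
        Op op = output.producer;
        if (op == null || op instanceof MultiDo) return;        // source, or already fused
        if (op instanceof Flat) {
            for (Node o : ((Flat) op).origins) fuseSiblings(o);  // visit every input
            return;
        }
        Node orig = ((Do) op).origin;
        long siblings = orig.consumers.stream().filter(c -> c instanceof Do).count();
        if (siblings > 1) {
            MultiDo fused = new MultiDo();
            List<Op> kept = new ArrayList<>();
            for (Op c : orig.consumers) {
                if (c instanceof Do) {
                    Do d = (Do) c;
                    fused.addDest(d.fn, d.dest);  // redirect the sibling's output
                } else {
                    kept.add(c);                  // keep non-Do consumers as they are
                }
            }
            kept.add(fused);
            orig.consumers.clear();
            orig.consumers.addAll(kept);
        }
        fuseSiblings(orig);                       // continue toward the sources
    }

    /** Executes a fused op: a single pass over the input feeds every output. */
    static List<List<Integer>> run(MultiDo m, List<Integer> input) {
        List<List<Integer>> outs = new ArrayList<>();
        for (int i = 0; i < m.fns.size(); i++) outs.add(new ArrayList<>());
        for (Integer x : input) {
            for (int i = 0; i < m.fns.size(); i++) outs.get(i).add(m.fns.get(i).apply(x));
        }
        return outs;
    }

    public static void main(String[] args) {
        Node src = new Node(), a = new Node(), b = new Node(), c = new Node();
        new Do(src, x -> x + 1, a);
        new Do(src, x -> x * 2, b);
        new Flat(List.of(a, b), c);               // c = flatten(a, b)
        fuseSiblings(c);
        System.out.println(a.producer == b.producer);
        System.out.println(src.consumers.size());
        System.out.println(run((MultiDo) a.producer, List.of(1, 2, 3)));
    }
}
```

Running `main` prints `true`, `1`, and `[[2, 3, 4], [2, 4, 6]]`: both flatten inputs now share one fused producer, the source has a single consumer, and one pass over the input yields both outputs. The sketch leaves out the `OneToOneOp` pass-through case and Plume's `isMaterialized()` check; a node with no producer plays the role of a materialized collection here.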
Also used: ParallelDo (com.tdunning.plume.local.lazy.op.ParallelDo), MultipleParallelDo (com.tdunning.plume.local.lazy.op.MultipleParallelDo), Flatten (com.tdunning.plume.local.lazy.op.Flatten), ArrayList (java.util.ArrayList), OneToOneOp (com.tdunning.plume.local.lazy.op.OneToOneOp), DeferredOp (com.tdunning.plume.local.lazy.op.DeferredOp)
