Example 96 with SparkExecutionContext

use of org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext in project systemml by apache.

the class FrameAppendMSPInstruction method processInstruction.

@Override
public void processInstruction(ExecutionContext ec) {
    // map-only append (rhs must be vector and fit in mapper mem)
    SparkExecutionContext sec = (SparkExecutionContext) ec;
    checkBinaryAppendInputCharacteristics(sec, _cbind, false, false);
    JavaPairRDD<Long, FrameBlock> in1 = sec.getFrameBinaryBlockRDDHandleForVariable(input1.getName());
    PartitionedBroadcast<FrameBlock> in2 = sec.getBroadcastForFrameVariable(input2.getName());
    // execute map-append operations (partitioning-preserving if the block keys do not change)
    JavaPairRDD<Long, FrameBlock> out = null;
    if (preservesPartitioning(_cbind)) {
        out = in1.mapPartitionsToPair(new MapSideAppendPartitionFunction(in2), true);
    } else
        throw new DMLRuntimeException("Append type rbind not supported for frame mappend, instead use rappend");
    // put output RDD handle into symbol table
    updateBinaryAppendOutputMatrixCharacteristics(sec, _cbind);
    sec.setRDDHandleForVariable(output.getName(), out);
    sec.addLineageRDD(output.getName(), input1.getName());
    sec.addLineageBroadcast(output.getName(), input2.getName());
    // update schema of output with merged input schemas
    sec.getFrameObject(output.getName()).setSchema(sec.getFrameObject(input1.getName()).mergeSchemas(sec.getFrameObject(input2.getName())));
}
Also used : FrameBlock(org.apache.sysml.runtime.matrix.data.FrameBlock) SparkExecutionContext(org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext) DMLRuntimeException(org.apache.sysml.runtime.DMLRuntimeException)
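
The pattern worth noting here is the map-side append: the small right-hand side is broadcast and concatenated inside mapPartitionsToPair with preservesPartitioning=true, so no shuffle occurs. Below is a minimal sketch of the same idea in plain Spark, using String rows as a hypothetical stand-in for FrameBlock and assuming the Spark 2.x iterator-based Java API; it is an illustration, not SystemML's implementation.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import scala.Tuple2;

public class MapSideAppendSketch {
    // cbind via map-side append: broadcast the small right-hand side and
    // concatenate it to each left row; preservesPartitioning=true is safe
    // because the keys are never modified. (Hypothetical simplification:
    // String rows instead of FrameBlock.)
    static JavaPairRDD<Long, String> cbind(JavaSparkContext jsc,
            JavaPairRDD<Long, String> lhs, Map<Long, String> rhsRows) {
        final Broadcast<Map<Long, String>> rhs = jsc.broadcast(rhsRows);
        return lhs.mapPartitionsToPair(iter -> {
            List<Tuple2<Long, String>> out = new ArrayList<>();
            while (iter.hasNext()) {
                Tuple2<Long, String> kv = iter.next();
                // look up the matching broadcast row and append its columns
                out.add(new Tuple2<>(kv._1, kv._2 + "," + rhs.value().get(kv._1)));
            }
            return out.iterator();
        }, true);
    }
}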

Example 97 with SparkExecutionContext

use of org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext in project systemml by apache.

the class FrameAppendRSPInstruction method processInstruction.

@Override
public void processInstruction(ExecutionContext ec) {
    SparkExecutionContext sec = (SparkExecutionContext) ec;
    JavaPairRDD<Long, FrameBlock> in1 = sec.getFrameBinaryBlockRDDHandleForVariable(input1.getName());
    JavaPairRDD<Long, FrameBlock> in2 = sec.getFrameBinaryBlockRDDHandleForVariable(input2.getName());
    JavaPairRDD<Long, FrameBlock> out = null;
    long leftRows = sec.getMatrixCharacteristics(input1.getName()).getRows();
    if (_cbind) {
        JavaPairRDD<Long, FrameBlock> in1Aligned = in1.mapToPair(new ReduceSideAppendAlignFunction(leftRows));
        in1Aligned = FrameRDDAggregateUtils.mergeByKey(in1Aligned);
        JavaPairRDD<Long, FrameBlock> in2Aligned = in2.mapToPair(new ReduceSideAppendAlignFunction(leftRows));
        in2Aligned = FrameRDDAggregateUtils.mergeByKey(in2Aligned);
        out = in1Aligned.join(in2Aligned).mapValues(new ReduceSideColumnsFunction(_cbind));
    } else {
        // rbind
        JavaPairRDD<Long, FrameBlock> right = in2.mapToPair(new ReduceSideAppendRowsFunction(leftRows));
        out = in1.union(right);
    }
    // put output RDD handle into symbol table
    updateBinaryAppendOutputMatrixCharacteristics(sec, _cbind);
    sec.setRDDHandleForVariable(output.getName(), out);
    sec.addLineageRDD(output.getName(), input1.getName());
    sec.addLineageRDD(output.getName(), input2.getName());
    // update schema of output with merged input schemas
    sec.getFrameObject(output.getName()).setSchema(sec.getFrameObject(input1.getName()).mergeSchemas(sec.getFrameObject(input2.getName())));
}
Also used : FrameBlock(org.apache.sysml.runtime.matrix.data.FrameBlock) SparkExecutionContext(org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext)
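
Example 97 is the reduce-side counterpart: for cbind, both inputs are re-blocked onto a common key grid and joined block-by-block; for rbind, the right input's block keys are shifted past the left input's rows and the two RDDs are simply unioned. A minimal sketch under the same simplifying assumptions (String rows instead of FrameBlock; all names hypothetical):

import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

public class ReduceSideAppendSketch {
    // cbind: once both sides share a key grid, a join pairs matching
    // blocks and the values are concatenated column-wise.
    static JavaPairRDD<Long, String> cbind(JavaPairRDD<Long, String> in1,
            JavaPairRDD<Long, String> in2) {
        return in1.join(in2).mapValues(pair -> pair._1 + "," + pair._2);
    }

    // rbind: shift the right input's keys past the left input's rows,
    // then a plain union yields the stacked result.
    static JavaPairRDD<Long, String> rbind(JavaPairRDD<Long, String> in1,
            JavaPairRDD<Long, String> in2, long leftRows) {
        JavaPairRDD<Long, String> shifted =
            in2.mapToPair(kv -> new Tuple2<>(kv._1 + leftRows, kv._2));
        return in1.union(shifted);
    }
}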

Example 98 with SparkExecutionContext

use of org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext in project systemml by apache.

the class FrameIndexingSPInstruction method processInstruction.

@Override
public void processInstruction(ExecutionContext ec) {
    SparkExecutionContext sec = (SparkExecutionContext) ec;
    String opcode = getOpcode();
    // get indexing range
    long rl = ec.getScalarInput(rowLower.getName(), rowLower.getValueType(), rowLower.isLiteral()).getLongValue();
    long ru = ec.getScalarInput(rowUpper.getName(), rowUpper.getValueType(), rowUpper.isLiteral()).getLongValue();
    long cl = ec.getScalarInput(colLower.getName(), colLower.getValueType(), colLower.isLiteral()).getLongValue();
    long cu = ec.getScalarInput(colUpper.getName(), colUpper.getValueType(), colUpper.isLiteral()).getLongValue();
    IndexRange ixrange = new IndexRange(rl, ru, cl, cu);
    // right indexing
    if (opcode.equalsIgnoreCase(RightIndex.OPCODE)) {
        // update and check output dimensions
        MatrixCharacteristics mcIn = sec.getMatrixCharacteristics(input1.getName());
        MatrixCharacteristics mcOut = sec.getMatrixCharacteristics(output.getName());
        mcOut.set(ru - rl + 1, cu - cl + 1, mcIn.getRowsPerBlock(), mcIn.getColsPerBlock());
        checkValidOutputDimensions(mcOut);
        // execute right indexing operation (partitioning-preserving if possible)
        JavaPairRDD<Long, FrameBlock> in1 = sec.getFrameBinaryBlockRDDHandleForVariable(input1.getName());
        JavaPairRDD<Long, FrameBlock> out = null;
        if (isPartitioningPreservingRightIndexing(mcIn, ixrange)) {
            out = in1.mapPartitionsToPair(new SliceBlockPartitionFunction(ixrange, mcOut), true);
        } else {
            out = in1.filter(new IsFrameBlockInRange(rl, ru, mcOut)).mapToPair(new SliceBlock(ixrange, mcOut));
        }
        // put output RDD handle into symbol table
        sec.setRDDHandleForVariable(output.getName(), out);
        sec.addLineageRDD(output.getName(), input1.getName());
        // update schema of output with subset of input schema
        sec.getFrameObject(output.getName()).setSchema(sec.getFrameObject(input1.getName()).getSchema((int) cl, (int) cu));
    }
    // left indexing
    else if (opcode.equalsIgnoreCase(LeftIndex.OPCODE) || opcode.equalsIgnoreCase("mapLeftIndex")) {
        JavaPairRDD<Long, FrameBlock> in1 = sec.getFrameBinaryBlockRDDHandleForVariable(input1.getName());
        PartitionedBroadcast<FrameBlock> broadcastIn2 = null;
        JavaPairRDD<Long, FrameBlock> in2 = null;
        JavaPairRDD<Long, FrameBlock> out = null;
        // update and check output dimensions
        MatrixCharacteristics mcOut = sec.getMatrixCharacteristics(output.getName());
        MatrixCharacteristics mcLeft = ec.getMatrixCharacteristics(input1.getName());
        mcOut.set(mcLeft.getRows(), mcLeft.getCols(), mcLeft.getRowsPerBlock(), mcLeft.getColsPerBlock());
        checkValidOutputDimensions(mcOut);
        // note: always frame rhs, scalars are preprocessed via cast to 1x1 frame
        MatrixCharacteristics mcRight = ec.getMatrixCharacteristics(input2.getName());
        // sanity check matching index range and rhs dimensions
        if (!mcRight.dimsKnown()) {
            throw new DMLRuntimeException("The right input frame dimensions are not specified for FrameIndexingSPInstruction");
        }
        if (!(ru - rl + 1 == mcRight.getRows() && cu - cl + 1 == mcRight.getCols())) {
            throw new DMLRuntimeException("Invalid index range of leftindexing: [" + rl + ":" + ru + "," + cl + ":" + cu + "] vs [" + mcRight.getRows() + "x" + mcRight.getCols() + "].");
        }
        if (opcode.equalsIgnoreCase("mapLeftIndex")) {
            broadcastIn2 = sec.getBroadcastForFrameVariable(input2.getName());
            // partitioning-preserving mapPartitions (key access required for broadcast lookup)
            out = in1.mapPartitionsToPair(new LeftIndexPartitionFunction(broadcastIn2, ixrange, mcOut), true);
        } else {
            // general case
            // zero-out lhs
            in1 = in1.flatMapToPair(new ZeroOutLHS(false, ixrange, mcLeft));
            // slice rhs, shift and merge with lhs
            in2 = sec.getFrameBinaryBlockRDDHandleForVariable(input2.getName()).flatMapToPair(new SliceRHSForLeftIndexing(ixrange, mcLeft));
            out = FrameRDDAggregateUtils.mergeByKey(in1.union(in2));
        }
        sec.setRDDHandleForVariable(output.getName(), out);
        sec.addLineageRDD(output.getName(), input1.getName());
        if (broadcastIn2 != null)
            sec.addLineageBroadcast(output.getName(), input2.getName());
        if (in2 != null)
            sec.addLineageRDD(output.getName(), input2.getName());
    } else
        throw new DMLRuntimeException("Invalid opcode (" + opcode + ") encountered in FrameIndexingSPInstruction.");
}
Also used : IsFrameBlockInRange(org.apache.sysml.runtime.instructions.spark.functions.IsFrameBlockInRange) MatrixCharacteristics(org.apache.sysml.runtime.matrix.MatrixCharacteristics) DMLRuntimeException(org.apache.sysml.runtime.DMLRuntimeException) IndexRange(org.apache.sysml.runtime.util.IndexRange) PartitionedBroadcast(org.apache.sysml.runtime.instructions.spark.data.PartitionedBroadcast) FrameBlock(org.apache.sysml.runtime.matrix.data.FrameBlock) JavaPairRDD(org.apache.spark.api.java.JavaPairRDD) SparkExecutionContext(org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext)
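
The right-indexing branch above shows a common two-step pattern: filter out blocks that cannot intersect the requested row range, then slice each surviving block down to the range. A toy version, assuming one row per key and comma-separated columns as a hypothetical stand-in for FrameBlock, with 1-based bounds as in the instruction:

import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

public class RightIndexingSketch {
    // Right indexing: drop rows outside [rl, ru], trim each surviving row
    // to columns [cl, cu], and shift keys so the output starts at row 1.
    static JavaPairRDD<Long, String> rightIndex(JavaPairRDD<Long, String> in,
            long rl, long ru, int cl, int cu) {
        return in.filter(kv -> kv._1 >= rl && kv._1 <= ru)
                 .mapToPair(kv -> {
                     String[] cols = kv._2.split(",");
                     StringBuilder sb = new StringBuilder();
                     for (int i = cl - 1; i <= cu - 1; i++) {
                         if (sb.length() > 0) sb.append(',');
                         sb.append(cols[i]);
                     }
                     return new Tuple2<>(kv._1 - rl + 1, sb.toString());
                 });
    }
}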

Example 99 with SparkExecutionContext

use of org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext in project systemml by apache.

the class MapmmChainSPInstruction method processInstruction.

@Override
public void processInstruction(ExecutionContext ec) {
    SparkExecutionContext sec = (SparkExecutionContext) ec;
    // get rdd and broadcast inputs
    JavaPairRDD<MatrixIndexes, MatrixBlock> inX = sec.getBinaryBlockRDDHandleForVariable(_input1.getName());
    PartitionedBroadcast<MatrixBlock> inV = sec.getBroadcastForVariable(_input2.getName());
    // execute mapmmchain (guaranteed to have single output block)
    MatrixBlock out = null;
    if (_chainType == ChainType.XtXv) {
        JavaRDD<MatrixBlock> tmp = inX.values().map(new RDDMapMMChainFunction(inV));
        out = RDDAggregateUtils.sumStable(tmp);
    } else {
        // ChainType.XtwXv / ChainType.XtXvy
        PartitionedBroadcast<MatrixBlock> inW = sec.getBroadcastForVariable(_input3.getName());
        JavaRDD<MatrixBlock> tmp = inX.map(new RDDMapMMChainFunction2(inV, inW, _chainType));
        out = RDDAggregateUtils.sumStable(tmp);
    }
    // put output block into symbol table (no lineage because single block)
    // this also includes implicit maintenance of matrix characteristics
    sec.setMatrixOutput(_output.getName(), out, getExtendedOpcode());
}
Also used : MatrixBlock(org.apache.sysml.runtime.matrix.data.MatrixBlock) MatrixIndexes(org.apache.sysml.runtime.matrix.data.MatrixIndexes) SparkExecutionContext(org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext)
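
The reason mapmmchain needs no shuffle is that t(X) %*% (X %*% v) decomposes over row blocks Xi of X as the sum of t(Xi) %*% (Xi %*% v), so each block contributes one partial vector and a single reduce finishes the job. A self-contained sketch with dense double[][] blocks; note that RDDAggregateUtils.sumStable in the real code performs a numerically stable sum, while a plain reduce suffices for illustration:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class MapMMChainSketch {
    // Computes t(X) %*% (X %*% v) as sum_i t(Xi) %*% (Xi %*% v) over
    // row blocks Xi; each block is assumed to be m x n with n == v.length.
    static double[] xtxv(JavaSparkContext jsc, JavaRDD<double[][]> blocksOfX,
            double[] v) {
        final Broadcast<double[]> bv = jsc.broadcast(v);
        return blocksOfX.map(X -> {
            double[] w = bv.value();
            int m = X.length, n = w.length;
            double[] Xv = new double[m];           // Xi %*% v
            for (int i = 0; i < m; i++)
                for (int j = 0; j < n; j++)
                    Xv[i] += X[i][j] * w[j];
            double[] out = new double[n];          // t(Xi) %*% (Xi %*% v)
            for (int i = 0; i < m; i++)
                for (int j = 0; j < n; j++)
                    out[j] += X[i][j] * Xv[i];
            return out;
        }).reduce((a, b) -> {                      // elementwise vector sum
            double[] c = new double[a.length];
            for (int j = 0; j < a.length; j++) c[j] = a[j] + b[j];
            return c;
        });
    }
}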

Example 100 with SparkExecutionContext

use of org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext in project systemml by apache.

the class MapmmSPInstruction method processInstruction.

@Override
public void processInstruction(ExecutionContext ec) {
    SparkExecutionContext sec = (SparkExecutionContext) ec;
    CacheType type = _type;
    String rddVar = type.isRight() ? input1.getName() : input2.getName();
    String bcastVar = type.isRight() ? input2.getName() : input1.getName();
    MatrixCharacteristics mcRdd = sec.getMatrixCharacteristics(rddVar);
    MatrixCharacteristics mcBc = sec.getMatrixCharacteristics(bcastVar);
    // get input rdd with preferred number of partitions to avoid unnecessary repartition
    JavaPairRDD<MatrixIndexes, MatrixBlock> in1 = sec.getBinaryBlockRDDHandleForVariable(rddVar, (requiresFlatMapFunction(type, mcBc) && requiresRepartitioning(type, mcRdd, mcBc, sec.getSparkContext().defaultParallelism())) ? getNumRepartitioning(type, mcRdd, mcBc) : -1);
    // investigate if a repartitioning - including a potential flip of broadcast and rdd
    // inputs - is required to ensure moderately sized output partitions (2GB limitation)
    if (requiresFlatMapFunction(type, mcBc) && requiresRepartitioning(type, mcRdd, mcBc, in1.getNumPartitions())) {
        int numParts = getNumRepartitioning(type, mcRdd, mcBc);
        int numParts2 = getNumRepartitioning(type.getFlipped(), mcBc, mcRdd);
        if (numParts2 > numParts) {
            // flip required
            type = type.getFlipped();
            rddVar = type.isRight() ? input1.getName() : input2.getName();
            bcastVar = type.isRight() ? input2.getName() : input1.getName();
            mcRdd = sec.getMatrixCharacteristics(rddVar);
            mcBc = sec.getMatrixCharacteristics(bcastVar);
            in1 = sec.getBinaryBlockRDDHandleForVariable(rddVar);
            LOG.warn("Mapmm: Switching rdd ('" + bcastVar + "') and broadcast ('" + rddVar + "') inputs " + "for repartitioning because this allows better control of output partition " + "sizes (" + numParts + " < " + numParts2 + ").");
        }
    }
    // get inputs
    PartitionedBroadcast<MatrixBlock> in2 = sec.getBroadcastForVariable(bcastVar);
    // empty input block filter
    if (!_outputEmpty)
        in1 = in1.filter(new FilterNonEmptyBlocksFunction());
    // execute mapmm and aggregation if necessary and put output into symbol table
    if (_aggtype == SparkAggType.SINGLE_BLOCK) {
        JavaRDD<MatrixBlock> out = in1.map(new RDDMapMMFunction2(type, in2));
        MatrixBlock out2 = RDDAggregateUtils.sumStable(out);
        // put output block into symbol table (no lineage because single block)
        // this also includes implicit maintenance of matrix characteristics
        sec.setMatrixOutput(output.getName(), out2, getExtendedOpcode());
    }
    // MULTI_BLOCK or NONE
    else {
        JavaPairRDD<MatrixIndexes, MatrixBlock> out = null;
        if (requiresFlatMapFunction(type, mcBc)) {
            if (requiresRepartitioning(type, mcRdd, mcBc, in1.getNumPartitions())) {
                int numParts = getNumRepartitioning(type, mcRdd, mcBc);
                LOG.warn("Mapmm: Repartition input rdd '" + rddVar + "' from " + in1.getNumPartitions() + " to " + numParts + " partitions to satisfy size restrictions of output partitions.");
                in1 = in1.repartition(numParts);
            }
            out = in1.flatMapToPair(new RDDFlatMapMMFunction(type, in2));
        } else if (preservesPartitioning(mcRdd, type))
            out = in1.mapPartitionsToPair(new RDDMapMMPartitionFunction(type, in2), true);
        else
            out = in1.mapToPair(new RDDMapMMFunction(type, in2));
        // empty output block filter
        if (!_outputEmpty)
            out = out.filter(new FilterNonEmptyBlocksFunction());
        if (_aggtype == SparkAggType.MULTI_BLOCK)
            out = RDDAggregateUtils.sumByKeyStable(out, false);
        // put output RDD handle into symbol table
        sec.setRDDHandleForVariable(output.getName(), out);
        sec.addLineageRDD(output.getName(), rddVar);
        sec.addLineageBroadcast(output.getName(), bcastVar);
        // update output statistics if not inferred
        updateBinaryMMOutputMatrixCharacteristics(sec, true);
    }
}
Also used : FilterNonEmptyBlocksFunction(org.apache.sysml.runtime.instructions.spark.functions.FilterNonEmptyBlocksFunction) MatrixBlock(org.apache.sysml.runtime.matrix.data.MatrixBlock) MatrixIndexes(org.apache.sysml.runtime.matrix.data.MatrixIndexes) CacheType(org.apache.sysml.lops.MapMult.CacheType) MatrixCharacteristics(org.apache.sysml.runtime.matrix.MatrixCharacteristics) SparkExecutionContext(org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext)
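
For the MULTI_BLOCK path above, each input block emits a partial product keyed by its output block index, and partials that collide on a key are summed; sumByKeyStable is, in spirit, a numerically careful reduceByKey. A toy version with double[] values standing in for MatrixBlock (all names hypothetical, dimensions assumed compatible):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import scala.Tuple2;

public class MapMMMultiBlockSketch {
    // Broadcast the small matrix M (m x n), multiply it into every block
    // vector (length n), and merge partial products that share a key.
    static JavaPairRDD<Long, double[]> mapmm(JavaSparkContext jsc,
            JavaPairRDD<Long, double[]> blocks, double[][] small) {
        final Broadcast<double[][]> b = jsc.broadcast(small);
        return blocks
            .mapToPair(kv -> {
                double[][] M = b.value();
                double[] partial = new double[M.length];
                for (int i = 0; i < M.length; i++)
                    for (int j = 0; j < kv._2.length; j++)
                        partial[i] += M[i][j] * kv._2[j];
                return new Tuple2<>(kv._1, partial); // keyed partial product
            })
            .reduceByKey((a, c) -> {                 // merge colliding blocks
                double[] s = new double[a.length];
                for (int i = 0; i < a.length; i++) s[i] = a[i] + c[i];
                return s;
            });
    }
}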

Aggregations

SparkExecutionContext (org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext) 112
MatrixBlock (org.apache.sysml.runtime.matrix.data.MatrixBlock) 92
MatrixIndexes (org.apache.sysml.runtime.matrix.data.MatrixIndexes) 92
MatrixCharacteristics (org.apache.sysml.runtime.matrix.MatrixCharacteristics) 71
DMLRuntimeException (org.apache.sysml.runtime.DMLRuntimeException) 39
JavaPairRDD (org.apache.spark.api.java.JavaPairRDD) 22
FrameBlock (org.apache.sysml.runtime.matrix.data.FrameBlock) 14
DoubleObject (org.apache.sysml.runtime.instructions.cp.DoubleObject) 12
ScalarObject (org.apache.sysml.runtime.instructions.cp.ScalarObject) 9
PartitionedBroadcast (org.apache.sysml.runtime.instructions.spark.data.PartitionedBroadcast) 8
FilterNonEmptyBlocksFunction (org.apache.sysml.runtime.instructions.spark.functions.FilterNonEmptyBlocksFunction) 7
InputInfo (org.apache.sysml.runtime.matrix.data.InputInfo) 7
ArrayList (java.util.ArrayList) 6
CPOperand (org.apache.sysml.runtime.instructions.cp.CPOperand) 6
RDDObject (org.apache.sysml.runtime.instructions.spark.data.RDDObject) 6
AggregateDropCorrectionFunction (org.apache.sysml.runtime.instructions.spark.functions.AggregateDropCorrectionFunction) 6
AggregateOperator (org.apache.sysml.runtime.matrix.operators.AggregateOperator) 6
JavaSparkContext (org.apache.spark.api.java.JavaSparkContext) 5
FrameObject (org.apache.sysml.runtime.controlprogram.caching.FrameObject) 5
ValueType (org.apache.sysml.parser.Expression.ValueType) 4