
Example 6 with PartitionedBroadcast

Use of org.apache.sysml.runtime.instructions.spark.data.PartitionedBroadcast in the incubator-systemml project by apache.

Class SparkExecutionContext, method getBroadcastForVariable.

/**
 * Returns the partitioned broadcast for a matrix variable, reusing a valid
 * existing broadcast handle or creating a new one.
 *
 * TODO: So far we only create broadcast variables but never destroy
 * them. This is a memory leak that might lead to executor out-of-memory
 * errors. To handle this, we need to keep track of when broadcast
 * variables are no longer required.
 *
 * @param varname variable name
 * @return wrapper for broadcast variables
 * @throws DMLRuntimeException if a runtime error occurs while obtaining or broadcasting the variable
 */
@SuppressWarnings("unchecked")
public PartitionedBroadcast<MatrixBlock> getBroadcastForVariable(String varname) throws DMLRuntimeException {
    long t0 = DMLScript.STATISTICS ? System.nanoTime() : 0;
    MatrixObject mo = getMatrixObject(varname);
    PartitionedBroadcast<MatrixBlock> bret = null;
    //reuse existing broadcast handle
    if (mo.getBroadcastHandle() != null && mo.getBroadcastHandle().isValid()) {
        bret = mo.getBroadcastHandle().getBroadcast();
    }
    //create new broadcast handle (never created, evicted)
    if (bret == null) {
        //account for overwritten invalid broadcast (e.g., evicted)
        if (mo.getBroadcastHandle() != null)
            CacheableData.addBroadcastSize(-mo.getBroadcastHandle().getSize());
        //obtain meta data for matrix
        int brlen = (int) mo.getNumRowsPerBlock();
        int bclen = (int) mo.getNumColumnsPerBlock();
        //create partitioned matrix block and release memory consumed by input
        MatrixBlock mb = mo.acquireRead();
        PartitionedBlock<MatrixBlock> pmb = new PartitionedBlock<MatrixBlock>(mb, brlen, bclen);
        mo.release();
        //determine coarse-grained partitioning
        int numPerPart = PartitionedBroadcast.computeBlocksPerPartition(mo.getNumRows(), mo.getNumColumns(), brlen, bclen);
        int numParts = (int) Math.ceil((double) pmb.getNumRowBlocks() * pmb.getNumColumnBlocks() / numPerPart);
        Broadcast<PartitionedBlock<MatrixBlock>>[] ret = new Broadcast[numParts];
        //create coarse-grained partitioned broadcasts
        if (numParts > 1) {
            for (int i = 0; i < numParts; i++) {
                int offset = i * numPerPart;
                int numBlks = Math.min(numPerPart, pmb.getNumRowBlocks() * pmb.getNumColumnBlocks() - offset);
                PartitionedBlock<MatrixBlock> tmp = pmb.createPartition(offset, numBlks, new MatrixBlock());
                ret[i] = getSparkContext().broadcast(tmp);
            }
        } else {
            //single partition
            ret[0] = getSparkContext().broadcast(pmb);
        }
        bret = new PartitionedBroadcast<MatrixBlock>(ret);
        BroadcastObject<MatrixBlock> bchandle = new BroadcastObject<MatrixBlock>(bret, varname, OptimizerUtils.estimatePartitionedSizeExactSparsity(mo.getMatrixCharacteristics()));
        mo.setBroadcastHandle(bchandle);
        CacheableData.addBroadcastSize(bchandle.getSize());
    }
    if (DMLScript.STATISTICS) {
        Statistics.accSparkBroadCastTime(System.nanoTime() - t0);
        Statistics.incSparkBroadcastCount(1);
    }
    return bret;
}
Also used: PartitionedBlock (org.apache.sysml.runtime.instructions.spark.data.PartitionedBlock), MatrixBlock (org.apache.sysml.runtime.matrix.data.MatrixBlock), MatrixObject (org.apache.sysml.runtime.controlprogram.caching.MatrixObject), PartitionedBroadcast (org.apache.sysml.runtime.instructions.spark.data.PartitionedBroadcast), Broadcast (org.apache.spark.broadcast.Broadcast), Checkpoint (org.apache.sysml.lops.Checkpoint), BroadcastObject (org.apache.sysml.runtime.instructions.spark.data.BroadcastObject)
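
The coarse-grained partitioning step in the method above boils down to a ceiling division followed by a loop that gives every partition numPerPart blocks except possibly the last. The following is a minimal standalone sketch of that arithmetic only, with no Spark or SystemML dependency. The class name PartitionPlan and its methods are hypothetical and exist purely for illustration; in the original code the per-partition block count comes from PartitionedBroadcast.computeBlocksPerPartition, which is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SystemML: mirrors only the offset/size
// arithmetic used when splitting a block grid into broadcast partitions.
public class PartitionPlan {

    /** Partitions needed for totalBlocks at numPerPart blocks each (ceiling division). */
    public static int numParts(int totalBlocks, int numPerPart) {
        return (int) Math.ceil((double) totalBlocks / numPerPart);
    }

    /** Returns one {offset, numBlks} pair per partition; the last one may be shorter. */
    public static List<int[]> plan(int totalBlocks, int numPerPart) {
        List<int[]> parts = new ArrayList<>();
        int n = numParts(totalBlocks, numPerPart);
        for (int i = 0; i < n; i++) {
            int offset = i * numPerPart;
            // Clamp so the final partition does not run past the last block.
            int numBlks = Math.min(numPerPart, totalBlocks - offset);
            parts.add(new int[] { offset, numBlks });
        }
        return parts;
    }

    public static void main(String[] args) {
        // A 3 x 4 grid of blocks (12 blocks) with at most 5 blocks per partition.
        for (int[] p : plan(3 * 4, 5)) {
            System.out.println("offset=" + p[0] + " numBlks=" + p[1]);
        }
    }
}
```

For 12 blocks and 5 blocks per partition this yields three partitions with offsets 0, 5, and 10, and sizes 5, 5, and 2.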

Example 7 with PartitionedBroadcast

Use of org.apache.sysml.runtime.instructions.spark.data.PartitionedBroadcast in the incubator-systemml project by apache.

Class SparkExecutionContext, method getBroadcastForFrameVariable.

@SuppressWarnings("unchecked")
public PartitionedBroadcast<FrameBlock> getBroadcastForFrameVariable(String varname) throws DMLRuntimeException {
    long t0 = DMLScript.STATISTICS ? System.nanoTime() : 0;
    FrameObject fo = getFrameObject(varname);
    PartitionedBroadcast<FrameBlock> bret = null;
    //reuse existing broadcast handle
    if (fo.getBroadcastHandle() != null && fo.getBroadcastHandle().isValid()) {
        bret = fo.getBroadcastHandle().getBroadcast();
    }
    //create new broadcast handle (never created, evicted)
    if (bret == null) {
        //account for overwritten invalid broadcast (e.g., evicted)
        if (fo.getBroadcastHandle() != null)
            CacheableData.addBroadcastSize(-fo.getBroadcastHandle().getSize());
        //obtain meta data for frame
        int bclen = (int) fo.getNumColumns();
        int brlen = OptimizerUtils.getDefaultFrameSize();
        //create partitioned frame block and release memory consumed by input
        FrameBlock mb = fo.acquireRead();
        PartitionedBlock<FrameBlock> pmb = new PartitionedBlock<FrameBlock>(mb, brlen, bclen);
        fo.release();
        //determine coarse-grained partitioning
        int numPerPart = PartitionedBroadcast.computeBlocksPerPartition(fo.getNumRows(), fo.getNumColumns(), brlen, bclen);
        int numParts = (int) Math.ceil((double) pmb.getNumRowBlocks() * pmb.getNumColumnBlocks() / numPerPart);
        Broadcast<PartitionedBlock<FrameBlock>>[] ret = new Broadcast[numParts];
        //create coarse-grained partitioned broadcasts
        if (numParts > 1) {
            for (int i = 0; i < numParts; i++) {
                int offset = i * numPerPart;
                int numBlks = Math.min(numPerPart, pmb.getNumRowBlocks() * pmb.getNumColumnBlocks() - offset);
                PartitionedBlock<FrameBlock> tmp = pmb.createPartition(offset, numBlks, new FrameBlock());
                ret[i] = getSparkContext().broadcast(tmp);
            }
        } else {
            //single partition
            ret[0] = getSparkContext().broadcast(pmb);
        }
        bret = new PartitionedBroadcast<FrameBlock>(ret);
        BroadcastObject<FrameBlock> bchandle = new BroadcastObject<FrameBlock>(bret, varname, OptimizerUtils.estimatePartitionedSizeExactSparsity(fo.getMatrixCharacteristics()));
        fo.setBroadcastHandle(bchandle);
        CacheableData.addBroadcastSize(bchandle.getSize());
    }
    if (DMLScript.STATISTICS) {
        Statistics.accSparkBroadCastTime(System.nanoTime() - t0);
        Statistics.incSparkBroadcastCount(1);
    }
    return bret;
}
Also used: PartitionedBlock (org.apache.sysml.runtime.instructions.spark.data.PartitionedBlock), FrameBlock (org.apache.sysml.runtime.matrix.data.FrameBlock), PartitionedBroadcast (org.apache.sysml.runtime.instructions.spark.data.PartitionedBroadcast), Broadcast (org.apache.spark.broadcast.Broadcast), FrameObject (org.apache.sysml.runtime.controlprogram.caching.FrameObject), Checkpoint (org.apache.sysml.lops.Checkpoint), BroadcastObject (org.apache.sysml.runtime.instructions.spark.data.BroadcastObject)
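
Both methods follow the same reuse-or-create pattern: return the existing broadcast if its handle is still valid; otherwise subtract the size of the stale handle from a global counter, create a new broadcast, and add its size. The sketch below isolates that bookkeeping in plain Java. Everything in it (BroadcastAccounting, Handle, Variable, getOrCreate, totalBroadcastSize) is a hypothetical stand-in chosen for illustration, not the SystemML or Spark API, and the payload is created by an arbitrary Supplier rather than a real broadcast call.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the reuse-or-create bookkeeping; not SystemML code.
public class BroadcastAccounting {

    /** Stand-in for a broadcast handle: payload, estimated size, validity flag. */
    public static final class Handle<T> {
        public final T payload;
        public final long size;
        public boolean valid = true;
        Handle(T payload, long size) { this.payload = payload; this.size = size; }
    }

    /** Stand-in for a cached variable that remembers its last handle. */
    public static final class Variable<T> {
        public Handle<T> handle; // null until the first broadcast
    }

    // Running total of the sizes attributed to current broadcasts.
    public static long totalBroadcastSize = 0;

    public static <T> T getOrCreate(Variable<T> v, Supplier<T> create, long size) {
        // Reuse the existing handle if it is still valid.
        if (v.handle != null && v.handle.valid) {
            return v.handle.payload;
        }
        // An invalid handle is about to be overwritten: remove its size first.
        if (v.handle != null) {
            totalBroadcastSize -= v.handle.size;
        }
        v.handle = new Handle<>(create.get(), size);
        totalBroadcastSize += size;
        return v.handle.payload;
    }

    public static void main(String[] args) {
        Variable<String> x = new Variable<>();
        getOrCreate(x, () -> "v1", 100); // first call creates the payload
        getOrCreate(x, () -> "v2", 100); // second call reuses "v1"
        x.handle.valid = false;          // simulate an evicted broadcast
        getOrCreate(x, () -> "v3", 120); // recreated; the old size is subtracted
        System.out.println(totalBroadcastSize);
    }
}
```

Subtracting before adding keeps the counter from double-counting a variable whose broadcast was invalidated and rebuilt. Note that, as in the original methods, nothing here ever releases a handle that is simply no longer needed, which is the leak the TODO in Example 6 describes.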

Aggregations

PartitionedBroadcast (org.apache.sysml.runtime.instructions.spark.data.PartitionedBroadcast): 7
DMLRuntimeException (org.apache.sysml.runtime.DMLRuntimeException): 4
SparkExecutionContext (org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext): 4
MatrixCharacteristics (org.apache.sysml.runtime.matrix.MatrixCharacteristics): 4
MatrixBlock (org.apache.sysml.runtime.matrix.data.MatrixBlock): 4
JavaPairRDD (org.apache.spark.api.java.JavaPairRDD): 3
Checkpoint (org.apache.sysml.lops.Checkpoint): 3
BroadcastObject (org.apache.sysml.runtime.instructions.spark.data.BroadcastObject): 3
PartitionedBlock (org.apache.sysml.runtime.instructions.spark.data.PartitionedBlock): 3
FrameBlock (org.apache.sysml.runtime.matrix.data.FrameBlock): 3
MatrixIndexes (org.apache.sysml.runtime.matrix.data.MatrixIndexes): 3
Broadcast (org.apache.spark.broadcast.Broadcast): 2
FrameObject (org.apache.sysml.runtime.controlprogram.caching.FrameObject): 2
MatrixObject (org.apache.sysml.runtime.controlprogram.caching.MatrixObject): 2
AggregateOperator (org.apache.sysml.runtime.matrix.operators.AggregateOperator): 2
IndexRange (org.apache.sysml.runtime.util.IndexRange): 2
ArrayList (java.util.ArrayList): 1
SpoofCellwise (org.apache.sysml.runtime.codegen.SpoofCellwise): 1
SpoofMultiAggregate (org.apache.sysml.runtime.codegen.SpoofMultiAggregate): 1
SpoofOperator (org.apache.sysml.runtime.codegen.SpoofOperator): 1