
Example 6 with PartitionedBlock

Use of org.apache.sysml.runtime.instructions.spark.data.PartitionedBlock in project incubator-systemml by apache.

From the class SparkExecutionContext, method getBroadcastForFrameVariable.

@SuppressWarnings("unchecked")
public PartitionedBroadcast<FrameBlock> getBroadcastForFrameVariable(String varname) {
    long t0 = DMLScript.STATISTICS ? System.nanoTime() : 0;
    FrameObject fo = getFrameObject(varname);
    PartitionedBroadcast<FrameBlock> bret = null;
    // reuse existing broadcast handle
    if (fo.getBroadcastHandle() != null && fo.getBroadcastHandle().isValid()) {
        bret = fo.getBroadcastHandle().getBroadcast();
    }
    // create new broadcast handle (never created, evicted)
    if (bret == null) {
        // account for overwritten invalid broadcast (e.g., evicted)
        if (fo.getBroadcastHandle() != null)
            CacheableData.addBroadcastSize(-fo.getBroadcastHandle().getSize());
        // obtain meta data for frame
        int bclen = (int) fo.getNumColumns();
        int brlen = OptimizerUtils.getDefaultFrameSize();
        // create partitioned frame block and release memory consumed by input
        FrameBlock mb = fo.acquireRead();
        PartitionedBlock<FrameBlock> pmb = new PartitionedBlock<>(mb, brlen, bclen);
        fo.release();
        // determine coarse-grained partitioning
        int numPerPart = PartitionedBroadcast.computeBlocksPerPartition(fo.getNumRows(), fo.getNumColumns(), brlen, bclen);
        int numParts = (int) Math.ceil((double) pmb.getNumRowBlocks() * pmb.getNumColumnBlocks() / numPerPart);
        Broadcast<PartitionedBlock<FrameBlock>>[] ret = new Broadcast[numParts];
        // create coarse-grained partitioned broadcasts
        if (numParts > 1) {
            for (int i = 0; i < numParts; i++) {
                int offset = i * numPerPart;
                int numBlks = Math.min(numPerPart, pmb.getNumRowBlocks() * pmb.getNumColumnBlocks() - offset);
                PartitionedBlock<FrameBlock> tmp = pmb.createPartition(offset, numBlks, new FrameBlock());
                ret[i] = getSparkContext().broadcast(tmp);
                if (!isLocalMaster())
                    tmp.clearBlocks();
            }
        } else {
            // single partition
            ret[0] = getSparkContext().broadcast(pmb);
            if (!isLocalMaster())
                pmb.clearBlocks();
        }
        bret = new PartitionedBroadcast<>(ret, fo.getMatrixCharacteristics());
        BroadcastObject<FrameBlock> bchandle = new BroadcastObject<>(bret, OptimizerUtils.estimatePartitionedSizeExactSparsity(fo.getMatrixCharacteristics()));
        fo.setBroadcastHandle(bchandle);
        CacheableData.addBroadcastSize(bchandle.getSize());
    }
    if (DMLScript.STATISTICS) {
        Statistics.accSparkBroadCastTime(System.nanoTime() - t0);
        Statistics.incSparkBroadcastCount(1);
    }
    return bret;
}
Also used: PartitionedBlock (org.apache.sysml.runtime.instructions.spark.data.PartitionedBlock), FrameBlock (org.apache.sysml.runtime.matrix.data.FrameBlock), PartitionedBroadcast (org.apache.sysml.runtime.instructions.spark.data.PartitionedBroadcast), Broadcast (org.apache.spark.broadcast.Broadcast), FrameObject (org.apache.sysml.runtime.controlprogram.caching.FrameObject), Checkpoint (org.apache.sysml.lops.Checkpoint), BroadcastObject (org.apache.sysml.runtime.instructions.spark.data.BroadcastObject)
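
The coarse-grained partitioning step in the method above comes down to a ceiling division plus a clamped block count for the last partition. The following is a minimal, dependency-free sketch of that arithmetic only. The class name PartitionSketch and its methods are hypothetical and are not part of SystemML, and the value passed as numPerPart is made up for illustration (the real value comes from PartitionedBroadcast.computeBlocksPerPartition, which is not reimplemented here).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of SystemML.
public class PartitionSketch {

    /** Number of coarse-grained partitions needed to hold totalBlocks blocks. */
    public static int numParts(int totalBlocks, int numPerPart) {
        return (int) Math.ceil((double) totalBlocks / numPerPart);
    }

    /** Returns one {offset, numBlks} pair per partition, mirroring the loop in the method above. */
    public static List<int[]> ranges(int totalBlocks, int numPerPart) {
        List<int[]> out = new ArrayList<>();
        int parts = numParts(totalBlocks, numPerPart);
        for (int i = 0; i < parts; i++) {
            int offset = i * numPerPart;
            // the last partition may hold fewer than numPerPart blocks
            int numBlks = Math.min(numPerPart, totalBlocks - offset);
            out.add(new int[] { offset, numBlks });
        }
        return out;
    }

    public static void main(String[] args) {
        // assumed example values: 10 blocks in total, 4 blocks per partition
        for (int[] r : ranges(10, 4))
            System.out.println("offset=" + r[0] + " numBlks=" + r[1]);
    }
}
```

With 10 blocks and 4 blocks per partition, the sketch yields three partitions covering blocks 0-3, 4-7, and 8-9, so only the final partition is short.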

Example 7 with PartitionedBlock

Use of org.apache.sysml.runtime.instructions.spark.data.PartitionedBlock in project incubator-systemml by apache.

From the class SparkExecutionContext, method getBroadcastForVariable.

@SuppressWarnings("unchecked")
public PartitionedBroadcast<MatrixBlock> getBroadcastForVariable(String varname) {
    // NOTE: The memory consumption of this method is the in-memory size of the
    // matrix object plus the partitioned size in 1k-1k blocks. Since the call
    // to broadcast happens after the matrix object has been released, the memory
    // requirements of blockified chunks in Spark's block manager are covered under
    // this maximum. Also note that we explicitly clear the in-memory blocks once
    // the broadcasts are created (other than in local mode) in order to avoid
    // unnecessary memory requirements during the lifetime of this broadcast handle.
    long t0 = DMLScript.STATISTICS ? System.nanoTime() : 0;
    MatrixObject mo = getMatrixObject(varname);
    PartitionedBroadcast<MatrixBlock> bret = null;
    // reuse existing broadcast handle
    if (mo.getBroadcastHandle() != null && mo.getBroadcastHandle().isValid()) {
        bret = mo.getBroadcastHandle().getBroadcast();
    }
    // create new broadcast handle (never created, evicted)
    if (bret == null) {
        // account for overwritten invalid broadcast (e.g., evicted)
        if (mo.getBroadcastHandle() != null)
            CacheableData.addBroadcastSize(-mo.getBroadcastHandle().getSize());
        // obtain meta data for matrix
        int brlen = (int) mo.getNumRowsPerBlock();
        int bclen = (int) mo.getNumColumnsPerBlock();
        // create partitioned matrix block and release memory consumed by input
        MatrixBlock mb = mo.acquireRead();
        PartitionedBlock<MatrixBlock> pmb = new PartitionedBlock<>(mb, brlen, bclen);
        mo.release();
        // determine coarse-grained partitioning
        int numPerPart = PartitionedBroadcast.computeBlocksPerPartition(mo.getNumRows(), mo.getNumColumns(), brlen, bclen);
        int numParts = (int) Math.ceil((double) pmb.getNumRowBlocks() * pmb.getNumColumnBlocks() / numPerPart);
        Broadcast<PartitionedBlock<MatrixBlock>>[] ret = new Broadcast[numParts];
        // create coarse-grained partitioned broadcasts
        if (numParts > 1) {
            for (int i = 0; i < numParts; i++) {
                int offset = i * numPerPart;
                int numBlks = Math.min(numPerPart, pmb.getNumRowBlocks() * pmb.getNumColumnBlocks() - offset);
                PartitionedBlock<MatrixBlock> tmp = pmb.createPartition(offset, numBlks, new MatrixBlock());
                ret[i] = getSparkContext().broadcast(tmp);
                if (!isLocalMaster())
                    tmp.clearBlocks();
            }
        } else {
            // single partition
            ret[0] = getSparkContext().broadcast(pmb);
            if (!isLocalMaster())
                pmb.clearBlocks();
        }
        bret = new PartitionedBroadcast<>(ret, mo.getMatrixCharacteristics());
        BroadcastObject<MatrixBlock> bchandle = new BroadcastObject<>(bret, OptimizerUtils.estimatePartitionedSizeExactSparsity(mo.getMatrixCharacteristics()));
        mo.setBroadcastHandle(bchandle);
        CacheableData.addBroadcastSize(bchandle.getSize());
    }
    if (DMLScript.STATISTICS) {
        Statistics.accSparkBroadCastTime(System.nanoTime() - t0);
        Statistics.incSparkBroadcastCount(1);
    }
    return bret;
}
Also used: PartitionedBlock (org.apache.sysml.runtime.instructions.spark.data.PartitionedBlock), MatrixBlock (org.apache.sysml.runtime.matrix.data.MatrixBlock), CompressedMatrixBlock (org.apache.sysml.runtime.compress.CompressedMatrixBlock), MatrixObject (org.apache.sysml.runtime.controlprogram.caching.MatrixObject), PartitionedBroadcast (org.apache.sysml.runtime.instructions.spark.data.PartitionedBroadcast), Broadcast (org.apache.spark.broadcast.Broadcast), Checkpoint (org.apache.sysml.lops.Checkpoint), BroadcastObject (org.apache.sysml.runtime.instructions.spark.data.BroadcastObject)
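
Both methods share the same handle lifecycle: reuse a valid broadcast handle, otherwise subtract the size of the stale handle from the global accounting, create a new one, and add its size back. The sketch below isolates that pattern without Spark or SystemML. BroadcastHandleSketch, Handle, and the accountedSize counter are hypothetical stand-ins (for BroadcastObject and the counter behind CacheableData.addBroadcastSize) and do not reflect the real classes' APIs.

```java
// Hypothetical illustration class; not part of SystemML.
public class BroadcastHandleSketch {

    /** Stand-in for a broadcast handle that can become invalid (e.g., evicted). */
    public static class Handle {
        public final long size;
        public boolean valid = true;

        public Handle(long size) {
            this.size = size;
        }
    }

    private Handle handle;      // cached handle, null if never created
    private long accountedSize; // stand-in for the global broadcast size counter
    private int creations;      // how many times a handle was (re)created

    /** Reuses a valid handle, or replaces a missing/invalid one and fixes the accounting. */
    public Handle getOrCreate(long size) {
        // reuse existing valid handle
        if (handle != null && handle.valid)
            return handle;
        // account for the overwritten invalid handle
        if (handle != null)
            accountedSize -= handle.size;
        // create the new handle and account for its size
        handle = new Handle(size);
        accountedSize += handle.size;
        creations++;
        return handle;
    }

    public long getAccountedSize() {
        return accountedSize;
    }

    public int getCreations() {
        return creations;
    }

    public static void main(String[] args) {
        BroadcastHandleSketch s = new BroadcastHandleSketch();
        Handle h1 = s.getOrCreate(100);
        Handle h2 = s.getOrCreate(100); // reused, no new creation
        h2.valid = false;               // simulate eviction
        Handle h3 = s.getOrCreate(100); // recreated, accounting stays balanced
        System.out.println((h1 == h2) + " " + (h2 != h3) + " " + s.getAccountedSize());
    }
}
```

Subtracting the stale size before adding the new one keeps the counter from double-counting a handle that was invalidated and then recreated.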
