Search in sources :

Example 26 with FrameObject

use of org.apache.sysml.runtime.controlprogram.caching.FrameObject in project incubator-systemml by apache.

the class MLContextConversionUtil method javaRDDStringIJVToFrameObject.

/**
	 * Convert a {@code JavaRDD<String>} in IJV format to a {@code FrameObject}
	 * . Note that metadata is required for IJV format.
	 * 
	 * @param variableName
	 *            name of the variable associated with the frame
	 * @param javaRDD
	 *            the Java RDD of strings
	 * @param frameMetadata
	 *            frame metadata
	 * @return the {@code JavaRDD<String>} converted to a {@code FrameObject}
	 */
public static FrameObject javaRDDStringIJVToFrameObject(String variableName, JavaRDD<String> javaRDD, FrameMetadata frameMetadata) {
    JavaPairRDD<LongWritable, Text> javaPairRDD = javaRDD.mapToPair(new ConvertStringToLongTextPair());
    MatrixCharacteristics mc = (frameMetadata != null) ? frameMetadata.asMatrixCharacteristics() : new MatrixCharacteristics();
    JavaPairRDD<LongWritable, Text> javaPairRDDText = javaPairRDD.mapToPair(new CopyTextInputFunction());
    FrameObject frameObject = new FrameObject(OptimizerUtils.getUniqueTempFileName(), new MatrixFormatMetaData(mc, OutputInfo.BinaryBlockOutputInfo, InputInfo.BinaryBlockInputInfo), frameMetadata.getFrameSchema().getSchema().toArray(new ValueType[0]));
    JavaPairRDD<Long, FrameBlock> rdd;
    try {
        ValueType[] lschema = null;
        if (lschema == null)
            lschema = UtilFunctions.nCopies((int) mc.getCols(), ValueType.STRING);
        rdd = FrameRDDConverterUtils.textCellToBinaryBlock(jsc(), javaPairRDDText, mc, lschema);
    } catch (DMLRuntimeException e) {
        e.printStackTrace();
        return null;
    }
    frameObject.setRDDHandle(new RDDObject(rdd, variableName));
    return frameObject;
}
Also used : ValueType(org.apache.sysml.parser.Expression.ValueType) Text(org.apache.hadoop.io.Text) FrameObject(org.apache.sysml.runtime.controlprogram.caching.FrameObject) MatrixFormatMetaData(org.apache.sysml.runtime.matrix.MatrixFormatMetaData) MatrixCharacteristics(org.apache.sysml.runtime.matrix.MatrixCharacteristics) DMLRuntimeException(org.apache.sysml.runtime.DMLRuntimeException) CopyTextInputFunction(org.apache.sysml.runtime.instructions.spark.functions.CopyTextInputFunction) ConvertStringToLongTextPair(org.apache.sysml.runtime.instructions.spark.functions.ConvertStringToLongTextPair) FrameBlock(org.apache.sysml.runtime.matrix.data.FrameBlock) RDDObject(org.apache.sysml.runtime.instructions.spark.data.RDDObject) LongWritable(org.apache.hadoop.io.LongWritable)

Aggregations

FrameObject (org.apache.sysml.runtime.controlprogram.caching.FrameObject)26 DMLRuntimeException (org.apache.sysml.runtime.DMLRuntimeException)14 FrameBlock (org.apache.sysml.runtime.matrix.data.FrameBlock)14 MatrixCharacteristics (org.apache.sysml.runtime.matrix.MatrixCharacteristics)13 ValueType (org.apache.sysml.parser.Expression.ValueType)7 MatrixObject (org.apache.sysml.runtime.controlprogram.caching.MatrixObject)7 MetaDataFormat (org.apache.sysml.runtime.matrix.MetaDataFormat)7 RDDObject (org.apache.sysml.runtime.instructions.spark.data.RDDObject)6 MatrixBlock (org.apache.sysml.runtime.matrix.data.MatrixBlock)6 LongWritable (org.apache.hadoop.io.LongWritable)5 Text (org.apache.hadoop.io.Text)5 Data (org.apache.sysml.runtime.instructions.cp.Data)4 ConvertStringToLongTextPair (org.apache.sysml.runtime.instructions.spark.functions.ConvertStringToLongTextPair)4 CopyTextInputFunction (org.apache.sysml.runtime.instructions.spark.functions.CopyTextInputFunction)4 JavaPairRDD (org.apache.spark.api.java.JavaPairRDD)3 DMLException (org.apache.sysml.api.DMLException)3 CacheableData (org.apache.sysml.runtime.controlprogram.caching.CacheableData)3 SparkExecutionContext (org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext)3 MatrixFormatMetaData (org.apache.sysml.runtime.matrix.MatrixFormatMetaData)3 Encoder (org.apache.sysml.runtime.transform.encode.Encoder)3