Example 6 with OffsetCount

Use of org.apache.sysml.runtime.matrix.CSVReblockMR.OffsetCount in project incubator-systemml by apache.

From the class GTFMTDReducer, method generateOffsetsFile.

@SuppressWarnings({ "unchecked", "deprecation" })
private long generateOffsetsFile(ArrayList<OffsetCount> list) throws IllegalArgumentException, IOException {
    Collections.sort(list);
    SequenceFile.Writer writer = null;
    long lineOffset = 0;
    try {
        writer = new SequenceFile.Writer(FileSystem.get(_rJob), _rJob, new Path(_agents.getOffsetFile() + "/part-00000"), ByteWritable.class, OffsetCount.class);
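        for (OffsetCount oc : list) {
            // Replace each entry's count with the running total of the counts
            // that precede it in sorted order (an exclusive prefix sum).
            long count = oc.count;
            oc.count = lineOffset;
            writer.append(new ByteWritable((byte) 0), oc);
            lineOffset += count;
        }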
    } finally {
        IOUtilFunctions.closeSilently(writer);
    }
    list.clear();
    return lineOffset;
}
Also used: Path (org.apache.hadoop.fs.Path), OffsetCount (org.apache.sysml.runtime.matrix.CSVReblockMR.OffsetCount), SequenceFile (org.apache.hadoop.io.SequenceFile), ByteWritable (org.apache.hadoop.io.ByteWritable)
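
The loop in generateOffsetsFile is an exclusive prefix sum: after sorting, each entry's count is overwritten with the total of the counts before it, and the grand total is returned. The following is a minimal sketch of that step without any Hadoop dependency. SplitCount is a hypothetical stand-in for OffsetCount; its filename and fileOffset fields, and the ordering by file name and then offset, are assumptions rather than something the excerpt shows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class OffsetPrefixSumSketch {

    // Hypothetical stand-in for OffsetCount. Only the `count` field appears
    // in the excerpt; `filename`, `fileOffset`, and the ordering are assumed.
    static class SplitCount implements Comparable<SplitCount> {
        final String filename;
        final long fileOffset;
        long count;

        SplitCount(String filename, long fileOffset, long count) {
            this.filename = filename;
            this.fileOffset = fileOffset;
            this.count = count;
        }

        @Override
        public int compareTo(SplitCount other) {
            int c = filename.compareTo(other.filename);
            return c != 0 ? c : Long.compare(fileOffset, other.fileOffset);
        }
    }

    // Sorts the splits, overwrites each count with the number of lines that
    // precede the split, and returns the total number of lines.
    static long assignLineOffsets(List<SplitCount> splits) {
        Collections.sort(splits);
        long lineOffset = 0;
        for (SplitCount s : splits) {
            long count = s.count;
            s.count = lineOffset;   // lines before this split
            lineOffset += count;    // advance by this split's own lines
        }
        return lineOffset;
    }

    public static void main(String[] args) {
        List<SplitCount> splits = new ArrayList<>();
        splits.add(new SplitCount("part-00001", 0, 5));
        splits.add(new SplitCount("part-00000", 100, 20));
        splits.add(new SplitCount("part-00000", 0, 10));

        long total = assignLineOffsets(splits);
        for (SplitCount s : splits) {
            System.out.println(s.filename + "@" + s.fileOffset + " -> " + s.count);
        }
        System.out.println("total = " + total);
    }
}
```

With the three sample splits above, the starting line offsets come out as 0, 10, and 30, and the returned total is 35.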

Example 7 with OffsetCount

Use of org.apache.sysml.runtime.matrix.CSVReblockMR.OffsetCount in project incubator-systemml by apache.

From the class GTFMTDMapper, method close.

@Override
public void close() throws IOException {
    _agents.getMVImputeAgent().mapOutputTransformationMetadata(_collector, _mapTaskID, _agents);
    _agents.getRecodeAgent().mapOutputTransformationMetadata(_collector, _mapTaskID, _agents);
    _agents.getBinAgent().mapOutputTransformationMetadata(_collector, _mapTaskID, _agents);
    // OffsetCount is represented as a DistinctValue by concatenating the part file name and the offset within the part file.
    if (_collector != null) {
        IntWritable key = new IntWritable((int) _agents.getNumCols() + 1);
        DistinctValue val = new DistinctValue(new OffsetCount(_partFileName, _offsetInPartFile, _agents.getValid()));
        _collector.collect(key, val);
    }
    // Reset global variables; required when the JVM is reused.
    _firstRecordInSplit = true;
    _offsetInPartFile = -1;
    _partFileWithHeader = false;
}
Also used: OffsetCount (org.apache.sysml.runtime.matrix.CSVReblockMR.OffsetCount), IntWritable (org.apache.hadoop.io.IntWritable)
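
The close method above emits one record per split under the key numCols + 1 and then resets its per-split fields. A rough sketch of that pattern follows, using an in-memory map in place of Hadoop's collector. Everything here is hypothetical: the class, the startSplit and map methods, and the comma-separated value encoding are illustrative only, and the excerpt does not say why the key is numCols + 1 (one plausible reading is that it keeps the record apart from keys that identify columns).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapperCloseSketch {
    // Hypothetical in-memory collector: key -> emitted values.
    final Map<Integer, List<String>> collected = new TreeMap<>();
    final int numCols;

    // Per-split state, mirroring the fields that close() resets in the excerpt.
    String partFileName;
    long offsetInPartFile = -1;
    boolean firstRecordInSplit = true;
    long validRows = 0;

    MapperCloseSketch(int numCols) {
        this.numCols = numCols;
    }

    // Hypothetical hook: records which part file and byte offset this split covers.
    void startSplit(String partFileName, long offsetInPartFile) {
        this.partFileName = partFileName;
        this.offsetInPartFile = offsetInPartFile;
        this.validRows = 0;
    }

    // Hypothetical per-record hook: only counts rows in this sketch.
    void map(String row) {
        firstRecordInSplit = false;
        validRows++;
    }

    void close() {
        // Emit the split's (file, offset, row count) under the key numCols + 1.
        // The comma-separated encoding is an assumption made for this sketch.
        int key = numCols + 1;
        String value = partFileName + "," + offsetInPartFile + "," + validRows;
        collected.computeIfAbsent(key, k -> new ArrayList<>()).add(value);

        // Reset per-split state so a reused instance starts clean.
        firstRecordInSplit = true;
        offsetInPartFile = -1;
    }

    public static void main(String[] args) {
        MapperCloseSketch mapper = new MapperCloseSketch(3);
        mapper.startSplit("part-00000", 128L);
        mapper.map("1,2,3");
        mapper.map("4,5,6");
        mapper.close();
        System.out.println(mapper.collected);
    }
}
```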

Aggregations

OffsetCount (org.apache.sysml.runtime.matrix.CSVReblockMR.OffsetCount) 7
ByteWritable (org.apache.hadoop.io.ByteWritable) 4
FileSystem (org.apache.hadoop.fs.FileSystem) 3
Path (org.apache.hadoop.fs.Path) 3
SequenceFile (org.apache.hadoop.io.SequenceFile) 3
IOException (java.io.IOException) 2
CSVReblockInstruction (org.apache.sysml.runtime.instructions.mr.CSVReblockInstruction) 2
ArrayList (java.util.ArrayList) 1
IntWritable (org.apache.hadoop.io.IntWritable) 1
Reader (org.apache.hadoop.io.SequenceFile.Reader) 1
DMLRuntimeException (org.apache.sysml.runtime.DMLRuntimeException) 1
MatrixReader (org.apache.sysml.runtime.io.MatrixReader) 1
CSVReblockMapper (org.apache.sysml.runtime.matrix.mapred.CSVReblockMapper) 1
IndexedBlockRow (org.apache.sysml.runtime.matrix.mapred.CSVReblockMapper.IndexedBlockRow) 1
JSONException (org.apache.wink.json4j.JSONException) 1