Example 21 with RecordReader

Use of org.datavec.api.records.reader.RecordReader in the deeplearning4j project (by deeplearning4j).

Class StringToDataSetExportFunction, method processBatchIfRequired.

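private void processBatchIfRequired(List<List<Writable>> list, boolean finalRecord) throws Exception {
    // Nothing buffered, so there is nothing to write.
    if (list.isEmpty())
        return;
    // Keep buffering until a full batch is available, unless this is the final record.
    if (list.size() < batchSize && !finalRecord)
        return;
    // Wrap the buffered records in a reader and convert them to a DataSet.
    RecordReader rr = new CollectionRecordReader(list);
    RecordReaderDataSetIterator iter = new RecordReaderDataSetIterator(rr, new SelfWritableConverter(), batchSize, labelIndex, numPossibleLabels, regression);
    DataSet ds = iter.next();
    // Name the output file from the uid and a running counter.
    String filename = "dataset_" + uid + "_" + (outputCount++) + ".bin";
    URI uri = new URI(outputDir.getPath() + "/" + filename);
    FileSystem file = FileSystem.get(uri, conf);
    // Write through the Hadoop FileSystem; try-with-resources closes the stream.
    try (FSDataOutputStream out = file.create(new Path(uri))) {
        ds.save(out);
    }
    // Reset the buffer for the next batch.
    list.clear();
}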
Also used:
Path (org.apache.hadoop.fs.Path)
SelfWritableConverter (org.datavec.api.io.converters.SelfWritableConverter)
DataSet (org.nd4j.linalg.dataset.DataSet)
RecordReader (org.datavec.api.records.reader.RecordReader)
CollectionRecordReader (org.datavec.api.records.reader.impl.collection.CollectionRecordReader)
FileSystem (org.apache.hadoop.fs.FileSystem)
RecordReaderDataSetIterator (org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator)
FSDataOutputStream (org.apache.hadoop.fs.FSDataOutputStream)
URI (java.net.URI)
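
The buffer-then-flush logic of processBatchIfRequired can be tried in isolation with a minimal sketch that has no DataVec or Hadoop dependencies. The class BatchBuffer, its methods add, finish and getFlushed, and the use of String records are hypothetical stand-ins introduced only for illustration; an in-memory list takes the place of the files written by the original method.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for the export function: buffers records and
 * flushes them in batches. Not part of deeplearning4j.
 */
public class BatchBuffer {
    private final int batchSize;
    // Records received but not yet flushed.
    private final List<String> pending = new ArrayList<>();
    // Stands in for the files written by the original method.
    private final List<List<String>> flushed = new ArrayList<>();

    public BatchBuffer(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    /** Buffers one record and flushes if a full batch is available. */
    public void add(String record) {
        pending.add(record);
        processBatchIfRequired(false);
    }

    /** Signals the end of input so that a partial batch is flushed too. */
    public void finish() {
        processBatchIfRequired(true);
    }

    // Same guard structure as the original method: skip when the buffer is
    // empty, and skip when it is not yet full unless this is the final call.
    private void processBatchIfRequired(boolean finalRecord) {
        if (pending.isEmpty())
            return;
        if (pending.size() < batchSize && !finalRecord)
            return;
        flushed.add(new ArrayList<>(pending)); // copy, since pending is cleared next
        pending.clear();
    }

    public List<List<String>> getFlushed() {
        return flushed;
    }

    public static void main(String[] args) {
        BatchBuffer buf = new BatchBuffer(2);
        for (String r : new String[] {"a", "b", "c"}) {
            buf.add(r);
        }
        buf.finish();
        System.out.println(buf.getFlushed()); // prints [[a, b], [c]]
    }
}
```

With a batch size of 2 and three records, the first two records form a full batch and the third is only written once finish() passes the final-record flag, which mirrors the role of finalRecord in the original method.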

Aggregations

RecordReader (org.datavec.api.records.reader.RecordReader): 21
Test (org.junit.Test): 18
CSVRecordReader (org.datavec.api.records.reader.impl.csv.CSVRecordReader): 17
FileSplit (org.datavec.api.split.FileSplit): 17
SequenceRecordReader (org.datavec.api.records.reader.SequenceRecordReader): 13
DataSet (org.nd4j.linalg.dataset.DataSet): 13
ClassPathResource (org.nd4j.linalg.io.ClassPathResource): 12
CSVSequenceRecordReader (org.datavec.api.records.reader.impl.csv.CSVSequenceRecordReader): 11
DataSetIterator (org.nd4j.linalg.dataset.api.iterator.DataSetIterator): 10
CollectionRecordReader (org.datavec.api.records.reader.impl.collection.CollectionRecordReader): 7
INDArray (org.nd4j.linalg.api.ndarray.INDArray): 7
CollectionSequenceRecordReader (org.datavec.api.records.reader.impl.collection.CollectionSequenceRecordReader): 6
ImageRecordReader (org.datavec.image.recordreader.ImageRecordReader): 6
RecordReaderDataSetIterator (org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator): 6
RecordMetaData (org.datavec.api.records.metadata.RecordMetaData): 5
MultiDataSet (org.nd4j.linalg.dataset.api.MultiDataSet): 5
MultiDataSetIterator (org.nd4j.linalg.dataset.api.iterator.MultiDataSetIterator): 5
ClassPathResource (org.datavec.api.util.ClassPathResource): 4
Record (org.datavec.api.records.Record): 3
NDArrayWritable (org.datavec.common.data.NDArrayWritable): 3