Search in sources :

Example 1 with GridFSSplit

use of com.mongodb.hadoop.input.GridFSSplit in project mongo-hadoop by mongodb.

the class GridFSInputFormat method getSplits.

@Override
public List<InputSplit> getSplits(final JobContext context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    DBCollection inputCollection = MongoConfigUtil.getInputCollection(conf);
    MongoClientURI inputURI = MongoConfigUtil.getInputURI(conf);
    GridFS gridFS = new GridFS(inputCollection.getDB(), inputCollection.getName());
    DBObject query = MongoConfigUtil.getQuery(conf);
    List<InputSplit> splits = new LinkedList<InputSplit>();
    for (GridFSDBFile file : gridFS.find(query)) {
        // One split per file.
        if (MongoConfigUtil.isGridFSWholeFileSplit(conf)) {
            splits.add(new GridFSSplit(inputURI, (ObjectId) file.getId(), (int) file.getChunkSize(), file.getLength()));
        } else // One split per file chunk.
        {
            for (int chunk = 0; chunk < file.numChunks(); ++chunk) {
                splits.add(new GridFSSplit(inputURI, (ObjectId) file.getId(), (int) file.getChunkSize(), file.getLength(), chunk));
            }
        }
    }
    LOG.debug("Found GridFS splits: " + splits);
    return splits;
}
Also used : DBCollection(com.mongodb.DBCollection) GridFSSplit(com.mongodb.hadoop.input.GridFSSplit) Configuration(org.apache.hadoop.conf.Configuration) ObjectId(org.bson.types.ObjectId) GridFSDBFile(com.mongodb.gridfs.GridFSDBFile) MongoClientURI(com.mongodb.MongoClientURI) GridFS(com.mongodb.gridfs.GridFS) DBObject(com.mongodb.DBObject) InputSplit(org.apache.hadoop.mapreduce.InputSplit) LinkedList(java.util.LinkedList)

Aggregations

DBCollection (com.mongodb.DBCollection)1 DBObject (com.mongodb.DBObject)1 MongoClientURI (com.mongodb.MongoClientURI)1 GridFS (com.mongodb.gridfs.GridFS)1 GridFSDBFile (com.mongodb.gridfs.GridFSDBFile)1 GridFSSplit (com.mongodb.hadoop.input.GridFSSplit)1 LinkedList (java.util.LinkedList)1 Configuration (org.apache.hadoop.conf.Configuration)1 InputSplit (org.apache.hadoop.mapreduce.InputSplit)1 ObjectId (org.bson.types.ObjectId)1