Example 21 with BSONWritable

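Use of com.mongodb.hadoop.io.BSONWritable in the mongo-hadoop project by mongodb.

From the class BSONFileInputFormatTest, method enronEmails. The test reads a mongodump BSON file of the Enron email corpus through the old mapred API and checks that the total number of records read across all splits matches the expected corpus size.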

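@Test
public void enronEmails() throws IOException {
    BSONFileInputFormat inputFormat = new BSONFileInputFormat();
    JobConf job = new JobConf();
    // EXAMPLE_DATA_HOME is a constant defined elsewhere in the test class.
    String inputDirectory = new File(EXAMPLE_DATA_HOME, "/dump/enron_mail/messages.bson").getAbsoluteFile().toURI().toString();
    // Hadoop 2.X
    job.set("mapreduce.input.fileinputformat.inputdir", inputDirectory);
    // Hadoop 1.2.X
    job.set("mapred.input.dir", inputDirectory);
    // 5 is passed as the numSplits hint; the input format decides the actual split boundaries.
    FileSplit[] splits = inputFormat.getSplits(job, 5);
    int count = 0;
    // A single BSONWritable is reused as the value holder for every record.
    BSONWritable writable = new BSONWritable();
    for (FileSplit split : splits) {
        // Keys are NullWritable, so null is passed as the key; no Reporter is needed in the test.
        RecordReader<NullWritable, BSONWritable> recordReader = inputFormat.getRecordReader(split, job, null);
        try {
            while (recordReader.next(null, writable)) {
                count++;
            }
        } finally {
            // Release the underlying file stream for this split.
            recordReader.close();
        }
    }
    assertEquals("There are 501513 messages in the enron corpus", 501513, count);
}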
Also used: BSONWritable (com.mongodb.hadoop.io.BSONWritable), BSONFileInputFormat (com.mongodb.hadoop.mapred.BSONFileInputFormat), FileSplit (org.apache.hadoop.mapred.FileSplit), JobConf (org.apache.hadoop.mapred.JobConf), File (java.io.File), NullWritable (org.apache.hadoop.io.NullWritable), Test (org.junit.Test)
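
The test arrives at its count by calling RecordReader.next(...) over every split. A mongodump .bson file is a plain sequence of BSON documents, each beginning with a little-endian int32 that gives the document's total length (counting the prefix itself and the trailing 0x00 byte), so the same number can be cross-checked without Hadoop. The sketch below assumes exactly that layout. BsonDocumentCounter and its methods are hypothetical names, not part of mongo-hadoop or the MongoDB driver, and the code only validates lengths and terminator bytes, not document contents.

```java
import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical helper, not part of mongo-hadoop: counts the top-level BSON
 * documents in a mongodump-style .bson file by following their length prefixes.
 */
public final class BsonDocumentCounter {

    private BsonDocumentCounter() {
    }

    /** Reads an unsigned little-endian int32, or returns -1 if the stream ends cleanly before it. */
    private static long readUInt32LE(InputStream in) throws IOException {
        int b0 = in.read();
        if (b0 == -1) {
            return -1; // clean end of file: no more documents
        }
        int b1 = in.read();
        int b2 = in.read();
        int b3 = in.read();
        if ((b1 | b2 | b3) < 0) {
            throw new EOFException("Truncated BSON length prefix");
        }
        // Widen to long so that no length value can collide with the -1 end-of-stream sentinel.
        return b0 | (b1 << 8) | (b2 << 16) | ((long) b3 << 24);
    }

    /** Counts documents until end of stream; throws IOException on malformed input. */
    public static long countDocuments(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long count = 0;
        long length;
        while ((length = readUInt32LE(in)) != -1) {
            // The prefix counts itself and the trailing 0x00, so 5 bytes is the smallest valid document.
            if (length < 5 || length > Integer.MAX_VALUE) {
                throw new IOException("Invalid BSON document length: " + length);
            }
            consumeBody(in, (int) length - 4, buf);
            count++;
        }
        return count;
    }

    /** Reads the n bytes after the prefix and checks that the last one is the 0x00 terminator. */
    private static void consumeBody(InputStream in, int n, byte[] buf) throws IOException {
        int last = -1;
        while (n > 0) {
            int read = in.read(buf, 0, Math.min(n, buf.length));
            if (read == -1) {
                throw new EOFException("Truncated BSON document");
            }
            last = buf[read - 1];
            n -= read;
        }
        if (last != 0) {
            throw new IOException("BSON document is not terminated by 0x00");
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            System.out.println(countDocuments(in) + " documents");
        }
    }
}
```

Pointed at the same <EXAMPLE_DATA_HOME>/dump/enron_mail/messages.bson file, this would be expected to agree with the count the test asserts, provided the dump is intact, since both walk the same document boundaries. The design reads and discards each document body instead of calling InputStream.skip, because skip on a file stream can move past end of file without reporting truncation.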

Aggregations

BSONWritable (com.mongodb.hadoop.io.BSONWritable): 21
BasicBSONObject (org.bson.BasicBSONObject): 14
Test (org.junit.Test): 13
ListObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector): 11
MapObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector): 11
ObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector): 11
StructObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector): 11
BasicDBObject (com.mongodb.BasicDBObject): 4
MongoUpdateWritable (com.mongodb.hadoop.io.MongoUpdateWritable): 4
ArrayList (java.util.ArrayList): 4
BSONObject (org.bson.BSONObject): 4
DBObject (com.mongodb.DBObject): 3
File (java.io.File): 2
IOException (java.io.IOException): 2
SerDeException (org.apache.hadoop.hive.serde2.SerDeException): 2
NullWritable (org.apache.hadoop.io.NullWritable): 2
FileSplit (org.apache.hadoop.mapred.FileSplit): 2
JobConf (org.apache.hadoop.mapred.JobConf): 2
BulkUpdateRequestBuilder (com.mongodb.BulkUpdateRequestBuilder): 1
BulkWriteOperation (com.mongodb.BulkWriteOperation): 1