Example 1 with BSONFileInputFormat

Use of com.mongodb.hadoop.mapred.BSONFileInputFormat in project mongo-hadoop by mongodb.

From class BSONFileInputFormatTest, method enronEmails:

@Test
public void enronEmails() throws IOException {
    BSONFileInputFormat inputFormat = new BSONFileInputFormat();
    JobConf job = new JobConf();
    String inputDirectory = new File(EXAMPLE_DATA_HOME, "/dump/enron_mail/messages.bson").getAbsoluteFile().toURI().toString();
    // Hadoop 2.X
    job.set("mapreduce.input.fileinputformat.inputdir", inputDirectory);
    // Hadoop 1.2.X
    job.set("mapred.input.dir", inputDirectory);
    FileSplit[] splits = inputFormat.getSplits(job, 5);
    int count = 0;
    BSONWritable writable = new BSONWritable();
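    for (FileSplit split : splits) {
        RecordReader<NullWritable, BSONWritable> recordReader = inputFormat.getRecordReader(split, job, null);
        // Only the value is inspected by this test, so a null key is passed.
        while (recordReader.next(null, writable)) {
            count++;
        }
        // Release the reader's resources before moving on to the next split.
        recordReader.close();
    }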
    assertEquals("There are 501513 messages in the enron corpus", 501513, count);
}
Also used: BSONWritable (com.mongodb.hadoop.io.BSONWritable), BSONFileInputFormat (com.mongodb.hadoop.mapred.BSONFileInputFormat), FileSplit (org.apache.hadoop.mapred.FileSplit), JobConf (org.apache.hadoop.mapred.JobConf), File (java.io.File), NullWritable (org.apache.hadoop.io.NullWritable), Test (org.junit.Test)
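
The test above counts one record per BSON document in messages.bson. As a rough illustration of what is being counted, the sketch below walks a stream of concatenated BSON documents using only the JDK. It relies on the BSON rule that every document starts with a little-endian int32 holding its total length, and it assumes the file is a plain concatenation of documents, as mongodump writes. The class BsonDumpCounter and its method countDocuments are hypothetical names for this example; they are not part of mongo-hadoop and do not show how BSONFileInputFormat computes splits or reads records.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not part of mongo-hadoop.
public class BsonDumpCounter {

    // Counts documents in a stream of concatenated BSON documents by
    // following each document's 4-byte little-endian length prefix.
    public static int countDocuments(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] header = new byte[4];
        int count = 0;
        while (true) {
            int first = data.read();
            if (first < 0) {
                return count; // clean end of stream between documents
            }
            header[0] = (byte) first;
            data.readFully(header, 1, 3); // EOFException if the prefix is truncated
            int length = (header[0] & 0xFF)
                    | ((header[1] & 0xFF) << 8)
                    | ((header[2] & 0xFF) << 16)
                    | ((header[3] & 0xFF) << 24);
            if (length < 5) {
                // The smallest valid document is 5 bytes: the prefix plus a 0x00 terminator.
                throw new IOException("Invalid BSON document length: " + length);
            }
            // Skip the rest of the document; the length includes the prefix itself.
            long remaining = length - 4L;
            while (remaining > 0) {
                long skipped = data.skip(remaining);
                if (skipped <= 0) {
                    if (data.read() < 0) {
                        throw new EOFException("Truncated BSON document");
                    }
                    skipped = 1;
                }
                remaining -= skipped;
            }
            count++;
        }
    }

    public static void main(String[] args) throws IOException {
        // Two empty BSON documents back to back: 05 00 00 00 00 each.
        byte[] sample = {5, 0, 0, 0, 0, 5, 0, 0, 0, 0};
        System.out.println(countDocuments(new ByteArrayInputStream(sample))); // prints 2
    }
}
```

A sequential scan like this would have to read the whole file in one pass, whereas the test asks the input format for several splits and sums the records read from each of them.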
