Example 1 with SequenceFileRecordReader

Use of org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader in the Apache Beam project.

Class HadoopFormatIOSequenceFileTest, method extractResultsFromFile: reads every key/value pair from a SequenceFile and returns them as a Stream of Beam KV pairs.

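private Stream<KV<Text, LongWritable>> extractResultsFromFile(String fileName) {
    // RecordReader implements Closeable, so try-with-resources closes the underlying file.
    try (SequenceFileRecordReader<Text, LongWritable> reader = new SequenceFileRecordReader<>()) {
        Path path = new Path(fileName);
        // initialize() needs a TaskAttemptContext; a placeholder job ID and task number suffice here.
        TaskAttemptContext taskContext =
            HadoopFormats.createTaskAttemptContext(new Configuration(), new JobID("readJob", 0), 0);
        // A split starting at offset 0 with length Long.MAX_VALUE covers the whole file.
        reader.initialize(new FileSplit(path, 0L, Long.MAX_VALUE, new String[] { "localhost" }), taskContext);
        List<KV<Text, LongWritable>> result = new ArrayList<>();
        while (reader.nextKeyValue()) {
            // The reader reuses its key/value objects between records, so copy them before storing.
            result.add(KV.of(
                new Text(reader.getCurrentKey().toString()),
                new LongWritable(reader.getCurrentValue().get())));
        }
        return result.stream();
    } catch (Exception e) {
        // Wrap the checked IOException/InterruptedException thrown by the reader.
        throw new RuntimeException(e);
    }
}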
Also used: Path (org.apache.hadoop.fs.Path), Configuration (org.apache.hadoop.conf.Configuration), ArrayList (java.util.ArrayList), SequenceFileRecordReader (org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader), Text (org.apache.hadoop.io.Text), TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext), KV (org.apache.beam.sdk.values.KV), FileSplit (org.apache.hadoop.mapreduce.lib.input.FileSplit), LongWritable (org.apache.hadoop.io.LongWritable), JobID (org.apache.hadoop.mapreduce.JobID)
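The loop in extractResultsFromFile copies each key and value (new Text(...), new LongWritable(...)) instead of storing reader.getCurrentKey() directly. Hadoop record readers commonly deserialize each record into the same key/value instances, so storing the returned objects would leave every list entry pointing at the last record. The stdlib-only sketch below illustrates that pitfall; MutableText and ReusingReader are hypothetical stand-ins for Text and SequenceFileRecordReader, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

final class CopyOnReadDemo {
    // Hypothetical stand-in for org.apache.hadoop.io.Text: a mutable string holder.
    static final class MutableText {
        private String value = "";
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    // Hypothetical reader that, like many Hadoop RecordReaders, reuses one key object.
    static final class ReusingReader {
        private final String[] records;
        private int pos = -1;
        private final MutableText current = new MutableText(); // single shared instance

        ReusingReader(String... records) { this.records = records; }

        boolean nextKeyValue() {
            if (pos + 1 >= records.length) {
                return false;
            }
            pos++;
            current.set(records[pos]); // overwrite the shared instance in place
            return true;
        }

        MutableText getCurrentKey() { return current; }
    }

    // Stores the returned object directly: every entry aliases the same instance.
    static List<String> collectWithoutCopy(String... records) {
        ReusingReader reader = new ReusingReader(records);
        List<MutableText> keys = new ArrayList<>();
        while (reader.nextKeyValue()) {
            keys.add(reader.getCurrentKey());
        }
        return render(keys);
    }

    // Copies each key before storing it, mirroring new Text(reader.getCurrentKey().toString()).
    static List<String> collectWithCopy(String... records) {
        ReusingReader reader = new ReusingReader(records);
        List<MutableText> keys = new ArrayList<>();
        while (reader.nextKeyValue()) {
            MutableText copy = new MutableText();
            copy.set(reader.getCurrentKey().toString());
            keys.add(copy);
        }
        return render(keys);
    }

    private static List<String> render(List<MutableText> keys) {
        List<String> out = new ArrayList<>();
        for (MutableText k : keys) {
            out.add(k.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectWithoutCopy("a", "b", "c")); // [c, c, c]
        System.out.println(collectWithCopy("a", "b", "c"));    // [a, b, c]
    }
}
```

Without the copy, all three entries report the last record; with it, each entry keeps its own value. Copying via toString() and get(), as the Beam test does, is one simple way to take a snapshot; any deep copy of the key and value would serve the same purpose.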