Example 36 with WritableComparable

Use of org.apache.hadoop.io.WritableComparable in project Cloud9 by lintool.

From the class CreateMetadata, method GenerateMetadata.

public static void GenerateMetadata(Path bitextPath, Path resultPath) throws IOException {
    System.out.println(bitextPath.toString());
    JobConf conf = new JobConf(CreateMetadata.class);
    FileSystem fileSys = FileSystem.get(conf);
    // Open one reader per part file under the caller-supplied bitext path.
    SequenceFile.Reader[] readers = SequenceFileOutputFormat.getReaders(conf, bitextPath);
    WritableComparable key = new IntWritable();
    PhrasePair value = new PhrasePair();
    // Number of records read.
    int sc = 0;
    // Largest word value seen in getE().
    int ec = 0;
    // Largest word value seen in getF().
    int fc = 0;
    try {
        for (SequenceFile.Reader r : readers) {
            while (r.next(key, value)) {
                sc = sc + 1;
                for (int word : value.getE().getWords()) {
                    if (word > ec) {
                        ec = word;
                    }
                }
                for (int word : value.getF().getWords()) {
                    if (word > fc) {
                        fc = word;
                    }
                }
            }
        }
    } catch (IOException e) {
        // Keep the original exception as the cause so the stack trace is not lost.
        throw new RuntimeException("IO exception: " + e.getMessage(), e);
    } finally {
        // Release the underlying file handles.
        for (SequenceFile.Reader r : readers) {
            r.close();
        }
    }
    Metadata theMetadata = new Metadata(sc, ec, fc);
    // try-with-resources closes the stream even if writeObject throws.
    try (ObjectOutputStream mdstream = new ObjectOutputStream(new BufferedOutputStream(fileSys.create(resultPath)))) {
        mdstream.writeObject(theMetadata);
    }
}
Also used: Path (org.apache.hadoop.fs.Path), IOException (java.io.IOException), ObjectOutputStream (java.io.ObjectOutputStream), SequenceFile (org.apache.hadoop.io.SequenceFile), WritableComparable (org.apache.hadoop.io.WritableComparable), FileSystem (org.apache.hadoop.fs.FileSystem), JobConf (org.apache.hadoop.mapred.JobConf), BufferedOutputStream (java.io.BufferedOutputStream), IntWritable (org.apache.hadoop.io.IntWritable)
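
The scan in GenerateMetadata boils down to counting records and tracking the largest word value on each side. The following is a minimal sketch of that logic without any Hadoop dependency, so it can be run on its own. The names MetadataSketch, Pair, Stats and scan are hypothetical stand-ins introduced for illustration; they are not part of Cloud9, and Pair only assumes that PhrasePair exposes two integer arrays.

```java
import java.util.Arrays;
import java.util.List;

public class MetadataSketch {

    /** Hypothetical stand-in for PhrasePair: two arrays of integer word values. */
    static final class Pair {
        final int[] e;
        final int[] f;

        Pair(int[] e, int[] f) {
            this.e = e;
            this.f = f;
        }
    }

    /** Hypothetical holder for the three counters passed to Metadata(sc, ec, fc). */
    static final class Stats {
        final int sc;
        final int ec;
        final int fc;

        Stats(int sc, int ec, int fc) {
            this.sc = sc;
            this.ec = ec;
            this.fc = fc;
        }
    }

    /** Counts the records and tracks the largest value seen on each side. */
    static Stats scan(List<Pair> pairs) {
        int sc = 0;
        int ec = 0;
        int fc = 0;
        for (Pair p : pairs) {
            sc++;
            for (int word : p.e) {
                ec = Math.max(ec, word);
            }
            for (int word : p.f) {
                fc = Math.max(fc, word);
            }
        }
        return new Stats(sc, ec, fc);
    }

    public static void main(String[] args) {
        List<Pair> pairs = Arrays.asList(
                new Pair(new int[] {3, 7, 2}, new int[] {5, 1}),
                new Pair(new int[] {4}, new int[] {9, 6}));
        Stats s = scan(pairs);
        System.out.println(s.sc + " " + s.ec + " " + s.fc); // prints "2 7 9"
    }
}
```

As in the original method, the maxima start at 0, so an empty input yields 0 for all three counters.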

Aggregations

WritableComparable (org.apache.hadoop.io.WritableComparable): 36
Writable (org.apache.hadoop.io.Writable): 15
IOException (java.io.IOException): 14
Path (org.apache.hadoop.fs.Path): 12
FileSystem (org.apache.hadoop.fs.FileSystem): 11
JobConf (org.apache.hadoop.mapred.JobConf): 6
CompressionCodec (org.apache.hadoop.io.compress.CompressionCodec): 5
ArrayList (java.util.ArrayList): 4
IntWritable (org.apache.hadoop.io.IntWritable): 4
NullWritable (org.apache.hadoop.io.NullWritable): 4
SequenceFile (org.apache.hadoop.io.SequenceFile): 4
TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext): 4
PCollection (com.tdunning.plume.PCollection): 3
OutputChannel (com.tdunning.plume.local.lazy.MSCR.OutputChannel): 3
PlumeObject (com.tdunning.plume.local.lazy.MapRedExecutor.PlumeObject): 3
HashMap (java.util.HashMap): 3
BytesWritable (org.apache.hadoop.io.BytesWritable): 3
FloatWritable (org.apache.hadoop.io.FloatWritable): 3
HCatRecord (org.apache.hive.hcatalog.data.HCatRecord): 3
DoFn (com.tdunning.plume.DoFn): 2