
Example 6 with ArrayListWritable

Use of tl.lin.data.array.ArrayListWritable in the Cloud9 project by lintool.
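In the example below, ArrayListWritable<PairOfInts> holds a term's postings list, where each pair is read as (document byte offset, term frequency). To make that shape concrete, here is a minimal sketch that stores such a list as a length-prefixed sequence of int pairs using DataOutputStream and DataInputStream, in the spirit of Hadoop's write/readFields pattern. It rests on stated assumptions: PostingsCodec and its Posting record are hypothetical names, and this byte layout is an illustration, not ArrayListWritable's actual serialization format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a serialized postings list; NOT ArrayListWritable's real byte format.
public class PostingsCodec {
    // One posting: a document's byte offset and the term frequency in that document.
    record Posting(int offset, int tf) {}

    // Serialize: a 4-byte element count, then two 4-byte ints per posting.
    static byte[] encode(List<Posting> postings) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(postings.size());
            for (Posting p : postings) {
                out.writeInt(p.offset());
                out.writeInt(p.tf());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialize: read the count, then exactly that many pairs, in the same order.
    static List<Posting> decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            List<Posting> postings = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                int offset = in.readInt();
                int tf = in.readInt();
                postings.add(new Posting(offset, tf));
            }
            return postings;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Posting> original = List.of(new Posting(0, 2), new Posting(1043, 1));
        byte[] encoded = encode(original);
        System.out.println(encoded.length + " bytes, round trip ok: " + decode(encoded).equals(original));
    }
}
```

Running main prints "20 bytes, round trip ok: true": 4 bytes for the count plus 8 bytes per posting. Writing the count first is what lets the reader know how many pairs to read back.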

Class LookupPostings, method lookupTerm:

public static void lookupTerm(String term, MapFile.Reader reader, String collectionPath, FileSystem fs) throws IOException {
    Text key = new Text();
    PairOfWritables<IntWritable, ArrayListWritable<PairOfInts>> value = new PairOfWritables<IntWritable, ArrayListWritable<PairOfInts>>();
    key.set(term);
    // MapFile.Reader.get() returns null when the term is not in the index.
    Writable w = reader.get(key, value);
    if (w == null) {
        System.out.println("\nThe term '" + term + "' does not appear in the collection");
        return;
    }
    // Left element: document frequency; right element: postings as (byte offset, tf) pairs.
    ArrayListWritable<PairOfInts> postings = value.getRightElement();
    System.out.println("\nComplete postings list for '" + term + "':");
    System.out.println("df = " + value.getLeftElement());
    Int2IntFrequencyDistribution hist = new Int2IntFrequencyDistributionEntry();
    // Open the collection only after a successful lookup; try-with-resources
    // closes it on every path, including exceptions.
    try (FSDataInputStream collection = fs.open(new Path(collectionPath))) {
        for (PairOfInts pair : postings) {
            hist.increment(pair.getRightElement());
            System.out.print(pair);
            // Jump to the document's byte offset and print the start of its line.
            collection.seek(pair.getLeftElement());
            // Not closed on purpose: closing r would also close the shared collection stream.
            BufferedReader r = new BufferedReader(new InputStreamReader(collection));
            String d = r.readLine();
            if (d == null) {
                d = ""; // offset at end of file
            }
            d = d.length() > 80 ? d.substring(0, 80) + "..." : d;
            System.out.println(": " + d);
        }
    }
    System.out.println("\nHistogram of tf values for '" + term + "'");
    for (PairOfInts pair : hist) {
        System.out.println(pair.getLeftElement() + "\t" + pair.getRightElement());
    }
}
Also used : Path(org.apache.hadoop.fs.Path) ArrayListWritable(tl.lin.data.array.ArrayListWritable) InputStreamReader(java.io.InputStreamReader) Int2IntFrequencyDistribution(tl.lin.data.fd.Int2IntFrequencyDistribution) PairOfInts(tl.lin.data.pair.PairOfInts) Writable(org.apache.hadoop.io.Writable) IntWritable(org.apache.hadoop.io.IntWritable) Text(org.apache.hadoop.io.Text) Int2IntFrequencyDistributionEntry(tl.lin.data.fd.Int2IntFrequencyDistributionEntry) PairOfWritables(tl.lin.data.pair.PairOfWritables) BufferedReader(java.io.BufferedReader) FSDataInputStream(org.apache.hadoop.fs.FSDataInputStream) MapFile(org.apache.hadoop.io.MapFile) FileSystem(org.apache.hadoop.fs.FileSystem) IOException(java.io.IOException)
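The per-posting work in lookupTerm (counting tf values into a histogram and printing the first 80 characters of each document's line) can be modeled without Hadoop. In this minimal sketch, a byte array stands in for the collection file, a ByteArrayInputStream that starts at the offset stands in for FSDataInputStream.seek(), and a TreeMap stands in for Int2IntFrequencyDistribution. PostingsReport, Posting, tfHistogram, and snippetAt are hypothetical names; df is taken here as the number of postings, whereas lookupTerm reads it from the stored left element.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hadoop-free sketch of the per-posting work in lookupTerm (hypothetical names).
public class PostingsReport {
    // Same reading of PairOfInts as in lookupTerm: (document byte offset, term frequency).
    record Posting(int offset, int tf) {}

    // Map each tf value to the number of documents with that tf; TreeMap keeps tf values sorted.
    static Map<Integer, Integer> tfHistogram(List<Posting> postings) {
        Map<Integer, Integer> hist = new TreeMap<>();
        for (Posting p : postings) {
            hist.merge(p.tf(), 1, Integer::sum);
        }
        return hist;
    }

    // Read the line starting at byteOffset and truncate it to 80 chars, as lookupTerm does.
    static String snippetAt(byte[] collection, int byteOffset) {
        ByteArrayInputStream in =
            new ByteArrayInputStream(collection, byteOffset, collection.length - byteOffset);
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line = r.readLine();
            if (line == null) {
                return ""; // offset at end of collection
            }
            return line.length() > 80 ? line.substring(0, 80) + "..." : line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] collection = "doc one text\ndoc two text\n".getBytes(StandardCharsets.UTF_8);
        // "doc one text\n" is 13 bytes, so the second document starts at offset 13.
        List<Posting> postings = List.of(new Posting(0, 2), new Posting(13, 1));
        System.out.println("df = " + postings.size());
        for (Posting p : postings) {
            System.out.println(p + ": " + snippetAt(collection, p.offset()));
        }
        tfHistogram(postings).forEach((tf, count) -> System.out.println(tf + "\t" + count));
    }
}
```

Decoding as UTF-8 is an assumption of this sketch; the InputStreamReader in lookupTerm uses the platform default charset. The sorted tf order comes from TreeMap and is not a claim about how the Cloud9 frequency-distribution class iterates.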
