Example 1 with Int2IntFrequencyDistribution

Use of tl.lin.data.fd.Int2IntFrequencyDistribution in project Cloud9 by lintool.

Class LookupPostings, method lookupTerm.

public static void lookupTerm(String term, MapFile.Reader reader, String collectionPath, FileSystem fs) throws IOException {
    FSDataInputStream collection = fs.open(new Path(collectionPath));
    Text key = new Text();
    PairOfWritables<IntWritable, ArrayListWritable<PairOfInts>> value = new PairOfWritables<IntWritable, ArrayListWritable<PairOfInts>>();
    key.set(term);
    Writable w = reader.get(key, value);
    if (w == null) {
        System.out.println("\nThe term '" + term + "' does not appear in the collection");
        // Close the collection stream before the early return so it is not leaked.
        collection.close();
        return;
    }
    ArrayListWritable<PairOfInts> postings = value.getRightElement();
    System.out.println("\nComplete postings list for '" + term + "':");
    System.out.println("df = " + value.getLeftElement());
    Int2IntFrequencyDistribution hist = new Int2IntFrequencyDistributionEntry();
    for (PairOfInts pair : postings) {
        hist.increment(pair.getRightElement());
        System.out.print(pair);
        collection.seek(pair.getLeftElement());
        BufferedReader r = new BufferedReader(new InputStreamReader(collection));
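        String d = r.readLine();
        // readLine() returns null at end of stream; check before truncating to 80 characters.
        d = (d != null && d.length() > 80) ? d.substring(0, 80) + "..." : d;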
        System.out.println(": " + d);
    }
    System.out.println("\nHistogram of tf values for '" + term + "'");
    for (PairOfInts pair : hist) {
        System.out.println(pair.getLeftElement() + "\t" + pair.getRightElement());
    }
    collection.close();
}
Also used: Path (org.apache.hadoop.fs.Path), ArrayListWritable (tl.lin.data.array.ArrayListWritable), InputStreamReader (java.io.InputStreamReader), Int2IntFrequencyDistribution (tl.lin.data.fd.Int2IntFrequencyDistribution), PairOfInts (tl.lin.data.pair.PairOfInts), Writable (org.apache.hadoop.io.Writable), IntWritable (org.apache.hadoop.io.IntWritable), Text (org.apache.hadoop.io.Text), Int2IntFrequencyDistributionEntry (tl.lin.data.fd.Int2IntFrequencyDistributionEntry), PairOfWritables (tl.lin.data.pair.PairOfWritables), BufferedReader (java.io.BufferedReader), FSDataInputStream (org.apache.hadoop.fs.FSDataInputStream)
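
The histogram step in lookupTerm (incrementing a counter for each posting's tf value, then printing value/count pairs) can be tried without Hadoop or the tl.lin.data library. The following is a minimal sketch, not the Cloud9 implementation: the class name TfHistogramSketch, the int[][] posting layout, and the sample data are all hypothetical, and a java.util.TreeMap stands in for Int2IntFrequencyDistribution. Sorting by tf value comes from the TreeMap and is an assumption of this sketch. It may differ from the iteration order of the real class.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the histogram step in lookupTerm. A TreeMap plays
// the role of Int2IntFrequencyDistribution; this is NOT the tl.lin.data API.
class TfHistogramSketch {

    // Counts how many postings carry each term-frequency (tf) value.
    // Each posting is assumed to be {documentOffset, tf}, mirroring how
    // lookupTerm reads PairOfInts (left = offset, right = tf).
    static Map<Integer, Integer> histogram(int[][] postings) {
        Map<Integer, Integer> hist = new TreeMap<>();
        for (int[] posting : postings) {
            int tf = posting[1];
            hist.merge(tf, 1, Integer::sum); // same role as hist.increment(tf)
        }
        return hist;
    }

    public static void main(String[] args) {
        // Made-up postings, for illustration only.
        int[][] postings = { {0, 2}, {117, 1}, {530, 2}, {912, 5} };
        for (Map.Entry<Integer, Integer> e : histogram(postings).entrySet()) {
            // tf value, then the number of documents with that tf
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Each output line pairs a tf value with the number of postings that have it, which is the same shape as the "Histogram of tf values" output printed by lookupTerm.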

Aggregations

BufferedReader (java.io.BufferedReader): 1
InputStreamReader (java.io.InputStreamReader): 1
FSDataInputStream (org.apache.hadoop.fs.FSDataInputStream): 1
Path (org.apache.hadoop.fs.Path): 1
IntWritable (org.apache.hadoop.io.IntWritable): 1
Text (org.apache.hadoop.io.Text): 1
Writable (org.apache.hadoop.io.Writable): 1
ArrayListWritable (tl.lin.data.array.ArrayListWritable): 1
Int2IntFrequencyDistribution (tl.lin.data.fd.Int2IntFrequencyDistribution): 1
Int2IntFrequencyDistributionEntry (tl.lin.data.fd.Int2IntFrequencyDistributionEntry): 1
PairOfInts (tl.lin.data.pair.PairOfInts): 1
PairOfWritables (tl.lin.data.pair.PairOfWritables): 1