Example 1 with PairOfInts

Use of tl.lin.data.pair.PairOfInts in project Cloud9 by lintool.

From the class BooleanRetrieval, method fetchPostings:

private ArrayListWritable<PairOfInts> fetchPostings(String term) throws IOException {
    Text key = new Text();
    // Each index entry maps a term to (df, postings list of (docno, tf) pairs).
    PairOfWritables<IntWritable, ArrayListWritable<PairOfInts>> value = new PairOfWritables<IntWritable, ArrayListWritable<PairOfInts>>();
    key.set(term);
    index.get(key, value);
    return value.getRightElement();
}
Also used: ArrayListWritable (tl.lin.data.array.ArrayListWritable), PairOfWritables (tl.lin.data.pair.PairOfWritables), PairOfInts (tl.lin.data.pair.PairOfInts), Text (org.apache.hadoop.io.Text), IntWritable (org.apache.hadoop.io.IntWritable)
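BooleanRetrieval fetches postings like these so it can evaluate AND queries by intersecting lists sorted by document number. A minimal sketch of that merge-style intersection in plain Java (the class name and the int[] {docno, tf} pairs here are illustrative stand-ins, not the Cloud9 or tl.lin.data API):

```java
import java.util.ArrayList;
import java.util.List;

public class PostingsIntersect {
    // Intersect two postings lists sorted by ascending docno.
    // Each int[] is {docno, tf}, standing in for PairOfInts.
    static List<Integer> intersect(List<int[]> a, List<int[]> b) {
        List<Integer> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int da = a.get(i)[0], db = b.get(j)[0];
            if (da == db) {
                result.add(da);  // docno appears in both lists
                i++;
                j++;
            } else if (da < db) {
                i++;  // advance the list with the smaller docno
            } else {
                j++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> a = List.of(new int[]{1, 2}, new int[]{4, 1}, new int[]{7, 3});
        List<int[]> b = List.of(new int[]{4, 5}, new int[]{6, 1}, new int[]{7, 2});
        System.out.println(intersect(a, b)); // prints [4, 7]
    }
}
```

Because both lists are walked once in lockstep, the intersection costs O(|a| + |b|) rather than the O(|a|·|b|) of a nested scan.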

Example 2 with PairOfInts

Use of tl.lin.data.pair.PairOfInts in project Cloud9 by lintool.

From the class InvertedIndexingIT, method testInvertedIndexing:

@Test
public void testInvertedIndexing() throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    assertTrue(fs.exists(collectionPath));
    String[] args = new String[] { "hadoop --config src/test/resources/hadoop-local-conf/ jar", IntegrationUtils.getJar("target", "cloud9"), edu.umd.cloud9.example.ir.BuildInvertedIndex.class.getCanonicalName(), "-input", collectionPath.toString(), "-output", tmpPrefix, "-numReducers", "1" };
    IntegrationUtils.exec(Joiner.on(" ").join(args));
    MapFile.Reader reader = new MapFile.Reader(new Path(tmpPrefix + "/part-r-00000"), conf);
    Text key = new Text();
    PairOfWritables<IntWritable, ArrayListWritable<PairOfInts>> value = new PairOfWritables<IntWritable, ArrayListWritable<PairOfInts>>();
    key.set("gold");
    reader.get(key, value);
    assertEquals(584, value.getLeftElement().get());
    ArrayListWritable<PairOfInts> postings = value.getRightElement();
    assertEquals(5303, postings.get(0).getLeftElement());
    assertEquals(684030, postings.get(100).getLeftElement());
    assertEquals(1634312, postings.get(200).getLeftElement());
    reader.close();
}
Also used: Path (org.apache.hadoop.fs.Path), ArrayListWritable (tl.lin.data.array.ArrayListWritable), Configuration (org.apache.hadoop.conf.Configuration), PairOfInts (tl.lin.data.pair.PairOfInts), MapFile (org.apache.hadoop.io.MapFile), Text (org.apache.hadoop.io.Text), PairOfWritables (tl.lin.data.pair.PairOfWritables), FileSystem (org.apache.hadoop.fs.FileSystem), IntWritable (org.apache.hadoop.io.IntWritable), Test (org.junit.Test)
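The assertions above rely on two properties of each index entry: the left element (df) matches the number of postings, and the postings are sorted by ascending docno. A hedged sketch of that consistency check in plain Java (int[] {docno, tf} pairs stand in for PairOfInts; this helper is an illustration, not part of Cloud9):

```java
import java.util.List;

public class IndexEntryCheck {
    // Returns true when df equals the postings count and docnos are
    // strictly increasing, as the integration test's assertions assume.
    static boolean consistent(int df, List<int[]> postings) {
        if (df != postings.size()) {
            return false;  // df should count one entry per document
        }
        for (int i = 1; i < postings.size(); i++) {
            if (postings.get(i - 1)[0] >= postings.get(i)[0]) {
                return false;  // postings must be sorted by docno
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Docnos taken from the test's spot checks for the term "gold".
        List<int[]> postings = List.of(
            new int[]{5303, 1}, new int[]{684030, 2}, new int[]{1634312, 1});
        System.out.println(consistent(3, postings)); // prints true
    }
}
```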

Example 3 with PairOfInts

Use of tl.lin.data.pair.PairOfInts in project Cloud9 by lintool.

From the class LookupPostings, method lookupTerm:

public static void lookupTerm(String term, MapFile.Reader reader, String collectionPath, FileSystem fs) throws IOException {
    FSDataInputStream collection = fs.open(new Path(collectionPath));
    Text key = new Text();
    PairOfWritables<IntWritable, ArrayListWritable<PairOfInts>> value = new PairOfWritables<IntWritable, ArrayListWritable<PairOfInts>>();
    key.set(term);
    Writable w = reader.get(key, value);
    if (w == null) {
        System.out.println("\nThe term '" + term + "' does not appear in the collection");
        return;
    }
    ArrayListWritable<PairOfInts> postings = value.getRightElement();
    System.out.println("\nComplete postings list for '" + term + "':");
    System.out.println("df = " + value.getLeftElement());
    Int2IntFrequencyDistribution hist = new Int2IntFrequencyDistributionEntry();
    for (PairOfInts pair : postings) {
        hist.increment(pair.getRightElement());
        System.out.print(pair);
        // In this example the posting's left element is the document's
        // byte offset in the collection file, so we can seek straight to it.
        collection.seek(pair.getLeftElement());
        BufferedReader r = new BufferedReader(new InputStreamReader(collection));
        String d = r.readLine();
        d = d.length() > 80 ? d.substring(0, 80) + "..." : d;
        System.out.println(": " + d);
    }
    System.out.println("\nHistogram of tf values for '" + term + "'");
    for (PairOfInts pair : hist) {
        System.out.println(pair.getLeftElement() + "\t" + pair.getRightElement());
    }
    collection.close();
}
Also used: Path (org.apache.hadoop.fs.Path), ArrayListWritable (tl.lin.data.array.ArrayListWritable), InputStreamReader (java.io.InputStreamReader), Int2IntFrequencyDistribution (tl.lin.data.fd.Int2IntFrequencyDistribution), PairOfInts (tl.lin.data.pair.PairOfInts), Writable (org.apache.hadoop.io.Writable), IntWritable (org.apache.hadoop.io.IntWritable), Text (org.apache.hadoop.io.Text), Int2IntFrequencyDistributionEntry (tl.lin.data.fd.Int2IntFrequencyDistributionEntry), PairOfWritables (tl.lin.data.pair.PairOfWritables), BufferedReader (java.io.BufferedReader), FSDataInputStream (org.apache.hadoop.fs.FSDataInputStream)
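LookupPostings tallies how often each tf value occurs by calling hist.increment(...) on an Int2IntFrequencyDistributionEntry. The same histogram can be sketched with a plain TreeMap (an illustration of the idea, not the tl.lin.data API; int[] {docno, tf} pairs again stand in for PairOfInts):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TfHistogram {
    // Count how many postings carry each tf value; TreeMap keeps the
    // tf keys sorted, matching the ordered printout in lookupTerm.
    static Map<Integer, Integer> histogram(List<int[]> postings) {
        Map<Integer, Integer> hist = new TreeMap<>();
        for (int[] p : postings) {
            hist.merge(p[1], 1, Integer::sum);  // increment count for this tf
        }
        return hist;
    }

    public static void main(String[] args) {
        List<int[]> postings = List.of(
            new int[]{10, 1}, new int[]{42, 1}, new int[]{99, 3});
        System.out.println(histogram(postings)); // prints {1=2, 3=1}
    }
}
```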

Aggregations

IntWritable (org.apache.hadoop.io.IntWritable) ×3
Text (org.apache.hadoop.io.Text) ×3
ArrayListWritable (tl.lin.data.array.ArrayListWritable) ×3
PairOfInts (tl.lin.data.pair.PairOfInts) ×3
PairOfWritables (tl.lin.data.pair.PairOfWritables) ×3
Path (org.apache.hadoop.fs.Path) ×2
BufferedReader (java.io.BufferedReader) ×1
InputStreamReader (java.io.InputStreamReader) ×1
Configuration (org.apache.hadoop.conf.Configuration) ×1
FSDataInputStream (org.apache.hadoop.fs.FSDataInputStream) ×1
FileSystem (org.apache.hadoop.fs.FileSystem) ×1
MapFile (org.apache.hadoop.io.MapFile) ×1
Writable (org.apache.hadoop.io.Writable) ×1
Test (org.junit.Test) ×1
Int2IntFrequencyDistribution (tl.lin.data.fd.Int2IntFrequencyDistribution) ×1
Int2IntFrequencyDistributionEntry (tl.lin.data.fd.Int2IntFrequencyDistributionEntry) ×1