Example 26 with PTypeFamily

Use of org.apache.crunch.types.PTypeFamily in project crunch by cloudera.

The class PageRankTest, method pageRank.

// Runs one PageRank iteration over a table keyed by URL.
public static PTable<String, PageRankData> pageRank(PTable<String, PageRankData> input, final float d) {
    // Reuse the input's type family so the derived table is serialized the same way.
    PTypeFamily ptf = input.getTypeFamily();
    // Step 1: for every page, emit one (link, propagated score) pair per outbound link.
    PTable<String, Float> outbound = input.parallelDo(new DoFn<Pair<String, PageRankData>, Pair<String, Float>>() {

        @Override
        public void process(Pair<String, PageRankData> input, Emitter<Pair<String, Float>> emitter) {
            PageRankData prd = input.second();
            for (String link : prd.urls) {
                emitter.emit(Pair.of(link, prd.propagatedScore()));
            }
        }
    }, ptf.tableOf(ptf.strings(), ptf.floats()));
    // Step 2: cogroup each page with the scores propagated to it and compute its next score.
    return input.cogroup(outbound).parallelDo(new MapFn<Pair<String, Pair<Collection<PageRankData>, Collection<Float>>>, Pair<String, PageRankData>>() {

        @Override
        public Pair<String, PageRankData> map(Pair<String, Pair<Collection<PageRankData>, Collection<Float>>> input) {
            // getOnlyElement throws unless the key has exactly one PageRankData entry.
            PageRankData prd = Iterables.getOnlyElement(input.second().first());
            Collection<Float> propagatedScores = input.second().second();
            // Sum the contributions received from all inbound links.
            float sum = 0.0f;
            for (Float s : propagatedScores) {
                sum += s;
            }
            return Pair.of(input.first(), prd.next(d + (1.0f - d) * sum));
        }
    }, input.getPTableType());
}
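
The per-iteration logic of pageRank can be traced without a pipeline. What follows is a minimal in-memory sketch in plain Java, not Crunch code. The names PageRankSketch, Node, and step are hypothetical, and Node.propagatedScore() assumes the score is split evenly across a page's outbound links, which the listing above does not show. The update formula d + (1 - d) * sum is copied from the listing as-is.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory sketch of one PageRank step; it has no Crunch dependency.
class PageRankSketch {

    // Hypothetical stand-in for PageRankData: a score plus outbound links.
    static final class Node {
        final float score;
        final List<String> urls;

        Node(float score, List<String> urls) {
            this.score = score;
            this.urls = urls;
        }

        // Assumption: the score is divided evenly among outbound links.
        float propagatedScore() {
            return score / urls.size();
        }
    }

    // One iteration: send a share of each page's score to every page it
    // links to, sum the shares per target, then apply d + (1 - d) * sum.
    static Map<String, Node> step(Map<String, Node> pages, float d) {
        // Plays the role of the "outbound" table followed by the per-key sum.
        Map<String, Float> inboundSum = new HashMap<>();
        for (Node node : pages.values()) {
            for (String link : node.urls) {
                inboundSum.merge(link, node.propagatedScore(), Float::sum);
            }
        }
        // Plays the role of the cogroup: pair each page with its inbound sum.
        Map<String, Node> next = new HashMap<>();
        for (Map.Entry<String, Node> e : pages.entrySet()) {
            float sum = inboundSum.getOrDefault(e.getKey(), 0.0f);
            next.put(e.getKey(), new Node(d + (1.0f - d) * sum, e.getValue().urls));
        }
        return next;
    }
}
```

With three pages a -> {b, c}, b -> {c}, c -> {a}, each starting at score 1.0, and d = 0.5, one call to step gives a = 1.0, b = 0.75, c = 1.25. Unlike the cogroup version, this sketch silently drops scores sent to URLs that are not keys of the map, where the listing's Iterables.getOnlyElement call would throw.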
Also used: PTypeFamily (org.apache.crunch.types.PTypeFamily), Collection (java.util.Collection)

Aggregations

PTypeFamily (org.apache.crunch.types.PTypeFamily): 26
Pair (org.apache.crunch.Pair): 15
GroupingOptions (org.apache.crunch.GroupingOptions): 7
MRPipeline (org.apache.crunch.impl.mr.MRPipeline): 7
Test (org.junit.Test): 7
Configuration (org.apache.hadoop.conf.Configuration): 6
Collection (java.util.Collection): 5
PCollection (org.apache.crunch.PCollection): 4
CombineFn (org.apache.crunch.CombineFn): 2
DoFn (org.apache.crunch.DoFn): 2
Emitter (org.apache.crunch.Emitter): 2
Tuple3 (org.apache.crunch.Tuple3): 2
File (java.io.File): 1
List (java.util.List): 1
Tuple4 (org.apache.crunch.Tuple4): 1
TupleN (org.apache.crunch.TupleN): 1
MemPipeline (org.apache.crunch.impl.mem.MemPipeline): 1
CrunchRuntimeException (org.apache.crunch.impl.mr.run.CrunchRuntimeException): 1
ReadableSourceTarget (org.apache.crunch.io.ReadableSourceTarget): 1
SourcePathTargetImpl (org.apache.crunch.io.impl.SourcePathTargetImpl): 1