
Example 1 with Tuple3

Use of org.apache.crunch.Tuple3 in project crunch by cloudera.

From the class AvrosTest, method testTriples.

@Test
@SuppressWarnings("rawtypes")
public void testTriples() throws Exception {
    AvroType at = Avros.triples(Avros.strings(), Avros.strings(), Avros.strings());
    Tuple3 j = Tuple3.of("a", "b", "c");
    GenericData.Record w = new GenericData.Record(at.getSchema());
    w.put(0, new Utf8("a"));
    w.put(1, new Utf8("b"));
    w.put(2, new Utf8("c"));
    testInputOutputFn(at, j, w);
}
Also used : Tuple3(org.apache.crunch.Tuple3) Utf8(org.apache.avro.util.Utf8) GenericData(org.apache.avro.generic.GenericData) Test(org.junit.Test)
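As we read it, this test checks that the Avro triple type maps the Java value Tuple3.of("a", "b", "c") to a GenericData.Record whose fields 0, 1 and 2 hold the three strings (stored as Avro Utf8 values), and back again. testInputOutputFn is a helper in the test class that is not shown here. The plain-Java sketch below illustrates only that positional round-trip property. Triple, PositionalRoundTrip, toRecord, fromRecord and roundTrips are hypothetical names, an Object[] stands in for the Avro record, and the Utf8/String conversion is ignored.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical stand-in for a three-field tuple; not Crunch's Tuple3.
record Triple<A, B, C>(A first, B second, C third) {}

final class PositionalRoundTrip {
    // "Output" direction (assumed): tuple field i goes to record slot i.
    static Object[] toRecord(Triple<?, ?, ?> t) {
        return new Object[] { t.first(), t.second(), t.third() };
    }

    // "Input" direction (assumed): record slot i becomes tuple field i.
    static Triple<Object, Object, Object> fromRecord(Object[] r) {
        if (r.length != 3) {
            throw new IllegalArgumentException("expected 3 slots, got " + r.length);
        }
        return new Triple<>(r[0], r[1], r[2]);
    }

    // Checks both directions, in the spirit of the hypothetical round-trip test.
    static boolean roundTrips(Triple<Object, Object, Object> javaValue, Object[] recordValue) {
        return Objects.equals(fromRecord(recordValue), javaValue)
            && Arrays.equals(toRecord(javaValue), recordValue);
    }

    public static void main(String[] args) {
        Triple<Object, Object, Object> t = new Triple<>("a", "b", "c");
        Object[] record = { "a", "b", "c" };
        System.out.println(roundTrips(t, record)); // true
    }
}
```

Because the mapping is purely positional, swapping two slots in the record breaks the round trip even though the same three values are present.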

Example 2 with Tuple3

Use of org.apache.crunch.Tuple3 in project crunch by cloudera.

From the class WritablesTest, method testTriples.

@Test
@SuppressWarnings("rawtypes")
public void testTriples() throws Exception {
    Tuple3 j = Tuple3.of("a", "b", "c");
    TupleWritable w = new TupleWritable(new Text[] { new Text("a"), new Text("b"), new Text("c") });
    w.setWritten(0);
    w.setWritten(1);
    w.setWritten(2);
    WritableType<?, ?> wt = Writables.triples(Writables.strings(), Writables.strings(), Writables.strings());
    testInputOutputFn(wt, j, w);
}
Also used : Tuple3(org.apache.crunch.Tuple3) Text(org.apache.hadoop.io.Text) Test(org.junit.Test)
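The Writables variant builds the expected TupleWritable by hand and calls setWritten for each of the three positions. Our assumption is that setWritten records, in a bitmask, that a position actually carries a value (as in Hadoop's join TupleWritable), so positions that are never marked are treated as absent. The plain-Java sketch below models only that idea; BitmaskTuple, has and get are hypothetical and do not reproduce Crunch's class or its serialization.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for a Hadoop-style TupleWritable:
// a fixed array of values plus a bitmask of positions marked as written.
final class BitmaskTuple {
    private final Object[] values;
    private long written; // bit i set => position i holds a value

    BitmaskTuple(Object[] values) {
        if (values.length > Long.SIZE) {
            throw new IllegalArgumentException("at most 64 positions supported");
        }
        this.values = Arrays.copyOf(values, values.length);
    }

    void setWritten(int i) {
        checkIndex(i);
        written |= 1L << i;
    }

    boolean has(int i) {
        checkIndex(i);
        return (written & (1L << i)) != 0;
    }

    // Sketch-only choice: an unmarked position reads as null.
    Object get(int i) {
        return has(i) ? values[i] : null;
    }

    private void checkIndex(int i) {
        if (i < 0 || i >= values.length) {
            throw new IndexOutOfBoundsException("position " + i);
        }
    }

    public static void main(String[] args) {
        BitmaskTuple w = new BitmaskTuple(new Object[] { "a", "b", "c" });
        w.setWritten(0);
        w.setWritten(2);
        System.out.println(w.get(0) + " " + w.get(1) + " " + w.get(2)); // a null c
    }
}
```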

Example 3 with Tuple3

Use of org.apache.crunch.Tuple3 in project crunch by cloudera.

From the class Set, method comm.

/**
 * Find the elements that are common to two sets, like the Unix <code>comm</code>
 * utility. This method returns a {@link PCollection} of {@link Tuple3} objects,
 * and the position in the tuple that an element appears is determined by
 * the collections that it is a member of, as follows:
 * <ol>
 * <li>elements only in <code>coll1</code>,</li>
 * <li>elements only in <code>coll2</code>, or</li>
 * <li>elements in both collections</li>
 * </ol>
 * Tuples are otherwise filled with <code>null</code>.
 *
 * @return a collection of {@link Tuple3} objects
 */
public static <T> PCollection<Tuple3<T, T, T>> comm(PCollection<T> coll1, PCollection<T> coll2) {
    PTypeFamily typeFamily = coll1.getTypeFamily();
    PType<T> type = coll1.getPType();
    return Cogroup.cogroup(toTable(coll1), toTable(coll2)).parallelDo(new DoFn<Pair<T, Pair<Collection<Boolean>, Collection<Boolean>>>, Tuple3<T, T, T>>() {

        @Override
        public void process(Pair<T, Pair<Collection<Boolean>, Collection<Boolean>>> input, Emitter<Tuple3<T, T, T>> emitter) {
            Pair<Collection<Boolean>, Collection<Boolean>> groups = input.second();
            boolean inFirst = !groups.first().isEmpty();
            boolean inSecond = !groups.second().isEmpty();
            T t = input.first();
            emitter.emit(Tuple3.of(
                inFirst && !inSecond ? t : null,  // only in coll1
                !inFirst && inSecond ? t : null,  // only in coll2
                inFirst && inSecond ? t : null)); // in both
        }
    }, typeFamily.triples(type, type, type));
}
Also used : PTypeFamily(org.apache.crunch.types.PTypeFamily) Tuple3(org.apache.crunch.Tuple3) Collection(java.util.Collection) PCollection(org.apache.crunch.PCollection) Pair(org.apache.crunch.Pair)
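Because comm() cogroups the two collections by element, each distinct element appears once, together with one group per input that is empty or not, and the DoFn then places the element in exactly one of the three tuple slots. The same classification can be sketched on in-memory sets without Crunch. This is a plain-Java analogue, not the library's implementation: CommTriple and InMemoryComm are hypothetical names, and java.util.Set stands in for PCollection.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical three-slot row; exactly one slot is non-null per element.
record CommTriple<T>(T onlyFirst, T onlySecond, T both) {}

final class InMemoryComm {
    // Every distinct element of either input yields exactly one row.
    static <T> List<CommTriple<T>> comm(Set<T> coll1, Set<T> coll2) {
        Set<T> all = new LinkedHashSet<>(coll1); // union, coll1's order first
        all.addAll(coll2);
        List<CommTriple<T>> out = new ArrayList<>();
        for (T t : all) {
            boolean inFirst = coll1.contains(t);
            boolean inSecond = coll2.contains(t);
            out.add(new CommTriple<>(
                inFirst && !inSecond ? t : null,
                !inFirst && inSecond ? t : null,
                inFirst && inSecond ? t : null));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> a = new LinkedHashSet<>(List.of("x", "y"));
        Set<String> b = new LinkedHashSet<>(List.of("y", "z"));
        comm(a, b).forEach(System.out::println);
        // CommTriple[onlyFirst=x, onlySecond=null, both=null]
        // CommTriple[onlyFirst=null, onlySecond=null, both=y]
        // CommTriple[onlyFirst=null, onlySecond=z, both=null]
    }
}
```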

Example 4 with Tuple3

Use of org.apache.crunch.Tuple3 in project crunch by cloudera.

From the class Sort, method sortTriples.

/**
 * Sorts the {@link PCollection} of {@link Tuple3}s using the specified column
 * ordering.
 *
 * @return a {@link PCollection} representing the sorted collection.
 */
public static <V1, V2, V3> PCollection<Tuple3<V1, V2, V3>> sortTriples(PCollection<Tuple3<V1, V2, V3>> collection, ColumnOrder... columnOrders) {
    PTypeFamily tf = collection.getTypeFamily();
    PType<Tuple3<V1, V2, V3>> pType = collection.getPType();
    @SuppressWarnings("unchecked") PTableType<Tuple3<V1, V2, V3>, Void> type = tf.tableOf(tf.triples(pType.getSubTypes().get(0), pType.getSubTypes().get(1), pType.getSubTypes().get(2)), tf.nulls());
    PTable<Tuple3<V1, V2, V3>, Void> pt = collection.parallelDo(new DoFn<Tuple3<V1, V2, V3>, Pair<Tuple3<V1, V2, V3>, Void>>() {

        @Override
        public void process(Tuple3<V1, V2, V3> input, Emitter<Pair<Tuple3<V1, V2, V3>, Void>> emitter) {
            emitter.emit(Pair.of(input, (Void) null));
        }
    }, type);
    Configuration conf = collection.getPipeline().getConfiguration();
    GroupingOptions options = buildGroupingOptions(conf, tf, pType, columnOrders);
    PTable<Tuple3<V1, V2, V3>, Void> sortedPt = pt.groupByKey(options).ungroup();
    return sortedPt.parallelDo(new DoFn<Pair<Tuple3<V1, V2, V3>, Void>, Tuple3<V1, V2, V3>>() {

        @Override
        public void process(Pair<Tuple3<V1, V2, V3>, Void> input, Emitter<Tuple3<V1, V2, V3>> emitter) {
            emitter.emit(input.first());
        }
    }, collection.getPType());
}
Also used : Configuration(org.apache.hadoop.conf.Configuration) PTypeFamily(org.apache.crunch.types.PTypeFamily) Tuple3(org.apache.crunch.Tuple3) GroupingOptions(org.apache.crunch.GroupingOptions) Pair(org.apache.crunch.Pair)
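sortTriples pairs each tuple with a null Void value, lets the shuffle in groupByKey(options) order the keys according to the column orders, and then uses ungroup() to re-emit one pair per grouped value, so duplicate tuples survive the sort. An in-memory analogue is a chained Comparator. In this sketch ColOrder is a hypothetical stand-in for ColumnOrder (a 1-based column number plus a direction, which is our assumption), Row3 stands in for Tuple3, and an empty order list leaves the input order unchanged because List.sort is stable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for ColumnOrder: assumed 1-based column plus direction.
record ColOrder(int column, boolean ascending) {}

// Hypothetical three-column row with comparable columns (not Crunch's Tuple3).
record Row3<A extends Comparable<? super A>, B extends Comparable<? super B>, C extends Comparable<? super C>>(
        A first, B second, C third) {}

final class ColumnSort {
    // Chains one comparator per requested column, in the order given.
    static <A extends Comparable<? super A>, B extends Comparable<? super B>, C extends Comparable<? super C>>
            Comparator<Row3<A, B, C>> byColumns(ColOrder... orders) {
        Comparator<Row3<A, B, C>> result = (x, y) -> 0; // no orders: keep input order
        for (ColOrder o : orders) {
            Comparator<Row3<A, B, C>> c = switch (o.column()) {
                case 1 -> Comparator.comparing(Row3::first);
                case 2 -> Comparator.comparing(Row3::second);
                case 3 -> Comparator.comparing(Row3::third);
                default -> throw new IllegalArgumentException("no column " + o.column());
            };
            result = result.thenComparing(o.ascending() ? c : c.reversed());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row3<String, Integer, String>> rows = new ArrayList<>();
        rows.add(new Row3<>("b", 1, "x"));
        rows.add(new Row3<>("a", 2, "y"));
        rows.add(new Row3<>("a", 1, "z"));
        // Column 1 ascending, then column 2 descending.
        rows.sort(byColumns(new ColOrder(1, true), new ColOrder(2, false)));
        rows.forEach(System.out::println);
        // Row3[first=a, second=2, third=y]
        // Row3[first=a, second=1, third=z]
        // Row3[first=b, second=1, third=x]
    }
}
```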
