Example 21 with MRPipeline

Use of org.apache.crunch.impl.mr.MRPipeline in project crunch by cloudera.

Class AggregateTest, method testCollectUrls.

@Test
public void testCollectUrls() throws Exception {
    Pipeline p = new MRPipeline(AggregateTest.class);
    String urlsInputPath = FileHelper.createTempCopyOf("urls.txt");
    PTable<String, Collection<String>> urls = Aggregate.collectValues(
        p.readTextFile(urlsInputPath)
            .parallelDo(new SplitFn(), tableOf(strings(), strings())));
    for (Pair<String, Collection<String>> e : urls.materialize()) {
        String key = e.first();
        int expectedSize = 0;
        if ("www.A.com".equals(key)) {
            expectedSize = 4;
        } else if ("www.B.com".equals(key) || "www.F.com".equals(key)) {
            expectedSize = 2;
        } else if ("www.C.com".equals(key) || "www.D.com".equals(key) || "www.E.com".equals(key)) {
            expectedSize = 1;
        }
        assertEquals("Checking key = " + key, expectedSize, e.second().size());
    }
    // Finish the pipeline once, after all materialized results have been checked.
    p.done();
}
Also used : MRPipeline(org.apache.crunch.impl.mr.MRPipeline) PCollection(org.apache.crunch.PCollection) Collection(java.util.Collection) MemPipeline(org.apache.crunch.impl.mem.MemPipeline) Pipeline(org.apache.crunch.Pipeline) Test(org.junit.Test)
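
For readers without a Hadoop setup, the per-key grouping that the test above checks can be approximated in plain Java. This is only an in-memory sketch built on java.util collections, not Crunch's implementation: the class name CollectValuesSketch, its collectValues helper, and the sample pairs are all hypothetical and are not the contents of urls.txt.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CollectValuesSketch {

    // Hypothetical helper: gathers the values of (key, value) pairs under their key,
    // which is the shape of result the test expects from collectValues.
    static <K, V> Map<K, Collection<V>> collectValues(List<Map.Entry<K, V>> pairs) {
        Map<K, Collection<V>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<Map.Entry<String, String>> pairs = List.of(
            Map.entry("www.A.com", "www.B.com"),
            Map.entry("www.A.com", "www.C.com"),
            Map.entry("www.B.com", "www.D.com"));

        Map<String, Collection<String>> urls = collectValues(pairs);
        for (Map.Entry<String, Collection<String>> e : urls.entrySet()) {
            System.out.println(e.getKey() + " has " + e.getValue().size() + " value(s)");
        }
    }
}
```

The sketch needs Java 9 or later for List.of and Map.entry. Unlike the pipeline version, nothing is deferred here: the grouping happens as soon as the helper is called.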

Example 22 with MRPipeline

Use of org.apache.crunch.impl.mr.MRPipeline in project crunch by cloudera.

Class AggregateTest, method testCollectValues_Writables.

@Test
public void testCollectValues_Writables() throws IOException {
    Pipeline pipeline = new MRPipeline(AggregateTest.class);
    Map<Integer, Collection<Text>> collectionMap = pipeline
        .readTextFile(FileHelper.createTempCopyOf("set2.txt"))
        .parallelDo(new MapStringToTextPair(), Writables.tableOf(Writables.ints(), Writables.writables(Text.class)))
        .collectValues()
        .materializeToMap();
    assertEquals(1, collectionMap.size());
    assertEquals(Lists.newArrayList(new Text("c"), new Text("d"), new Text("a")), collectionMap.get(1));
}
Also used : MRPipeline(org.apache.crunch.impl.mr.MRPipeline) PCollection(org.apache.crunch.PCollection) Collection(java.util.Collection) Text(org.apache.hadoop.io.Text) MemPipeline(org.apache.crunch.impl.mem.MemPipeline) Pipeline(org.apache.crunch.Pipeline) Test(org.junit.Test)

Example 23 with MRPipeline

Use of org.apache.crunch.impl.mr.MRPipeline in project crunch by cloudera.

Class AvroTypeSortTest, method testSortAvroTypesBySelectedFields.

@Test
public void testSortAvroTypesBySelectedFields() throws Exception {
    MRPipeline pipeline = new MRPipeline(AvroTypeSortTest.class);
    Person ccc10 = createPerson("CCC", 10);
    Person bbb20 = createPerson("BBB", 20);
    Person aaa30 = createPerson("AAA", 30);
    writeAvroFile(Lists.newArrayList(ccc10, bbb20, aaa30), avroFile);
    PCollection<Person> unsorted = pipeline.read(At.avroFile(avroFile.getAbsolutePath(), records(Person.class)));
    // Sort by Name
    MapFn<Person, String> nameExtractor = new MapFn<Person, String>() {

        @Override
        public String map(Person input) {
            return input.getName().toString();
        }
    };
    PCollection<Person> sortedByName = unsorted.by(nameExtractor, strings()).groupByKey().ungroup().values();
    List<Person> sortedByNameList = Lists.newArrayList(sortedByName.materialize());
    assertEquals(3, sortedByNameList.size());
    assertEquals(aaa30, sortedByNameList.get(0));
    assertEquals(bbb20, sortedByNameList.get(1));
    assertEquals(ccc10, sortedByNameList.get(2));
    // Sort by Age
    MapFn<Person, Integer> ageExtractor = new MapFn<Person, Integer>() {

        @Override
        public Integer map(Person input) {
            return input.getAge();
        }
    };
    PCollection<Person> sortedByAge = unsorted.by(ageExtractor, ints()).groupByKey().ungroup().values();
    List<Person> sortedByAgeList = Lists.newArrayList(sortedByAge.materialize());
    assertEquals(3, sortedByAgeList.size());
    assertEquals(ccc10, sortedByAgeList.get(0));
    assertEquals(bbb20, sortedByAgeList.get(1));
    assertEquals(aaa30, sortedByAgeList.get(2));
    pipeline.done();
}
Also used : MRPipeline(org.apache.crunch.impl.mr.MRPipeline) Person(org.apache.crunch.test.Person) MapFn(org.apache.crunch.MapFn) Test(org.junit.Test)
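
The test above expects the records to come back ordered by whichever key the extractor returns (name first, then age). A plain-Java analogue of that expectation, with no Crunch or Avro involved, might look like the sketch below. The SortByFieldSketch class, its sortBy helper, and the small Person class are hypothetical stand-ins, and the sort is done with java.util.Comparator rather than through a pipeline.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

public class SortByFieldSketch {

    // Hypothetical stand-in for the Avro-generated Person used in the test.
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Returns a new list ordered by the key the extractor pulls out of each element.
    static <T, K extends Comparable<K>> List<T> sortBy(List<T> input, Function<T, K> extractor) {
        List<T> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing(extractor));
        return sorted;
    }

    public static void main(String[] args) {
        // Same three sample people as the test, in the same unsorted order.
        List<Person> unsorted = List.of(
            new Person("CCC", 10),
            new Person("BBB", 20),
            new Person("AAA", 30));

        List<Person> byName = sortBy(unsorted, p -> p.name);
        List<Person> byAge = sortBy(unsorted, p -> p.age);

        System.out.println("First by name: " + byName.get(0).name);
        System.out.println("First by age: " + byAge.get(0).name);
    }
}
```

Passing the extractor as a function keeps the sorting logic identical for both fields, which is the same role the two MapFn instances play in the test.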

Example 24 with MRPipeline

Use of org.apache.crunch.impl.mr.MRPipeline in project crunch by cloudera.

Class SetTest, method setUp.

@Before
public void setUp() throws IOException {
    String set1InputPath = FileHelper.createTempCopyOf("set1.txt");
    String set2InputPath = FileHelper.createTempCopyOf("set2.txt");
    pipeline = new MRPipeline(SetTest.class);
    set1 = pipeline.read(At.textFile(set1InputPath, typeFamily.strings()));
    set2 = pipeline.read(At.textFile(set2InputPath, typeFamily.strings()));
}
Also used : MRPipeline(org.apache.crunch.impl.mr.MRPipeline) Before(org.junit.Before)

Example 25 with MRPipeline

Use of org.apache.crunch.impl.mr.MRPipeline in project crunch by cloudera.

Class AvroFileSourceTargetTest, method testReflect.

@Test
public void testReflect() throws IOException {
    Schema pojoPersonSchema = ReflectData.get().getSchema(PojoPerson.class);
    GenericRecord savedRecord = new GenericData.Record(pojoPersonSchema);
    savedRecord.put("name", "John Doe");
    populateGenericFile(Lists.newArrayList(savedRecord), pojoPersonSchema);
    Pipeline pipeline = new MRPipeline(AvroFileSourceTargetTest.class);
    PCollection<PojoPerson> personCollection = pipeline.read(At.avroFile(avroFile.getAbsolutePath(), Avros.reflects(PojoPerson.class)));
    List<PojoPerson> recordList = Lists.newArrayList(personCollection.materialize());
    assertEquals(1, recordList.size());
    PojoPerson person = recordList.get(0);
    assertEquals("John Doe", person.getName());
}
Also used : PojoPerson(org.apache.crunch.io.avro.AvroFileReaderFactoryTest.PojoPerson) Schema(org.apache.avro.Schema) MRPipeline(org.apache.crunch.impl.mr.MRPipeline) GenericRecord(org.apache.avro.generic.GenericRecord) Record(org.apache.avro.generic.GenericData.Record) Pipeline(org.apache.crunch.Pipeline) Test(org.junit.Test)

Aggregations

MRPipeline (org.apache.crunch.impl.mr.MRPipeline): 34
Test (org.junit.Test): 26
Pipeline (org.apache.crunch.Pipeline): 13
PTypeFamily (org.apache.crunch.types.PTypeFamily): 7
MemPipeline (org.apache.crunch.impl.mem.MemPipeline): 6
Pair (org.apache.crunch.Pair): 4
Collection (java.util.Collection): 3
Record (org.apache.avro.generic.GenericData.Record): 3
GenericRecord (org.apache.avro.generic.GenericRecord): 3
PCollection (org.apache.crunch.PCollection): 3
Person (org.apache.crunch.test.Person): 3
Schema (org.apache.avro.Schema): 2
PojoPerson (org.apache.crunch.io.avro.AvroFileReaderFactoryTest.PojoPerson): 2
Employee (org.apache.crunch.test.Employee): 2
Before (org.junit.Before): 2
ImmutableMap (com.google.common.collect.ImmutableMap): 1
Map (java.util.Map): 1
MapFn (org.apache.crunch.MapFn): 1
CrunchRuntimeException (org.apache.crunch.impl.mr.run.CrunchRuntimeException): 1
SourcePathTargetImpl (org.apache.crunch.io.impl.SourcePathTargetImpl): 1