Example 6 with MRPipeline

use of org.apache.crunch.impl.mr.MRPipeline in project crunch by cloudera.

the class PCollectionGetSizeTest method testGetSizeOfEmptyIntermediatePCollection_NoSave_MRPipeline.

@Test
@Ignore("GetSize of a DoCollection is only an estimate based on scale factor, so we can't count on it being reported as 0")
public void testGetSizeOfEmptyIntermediatePCollection_NoSave_MRPipeline() throws IOException {
    PCollection<String> data = new MRPipeline(this.getClass()).readTextFile(nonEmptyInputPath);
    PCollection<String> emptyPCollection = data.filter(new FalseFilterFn());
    assertThat(emptyPCollection.getSize(), is(0L));
}
Also used : MRPipeline(org.apache.crunch.impl.mr.MRPipeline) Ignore(org.junit.Ignore) Test(org.junit.Test)
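
The @Ignore note in this test says that getSize on an intermediate collection is only an estimate based on a scale factor, so an empty filter result need not report 0. A minimal sketch of that idea in plain Java, with no Crunch dependency, might look like the following. The class name, the method name, and the 0.5 factor are hypothetical and chosen for illustration only; they are not taken from the Crunch sources.

```java
public class SizeEstimateSketch {

    // Hypothetical model of a scale-factor estimate: the reported size is
    // the input size multiplied by a fixed factor. It never inspects the
    // records that actually survive the filter.
    public static long estimateSize(long parentSizeBytes, float scaleFactor) {
        return (long) (parentSizeBytes * scaleFactor);
    }

    public static void main(String[] args) {
        long inputBytes = 1000L;          // size of the non-empty input (assumed)
        float assumedFilterScale = 0.5f;  // illustrative factor, an assumption

        // Even if the filter rejects every record, the estimate is non-zero.
        long estimate = estimateSize(inputBytes, assumedFilterScale);
        System.out.println(estimate);     // prints 500
    }
}
```

Under this model an assertion such as is(0L) fails for any non-empty input, which is consistent with the reason given for ignoring the test.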

Example 7 with MRPipeline

use of org.apache.crunch.impl.mr.MRPipeline in project crunch by cloudera.

the class PCollectionGetSizeTest method testMaterializeOfEmptyIntermediatePCollection_MRPipeline.

@Test
public void testMaterializeOfEmptyIntermediatePCollection_MRPipeline() throws IOException {
    PCollection<String> emptyIntermediate = createPesistentEmptyIntermediate(new MRPipeline(this.getClass()));
    assertThat(newArrayList(emptyIntermediate.materialize()).size(), is(0));
}
Also used : MRPipeline(org.apache.crunch.impl.mr.MRPipeline) Test(org.junit.Test)

Example 8 with MRPipeline

use of org.apache.crunch.impl.mr.MRPipeline in project crunch by cloudera.

the class PTableKeyValueTest method setUp.

@Before
public void setUp() throws IOException {
    pipeline = new MRPipeline(PTableKeyValueTest.class);
    inputFile = FileHelper.createTempCopyOf("set1.txt");
}
Also used : MRPipeline(org.apache.crunch.impl.mr.MRPipeline) Before(org.junit.Before)

Example 9 with MRPipeline

use of org.apache.crunch.impl.mr.MRPipeline in project crunch by cloudera.

the class WordCountTest method runWithTop.

public static void runWithTop(PTypeFamily tf) throws IOException {
    Pipeline pipeline = new MRPipeline(WordCountTest.class);
    String inputPath = FileHelper.createTempCopyOf("shakes.txt");
    PCollection<String> shakespeare = pipeline.read(At.textFile(inputPath, tf.strings()));
    PTable<String, Long> wordCount = wordCount(shakespeare, tf);
    List<Pair<String, Long>> top5 = Lists.newArrayList(Aggregate.top(wordCount, 5, true).materialize());
    assertEquals(ImmutableList.of(Pair.of("", 1470L), Pair.of("the", 620L), Pair.of("and", 427L), Pair.of("of", 396L), Pair.of("to", 367L)), top5);
}
Also used : MRPipeline(org.apache.crunch.impl.mr.MRPipeline)

Example 10 with MRPipeline

use of org.apache.crunch.impl.mr.MRPipeline in project crunch by cloudera.

the class WordCountTest method testWritablesWithSecond.

@Test
public void testWritablesWithSecond() throws IOException {
    runSecond = true;
    run(new MRPipeline(WordCountTest.class), WritableTypeFamily.getInstance());
}
Also used : MRPipeline(org.apache.crunch.impl.mr.MRPipeline) Test(org.junit.Test)

Aggregations

MRPipeline (org.apache.crunch.impl.mr.MRPipeline) 34
Test (org.junit.Test) 26
Pipeline (org.apache.crunch.Pipeline) 13
PTypeFamily (org.apache.crunch.types.PTypeFamily) 7
MemPipeline (org.apache.crunch.impl.mem.MemPipeline) 6
Pair (org.apache.crunch.Pair) 4
Collection (java.util.Collection) 3
Record (org.apache.avro.generic.GenericData.Record) 3
GenericRecord (org.apache.avro.generic.GenericRecord) 3
PCollection (org.apache.crunch.PCollection) 3
Person (org.apache.crunch.test.Person) 3
Schema (org.apache.avro.Schema) 2
PojoPerson (org.apache.crunch.io.avro.AvroFileReaderFactoryTest.PojoPerson) 2
Employee (org.apache.crunch.test.Employee) 2
Before (org.junit.Before) 2
ImmutableMap (com.google.common.collect.ImmutableMap) 1
Map (java.util.Map) 1
MapFn (org.apache.crunch.MapFn) 1
CrunchRuntimeException (org.apache.crunch.impl.mr.run.CrunchRuntimeException) 1
SourcePathTargetImpl (org.apache.crunch.io.impl.SourcePathTargetImpl) 1