
Example 21 with Pipeline

Use of com.google.cloud.dataflow.sdk.Pipeline in project spark-dataflow by cloudera.

The class CombinePerKeyTest, method testRun:

@Test
public void testRun() {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
    PCollection<String> inputWords = p.apply(Create.of(WORDS)).setCoder(StringUtf8Coder.of());
    PCollection<KV<String, Long>> cnts = inputWords.apply(new SumPerKey<String>());
    EvaluationResult res = SparkPipelineRunner.create().run(p);
    Map<String, Long> actualCnts = new HashMap<>();
    for (KV<String, Long> kv : res.get(cnts)) {
        actualCnts.put(kv.getKey(), kv.getValue());
    }
    res.close();
    Assert.assertEquals(8, actualCnts.size());
    Assert.assertEquals(Long.valueOf(2L), actualCnts.get("the"));
}
Also used: HashMap (java.util.HashMap), KV (com.google.cloud.dataflow.sdk.values.KV), Pipeline (com.google.cloud.dataflow.sdk.Pipeline), Test (org.junit.Test)
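The test above asserts that SumPerKey produces a count per distinct word (e.g. "the" appears twice). Outside the Dataflow SDK, the same per-key counting can be sketched with plain Java streams; the word list below is a stand-in for illustration, not the test's actual WORDS fixture:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SumPerKeySketch {
    public static Map<String, Long> countPerKey(List<String> words) {
        // Group each word to itself as the key and count occurrences,
        // mirroring the KV<String, Long> output the pipeline's SumPerKey
        // transform is asserted against in the test.
        return words.stream()
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "cat", "sat", "the");
        System.out.println(countPerKey(words).get("the")); // prints 2
    }
}
```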

Example 22 with Pipeline

Use of com.google.cloud.dataflow.sdk.Pipeline in project spark-dataflow by cloudera.

The class DeDupTest, method testRun:

@Test
public void testRun() throws Exception {
    SparkPipelineOptions options = SparkPipelineOptionsFactory.create();
    options.setRunner(SparkPipelineRunner.class);
    Pipeline p = Pipeline.create(options);
    PCollection<String> input = p.apply(Create.of(LINES)).setCoder(StringUtf8Coder.of());
    PCollection<String> output = input.apply(RemoveDuplicates.<String>create());
    DataflowAssert.that(output).containsInAnyOrder(EXPECTED_SET);
    EvaluationResult res = SparkPipelineRunner.create().run(p);
    res.close();
}
Also used: Pipeline (com.google.cloud.dataflow.sdk.Pipeline), Test (org.junit.Test)
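RemoveDuplicates.create() keeps one copy of each distinct element in the PCollection, which DataflowAssert then checks against EXPECTED_SET in any order. The effect can be sketched in plain Java with a set (a sketch only; the real transform runs distributed and makes no ordering guarantee, while this sketch happens to preserve first-occurrence order):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class RemoveDuplicatesSketch {
    public static List<String> distinct(List<String> lines) {
        // A LinkedHashSet drops repeated elements, keeping the first
        // occurrence of each line -- the set semantics RemoveDuplicates
        // applies to the input PCollection.
        return new ArrayList<>(new LinkedHashSet<>(lines));
    }

    public static void main(String[] args) {
        System.out.println(distinct(Arrays.asList("a", "b", "a", "c"))); // prints [a, b, c]
    }
}
```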

Example 23 with Pipeline

Use of com.google.cloud.dataflow.sdk.Pipeline in project spark-dataflow by cloudera.

The class SerializationTest, method testRun:

@Test
public void testRun() throws Exception {
    SparkPipelineOptions options = SparkPipelineOptionsFactory.create();
    options.setRunner(SparkPipelineRunner.class);
    Pipeline p = Pipeline.create(options);
    PCollection<StringHolder> inputWords = p.apply(Create.of(WORDS).withCoder(StringHolderUtf8Coder.of()));
    PCollection<StringHolder> output = inputWords.apply(new CountWords());
    DataflowAssert.that(output).containsInAnyOrder(EXPECTED_COUNT_SET);
    EvaluationResult res = SparkPipelineRunner.create().run(p);
    res.close();
}
Also used: Pipeline (com.google.cloud.dataflow.sdk.Pipeline), Test (org.junit.Test)

Aggregations

- Pipeline (com.google.cloud.dataflow.sdk.Pipeline): 23
- Test (org.junit.Test): 17
- TestPipeline (com.google.cloud.dataflow.sdk.testing.TestPipeline): 5
- KV (com.google.cloud.dataflow.sdk.values.KV): 5
- ReferenceAPISource (org.broadinstitute.hellbender.engine.datasources.ReferenceAPISource): 5
- EvaluationResult (com.cloudera.dataflow.spark.EvaluationResult): 3
- BaseTest (org.broadinstitute.hellbender.utils.test.BaseTest): 3
- Test (org.testng.annotations.Test): 3
- DoFn (com.google.cloud.dataflow.sdk.transforms.DoFn): 2
- URI (java.net.URI): 2
- SimpleWordCountTest (com.cloudera.dataflow.spark.SimpleWordCountTest): 1
- TfIdf (com.google.cloud.dataflow.examples.complete.TfIdf): 1
- PCollectionTuple (com.google.cloud.dataflow.sdk.values.PCollectionTuple): 1
- GenomicsOptions (com.google.cloud.genomics.dataflow.utils.GenomicsOptions): 1
- File (java.io.File): 1
- FileFilter (java.io.FileFilter): 1
- HashMap (java.util.HashMap): 1
- StringDecoder (kafka.serializer.StringDecoder): 1
- Schema (org.apache.avro.Schema): 1
- GenericRecord (org.apache.avro.generic.GenericRecord): 1