
Example 1 with Pipeline

use of org.apache.beam.sdk.Pipeline in project beam by apache.

the class DataflowGroupByKeyTest method testGroupByKeyServiceUnbounded.

@Test
public void testGroupByKeyServiceUnbounded() {
    Pipeline p = createTestServiceRunner();
    PCollection<KV<String, Integer>> input = p.apply(new PTransform<PBegin, PCollection<KV<String, Integer>>>() {

        @Override
        public PCollection<KV<String, Integer>> expand(PBegin input) {
            return PCollection.<KV<String, Integer>>createPrimitiveOutputInternal(
                    input.getPipeline(), WindowingStrategy.globalDefault(), PCollection.IsBounded.UNBOUNDED)
                .setTypeDescriptor(new TypeDescriptor<KV<String, Integer>>() {});
        }
    });
    thrown.expect(IllegalStateException.class);
    thrown.expectMessage("GroupByKey cannot be applied to non-bounded PCollection in the GlobalWindow without a trigger. "
        + "Use a Window.into or Window.triggering transform prior to GroupByKey.");
    input.apply("GroupByKey", GroupByKey.<String, Integer>create());
}
Also used : PCollection(org.apache.beam.sdk.values.PCollection) TypeDescriptor(org.apache.beam.sdk.values.TypeDescriptor) KV(org.apache.beam.sdk.values.KV) PBegin(org.apache.beam.sdk.values.PBegin) Pipeline(org.apache.beam.sdk.Pipeline) Test(org.junit.Test)
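
The error message asserted in Example 1 names windowing as the fix: an unbounded collection in the global window never finishes, so a grouping over it has no point at which it can emit. The sketch below illustrates only that idea, in plain Java collections rather than Beam's API. The class FixedWindowGrouping, its Event type, and the method groupByWindowAndKey are hypothetical names introduced for illustration. Each element is assigned to a fixed-size window by its timestamp and then grouped by key within that window, so every group is finite.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, Beam-free illustration of grouping by key within fixed windows. */
public class FixedWindowGrouping {

    /** A keyed value with an event timestamp in milliseconds (illustrative type). */
    public static final class Event {
        public final String key;
        public final int value;
        public final long timestampMillis;

        public Event(String key, int value, long timestampMillis) {
            this.key = key;
            this.value = value;
            this.timestampMillis = timestampMillis;
        }
    }

    /**
     * Assigns each event to the window [start, start + windowSizeMillis) that contains
     * its timestamp, then groups the values by key inside each window.
     * Returns window start -> key -> values, with windows and keys in sorted order.
     */
    public static Map<Long, Map<String, List<Integer>>> groupByWindowAndKey(
            List<Event> events, long windowSizeMillis) {
        if (windowSizeMillis <= 0) {
            throw new IllegalArgumentException("windowSizeMillis must be positive");
        }
        Map<Long, Map<String, List<Integer>>> result = new TreeMap<>();
        for (Event e : events) {
            // floorDiv keeps the assignment correct for negative timestamps as well.
            long windowStart = Math.floorDiv(e.timestampMillis, windowSizeMillis) * windowSizeMillis;
            result.computeIfAbsent(windowStart, w -> new TreeMap<>())
                  .computeIfAbsent(e.key, k -> new ArrayList<>())
                  .add(e.value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event("a", 1, 0L),
            new Event("a", 2, 59_999L),
            new Event("b", 3, 60_000L),
            new Event("a", 4, 61_000L));
        // One-minute windows: the first two events share a window, the last two share the next.
        System.out.println(groupByWindowAndKey(events, 60_000L));
        // prints {0={a=[1, 2]}, 60000={a=[4], b=[3]}}
    }
}
```

This sketch processes a finite list, so it sidesteps the question of when a window is complete. In a real streaming runner that decision is what the trigger mentioned in the error message controls.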

Example 2 with Pipeline

use of org.apache.beam.sdk.Pipeline in project beam by apache.

the class WordCount method main.

public static void main(String[] args) {
    WordCountOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(WordCountOptions.class);
    Pipeline p = Pipeline.create(options);
    // Concepts #2 and #3: Our pipeline applies the composite CountWords transform, and passes the
    // static FormatAsTextFn() to the MapElements transform.
    p.apply("ReadLines", TextIO.read().from(options.getInputFile()))
        .apply(new CountWords())
        .apply(MapElements.via(new FormatAsTextFn()))
        .apply("WriteCounts", TextIO.write().to(options.getOutput()));
    p.run().waitUntilFinish();
}
Also used : Pipeline(org.apache.beam.sdk.Pipeline)

Example 3 with Pipeline

use of org.apache.beam.sdk.Pipeline in project beam by apache.

the class TestDataflowRunnerTest method testStreamingOnCreateMatcher.

@Test
public void testStreamingOnCreateMatcher() throws Exception {
    options.setStreaming(true);
    Pipeline p = TestPipeline.create(options);
    PCollection<Integer> pc = p.apply(Create.of(1, 2, 3));
    PAssert.that(pc).containsInAnyOrder(1, 2, 3);
    final DataflowPipelineJob mockJob = Mockito.mock(DataflowPipelineJob.class);
    when(mockJob.getState()).thenReturn(State.DONE);
    when(mockJob.getProjectId()).thenReturn("test-project");
    when(mockJob.getJobId()).thenReturn("test-job");
    DataflowRunner mockRunner = Mockito.mock(DataflowRunner.class);
    when(mockRunner.run(any(Pipeline.class))).thenReturn(mockJob);
    TestDataflowRunner runner = TestDataflowRunner.fromOptionsAndClient(options, mockClient);
    options.as(TestPipelineOptions.class).setOnCreateMatcher(new TestSuccessMatcher(mockJob, 0));
    when(mockJob.waitUntilFinish(any(Duration.class), any(JobMessagesHandler.class))).thenReturn(State.DONE);
    when(mockClient.getJobMetrics(anyString())).thenReturn(generateMockMetricResponse(true /* success */, true));
    runner.run(p, mockRunner);
}
Also used : JobMessagesHandler(org.apache.beam.runners.dataflow.util.MonitoringUtil.JobMessagesHandler) Duration(org.joda.time.Duration) TestPipelineOptions(org.apache.beam.sdk.testing.TestPipelineOptions) TestPipeline(org.apache.beam.sdk.testing.TestPipeline) Pipeline(org.apache.beam.sdk.Pipeline) Test(org.junit.Test)

Example 4 with Pipeline

use of org.apache.beam.sdk.Pipeline in project beam by apache.

the class TestDataflowRunnerTest method testRunBatchJobThatFails.

/**
 * Tests that a batch job that terminates in a failure state throws an error to that effect,
 * even if all assertions passed.
 */
@Test
public void testRunBatchJobThatFails() throws Exception {
    Pipeline p = TestPipeline.create(options);
    PCollection<Integer> pc = p.apply(Create.of(1, 2, 3));
    PAssert.that(pc).containsInAnyOrder(1, 2, 3);
    DataflowPipelineJob mockJob = Mockito.mock(DataflowPipelineJob.class);
    when(mockJob.getState()).thenReturn(State.FAILED);
    when(mockJob.getProjectId()).thenReturn("test-project");
    when(mockJob.getJobId()).thenReturn("test-job");
    DataflowRunner mockRunner = Mockito.mock(DataflowRunner.class);
    when(mockRunner.run(any(Pipeline.class))).thenReturn(mockJob);
    TestDataflowRunner runner = TestDataflowRunner.fromOptionsAndClient(options, mockClient);
    when(mockClient.getJobMetrics(anyString())).thenReturn(generateMockMetricResponse(true /* success */, false));
    expectedException.expect(RuntimeException.class);
    runner.run(p, mockRunner);
    // Reached only if run() did not throw. fail() raises an AssertionError, which does not
    // match the expected RuntimeException, so the test fails.
    fail("AssertionError expected");
}
Also used : TestPipeline(org.apache.beam.sdk.testing.TestPipeline) Pipeline(org.apache.beam.sdk.Pipeline) Test(org.junit.Test)

Example 5 with Pipeline

use of org.apache.beam.sdk.Pipeline in project beam by apache.

the class TestDataflowRunnerTest method testStreamingOnSuccessMatcherWhenPipelineSucceeds.

/**
 * Tests that when a streaming pipeline terminates and doesn't fail due to {@link PAssert}, the
 * {@link TestPipelineOptions#setOnSuccessMatcher(SerializableMatcher) on success matcher} is
 * invoked.
 */
@Test
public void testStreamingOnSuccessMatcherWhenPipelineSucceeds() throws Exception {
    options.setStreaming(true);
    Pipeline p = TestPipeline.create(options);
    PCollection<Integer> pc = p.apply(Create.of(1, 2, 3));
    PAssert.that(pc).containsInAnyOrder(1, 2, 3);
    final DataflowPipelineJob mockJob = Mockito.mock(DataflowPipelineJob.class);
    when(mockJob.getState()).thenReturn(State.DONE);
    when(mockJob.getProjectId()).thenReturn("test-project");
    when(mockJob.getJobId()).thenReturn("test-job");
    DataflowRunner mockRunner = Mockito.mock(DataflowRunner.class);
    when(mockRunner.run(any(Pipeline.class))).thenReturn(mockJob);
    TestDataflowRunner runner = TestDataflowRunner.fromOptionsAndClient(options, mockClient);
    options.as(TestPipelineOptions.class).setOnSuccessMatcher(new TestSuccessMatcher(mockJob, 1));
    when(mockJob.waitUntilFinish(any(Duration.class), any(JobMessagesHandler.class))).thenReturn(State.DONE);
    when(mockClient.getJobMetrics(anyString())).thenReturn(generateMockMetricResponse(true /* success */, true));
    runner.run(p, mockRunner);
}
Also used : JobMessagesHandler(org.apache.beam.runners.dataflow.util.MonitoringUtil.JobMessagesHandler) Duration(org.joda.time.Duration) TestPipelineOptions(org.apache.beam.sdk.testing.TestPipelineOptions) TestPipeline(org.apache.beam.sdk.testing.TestPipeline) Pipeline(org.apache.beam.sdk.Pipeline) Test(org.junit.Test)

Aggregations

Pipeline (org.apache.beam.sdk.Pipeline): 184
Test (org.junit.Test): 123
TestPipeline (org.apache.beam.sdk.testing.TestPipeline): 86
DataflowPipelineOptions (org.apache.beam.runners.dataflow.options.DataflowPipelineOptions): 39
KV (org.apache.beam.sdk.values.KV): 35
Job (com.google.api.services.dataflow.model.Job): 26
DoFn (org.apache.beam.sdk.transforms.DoFn): 24
PipelineOptions (org.apache.beam.sdk.options.PipelineOptions): 22
DataflowPackage (com.google.api.services.dataflow.model.DataflowPackage): 21
TableRow (com.google.api.services.bigquery.model.TableRow): 16
PipelineResult (org.apache.beam.sdk.PipelineResult): 14
Structs.getString (org.apache.beam.runners.dataflow.util.Structs.getString): 13
TableSchema (com.google.api.services.bigquery.model.TableSchema): 12
ApexPipelineOptions (org.apache.beam.runners.apex.ApexPipelineOptions): 12
Map (java.util.Map): 11
TableFieldSchema (com.google.api.services.bigquery.model.TableFieldSchema): 10
ArrayList (java.util.ArrayList): 10
Instant (org.joda.time.Instant): 10
TableReference (com.google.api.services.bigquery.model.TableReference): 9
JsonSchemaToTableSchema (org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.JsonSchemaToTableSchema): 9