Example 16 with Pipeline

Use of org.apache.beam.sdk.Pipeline in project beam by apache.

From class DataflowRunnerTest, method testUpdateNonExistentPipeline.

@Test
public void testUpdateNonExistentPipeline() throws IOException {
    // 'thrown' is the ExpectedException @Rule declared on the test class.
    thrown.expect(IllegalArgumentException.class);
    thrown.expectMessage("Could not find running job named badjobname");
    DataflowPipelineOptions options = buildPipelineOptions();
    options.setUpdate(true);
    options.setJobName("badJobName");
    // Updating requires a running job with this name; none exists, so run() fails.
    Pipeline p = buildDataflowPipeline(options);
    p.run();
}
Also used: DataflowPipelineOptions (org.apache.beam.runners.dataflow.options.DataflowPipelineOptions), TestPipeline (org.apache.beam.sdk.testing.TestPipeline), Pipeline (org.apache.beam.sdk.Pipeline), Test (org.junit.Test)
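
The helpers buildPipelineOptions() and buildDataflowPipeline() live elsewhere in DataflowRunnerTest and are not part of this excerpt. A minimal sketch of what such helpers could look like, assuming placeholder project and bucket names (the real helper also stubs out GCS and the Dataflow client, which is omitted here):

@Test
private static DataflowPipelineOptions buildPipelineOptions() {
    // Hypothetical reconstruction; values are placeholders, not the real test fixtures.
    DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setRunner(DataflowRunner.class);
    options.setProject("some-project");
    options.setTempLocation("gs://some-bucket/tmp");
    return options;
}

private static Pipeline buildDataflowPipeline(DataflowPipelineOptions options) {
    // A trivial pipeline is enough here; the test only exercises job-name validation.
    return Pipeline.create(options);
}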

Example 17 with Pipeline

Use of org.apache.beam.sdk.Pipeline in project beam by apache.

From class DataflowRunnerTest, method testGcsUploadBufferSizeIsSetForStreamingWhenDefault.

@Test
public void testGcsUploadBufferSizeIsSetForStreamingWhenDefault() throws IOException {
    DataflowPipelineOptions streamingOptions = buildPipelineOptions();
    streamingOptions.setStreaming(true);
    streamingOptions.setRunner(DataflowRunner.class);
    Pipeline p = Pipeline.create(streamingOptions);
    // Instantiation of a runner prior to run() currently has a side effect of mutating the options.
    // This could be tested by DataflowRunner.fromOptions(streamingOptions) but would not ensure
    // that the pipeline itself had the expected options set.
    p.run();
    assertEquals(DataflowRunner.GCS_UPLOAD_BUFFER_SIZE_BYTES_DEFAULT, streamingOptions.getGcsUploadBufferSizeBytes().intValue());
}
Also used: DataflowPipelineOptions (org.apache.beam.runners.dataflow.options.DataflowPipelineOptions), TestPipeline (org.apache.beam.sdk.testing.TestPipeline), Pipeline (org.apache.beam.sdk.Pipeline), Test (org.junit.Test)
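
The inverse case is worth keeping in mind: the runner only fills in the streaming default when no buffer size was configured. A sketch of such a check, assuming an explicitly set value survives the options mutation (this test is illustrative, not the project's own):

@Test
public void testGcsUploadBufferSizeUnchangedWhenSetExplicitly() throws IOException {
    DataflowPipelineOptions streamingOptions = buildPipelineOptions();
    streamingOptions.setStreaming(true);
    streamingOptions.setRunner(DataflowRunner.class);
    // setGcsUploadBufferSizeBytes is inherited from GcsOptions.
    streamingOptions.setGcsUploadBufferSizeBytes(12345);
    Pipeline p = Pipeline.create(streamingOptions);
    p.run();
    // Assumption: an explicit value is not overwritten by the streaming default.
    assertEquals(12345, streamingOptions.getGcsUploadBufferSizeBytes().intValue());
}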

Example 18 with Pipeline

Use of org.apache.beam.sdk.Pipeline in project beam by apache.

From class DataflowPipelineTranslatorTest, method testWorkerMachineTypeConfig.

@Test
public void testWorkerMachineTypeConfig() throws IOException {
    final String testMachineType = "test-machine-type";
    DataflowPipelineOptions options = buildPipelineOptions();
    options.setWorkerMachineType(testMachineType);
    Pipeline p = buildPipeline(options);
    p.traverseTopologically(new RecordingPipelineVisitor());
    Job job =
        DataflowPipelineTranslator.fromOptions(options)
            .translate(p, DataflowRunner.fromOptions(options), Collections.<DataflowPackage>emptyList())
            .getJob();
    assertEquals(1, job.getEnvironment().getWorkerPools().size());
    WorkerPool workerPool = job.getEnvironment().getWorkerPools().get(0);
    assertEquals(testMachineType, workerPool.getMachineType());
}
Also used: WorkerPool (com.google.api.services.dataflow.model.WorkerPool), DataflowPipelineOptions (org.apache.beam.runners.dataflow.options.DataflowPipelineOptions), Structs.getString (org.apache.beam.runners.dataflow.util.Structs.getString), Job (com.google.api.services.dataflow.model.Job), DataflowPackage (com.google.api.services.dataflow.model.DataflowPackage), Pipeline (org.apache.beam.sdk.Pipeline), Test (org.junit.Test)
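
buildPipeline() is a DataflowPipelineTranslatorTest helper that this excerpt omits. A minimal sketch, assuming any small read/write pipeline is enough for the translator to produce a Job (the GCS paths are placeholders):

private static Pipeline buildPipeline(DataflowPipelineOptions options) {
    Pipeline p = Pipeline.create(options);
    // Translation only needs something to traverse; a single read/write suffices.
    p.apply(TextIO.read().from("gs://bucket/input.txt"))
        .apply(TextIO.write().to("gs://bucket/output"));
    return p;
}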

Example 19 with Pipeline

Use of org.apache.beam.sdk.Pipeline in project beam by apache.

From class DataflowPipelineTranslatorTest, method testDiskSizeGbConfig.

@Test
public void testDiskSizeGbConfig() throws IOException {
    final Integer diskSizeGb = 1234;
    DataflowPipelineOptions options = buildPipelineOptions();
    options.setDiskSizeGb(diskSizeGb);
    Pipeline p = buildPipeline(options);
    p.traverseTopologically(new RecordingPipelineVisitor());
    Job job =
        DataflowPipelineTranslator.fromOptions(options)
            .translate(p, DataflowRunner.fromOptions(options), Collections.<DataflowPackage>emptyList())
            .getJob();
    assertEquals(1, job.getEnvironment().getWorkerPools().size());
    assertEquals(diskSizeGb, job.getEnvironment().getWorkerPools().get(0).getDiskSizeGb());
}
Also used: DataflowPipelineOptions (org.apache.beam.runners.dataflow.options.DataflowPipelineOptions), Job (com.google.api.services.dataflow.model.Job), DataflowPackage (com.google.api.services.dataflow.model.DataflowPackage), Pipeline (org.apache.beam.sdk.Pipeline), Test (org.junit.Test)
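
Examples 18 and 19 repeat the same translate-and-inspect boilerplate. Factoring it into a helper keeps each test focused on the single option it verifies; a sketch, with the method name translateAndGetWorkerPool being hypothetical:

private static WorkerPool translateAndGetWorkerPool(Pipeline p, DataflowPipelineOptions options) {
    // Same translation call the tests above use, returning the single expected worker pool.
    Job job =
        DataflowPipelineTranslator.fromOptions(options)
            .translate(p, DataflowRunner.fromOptions(options), Collections.<DataflowPackage>emptyList())
            .getJob();
    assertEquals(1, job.getEnvironment().getWorkerPools().size());
    return job.getEnvironment().getWorkerPools().get(0);
}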

Example 20 with Pipeline

Use of org.apache.beam.sdk.Pipeline in project beam by apache.

From class CrashingRunnerTest, method applySucceeds.

@Test
public void applySucceeds() {
    PipelineOptions opts = PipelineOptionsFactory.create();
    opts.setRunner(CrashingRunner.class);
    Pipeline p = Pipeline.create(opts);
    p.apply(Create.of(1, 2, 3));
}
Also used: PipelineOptions (org.apache.beam.sdk.options.PipelineOptions), Pipeline (org.apache.beam.sdk.Pipeline), Test (org.junit.Test)
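
CrashingRunner exists so that pipelines built in tests can never be executed by accident: applying transforms works, but run() is meant to fail. A sketch of the flip side, assuming run() throws (the exact exception type is an assumption, so the catch is deliberately broad):

@Test
public void runFails() {
    PipelineOptions opts = PipelineOptionsFactory.create();
    opts.setRunner(CrashingRunner.class);
    Pipeline p = Pipeline.create(opts);
    p.apply(Create.of(1, 2, 3));
    try {
        p.run();
        fail("expected CrashingRunner to refuse to execute the pipeline");
    } catch (RuntimeException expected) {
        // Expected: CrashingRunner is a placeholder that cannot run pipelines.
    }
}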

Aggregations

Pipeline (org.apache.beam.sdk.Pipeline): 184
Test (org.junit.Test): 123
TestPipeline (org.apache.beam.sdk.testing.TestPipeline): 86
DataflowPipelineOptions (org.apache.beam.runners.dataflow.options.DataflowPipelineOptions): 39
KV (org.apache.beam.sdk.values.KV): 35
Job (com.google.api.services.dataflow.model.Job): 26
DoFn (org.apache.beam.sdk.transforms.DoFn): 24
PipelineOptions (org.apache.beam.sdk.options.PipelineOptions): 22
DataflowPackage (com.google.api.services.dataflow.model.DataflowPackage): 21
TableRow (com.google.api.services.bigquery.model.TableRow): 16
PipelineResult (org.apache.beam.sdk.PipelineResult): 14
Structs.getString (org.apache.beam.runners.dataflow.util.Structs.getString): 13
TableSchema (com.google.api.services.bigquery.model.TableSchema): 12
ApexPipelineOptions (org.apache.beam.runners.apex.ApexPipelineOptions): 12
Map (java.util.Map): 11
TableFieldSchema (com.google.api.services.bigquery.model.TableFieldSchema): 10
ArrayList (java.util.ArrayList): 10
Instant (org.joda.time.Instant): 10
TableReference (com.google.api.services.bigquery.model.TableReference): 9
JsonSchemaToTableSchema (org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.JsonSchemaToTableSchema): 9