Examples with Pipeline - org.apache.beam.sdk.Pipeline

Example 26 with Pipeline

use of org.apache.beam.sdk.Pipeline in project beam by apache.

the class DataflowRunnerTest method testTemplateRunnerFullCompletion.

/**
   * Tests that the {@link DataflowRunner} with {@code --templateLocation} returns normally
   * when the runner issuccessfully run.
   */
@Test
public void testTemplateRunnerFullCompletion() throws Exception {
    File existingFile = tmpFolder.newFile();
    DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setJobName("TestJobName");
    options.setGcpCredential(new TestCredential());
    options.setPathValidatorClass(NoopPathValidator.class);
    options.setProject("test-project");
    options.setRunner(DataflowRunner.class);
    options.setTemplateLocation(existingFile.getPath());
    options.setTempLocation(tmpFolder.getRoot().getPath());
    Pipeline p = Pipeline.create(options);
    p.run();
    expectedLogs.verifyInfo("Template successfully created");
}

Also used : TestCredential(org.apache.beam.sdk.extensions.gcp.auth.TestCredential) DataflowPipelineOptions(org.apache.beam.runners.dataflow.options.DataflowPipelineOptions) File(java.io.File) TestPipeline(org.apache.beam.sdk.testing.TestPipeline) Pipeline(org.apache.beam.sdk.Pipeline) Test(org.junit.Test)

Example 27 with Pipeline

use of org.apache.beam.sdk.Pipeline in project beam by apache.

the class DataflowRunnerTest method testUpdateNonExistentPipeline.

@Test
public void testUpdateNonExistentPipeline() throws IOException {
    thrown.expect(IllegalArgumentException.class);
    thrown.expectMessage("Could not find running job named badjobname");
    DataflowPipelineOptions options = buildPipelineOptions();
    options.setUpdate(true);
    options.setJobName("badJobName");
    Pipeline p = buildDataflowPipeline(options);
    p.run();
}

Also used : DataflowPipelineOptions(org.apache.beam.runners.dataflow.options.DataflowPipelineOptions) TestPipeline(org.apache.beam.sdk.testing.TestPipeline) Pipeline(org.apache.beam.sdk.Pipeline) Test(org.junit.Test)

Example 28 with Pipeline

use of org.apache.beam.sdk.Pipeline in project beam by apache.

the class DataflowRunnerTest method testGcsUploadBufferSizeIsSetForStreamingWhenDefault.

@Test
public void testGcsUploadBufferSizeIsSetForStreamingWhenDefault() throws IOException {
    DataflowPipelineOptions streamingOptions = buildPipelineOptions();
    streamingOptions.setStreaming(true);
    streamingOptions.setRunner(DataflowRunner.class);
    Pipeline p = Pipeline.create(streamingOptions);
    // Instantiation of a runner prior to run() currently has a side effect of mutating the options.
    // This could be tested by DataflowRunner.fromOptions(streamingOptions) but would not ensure
    // that the pipeline itself had the expected options set.
    p.run();
    assertEquals(DataflowRunner.GCS_UPLOAD_BUFFER_SIZE_BYTES_DEFAULT, streamingOptions.getGcsUploadBufferSizeBytes().intValue());
}

Example 29 with Pipeline

use of org.apache.beam.sdk.Pipeline in project beam by apache.

the class DataflowPipelineTranslatorTest method testWorkerMachineTypeConfig.

@Test
public void testWorkerMachineTypeConfig() throws IOException {
    final String testMachineType = "test-machine-type";
    DataflowPipelineOptions options = buildPipelineOptions();
    options.setWorkerMachineType(testMachineType);
    Pipeline p = buildPipeline(options);
    p.traverseTopologically(new RecordingPipelineVisitor());
    Job job = DataflowPipelineTranslator.fromOptions(options).translate(p, DataflowRunner.fromOptions(options), Collections.<DataflowPackage>emptyList()).getJob();
    assertEquals(1, job.getEnvironment().getWorkerPools().size());
    WorkerPool workerPool = job.getEnvironment().getWorkerPools().get(0);
    assertEquals(testMachineType, workerPool.getMachineType());
}

Also used : WorkerPool(com.google.api.services.dataflow.model.WorkerPool) DataflowPipelineOptions(org.apache.beam.runners.dataflow.options.DataflowPipelineOptions) Structs.getString(org.apache.beam.runners.dataflow.util.Structs.getString) Job(com.google.api.services.dataflow.model.Job) DataflowPackage(com.google.api.services.dataflow.model.DataflowPackage) Pipeline(org.apache.beam.sdk.Pipeline) Test(org.junit.Test)

Example 30 with Pipeline

use of org.apache.beam.sdk.Pipeline in project beam by apache.

the class DataflowPipelineTranslatorTest method testDiskSizeGbConfig.

@Test
public void testDiskSizeGbConfig() throws IOException {
    final Integer diskSizeGb = 1234;
    DataflowPipelineOptions options = buildPipelineOptions();
    options.setDiskSizeGb(diskSizeGb);
    Pipeline p = buildPipeline(options);
    p.traverseTopologically(new RecordingPipelineVisitor());
    Job job = DataflowPipelineTranslator.fromOptions(options).translate(p, DataflowRunner.fromOptions(options), Collections.<DataflowPackage>emptyList()).getJob();
    assertEquals(1, job.getEnvironment().getWorkerPools().size());
    assertEquals(diskSizeGb, job.getEnvironment().getWorkerPools().get(0).getDiskSizeGb());
}

Also used : DataflowPipelineOptions(org.apache.beam.runners.dataflow.options.DataflowPipelineOptions) Job(com.google.api.services.dataflow.model.Job) DataflowPackage(com.google.api.services.dataflow.model.DataflowPackage) Pipeline(org.apache.beam.sdk.Pipeline) Test(org.junit.Test)

Aggregations

Pipeline (org.apache.beam.sdk.Pipeline)184 Test (org.junit.Test)123 TestPipeline (org.apache.beam.sdk.testing.TestPipeline)86 DataflowPipelineOptions (org.apache.beam.runners.dataflow.options.DataflowPipelineOptions)39 KV (org.apache.beam.sdk.values.KV)35 Job (com.google.api.services.dataflow.model.Job)26 DoFn (org.apache.beam.sdk.transforms.DoFn)24 PipelineOptions (org.apache.beam.sdk.options.PipelineOptions)22 DataflowPackage (com.google.api.services.dataflow.model.DataflowPackage)21 TableRow (com.google.api.services.bigquery.model.TableRow)16 PipelineResult (org.apache.beam.sdk.PipelineResult)14 Structs.getString (org.apache.beam.runners.dataflow.util.Structs.getString)13 TableSchema (com.google.api.services.bigquery.model.TableSchema)12 ApexPipelineOptions (org.apache.beam.runners.apex.ApexPipelineOptions)12 Map (java.util.Map)11 TableFieldSchema (com.google.api.services.bigquery.model.TableFieldSchema)10 ArrayList (java.util.ArrayList)10 Instant (org.joda.time.Instant)10 TableReference (com.google.api.services.bigquery.model.TableReference)9 JsonSchemaToTableSchema (org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.JsonSchemaToTableSchema)9