
Example 56 with Pipeline

Use of org.apache.beam.sdk.Pipeline in project beam by apache.

From the class TrackStreamingSourcesTest, the method testTrackSingle:

@Test
public void testTrackSingle() {
    options.setRunner(SparkRunner.class);
    JavaSparkContext jsc = SparkContextFactory.getSparkContext(options);
    JavaStreamingContext jssc = new JavaStreamingContext(jsc, new org.apache.spark.streaming.Duration(options.getBatchIntervalMillis()));
    Pipeline p = Pipeline.create(options);
    // A test stream that emits a single empty batch per batch interval.
    CreateStream<Integer> emptyStream = CreateStream.of(VarIntCoder.of(), Duration.millis(options.getBatchIntervalMillis())).emptyBatch();
    p.apply(emptyStream).apply(ParDo.of(new PassthroughFn<>()));
    // Traverse the pipeline graph so the tracker can verify that the streaming source is tracked.
    p.traverseTopologically(new StreamingSourceTracker(jssc, p, ParDo.MultiOutput.class, 0));
    assertThat(StreamingSourceTracker.numAssertions, equalTo(1));
}
Also used: JavaStreamingContext (org.apache.spark.streaming.api.java.JavaStreamingContext), JavaSparkContext (org.apache.spark.api.java.JavaSparkContext), Pipeline (org.apache.beam.sdk.Pipeline), Test (org.junit.Test)
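
PassthroughFn and StreamingSourceTracker are helpers defined in the Beam test sources and are not shown here. As a rough illustration of the visitor contract that traverseTopologically expects, a minimal sketch of a visitor that counts primitive transforms of a given class could look like the following; the class name and counting logic are assumptions for illustration, not the actual StreamingSourceTracker implementation.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;

// Illustrative sketch: counts primitive transforms whose applied
// PTransform is an instance of a given class.
class TransformClassCounter extends Pipeline.PipelineVisitor.Defaults {

    private final Class<?> expected;
    int count = 0;

    TransformClassCounter(Class<?> expected) {
        this.expected = expected;
    }

    @Override
    public void visitPrimitiveTransform(TransformHierarchy.Node node) {
        // node.getTransform() is the PTransform applied at this node of the graph.
        if (expected.isInstance(node.getTransform())) {
            count++;
        }
    }
}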

Example 57 with Pipeline

Use of org.apache.beam.sdk.Pipeline in project beam by apache.

From the class ForceStreamingTest, the method test:

@Test
public void test() throws IOException {
    TestSparkPipelineOptions options = PipelineOptionsFactory.create().as(TestSparkPipelineOptions.class);
    options.setRunner(TestSparkRunner.class);
    options.setForceStreaming(true);
    // Pipeline with a bounded read.
    Pipeline pipeline = Pipeline.create(options);
    // Apply the BoundedReadFromUnboundedSource.
    BoundedReadFromUnboundedSource<?> boundedRead = Read.from(CountingSource.unbounded()).withMaxNumRecords(-1);
    pipeline.apply(boundedRead);
    // Adapt bounded reads back to unbounded reads (the runner does this when forceStreaming is set).
    TestSparkRunner runner = TestSparkRunner.fromOptions(options);
    runner.adaptBoundedReads(pipeline);
    UnboundedReadDetector unboundedReadDetector = new UnboundedReadDetector();
    pipeline.traverseTopologically(unboundedReadDetector);
    // assert that the applied BoundedReadFromUnboundedSource
    // is being treated as an unbounded read.
    assertThat("Expected to have an unbounded read.", unboundedReadDetector.isUnbounded);
}
Also used: Pipeline (org.apache.beam.sdk.Pipeline), Test (org.junit.Test)
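
UnboundedReadDetector is a small pipeline visitor defined in ForceStreamingTest. Assuming it only needs to flag whether any produced PCollection is unbounded, a minimal sketch might look like this; the class body is an assumption, not the test's actual code.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PValue;

// Illustrative sketch: marks the pipeline unbounded if any visited
// PCollection reports IsBounded.UNBOUNDED.
class SimpleUnboundedDetector extends Pipeline.PipelineVisitor.Defaults {

    boolean isUnbounded = false;

    @Override
    public void visitValue(PValue value, TransformHierarchy.Node producer) {
        if (value instanceof PCollection
                && ((PCollection<?>) value).isBounded() == PCollection.IsBounded.UNBOUNDED) {
            isUnbounded = true;
        }
    }
}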

Example 58 with Pipeline

Use of org.apache.beam.sdk.Pipeline in project beam by apache.

From the class SparkRuntimeContextTest, the method testSerializingPipelineOptionsWithCustomUserType:

@Test
public void testSerializingPipelineOptionsWithCustomUserType() throws Exception {
    PipelineOptions options = PipelineOptionsFactory.fromArgs("--jacksonIncompatible=\"testValue\"").as(JacksonIncompatibleOptions.class);
    options.setRunner(CrashingRunner.class);
    Pipeline p = Pipeline.create(options);
    SparkRuntimeContext context = new SparkRuntimeContext(p, options);
    // Serialize the runtime context with plain Java serialization...
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (ObjectOutputStream outputStream = new ObjectOutputStream(baos)) {
        outputStream.writeObject(context);
    }
    // ...then deserialize it and check that the custom-typed option survived the round trip.
    try (ObjectInputStream inputStream = new ObjectInputStream(new ByteArrayInputStream(baos.toByteArray()))) {
        SparkRuntimeContext copy = (SparkRuntimeContext) inputStream.readObject();
        assertEquals("testValue", copy.getPipelineOptions().as(JacksonIncompatibleOptions.class).getJacksonIncompatible().value);
    }
}
Also used: ByteArrayInputStream (java.io.ByteArrayInputStream), PipelineOptions (org.apache.beam.sdk.options.PipelineOptions), ByteArrayOutputStream (java.io.ByteArrayOutputStream), ObjectOutputStream (java.io.ObjectOutputStream), Pipeline (org.apache.beam.sdk.Pipeline), ObjectInputStream (java.io.ObjectInputStream), Test (org.junit.Test)
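
JacksonIncompatibleOptions is a test-only options interface whose property type Jackson cannot serialize out of the box. A hedged sketch of its likely shape follows; the names and the value type are assumptions, and the real test additionally registers custom Jackson serializers for the incompatible type.

import org.apache.beam.sdk.options.PipelineOptions;

// Illustrative sketch: an options interface with a property whose type has
// no default constructor, so plain Jackson databinding cannot round-trip it
// unless custom (de)serializers are registered for it.
public interface JacksonIncompatibleOptions extends PipelineOptions {
    JacksonIncompatible getJacksonIncompatible();
    void setJacksonIncompatible(JacksonIncompatible value);
}

class JacksonIncompatible {
    final String value;

    JacksonIncompatible(String value) {
        this.value = value;
    }
}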

Example 59 with Pipeline

Use of org.apache.beam.sdk.Pipeline in project beam by apache.

From the class SparkPipelineStateTest, the method getPipeline:

private Pipeline getPipeline(final SparkPipelineOptions options) {
    final Pipeline pipeline = Pipeline.create(options);
    final String name = testName.getMethodName() + "(isStreaming=" + options.isStreaming() + ")";
    pipeline.apply(getValues(options)).setCoder(StringUtf8Coder.of()).apply(printParDo(name));
    return pipeline;
}
Also used: Pipeline (org.apache.beam.sdk.Pipeline)
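
getValues and printParDo are helpers defined elsewhere in SparkPipelineStateTest. Assuming printParDo is a pass-through ParDo that prints each element tagged with the test name, a plausible sketch is shown below; the body is an assumption, not the test's actual helper.

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

// Illustrative sketch: prints each element prefixed with the given name,
// then emits it unchanged.
private static ParDo.SingleOutput<String, String> printParDo(final String name) {
    return ParDo.of(new DoFn<String, String>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            System.out.println(name + ": " + c.element());
            c.output(c.element());
        }
    });
}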

Example 60 with Pipeline

Use of org.apache.beam.sdk.Pipeline in project beam by apache.

From the class SparkPipelineStateTest, the method testRunningPipeline:

private void testRunningPipeline(final SparkPipelineOptions options) throws Exception {
    final Pipeline pipeline = getPipeline(options);
    final Pipeline pipeline = getPipeline(options);
    final SparkPipelineResult result = (SparkPipelineResult) pipeline.run();
    // The job should still be running right after run() returns; cancel it to clean up.
    assertThat(result.getState(), is(PipelineResult.State.RUNNING));
    result.cancel();
}
Also used: Pipeline (org.apache.beam.sdk.Pipeline)
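
A caller would typically drive this helper from @Test methods with batch and streaming options; a hedged usage sketch under that assumption (the option setup is illustrative, not the test's actual code):

import org.apache.beam.runners.spark.SparkPipelineOptions;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.junit.Test;

// Illustrative usage: run the helper once in batch mode and once in streaming mode.
@Test
public void testRunningBatchPipeline() throws Exception {
    SparkPipelineOptions options = PipelineOptionsFactory.as(SparkPipelineOptions.class);
    options.setRunner(SparkRunner.class);
    options.setStreaming(false);
    testRunningPipeline(options);
}

@Test
public void testRunningStreamingPipeline() throws Exception {
    SparkPipelineOptions options = PipelineOptionsFactory.as(SparkPipelineOptions.class);
    options.setRunner(SparkRunner.class);
    options.setStreaming(true);
    testRunningPipeline(options);
}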

Aggregations

Pipeline (org.apache.beam.sdk.Pipeline): 184
Test (org.junit.Test): 123
TestPipeline (org.apache.beam.sdk.testing.TestPipeline): 86
DataflowPipelineOptions (org.apache.beam.runners.dataflow.options.DataflowPipelineOptions): 39
KV (org.apache.beam.sdk.values.KV): 35
Job (com.google.api.services.dataflow.model.Job): 26
DoFn (org.apache.beam.sdk.transforms.DoFn): 24
PipelineOptions (org.apache.beam.sdk.options.PipelineOptions): 22
DataflowPackage (com.google.api.services.dataflow.model.DataflowPackage): 21
TableRow (com.google.api.services.bigquery.model.TableRow): 16
PipelineResult (org.apache.beam.sdk.PipelineResult): 14
Structs.getString (org.apache.beam.runners.dataflow.util.Structs.getString): 13
TableSchema (com.google.api.services.bigquery.model.TableSchema): 12
ApexPipelineOptions (org.apache.beam.runners.apex.ApexPipelineOptions): 12
Map (java.util.Map): 11
TableFieldSchema (com.google.api.services.bigquery.model.TableFieldSchema): 10
ArrayList (java.util.ArrayList): 10
Instant (org.joda.time.Instant): 10
TableReference (com.google.api.services.bigquery.model.TableReference): 9
JsonSchemaToTableSchema (org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.JsonSchemaToTableSchema): 9