Use of org.apache.beam.sdk.io.synthetic.SyntheticBoundedSource in project beam by apache.
In the class KafkaIOIT, the method testKafkaIOReadsAndWritesCorrectlyInStreaming:
@Test
public void testKafkaIOReadsAndWritesCorrectlyInStreaming() throws IOException {
// Use batch pipeline to write records.
writePipeline
    .apply("Generate records", Read.from(new SyntheticBoundedSource(sourceOptions)))
    .apply("Measure write time", ParDo.of(new TimeMonitor<>(NAMESPACE, WRITE_TIME_METRIC_NAME)))
    .apply("Write to Kafka", writeToKafka());
// Use streaming pipeline to read Kafka records.
readPipeline.getOptions().as(Options.class).setStreaming(true);
readPipeline
    .apply("Read from unbounded Kafka", readFromKafka())
    .apply("Measure read time", ParDo.of(new TimeMonitor<>(NAMESPACE, READ_TIME_METRIC_NAME)))
    .apply("Map records to strings", MapElements.via(new MapKafkaRecordsToStrings()))
    .apply("Counting element", ParDo.of(new CountingFn(NAMESPACE, READ_ELEMENT_METRIC_NAME)));
PipelineResult writeResult = writePipeline.run();
writeResult.waitUntilFinish();
PipelineResult readResult = readPipeline.run();
PipelineResult.State readState = readResult.waitUntilFinish(Duration.standardSeconds(options.getReadTimeout()));
cancelIfTimeouted(readResult, readState);
assertEquals(sourceOptions.numRecords, readElementMetric(readResult, NAMESPACE, READ_ELEMENT_METRIC_NAME));
if (!options.isWithTestcontainers()) {
  Set<NamedTestResult> metrics = readMetrics(writeResult, readResult);
  IOITMetrics.publishToInflux(TEST_ID, TIMESTAMP, metrics, settings);
}
}
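The sourceOptions used by both pipelines above are configured elsewhere in KafkaIOIT and are not part of this excerpt. A minimal sketch of how such options can be built from a JSON spec, using the SyntheticOptions.fromJsonString helper that also appears in SyntheticDataPublisher below; the field values here are placeholders, not the ones the test actually uses:
// Illustrative only: parse synthetic-source settings from a JSON string.
// numRecords, keySizeBytes and valueSizeBytes are SyntheticSourceOptions fields;
// the concrete values are placeholders.
SyntheticSourceOptions sourceOptions =
    SyntheticOptions.fromJsonString(
        "{\"numRecords\": 1000, \"keySizeBytes\": 10, \"valueSizeBytes\": 90}",
        SyntheticSourceOptions.class);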
Use of org.apache.beam.sdk.io.synthetic.SyntheticBoundedSource in project beam by apache.
In the class BigQueryIOIT, the method testWrite:
private void testWrite(BigQueryIO.Write<byte[]> writeIO, String metricName) {
Pipeline pipeline = Pipeline.create(options);
BigQueryIO.Write.Method method = BigQueryIO.Write.Method.valueOf(options.getWriteMethod());
pipeline.apply("Read from source", Read.from(new SyntheticBoundedSource(sourceOptions))).apply("Gather time", ParDo.of(new TimeMonitor<>(NAMESPACE, metricName))).apply("Map records", ParDo.of(new MapKVToV())).apply("Write to BQ", writeIO.to(tableQualifier).withCustomGcsTempLocation(ValueProvider.StaticValueProvider.of(tempRoot)).withMethod(method).withSchema(new TableSchema().setFields(Collections.singletonList(new TableFieldSchema().setName("data").setType("BYTES")))));
PipelineResult pipelineResult = pipeline.run();
pipelineResult.waitUntilFinish();
extractAndPublishTime(pipelineResult, metricName);
}
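The MapKVToV transform referenced above is a small DoFn defined in BigQueryIOIT and is not shown in this excerpt. A plausible sketch, assuming it simply drops the synthetic key and keeps the value bytes so they can be written to the single BYTES column "data":
// Assumed shape of a value-extracting DoFn like MapKVToV (not the verbatim Beam source):
// emit only the value of each synthetic KV<byte[], byte[]> record.
static class MapKVToV extends DoFn<KV<byte[], byte[]>, byte[]> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    c.output(c.element().getValue());
  }
}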
Use of org.apache.beam.sdk.io.synthetic.SyntheticBoundedSource in project beam by apache.
In the class SyntheticDataPublisher, the method main:
public static void main(String[] args) throws IOException {
options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
SyntheticSourceOptions sourceOptions = SyntheticOptions.fromJsonString(options.getSourceOptions(), SyntheticSourceOptions.class);
Pipeline pipeline = Pipeline.create(options);
PCollection<KV<byte[], byte[]>> syntheticData = pipeline.apply("Read synthetic data", Read.from(new SyntheticBoundedSource(sourceOptions)));
if (options.getKafkaBootstrapServerAddress() != null && options.getKafkaTopic() != null) {
writeToKafka(syntheticData);
}
if (options.getPubSubTopic() != null) {
writeToPubSub(syntheticData);
}
if (allKinesisOptionsConfigured()) {
writeToKinesis(syntheticData);
}
pipeline.run().waitUntilFinish();
}
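The writeToKafka helper used above is defined elsewhere in SyntheticDataPublisher. A minimal sketch of what it might look like, assuming plain KafkaIO with byte-array serializers (ByteArraySerializer comes from org.apache.kafka.common.serialization):
// Hypothetical sketch of the writeToKafka helper: forward the synthetic
// KV<byte[], byte[]> records to the configured Kafka topic.
private static void writeToKafka(PCollection<KV<byte[], byte[]>> records) {
  records.apply(
      "Write to Kafka",
      KafkaIO.<byte[], byte[]>write()
          .withBootstrapServers(options.getKafkaBootstrapServerAddress())
          .withTopic(options.getKafkaTopic())
          .withKeySerializer(ByteArraySerializer.class)
          .withValueSerializer(ByteArraySerializer.class));
}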
Use of org.apache.beam.sdk.io.synthetic.SyntheticBoundedSource in project beam by apache.
In the class KafkaIOIT, the method testKafkaIOReadsAndWritesCorrectlyInBatch:
@Test
public void testKafkaIOReadsAndWritesCorrectlyInBatch() throws IOException {
// Expected hashes for collections of a given record count, with 100-byte records (10-byte key, 90-byte value).
Map<Long, String> expectedHashes =
    ImmutableMap.of(
        1000L, "4507649971ee7c51abbb446e65a5c660",
        100_000_000L, "0f12c27c9a7672e14775594be66cad9a");
expectedHashcode = getHashForRecordCount(sourceOptions.numRecords, expectedHashes);
writePipeline
    .apply("Generate records", Read.from(new SyntheticBoundedSource(sourceOptions)))
    .apply("Measure write time", ParDo.of(new TimeMonitor<>(NAMESPACE, WRITE_TIME_METRIC_NAME)))
    .apply("Write to Kafka", writeToKafka());
PCollection<String> hashcode =
    readPipeline
        .apply("Read from bounded Kafka", readFromBoundedKafka())
        .apply("Measure read time", ParDo.of(new TimeMonitor<>(NAMESPACE, READ_TIME_METRIC_NAME)))
        .apply("Map records to strings", MapElements.via(new MapKafkaRecordsToStrings()))
        .apply("Calculate hashcode", Combine.globally(new HashingFn()).withoutDefaults());
PAssert.thatSingleton(hashcode).isEqualTo(expectedHashcode);
PipelineResult writeResult = writePipeline.run();
writeResult.waitUntilFinish();
PipelineResult readResult = readPipeline.run();
PipelineResult.State readState = readResult.waitUntilFinish(Duration.standardSeconds(options.getReadTimeout()));
cancelIfTimeouted(readResult, readState);
if (!options.isWithTestcontainers()) {
  Set<NamedTestResult> metrics = readMetrics(writeResult, readResult);
  IOITMetrics.publishToInflux(TEST_ID, TIMESTAMP, metrics, settings);
}
}
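getHashForRecordCount is a small lookup helper in KafkaIOIT that is not part of this excerpt. A minimal sketch, assuming it resolves the expected hash for the configured record count and fails fast when no reference hash is known:
// Assumed shape of the getHashForRecordCount helper (illustrative, not the verbatim
// Beam source): look up the reference hash for the given record count.
private static String getHashForRecordCount(long recordCount, Map<Long, String> hashes) {
  String hash = hashes.get(recordCount);
  if (hash == null) {
    throw new UnsupportedOperationException("No expected hash for record count: " + recordCount);
  }
  return hash;
}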