Example 1 with TimeMonitor

Use of org.apache.beam.sdk.testutils.metrics.TimeMonitor in project beam by apache.

From the class BigQueryIOPushDownIT, method readUsingDefaultMethod.

@Test
public void readUsingDefaultMethod() {
    sqlEnv.executeDdl(String.format(CREATE_TABLE_STATEMENT, Method.DEFAULT.toString()));
    BeamRelNode beamRelNode = sqlEnv.parseQuery(SELECT_STATEMENT);
    BeamSqlRelUtils.toPCollection(pipeline, beamRelNode)
        .apply(ParDo.of(new TimeMonitor<>(NAMESPACE, READ_TIME_METRIC)));
    PipelineResult result = pipeline.run();
    result.waitUntilFinish();
    collectAndPublishMetrics(result, "_default");
}
Also used: TimeMonitor (org.apache.beam.sdk.testutils.metrics.TimeMonitor), BeamRelNode (org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode), PipelineResult (org.apache.beam.sdk.PipelineResult), Test (org.junit.Test)
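
Each example on this page feeds its PCollection through ParDo.of(new TimeMonitor<>(namespace, name)), so it helps to know roughly what that DoFn does: it records the wall-clock time at which each element passes through as a Beam distribution metric and re-emits the element unchanged. The following is a minimal sketch of that idea, not the actual Beam source (the real class lives in Beam's test-utils module):

import org.apache.beam.sdk.metrics.Distribution;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.transforms.DoFn;

// Sketch only: record the current wall-clock time for every element in a
// distribution metric; its min/max later bound the start/end of a phase.
public class TimeMonitorSketch<T> extends DoFn<T, T> {

    private final Distribution timeDistribution;

    public TimeMonitorSketch(String namespace, String name) {
        this.timeDistribution = Metrics.distribution(namespace, name);
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
        timeDistribution.update(System.currentTimeMillis());
        c.output(c.element());
    }
}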

Example 2 with TimeMonitor

Use of org.apache.beam.sdk.testutils.metrics.TimeMonitor in project beam by apache.

From the class MongoDBIOIT, method testWriteAndRead.

@Test
public void testWriteAndRead() {
    initialCollectionSize = getCollectionSizeInBytes(collection);
    writePipeline
        .apply("Generate sequence", GenerateSequence.from(0).to(options.getNumberOfRecords()))
        .apply("Produce documents", MapElements.via(new LongToDocumentFn()))
        .apply("Collect write time metric", ParDo.of(new TimeMonitor<>(NAMESPACE, "write_time")))
        .apply(
            "Write documents to MongoDB",
            MongoDbIO.write()
                .withUri(mongoUrl)
                .withDatabase(options.getMongoDBDatabaseName())
                .withCollection(collection));
    PipelineResult writeResult = writePipeline.run();
    writeResult.waitUntilFinish();
    finalCollectionSize = getCollectionSizeInBytes(collection);
    PCollection<String> consolidatedHashcode =
        readPipeline
            .apply(
                "Read all documents",
                MongoDbIO.read()
                    .withUri(mongoUrl)
                    .withDatabase(options.getMongoDBDatabaseName())
                    .withCollection(collection))
            .apply("Collect read time metrics", ParDo.of(new TimeMonitor<>(NAMESPACE, "read_time")))
            .apply("Map documents to Strings", MapElements.via(new DocumentToStringFn()))
            .apply("Calculate hashcode", Combine.globally(new HashingFn()));
    String expectedHash = getHashForRecordCount(options.getNumberOfRecords(), EXPECTED_HASHES);
    PAssert.thatSingleton(consolidatedHashcode).isEqualTo(expectedHash);
    PipelineResult readResult = readPipeline.run();
    readResult.waitUntilFinish();
    collectAndPublishMetrics(writeResult, readResult);
}
Also used: TimeMonitor (org.apache.beam.sdk.testutils.metrics.TimeMonitor), PipelineResult (org.apache.beam.sdk.PipelineResult), HashingFn (org.apache.beam.sdk.io.common.HashingFn), Test (org.junit.Test)
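
The read half of this test collapses every document into a single hash with Combine.globally(new HashingFn()) and compares it to a precomputed value, which keeps the assertion independent of element order. Below is a hedged sketch of how such an order-insensitive combine can be built; it uses plain Guava, whereas Beam's actual HashingFn (in the io/common test module) differs in detail, so treat this as an illustration only:

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;
import org.apache.beam.sdk.transforms.Combine;

// Sketch only: hash each record, then merge the per-record hashes with
// combineUnordered so the result ignores the runner's element ordering.
// (A real pipeline would also need a coder for the List<HashCode> accumulator.)
public class HashingFnSketch extends Combine.CombineFn<String, List<HashCode>, String> {

    @Override
    public List<HashCode> createAccumulator() {
        return new ArrayList<>();
    }

    @Override
    public List<HashCode> addInput(List<HashCode> accumulator, String input) {
        accumulator.add(Hashing.murmur3_128().hashString(input, StandardCharsets.UTF_8));
        return accumulator;
    }

    @Override
    public List<HashCode> mergeAccumulators(Iterable<List<HashCode>> accumulators) {
        List<HashCode> merged = new ArrayList<>();
        for (List<HashCode> accumulator : accumulators) {
            merged.addAll(accumulator);
        }
        return merged;
    }

    @Override
    public String extractOutput(List<HashCode> accumulator) {
        return accumulator.isEmpty() ? "" : Hashing.combineUnordered(accumulator).toString();
    }
}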

Example 3 with TimeMonitor

Use of org.apache.beam.sdk.testutils.metrics.TimeMonitor in project beam by apache.

From the class KafkaIOIT, method testKafkaIOReadsAndWritesCorrectlyInStreaming.

@Test
public void testKafkaIOReadsAndWritesCorrectlyInStreaming() throws IOException {
    // Use batch pipeline to write records.
    writePipeline
        .apply("Generate records", Read.from(new SyntheticBoundedSource(sourceOptions)))
        .apply("Measure write time", ParDo.of(new TimeMonitor<>(NAMESPACE, WRITE_TIME_METRIC_NAME)))
        .apply("Write to Kafka", writeToKafka());
    // Use streaming pipeline to read Kafka records.
    readPipeline.getOptions().as(Options.class).setStreaming(true);
    readPipeline
        .apply("Read from unbounded Kafka", readFromKafka())
        .apply("Measure read time", ParDo.of(new TimeMonitor<>(NAMESPACE, READ_TIME_METRIC_NAME)))
        .apply("Map records to strings", MapElements.via(new MapKafkaRecordsToStrings()))
        .apply("Counting element", ParDo.of(new CountingFn(NAMESPACE, READ_ELEMENT_METRIC_NAME)));
    PipelineResult writeResult = writePipeline.run();
    writeResult.waitUntilFinish();
    PipelineResult readResult = readPipeline.run();
    PipelineResult.State readState = readResult.waitUntilFinish(Duration.standardSeconds(options.getReadTimeout()));
    cancelIfTimeouted(readResult, readState);
    assertEquals(sourceOptions.numRecords, readElementMetric(readResult, NAMESPACE, READ_ELEMENT_METRIC_NAME));
    if (!options.isWithTestcontainers()) {
        Set<NamedTestResult> metrics = readMetrics(writeResult, readResult);
        IOITMetrics.publishToInflux(TEST_ID, TIMESTAMP, metrics, settings);
    }
}
Also used: SyntheticBoundedSource (org.apache.beam.sdk.io.synthetic.SyntheticBoundedSource), TimeMonitor (org.apache.beam.sdk.testutils.metrics.TimeMonitor), StreamingOptions (org.apache.beam.sdk.options.StreamingOptions), SyntheticSourceOptions (org.apache.beam.sdk.io.synthetic.SyntheticSourceOptions), IOTestPipelineOptions (org.apache.beam.sdk.io.common.IOTestPipelineOptions), NamedTestResult (org.apache.beam.sdk.testutils.NamedTestResult), PipelineResult (org.apache.beam.sdk.PipelineResult), Test (org.junit.Test)
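
The read side runs in streaming mode and would never reach a terminal state on its own, so waitUntilFinish is bounded by options.getReadTimeout() and cancelIfTimeouted tears the pipeline down when that bound is hit. A plausible sketch of that helper, assuming waitUntilFinish(Duration) returns null when the timeout elapses before the pipeline terminates (as it does on runners such as Dataflow); the real helper in KafkaIOIT may differ:

import java.io.IOException;
import org.apache.beam.sdk.PipelineResult;

// Sketch only: a null state means waitUntilFinish timed out, so the
// still-running streaming read must be cancelled explicitly.
private void cancelIfTimeouted(PipelineResult readResult, PipelineResult.State readState)
        throws IOException {
    if (readState == null) {
        readResult.cancel();
    }
}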

Example 4 with TimeMonitor

Use of org.apache.beam.sdk.testutils.metrics.TimeMonitor in project beam by apache.

From the class AvroIOIT, method writeThenReadAll.

@Test
public void writeThenReadAll() {
    PCollection<String> testFilenames =
        pipeline
            .apply("Generate sequence", GenerateSequence.from(0).to(numberOfTextLines))
            .apply(
                "Produce text lines",
                ParDo.of(new FileBasedIOITHelper.DeterministicallyConstructTestTextLineFn()))
            .apply("Produce Avro records", ParDo.of(new DeterministicallyConstructAvroRecordsFn()))
            .setCoder(AvroCoder.of(AVRO_SCHEMA))
            .apply("Collect start time", ParDo.of(new TimeMonitor<>(AVRO_NAMESPACE, "writeStart")))
            .apply(
                "Write Avro records to files",
                AvroIO.writeGenericRecords(AVRO_SCHEMA)
                    .to(filenamePrefix)
                    .withOutputFilenames()
                    .withSuffix(".avro"))
            .getPerDestinationOutputFilenames()
            .apply("Collect middle time", ParDo.of(new TimeMonitor<>(AVRO_NAMESPACE, "middlePoint")))
            .apply(Values.create());
    PCollection<String> consolidatedHashcode =
        testFilenames
            .apply("Match all files", FileIO.matchAll())
            .apply(
                "Read matches",
                FileIO.readMatches().withDirectoryTreatment(DirectoryTreatment.PROHIBIT))
            .apply("Read files", AvroIO.readFilesGenericRecords(AVRO_SCHEMA))
            .apply("Collect end time", ParDo.of(new TimeMonitor<>(AVRO_NAMESPACE, "endPoint")))
            .apply("Parse Avro records to Strings", ParDo.of(new ParseAvroRecordsFn()))
            .apply("Calculate hashcode", Combine.globally(new HashingFn()));
    PAssert.thatSingleton(consolidatedHashcode).isEqualTo(expectedHash);
    testFilenames.apply(
        "Delete test files",
        ParDo.of(new DeleteFileFn())
            .withSideInputs(consolidatedHashcode.apply(View.asSingleton())));
    PipelineResult result = pipeline.run();
    result.waitUntilFinish();
    collectAndPublishMetrics(result);
}
Also used: TimeMonitor (org.apache.beam.sdk.testutils.metrics.TimeMonitor), FileBasedIOITHelper (org.apache.beam.sdk.io.common.FileBasedIOITHelper), PipelineResult (org.apache.beam.sdk.PipelineResult), HashingFn (org.apache.beam.sdk.io.common.HashingFn), DeleteFileFn (org.apache.beam.sdk.io.common.FileBasedIOITHelper.DeleteFileFn), Test (org.junit.Test)
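
The cleanup step is worth a note: the hash side input is never read by the delete logic itself; wiring it in via withSideInputs only forces deletion to run after the verification hash has been computed. A plausible sketch of such a cleanup DoFn follows (the real FileBasedIOITHelper.DeleteFileFn may differ in detail):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.MatchResult;
import org.apache.beam.sdk.io.fs.ResourceId;
import org.apache.beam.sdk.transforms.DoFn;

// Sketch only: resolve each filename (possibly a glob) and delete the matches.
public class DeleteFileFnSketch extends DoFn<String, Void> {

    @ProcessElement
    public void processElement(ProcessContext c) throws IOException {
        MatchResult match = FileSystems.match(c.element());
        List<ResourceId> toDelete = new ArrayList<>();
        for (MatchResult.Metadata metadata : match.metadata()) {
            toDelete.add(metadata.resourceId());
        }
        FileSystems.delete(toDelete);
    }
}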

Example 5 with TimeMonitor

Use of org.apache.beam.sdk.testutils.metrics.TimeMonitor in project beam by apache.

From the class TextIOIT, method writeThenReadAll.

@Test
public void writeThenReadAll() {
    TextIO.TypedWrite<String, Object> write = TextIO.write().to(filenamePrefix).withOutputFilenames().withCompression(compressionType);
    if (numShards != null) {
        write = write.withNumShards(numShards);
    }
    PCollection<String> testFilenames =
        pipeline
            .apply("Generate sequence", GenerateSequence.from(0).to(numberOfTextLines))
            .apply(
                "Produce text lines",
                ParDo.of(new FileBasedIOITHelper.DeterministicallyConstructTestTextLineFn()))
            .apply("Collect write start time", ParDo.of(new TimeMonitor<>(FILEIOIT_NAMESPACE, "startTime")))
            .apply("Write content to files", write)
            .getPerDestinationOutputFilenames()
            .apply(Values.create())
            .apply("Collect write end time", ParDo.of(new TimeMonitor<>(FILEIOIT_NAMESPACE, "middleTime")));
    PCollection<String> consolidatedHashcode =
        testFilenames
            .apply("Match all files", FileIO.matchAll())
            .apply(
                "Read matches",
                FileIO.readMatches().withDirectoryTreatment(DirectoryTreatment.PROHIBIT))
            .apply("Read files", TextIO.readFiles())
            .apply("Collect read end time", ParDo.of(new TimeMonitor<>(FILEIOIT_NAMESPACE, "endTime")))
            .apply("Calculate hashcode", Combine.globally(new HashingFn()));
    PAssert.thatSingleton(consolidatedHashcode).isEqualTo(expectedHash);
    testFilenames.apply(
        "Delete test files",
        ParDo.of(new DeleteFileFn())
            .withSideInputs(consolidatedHashcode.apply(View.asSingleton())));
    PipelineResult result = pipeline.run();
    result.waitUntilFinish();
    collectAndPublishMetrics(result);
}
Also used: TimeMonitor (org.apache.beam.sdk.testutils.metrics.TimeMonitor), FileBasedIOITHelper (org.apache.beam.sdk.io.common.FileBasedIOITHelper), PipelineResult (org.apache.beam.sdk.PipelineResult), TextIO (org.apache.beam.sdk.io.TextIO), HashingFn (org.apache.beam.sdk.io.common.HashingFn), DeleteFileFn (org.apache.beam.sdk.io.common.FileBasedIOITHelper.DeleteFileFn), Test (org.junit.Test)
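
All five tests end with a collectAndPublishMetrics call, and this one shows the full three-monitor pattern: filenames only become available after the files are written, so the span from the minimum of "startTime" to the maximum of "middleTime" covers the write phase, and the span from "middleTime" to "endTime" covers the read phase. Below is a hedged sketch of the extraction step using Beam's MetricsReader from the same testutils package; the real collectAndPublishMetrics also builds NamedTestResults and publishes them, which is omitted here:

import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.testutils.metrics.MetricsReader;

// Sketch only: the min of the start monitor and the max of the end monitor
// bound a phase; their difference is the phase's run time in seconds.
static double phaseTimeSeconds(PipelineResult result, String startMonitor, String endMonitor) {
    MetricsReader reader = new MetricsReader(result, FILEIOIT_NAMESPACE);
    long start = reader.getStartTimeMetric(startMonitor); // distribution min
    long end = reader.getEndTimeMetric(endMonitor); // distribution max
    return (end - start) / 1000.0;
}

With that helper, write time would be phaseTimeSeconds(result, "startTime", "middleTime") and read time phaseTimeSeconds(result, "middleTime", "endTime").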

Aggregations

TimeMonitor (org.apache.beam.sdk.testutils.metrics.TimeMonitor): 14 uses
PipelineResult (org.apache.beam.sdk.PipelineResult): 13 uses
Test (org.junit.Test): 11 uses
HashingFn (org.apache.beam.sdk.io.common.HashingFn): 8 uses
FileBasedIOITHelper (org.apache.beam.sdk.io.common.FileBasedIOITHelper): 4 uses
BeamRelNode (org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode): 3 uses
SyntheticBoundedSource (org.apache.beam.sdk.io.synthetic.SyntheticBoundedSource): 3 uses
NamedTestResult (org.apache.beam.sdk.testutils.NamedTestResult): 3 uses
ArrayList (java.util.ArrayList): 2 uses
Pipeline (org.apache.beam.sdk.Pipeline): 2 uses
DeleteFileFn (org.apache.beam.sdk.io.common.FileBasedIOITHelper.DeleteFileFn): 2 uses
TestRow (org.apache.beam.sdk.io.common.TestRow): 2 uses
TableFieldSchema (com.google.api.services.bigquery.model.TableFieldSchema): 1 use
TableSchema (com.google.api.services.bigquery.model.TableSchema): 1 use
Timestamp (com.google.cloud.Timestamp): 1 use
HashSet (java.util.HashSet): 1 use
List (java.util.List): 1 use
Set (java.util.Set): 1 use
UUID (java.util.UUID): 1 use
Function (java.util.function.Function): 1 use