Example 1 with DeterministicallyConstructTestRowFn

Use of org.apache.beam.sdk.io.common.TestRow.DeterministicallyConstructTestRowFn in the Apache Beam project.

Class DynamoDBIOIT, method runWrite (the variant built on the AWS SDK v1 classes under com.amazonaws).

/**
 * Write test dataset to DynamoDB.
 */
private void runWrite() {
    int rows = env.options().getNumberOfRows();
    pipelineWrite
        .apply("Generate Sequence", GenerateSequence.from(0).to(rows))
        .apply("Prepare TestRows", ParDo.of(new DeterministicallyConstructTestRowFn()))
        .apply(
            "Write to DynamoDB",
            DynamoDBIO.<TestRow>write()
                .withAwsClientsProvider(clientProvider())
                .withWriteRequestMapperFn(row -> buildWriteRequest(row)));
    pipelineWrite.run().waitUntilFinish();
}
Also used : Count(org.apache.beam.sdk.transforms.Count) KV(org.apache.beam.sdk.values.KV) PutRequest(com.amazonaws.services.dynamodbv2.model.PutRequest) AttributeDefinition(com.amazonaws.services.dynamodbv2.model.AttributeDefinition) KeySchemaElement(com.amazonaws.services.dynamodbv2.model.KeySchemaElement) Default(org.apache.beam.sdk.options.Default) Combine(org.apache.beam.sdk.transforms.Combine) KeyType(com.amazonaws.services.dynamodbv2.model.KeyType) RunWith(org.junit.runner.RunWith) ImmutableMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) WriteRequest(com.amazonaws.services.dynamodbv2.model.WriteRequest) Regions(com.amazonaws.regions.Regions) Description(org.apache.beam.sdk.options.Description) TableStatus(com.amazonaws.services.dynamodbv2.model.TableStatus) AttributeValue(com.amazonaws.services.dynamodbv2.model.AttributeValue) Map(java.util.Map) TestPipeline(org.apache.beam.sdk.testing.TestPipeline) TypeDescriptors.strings(org.apache.beam.sdk.values.TypeDescriptors.strings) ITEnvironment(org.apache.beam.sdk.io.aws.ITEnvironment) TestRow.getExpectedHashForRowCount(org.apache.beam.sdk.io.common.TestRow.getExpectedHashForRowCount) ClassRule(org.junit.ClassRule) AWSCredentials(com.amazonaws.auth.AWSCredentials) Flatten(org.apache.beam.sdk.transforms.Flatten) MapElements(org.apache.beam.sdk.transforms.MapElements) HashingFn(org.apache.beam.sdk.io.common.HashingFn) DeterministicallyConstructTestRowFn(org.apache.beam.sdk.io.common.TestRow.DeterministicallyConstructTestRowFn) PAssert(org.apache.beam.sdk.testing.PAssert) TestRow(org.apache.beam.sdk.io.common.TestRow) ScanRequest(com.amazonaws.services.dynamodbv2.model.ScanRequest) AmazonDynamoDBClientBuilder(com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder) GenerateSequence(org.apache.beam.sdk.io.GenerateSequence) Test(org.junit.Test) JUnit4(org.junit.runners.JUnit4) AmazonDynamoDB(com.amazonaws.services.dynamodbv2.AmazonDynamoDB) PCollection(org.apache.beam.sdk.values.PCollection) CreateTableRequest(com.amazonaws.services.dynamodbv2.model.CreateTableRequest) ScalarAttributeType(com.amazonaws.services.dynamodbv2.model.ScalarAttributeType) ProvisionedThroughput(com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput) Rule(org.junit.Rule) ExternalResource(org.junit.rules.ExternalResource) ParDo(org.apache.beam.sdk.transforms.ParDo) DYNAMODB(org.testcontainers.containers.localstack.LocalStackContainer.Service.DYNAMODB)

Example 2 with DeterministicallyConstructTestRowFn

Use of org.apache.beam.sdk.io.common.TestRow.DeterministicallyConstructTestRowFn in the Apache Beam project.

Class S3FileSystemIT, method testWriteThenRead.

@Test
public void testWriteThenRead() {
    int rows = env.options().getNumberOfRows();
    // Write test dataset to S3.
    pipelineWrite
        .apply("Generate Sequence", GenerateSequence.from(0).to(rows))
        .apply("Prepare TestRows", ParDo.of(new DeterministicallyConstructTestRowFn()))
        .apply("Prepare file rows", ParDo.of(new SelectNameFn()))
        .apply("Write to S3 file", TextIO.write().to("s3://" + s3Bucket.name + "/test"));
    pipelineWrite.run().waitUntilFinish();
    // Read test dataset from S3.
    PCollection<String> output =
        pipelineRead.apply(TextIO.read().from("s3://" + s3Bucket.name + "/test*"));
    PAssert.thatSingleton(output.apply("Count All", Count.globally())).isEqualTo((long) rows);
    PAssert.that(output.apply(Combine.globally(new HashingFn()).withoutDefaults()))
        .containsInAnyOrder(getExpectedHashForRowCount(rows));
    pipelineRead.run().waitUntilFinish();
}
Also used : DeterministicallyConstructTestRowFn(org.apache.beam.sdk.io.common.TestRow.DeterministicallyConstructTestRowFn) SelectNameFn(org.apache.beam.sdk.io.common.TestRow.SelectNameFn) HashingFn(org.apache.beam.sdk.io.common.HashingFn) Test(org.junit.Test)
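
The test above verifies the data it reads back with two order-independent checks: an element count and a combined hash compared against getExpectedHashForRowCount(rows). The following is a minimal plain-Java sketch of that idea, without any Beam dependency. The class name DeterministicRowsSketch, the "row-" naming scheme, and the CRC32-sum checksum are illustrative assumptions only; they are not the actual TestRow or HashingFn implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical stand-in for the "deterministic rows + order-independent hash" pattern.
class DeterministicRowsSketch {

    // Assumed naming scheme: the value depends only on the seed, so any
    // worker (or the test itself) can regenerate the same dataset.
    static String nameForSeed(long seed) {
        return "row-" + seed;
    }

    // Mirrors "generate a sequence 0..rows, then map each number to a row".
    static List<String> generateNames(int rows) {
        List<String> names = new ArrayList<>(rows);
        for (long seed = 0; seed < rows; seed++) {
            names.add(nameForSeed(seed));
        }
        return names;
    }

    // Order-independent checksum: addition is commutative, so the result
    // does not depend on the order in which elements arrive.
    static long orderIndependentChecksum(List<String> names) {
        long sum = 0;
        for (String name : names) {
            CRC32 crc = new CRC32();
            crc.update(name.getBytes(StandardCharsets.UTF_8));
            sum += crc.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        int rows = 1000;

        // "Write" side: the deterministic dataset.
        List<String> written = generateNames(rows);

        // "Read" side: same elements, arbitrary order (as a distributed read would give).
        List<String> readBack = new ArrayList<>(written);
        Collections.shuffle(readBack, new Random(42));

        // The expected checksum is recomputed from the row count alone.
        long expected = orderIndependentChecksum(generateNames(rows));
        long actual = orderIndependentChecksum(readBack);

        System.out.println("count matches: " + (readBack.size() == rows));
        System.out.println("checksum matches: " + (expected == actual));
    }
}
```

Because the expected value can be derived from the row count alone, a test built this way never needs to store the written data in order to validate what was read.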

Example 3 with DeterministicallyConstructTestRowFn

Use of org.apache.beam.sdk.io.common.TestRow.DeterministicallyConstructTestRowFn in the Apache Beam project.

Class DynamoDBIOIT, method runWrite (the variant built on the AWS SDK v2 classes under software.amazon.awssdk).

/**
 * Write test dataset to DynamoDB.
 */
private void runWrite() {
    int rows = env.options().getNumberOfRows();
    pipelineWrite
        .apply("Generate Sequence", GenerateSequence.from(0).to(rows))
        .apply("Prepare TestRows", ParDo.of(new DeterministicallyConstructTestRowFn()))
        .apply(
            "Write to DynamoDB",
            DynamoDBIO.<TestRow>write().withWriteRequestMapperFn(row -> buildWriteRequest(row)));
    pipelineWrite.run().waitUntilFinish();
}
Also used : Count(org.apache.beam.sdk.transforms.Count) KV(org.apache.beam.sdk.values.KV) Default(org.apache.beam.sdk.options.Default) Combine(org.apache.beam.sdk.transforms.Combine) RunWith(org.junit.runner.RunWith) ImmutableMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) Description(org.apache.beam.sdk.options.Description) ProvisionedThroughput(software.amazon.awssdk.services.dynamodb.model.ProvisionedThroughput) Map(java.util.Map) TestPipeline(org.apache.beam.sdk.testing.TestPipeline) CreateTableRequest(software.amazon.awssdk.services.dynamodb.model.CreateTableRequest) ScanRequest(software.amazon.awssdk.services.dynamodb.model.ScanRequest) TypeDescriptors.strings(org.apache.beam.sdk.values.TypeDescriptors.strings) WriteRequest(software.amazon.awssdk.services.dynamodb.model.WriteRequest) TestRow.getExpectedHashForRowCount(org.apache.beam.sdk.io.common.TestRow.getExpectedHashForRowCount) PutRequest(software.amazon.awssdk.services.dynamodb.model.PutRequest) ClassRule(org.junit.ClassRule) ScalarAttributeType(software.amazon.awssdk.services.dynamodb.model.ScalarAttributeType) Flatten(org.apache.beam.sdk.transforms.Flatten) MapElements(org.apache.beam.sdk.transforms.MapElements) DynamoDbClient(software.amazon.awssdk.services.dynamodb.DynamoDbClient) HashingFn(org.apache.beam.sdk.io.common.HashingFn) DeterministicallyConstructTestRowFn(org.apache.beam.sdk.io.common.TestRow.DeterministicallyConstructTestRowFn) TableStatus(software.amazon.awssdk.services.dynamodb.model.TableStatus) PAssert(org.apache.beam.sdk.testing.PAssert) TestRow(org.apache.beam.sdk.io.common.TestRow) KeyType(software.amazon.awssdk.services.dynamodb.model.KeyType) ITEnvironment(org.apache.beam.sdk.io.aws2.ITEnvironment) GenerateSequence(org.apache.beam.sdk.io.GenerateSequence) Test(org.junit.Test) JUnit4(org.junit.runners.JUnit4) AttributeDefinition(software.amazon.awssdk.services.dynamodb.model.AttributeDefinition) PCollection(org.apache.beam.sdk.values.PCollection) Rule(org.junit.Rule) ExternalResource(org.junit.rules.ExternalResource) KeySchemaElement(software.amazon.awssdk.services.dynamodb.model.KeySchemaElement) ParDo(org.apache.beam.sdk.transforms.ParDo) AttributeValue(software.amazon.awssdk.services.dynamodb.model.AttributeValue) DYNAMODB(org.testcontainers.containers.localstack.LocalStackContainer.Service.DYNAMODB)

Example 4 with DeterministicallyConstructTestRowFn

Use of org.apache.beam.sdk.io.common.TestRow.DeterministicallyConstructTestRowFn in the Apache Beam project.

Class S3FileSystemIT, method testWriteThenRead.

@Test
public void testWriteThenRead() {
    int rows = env.options().getNumberOfRows();
    // Write test dataset to S3.
    pipelineWrite
        .apply("Generate Sequence", GenerateSequence.from(0).to(rows))
        .apply("Prepare TestRows", ParDo.of(new DeterministicallyConstructTestRowFn()))
        .apply("Prepare file rows", ParDo.of(new SelectNameFn()))
        .apply("Write to S3 file", TextIO.write().to("s3://" + s3Bucket.name + "/test"));
    pipelineWrite.run().waitUntilFinish();
    // Read test dataset from S3.
    PCollection<String> output =
        pipelineRead.apply(TextIO.read().from("s3://" + s3Bucket.name + "/test*"));
    PAssert.thatSingleton(output.apply(Count.globally())).isEqualTo((long) rows);
    PAssert.that(output.apply(Combine.globally(new HashingFn()).withoutDefaults()))
        .containsInAnyOrder(getExpectedHashForRowCount(rows));
    pipelineRead.run().waitUntilFinish();
}
Also used : DeterministicallyConstructTestRowFn(org.apache.beam.sdk.io.common.TestRow.DeterministicallyConstructTestRowFn) SelectNameFn(org.apache.beam.sdk.io.common.TestRow.SelectNameFn) HashingFn(org.apache.beam.sdk.io.common.HashingFn) Test(org.junit.Test)

Example 5 with DeterministicallyConstructTestRowFn

Use of org.apache.beam.sdk.io.common.TestRow.DeterministicallyConstructTestRowFn in the Apache Beam project.

Class SnsIOIT, method testWriteThenRead.

@Test
public void testWriteThenRead() {
    ITOptions opts = env.options();
    int rows = opts.getNumberOfRows();
    // Write test dataset to SNS
    pipelineWrite
        .apply("Generate Sequence", GenerateSequence.from(0).to(rows))
        .apply("Prepare TestRows", ParDo.of(new DeterministicallyConstructTestRowFn()))
        .apply(
            "Write to SNS",
            SnsIO.<TestRow>write()
                .withTopicArn(resources.snsTopic)
                .withPublishRequestBuilder(r -> PublishRequest.builder().message(r.name())));
    // Read test dataset from SQS.
    PCollection<String> output =
        pipelineRead
            .apply("Read from SQS", SqsIO.read().withQueueUrl(resources.sqsQueue).withMaxNumRecords(rows))
            .apply("Extract message", MapElements.into(strings()).via(SnsIOIT::extractMessage));
    PAssert.thatSingleton(output.apply("Count All", Count.globally())).isEqualTo((long) rows);
    PAssert.that(output.apply(Combine.globally(new HashingFn()).withoutDefaults()))
        .containsInAnyOrder(getExpectedHashForRowCount(rows));
    pipelineWrite.run();
    pipelineRead.run();
}
Also used : Count(org.apache.beam.sdk.transforms.Count) Combine(org.apache.beam.sdk.transforms.Combine) RunWith(org.junit.runner.RunWith) PublishRequest(software.amazon.awssdk.services.sns.model.PublishRequest) IOITHelper.executeWithRetry(org.apache.beam.sdk.io.common.IOITHelper.executeWithRetry) TestPipeline(org.apache.beam.sdk.testing.TestPipeline) TypeDescriptors.strings(org.apache.beam.sdk.values.TypeDescriptors.strings) Timeout(org.junit.rules.Timeout) TestRow.getExpectedHashForRowCount(org.apache.beam.sdk.io.common.TestRow.getExpectedHashForRowCount) ClassRule(org.junit.ClassRule) SqsIO(org.apache.beam.sdk.io.aws2.sqs.SqsIO) Service(org.testcontainers.containers.localstack.LocalStackContainer.Service) MapElements(org.apache.beam.sdk.transforms.MapElements) SNS(org.testcontainers.containers.localstack.LocalStackContainer.Service.SNS) HashingFn(org.apache.beam.sdk.io.common.HashingFn) DeterministicallyConstructTestRowFn(org.apache.beam.sdk.io.common.TestRow.DeterministicallyConstructTestRowFn) PAssert(org.apache.beam.sdk.testing.PAssert) TestRow(org.apache.beam.sdk.io.common.TestRow) ObjectMapper(com.fasterxml.jackson.databind.ObjectMapper) ITEnvironment(org.apache.beam.sdk.io.aws2.ITEnvironment) SqsClient(software.amazon.awssdk.services.sqs.SqsClient) JsonProcessingException(com.fasterxml.jackson.core.JsonProcessingException) GenerateSequence(org.apache.beam.sdk.io.GenerateSequence) Test(org.junit.Test) JUnit4(org.junit.runners.JUnit4) PCollection(org.apache.beam.sdk.values.PCollection) Serializable(java.io.Serializable) Rule(org.junit.Rule) ExternalResource(org.junit.rules.ExternalResource) ParDo(org.apache.beam.sdk.transforms.ParDo) SnsClient(software.amazon.awssdk.services.sns.SnsClient) SqsMessage(org.apache.beam.sdk.io.aws2.sqs.SqsMessage) SQS(org.testcontainers.containers.localstack.LocalStackContainer.Service.SQS)

Aggregations

HashingFn (org.apache.beam.sdk.io.common.HashingFn) 8
DeterministicallyConstructTestRowFn (org.apache.beam.sdk.io.common.TestRow.DeterministicallyConstructTestRowFn) 8
Test (org.junit.Test) 8
GenerateSequence (org.apache.beam.sdk.io.GenerateSequence) 3
TestRow (org.apache.beam.sdk.io.common.TestRow) 3
TestRow.getExpectedHashForRowCount (org.apache.beam.sdk.io.common.TestRow.getExpectedHashForRowCount) 3
PAssert (org.apache.beam.sdk.testing.PAssert) 3
TestPipeline (org.apache.beam.sdk.testing.TestPipeline) 3
Combine (org.apache.beam.sdk.transforms.Combine) 3
Count (org.apache.beam.sdk.transforms.Count) 3
MapElements (org.apache.beam.sdk.transforms.MapElements) 3
ParDo (org.apache.beam.sdk.transforms.ParDo) 3
PCollection (org.apache.beam.sdk.values.PCollection) 3
TypeDescriptors.strings (org.apache.beam.sdk.values.TypeDescriptors.strings) 3
ClassRule (org.junit.ClassRule) 3
Rule (org.junit.Rule) 3
ExternalResource (org.junit.rules.ExternalResource) 3
RunWith (org.junit.runner.RunWith) 3
JUnit4 (org.junit.runners.JUnit4) 3
Map (java.util.Map) 2