Use of org.apache.beam.sdk.values.PCollection in project beam by apache.
From the class TranslationContextTest, method testRegisterInputMessageStreams:
@Test
public void testRegisterInputMessageStreams() {
  final PCollection output = mock(PCollection.class);
  List<String> topics = Arrays.asList("stream1", "stream2");
  List inputDescriptors =
      topics.stream()
          .map(topicName -> createSamzaInputDescriptor(topicName, topicName))
          .collect(Collectors.toList());
  // Registering the PCollection against both input descriptors should make a
  // message stream available for it in the translation context.
  translationContext.registerInputMessageStreams(output, inputDescriptors);
  assertNotNull(translationContext.getMessageStream(output));
}
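The createSamzaInputDescriptor helper is defined elsewhere in the test class. Purely as a sketch of what such a helper could look like when built on Samza's generic descriptor API with a pass-through serde; the factory class name, the NoOpSerde choice, and the parameter meanings are assumptions, not the actual Beam test code:

import org.apache.samza.operators.KV;
import org.apache.samza.serializers.KVSerde;
import org.apache.samza.serializers.NoOpSerde;
import org.apache.samza.system.descriptors.GenericInputDescriptor;
import org.apache.samza.system.descriptors.GenericSystemDescriptor;

// Hypothetical helper: builds a Samza input descriptor for the given stream id,
// backed by a generic system descriptor and a no-op key/value serde.
private static GenericInputDescriptor<KV<Object, Object>> createSamzaInputDescriptor(
    String systemName, String streamId) {
  GenericSystemDescriptor system =
      new GenericSystemDescriptor(systemName, "org.example.FakeSystemFactory"); // assumed factory
  return system.getInputDescriptor(streamId, KVSerde.of(new NoOpSerde<>(), new NoOpSerde<>()));
}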
Use of org.apache.beam.sdk.values.PCollection in project beam by apache.
From the class BigQuerySamplesIT, method testTableIO:
@Test
public void testTableIO() throws Exception {
  String table = testName.getMethodName();

  // ===--- Test 1: createTableRow + writeToTable ---===\\
  // The rest of the tests depend on this since this is the one that writes
  // the contents into the BigQuery table, which the other tests then read.
  TableSchema schema = BigQuerySchemaCreate.createSchema();
  PCollection<TableRow> rows =
      writePipeline.apply(Create.of(Arrays.asList(BigQueryTableRowCreate.createTableRow())));
  BigQueryWriteToTable.writeToTable(PROJECT, DATASET, table, schema, rows);
  writePipeline.run().waitUntilFinish();

  // Check that the BigQuery table has the data using the BigQuery Client Library.
  String query = String.format("SELECT * FROM `%s.%s.%s`", PROJECT, DATASET, table);
  List<String> queryResults =
      StreamSupport.stream(
              BIGQUERY.query(QueryJobConfiguration.of(query)).iterateAll().spliterator(), false)
          .flatMap(values -> fieldValueListToStrings(values).stream())
          .collect(Collectors.toList());
  assertEquals(expected, queryResults);

  // ===--- Test 2: readFromTable ---===\\
  readAndCheck(BigQueryReadFromTable.readFromTable(PROJECT, DATASET, table, readTablePipeline));
  readTablePipeline.run().waitUntilFinish();

  // ===--- Test 3: readFromQuery ---===\\
  readAndCheck(BigQueryReadFromQuery.readFromQuery(PROJECT, DATASET, table, readQueryPipeline));
  readQueryPipeline.run().waitUntilFinish();

  // ===--- Test 4: readFromTableWithBigQueryStorageAPI ---===\\
  readAndCheck(
      BigQueryReadFromTableWithBigQueryStorageAPI.readFromTableWithBigQueryStorageAPI(
          PROJECT, DATASET, table, readBQStorageAPIPipeline));
  readBQStorageAPIPipeline.run().waitUntilFinish();
}
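BigQueryWriteToTable, BigQueryReadFromTable, and the other classes above are the Beam documentation snippets this integration test exercises. As a rough sketch of the write side only, assuming writeToTable simply wraps BigQueryIO.writeTableRows(); the dispositions and the method body are illustrative, not the exact snippet source:

import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.values.PCollection;

// Illustrative sketch: write a PCollection<TableRow> to project:dataset.table,
// creating the table from the given schema if it does not exist yet.
static void writeToTable(
    String project, String dataset, String table, TableSchema schema, PCollection<TableRow> rows) {
  rows.apply(
      "Write to BigQuery",
      BigQueryIO.writeTableRows()
          .to(String.format("%s:%s.%s", project, dataset, table))
          .withSchema(schema)
          .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
          .withWriteDisposition(WriteDisposition.WRITE_TRUNCATE));
}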
Use of org.apache.beam.sdk.values.PCollection in project beam by apache.
From the class ReadSourcePortableTest, method testExecution:
@Test(timeout = 120_000)
public void testExecution() throws Exception {
  PipelineOptions options =
      PipelineOptionsFactory.fromArgs("--experiments=use_deprecated_read").create();
  options.setRunner(CrashingRunner.class);
  options.as(FlinkPipelineOptions.class).setFlinkMaster("[local]");
  options.as(FlinkPipelineOptions.class).setStreaming(isStreaming);
  options.as(FlinkPipelineOptions.class).setParallelism(2);
  options.as(PortablePipelineOptions.class)
      .setDefaultEnvironmentType(Environments.ENVIRONMENT_EMBEDDED);

  Pipeline p = Pipeline.create(options);
  PCollection<Long> result =
      p.apply(Read.from(new Source(10)))
          .apply(Window.into(FixedWindows.of(Duration.millis(1))));
  PAssert.that(result)
      .containsInAnyOrder(ImmutableList.of(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L));

  SplittableParDo.convertReadBasedSplittableDoFnsToPrimitiveReads(p);
  RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p);
  List<RunnerApi.PTransform> readTransforms =
      pipelineProto.getComponents().getTransformsMap().values().stream()
          .filter(
              transform ->
                  transform.getSpec().getUrn().equals(PTransformTranslation.READ_TRANSFORM_URN))
          .collect(Collectors.toList());
  assertThat(readTransforms, not(empty()));

  // execute the pipeline
  JobInvocation jobInvocation =
      FlinkJobInvoker.create(null)
          .createJobInvocation(
              "fakeId",
              "fakeRetrievalToken",
              flinkJobExecutor,
              pipelineProto,
              options.as(FlinkPipelineOptions.class),
              new FlinkPipelineRunner(
                  options.as(FlinkPipelineOptions.class), null, Collections.emptyList()));
  jobInvocation.start();
  while (jobInvocation.getState() != JobState.Enum.DONE) {
    assertThat(jobInvocation.getState(), not(JobState.Enum.FAILED));
    Thread.sleep(100);
  }
}
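The Source class passed to Read.from is a small bounded source defined inside the test; it exists to exercise the deprecated Read translation path. Purely as an illustration of building the same PCollection<Long>, the built-in GenerateSequence transform produces the same ten windowed elements; this substitution is only about constructing the collection and would not exercise the same hand-written source:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class GenerateSequenceSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();
    // Same ten elements and 1 ms fixed windows as the test above, built with the
    // built-in GenerateSequence transform instead of a hand-written BoundedSource.
    PCollection<Long> result =
        p.apply(GenerateSequence.from(0).to(10))
            .apply(Window.into(FixedWindows.of(Duration.millis(1))));
    PAssert.that(result).containsInAnyOrder(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L);
    p.run().waitUntilFinish();
  }
}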
Use of org.apache.beam.sdk.values.PCollection in project beam by apache.
From the class DirectGroupByKeyOverrideFactoryTest, method getInputSucceeds:
@Test
public void getInputSucceeds() {
  TestPipeline p = TestPipeline.create();
  PCollection<KV<String, Integer>> input =
      p.apply(
          Create.of(KV.of("foo", 1))
              .withCoder(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of())));
  PCollection<KV<String, Iterable<Integer>>> grouped = input.apply(GroupByKey.create());
  AppliedPTransform<?, ?, ?> producer = DirectGraphs.getProducer(grouped);
  PTransformReplacement<PCollection<KV<String, Integer>>, PCollection<KV<String, Iterable<Integer>>>>
      replacement = factory.getReplacementTransform((AppliedPTransform) producer);
  // The replacement must report the same PCollection that fed the original GroupByKey.
  assertThat(replacement.getInput(), Matchers.<PCollection<?>>equalTo(input));
}
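The factory field exercised here is declared elsewhere in the test class. For context, this is roughly how a runner puts such an override factory to work; the wiring below is an illustrative sketch using Pipeline.replaceAll and the runners-core-construction PTransformMatchers helper, not code taken from the test:

import java.util.Collections;
import org.apache.beam.runners.core.construction.PTransformMatchers;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.PTransformOverride;
import org.apache.beam.sdk.transforms.GroupByKey;

// Illustrative wiring: ask the pipeline to replace every GroupByKey application
// with the transform produced by the override factory under test.
static void applyGroupByKeyOverride(Pipeline pipeline) {
  pipeline.replaceAll(
      Collections.singletonList(
          PTransformOverride.of(
              PTransformMatchers.classEqualTo(GroupByKey.class),
              new DirectGroupByKeyOverrideFactory<>())));
}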
Use of org.apache.beam.sdk.values.PCollection in project beam by apache.
From the class ReadSourceTranslatorBatch, method translateTransform:
@SuppressWarnings("unchecked")
@Override
public void translateTransform(
    PTransform<PBegin, PCollection<T>> transform, AbstractTranslationContext context) {
  AppliedPTransform<PBegin, PCollection<T>, PTransform<PBegin, PCollection<T>>> rootTransform =
      (AppliedPTransform<PBegin, PCollection<T>, PTransform<PBegin, PCollection<T>>>)
          context.getCurrentTransform();

  BoundedSource<T> source;
  try {
    source = ReadTranslation.boundedSourceFromTransform(rootTransform);
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
  SparkSession sparkSession = context.getSparkSession();

  String serializedSource = Base64Serializer.serializeUnchecked(source);
  Dataset<Row> rowDataset =
      sparkSession
          .read()
          .format(sourceProviderClass)
          .option(BEAM_SOURCE_OPTION, serializedSource)
          .option(
              DEFAULT_PARALLELISM,
              String.valueOf(context.getSparkSession().sparkContext().defaultParallelism()))
          .option(PIPELINE_OPTIONS, context.getSerializableOptions().toString())
          .load();

  // extract windowedValue from Row
  WindowedValue.FullWindowedValueCoder<T> windowedValueCoder =
      WindowedValue.FullWindowedValueCoder.of(source.getOutputCoder(), GlobalWindow.Coder.INSTANCE);
  Dataset<WindowedValue<T>> dataset =
      rowDataset.map(
          RowHelpers.extractWindowedValueFromRowMapFunction(windowedValueCoder),
          EncoderHelpers.fromBeamCoder(windowedValueCoder));

  PCollection<T> output = (PCollection<T>) context.getOutput();
  context.putDataset(output, dataset);
}
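On the other end of this hand-off, the source provider named by sourceProviderClass is expected to read the BEAM_SOURCE_OPTION value back and rebuild the source. A minimal sketch of that reverse step, assuming Base64Serializer.deserializeUnchecked is the companion to the serializeUnchecked call above; the surrounding Spark data-source plumbing is omitted:

import org.apache.beam.runners.core.serialization.Base64Serializer;
import org.apache.beam.sdk.io.BoundedSource;

// Illustrative counterpart to the translator above: the provider takes the string
// it was handed under BEAM_SOURCE_OPTION and rebuilds the BoundedSource from it.
@SuppressWarnings("unchecked")
static <T> BoundedSource<T> recoverSource(String serializedSource) {
  return Base64Serializer.deserializeUnchecked(serializedSource, BoundedSource.class);
}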