Example 11 with SdkComponents

Use of org.apache.beam.runners.core.construction.SdkComponents in project beam by apache.

Class WindowUtilsTest, method testGetWindowStrategy.

@Test
public void testGetWindowStrategy() throws IOException {
    SdkComponents components = SdkComponents.create();
    String environmentId = components.registerEnvironment(Environments.createDockerEnvironment("java"));
    WindowingStrategy<Object, IntervalWindow> expected =
        WindowingStrategy.of(FixedWindows.of(Duration.standardMinutes(1)))
            .withMode(WindowingStrategy.AccumulationMode.DISCARDING_FIRED_PANES)
            .withTimestampCombiner(TimestampCombiner.END_OF_WINDOW)
            .withAllowedLateness(Duration.ZERO)
            .withEnvironmentId(environmentId);
    components.registerWindowingStrategy(expected);
    String collectionId =
        components.registerPCollection(
            PCollection.createPrimitiveOutputInternal(
                    Pipeline.create(), expected, PCollection.IsBounded.BOUNDED, VoidCoder.of())
                .setName("name"));
    WindowingStrategy<?, ?> actual = WindowUtils.getWindowStrategy(collectionId, components.toComponents());
    assertEquals(expected, actual);
}
Also used : SdkComponents(org.apache.beam.runners.core.construction.SdkComponents) IntervalWindow(org.apache.beam.sdk.transforms.windowing.IntervalWindow) Test(org.junit.Test)

Example 12 with SdkComponents

Use of org.apache.beam.runners.core.construction.SdkComponents in project beam by apache.

Class WorkerCustomSourcesTest, method translateIOToCloudSource.

static com.google.api.services.dataflow.model.Source translateIOToCloudSource(BoundedSource<?> io, DataflowPipelineOptions options) throws Exception {
    DataflowPipelineTranslator translator = DataflowPipelineTranslator.fromOptions(options);
    Pipeline p = Pipeline.create(options);
    p.begin().apply(Read.from(io));
    // Note that we specifically perform this replacement since this is what the DataflowRunner
    // does and the DataflowRunner class does not expose a way to perform these replacements
    // without running the pipeline.
    p.replaceAll(Collections.singletonList(SplittableParDo.PRIMITIVE_BOUNDED_READ_OVERRIDE));
    DataflowRunner runner = DataflowRunner.fromOptions(options);
    SdkComponents sdkComponents = SdkComponents.create();
    RunnerApi.Environment defaultEnvironmentForDataflow = Environments.createDockerEnvironment("dummy-image-url");
    sdkComponents.registerEnvironment(defaultEnvironmentForDataflow);
    RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p, sdkComponents, true);
    Job workflow =
        translator
            .translate(p, pipelineProto, sdkComponents, runner, new ArrayList<DataflowPackage>())
            .getJob();
    Step step = workflow.getSteps().get(0);
    return stepToCloudSource(step);
}
Also used : RunnerApi(org.apache.beam.model.pipeline.v1.RunnerApi) ArrayList(java.util.ArrayList) DataflowRunner(org.apache.beam.runners.dataflow.DataflowRunner) Step(com.google.api.services.dataflow.model.Step) DataflowPipelineTranslator(org.apache.beam.runners.dataflow.DataflowPipelineTranslator) SdkComponents(org.apache.beam.runners.core.construction.SdkComponents) Job(com.google.api.services.dataflow.model.Job) Pipeline(org.apache.beam.sdk.Pipeline)

Example 13 with SdkComponents

Use of org.apache.beam.runners.core.construction.SdkComponents in project beam by apache.

Class LengthPrefixUnknownCodersTest, method test.

@Test
public void test() throws IOException {
    SdkComponents sdkComponents = SdkComponents.create();
    sdkComponents.registerEnvironment(Environments.createDockerEnvironment("java"));
    String coderId = sdkComponents.registerCoder(original);
    Components.Builder components = sdkComponents.toComponents().toBuilder();
    String updatedCoderId = LengthPrefixUnknownCoders.addLengthPrefixedCoder(coderId, components, replaceWithByteArray);
    assertEquals(expected, RehydratedComponents.forComponents(components.build()).getCoder(updatedCoderId));
}
Also used : SdkComponents(org.apache.beam.runners.core.construction.SdkComponents) RehydratedComponents(org.apache.beam.runners.core.construction.RehydratedComponents) Components(org.apache.beam.model.pipeline.v1.RunnerApi.Components) Test(org.junit.Test)

Example 14 with SdkComponents

Use of org.apache.beam.runners.core.construction.SdkComponents in project beam by apache.

Class DataflowRunnerTest, method testSdkHarnessConfiguration.

@Test
public void testSdkHarnessConfiguration() throws IOException {
    DataflowPipelineOptions options = buildPipelineOptions();
    ExperimentalOptions.addExperiment(options, "use_runner_v2");
    Pipeline p = Pipeline.create(options);
    p.apply(Create.of(Arrays.asList(1, 2, 3)));
    String defaultSdkContainerImage = DataflowRunner.getContainerImageForJob(options);
    SdkComponents sdkComponents = SdkComponents.create();
    RunnerApi.Environment defaultEnvironmentForDataflow = Environments.createDockerEnvironment(defaultSdkContainerImage);
    sdkComponents.registerEnvironment(defaultEnvironmentForDataflow.toBuilder().build());
    RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p, sdkComponents, true);
    Job job =
        DataflowPipelineTranslator.fromOptions(options)
            .translate(p, pipelineProto, sdkComponents, DataflowRunner.fromOptions(options), Collections.emptyList())
            .getJob();
    DataflowRunner.configureSdkHarnessContainerImages(options, pipelineProto, job);
    List<SdkHarnessContainerImage> sdks = job.getEnvironment().getWorkerPools().get(0).getSdkHarnessContainerImages();
    Map<String, String> expectedEnvIdsAndContainerImages =
        pipelineProto.getComponents().getEnvironmentsMap().entrySet().stream()
            .filter(
                x ->
                    BeamUrns.getUrn(RunnerApi.StandardEnvironments.Environments.DOCKER)
                        .equals(x.getValue().getUrn()))
            .collect(
                Collectors.toMap(
                    x -> x.getKey(),
                    x -> {
                        RunnerApi.DockerPayload payload;
                        try {
                            payload = RunnerApi.DockerPayload.parseFrom(x.getValue().getPayload());
                        } catch (InvalidProtocolBufferException e) {
                            throw new RuntimeException(e);
                        }
                        return payload.getContainerImage();
                    }));
    assertEquals(1, expectedEnvIdsAndContainerImages.size());
    assertEquals(1, sdks.size());
    assertEquals(
        expectedEnvIdsAndContainerImages,
        sdks.stream()
            .collect(
                Collectors.toMap(
                    SdkHarnessContainerImage::getEnvironmentId,
                    SdkHarnessContainerImage::getContainerImage)));
}
Also used : ExpectedLogs(org.apache.beam.sdk.testing.ExpectedLogs) Arrays(java.util.Arrays) Matchers.not(org.hamcrest.Matchers.not) ValueState(org.apache.beam.sdk.state.ValueState) SimpleFunction(org.apache.beam.sdk.transforms.SimpleFunction) DefaultGcpRegionFactory(org.apache.beam.runners.dataflow.options.DefaultGcpRegionFactory) Create(org.apache.beam.sdk.transforms.Create) DataflowPipelineDebugOptions(org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions) Map(java.util.Map) Node(org.apache.beam.sdk.runners.TransformHierarchy.Node) Window(org.apache.beam.sdk.transforms.windowing.Window) JsonSerializer(com.fasterxml.jackson.databind.JsonSerializer) JsonNode(com.fasterxml.jackson.databind.JsonNode) Dataflow(com.google.api.services.dataflow.Dataflow) TimestampedValue(org.apache.beam.sdk.values.TimestampedValue) ValueProvider(org.apache.beam.sdk.options.ValueProvider) Files.getFileExtension(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Files.getFileExtension) GcpOptions(org.apache.beam.sdk.extensions.gcp.options.GcpOptions) StreamingOptions(org.apache.beam.sdk.options.StreamingOptions) GcsUtil(org.apache.beam.sdk.extensions.gcp.util.GcsUtil) ShardedKey(org.apache.beam.sdk.util.ShardedKey) NoopCredentialFactory(org.apache.beam.sdk.extensions.gcp.auth.NoopCredentialFactory) Category(org.junit.experimental.categories.Category) Serializable(java.io.Serializable) Matchers.any(org.mockito.Matchers.any) Assert.assertFalse(org.junit.Assert.assertFalse) StreamingShardedWriteFactory(org.apache.beam.runners.dataflow.DataflowRunner.StreamingShardedWriteFactory) NoopPathValidator(org.apache.beam.sdk.extensions.gcp.storage.NoopPathValidator) PipelineVisitor(org.apache.beam.sdk.Pipeline.PipelineVisitor) Matchers.is(org.hamcrest.Matchers.is) Matchers.containsString(org.hamcrest.Matchers.containsString) Matchers.endsWith(org.hamcrest.Matchers.endsWith) JsonDeserialize(com.fasterxml.jackson.databind.annotation.JsonDeserialize) 
Mockito.mock(org.mockito.Mockito.mock) KV(org.apache.beam.sdk.values.KV) DataflowPipelineWorkerPoolOptions(org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions) SetState(org.apache.beam.sdk.state.SetState) ExperimentalOptions(org.apache.beam.sdk.options.ExperimentalOptions) JsonGenerator(com.fasterxml.jackson.core.JsonGenerator) Assume.assumeFalse(org.junit.Assume.assumeFalse) Duration(org.joda.time.Duration) RunWith(org.junit.runner.RunWith) Environments(org.apache.beam.runners.core.construction.Environments) ArrayList(java.util.ArrayList) Matchers.hasProperty(org.hamcrest.Matchers.hasProperty) FileBasedSink(org.apache.beam.sdk.io.FileBasedSink) TestPipeline(org.apache.beam.sdk.testing.TestPipeline) ValidatesRunner(org.apache.beam.sdk.testing.ValidatesRunner) MatcherAssert.assertThat(org.hamcrest.MatcherAssert.assertThat) Pipeline(org.apache.beam.sdk.Pipeline) PowerMockito(org.powermock.api.mockito.PowerMockito) AppliedPTransform(org.apache.beam.sdk.runners.AppliedPTransform) InvalidProtocolBufferException(org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.InvalidProtocolBufferException) Before(org.junit.Before) RunnerApi(org.apache.beam.model.pipeline.v1.RunnerApi) DoFn(org.apache.beam.sdk.transforms.DoFn) DeserializationContext(com.fasterxml.jackson.databind.DeserializationContext) ResourceId(org.apache.beam.sdk.io.fs.ResourceId) Files(java.nio.file.Files) PAssert(org.apache.beam.sdk.testing.PAssert) Assert.assertTrue(org.junit.Assert.assertTrue) IOException(java.io.IOException) Test(org.junit.Test) DataflowPipelineOptions(org.apache.beam.runners.dataflow.options.DataflowPipelineOptions) File(java.io.File) BeamUrns(org.apache.beam.runners.core.construction.BeamUrns) ResourceHints(org.apache.beam.sdk.transforms.resourcehints.ResourceHints) ReplacementOutput(org.apache.beam.sdk.runners.PTransformOverrideFactory.ReplacementOutput) Matchers.hasItem(org.hamcrest.Matchers.hasItem) Assert.assertNull(org.junit.Assert.assertNull) 
AutoService(com.google.auto.service.AutoService) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) VoidCoder(org.apache.beam.sdk.coders.VoidCoder) FileSystems(org.apache.beam.sdk.io.FileSystems) Assert.assertEquals(org.junit.Assert.assertEquals) TextIO(org.apache.beam.sdk.io.TextIO) Module(com.fasterxml.jackson.databind.Module) DynamicFileDestinations(org.apache.beam.sdk.io.DynamicFileDestinations) StateSpec(org.apache.beam.sdk.state.StateSpec) UsesStatefulParDo(org.apache.beam.sdk.testing.UsesStatefulParDo) WriteFilesResult(org.apache.beam.sdk.io.WriteFilesResult) SdkHarnessContainerImage(com.google.api.services.dataflow.model.SdkHarnessContainerImage) WriteFiles(org.apache.beam.sdk.io.WriteFiles) Matchers.hasKey(org.hamcrest.Matchers.hasKey) Job(com.google.api.services.dataflow.model.Job) Sessions(org.apache.beam.sdk.transforms.windowing.Sessions) SimpleModule(com.fasterxml.jackson.databind.module.SimpleModule) Matchers.eq(org.mockito.Matchers.eq) ListJobsResponse(com.google.api.services.dataflow.model.ListJobsResponse) Iterables(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables) Assert.fail(org.junit.Assert.fail) JsonDeserializer(com.fasterxml.jackson.databind.JsonDeserializer) StorageObject(com.google.api.services.storage.model.StorageObject) MapElements(org.apache.beam.sdk.transforms.MapElements) Matchers.isA(org.mockito.Matchers.isA) Matchers.lessThanOrEqualTo(org.hamcrest.Matchers.lessThanOrEqualTo) PaneInfo(org.apache.beam.sdk.transforms.windowing.PaneInfo) BigEndianIntegerCoder(org.apache.beam.sdk.coders.BigEndianIntegerCoder) StandardOpenOption(java.nio.file.StandardOpenOption) PValues(org.apache.beam.sdk.values.PValues) Collectors(java.util.stream.Collectors) CheckEnabled(org.apache.beam.sdk.options.PipelineOptions.CheckEnabled) TransformHierarchy(org.apache.beam.sdk.runners.TransformHierarchy) TypeSafeMatcher(org.hamcrest.TypeSafeMatcher) FileNotFoundException(java.io.FileNotFoundException) 
List(java.util.List) ParDo(org.apache.beam.sdk.transforms.ParDo) Matchers.containsInAnyOrder(org.hamcrest.Matchers.containsInAnyOrder) Matchers.equalTo(org.hamcrest.Matchers.equalTo) ImmutableList(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList) Pattern(java.util.regex.Pattern) WindowingStrategy(org.apache.beam.sdk.values.WindowingStrategy) DataflowPackage(com.google.api.services.dataflow.model.DataflowPackage) PowerMockito.mockStatic(org.powermock.api.mockito.PowerMockito.mockStatic) Assert.assertThrows(org.junit.Assert.assertThrows) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) HashMap(java.util.HashMap) PipelineTranslation(org.apache.beam.runners.core.construction.PipelineTranslation) PipelineOptionsFactory(org.apache.beam.sdk.options.PipelineOptionsFactory) SerializableFunctions(org.apache.beam.sdk.transforms.SerializableFunctions) DataflowRunner.getContainerImageForJob(org.apache.beam.runners.dataflow.DataflowRunner.getContainerImageForJob) PTransform(org.apache.beam.sdk.transforms.PTransform) Matchers.matchesRegex(org.hamcrest.Matchers.matchesRegex) ArgumentCaptor(org.mockito.ArgumentCaptor) MapState(org.apache.beam.sdk.state.MapState) JsonSerialize(com.fasterxml.jackson.databind.annotation.JsonSerialize) PrepareForTest(org.powermock.core.classloader.annotations.PrepareForTest) SerializerProvider(com.fasterxml.jackson.databind.SerializerProvider) PowerMockRunner(org.powermock.modules.junit4.PowerMockRunner) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) ExpectedException(org.junit.rules.ExpectedException) Nullable(org.checkerframework.checker.nullness.qual.Nullable) GroupIntoBatches(org.apache.beam.sdk.transforms.GroupIntoBatches) Matchers.hasEntry(org.hamcrest.Matchers.hasEntry) GcsPath(org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath) Description(org.hamcrest.Description) SdkComponents(org.apache.beam.runners.core.construction.SdkComponents) JsonParser(com.fasterxml.jackson.core.JsonParser) 
Assert.assertNotNull(org.junit.Assert.assertNotNull) Matchers.anyListOf(org.mockito.Matchers.anyListOf) ObjectMapper(com.fasterxml.jackson.databind.ObjectMapper) Matchers(org.hamcrest.Matchers) JsonProcessingException(com.fasterxml.jackson.core.JsonProcessingException) Mockito.when(org.mockito.Mockito.when) JUnit4(org.junit.runners.JUnit4) PCollection(org.apache.beam.sdk.values.PCollection) TestCredential(org.apache.beam.sdk.extensions.gcp.auth.TestCredential) Mockito(org.mockito.Mockito) StateSpecs(org.apache.beam.sdk.state.StateSpecs) Rule(org.junit.Rule) Instant(org.joda.time.Instant) StaticValueProvider(org.apache.beam.sdk.options.ValueProvider.StaticValueProvider) FileChannel(java.nio.channels.FileChannel) Collections(java.util.Collections) TemporaryFolder(org.junit.rules.TemporaryFolder) PropertyNames(org.apache.beam.runners.dataflow.util.PropertyNames)

Example 15 with SdkComponents

Use of org.apache.beam.runners.core.construction.SdkComponents in project beam by apache.

Class DataflowPipelineTranslatorTest, method testMultiGraphPipelineSerialization.

@Test
public void testMultiGraphPipelineSerialization() throws Exception {
    DataflowPipelineOptions options = buildPipelineOptions();
    Pipeline p = Pipeline.create(options);
    PCollection<Integer> input = p.begin().apply(Create.of(1, 2, 3));
    input.apply(new UnrelatedOutputCreator());
    input.apply(new UnboundOutputCreator());
    DataflowPipelineTranslator t =
        DataflowPipelineTranslator.fromOptions(
            PipelineOptionsFactory.as(DataflowPipelineOptions.class));
    // Check that translation doesn't fail.
    SdkComponents sdkComponents = createSdkComponents(options);
    RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p, sdkComponents, true);
    JobSpecification jobSpecification =
        t.translate(
            p, pipelineProto, sdkComponents, DataflowRunner.fromOptions(options), Collections.emptyList());
    assertAllStepOutputsHaveUniqueIds(jobSpecification.getJob());
}
Also used : RunnerApi(org.apache.beam.model.pipeline.v1.RunnerApi) DataflowPipelineOptions(org.apache.beam.runners.dataflow.options.DataflowPipelineOptions) JobSpecification(org.apache.beam.runners.dataflow.DataflowPipelineTranslator.JobSpecification) SdkComponents(org.apache.beam.runners.core.construction.SdkComponents) Pipeline(org.apache.beam.sdk.Pipeline) Test(org.junit.Test)

Aggregations

SdkComponents (org.apache.beam.runners.core.construction.SdkComponents) 61
RunnerApi (org.apache.beam.model.pipeline.v1.RunnerApi) 48
Test (org.junit.Test) 46
Pipeline (org.apache.beam.sdk.Pipeline) 37
DataflowPipelineOptions (org.apache.beam.runners.dataflow.options.DataflowPipelineOptions) 36
Job (com.google.api.services.dataflow.model.Job) 25
ByteString (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString) 25
Structs.getString (org.apache.beam.runners.dataflow.util.Structs.getString) 21
KV (org.apache.beam.sdk.values.KV) 14
Map (java.util.Map) 12
Step (com.google.api.services.dataflow.model.Step) 11
ArrayList (java.util.ArrayList) 11
List (java.util.List) 9
CloudObject (org.apache.beam.runners.dataflow.util.CloudObject) 9
HashMap (java.util.HashMap) 8
ImmutableList (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList) 8
WindowedValue (org.apache.beam.sdk.util.WindowedValue) 7
ImmutableMap (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) 7
InstructionOutput (com.google.api.services.dataflow.model.InstructionOutput) 6
ParDoInstruction (com.google.api.services.dataflow.model.ParDoInstruction) 6