
Example 6 with PortablePipelineOptions

Use of org.apache.beam.sdk.options.PortablePipelineOptions in the Apache Beam project.

From class EnvironmentsTest, method createEnvironmentProcessFromEnvironmentConfig.

@Test
public void createEnvironmentProcessFromEnvironmentConfig() throws IOException {
    PortablePipelineOptions options = PipelineOptionsFactory.as(PortablePipelineOptions.class);
    options.setDefaultEnvironmentType(Environments.ENVIRONMENT_PROCESS);
    options.setDefaultEnvironmentConfig("{\"os\": \"linux\", \"arch\": \"amd64\", \"command\": \"run.sh\", \"env\":{\"k1\": \"v1\", \"k2\": \"v2\"} }");
    assertThat(Environments.createOrGetDefaultEnvironment(options), is(Environment.newBuilder()
        .setUrn(BeamUrns.getUrn(StandardEnvironments.Environments.PROCESS))
        .setPayload(ProcessPayload.newBuilder().setOs("linux").setArch("amd64").setCommand("run.sh")
            .putEnv("k1", "v1").putEnv("k2", "v2").build().toByteString())
        .addAllCapabilities(Environments.getJavaCapabilities()).build()));
    options.setDefaultEnvironmentType(Environments.ENVIRONMENT_PROCESS);
    options.setDefaultEnvironmentConfig("{\"command\": \"run.sh\"}");
    assertThat(Environments.createOrGetDefaultEnvironment(options), is(Environment.newBuilder()
        .setUrn(BeamUrns.getUrn(StandardEnvironments.Environments.PROCESS))
        .setPayload(ProcessPayload.newBuilder().setCommand("run.sh").build().toByteString())
        .addAllCapabilities(Environments.getJavaCapabilities()).build()));
}
Also used : PortablePipelineOptions(org.apache.beam.sdk.options.PortablePipelineOptions) Test(org.junit.Test)
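The test passes the PROCESS environment config as a hand-escaped JSON string with the keys os, arch, command and env. A minimal sketch of a hypothetical helper (ProcessEnvironmentConfig.toJson, not a Beam API) that assembles such a string from its parts; it assumes values need only quote and backslash escaping, and it omits os, arch and env when they are absent, matching the second, command-only config in the test.

```java
import java.util.Map;

/**
 * Hypothetical helper (not part of Beam): builds the JSON string passed to
 * setDefaultEnvironmentConfig for a PROCESS environment.
 */
final class ProcessEnvironmentConfig {
  private ProcessEnvironmentConfig() {}

  // Minimal JSON string escaping: backslashes and double quotes only.
  // Assumes values contain no control characters.
  private static String quote(String value) {
    return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  /** Builds {"os": ..., "arch": ..., "command": ..., "env": {...}}; null os/arch and empty env are omitted. */
  static String toJson(String os, String arch, String command, Map<String, String> env) {
    StringBuilder json = new StringBuilder("{");
    if (os != null) {
      json.append(quote("os")).append(": ").append(quote(os)).append(", ");
    }
    if (arch != null) {
      json.append(quote("arch")).append(": ").append(quote(arch)).append(", ");
    }
    json.append(quote("command")).append(": ").append(quote(command));
    if (!env.isEmpty()) {
      json.append(", ").append(quote("env")).append(": {");
      String separator = "";
      // Iteration order follows the map, so pass a LinkedHashMap for a stable order.
      for (Map.Entry<String, String> entry : env.entrySet()) {
        json.append(separator).append(quote(entry.getKey())).append(": ").append(quote(entry.getValue()));
        separator = ", ";
      }
      json.append("}");
    }
    return json.append("}").toString();
  }
}
```

For example, `ProcessEnvironmentConfig.toJson(null, null, "run.sh", Map.of())` yields `{"command": "run.sh"}`, the same string the test passes in its second step.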

Example 7 with PortablePipelineOptions

Use of org.apache.beam.sdk.options.PortablePipelineOptions in the Apache Beam project.

From class TestUniversalRunner, method run.

@Override
public PipelineResult run(Pipeline pipeline) {
    Options testOptions = options.as(Options.class);
    if (testOptions.getLocalJobServicePortFile() != null) {
        String localServicePortFilePath = testOptions.getLocalJobServicePortFile();
        try {
            testOptions.setJobEndpoint("localhost:"
                + new String(Files.readAllBytes(Paths.get(localServicePortFilePath)), Charsets.UTF_8).trim());
        } catch (IOException e) {
            throw new RuntimeException(String.format("Error reading local job service port file %s", localServicePortFilePath), e);
        }
    }
    PortablePipelineOptions portableOptions = options.as(PortablePipelineOptions.class);
    portableOptions.setRunner(PortableRunner.class);
    PortableRunner runner = PortableRunner.fromOptions(portableOptions);
    PipelineResult result = runner.run(pipeline);
    assertThat("Pipeline did not succeed.", result.waitUntilFinish(), Matchers.is(PipelineResult.State.DONE));
    return result;
}
Also used : TestPipelineOptions(org.apache.beam.sdk.testing.TestPipelineOptions) PortablePipelineOptions(org.apache.beam.sdk.options.PortablePipelineOptions) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) PipelineResult(org.apache.beam.sdk.PipelineResult) IOException(java.io.IOException) PortableRunner(org.apache.beam.runners.portability.PortableRunner)
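The runner turns the contents of a local port file into a `localhost:<port>` job endpoint and wraps any read failure in a RuntimeException that names the file. A minimal stdlib-only sketch of that step, assuming the file holds just the port number (optionally followed by whitespace); the class and method names are hypothetical, and it uses java.nio's StandardCharsets where the original uses Guava's Charsets.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper mirroring the port-file handling in TestUniversalRunner.run. */
final class JobEndpointFromPortFile {
  private JobEndpointFromPortFile() {}

  static String readJobEndpoint(Path portFile) {
    try {
      // Assumes the file contains only the port; trim() drops a trailing newline.
      String port = new String(Files.readAllBytes(portFile), StandardCharsets.UTF_8).trim();
      return "localhost:" + port;
    } catch (IOException e) {
      // Same wrapping as the original: report which file could not be read.
      throw new RuntimeException(
          String.format("Error reading local job service port file %s", portFile), e);
    }
  }
}
```

Like the original, the sketch does not check that the trimmed content is numeric; a malformed file only surfaces later, when the endpoint is used.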

Example 8 with PortablePipelineOptions

Use of org.apache.beam.sdk.options.PortablePipelineOptions in the Apache Beam project.

From class PortablePipelineJarCreator, method run.

/**
 * <em>Does not actually run the pipeline.</em> Instead bundles the input pipeline along with all
 * dependencies, artifacts, etc. required to run the pipeline into a jar that can be executed
 * later.
 */
@Override
public PortablePipelineResult run(Pipeline pipeline, JobInfo jobInfo) throws Exception {
    PortablePipelineOptions pipelineOptions = PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions())
        .as(PortablePipelineOptions.class);
    final String jobName = jobInfo.jobName();
    File outputFile = new File(checkArgumentNotNull(pipelineOptions.getOutputExecutablePath()));
    LOG.info("Creating jar {} for job {}", outputFile.getAbsolutePath(), jobName);
    outputStream = new JarOutputStream(
        new FileOutputStream(outputFile), createManifest(mainClass, jobName));
    outputChannel = Channels.newChannel(outputStream);
    PortablePipelineJarUtils.writeDefaultJobName(outputStream, jobName);
    copyResourcesFromJar(
        new JarFile(mainClass.getProtectionDomain().getCodeSource().getLocation().getPath()));
    writeAsJson(PipelineOptionsTranslation.toProto(pipelineOptions), PortablePipelineJarUtils.getPipelineOptionsUri(jobName));
    Pipeline pipelineWithClasspathArtifacts = writeArtifacts(pipeline, jobName);
    writeAsJson(pipelineWithClasspathArtifacts, PortablePipelineJarUtils.getPipelineUri(jobName));
    // Closing the channel also closes the underlying stream.
    outputChannel.close();
    LOG.info("Jar {} created successfully.", outputFile.getAbsolutePath());
    return new JarCreatorPipelineResult();
}
Also used : PortablePipelineOptions(org.apache.beam.sdk.options.PortablePipelineOptions) FileOutputStream(java.io.FileOutputStream) JarOutputStream(java.util.jar.JarOutputStream) JarFile(java.util.jar.JarFile) File(java.io.File) Pipeline(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline)
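The jar creator opens a JarOutputStream with a manifest, writes entries through a WritableByteChannel wrapped around it, and relies on closing the channel to close the stream. A minimal stdlib-only sketch of that pattern, assuming a manifest that only sets Main-Class; the class name JarSketch, the entry name and the main class used below are hypothetical and do not reflect the entry layout that Beam's PortablePipelineJarUtils defines.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/** Hypothetical, stripped-down version of the jar-writing steps in PortablePipelineJarCreator.run. */
final class JarSketch {
  private JarSketch() {}

  // Manifest naming the class that `java -jar` should launch.
  static Manifest manifestFor(String mainClassName) {
    Manifest manifest = new Manifest();
    Attributes attributes = manifest.getMainAttributes();
    attributes.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    attributes.put(Attributes.Name.MAIN_CLASS, mainClassName);
    return manifest;
  }

  static void writeJar(File outputFile, String mainClassName, String entryName, String contents) {
    // Closing the channel closes the JarOutputStream; the stream's own close() afterwards is a no-op.
    try (JarOutputStream outputStream =
            new JarOutputStream(new FileOutputStream(outputFile), manifestFor(mainClassName));
        WritableByteChannel outputChannel = Channels.newChannel(outputStream)) {
      outputStream.putNextEntry(new JarEntry(entryName));
      ByteBuffer buffer = ByteBuffer.wrap(contents.getBytes(StandardCharsets.UTF_8));
      while (buffer.hasRemaining()) {
        outputChannel.write(buffer); // bytes land in the current jar entry
      }
      outputStream.closeEntry();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static String readMainClass(File jar) {
    try (JarFile jarFile = new JarFile(jar)) {
      return jarFile.getManifest().getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static String readEntry(File jar, String entryName) {
    try (JarFile jarFile = new JarFile(jar);
        InputStream in = jarFile.getInputStream(jarFile.getJarEntry(entryName))) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Using try-with-resources here also closes the stream when an entry write fails, a case the original method leaves to its caller.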

Example 9 with PortablePipelineOptions

Use of org.apache.beam.sdk.options.PortablePipelineOptions in the Apache Flink project.

From class BeamPythonFunctionRunner, method open.

// ------------------------------------------------------------------------
@Override
public void open(PythonConfig config) throws Exception {
    this.bundleStarted = false;
    this.resultBuffer = new LinkedBlockingQueue<>();
    this.reusableResultTuple = new Tuple2<>();
    // The creation of stageBundleFactory depends on the initialized environment manager.
    environmentManager.open();
    PortablePipelineOptions portableOptions = PipelineOptionsFactory.as(PortablePipelineOptions.class);
    if (jobOptions.containsKey(PythonOptions.STATE_CACHE_SIZE.key())) {
        portableOptions.as(ExperimentalOptions.class).setExperiments(Collections.singletonList(
            ExperimentalOptions.STATE_CACHE_SIZE + "=" + jobOptions.get(PythonOptions.STATE_CACHE_SIZE.key())));
    }
    Struct pipelineOptions = PipelineOptionsTranslation.toProto(portableOptions);
    if (memoryManager != null && config.isUsingManagedMemory()) {
        Preconditions.checkArgument(
            managedMemoryFraction > 0 && managedMemoryFraction <= 1.0,
            "The configured managed memory fraction for Python worker process must be within (0, 1], was: %s. "
                + "It may be because the consumer type \"Python\" was missing or set to 0 for the config option \"taskmanager.memory.managed.consumer-weights\".",
            // Passed as a template argument so it fills the %s placeholder instead of being appended to the message.
            managedMemoryFraction);
        final LongFunctionWithException<PythonSharedResources, Exception> initializer = (size) ->
            new PythonSharedResources(createJobBundleFactory(pipelineOptions), createPythonExecutionEnvironment(size));
        sharedResources = memoryManager.getSharedMemoryResourceForManagedMemory(MANAGED_MEMORY_RESOURCE_ID, initializer, managedMemoryFraction);
        LOG.info("Obtained shared Python process of size {} bytes", sharedResources.getSize());
        sharedResources.getResourceHandle().addPythonEnvironmentManager(environmentManager);
        JobBundleFactory jobBundleFactory = sharedResources.getResourceHandle().getJobBundleFactory();
        RunnerApi.Environment environment = sharedResources.getResourceHandle().getEnvironment();
        stageBundleFactory = createStageBundleFactory(jobBundleFactory, environment);
    } else {
        // There is no way to access the MemoryManager for batch jobs of the old planner,
        // so fall back to spawning a Python process for each Python operator.
        jobBundleFactory = createJobBundleFactory(pipelineOptions);
        stageBundleFactory = createStageBundleFactory(jobBundleFactory, createPythonExecutionEnvironment(-1));
    }
    progressHandler = getProgressHandler(flinkMetricContainer);
}
Also used : PythonOptions(org.apache.flink.python.PythonOptions) OpaqueMemoryResource(org.apache.flink.runtime.memory.OpaqueMemoryResource) Arrays(java.util.Arrays) WindowedValue(org.apache.beam.sdk.util.WindowedValue) Tuple2(org.apache.flink.api.java.tuple.Tuple2) PortablePipelineOptions(org.apache.beam.sdk.options.PortablePipelineOptions) LoggerFactory(org.slf4j.LoggerFactory) TimerInternals(org.apache.beam.runners.core.TimerInternals) UserStateReference(org.apache.beam.runners.core.construction.graph.UserStateReference) PythonFunctionRunner(org.apache.flink.python.PythonFunctionRunner) WINDOW_CODER_ID(org.apache.flink.python.Constants.WINDOW_CODER_ID) SideInputReference(org.apache.beam.runners.core.construction.graph.SideInputReference) JobBundleFactory(org.apache.beam.runners.fnexecution.control.JobBundleFactory) Iterables(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables) Map(java.util.Map) TimerReference(org.apache.beam.runners.core.construction.graph.TimerReference) GlobalWindow(org.apache.beam.sdk.transforms.windowing.GlobalWindow) FlinkFnApi(org.apache.flink.fnexecution.v1.FlinkFnApi) JobInfo(org.apache.beam.runners.fnexecution.provisioning.JobInfo) TimerReceiverFactory(org.apache.beam.runners.fnexecution.control.TimerReceiverFactory) TimerRegistration(org.apache.flink.streaming.api.operators.python.timer.TimerRegistration) INPUT_COLLECTION_ID(org.apache.flink.python.Constants.INPUT_COLLECTION_ID) TypeSerializer(org.apache.flink.api.common.typeutils.TypeSerializer) StageBundleFactory(org.apache.beam.runners.fnexecution.control.StageBundleFactory) PythonEnvironment(org.apache.flink.python.env.PythonEnvironment) FnDataReceiver(org.apache.beam.sdk.fn.data.FnDataReceiver) Collection(java.util.Collection) ImmutableExecutableStage(org.apache.beam.runners.core.construction.graph.ImmutableExecutableStage) BundleProgressHandler(org.apache.beam.runners.fnexecution.control.BundleProgressHandler) FlinkMetricContainer(org.apache.flink.python.metric.FlinkMetricContainer) BeamFnApi(org.apache.beam.model.fnexecution.v1.BeamFnApi) ExecutableStage(org.apache.beam.runners.core.construction.graph.ExecutableStage) Preconditions(org.apache.flink.util.Preconditions) LinkedBlockingQueue(java.util.concurrent.LinkedBlockingQueue) Collectors(java.util.stream.Collectors) ModelCoders(org.apache.beam.runners.core.construction.ModelCoders) LongFunctionWithException(org.apache.flink.util.function.LongFunctionWithException) List(java.util.List) WINDOW_STRATEGY(org.apache.flink.python.Constants.WINDOW_STRATEGY) Optional(java.util.Optional) OUTPUT_COLLECTION_ID(org.apache.flink.python.Constants.OUTPUT_COLLECTION_ID) MemoryManager(org.apache.flink.runtime.memory.MemoryManager) ByteArrayOutputStream(java.io.ByteArrayOutputStream) ExperimentalOptions(org.apache.beam.sdk.options.ExperimentalOptions) Coder(org.apache.beam.sdk.coders.Coder) ProcessPythonEnvironmentManager(org.apache.flink.python.env.process.ProcessPythonEnvironmentManager) PipelineOptionsTranslation(org.apache.beam.runners.core.construction.PipelineOptionsTranslation) PipelineOptionsFactory(org.apache.beam.sdk.options.PipelineOptionsFactory) Environments(org.apache.beam.runners.core.construction.Environments) WRAPPER_TIMER_CODER_ID(org.apache.flink.python.Constants.WRAPPER_TIMER_CODER_ID) RemoteBundle(org.apache.beam.runners.fnexecution.control.RemoteBundle) BiConsumer(java.util.function.BiConsumer) DefaultJobBundleFactory(org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory) StateRequestHandler(org.apache.beam.runners.fnexecution.state.StateRequestHandler) Nullable(javax.annotation.Nullable) RunnerApi(org.apache.beam.model.pipeline.v1.RunnerApi) Logger(org.slf4j.Logger) ProtoUtils.createCoderProto(org.apache.flink.streaming.api.utils.ProtoUtils.createCoderProto) OutputReceiverFactory(org.apache.beam.runners.fnexecution.control.OutputReceiverFactory) ProcessPythonEnvironment(org.apache.flink.python.env.process.ProcessPythonEnvironment) IOException(java.io.IOException) KeyedStateBackend(org.apache.flink.runtime.state.KeyedStateBackend) VisibleForTesting(org.apache.flink.annotation.VisibleForTesting) Timer(org.apache.beam.runners.core.construction.Timer) ByteArrayCoder(org.apache.beam.sdk.coders.ByteArrayCoder) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) PipelineNode(org.apache.beam.runners.core.construction.graph.PipelineNode) TIMER_CODER_ID(org.apache.flink.python.Constants.TIMER_CODER_ID) Internal(org.apache.flink.annotation.Internal) Struct(org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.Struct) PythonConfig(org.apache.flink.python.PythonConfig) Collections(java.util.Collections) BeamUrns.getUrn(org.apache.beam.runners.core.construction.BeamUrns.getUrn)
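Two steps in `open` are easy to isolate: turning a configured state cache size into a single `<name>=<value>` experiment string, and rejecting a managed memory fraction outside (0, 1] with the offending value formatted into the message. A minimal stdlib-only sketch; the class name is hypothetical, the `"state_cache_size"` constant is an assumption about the value of `ExperimentalOptions.STATE_CACHE_SIZE`, and the Flink option key is passed in by the caller rather than hard-coded.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helpers mirroring two steps of BeamPythonFunctionRunner.open. */
final class PythonRunnerConfigSketch {
  private PythonRunnerConfigSketch() {}

  // Assumed to equal ExperimentalOptions.STATE_CACHE_SIZE in Beam.
  static final String STATE_CACHE_SIZE = "state_cache_size";

  /** Returns a one-element experiments list when the job sets a cache size, else an empty list. */
  static List<String> experimentsFor(Map<String, String> jobOptions, String stateCacheSizeKey) {
    String size = jobOptions.get(stateCacheSizeKey);
    if (size == null) {
      return Collections.emptyList();
    }
    return Collections.singletonList(STATE_CACHE_SIZE + "=" + size);
  }

  /** Rejects fractions outside (0, 1]; the negated form also rejects NaN. */
  static void checkManagedMemoryFraction(double fraction) {
    if (!(fraction > 0 && fraction <= 1.0)) {
      throw new IllegalArgumentException(String.format(
          "The configured managed memory fraction for Python worker process must be within (0, 1], was: %s.",
          fraction));
    }
  }
}
```

Passing the fraction as a format argument, rather than concatenating it after the message, is what makes the `%s` placeholder show the actual value.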

Example 10 with PortablePipelineOptions

Use of org.apache.beam.sdk.options.PortablePipelineOptions in the Apache Beam project.

From class SdkComponents, method create.

public static SdkComponents create(PipelineOptions options) {
    SdkComponents sdkComponents = new SdkComponents(RunnerApi.Components.getDefaultInstance(), null, "");
    PortablePipelineOptions portablePipelineOptions = options.as(PortablePipelineOptions.class);
    sdkComponents.registerEnvironment(Environments.createOrGetDefaultEnvironment(portablePipelineOptions));
    return sdkComponents;
}
Also used : PortablePipelineOptions(org.apache.beam.sdk.options.PortablePipelineOptions)
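`SdkComponents.create` starts from an empty component set and registers the default environment derived from the options. The sketch below illustrates the general registration idea of mapping each distinct environment to an id and returning the same id when an equal environment is registered again. It is a simplified, hypothetical stand-in: the Env value type, the `env_<n>` id scheme and the deduplication behavior are assumptions for illustration, not Beam's implementation.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical, simplified environment registry in the spirit of SdkComponents. */
final class EnvironmentRegistrySketch {
  /** Hypothetical value type standing in for RunnerApi.Environment (URN plus payload). */
  static final class Env {
    final String urn;
    final String payload;

    Env(String urn, String payload) {
      this.urn = urn;
      this.payload = payload;
    }

    @Override
    public boolean equals(Object other) {
      if (!(other instanceof Env)) {
        return false;
      }
      Env that = (Env) other;
      return urn.equals(that.urn) && payload.equals(that.payload);
    }

    @Override
    public int hashCode() {
      return Objects.hash(urn, payload);
    }
  }

  private final Map<Env, String> idsByEnvironment = new HashMap<>();
  private final Map<String, Env> environmentsById = new LinkedHashMap<>();

  /** Returns the existing id for an equal environment, or assigns a new one. */
  String registerEnvironment(Env env) {
    String existing = idsByEnvironment.get(env);
    if (existing != null) {
      return existing;
    }
    String id = "env_" + environmentsById.size(); // hypothetical id scheme
    idsByEnvironment.put(env, id);
    environmentsById.put(id, env);
    return id;
  }

  int size() {
    return environmentsById.size();
  }
}
```

Keying the lookup on value equality means that registering the same default environment twice does not add a second component entry.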

Aggregations

PortablePipelineOptions (org.apache.beam.sdk.options.PortablePipelineOptions): 21
Test (org.junit.Test): 10
Struct (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.Struct): 4
IOException (java.io.IOException): 3
StateRequestHandler (org.apache.beam.runners.fnexecution.state.StateRequestHandler): 3
ExperimentalOptions (org.apache.beam.sdk.options.ExperimentalOptions): 3
ByteString (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString): 3
Optional (java.util.Optional): 2
Environment (org.apache.beam.model.pipeline.v1.RunnerApi.Environment): 2
EnvironmentFactory (org.apache.beam.runners.fnexecution.environment.EnvironmentFactory): 2
Provider (org.apache.beam.runners.fnexecution.environment.EnvironmentFactory.Provider): 2
RemoteEnvironment (org.apache.beam.runners.fnexecution.environment.RemoteEnvironment): 2
ServerFactory (org.apache.beam.sdk.fn.server.ServerFactory): 2
Matchers.containsString (org.hamcrest.Matchers.containsString): 2
ByteArrayOutputStream (java.io.ByteArrayOutputStream): 1
File (java.io.File): 1
FileOutputStream (java.io.FileOutputStream): 1
Arrays (java.util.Arrays): 1
Collection (java.util.Collection): 1
Collections (java.util.Collections): 1