Example 1 with StagedFile

Use of org.apache.beam.runners.dataflow.util.PackageUtil.StagedFile in the Apache Beam project.

Class DataflowRunner, method stageArtifacts.

protected List<DataflowPackage> stageArtifacts(RunnerApi.Pipeline pipeline) {
    ImmutableList.Builder<StagedFile> filesToStageBuilder = ImmutableList.builder();
    Set<String> stagedNames = new HashSet<>();
    for (Map.Entry<String, RunnerApi.Environment> entry : pipeline.getComponents().getEnvironmentsMap().entrySet()) {
        for (RunnerApi.ArtifactInformation info : entry.getValue().getDependenciesList()) {
            if (!BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) {
                throw new RuntimeException(String.format("unsupported artifact type %s", info.getTypeUrn()));
            }
            RunnerApi.ArtifactFilePayload filePayload;
            try {
                filePayload = RunnerApi.ArtifactFilePayload.parseFrom(info.getTypePayload());
            } catch (InvalidProtocolBufferException e) {
                throw new RuntimeException("Error parsing artifact file payload.", e);
            }
            String stagedName;
            if (BeamUrns.getUrn(RunnerApi.StandardArtifacts.Roles.STAGING_TO).equals(info.getRoleUrn())) {
                try {
                    RunnerApi.ArtifactStagingToRolePayload stagingPayload = RunnerApi.ArtifactStagingToRolePayload.parseFrom(info.getRolePayload());
                    stagedName = stagingPayload.getStagedName();
                } catch (InvalidProtocolBufferException e) {
                    throw new RuntimeException("Error parsing artifact staging_to role payload.", e);
                }
            } else {
                try {
                    File source = new File(filePayload.getPath());
                    HashCode hashCode = Files.asByteSource(source).hash(Hashing.sha256());
                    stagedName = Environments.createStagingFileName(source, hashCode);
                } catch (IOException e) {
                    throw new RuntimeException(String.format("Error creating staged name for artifact %s", filePayload.getPath()), e);
                }
            }
            if (stagedNames.contains(stagedName)) {
                continue;
            } else {
                stagedNames.add(stagedName);
            }
            filesToStageBuilder.add(StagedFile.of(filePayload.getPath(), filePayload.getSha256(), stagedName));
        }
    }
    return options.getStager().stageFiles(filesToStageBuilder.build());
}
Also used: StagedFile (org.apache.beam.runners.dataflow.util.PackageUtil.StagedFile), ImmutableList (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList), InvalidProtocolBufferException (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.InvalidProtocolBufferException), StringUtils.byteArrayToJsonString (org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString), IOException (java.io.IOException), RunnerApi (org.apache.beam.model.pipeline.v1.RunnerApi), HashCode (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.HashCode), ImmutableMap (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap), Map (java.util.Map), File (java.io.File), HashSet (java.util.HashSet)
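
The two ideas at the end of stageArtifacts, deriving a staged name from a content hash and skipping names that were already seen, can be illustrated with the JDK alone. The following is a minimal sketch, not Beam code: the class StagedNameSketch, its methods, and the "<base>-<hash><ext>" naming scheme are assumptions made for illustration, and the actual format produced by Environments.createStagingFileName may differ.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Apache Beam. */
final class StagedNameSketch {

    /**
     * Builds a name of the form "<base>-<hash><ext>".
     * This scheme is an assumption, not Beam's documented format.
     */
    static String stagedName(String fileName, String contentHash) {
        int dot = fileName.lastIndexOf('.');
        // Treat a leading dot (e.g. ".bashrc") as part of the base name.
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        String ext = dot > 0 ? fileName.substring(dot) : "";
        return base + "-" + contentHash + ext;
    }

    /** Keeps the first occurrence of each name and preserves input order. */
    static List<String> dedupe(List<String> names) {
        Set<String> seen = new HashSet<>();
        List<String> unique = new ArrayList<>();
        for (String name : names) {
            // Set.add returns false when the name was already present.
            if (seen.add(name)) {
                unique.add(name);
            }
        }
        return unique;
    }
}
```

Because the name embeds the content hash, two artifacts with identical content and file name map to the same staged name in this sketch, which is what makes the name-based skip in the loop above safe for duplicates.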

Example 2 with StagedFile

Use of org.apache.beam.runners.dataflow.util.PackageUtil.StagedFile in the Apache Beam project.

Class PackageUtilTest, method makeStagedFile.

private static StagedFile makeStagedFile(String source, String destName) throws IOException {
    File file = new File(source);
    File sourceFile;
    HashCode hashCode;
    if (file.exists()) {
        sourceFile = file.isDirectory() ? zipDirectory(file) : file;
        hashCode = Files.asByteSource(sourceFile).hash(Hashing.sha256());
    } else {
        sourceFile = file;
        hashCode = Hashing.sha256().hashBytes(new byte[] {});
    }
    String destination = destName == null ? Environments.createStagingFileName(file, hashCode) : destName;
    return StagedFile.of(sourceFile.getPath(), hashCode.toString(), destination);
}
Also used: HashCode (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.HashCode), StagedFile (org.apache.beam.runners.dataflow.util.PackageUtil.StagedFile), File (java.io.File)
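
makeStagedFile hashes the file contents with SHA-256 and falls back to the hash of an empty byte array when the source does not exist. A rough JDK-only equivalent might look like the sketch below. FileHashSketch and its method names are hypothetical, it uses java.security.MessageDigest rather than the vendored Guava Hashing API, and unlike the test helper it does not zip directories: anything that is not a regular file is treated as missing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for illustration only; not part of Apache Beam. */
final class FileHashSketch {

    /** Returns the lowercase hex SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Hashes the file at the given path, or an empty byte array when the
     * path is not a regular file (mirrors the fallback in the example above).
     */
    static String sha256HexOrEmpty(Path path) {
        try {
            byte[] content = Files.isRegularFile(path)
                    ? Files.readAllBytes(path) // reads the whole file into memory
                    : new byte[0];
            return sha256Hex(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading the whole file into memory keeps the sketch short. For large artifacts a streaming digest, for example via java.security.DigestInputStream, would be the more careful choice.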

Example 3 with StagedFile

Use of org.apache.beam.runners.dataflow.util.PackageUtil.StagedFile in the Apache Beam project.

Class PackageUtilTest, method makePackageAttributes.

private static PackageAttributes makePackageAttributes(File file, @Nullable String overridePackageName) throws IOException {
    StagedFile stagedFile = makeStagedFile(file.getPath());
    PackageAttributes attributes = PackageUtil.PackageAttributes.forFileToStage(stagedFile.getSource(), stagedFile.getSha256(), stagedFile.getDestination(), STAGING_PATH);
    if (overridePackageName != null) {
        attributes = attributes.withPackageName(overridePackageName);
    }
    return attributes;
}
Also used: StagedFile (org.apache.beam.runners.dataflow.util.PackageUtil.StagedFile), PackageAttributes (org.apache.beam.runners.dataflow.util.PackageUtil.PackageAttributes)

Example 4 with StagedFile

Use of org.apache.beam.runners.dataflow.util.PackageUtil.StagedFile in the Apache Beam project.

Class PackageUtilTest, method testPackageUploadWithLargeClasspathLogsWarning.

@Test
public void testPackageUploadWithLargeClasspathLogsWarning() throws Exception {
    File tmpFile = makeFileWithContents("file.txt", "This is a test!");
    when(mockGcsUtil.getObjects(anyListOf(GcsPath.class))).thenReturn(ImmutableList.of(StorageObjectOrIOException.create(createStorageObject(STAGING_PATH, tmpFile.length()))));
    List<StagedFile> classpathElements = Lists.newLinkedList();
    for (int i = 0; i < 1005; ++i) {
        String eltName = "element" + i;
        classpathElements.add(makeStagedFile(tmpFile.getAbsolutePath(), eltName));
    }
    defaultPackageUtil.stageClasspathElements(classpathElements, STAGING_PATH, createOptions);
    logged.verifyWarn("Your classpath contains 1005 elements, which Google Cloud Dataflow");
}
Also used: StagedFile (org.apache.beam.runners.dataflow.util.PackageUtil.StagedFile), GcsPath (org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath), File (java.io.File), Test (org.junit.Test)
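
The test above stages 1005 elements and expects a warning about the classpath size. The check being exercised is presumably a simple size comparison, which could be sketched as follows. ClasspathSizeSketch, the warningFor method, the threshold parameter, and the wording after "elements" are all assumptions for illustration. Only the message prefix comes from the test, and the real limit and full message live in PackageUtil.

```java
import java.util.Collection;
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of Apache Beam. */
final class ClasspathSizeSketch {

    /**
     * Returns a warning message when the collection holds more elements than
     * the given threshold, or an empty Optional otherwise. The threshold is a
     * caller-supplied assumption, not a value taken from Beam.
     */
    static Optional<String> warningFor(Collection<?> elements, int threshold) {
        if (elements.size() <= threshold) {
            return Optional.empty();
        }
        return Optional.of(String.format(
                "Your classpath contains %d elements, which exceeds the assumed limit of %d",
                elements.size(), threshold));
    }
}
```

Returning an Optional instead of logging directly keeps the sketch free of any logging framework. A caller would pass the message to its own logger when one is present.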

Aggregations

StagedFile (org.apache.beam.runners.dataflow.util.PackageUtil.StagedFile): 4
File (java.io.File): 3
HashCode (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.HashCode): 2
IOException (java.io.IOException): 1
HashSet (java.util.HashSet): 1
Map (java.util.Map): 1
RunnerApi (org.apache.beam.model.pipeline.v1.RunnerApi): 1
PackageAttributes (org.apache.beam.runners.dataflow.util.PackageUtil.PackageAttributes): 1
GcsPath (org.apache.beam.sdk.extensions.gcp.util.gcsfs.GcsPath): 1
StringUtils.byteArrayToJsonString (org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString): 1
InvalidProtocolBufferException (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.InvalidProtocolBufferException): 1
ImmutableList (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList): 1
ImmutableMap (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap): 1
Test (org.junit.Test): 1