
Example 1 with HashCode

Use of org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.HashCode in project beam by apache.

From the class BigqueryMatcher, the method generateHash:

private String generateHash(@Nonnull List<TableRow> rows) {
    List<HashCode> rowHashes = Lists.newArrayList();
    for (TableRow row : rows) {
        List<String> cellsInOneRow = Lists.newArrayList();
        for (TableCell cell : row.getF()) {
            cellsInOneRow.add(Objects.toString(cell.getV()));
            // Keep the cells sorted so the row hash does not depend on column order.
            Collections.sort(cellsInOneRow);
        }
        // Hash each row on its own ...
        rowHashes.add(Hashing.sha1().hashString(cellsInOneRow.toString(), StandardCharsets.UTF_8));
    }
    // ... then combine the row hashes so the result does not depend on row order.
    return Hashing.combineUnordered(rowHashes).toString();
}
Also used : HashCode(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.HashCode) TableCell(com.google.api.services.bigquery.model.TableCell) TableRow(com.google.api.services.bigquery.model.TableRow)
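
The method above hashes each row separately and then merges the row hashes with Hashing.combineUnordered, so the final digest does not depend on the order of the rows. Below is a minimal, self-contained sketch of that property using plain Guava (com.google.common.hash) rather than Beam's vendored copy; the class and variable names are illustrative only.

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;

public class UnorderedHashDemo {

    // Hash each element on its own, then combine without regard to order.
    static HashCode hashUnordered(List<String> rows) {
        List<HashCode> hashes = new ArrayList<>();
        for (String row : rows) {
            hashes.add(Hashing.sha1().hashString(row, StandardCharsets.UTF_8));
        }
        return Hashing.combineUnordered(hashes);
    }

    public static void main(String[] args) {
        HashCode a = hashUnordered(Arrays.asList("r1", "r2", "r3"));
        HashCode b = hashUnordered(Arrays.asList("r3", "r1", "r2"));
        // Same rows in a different order produce the same combined hash.
        System.out.println(a.equals(b)); // true
    }
}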

Example 2 with HashCode

Use of org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.HashCode in project beam by apache.

From the class HashingFn, the method addInput:

@Override
public Accum addInput(Accum accum, String input) {
    // Gather the running hash (if any) and the hash of the new element.
    List<HashCode> elementHashes = Lists.newArrayList();
    if (accum.hashCode != null) {
        elementHashes.add(accum.hashCode);
    }
    HashCode inputHashCode = Hashing.murmur3_128().hashString(input, StandardCharsets.UTF_8);
    elementHashes.add(inputHashCode);
    // Combine without regard to order, so the final hash is insensitive to element order.
    accum.hashCode = Hashing.combineUnordered(elementHashes);
    return accum;
}
Also used : HashCode(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.HashCode)
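
addInput folds each element into a single running HashCode using an order-independent combine. For context, here is a hypothetical sketch of the accompanying accumulator and merge step, assuming an accumulator that holds one running HashCode; this illustrates the pattern and is not the Beam source.

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;

// Hypothetical accumulator: holds at most one running hash.
class HashAccum implements Serializable {
    HashCode hashCode; // null until the first element arrives
}

class HashMerge {

    // Merging accumulators uses the same unordered combine as addInput,
    // so the result is independent of how elements were bundled.
    static HashAccum mergeAccumulators(Iterable<HashAccum> accums) {
        List<HashCode> partials = new ArrayList<>();
        for (HashAccum accum : accums) {
            if (accum.hashCode != null) {
                partials.add(accum.hashCode);
            }
        }
        HashAccum merged = new HashAccum();
        merged.hashCode = partials.isEmpty() ? null : Hashing.combineUnordered(partials);
        return merged;
    }
}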

Example 3 with HashCode

Use of org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.HashCode in project beam by apache.

From the class Environments, the method getArtifacts:

public static List<ArtifactInformation> getArtifacts(List<String> stagingFiles) {
    ImmutableList.Builder<ArtifactInformation> artifactsBuilder = ImmutableList.builder();
    Set<String> deduplicatedStagingFiles = new LinkedHashSet<>(stagingFiles);
    for (String path : deduplicatedStagingFiles) {
        File file;
        String stagedName = null;
        // An entry of the form "stagedName=/local/path" lets the caller choose the staged file name.
        if (path.contains("=")) {
            String[] components = path.split("=", 2);
            file = new File(components[1]);
            stagedName = components[0];
        } else {
            file = new File(path);
        }
        // Spurious items get added to the classpath. Filter by just those that exist.
        if (file.exists()) {
            ArtifactInformation.Builder artifactBuilder = ArtifactInformation.newBuilder();
            artifactBuilder.setTypeUrn(BeamUrns.getUrn(StandardArtifacts.Types.FILE));
            artifactBuilder.setRoleUrn(BeamUrns.getUrn(StandardArtifacts.Roles.STAGING_TO));
            // Record the SHA-256 of the staged bytes (the zipped directory or the file itself) in the payload.
            HashCode hashCode;
            if (file.isDirectory()) {
                File zippedFile;
                try {
                    zippedFile = zipDirectory(file);
                    hashCode = Files.asByteSource(zippedFile).hash(Hashing.sha256());
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
                artifactBuilder.setTypePayload(RunnerApi.ArtifactFilePayload.newBuilder().setPath(zippedFile.getPath()).setSha256(hashCode.toString()).build().toByteString());
            } else {
                try {
                    hashCode = Files.asByteSource(file).hash(Hashing.sha256());
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
                artifactBuilder.setTypePayload(RunnerApi.ArtifactFilePayload.newBuilder().setPath(file.getPath()).setSha256(hashCode.toString()).build().toByteString());
            }
            // If no explicit name was given, derive one from the file name and its hash.
            if (stagedName == null) {
                stagedName = createStagingFileName(file, hashCode);
            }
            artifactBuilder.setRolePayload(RunnerApi.ArtifactStagingToRolePayload.newBuilder().setStagedName(stagedName).build().toByteString());
            artifactsBuilder.add(artifactBuilder.build());
        }
    }
    return artifactsBuilder.build();
}
Also used : LinkedHashSet(java.util.LinkedHashSet) HashCode(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.HashCode) ImmutableList(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList) ByteString(org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString) IOException(java.io.IOException) ArtifactInformation(org.apache.beam.model.pipeline.v1.RunnerApi.ArtifactInformation) File(java.io.File)
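
The file payload above carries the SHA-256 of the staged bytes, computed with Guava's Files.asByteSource(file).hash(Hashing.sha256()). A small sketch of just that hashing step, with a hypothetical file path and plain (non-vendored) Guava imports:

import java.io.File;
import java.io.IOException;

import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;
import com.google.common.io.Files;

public class Sha256Demo {
    public static void main(String[] args) throws IOException {
        File artifact = new File("/tmp/example.jar"); // hypothetical path
        HashCode sha256 = Files.asByteSource(artifact).hash(Hashing.sha256());
        // The lowercase hex string is what ends up in ArtifactFilePayload.setSha256(...).
        System.out.println(sha256.toString());
    }
}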

Example 4 with HashCode

Use of org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.HashCode in project beam by apache.

From the class DataflowRunner, the method stageArtifacts:

protected List<DataflowPackage> stageArtifacts(RunnerApi.Pipeline pipeline) {
    ImmutableList.Builder<StagedFile> filesToStageBuilder = ImmutableList.builder();
    Set<String> stagedNames = new HashSet<>();
    for (Map.Entry<String, RunnerApi.Environment> entry : pipeline.getComponents().getEnvironmentsMap().entrySet()) {
        for (RunnerApi.ArtifactInformation info : entry.getValue().getDependenciesList()) {
            if (!BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) {
                throw new RuntimeException(String.format("unsupported artifact type %s", info.getTypeUrn()));
            }
            RunnerApi.ArtifactFilePayload filePayload;
            try {
                filePayload = RunnerApi.ArtifactFilePayload.parseFrom(info.getTypePayload());
            } catch (InvalidProtocolBufferException e) {
                throw new RuntimeException("Error parsing artifact file payload.", e);
            }
            // Prefer the staged name carried by the STAGING_TO role payload; otherwise derive a
            // content-addressed name from the file's SHA-256.
            String stagedName;
            if (BeamUrns.getUrn(RunnerApi.StandardArtifacts.Roles.STAGING_TO).equals(info.getRoleUrn())) {
                try {
                    RunnerApi.ArtifactStagingToRolePayload stagingPayload = RunnerApi.ArtifactStagingToRolePayload.parseFrom(info.getRolePayload());
                    stagedName = stagingPayload.getStagedName();
                } catch (InvalidProtocolBufferException e) {
                    throw new RuntimeException("Error parsing artifact staging_to role payload.", e);
                }
            } else {
                try {
                    File source = new File(filePayload.getPath());
                    HashCode hashCode = Files.asByteSource(source).hash(Hashing.sha256());
                    stagedName = Environments.createStagingFileName(source, hashCode);
                } catch (IOException e) {
                    throw new RuntimeException(String.format("Error creating staged name for artifact %s", filePayload.getPath()), e);
                }
            }
            // Stage each distinct staged name only once.
            if (stagedNames.contains(stagedName)) {
                continue;
            } else {
                stagedNames.add(stagedName);
            }
            filesToStageBuilder.add(StagedFile.of(filePayload.getPath(), filePayload.getSha256(), stagedName));
        }
    }
    return options.getStager().stageFiles(filesToStageBuilder.build());
}
Also used : StagedFile(org.apache.beam.runners.dataflow.util.PackageUtil.StagedFile) ImmutableList(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList) InvalidProtocolBufferException(org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.InvalidProtocolBufferException) StringUtils.byteArrayToJsonString(org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString) IOException(java.io.IOException) RunnerApi(org.apache.beam.model.pipeline.v1.RunnerApi) HashCode(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.HashCode) ImmutableMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) Map(java.util.Map) File(java.io.File) StagedFile(org.apache.beam.runners.dataflow.util.PackageUtil.StagedFile) HashSet(java.util.HashSet)
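
When an artifact has no STAGING_TO name, stageArtifacts falls back to Environments.createStagingFileName(file, sha256). A hedged usage sketch of that fallback, with a hypothetical artifact path and assuming Beam's vendored Guava packages as in the examples above:

import java.io.File;
import java.io.IOException;

import org.apache.beam.runners.core.construction.Environments;
import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.HashCode;
import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.Hashing;
import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Files;

public class StagedNameDemo {
    public static void main(String[] args) throws IOException {
        File source = new File("/tmp/example.jar"); // hypothetical artifact
        HashCode hashCode = Files.asByteSource(source).hash(Hashing.sha256());
        // Same fallback stageArtifacts uses when no STAGING_TO role payload names the file.
        String stagedName = Environments.createStagingFileName(source, hashCode);
        System.out.println(stagedName);
    }
}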

Example 5 with HashCode

Use of org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.HashCode in project beam by apache.

From the class DataflowRunner, the method resolveArtifacts:

@VisibleForTesting
protected RunnerApi.Pipeline resolveArtifacts(RunnerApi.Pipeline pipeline) {
    RunnerApi.Pipeline.Builder pipelineBuilder = pipeline.toBuilder();
    RunnerApi.Components.Builder componentsBuilder = pipelineBuilder.getComponentsBuilder();
    componentsBuilder.clearEnvironments();
    for (Map.Entry<String, RunnerApi.Environment> entry : pipeline.getComponents().getEnvironmentsMap().entrySet()) {
        RunnerApi.Environment.Builder environmentBuilder = entry.getValue().toBuilder();
        environmentBuilder.clearDependencies();
        for (RunnerApi.ArtifactInformation info : entry.getValue().getDependenciesList()) {
            if (!BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) {
                throw new RuntimeException(String.format("unsupported artifact type %s", info.getTypeUrn()));
            }
            RunnerApi.ArtifactFilePayload filePayload;
            try {
                filePayload = RunnerApi.ArtifactFilePayload.parseFrom(info.getTypePayload());
            } catch (InvalidProtocolBufferException e) {
                throw new RuntimeException("Error parsing artifact file payload.", e);
            }
            String stagedName;
            if (BeamUrns.getUrn(RunnerApi.StandardArtifacts.Roles.STAGING_TO).equals(info.getRoleUrn())) {
                try {
                    RunnerApi.ArtifactStagingToRolePayload stagingPayload = RunnerApi.ArtifactStagingToRolePayload.parseFrom(info.getRolePayload());
                    stagedName = stagingPayload.getStagedName();
                } catch (InvalidProtocolBufferException e) {
                    throw new RuntimeException("Error parsing artifact staging_to role payload.", e);
                }
            } else {
                try {
                    File source = new File(filePayload.getPath());
                    HashCode hashCode = Files.asByteSource(source).hash(Hashing.sha256());
                    stagedName = Environments.createStagingFileName(source, hashCode);
                } catch (IOException e) {
                    throw new RuntimeException(String.format("Error creating staged name for artifact %s", filePayload.getPath()), e);
                }
            }
            // Rewrite the FILE dependency as a URL artifact pointing at its location under the staging directory.
            environmentBuilder.addDependencies(
                info.toBuilder()
                    .setTypeUrn(BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.URL))
                    .setTypePayload(
                        RunnerApi.ArtifactUrlPayload.newBuilder()
                            .setUrl(
                                FileSystems.matchNewResource(options.getStagingLocation(), true)
                                    .resolve(stagedName, ResolveOptions.StandardResolveOptions.RESOLVE_FILE)
                                    .toString())
                            .setSha256(filePayload.getSha256())
                            .build()
                            .toByteString()));
        }
        componentsBuilder.putEnvironments(entry.getKey(), environmentBuilder.build());
    }
    return pipelineBuilder.build();
}
Also used : InvalidProtocolBufferException(org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.InvalidProtocolBufferException) StringUtils.byteArrayToJsonString(org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString) IOException(java.io.IOException) Pipeline(org.apache.beam.sdk.Pipeline) SdkComponents(org.apache.beam.runners.core.construction.SdkComponents) RunnerApi(org.apache.beam.model.pipeline.v1.RunnerApi) HashCode(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.HashCode) ImmutableMap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap) Map(java.util.Map) File(java.io.File) StagedFile(org.apache.beam.runners.dataflow.util.PackageUtil.StagedFile) VisibleForTesting(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting)
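
resolveArtifacts rewrites each FILE dependency into a URL artifact whose location is the staged name resolved under options.getStagingLocation(). A minimal sketch of just that URL composition with Beam's FileSystems API, using a hypothetical local staging directory (a gs:// location would additionally need the GCS filesystem on the classpath):

import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.ResolveOptions;

public class StagedUrlDemo {
    public static void main(String[] args) {
        // Hypothetical staging location and staged name.
        String stagingLocation = "/tmp/staging";
        String stagedName = "example-abc123.jar";

        // Same composition resolveArtifacts uses: treat the staging location as a
        // directory and resolve the staged file name under it.
        String url = FileSystems.matchNewResource(stagingLocation, /* isDirectory= */ true)
            .resolve(stagedName, ResolveOptions.StandardResolveOptions.RESOLVE_FILE)
            .toString();

        System.out.println(url); // roughly: /tmp/staging/example-abc123.jar
    }
}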

Aggregations

HashCode (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.HashCode): 7
File (java.io.File): 5
IOException (java.io.IOException): 4
StagedFile (org.apache.beam.runners.dataflow.util.PackageUtil.StagedFile): 3
Map (java.util.Map): 2
RunnerApi (org.apache.beam.model.pipeline.v1.RunnerApi): 2
StringUtils.byteArrayToJsonString (org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString): 2
InvalidProtocolBufferException (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.InvalidProtocolBufferException): 2
ImmutableList (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList): 2
ImmutableMap (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap): 2
TableCell (com.google.api.services.bigquery.model.TableCell): 1
TableRow (com.google.api.services.bigquery.model.TableRow): 1
UncheckedIOException (java.io.UncheckedIOException): 1
HashSet (java.util.HashSet): 1
LinkedHashSet (java.util.LinkedHashSet): 1
ArtifactInformation (org.apache.beam.model.pipeline.v1.RunnerApi.ArtifactInformation): 1
SdkComponents (org.apache.beam.runners.core.construction.SdkComponents): 1
DataflowPipelineOptions (org.apache.beam.runners.dataflow.options.DataflowPipelineOptions): 1
Pipeline (org.apache.beam.sdk.Pipeline): 1
ByteString (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString): 1