Example 11 with PipelineOptions

Use of com.google.cloud.dataflow.sdk.options.PipelineOptions in project gatk by broadinstitute.

In the class PathSeqKmerSpark, method runTool.

/** Get the list of distinct kmers in the reference, and write them to a file as a HopscotchSet. */
@Override
protected void runTool(final JavaSparkContext ctx) {
    final SAMFileHeader hdr = getHeaderForReads();
    // The sequence dictionary is only available when the reads carry a header.
    final SAMSequenceDictionary dict = (hdr != null) ? hdr.getSequenceDictionary() : null;
    final PipelineOptions options = getAuthenticatedGCSOptions();
    final ReferenceMultiSource referenceMultiSource = getReference();
    final List<SVKmer> kmerList = findKmers(ctx, KMER_SIZE, referenceMultiSource, options, dict);
    final HopscotchSet<SVKmer> kmerSet = new HopscotchSet<>(kmerList);
    final Kryo kryo = new Kryo();
    kryo.setReferences(false);
    // try-with-resources closes the output stream even if serialization throws.
    try (final Output output = new Output(BucketUtils.createFile(OUTPUT_FILE))) {
        kryo.writeClassAndObject(output, kmerSet);
    }
}
Also used : HopscotchSet(org.broadinstitute.hellbender.tools.spark.utils.HopscotchSet) ReferenceMultiSource(org.broadinstitute.hellbender.engine.datasources.ReferenceMultiSource) PipelineOptions(com.google.cloud.dataflow.sdk.options.PipelineOptions) Output(com.esotericsoftware.kryo.io.Output) SAMFileHeader(htsjdk.samtools.SAMFileHeader) SAMSequenceDictionary(htsjdk.samtools.SAMSequenceDictionary) Kryo(com.esotericsoftware.kryo.Kryo)
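
The "distinct kmers" step that runTool delegates to findKmers can be illustrated with a minimal standard-library sketch. This is not GATK's implementation: the class KmerSketch and the method distinctKmers are hypothetical names, plain String values stand in for SVKmer, and no Spark context, reference source, or sequence dictionary is involved. It only shows the sliding-window idea of collecting every length-k substring once.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical stand-in for the kmer-collection step; not GATK's findKmers. */
final class KmerSketch {
    private KmerSketch() {}

    /** Returns every distinct substring of length k, in order of first appearance. */
    static Set<String> distinctKmers(final String sequence, final int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        final Set<String> kmers = new LinkedHashSet<>();
        // Slide a window of width k across the sequence; the set drops repeats.
        for (int start = 0; start + k <= sequence.length(); start++) {
            kmers.add(sequence.substring(start, start + k));
        }
        return kmers;
    }

    public static void main(final String[] args) {
        System.out.println(distinctKmers("ACGTACGT", 4));
    }
}
```

A sequence shorter than k yields an empty set, which mirrors the fact that no full-length kmer can be formed from it.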

Example 12 with PipelineOptions

Use of com.google.cloud.dataflow.sdk.options.PipelineOptions in project gatk by broadinstitute.

In the class ReferenceAPISource, method fromReferenceSetAssemblyID.

/**
 * Creates a ReferenceAPISource from an assembly ID by querying the cloud APIs.
 */
public static ReferenceAPISource fromReferenceSetAssemblyID(final PipelineOptions pipelineOptions, final String referenceSetAssemblyID) {
    Utils.nonNull(pipelineOptions);
    Utils.nonNull(referenceSetAssemblyID);
    final SearchReferenceSetsRequest content = new SearchReferenceSetsRequest();
    content.setAssemblyId(referenceSetAssemblyID);
    try {
        final Genomics genomicsService = createGenomicsService(pipelineOptions);
        final SearchReferenceSetsResponse found = genomicsService.referencesets().search(content).execute();
        final Set<String> referenceSetIds = found.getReferenceSets().stream().map(rs -> rs.getId()).collect(Collectors.toSet());
        if (referenceSetIds.isEmpty()) {
            throw new UserException.UnknownReferenceSet(referenceSetAssemblyID);
        }
        if (referenceSetIds.size() > 1) {
            throw new UserException.MultipleReferenceSets(referenceSetAssemblyID, referenceSetIds);
        }
        final Map<String, Reference> ret = new LinkedHashMap<>();
        for (final String rId : referenceSetIds) {
            final SearchReferencesRequest query = new SearchReferencesRequest().setReferenceSetId(rId);
            ret.putAll(genomicsService.references().search(query).execute().getReferences().stream().collect(Collectors.toMap(r -> r.getName(), r -> r)));
        }
        return new ReferenceAPISource(pipelineOptions, ret);
    } catch (final IOException e) {
        throw new UserException("Error while looking up reference set " + referenceSetAssemblyID, e);
    }
}
Also used : java.util(java.util) GATKGCSOptions(org.broadinstitute.hellbender.utils.gcs.GATKGCSOptions) ObjectInputStream(java.io.ObjectInputStream) GCSOptions(com.google.cloud.genomics.dataflow.utils.GCSOptions) Genomics(com.google.api.services.genomics.Genomics) ReferenceBases(org.broadinstitute.hellbender.utils.reference.ReferenceBases) GeneralSecurityException(java.security.GeneralSecurityException) ObjectOutputStream(java.io.ObjectOutputStream) GenomicsFactory(com.google.cloud.genomics.utils.GenomicsFactory) SAMSequenceDictionary(htsjdk.samtools.SAMSequenceDictionary) IOException(java.io.IOException) SimpleInterval(org.broadinstitute.hellbender.utils.SimpleInterval) Collectors(java.util.stream.Collectors) Bytes(com.google.common.primitives.Bytes) Serializable(java.io.Serializable) PipelineOptionsFactory(com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory) Logger(org.apache.logging.log4j.Logger) UserException(org.broadinstitute.hellbender.exceptions.UserException) JsonFactory(com.google.api.client.json.JsonFactory) PipelineOptions(com.google.cloud.dataflow.sdk.options.PipelineOptions) Utils(org.broadinstitute.hellbender.utils.Utils) VisibleForTesting(com.google.common.annotations.VisibleForTesting) SAMSequenceRecord(htsjdk.samtools.SAMSequenceRecord) com.google.api.services.genomics.model(com.google.api.services.genomics.model) LogManager(org.apache.logging.log4j.LogManager)
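
The lookup above accepts a search result only when it resolves to exactly one reference set, and fails with a distinct error for zero matches and for several matches. The following is a minimal sketch of that validation pattern using only the standard library. The class SingleMatchSketch and the method requireSingleId are hypothetical names, and IllegalArgumentException stands in for GATK's UserException subclasses; no Genomics API call is made.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch of an "exactly one match" check; not part of GATK. */
final class SingleMatchSketch {
    private SingleMatchSketch() {}

    /** Returns the single distinct id among the matches, or throws if there are zero or several. */
    static String requireSingleId(final String query, final Collection<String> matchedIds) {
        // Collapse repeated ids first so duplicates of one id still count as a single match.
        final Set<String> distinct = new LinkedHashSet<>(matchedIds);
        if (distinct.isEmpty()) {
            throw new IllegalArgumentException("No reference set found for " + query);
        }
        if (distinct.size() > 1) {
            throw new IllegalArgumentException("Multiple reference sets found for " + query + ": " + distinct);
        }
        return distinct.iterator().next();
    }

    public static void main(final String[] args) {
        // "someAssembly" and "set-1" are made-up values for illustration.
        System.out.println(requireSingleId("someAssembly", Arrays.asList("set-1", "set-1")));
    }
}
```

Deduplicating before counting is the design choice worth noting: it treats several hits that share one id as unambiguous, which matches the use of a Set in the original method.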

Aggregations

PipelineOptions (com.google.cloud.dataflow.sdk.options.PipelineOptions) 12
SAMSequenceDictionary (htsjdk.samtools.SAMSequenceDictionary) 5
ReferenceMultiSource (org.broadinstitute.hellbender.engine.datasources.ReferenceMultiSource) 5
BaseTest (org.broadinstitute.hellbender.utils.test.BaseTest) 5
Test (org.testng.annotations.Test) 5
SAMFileHeader (htsjdk.samtools.SAMFileHeader) 4
Kryo (com.esotericsoftware.kryo.Kryo) 3
IOException (java.io.IOException) 3
HopscotchSet (org.broadinstitute.hellbender.tools.spark.utils.HopscotchSet) 3
Input (com.esotericsoftware.kryo.io.Input) 2
Output (com.esotericsoftware.kryo.io.Output) 2
VisibleForTesting (com.google.common.annotations.VisibleForTesting) 2
File (java.io.File) 2
java.util (java.util) 2
UserException (org.broadinstitute.hellbender.exceptions.UserException) 2
DefaultSerializer (com.esotericsoftware.kryo.DefaultSerializer) 1
JsonFactory (com.google.api.client.json.JsonFactory) 1
Genomics (com.google.api.services.genomics.Genomics) 1
com.google.api.services.genomics.model (com.google.api.services.genomics.model) 1
PipelineOptionsFactory (com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory) 1