
Example 6 with SubStageInputOutput

Use of com.hartwig.pipeline.stages.SubStageInputOutput in project pipeline5 by hartwigmedical.

The class GermlineCaller, method germlineCallerCommands.

public List<BashCommand> germlineCallerCommands(final SingleSampleRunMetadata metadata) {
    String referenceFasta = resourceFiles.refGenomeFile();
    SubStageInputOutput callerOutput = new GatkGermlineCaller(bamDownload.getLocalTargetPath(), referenceFasta)
            .andThen(new GenotypeGVCFs(referenceFasta))
            .apply(SubStageInputOutput.empty(metadata.sampleName()));
    SubStageInputOutput snpFilterOutput = new SelectVariants("snp", Lists.newArrayList("SNP", "NO_VARIATION"), referenceFasta)
            .andThen(new VariantFiltration("snp", SNP_FILTER_EXPRESSION, referenceFasta))
            .apply(callerOutput);
    SubStageInputOutput indelFilterOutput = new SelectVariants("indels", Lists.newArrayList("INDEL", "MIXED"), referenceFasta)
            .andThen(new VariantFiltration("indels", INDEL_FILTER_EXPRESSION, referenceFasta))
            .apply(SubStageInputOutput.of(metadata.sampleName(), callerOutput.outputFile(), Collections.emptyList()));
    SubStageInputOutput combinedFilters = snpFilterOutput.combine(indelFilterOutput);
    SubStageInputOutput finalOutput = new CombineFilteredVariants(indelFilterOutput.outputFile().path(), referenceFasta)
            .andThen(new GermlineZipIndex())
            .apply(combinedFilters);
    return ImmutableList.<BashCommand>builder()
            .addAll(finalOutput.bash())
            .add(new MvCommand(finalOutput.outputFile().path(), outputFile.path()))
            .add(new MvCommand(finalOutput.outputFile().path() + ".tbi", outputFile.path() + ".tbi"))
            .build();
}
Also used : BashCommand(com.hartwig.pipeline.execution.vm.BashCommand) SubStageInputOutput(com.hartwig.pipeline.stages.SubStageInputOutput) MvCommand(com.hartwig.pipeline.execution.vm.unix.MvCommand)
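
Example 6 shows the core composition pattern behind SubStageInputOutput: sub-stages are chained with andThen and then applied to a seed value, and each apply call threads the previous stage's output file and accumulated bash commands into the next stage. Below is a minimal sketch of that pattern, assuming only the API visible in the example above; the sample name and BAM path are illustrative placeholders.

// Sketch of the SubStage chaining pattern used in GermlineCaller above.
// Only calls visible in the example are reused; "SAMPLE_ID" and the BAM
// path are placeholders.
String referenceFasta = resourceFiles.refGenomeFile();
SubStageInputOutput calls = new GatkGermlineCaller("/local/SAMPLE_ID.bam", referenceFasta)
        .andThen(new GenotypeGVCFs(referenceFasta))
        .apply(SubStageInputOutput.empty("SAMPLE_ID"));
// The chained result carries both the final output file and the accumulated
// bash commands for every stage, ready to be turned into a command list.
OutputFile calledVcf = calls.outputFile();
List<BashCommand> commands = ImmutableList.<BashCommand>builder().addAll(calls.bash()).build();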

Example 7 with SubStageInputOutput

Use of com.hartwig.pipeline.stages.SubStageInputOutput in project pipeline5 by hartwigmedical.

The class GridssBackport, method execute.

@Override
public VirtualMachineJobDefinition execute(final InputBundle inputs, final RuntimeBucket runtimeBucket, final BashStartupScript startupScript, final RuntimeFiles executionFlags) {
    final ResourceFiles resourceFiles = ResourceFilesFactory.buildResourceFiles(RefGenomeVersion.V37);
    final InputFileDescriptor template = inputs.get("set");
    final String set = inputs.get("set").inputValue();
    final String sample = inputs.get("tumor_sample").inputValue();
    final String bamFile = String.format("gs://hmf-gridss/assembly/%s/%s.assembly.bam.sv.bam", set, sample);
    final String vcfFile = String.format("gs://hmf-gridss/original/%s/%s.gridss.unfiltered.vcf.gz", set, sample);
    final InputFileDescriptor inputBam = ImmutableInputFileDescriptor.builder().from(template).inputValue(bamFile).build();
    final InputFileDescriptor inputBamIndex = inputBam.index();
    final InputFileDescriptor inputVcf = ImmutableInputFileDescriptor.builder().from(template).inputValue(vcfFile).build();
    final InputFileDescriptor inputVcfIndex = inputVcf.index();
    // 1. Set up paths
    startupScript.addCommand(new ExportPathCommand(new BwaCommand()));
    startupScript.addCommand(new ExportPathCommand(new SamtoolsCommand()));
    // 2. Download input files
    startupScript.addCommand(inputBam::copyToLocalDestinationCommand);
    startupScript.addCommand(inputBamIndex::copyToLocalDestinationCommand);
    startupScript.addCommand(inputVcf::copyToLocalDestinationCommand);
    startupScript.addCommand(inputVcfIndex::copyToLocalDestinationCommand);
    // 3. Get sample names
    startupScript.addCommand(() -> format("sampleNames=$(zgrep -m1 CHROM %s)", inputVcf.localDestination()));
    startupScript.addCommand(() -> "sample0=$(echo $sampleNames | cut -d \" \" -f 10)");
    startupScript.addCommand(() -> "sample1=$(echo $sampleNames | cut -d \" \" -f 11)");
    // 4. Create empty bams (and their working directories)
    final String emptyBam1 = String.format("%s/${%s}", VmDirectories.INPUT, "sample0");
    final String emptyBam1Working = workingDir(emptyBam1) + ".sv.bam";
    final String emptyBam2 = String.format("%s/${%s}", VmDirectories.INPUT, "sample1");
    final String emptyBam2Working = workingDir(emptyBam2) + ".sv.bam";
    startupScript.addCommand(() -> format("samtools view -H %s | samtools view -o %s", inputBam.localDestination(), emptyBam1));
    startupScript.addCommand(() -> format("samtools view -H %s | samtools view -o %s", inputBam.localDestination(), emptyBam2));
    startupScript.addCommand(() -> format("mkdir -p %s", dirname(emptyBam1Working)));
    startupScript.addCommand(() -> format("mkdir -p %s", dirname(emptyBam2Working)));
    startupScript.addCommand(() -> format("cp %s %s", emptyBam1, emptyBam1Working));
    startupScript.addCommand(() -> format("cp %s %s", emptyBam2, emptyBam2Working));
    // 5. SoftClipsToSplitReads
    final String newAssemblyBam = workingDir(inputBam.localDestination());
    startupScript.addCommand(() -> format("mkdir -p %s", dirname(newAssemblyBam)));
    startupScript.addCommand(new SoftClipsToSplitReads(inputBam.localDestination(), resourceFiles.refGenomeFile(), newAssemblyBam));
    // 6. Allocate Evidence
    final OutputFile newRawVcf = OutputFile.of(sample, "gridss_" + Versions.GRIDSS.replace(".", "_") + ".raw", FileTypes.GZIPPED_VCF);
    startupScript.addCommand(new AllocateEvidence(emptyBam1, emptyBam2, newAssemblyBam, inputVcf.localDestination(), newRawVcf.path(), resourceFiles.refGenomeFile(), resourceFiles.gridssPropertiesFile()));
    // 7. Gridss Annotation
    final SubStageInputOutput annotation = new GridssAnnotation(resourceFiles, true).apply(SubStageInputOutput.of(sample, newRawVcf, Collections.emptyList()));
    startupScript.addCommands(annotation.bash());
    // 8. Archive targeted output
    final OutputFile unfilteredVcf = annotation.outputFile();
    final OutputFile unfilteredVcfIndex = unfilteredVcf.index(".tbi");
    final GoogleStorageLocation unfilteredVcfRemoteLocation = remoteUnfilteredVcfArchivePath(set, sample);
    final GoogleStorageLocation unfilteredVcfIndexRemoteLocation = index(unfilteredVcfRemoteLocation, ".tbi");
    startupScript.addCommand(() -> unfilteredVcf.copyToRemoteLocation(unfilteredVcfRemoteLocation));
    startupScript.addCommand(() -> unfilteredVcfIndex.copyToRemoteLocation(unfilteredVcfIndexRemoteLocation));
    // 9. Upload all output
    startupScript.addCommand(new OutputUpload(GoogleStorageLocation.of(runtimeBucket.name(), "gridss"), executionFlags));
    return VirtualMachineJobDefinition.structuralCalling(startupScript, ResultsDirectory.defaultDirectory());
}
Also used : OutputFile(com.hartwig.pipeline.execution.vm.OutputFile) ExportPathCommand(com.hartwig.pipeline.execution.vm.unix.ExportPathCommand) InputFileDescriptor(com.hartwig.batch.input.InputFileDescriptor) ImmutableInputFileDescriptor(com.hartwig.batch.input.ImmutableInputFileDescriptor) SubStageInputOutput(com.hartwig.pipeline.stages.SubStageInputOutput) GridssAnnotation(com.hartwig.pipeline.calling.structural.gridss.stage.GridssAnnotation) BwaCommand(com.hartwig.pipeline.calling.command.BwaCommand) ResourceFiles(com.hartwig.pipeline.resource.ResourceFiles) SamtoolsCommand(com.hartwig.pipeline.calling.command.SamtoolsCommand) OutputUpload(com.hartwig.pipeline.execution.vm.OutputUpload) GoogleStorageLocation(com.hartwig.pipeline.storage.GoogleStorageLocation) AllocateEvidence(com.hartwig.pipeline.calling.structural.gridss.command.AllocateEvidence) SoftClipsToSplitReads(com.hartwig.pipeline.calling.structural.gridss.command.SoftClipsToSplitReads)
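
Example 7 also illustrates the second way of seeding a sub-stage chain: instead of starting from SubStageInputOutput.empty, an already existing file (here the raw GRIDSS VCF) is wrapped with SubStageInputOutput.of so that GridssAnnotation picks it up as its input, and the resulting commands are appended to the startup script. A minimal sketch of that seeding pattern, using only the calls shown above; the sample name is an illustrative placeholder.

// Sketch of seeding a sub-stage with a pre-existing file (step 7 above).
// "SAMPLE_ID" is a placeholder; all calls mirror the example.
OutputFile rawVcf = OutputFile.of("SAMPLE_ID", "gridss_" + Versions.GRIDSS.replace(".", "_") + ".raw", FileTypes.GZIPPED_VCF);
SubStageInputOutput annotated = new GridssAnnotation(resourceFiles, true)
        .apply(SubStageInputOutput.of("SAMPLE_ID", rawVcf, Collections.emptyList()));
// Append the annotation stage's commands to the VM startup script.
startupScript.addCommands(annotated.bash());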

Example 8 with SubStageInputOutput

Use of com.hartwig.pipeline.stages.SubStageInputOutput in project pipeline5 by hartwigmedical.

The class SageCreatePonData, method execute.

@Override
public VirtualMachineJobDefinition execute(final InputBundle inputs, final RuntimeBucket runtimeBucket, final BashStartupScript startupScript, final RuntimeFiles executionFlags) {
    final ResourceFiles resourceFiles = ResourceFilesFactory.buildResourceFiles(RefGenomeVersion.V37);
    final InputFileDescriptor remoteReferenceFile = inputs.get("reference");
    final InputFileDescriptor remoteReferenceIndex = remoteReferenceFile.index();
    final String localReferenceFile = localFilename(remoteReferenceFile);
    final String localReferenceBam = localReferenceFile.replace("cram", "bam");
    final String referenceSampleName = inputs.get("referenceSample").inputValue();
    // Download latest jar file
    // startupScript.addCommand(() -> format("gsutil -u hmf-crunch cp %s %s",
    // "gs://batch-sage-validation/resources/sage.jar",
    // "/opt/tools/sage/" + Versions.SAGE + "/sage.jar"));
    // Download normal
    startupScript.addCommand(() -> remoteReferenceFile.toCommandForm(localReferenceFile));
    startupScript.addCommand(() -> remoteReferenceIndex.toCommandForm(localFilename(remoteReferenceIndex)));
    final SageCommandBuilder sageCommandBuilder = new SageCommandBuilder(resourceFiles).ponMode(referenceSampleName, localReferenceBam);
    final SageApplication sageApplication = new SageApplication(sageCommandBuilder);
    // Convert to bam if necessary
    if (!localReferenceFile.equals(localReferenceBam)) {
        startupScript.addCommands(cramToBam(localReferenceFile));
    }
    // Run post processing (NONE for germline)
    final SubStageInputOutput postProcessing = sageApplication.apply(SubStageInputOutput.empty(referenceSampleName));
    startupScript.addCommands(postProcessing.bash());
    // Store output
    startupScript.addCommand(new OutputUpload(GoogleStorageLocation.of(runtimeBucket.name(), "sage"), executionFlags));
    return VirtualMachineJobDefinition.sageSomaticCalling(startupScript, ResultsDirectory.defaultDirectory());
}
Also used : ResourceFiles(com.hartwig.pipeline.resource.ResourceFiles) OutputUpload(com.hartwig.pipeline.execution.vm.OutputUpload) InputFileDescriptor(com.hartwig.batch.input.InputFileDescriptor) SubStageInputOutput(com.hartwig.pipeline.stages.SubStageInputOutput) SageCommandBuilder(com.hartwig.pipeline.calling.sage.SageCommandBuilder) SageApplication(com.hartwig.pipeline.calling.sage.SageApplication)
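
In Example 8 the reference input may arrive as a CRAM: the expected local BAM name is derived by string substitution, a conversion step is scheduled only when the two names differ, and SAGE is then run in PON mode against the (possibly converted) BAM. A minimal sketch of that guard and the subsequent SAGE invocation, reusing only the helpers referenced above; localFilename and cramToBam are the example's own helpers, and the sample name is an illustrative placeholder.

// Sketch of the CRAM-to-BAM guard and PON-mode SAGE run from SageCreatePonData above.
// localFilename(...) and cramToBam(...) are the helpers referenced in the example;
// "REFERENCE_SAMPLE" is a placeholder.
String localReferenceFile = localFilename(remoteReferenceFile);
String localReferenceBam = localReferenceFile.replace("cram", "bam");
if (!localReferenceFile.equals(localReferenceBam)) {
    // Input was delivered as CRAM; schedule a conversion before running SAGE.
    startupScript.addCommands(cramToBam(localReferenceFile));
}
SageCommandBuilder builder = new SageCommandBuilder(resourceFiles).ponMode("REFERENCE_SAMPLE", localReferenceBam);
SubStageInputOutput sageOutput = new SageApplication(builder).apply(SubStageInputOutput.empty("REFERENCE_SAMPLE"));
startupScript.addCommands(sageOutput.bash());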

Aggregations

SubStageInputOutput (com.hartwig.pipeline.stages.SubStageInputOutput) 8
OutputUpload (com.hartwig.pipeline.execution.vm.OutputUpload) 5
ResourceFiles (com.hartwig.pipeline.resource.ResourceFiles) 5
InputFileDescriptor (com.hartwig.batch.input.InputFileDescriptor) 4
GoogleStorageLocation (com.hartwig.pipeline.storage.GoogleStorageLocation) 4
BwaCommand (com.hartwig.pipeline.calling.command.BwaCommand) 3
SamtoolsCommand (com.hartwig.pipeline.calling.command.SamtoolsCommand) 3
GridssAnnotation (com.hartwig.pipeline.calling.structural.gridss.stage.GridssAnnotation) 3
BashCommand (com.hartwig.pipeline.execution.vm.BashCommand) 3
OutputFile (com.hartwig.pipeline.execution.vm.OutputFile) 3
ExportPathCommand (com.hartwig.pipeline.execution.vm.unix.ExportPathCommand) 3
SageApplication (com.hartwig.pipeline.calling.sage.SageApplication) 2
SageCommandBuilder (com.hartwig.pipeline.calling.sage.SageCommandBuilder) 2
InputDownload (com.hartwig.pipeline.execution.vm.InputDownload) 2
MvCommand (com.hartwig.pipeline.execution.vm.unix.MvCommand) 2
ArrayList (java.util.ArrayList) 2
Storage (com.google.cloud.storage.Storage) 1
RemoteLocationsApi (com.hartwig.batch.api.RemoteLocationsApi) 1
ImmutableInputFileDescriptor (com.hartwig.batch.input.ImmutableInputFileDescriptor) 1
Lane (com.hartwig.patient.Lane) 1