Example 1 with RuntimeBucket

Use of com.hartwig.pipeline.storage.RuntimeBucket in project pipeline5 by hartwigmedical.

Class StageRunner, method run.

public <T extends StageOutput> T run(final M metadata, final Stage<T, M> stage) {
    final List<BashCommand> commands = commands(mode, metadata, stage);
    if (stage.shouldRun(arguments) && !commands.isEmpty()) {
        if (!startingPoint.usePersisted(stage.namespace())) {
            StageTrace trace = new StageTrace(stage.namespace(), metadata.name(), StageTrace.ExecutorType.COMPUTE_ENGINE).start();
            RuntimeBucket bucket = RuntimeBucket.from(storage, stage.namespace(), metadata, arguments, labels);
            BashStartupScript bash = BashStartupScript.of(bucket.name());
            bash.addCommands(stage.inputs()).addCommands(OverrideReferenceGenomeCommand.overrides(arguments)).addCommands(commands).addCommand(new OutputUpload(GoogleStorageLocation.of(bucket.name(), resultsDirectory.path()), RuntimeFiles.typical()));
            PipelineStatus status = Failsafe.with(DefaultBackoffPolicy.of(String.format("[%s] stage [%s]", metadata.name(), stage.namespace()))).get(() -> computeEngine.submit(bucket, stage.vmDefinition(bash, resultsDirectory)));
            trace.stop();
            return stage.output(metadata, status, bucket, resultsDirectory);
        }
        return stage.persistedOutput(metadata);
    }
    return stage.skippedOutput(metadata);
}
Also used : PipelineStatus(com.hartwig.pipeline.execution.PipelineStatus) StageTrace(com.hartwig.pipeline.trace.StageTrace) BashStartupScript(com.hartwig.pipeline.execution.vm.BashStartupScript) OutputUpload(com.hartwig.pipeline.execution.vm.OutputUpload) BashCommand(com.hartwig.pipeline.execution.vm.BashCommand) RuntimeBucket(com.hartwig.pipeline.storage.RuntimeBucket)
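
The branching in the method above reduces to a three-way decision: run the stage, reuse persisted output, or skip it. The following is a minimal sketch of that decision only, not code from pipeline5. The class `StageDecision`, the enum `Outcome`, and the three boolean parameters are hypothetical names standing in for `stage.shouldRun(arguments)`, `!commands.isEmpty()`, and `startingPoint.usePersisted(stage.namespace())`.

```java
// Hypothetical illustration of the control flow in StageRunner.run; not part of pipeline5.
final class StageDecision {

    enum Outcome { RUN, PERSISTED, SKIPPED }

    // shouldRun    ~ stage.shouldRun(arguments)
    // hasCommands  ~ !commands.isEmpty()
    // usePersisted ~ startingPoint.usePersisted(stage.namespace())
    static Outcome decide(boolean shouldRun, boolean hasCommands, boolean usePersisted) {
        if (shouldRun && hasCommands) {
            // A stage that is eligible to run either reuses persisted output or runs on a VM.
            return usePersisted ? Outcome.PERSISTED : Outcome.RUN;
        }
        // A stage that is disabled, or has nothing to execute, is skipped.
        return Outcome.SKIPPED;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, true, false));
        System.out.println(decide(true, true, true));
        System.out.println(decide(false, true, false));
    }
}
```

Note that a stage with an empty command list is skipped even when `shouldRun` is true, so the persisted check is never reached in that case.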

Example 2 with RuntimeBucket

Use of com.hartwig.pipeline.storage.RuntimeBucket in project pipeline5 by hartwigmedical.

Class SageCaller, method outputBuilder.

protected ImmutableSageOutput.Builder outputBuilder(final SomaticRunMetadata metadata, final PipelineStatus jobStatus, final RuntimeBucket bucket, final ResultsDirectory resultsDirectory) {
    final String filteredOutputFile = sageConfiguration.filteredTemplate().apply(metadata);
    final String unfilteredOutputFile = sageConfiguration.unfilteredTemplate().apply(metadata);
    final String geneCoverageFile = sageConfiguration.geneCoverageTemplate().apply(metadata);
    final Optional<String> somaticRefSampleBqrPlot = referenceSampleBqrPlot(metadata);
    final Optional<String> somaticTumorSampleBqrPlot = tumorSampleBqrPlot(metadata);
    final ImmutableSageOutput.Builder builder = SageOutput.builder(namespace()).status(jobStatus);
    somaticRefSampleBqrPlot.ifPresent(s -> builder.maybeSomaticRefSampleBqrPlot(GoogleStorageLocation.of(bucket.name(), resultsDirectory.path(s))).addReportComponents(bqrComponent("png", bucket, resultsDirectory, metadata.reference().sampleName())).addReportComponents(bqrComponent("tsv", bucket, resultsDirectory, metadata.reference().sampleName())));
    somaticTumorSampleBqrPlot.ifPresent(s -> builder.maybeSomaticTumorSampleBqrPlot(GoogleStorageLocation.of(bucket.name(), resultsDirectory.path(s))).addReportComponents(bqrComponent("png", bucket, resultsDirectory, metadata.tumor().sampleName())).addReportComponents(bqrComponent("tsv", bucket, resultsDirectory, metadata.tumor().sampleName())));
    return builder.addFailedLogLocations(GoogleStorageLocation.of(bucket.name(), RunLogComponent.LOG_FILE)).maybeGermlineGeneCoverage(GoogleStorageLocation.of(bucket.name(), resultsDirectory.path(geneCoverageFile))).maybeSomaticTumorSampleBqrPlot(somaticTumorSampleBqrPlot.map(t -> GoogleStorageLocation.of(bucket.name(), resultsDirectory.path(t)))).maybeVariants(GoogleStorageLocation.of(bucket.name(), resultsDirectory.path(filteredOutputFile))).addReportComponents(bqrComponent("png", bucket, resultsDirectory, metadata.sampleName())).addReportComponents(bqrComponent("tsv", bucket, resultsDirectory, metadata.sampleName())).addReportComponents(vcfComponent(unfilteredOutputFile, bucket, resultsDirectory)).addReportComponents(vcfComponent(filteredOutputFile, bucket, resultsDirectory)).addReportComponents(singleFileComponent(geneCoverageFile, bucket, resultsDirectory)).addReportComponents(new RunLogComponent(bucket, namespace(), Folder.root(), resultsDirectory)).addReportComponents(new StartupScriptComponent(bucket, namespace(), Folder.root())).addDatatypes(new AddDatatype(sageConfiguration.vcfDatatype(), metadata.barcode(), new ArchivePath(Folder.root(), namespace(), filteredOutputFile))).addDatatypes(new AddDatatype(sageConfiguration.geneCoverageDatatype(), metadata.barcode(), new ArchivePath(Folder.root(), namespace(), geneCoverageFile))).addAllDatatypes(somaticRefSampleBqrPlot.stream().map(r -> new AddDatatype(sageConfiguration.refSampleBqrPlot(), metadata.barcode(), new ArchivePath(Folder.root(), namespace(), r))).collect(Collectors.toList())).addAllDatatypes(somaticTumorSampleBqrPlot.stream().map(t -> new AddDatatype(sageConfiguration.tumorSampleBqrPlot(), metadata.barcode(), new ArchivePath(Folder.root(), namespace(), t))).collect(Collectors.toList()));
}
Also used : ZippedVcfAndIndexComponent(com.hartwig.pipeline.report.ZippedVcfAndIndexComponent) SubStageInputOutput(com.hartwig.pipeline.stages.SubStageInputOutput) BashCommand(com.hartwig.pipeline.execution.vm.BashCommand) VirtualMachineJobDefinition(com.hartwig.pipeline.execution.vm.VirtualMachineJobDefinition) AlignmentPair(com.hartwig.pipeline.alignment.AlignmentPair) RuntimeBucket(com.hartwig.pipeline.storage.RuntimeBucket) PipelineStatus(com.hartwig.pipeline.execution.PipelineStatus) PersistedDataset(com.hartwig.pipeline.reruns.PersistedDataset) BashStartupScript(com.hartwig.pipeline.execution.vm.BashStartupScript) GoogleStorageLocation(com.hartwig.pipeline.storage.GoogleStorageLocation) TertiaryStage(com.hartwig.pipeline.tertiary.TertiaryStage) ArchivePath(com.hartwig.pipeline.metadata.ArchivePath) Folder(com.hartwig.pipeline.report.Folder) ResultsDirectory(com.hartwig.pipeline.ResultsDirectory) Collectors(java.util.stream.Collectors) String.format(java.lang.String.format) SingleFileComponent(com.hartwig.pipeline.report.SingleFileComponent) List(java.util.List) AddDatatype(com.hartwig.pipeline.metadata.AddDatatype) SomaticRunMetadata(com.hartwig.pipeline.metadata.SomaticRunMetadata) PersistedLocations(com.hartwig.pipeline.reruns.PersistedLocations) Optional(java.util.Optional) StartupScriptComponent(com.hartwig.pipeline.report.StartupScriptComponent) ReportComponent(com.hartwig.pipeline.report.ReportComponent) RunLogComponent(com.hartwig.pipeline.report.RunLogComponent)

Example 3 with RuntimeBucket

Use of com.hartwig.pipeline.storage.RuntimeBucket in project pipeline5 by hartwigmedical.

Class BwaAligner, method run.

public AlignmentOutput run(final SingleSampleRunMetadata metadata) throws Exception {
    StageTrace trace = new StageTrace(NAMESPACE, metadata.sampleName(), StageTrace.ExecutorType.COMPUTE_ENGINE).start();
    RuntimeBucket rootBucket = RuntimeBucket.from(storage, NAMESPACE, metadata, arguments, labels);
    Sample sample = sampleSource.sample(metadata);
    if (sample.bam().isPresent()) {
        String noPrefix = sample.bam().orElseThrow().replace("gs://", "");
        int firstSlash = noPrefix.indexOf("/");
        String bucket = noPrefix.substring(0, firstSlash);
        String path = noPrefix.substring(firstSlash + 1);
        return AlignmentOutput.builder().sample(metadata.sampleName()).status(PipelineStatus.PROVIDED).maybeAlignments(GoogleStorageLocation.of(bucket, path)).build();
    }
    final ResourceFiles resourceFiles = buildResourceFiles(arguments);
    sampleUpload.run(sample, rootBucket);
    List<Future<PipelineStatus>> futures = new ArrayList<>();
    List<GoogleStorageLocation> perLaneBams = new ArrayList<>();
    List<ReportComponent> laneLogComponents = new ArrayList<>();
    List<GoogleStorageLocation> laneFailedLogs = new ArrayList<>();
    for (Lane lane : sample.lanes()) {
        RuntimeBucket laneBucket = RuntimeBucket.from(storage, laneNamespace(lane), metadata, arguments, labels);
        BashStartupScript bash = BashStartupScript.of(laneBucket.name());
        InputDownload first = new InputDownload(GoogleStorageLocation.of(rootBucket.name(), fastQFileName(sample.name(), lane.firstOfPairPath())));
        InputDownload second = new InputDownload(GoogleStorageLocation.of(rootBucket.name(), fastQFileName(sample.name(), lane.secondOfPairPath())));
        bash.addCommand(first).addCommand(second);
        bash.addCommands(OverrideReferenceGenomeCommand.overrides(arguments));
        SubStageInputOutput alignment = new LaneAlignment(arguments.sbpApiRunId().isPresent(), resourceFiles.refGenomeFile(), first.getLocalTargetPath(), second.getLocalTargetPath(), metadata.sampleName(), lane).apply(SubStageInputOutput.empty(metadata.sampleName()));
        perLaneBams.add(GoogleStorageLocation.of(laneBucket.name(), resultsDirectory.path(alignment.outputFile().fileName())));
        bash.addCommands(alignment.bash()).addCommand(new OutputUpload(GoogleStorageLocation.of(laneBucket.name(), resultsDirectory.path()), RuntimeFiles.typical()));
        futures.add(executorService.submit(() -> runWithRetries(metadata, laneBucket, VirtualMachineJobDefinition.alignment(laneId(lane).toLowerCase(), bash, resultsDirectory))));
        laneLogComponents.add(new RunLogComponent(laneBucket, laneNamespace(lane), Folder.from(metadata), resultsDirectory));
        laneFailedLogs.add(GoogleStorageLocation.of(laneBucket.name(), RunLogComponent.LOG_FILE));
    }
    AlignmentOutput output;
    if (lanesSuccessfullyComplete(futures)) {
        List<InputDownload> laneBams = perLaneBams.stream().map(InputDownload::new).collect(Collectors.toList());
        BashStartupScript mergeMarkdupsBash = BashStartupScript.of(rootBucket.name());
        laneBams.forEach(mergeMarkdupsBash::addCommand);
        SubStageInputOutput merged = new MergeMarkDups(laneBams.stream().map(InputDownload::getLocalTargetPath).filter(path -> path.endsWith("bam")).collect(Collectors.toList())).apply(SubStageInputOutput.empty(metadata.sampleName()));
        mergeMarkdupsBash.addCommands(merged.bash());
        mergeMarkdupsBash.addCommand(new OutputUpload(GoogleStorageLocation.of(rootBucket.name(), resultsDirectory.path()), RuntimeFiles.typical()));
        PipelineStatus status = runWithRetries(metadata, rootBucket, VirtualMachineJobDefinition.mergeMarkdups(mergeMarkdupsBash, resultsDirectory));
        ImmutableAlignmentOutput.Builder outputBuilder = AlignmentOutput.builder().sample(metadata.sampleName()).status(status).maybeAlignments(GoogleStorageLocation.of(rootBucket.name(), resultsDirectory.path(merged.outputFile().fileName()))).addAllReportComponents(laneLogComponents).addAllFailedLogLocations(laneFailedLogs).addFailedLogLocations(GoogleStorageLocation.of(rootBucket.name(), RunLogComponent.LOG_FILE)).addReportComponents(new RunLogComponent(rootBucket, Aligner.NAMESPACE, Folder.from(metadata), resultsDirectory));
        if (!arguments.outputCram()) {
            outputBuilder.addReportComponents(new SingleFileComponent(rootBucket, Aligner.NAMESPACE, Folder.from(metadata), bam(metadata.sampleName()), bam(metadata.sampleName()), resultsDirectory), new SingleFileComponent(rootBucket, Aligner.NAMESPACE, Folder.from(metadata), bai(bam(metadata.sampleName())), bai(bam(metadata.sampleName())), resultsDirectory)).addDatatypes(new AddDatatype(DataType.ALIGNED_READS, metadata.barcode(), new ArchivePath(Folder.from(metadata), BwaAligner.NAMESPACE, bam(metadata.sampleName()))), new AddDatatype(DataType.ALIGNED_READS_INDEX, metadata.barcode(), new ArchivePath(Folder.from(metadata), BwaAligner.NAMESPACE, bai(metadata.sampleName()))));
        }
        output = outputBuilder.build();
    } else {
        output = AlignmentOutput.builder().sample(metadata.sampleName()).status(PipelineStatus.FAILED).build();
    }
    trace.stop();
    executorService.shutdown();
    return output;
}
Also used : Arguments(com.hartwig.pipeline.Arguments) StageTrace(com.hartwig.pipeline.trace.StageTrace) SubStageInputOutput(com.hartwig.pipeline.stages.SubStageInputOutput) Aligner(com.hartwig.pipeline.alignment.Aligner) InputDownload(com.hartwig.pipeline.execution.vm.InputDownload) ArrayList(java.util.ArrayList) VirtualMachineJobDefinition(com.hartwig.pipeline.execution.vm.VirtualMachineJobDefinition) Future(java.util.concurrent.Future) RuntimeBucket(com.hartwig.pipeline.storage.RuntimeBucket) PipelineStatus(com.hartwig.pipeline.execution.PipelineStatus) ExecutorService(java.util.concurrent.ExecutorService) BashStartupScript(com.hartwig.pipeline.execution.vm.BashStartupScript) DataType(com.hartwig.pipeline.datatypes.DataType) FileTypes.bai(com.hartwig.pipeline.datatypes.FileTypes.bai) ImmutableAlignmentOutput(com.hartwig.pipeline.alignment.ImmutableAlignmentOutput) GoogleStorageLocation(com.hartwig.pipeline.storage.GoogleStorageLocation) Lane(com.hartwig.patient.Lane) ArchivePath(com.hartwig.pipeline.metadata.ArchivePath) SampleUpload(com.hartwig.pipeline.storage.SampleUpload) Folder(com.hartwig.pipeline.report.Folder) ResultsDirectory(com.hartwig.pipeline.ResultsDirectory) OutputUpload(com.hartwig.pipeline.execution.vm.OutputUpload) DefaultBackoffPolicy(com.hartwig.pipeline.failsafe.DefaultBackoffPolicy) Collectors(java.util.stream.Collectors) String.format(java.lang.String.format) File(java.io.File) SingleFileComponent(com.hartwig.pipeline.report.SingleFileComponent) Failsafe(net.jodah.failsafe.Failsafe) ResourceFilesFactory.buildResourceFiles(com.hartwig.pipeline.resource.ResourceFilesFactory.buildResourceFiles) ExecutionException(java.util.concurrent.ExecutionException) List(java.util.List) Sample(com.hartwig.patient.Sample) AddDatatype(com.hartwig.pipeline.metadata.AddDatatype) AlignmentOutput(com.hartwig.pipeline.alignment.AlignmentOutput) OverrideReferenceGenomeCommand(com.hartwig.pipeline.resource.OverrideReferenceGenomeCommand) RuntimeFiles(com.hartwig.pipeline.execution.vm.RuntimeFiles) SingleSampleRunMetadata(com.hartwig.pipeline.metadata.SingleSampleRunMetadata) Storage(com.google.cloud.storage.Storage) Labels(com.hartwig.pipeline.labels.Labels) FileTypes.bam(com.hartwig.pipeline.datatypes.FileTypes.bam) ResourceFiles(com.hartwig.pipeline.resource.ResourceFiles) ComputeEngine(com.hartwig.pipeline.execution.vm.ComputeEngine) ReportComponent(com.hartwig.pipeline.report.ReportComponent) RunLogComponent(com.hartwig.pipeline.report.RunLogComponent) SampleSource(com.hartwig.pipeline.alignment.sample.SampleSource)
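
When a BAM is already provided, the method above splits its `gs://` URL into a bucket name and an object path with plain string operations. The sketch below isolates that step. `GsPath` is a hypothetical helper that does not exist in pipeline5, and the input validation is an added assumption: the original snippet would fail with a `StringIndexOutOfBoundsException` if the URL had no slash after the bucket name.

```java
// Hypothetical helper illustrating the gs:// URL split used in BwaAligner.run; not part of pipeline5.
final class GsPath {

    private static final String PREFIX = "gs://";

    final String bucket;
    final String path;

    private GsPath(String bucket, String path) {
        this.bucket = bucket;
        this.path = path;
    }

    static GsPath parse(String url) {
        if (url == null || !url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a gs:// URL: " + url);
        }
        String noPrefix = url.substring(PREFIX.length());
        int firstSlash = noPrefix.indexOf('/');
        // Require a non-empty bucket and a non-empty object path (validation added for this sketch).
        if (firstSlash <= 0 || firstSlash == noPrefix.length() - 1) {
            throw new IllegalArgumentException("Expected gs://<bucket>/<path> but got: " + url);
        }
        return new GsPath(noPrefix.substring(0, firstSlash), noPrefix.substring(firstSlash + 1));
    }

    public static void main(String[] args) {
        GsPath location = GsPath.parse("gs://example-bucket/aligner/results/sample.bam");
        System.out.println(location.bucket + " | " + location.path);
    }
}
```

Stripping the prefix with `substring` rather than `replace("gs://", "")` only touches the start of the string, which avoids altering an object path that happens to contain the same characters.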

Example 4 with RuntimeBucket

Use of com.hartwig.pipeline.storage.RuntimeBucket in project pipeline5 by hartwigmedical.

Class GoogleStorageSampleSource, method sample.

@Override
public Sample sample(final SingleSampleRunMetadata metadata) {
    RuntimeBucket runtimeBucket = RuntimeBucket.from(storage, Aligner.NAMESPACE, metadata, arguments, labels);
    Iterable<Blob> blobs = runtimeBucket.list("samples/");
    if (Iterables.isEmpty(blobs)) {
        throw new IllegalArgumentException(String.format("No sample data found in bucket [%s] so there is no input to process. " + "You cannot use the upload=false flag if no sample has already been uploaded", runtimeBucket.name()));
    }
    String sampleDirectory = "/samples/" + metadata.barcode();
    String sampleNameWithPostfix = metadata.barcode();
    List<Lane> lanes = FastqFiles.toLanes(StreamSupport.stream(blobs.spliterator(), false).map(Blob::getName).map(File::new).map(File::getName).collect(Collectors.toList()), sampleDirectory, sampleNameWithPostfix);
    return Sample.builder(sampleNameWithPostfix).addAllLanes(lanes).build();
}
Also used : Blob(com.google.cloud.storage.Blob) RuntimeBucket(com.hartwig.pipeline.storage.RuntimeBucket) Lane(com.hartwig.patient.Lane) File(java.io.File)
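
The stream in the method above reduces each blob name to its final path segment by passing it through `java.io.File` before handing the names to `FastqFiles.toLanes`. The sketch below shows only that reduction on plain strings, with no Google Cloud Storage dependency. `BlobBaseNames` and the sample file names are hypothetical and chosen for illustration.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of the name-mapping step in GoogleStorageSampleSource.sample; not part of pipeline5.
final class BlobBaseNames {

    // Maps object names such as "samples/X/file.fastq.gz" to their last path segment.
    static List<String> baseNames(List<String> blobNames) {
        return blobNames.stream()
                .map(File::new)      // treat the object name as a path
                .map(File::getName)  // keep only the last segment
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = baseNames(Arrays.asList(
                "samples/FR123/FR123_L001_R1_001.fastq.gz",
                "samples/FR123/FR123_L001_R2_001.fastq.gz"));
        names.forEach(System.out::println);
    }
}
```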

Example 5 with RuntimeBucket

Use of com.hartwig.pipeline.storage.RuntimeBucket in project pipeline5 by hartwigmedical.

Class StartupScriptComponentTest, method copiesStartupScriptIntoReportBucket.

@Test
public void copiesStartupScriptIntoReportBucket() {
    Storage storage = mock(Storage.class);
    RuntimeBucket runtimeBucket = MockRuntimeBucket.test().getRuntimeBucket();
    Bucket reportBucket = mock(Bucket.class);
    when(reportBucket.getName()).thenReturn(REPORT_BUCKET);
    ArgumentCaptor<String> sourceBlobCaptor = ArgumentCaptor.forClass(String.class);
    ArgumentCaptor<String> targetBucketCaptor = ArgumentCaptor.forClass(String.class);
    ArgumentCaptor<String> targetBlobCaptor = ArgumentCaptor.forClass(String.class);
    StartupScriptComponent victim = new StartupScriptComponent(runtimeBucket, "test", Folder.from(TestInputs.referenceRunMetadata()));
    victim.addToReport(storage, reportBucket, "test_set");
    verify(runtimeBucket, times(1)).copyOutOf(sourceBlobCaptor.capture(), targetBucketCaptor.capture(), targetBlobCaptor.capture());
    assertThat(sourceBlobCaptor.getValue()).isEqualTo("copy_of_startup_script_for_run.sh");
    assertThat(targetBucketCaptor.getValue()).isEqualTo(REPORT_BUCKET);
    assertThat(targetBlobCaptor.getValue()).isEqualTo("test_set/reference/test/run.sh");
}
Also used : Storage(com.google.cloud.storage.Storage) Bucket(com.google.cloud.storage.Bucket) MockRuntimeBucket(com.hartwig.pipeline.testsupport.MockRuntimeBucket) RuntimeBucket(com.hartwig.pipeline.storage.RuntimeBucket) Test(org.junit.Test)

Aggregations

RuntimeBucket (com.hartwig.pipeline.storage.RuntimeBucket) 9
Storage (com.google.cloud.storage.Storage) 5
Bucket (com.google.cloud.storage.Bucket) 4
PipelineStatus (com.hartwig.pipeline.execution.PipelineStatus) 4
BashStartupScript (com.hartwig.pipeline.execution.vm.BashStartupScript) 3
MockRuntimeBucket (com.hartwig.pipeline.testsupport.MockRuntimeBucket) 3
Test (org.junit.Test) 3
Blob (com.google.cloud.storage.Blob) 2
Lane (com.hartwig.patient.Lane) 2
ResultsDirectory (com.hartwig.pipeline.ResultsDirectory) 2
BashCommand (com.hartwig.pipeline.execution.vm.BashCommand) 2
OutputUpload (com.hartwig.pipeline.execution.vm.OutputUpload) 2
VirtualMachineJobDefinition (com.hartwig.pipeline.execution.vm.VirtualMachineJobDefinition) 2
AddDatatype (com.hartwig.pipeline.metadata.AddDatatype) 2
ArchivePath (com.hartwig.pipeline.metadata.ArchivePath) 2
Folder (com.hartwig.pipeline.report.Folder) 2
ReportComponent (com.hartwig.pipeline.report.ReportComponent) 2
RunLogComponent (com.hartwig.pipeline.report.RunLogComponent) 2
SingleFileComponent (com.hartwig.pipeline.report.SingleFileComponent) 2
SubStageInputOutput (com.hartwig.pipeline.stages.SubStageInputOutput) 2