Search in sources :

Example 1 with Cigar

use of htsjdk.samtools.Cigar in project gatk by broadinstitute.

the class GappedAlignmentSplitter method split.

/**
     * Splits a gapped alignment into multiple alignment regions, based on input sensitivity: i.e. if the gap(s) in the input alignment
     * is equal to or larger than the size specified by {@code sensitivity}, two alignment regions will be generated.
     *
     * To fulfill this functionality, needs to accomplish three key tasks correctly:
     * <ul>
     *     <li>generate appropriate cigar,</li>
     *     <li>infer reference coordinates,</li>
     *     <li>infer contig coordinates</li>
     * </ul>
     * As an example, an alignment with CIGAR "397S118M2D26M6I50M7I26M1I8M13D72M398S", when {@code sensitivity} is set to "1", should be split into 5 alignments
     * <ul>
     *     <li>"397S118M594S"</li>
     *     <li>"515S26M568S"</li>
     *     <li>"547S50M512S"</li>
     *     <li>"631S8M470S"</li>
     *     <li>"639S72M398S"</li>
     * </ul>
     * On the other hand, an alignment with CIGAR "10M10D10M60I10M10I10M50D10M", when {@code sensitivity} is set to "50", should be split into 3 alignments
     * <ul>
     *     <li>"10M10D10M100S"</li>
     *     <li>"80S10M10I10M10S"</li>
     *     <li>"110S10M"</li>
     * </ul>
     * And when an alignment has hard clipping adjacent to soft clippings, e.g. "1H2S3M5I10M20D6M7S8H", it should be split into alignments with CIGAR resembling the original CIGAR as much as possible:
     * <ul>
     *     <li>"1H2S3M28S8H"</li>
     *     <li>"1H10S10M13S8H"</li>
     *     <li>"1H20S6M7S8H"</li>
     * </ul>
     *
     * @return an iterable of size >= 1. if size==1, the returned iterable contains only the input (i.e. either no gap or hasn't reached sensitivity)
     */
@VisibleForTesting
public static Iterable<AlignedAssembly.AlignmentInterval> split(final AlignedAssembly.AlignmentInterval oneRegion, final int sensitivity, final int unclippedContigLen) {
    final List<CigarElement> cigarElements = checkCigarAndConvertTerminalInsertionToSoftClip(oneRegion.cigarAlong5to3DirectionOfContig);
    if (cigarElements.size() == 1)
        return new ArrayList<>(Collections.singletonList(oneRegion));
    // blunt guess
    final List<AlignedAssembly.AlignmentInterval> result = new ArrayList<>(3);
    final int originalMapQ = oneRegion.mapQual;
    final List<CigarElement> cigarMemoryList = new ArrayList<>();
    final int clippedNBasesFromStart = SVVariantDiscoveryUtils.getNumClippedBases(true, cigarElements);
    final int hardClippingAtBeginning = cigarElements.get(0).getOperator() == CigarOperator.H ? cigarElements.get(0).getLength() : 0;
    final int hardClippingAtEnd = (cigarElements.get(cigarElements.size() - 1).getOperator() == CigarOperator.H) ? cigarElements.get(cigarElements.size() - 1).getLength() : 0;
    final CigarElement hardClippingAtBeginningMaybeNull = hardClippingAtBeginning == 0 ? null : new CigarElement(hardClippingAtBeginning, CigarOperator.H);
    int contigIntervalStart = 1 + clippedNBasesFromStart;
    // we are walking along the contig following the cigar, which indicates that we might be walking backwards on the reference if oneRegion.forwardStrand==false
    int refBoundary1stInTheDirectionOfContig = oneRegion.forwardStrand ? oneRegion.referenceInterval.getStart() : oneRegion.referenceInterval.getEnd();
    for (final CigarElement cigarElement : cigarElements) {
        final CigarOperator op = cigarElement.getOperator();
        final int operatorLen = cigarElement.getLength();
        switch(op) {
            case M:
            case EQ:
            case X:
            case S:
            case H:
                cigarMemoryList.add(cigarElement);
                break;
            case I:
            case D:
                if (operatorLen < sensitivity) {
                    cigarMemoryList.add(cigarElement);
                    break;
                }
                // collapse cigar memory list into a single cigar for ref & contig interval computation
                final Cigar memoryCigar = new Cigar(cigarMemoryList);
                final int effectiveReadLen = memoryCigar.getReadLength() + SVVariantDiscoveryUtils.getTotalHardClipping(memoryCigar) - SVVariantDiscoveryUtils.getNumClippedBases(true, memoryCigar);
                // task 1: infer reference interval taking into account of strand
                final SimpleInterval referenceInterval;
                if (oneRegion.forwardStrand) {
                    referenceInterval = new SimpleInterval(oneRegion.referenceInterval.getContig(), refBoundary1stInTheDirectionOfContig, refBoundary1stInTheDirectionOfContig + (memoryCigar.getReferenceLength() - 1));
                } else {
                    referenceInterval = new SimpleInterval(oneRegion.referenceInterval.getContig(), // step backward
                    refBoundary1stInTheDirectionOfContig - (memoryCigar.getReferenceLength() - 1), refBoundary1stInTheDirectionOfContig);
                }
                // task 2: infer contig interval
                final int contigIntervalEnd = contigIntervalStart + effectiveReadLen - 1;
                // task 3: now add trailing cigar element and create the real cigar for the to-be-returned AR
                cigarMemoryList.add(new CigarElement(unclippedContigLen - contigIntervalEnd - hardClippingAtEnd, CigarOperator.S));
                if (hardClippingAtEnd != 0) {
                    // be faithful to hard clipping (as the accompanying bases have been hard-clipped away)
                    cigarMemoryList.add(new CigarElement(hardClippingAtEnd, CigarOperator.H));
                }
                final Cigar cigarForNewAlignmentInterval = new Cigar(cigarMemoryList);
                final AlignedAssembly.AlignmentInterval split = new AlignedAssembly.AlignmentInterval(referenceInterval, contigIntervalStart, contigIntervalEnd, cigarForNewAlignmentInterval, oneRegion.forwardStrand, originalMapQ, SVConstants.DiscoveryStepConstants.ARTIFICIAL_MISMATCH);
                result.add(split);
                // update cigar memory
                cigarMemoryList.clear();
                if (hardClippingAtBeginningMaybeNull != null) {
                    // be faithful about hard clippings
                    cigarMemoryList.add(hardClippingAtBeginningMaybeNull);
                }
                cigarMemoryList.add(new CigarElement(contigIntervalEnd - hardClippingAtBeginning + (op.consumesReadBases() ? operatorLen : 0), CigarOperator.S));
                // update pointers into reference and contig
                final int refBoundaryAdvance = op.consumesReadBases() ? memoryCigar.getReferenceLength() : memoryCigar.getReferenceLength() + operatorLen;
                refBoundary1stInTheDirectionOfContig += oneRegion.forwardStrand ? refBoundaryAdvance : -refBoundaryAdvance;
                contigIntervalStart += op.consumesReadBases() ? effectiveReadLen + operatorLen : effectiveReadLen;
                break;
            default:
                // TODO: 1/20/17 still not quite sure if this is quite right, it doesn't blow up on NA12878 WGS, but who knows what happens in the future
                throw new GATKException("Alignment CIGAR contains an unexpected N or P element: " + oneRegion.toPackedString());
        }
    }
    if (result.isEmpty()) {
        return new ArrayList<>(Collections.singletonList(oneRegion));
    }
    final SimpleInterval lastReferenceInterval;
    if (oneRegion.forwardStrand) {
        lastReferenceInterval = new SimpleInterval(oneRegion.referenceInterval.getContig(), refBoundary1stInTheDirectionOfContig, oneRegion.referenceInterval.getEnd());
    } else {
        lastReferenceInterval = new SimpleInterval(oneRegion.referenceInterval.getContig(), oneRegion.referenceInterval.getStart(), refBoundary1stInTheDirectionOfContig);
    }
    final Cigar lastForwardStrandCigar = new Cigar(cigarMemoryList);
    int clippedNBasesFromEnd = SVVariantDiscoveryUtils.getNumClippedBases(false, cigarElements);
    result.add(new AlignedAssembly.AlignmentInterval(lastReferenceInterval, contigIntervalStart, unclippedContigLen - clippedNBasesFromEnd, lastForwardStrandCigar, oneRegion.forwardStrand, originalMapQ, SVConstants.DiscoveryStepConstants.ARTIFICIAL_MISMATCH));
    return result;
}
Also used : ArrayList(java.util.ArrayList) CigarOperator(htsjdk.samtools.CigarOperator) CigarElement(htsjdk.samtools.CigarElement) Cigar(htsjdk.samtools.Cigar) SimpleInterval(org.broadinstitute.hellbender.utils.SimpleInterval) GATKException(org.broadinstitute.hellbender.exceptions.GATKException) VisibleForTesting(com.google.common.annotations.VisibleForTesting)

Example 2 with Cigar

use of htsjdk.samtools.Cigar in project gatk by broadinstitute.

the class BreakpointComplications method extractCigarForTandup.

/**
     * Given a {@link AlignedAssembly.AlignmentInterval} from a pair of ARs that forms a {@link ChimericAlignment} signalling a tandem duplication,
     * extract a CIGAR from the {@link AlignedAssembly.AlignmentInterval#cigarAlong5to3DirectionOfContig}
     * that corresponds to the alignment between the suspected repeated sequence on reference between
     * [{@code alignmentIntervalTwoReferenceIntervalSpanBegin}, {@code alignmentIntervalOneReferenceIntervalSpanEnd}],
     * and the sequence in {@link AlignedAssembly.AlignmentInterval#referenceInterval}.
     */
@VisibleForTesting
static Cigar extractCigarForTandup(final AlignedAssembly.AlignmentInterval contigRegion, final int alignmentIntervalOneReferenceIntervalSpanEnd, final int alignmentIntervalTwoReferenceIntervalSpanBegin) {
    final List<CigarElement> elementList = contigRegion.cigarAlong5to3DirectionOfContig.getCigarElements();
    final List<CigarElement> result = new ArrayList<>(elementList.size());
    final int refStart = contigRegion.referenceInterval.getStart(), refEnd = contigRegion.referenceInterval.getEnd();
    final boolean isForwardStrand = contigRegion.forwardStrand;
    boolean initiatedCollection = false;
    int refPos = isForwardStrand ? refStart : refEnd;
    for (final CigarElement cigarElement : elementList) {
        final CigarOperator operator = cigarElement.getOperator();
        if (!operator.isClipping()) {
            final int opLen = cigarElement.getLength();
            refPos += operator.consumesReferenceBases() ? (isForwardStrand ? opLen : -opLen) : 0;
            final int offsetIntoRepeatRegion = isForwardStrand ? refPos - alignmentIntervalTwoReferenceIntervalSpanBegin : alignmentIntervalOneReferenceIntervalSpanEnd - refPos;
            final int overshootOutOfRepeatRegion = isForwardStrand ? refPos - alignmentIntervalOneReferenceIntervalSpanEnd - 1 : alignmentIntervalTwoReferenceIntervalSpanBegin - refPos - 1;
            if (offsetIntoRepeatRegion > 0) {
                if (overshootOutOfRepeatRegion <= 0) {
                    result.add(initiatedCollection ? cigarElement : new CigarElement(offsetIntoRepeatRegion, operator));
                    initiatedCollection = true;
                } else {
                    result.add(new CigarElement(opLen - overshootOutOfRepeatRegion, operator));
                    break;
                }
            }
        }
    }
    return new Cigar(result);
}
Also used : Cigar(htsjdk.samtools.Cigar) CigarOperator(htsjdk.samtools.CigarOperator) CigarElement(htsjdk.samtools.CigarElement) VisibleForTesting(com.google.common.annotations.VisibleForTesting)

Example 3 with Cigar

use of htsjdk.samtools.Cigar in project gatk by broadinstitute.

the class AlignmentUtils method getMismatchCount.

/**
     * Count how many bases mismatch the reference.  Indels are not considered mismatching.
     *
     * @param r                   the sam record to check against
     * @param refSeq              the byte array representing the reference sequence
     * @param refIndex            the index in the reference byte array of the read's first base (the reference index
     *                            is matching the alignment start, there may be tons of soft-clipped bases before/after
     *                            that so it's wrong to compare with getLength() here.).  Note that refIndex is
     *                            zero based, not 1 based
     * @param startOnRead         the index in the read's bases from which we start counting
     * @param nReadBases          the number of bases after (but including) startOnRead that we check
     * @return non-null object representing the mismatch count
     */
@SuppressWarnings("fallthrough")
public static MismatchCount getMismatchCount(GATKRead r, byte[] refSeq, int refIndex, int startOnRead, int nReadBases) {
    Utils.nonNull(r);
    Utils.nonNull(refSeq);
    if (refIndex < 0)
        throw new IllegalArgumentException("attempting to calculate the mismatch count with a reference index that is negative");
    if (startOnRead < 0)
        throw new IllegalArgumentException("attempting to calculate the mismatch count with a read start that is negative");
    if (nReadBases < 0)
        throw new IllegalArgumentException("attempting to calculate the mismatch count for a negative number of read bases");
    if (refSeq.length - refIndex < (r.getEnd() - r.getStart()))
        throw new IllegalArgumentException("attempting to calculate the mismatch count against a reference string that is smaller than the read");
    MismatchCount mc = new MismatchCount();
    int readIdx = 0;
    // index of the last base on read we want to count (note we are including soft-clipped bases with this math)
    final int endOnRead = startOnRead + nReadBases - 1;
    final byte[] readSeq = r.getBases();
    final Cigar c = r.getCigar();
    final byte[] readQuals = r.getBaseQualities();
    for (final CigarElement ce : c.getCigarElements()) {
        if (readIdx > endOnRead)
            break;
        final int elementLength = ce.getLength();
        switch(ce.getOperator()) {
            case X:
                mc.numMismatches += elementLength;
                for (int j = 0; j < elementLength; j++) mc.mismatchQualities += readQuals[readIdx + j];
            case EQ:
                refIndex += elementLength;
                readIdx += elementLength;
                break;
            case M:
                for (int j = 0; j < elementLength; j++, refIndex++, readIdx++) {
                    if (refIndex >= refSeq.length)
                        // TODO : It should never happen, we should throw exception here
                        continue;
                    if (readIdx < startOnRead)
                        continue;
                    if (readIdx > endOnRead)
                        break;
                    byte refChr = refSeq[refIndex];
                    byte readChr = readSeq[readIdx];
                    //    continue; // do not count Ns/Xs/etc ?
                    if (readChr != refChr) {
                        mc.numMismatches++;
                        mc.mismatchQualities += readQuals[readIdx];
                    }
                }
                break;
            case I:
            case S:
                readIdx += elementLength;
                break;
            case D:
            case N:
                refIndex += elementLength;
                break;
            case H:
            case P:
                break;
            default:
                throw new GATKException("The " + ce.getOperator() + " cigar element is not currently supported");
        }
    }
    return mc;
}
Also used : Cigar(htsjdk.samtools.Cigar) GATKException(org.broadinstitute.hellbender.exceptions.GATKException) CigarElement(htsjdk.samtools.CigarElement)

Example 4 with Cigar

use of htsjdk.samtools.Cigar in project gatk by broadinstitute.

the class BaseRecalibrationEngine method calculateKnownSites.

protected boolean[] calculateKnownSites(final GATKRead read, final Iterable<? extends Locatable> knownSites) {
    final int readLength = read.getLength();
    //initializes to all false
    final boolean[] knownSitesArray = new boolean[readLength];
    final Cigar cigar = read.getCigar();
    final int softStart = ReadUtils.getSoftStart(read);
    final int softEnd = ReadUtils.getSoftEnd(read);
    for (final Locatable knownSite : knownSites) {
        if (knownSite.getEnd() < softStart || knownSite.getStart() > softEnd) {
            // knownSite is outside clipping window for the read, ignore
            continue;
        }
        int featureStartOnRead = ReadUtils.getReadCoordinateForReferenceCoordinate(softStart, cigar, knownSite.getStart(), ReadUtils.ClippingTail.LEFT_TAIL, true);
        if (featureStartOnRead == ReadUtils.CLIPPING_GOAL_NOT_REACHED) {
            featureStartOnRead = 0;
        }
        int featureEndOnRead = ReadUtils.getReadCoordinateForReferenceCoordinate(softStart, cigar, knownSite.getEnd(), ReadUtils.ClippingTail.LEFT_TAIL, true);
        if (featureEndOnRead == ReadUtils.CLIPPING_GOAL_NOT_REACHED) {
            featureEndOnRead = readLength;
        }
        if (featureStartOnRead > readLength) {
            featureStartOnRead = featureEndOnRead = readLength;
        }
        Arrays.fill(knownSitesArray, Math.max(0, featureStartOnRead), Math.min(readLength, featureEndOnRead + 1), true);
    }
    return knownSitesArray;
}
Also used : Cigar(htsjdk.samtools.Cigar) Locatable(htsjdk.samtools.util.Locatable)

Example 5 with Cigar

use of htsjdk.samtools.Cigar in project gatk by broadinstitute.

the class CigarUtils method trimReadToUnclippedBases.

/**
     * Removes all clipping operators from the cigar.
     */
public static Cigar trimReadToUnclippedBases(final Cigar cigar) {
    Utils.nonNull(cigar, "cigar is null");
    final List<CigarElement> elements = new ArrayList<>(cigar.numCigarElements());
    for (final CigarElement ce : cigar.getCigarElements()) {
        if (!isClipOperator(ce.getOperator())) {
            elements.add(ce);
        }
    }
    return new Cigar(elements);
}
Also used : Cigar(htsjdk.samtools.Cigar) CigarElement(htsjdk.samtools.CigarElement)

Aggregations

Cigar (htsjdk.samtools.Cigar)106 Test (org.testng.annotations.Test)56 CigarElement (htsjdk.samtools.CigarElement)46 BaseTest (org.broadinstitute.hellbender.utils.test.BaseTest)35 GATKRead (org.broadinstitute.hellbender.utils.read.GATKRead)25 SimpleInterval (org.broadinstitute.hellbender.utils.SimpleInterval)14 SplitNCigarReadsUnitTest (org.broadinstitute.hellbender.tools.walkers.rnaseq.SplitNCigarReadsUnitTest)12 ArrayList (java.util.ArrayList)11 GATKException (org.broadinstitute.hellbender.exceptions.GATKException)7 Haplotype (org.broadinstitute.hellbender.utils.haplotype.Haplotype)6 VisibleForTesting (com.google.common.annotations.VisibleForTesting)3 DataProvider (org.testng.annotations.DataProvider)3 CigarOperator (htsjdk.samtools.CigarOperator)2 AssemblyRegion (org.broadinstitute.hellbender.engine.AssemblyRegion)2 AssemblyResultSet (org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyResultSet)2 KBestHaplotype (org.broadinstitute.hellbender.tools.walkers.haplotypecaller.graphs.KBestHaplotype)2 SWPairwiseAlignment (org.broadinstitute.hellbender.utils.smithwaterman.SWPairwiseAlignment)2 SAMFileHeader (htsjdk.samtools.SAMFileHeader)1 Locatable (htsjdk.samtools.util.Locatable)1 VariantContext (htsjdk.variant.variantcontext.VariantContext)1