Example 46 with AminoAcid

Use of de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid in project jstructure by JonStargaryen.

From the class Start2FoldXmlParser, method parseStability.

public static void parseStability(Chain chain, InputStream inputStream) {
    try {
        // attach an empty Start2Fold annotation to every residue that does not have one yet
        chain.aminoAcids()
                .filter(aminoAcid -> !aminoAcid.getFeatureContainer().getFeatureOptional(Start2FoldResidueAnnotation.class).isPresent())
                .forEach(aminoAcid -> aminoAcid.getFeatureContainer().addFeature(new Start2FoldResidueAnnotation()));
        Document document = Jsoup.parse(inputStream, "UTF-8", "/");
        Elements experimentElements = document.getElementsByTag("experiment");
        // keep only the experiments that report stability
        List<Experiment> experiments = experimentElements.stream()
                .map(Experiment::parse)
                .filter(experiment -> experiment.getMethod() == Method.STABILITY)
                .collect(Collectors.toList());
        for (Experiment experiment : experiments) {
            assignValuesForStrong(experiment, chain);
        }
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}
Also used: LoggerFactory(org.slf4j.LoggerFactory), ProteinSequence(org.biojava.nbio.core.sequence.ProteinSequence), ProtectionLevel(de.bioforscher.jstructure.efr.model.ProtectionLevel), Experiment(de.bioforscher.jstructure.efr.model.Experiment), SimpleGapPenalty(org.biojava.nbio.alignment.SimpleGapPenalty), Method(de.bioforscher.jstructure.efr.model.Method), AminoAcid(de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid), Chain(de.bioforscher.jstructure.model.structure.Chain), SubstitutionMatrixHelper(org.biojava.nbio.core.alignment.matrices.SubstitutionMatrixHelper), CompoundNotFoundException(org.biojava.nbio.core.exceptions.CompoundNotFoundException), Path(java.nio.file.Path), Logger(org.slf4j.Logger), Start2FoldResidueAnnotation(de.bioforscher.jstructure.efr.model.Start2FoldResidueAnnotation), Files(java.nio.file.Files), SequencePair(org.biojava.nbio.core.alignment.template.SequencePair), AminoAcidCompound(org.biojava.nbio.core.sequence.compound.AminoAcidCompound), IOException(java.io.IOException), Collectors(java.util.stream.Collectors), UncheckedIOException(java.io.UncheckedIOException), List(java.util.List), Stream(java.util.stream.Stream), Alignments(org.biojava.nbio.alignment.Alignments), Document(org.jsoup.nodes.Document), Jsoup(org.jsoup.Jsoup), Elements(org.jsoup.select.Elements), InputStream(java.io.InputStream)

Example 47 with AminoAcid

Use of de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid in project jstructure by JonStargaryen.

From the class Start2FoldXmlParser, method assignValuesForEarly.

private static void assignValuesForEarly(Experiment experiment, Chain chain) {
    String pdbSequence = chain.getAminoAcidSequence();
    String experimentSequence = experiment.getSequence();
    // align sequences to ensure correct mapping
    SequencePair<ProteinSequence, AminoAcidCompound> pair = null;
    try {
        pair = Alignments.getPairwiseAlignment(new ProteinSequence(experimentSequence), new ProteinSequence(pdbSequence), Alignments.PairwiseSequenceAlignerType.GLOBAL, new SimpleGapPenalty(), SubstitutionMatrixHelper.getBlosum62());
        List<AminoAcid> aminoAcids = chain.aminoAcids().collect(Collectors.toList());
        for (Experiment.Residue residue : experiment.getResidues()) {
            int experimentIndex = residue.getIndex() - 1;
            try {
                int pdbIndex;
                if (residue.getCode().equals("P") && residue.getIndex() == 1) {
                    // workaround for STF0017, where the alignment should match in theory
                    pdbIndex = 0;
                } else {
                    pdbIndex = pair.getIndexInTargetForQueryAt(experimentIndex);
                }
                AminoAcid aminoAcid = aminoAcids.get(pdbIndex);
                // assign experiment-specific protection level to residue
                aminoAcid.getFeature(Start2FoldResidueAnnotation.class).addProtectionLevelEntry(ProtectionLevel.EARLY);
            } catch (Exception e) {
                // residue not present in structure - e.g. for STF0031 and STF0032
                logger.warn("alignment:{}{}", System.lineSeparator(), pair.toString());
                logger.warn("failed to map residue {}-{}", residue.getCode(), residue.getIndex(), e);
            }
        }
    } catch (CompoundNotFoundException e) {
        throw new IllegalArgumentException(e);
    }
}
Also used: CompoundNotFoundException(org.biojava.nbio.core.exceptions.CompoundNotFoundException), AminoAcid(de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid), Experiment(de.bioforscher.jstructure.efr.model.Experiment), Start2FoldResidueAnnotation(de.bioforscher.jstructure.efr.model.Start2FoldResidueAnnotation), SimpleGapPenalty(org.biojava.nbio.alignment.SimpleGapPenalty), IOException(java.io.IOException), UncheckedIOException(java.io.UncheckedIOException), ProteinSequence(org.biojava.nbio.core.sequence.ProteinSequence), AminoAcidCompound(org.biojava.nbio.core.sequence.compound.AminoAcidCompound)
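
The method above relies on a pairwise alignment to translate a residue position in the experiment sequence into a position in the PDB chain. The following is a minimal sketch of that index translation, written without BioJava. The class AlignmentIndexMapper and its method are hypothetical, the two inputs are assumed to be already-aligned strings of equal length with '-' as the gap character, and all indices are 0-based by this sketch's own convention, which says nothing about the convention of the library call used above.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of jstructure or BioJava.
final class AlignmentIndexMapper {
    private static final char GAP = '-';

    /**
     * Maps a 0-based position in the ungapped query sequence to the 0-based
     * position in the ungapped target sequence that shares its alignment column.
     * Returns an empty result if the query residue is aligned to a gap or if
     * the query index lies outside the query sequence.
     */
    static OptionalInt targetIndexForQuery(String alignedQuery, String alignedTarget, int queryIndex) {
        if (alignedQuery.length() != alignedTarget.length()) {
            throw new IllegalArgumentException("aligned sequences must have equal length");
        }
        int queryPosition = -1;
        int targetPosition = -1;
        for (int column = 0; column < alignedQuery.length(); column++) {
            boolean queryHasResidue = alignedQuery.charAt(column) != GAP;
            boolean targetHasResidue = alignedTarget.charAt(column) != GAP;
            if (queryHasResidue) {
                queryPosition++;
            }
            if (targetHasResidue) {
                targetPosition++;
            }
            if (queryHasResidue && queryPosition == queryIndex) {
                return targetHasResidue ? OptionalInt.of(targetPosition) : OptionalInt.empty();
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // query "MKTAY" aligned against target "KLTA"
        System.out.println(targetIndexForQuery("MK-TAY", "-KLTA-", 1));
        System.out.println(targetIndexForQuery("MK-TAY", "-KLTA-", 0));
    }
}
```

Returning an empty OptionalInt for unmapped residues makes the "residue not present in structure" case explicit, where the method above handles it by catching a generic exception.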

Example 48 with AminoAcid

Use of de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid in project jstructure by JonStargaryen.

From the class StructuralInformationParserService, method parseContactStructuralInformationFile.

public List<ContactStructuralInformation> parseContactStructuralInformationFile(InputStream inputStream, Chain chain, List<AminoAcid> earlyFoldingResidues) {
    Map<Pair<Integer, Integer>, List<String>> parsingMap = new HashMap<>();
    try (Stream<String> stream = new BufferedReader(new InputStreamReader(inputStream)).lines()) {
        stream.forEach(line -> {
            String[] split = line.split("\t");
            String[] idSplit = split[0].split(",");
            Pair<Integer, Integer> idPair = new Pair<>(Integer.valueOf(idSplit[0].split("\\(")[1].trim()), Integer.valueOf(idSplit[1].split("\\)")[0].trim()));
            // group all lines that describe the same residue pair
            parsingMap.computeIfAbsent(idPair, key -> new ArrayList<>()).add(line);
        });
    }
    Map<Pair<Integer, Integer>, List<ReconstructionStructuralInformation>> reconstructionMap = new HashMap<>();
    parsingMap.entrySet().stream().flatMap(entry -> {
        String aa1 = chain.select().residueNumber(entry.getKey().getLeft()).asAminoAcid().getOneLetterCode();
        String aa2 = chain.select().residueNumber(entry.getKey().getRight()).asAminoAcid().getOneLetterCode();
        return entry.getValue().stream()
                .map(line -> line.split("\t"))
                .map(split -> new ReconstructionStructuralInformation(
                        entry.getKey().getLeft(), aa1,
                        entry.getKey().getRight(), aa2,
                        ContactDistanceBin.resolve(new Pair<>(IdentifierFactory.createResidueIdentifier(entry.getKey().getLeft()), IdentifierFactory.createResidueIdentifier(entry.getKey().getRight()))).orElse(null),
                        split[1].equals("true"),
                        Double.valueOf(split[2]), Double.valueOf(split[3]), Double.valueOf(split[4]),
                        Double.valueOf(split[5]), Double.valueOf(split[6]), Double.valueOf(split[7]),
                        Double.valueOf(split[8]), Double.valueOf(split[9]), Double.valueOf(split[10])));
    }).forEach(rsi -> {
        Pair<Integer, Integer> idPair = new Pair<>(rsi.getResidueIdentifier1(), rsi.getResidueIdentifier2());
        reconstructionMap.computeIfAbsent(idPair, key -> new ArrayList<>()).add(rsi);
    });
    List<ReconstructionStructuralInformation> reconstructionStructuralInformation = reconstructionMap.values().stream().flatMap(Collection::stream).collect(Collectors.toList());
    int numberOfReconstructions = reconstructionStructuralInformation.size();
    double averageRmsd = reconstructionStructuralInformation.stream().mapToDouble(ReconstructionStructuralInformation::getRmsdIncrease).average().orElse(0.0);
    double standardDeviationRmsd = new StandardDeviation().evaluate(reconstructionStructuralInformation.stream().mapToDouble(ReconstructionStructuralInformation::getRmsdIncrease).toArray());
    double averageMaximumRmsd = reconstructionMap.entrySet().stream().mapToDouble(entry -> entry.getValue().stream().mapToDouble(ReconstructionStructuralInformation::getRmsdIncrease).max().orElse(0.0)).average().orElse(0.0);
    double standardDeviationMaximumRmsd = new StandardDeviation().evaluate(reconstructionMap.entrySet().stream().mapToDouble(entry -> entry.getValue().stream().mapToDouble(ReconstructionStructuralInformation::getRmsdIncrease).max().orElse(0.0)).toArray());
    List<ReconstructionStructuralInformation> topScoringReconstructions = reconstructionMap.values().stream().flatMap(Collection::stream).sorted(Comparator.comparingDouble(ReconstructionStructuralInformation::getRmsdIncrease).reversed()).limit((int) (0.1 * numberOfReconstructions)).collect(Collectors.toList());
    return reconstructionMap.entrySet().stream().map(entry -> {
        List<ReconstructionStructuralInformation> values = entry.getValue();
        ReconstructionStructuralInformation reference = values.get(0);
        return new ContactStructuralInformation(
                reference.getResidueIdentifier1(), reference.getAa1(),
                reference.getResidueIdentifier2(), reference.getAa2(),
                reference.getContactDistanceBin(),
                computeAverage(values, ReconstructionStructuralInformation::getBaselineRmsd),
                computeAverage(values, ReconstructionStructuralInformation::getBaselineTmScore),
                computeAverage(values, ReconstructionStructuralInformation::getBaselineQ),
                computeAverage(values, ReconstructionStructuralInformation::getRmsdIncrease),
                computeAverage(values, ReconstructionStructuralInformation::getTmScoreIncrease),
                computeAverage(values, ReconstructionStructuralInformation::getqIncrease),
                computeMaximum(values, ReconstructionStructuralInformation::getRmsdIncrease),
                computeMaximum(values, ReconstructionStructuralInformation::getTmScoreIncrease),
                computeMaximum(values, ReconstructionStructuralInformation::getqIncrease),
                residueIsInCollection(earlyFoldingResidues, entry.getKey().getLeft(), entry.getKey().getRight()),
                contactIsInCollection(earlyFoldingResidues, entry.getKey().getLeft(), entry.getKey().getRight()),
                averageRmsd, standardDeviationRmsd, averageMaximumRmsd, standardDeviationMaximumRmsd,
                reconstructionStructuralInformation, topScoringReconstructions,
                values.stream().map(ReconstructionStructuralInformation::getRmsdIncrease).collect(Collectors.toList()));
    }).collect(Collectors.toList());
}
Also used: java.util(java.util), Files(java.nio.file.Files), ResidueIdentifier(de.bioforscher.jstructure.model.identifier.ResidueIdentifier), Collectors(java.util.stream.Collectors), Pair(de.bioforscher.jstructure.mathematics.Pair), HotSpotScoring(de.bioforscher.jstructure.efr.model.HotSpotScoring), IdentifierFactory(de.bioforscher.jstructure.model.identifier.IdentifierFactory), Stream(java.util.stream.Stream), java.io(java.io), ContactDistanceBin(de.bioforscher.jstructure.efr.model.ContactDistanceBin), Group(de.bioforscher.jstructure.model.structure.Group), ToDoubleFunction(java.util.function.ToDoubleFunction), AminoAcid(de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid), Chain(de.bioforscher.jstructure.model.structure.Chain), StandardDeviation(org.apache.commons.math3.stat.descriptive.moment.StandardDeviation), ReconstructionStructuralInformation(de.bioforscher.jstructure.efr.model.si.ReconstructionStructuralInformation), ResidueStructuralInformation(de.bioforscher.jstructure.efr.model.si.ResidueStructuralInformation), StandardFormat(de.bioforscher.jstructure.StandardFormat), ContactStructuralInformation(de.bioforscher.jstructure.efr.model.si.ContactStructuralInformation), Path(java.nio.file.Path)
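
The first step of the method above reads tab-separated lines, extracts a residue pair from the first column, and collects all lines belonging to the same pair. Here is a minimal sketch of that step in isolation. The class ContactLineGrouping is hypothetical, the first column is assumed to look like "(12, 34)" (inferred from the split calls above), and a two-element List<Integer> stands in for the project's Pair type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of jstructure.
final class ContactLineGrouping {
    /** Parses an id field such as "(12, 34)" into the list [12, 34]. */
    static List<Integer> parseIdPair(String idField) {
        String[] idSplit = idField.split(",");
        int left = Integer.parseInt(idSplit[0].split("\\(")[1].trim());
        int right = Integer.parseInt(idSplit[1].split("\\)")[0].trim());
        return Arrays.asList(left, right);
    }

    /** Groups tab-separated lines by the residue pair in their first column, keeping first-seen order. */
    static Map<List<Integer>, List<String>> groupByIdPair(List<String> lines) {
        Map<List<Integer>, List<String>> grouped = new LinkedHashMap<>();
        for (String line : lines) {
            String idField = line.split("\t")[0];
            // computeIfAbsent creates the bucket on first use and returns it either way
            grouped.computeIfAbsent(parseIdPair(idField), key -> new ArrayList<>()).add(line);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("(1, 5)\ttrue\t0.5", "(2, 7)\tfalse\t0.1", "(1, 5)\tfalse\t0.9");
        groupByIdPair(lines).forEach((pair, bucket) -> System.out.println(pair + " -> " + bucket.size() + " line(s)"));
    }
}
```

A List<Integer> works as a map key here because lists compare by content, which is the property any pair type used as a key needs to provide through equals and hashCode.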

Example 49 with AminoAcid

Use of de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid in project jstructure by JonStargaryen.

From the class A03_PrintStart2FoldDatasetTable, method handleLine.

private static String handleLine(String line) {
    try {
        String[] split = line.split(";");
        String entryId = split[0];
        String pdbId = split[1];
        List<Integer> experimentIds = Pattern.compile(",").splitAsStream(split[2].replaceAll("\\[", "").replaceAll("]", "")).map(Integer::valueOf).collect(Collectors.toList());
        Structure structure = StructureParser.fromPdbId(pdbId).parse();
        Chain chain = structure.chains().findFirst().get();
        Start2FoldXmlParser.parseStability(chain, Start2FoldConstants.XML_DIRECTORY.resolve(entryId + ".xml"));
        Start2FoldXmlParser.parseSpecificExperiment(chain, Start2FoldConstants.XML_DIRECTORY.resolve(entryId + ".xml"), experimentIds);
        List<AminoAcid> earlyFoldingResidues = chain.aminoAcids().filter(aminoAcid -> aminoAcid.getFeature(Start2FoldResidueAnnotation.class).isEarly()).collect(Collectors.toList());
        List<AminoAcid> stableResidues = chain.aminoAcids().filter(aminoAcid -> aminoAcid.getFeature(Start2FoldResidueAnnotation.class).isStrong()).collect(Collectors.toList());
        List<Integer> functionalResidueNumbers = Start2FoldConstants.extractFunctionalResidueNumbers(split);
        List<AminoAcid> functionalResidues = new ArrayList<>();
        // do nothing if no annotation of functional residues exists
        if (!functionalResidueNumbers.isEmpty()) {
            FunctionalResidueParser.parse(chain, functionalResidueNumbers);
            chain.aminoAcids().filter(aminoAcid -> aminoAcid.getFeature(FunctionalResidueAnnotation.class).isFunctional()).forEach(functionalResidues::add);
        }
        List<AminoAcid> aminoAcids = chain.aminoAcids().collect(Collectors.toList());
        long intersection = earlyFoldingResidues.stream().filter(functionalResidues::contains).count();
        return entryId + "\t" + pdbId + "\t" + split[2] + "\t" + aminoAcids.size() + "\t" + earlyFoldingResidues.size() + "\t" + functionalResidues.size() + "\t" + intersection;
    } catch (Exception e) {
        e.printStackTrace();
        return "";
    }
}
Also used: FunctionalResidueParser(de.bioforscher.jstructure.efr.parser.FunctionalResidueParser), Start2FoldResidueAnnotation(de.bioforscher.jstructure.efr.model.Start2FoldResidueAnnotation), Files(java.nio.file.Files), Structure(de.bioforscher.jstructure.model.structure.Structure), IOException(java.io.IOException), StructureParser(de.bioforscher.jstructure.model.structure.StructureParser), Collectors(java.util.stream.Collectors), FunctionalResidueAnnotation(de.bioforscher.jstructure.efr.model.FunctionalResidueAnnotation), Start2FoldXmlParser(de.bioforscher.jstructure.efr.parser.Start2FoldXmlParser), Start2FoldConstants(de.bioforscher.jstructure.efr.Start2FoldConstants), ArrayList(java.util.ArrayList), List(java.util.List), AminoAcid(de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid), Chain(de.bioforscher.jstructure.model.structure.Chain), Pattern(java.util.regex.Pattern), Comparator(java.util.Comparator)
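
The handleLine method above turns a bracketed, comma-separated field into a list of experiment ids. A minimal sketch of that parsing step on its own follows. The class ExperimentIdField is hypothetical and the input shape (for example "[1,2,3]") is inferred from the replaceAll calls above. Unlike the original, this sketch also trims whitespace and skips empty tokens, which is an addition made here for robustness.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of jstructure.
final class ExperimentIdField {
    private static final Pattern COMMA = Pattern.compile(",");

    /** Parses a field such as "[1,2,3]" into a list of integers; "[]" yields an empty list. */
    static List<Integer> parse(String field) {
        // String.replace works on literal text, so no regex escaping is needed for the brackets
        String withoutBrackets = field.replace("[", "").replace("]", "");
        return COMMA.splitAsStream(withoutBrackets)
                .map(String::trim)
                .filter(token -> !token.isEmpty())
                .map(Integer::valueOf)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parse("[1, 2,3]"));
    }
}
```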

Example 50 with AminoAcid

Use of de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid in project jstructure by JonStargaryen.

From the class A07_WriteCouplingRangeCsv, method handleLine.

private static Optional<String> handleLine(String line) {
    try {
        System.out.println(line);
        String[] split = line.split(";");
        String entryId = split[0];
        String pdbId = split[1];
        List<Integer> experimentIds = Pattern.compile(",").splitAsStream(split[2].replaceAll("\\[", "").replaceAll("]", "")).map(Integer::valueOf).collect(Collectors.toList());
        Structure structure = StructureParser.fromPdbId(pdbId).parse();
        Chain chain = structure.chains().findFirst().get();
        Start2FoldXmlParser.parseSpecificExperiment(chain, Start2FoldConstants.XML_DIRECTORY.resolve(entryId + ".xml"), experimentIds);
        EvolutionaryCouplingParser.parseHotSpotFile(chain, Start2FoldConstants.COUPLING_DIRECTORY.resolve(entryId.toUpperCase() + "_hs.html"));
        List<AminoAcid> earlyFoldingResidues = chain.aminoAcids().filter(aminoAcid -> aminoAcid.getFeature(Start2FoldResidueAnnotation.class).isEarly()).collect(Collectors.toList());
        Map<Integer, List<Double>> localPlmScores = new HashMap<>();
        Map<Integer, List<Double>> longRangePlmScores = new HashMap<>();
        // NOTE: hard-coded, machine-specific path; the whole file is read and then parsed as HTML
        Document hotSpotDocument = Jsoup.parse(String.join(System.lineSeparator(), Files.readAllLines(Paths.get("/home/bittrich/git/phd_sb_repo/data/start2fold/coupling/" + entryId + "_ec.html"))));
        List<AminoAcid> aminoAcids = chain.aminoAcids().collect(Collectors.toList());
        for (int i = 0; i < aminoAcids.size(); i++) {
            localPlmScores.put(i, new ArrayList<>());
            longRangePlmScores.put(i, new ArrayList<>());
        }
        hotSpotDocument.getElementsByTag("tr").stream().skip(1).forEach(element -> {
            Elements tds = element.getElementsByTag("td");
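            // shift residue numbers by one to match the 0-based keys of the score maps
            int residueNumber1 = Integer.parseInt(tds.get(2).text()) - 1;
            int residueNumber2 = Integer.parseInt(tds.get(4).text()) - 1;
            double plmScore = Double.parseDouble(tds.get(6).text());
            // contacts fewer than 6 positions apart in sequence count as local
            boolean localContact = Math.abs(residueNumber1 - residueNumber2) < 6;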
            if (localContact) {
                System.out.println("local contact: " + element.text());
                localPlmScores.get(residueNumber1).add(plmScore);
                localPlmScores.get(residueNumber2).add(plmScore);
            } else {
                System.out.println("long-range contact: " + element.text());
                longRangePlmScores.get(residueNumber1).add(plmScore);
                longRangePlmScores.get(residueNumber2).add(plmScore);
            }
        });
        // two CSV rows per residue: one for local and one for long-range couplings
        return Optional.of(aminoAcids.stream()
                .map(aminoAcid -> pdbId + ",A," + aminoAcid.getOneLetterCode() + "," + aminoAcid.getResidueIdentifier().getResidueNumber() + "," + (earlyFoldingResidues.contains(aminoAcid) ? "early" : "late") + "," + "local," + StandardFormat.format(localPlmScores.get(aminoAcid.getResidueIndex()).stream().mapToDouble(Double::valueOf).average().orElse(0.0))
                        + System.lineSeparator()
                        + pdbId + ",A," + aminoAcid.getOneLetterCode() + "," + aminoAcid.getResidueIdentifier().getResidueNumber() + "," + (earlyFoldingResidues.contains(aminoAcid) ? "early" : "late") + "," + "long-range," + StandardFormat.format(longRangePlmScores.get(aminoAcid.getResidueIndex()).stream().mapToDouble(Double::valueOf).average().orElse(0.0)))
                .collect(Collectors.joining(System.lineSeparator())));
    } catch (Exception e) {
        e.printStackTrace();
        return Optional.empty();
    }
}
Also used: java.util(java.util), Files(java.nio.file.Files), Structure(de.bioforscher.jstructure.model.structure.Structure), IOException(java.io.IOException), StructureParser(de.bioforscher.jstructure.model.structure.StructureParser), Collectors(java.util.stream.Collectors), Start2FoldResidueAnnotation(de.bioforscher.start2fold.model.Start2FoldResidueAnnotation), Start2FoldConstants(de.bioforscher.start2fold.Start2FoldConstants), Paths(java.nio.file.Paths), Document(org.jsoup.nodes.Document), AminoAcid(de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid), Start2FoldXmlParser(de.bioforscher.start2fold.parser.Start2FoldXmlParser), Chain(de.bioforscher.jstructure.model.structure.Chain), StandardFormat(de.bioforscher.jstructure.StandardFormat), EvolutionaryCouplingParser(de.bioforscher.start2fold.parser.EvolutionaryCouplingParser), Jsoup(org.jsoup.Jsoup), Elements(org.jsoup.select.Elements), Pattern(java.util.regex.Pattern)
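
The method above sorts each coupling score into a local or a long-range bucket by sequence separation and later averages every bucket, falling back to 0.0 when a bucket is empty. Those two rules can be sketched on their own as follows. The class CouplingRangeSketch and its member names are hypothetical; the cutoff of 6 and the 0.0 fallback are taken from the code above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of jstructure.
final class CouplingRangeSketch {
    /** Residue pairs fewer than this many positions apart count as local (value taken from the example above). */
    static final int LOCAL_SEPARATION_CUTOFF = 6;

    /** Returns true if the two residue indices are close enough in sequence to be a local contact. */
    static boolean isLocalContact(int residueIndex1, int residueIndex2) {
        return Math.abs(residueIndex1 - residueIndex2) < LOCAL_SEPARATION_CUTOFF;
    }

    /** Averages the scores; an empty list yields 0.0 rather than NaN or an exception. */
    static double averageOrZero(List<Double> scores) {
        return scores.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    public static void main(String[] args) {
        System.out.println(isLocalContact(3, 8));
        System.out.println(isLocalContact(3, 9));
        System.out.println(averageOrZero(Arrays.asList(1.0, 3.0)));
        System.out.println(averageOrZero(Collections.emptyList()));
    }
}
```

Note that the 0.0 fallback makes a residue without any couplings indistinguishable from one whose scores average to zero. Whether that matters depends on how the resulting CSV is analysed.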

Aggregations

AminoAcid (de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid): 66
Chain (de.bioforscher.jstructure.model.structure.Chain): 40
Collectors (java.util.stream.Collectors): 40
IOException (java.io.IOException): 36
Files (java.nio.file.Files): 35
List (java.util.List): 31
StandardFormat (de.bioforscher.jstructure.StandardFormat): 26
StructureParser (de.bioforscher.jstructure.model.structure.StructureParser): 26
Path (java.nio.file.Path): 25
Structure (de.bioforscher.jstructure.model.structure.Structure): 23
Pattern (java.util.regex.Pattern): 17
Logger (org.slf4j.Logger): 16
LoggerFactory (org.slf4j.LoggerFactory): 16
Start2FoldResidueAnnotation (de.bioforscher.jstructure.efr.model.Start2FoldResidueAnnotation): 15
UncheckedIOException (java.io.UncheckedIOException): 14
ArrayList (java.util.ArrayList): 14
Stream (java.util.stream.Stream): 14
Start2FoldResidueAnnotation (de.bioforscher.start2fold.model.Start2FoldResidueAnnotation): 13
Optional (java.util.Optional): 13
Pair (de.bioforscher.jstructure.mathematics.Pair): 11