
Example 61 with AminoAcid

Use of de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid in project jstructure by JonStargaryen.

From class A02_CreatePyMolRenderJobsForStrongResidues, method composePyMolCommand.

private static Optional<String> composePyMolCommand(Path path) {
    try {
        String entryId = path.toFile().getName().split("\\.")[0];
        String pdbId = Jsoup.parse(path.toFile(), "UTF-8").getElementsByTag("protein").attr("pdb_id");
        Structure structure = StructureParser.fromPdbId(pdbId).parse();
        Chain chain = structure.chains().findFirst().get();
        Start2FoldXmlParser.parse(chain, Start2FoldConstants.XML_DIRECTORY.resolve(entryId + ".xml"));
        List<Integer> strongResidues = chain.aminoAcids().filter(aminoAcid -> aminoAcid.getFeature(Start2FoldResidueAnnotation.class).isStrong()).map(AminoAcid::getResidueIdentifier).map(ResidueIdentifier::getResidueNumber).collect(Collectors.toList());
        if (strongResidues.isEmpty()) {
            return Optional.empty();
        }
        String nl = System.lineSeparator();
        return Optional.of("delete all" + nl
                + "fetch " + pdbId + ", async=0" + nl
                // hide non-relevant stuff
                + "hide everything" + nl
                + "show cartoon, chain A" + nl
                // decolor everything
                + "color grey80" + nl
                + "zoom (chain A)" + nl
                // highlight every strong residue
                + strongResidues.stream().map(res -> "color efr, resi " + res).collect(Collectors.joining(nl)) + nl
                + "ray" + nl
                + "png " + Start2FoldConstants.PYMOL_DIRECTORY.resolve(entryId + "-strong.png") + nl);
    } catch (IOException e) {
        return Optional.empty();
    }
}
Also used: Start2FoldResidueAnnotation(de.bioforscher.jstructure.efr.model.Start2FoldResidueAnnotation) Files(java.nio.file.Files) ResidueIdentifier(de.bioforscher.jstructure.model.identifier.ResidueIdentifier) Structure(de.bioforscher.jstructure.model.structure.Structure) IOException(java.io.IOException) StructureParser(de.bioforscher.jstructure.model.structure.StructureParser) Collectors(java.util.stream.Collectors) Start2FoldXmlParser(de.bioforscher.jstructure.efr.parser.Start2FoldXmlParser) Start2FoldConstants(de.bioforscher.jstructure.efr.Start2FoldConstants) List(java.util.List) AminoAcid(de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid) Chain(de.bioforscher.jstructure.model.structure.Chain) Optional(java.util.Optional) StandardFormat(de.bioforscher.jstructure.StandardFormat) Jsoup(org.jsoup.Jsoup) Path(java.nio.file.Path)
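
The return statement of composePyMolCommand emits a fixed sequence of PyMOL commands followed by one color command per strong residue. A minimal sketch of the same sequence built with java.util.StringJoiner, assuming the PDB id, the strong residue numbers, and the output path are already known; PyMolScriptSketch and compose are hypothetical names, not part of jstructure. As in the original, the color name efr is not a PyMOL built-in and is presumably defined elsewhere in the session (for example with set_color).

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper (not part of jstructure): rebuilds the command sequence of
// composePyMolCommand from already-extracted inputs.
final class PyMolScriptSketch {
    private PyMolScriptSketch() {}

    static String compose(String pdbId, List<Integer> strongResidues, String pngPath) {
        String nl = System.lineSeparator();
        // the delimiter separates commands; the suffix adds the trailing line separator
        StringJoiner script = new StringJoiner(nl, "", nl);
        script.add("delete all");
        script.add("fetch " + pdbId + ", async=0");
        script.add("hide everything");         // hide non-relevant stuff
        script.add("show cartoon, chain A");
        script.add("color grey80");            // decolor everything
        script.add("zoom (chain A)");
        for (int residueNumber : strongResidues) {
            // "efr" is assumed to be a color defined in the PyMOL session
            script.add("color efr, resi " + residueNumber);
        }
        script.add("ray");
        script.add("png " + pngPath);
        return script.toString();
    }
}
```

Adding commands one by one keeps the inline comments next to the commands they describe instead of in the middle of a single concatenation.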

Example 62 with AminoAcid

Use of de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid in project jstructure by JonStargaryen.

From class A04_ComputeContactTypeFrequencies, method handleLine.

private static void handleLine(String line) {
    String[] split = line.split(";");
    String entryId = split[0];
    String pdbId = split[1];
    List<Integer> experimentIds = Pattern.compile(",").splitAsStream(split[2].replaceAll("\\[", "").replaceAll("]", "")).map(Integer::valueOf).collect(Collectors.toList());
    Structure structure = StructureParser.fromPdbId(pdbId).parse();
    Chain chain = structure.chains().findFirst().get();
    Start2FoldXmlParser.parseSpecificExperiment(chain, Start2FoldConstants.XML_DIRECTORY.resolve(entryId + ".xml"), experimentIds);
    List<AminoAcid> earlyFoldingResidues = chain.aminoAcids().filter(aminoAcid -> aminoAcid.getFeature(Start2FoldResidueAnnotation.class).isEarly()).collect(Collectors.toList());
    List<Integer> functionalResidueNumbers = Start2FoldConstants.extractFunctionalResidueNumbers(split);
    List<AminoAcid> functionalResidues = new ArrayList<>();
    // do nothing if no annotation of functional residues exists
    if (!functionalResidueNumbers.isEmpty()) {
        FunctionalResidueParser.parse(chain, functionalResidueNumbers);
        chain.aminoAcids().filter(aminoAcid -> aminoAcid.getFeature(FunctionalResidueAnnotation.class).isFunctional()).forEach(functionalResidues::add);
    }
    efr += earlyFoldingResidues.size();
    func += functionalResidues.size();
    efr_hb += earlyFoldingResidues.stream().filter(aminoAcid -> aminoAcid.getFeature(PLIPInteractionContainer.class).getHydrogenBonds().size() > 0).count();
    efr_hi += earlyFoldingResidues.stream().filter(aminoAcid -> aminoAcid.getFeature(PLIPInteractionContainer.class).getHydrophobicInteractions().size() > 0).count();
    func_hb += functionalResidues.stream().filter(aminoAcid -> aminoAcid.getFeature(PLIPInteractionContainer.class).getHydrogenBonds().size() > 0).count();
    func_hi += functionalResidues.stream().filter(aminoAcid -> aminoAcid.getFeature(PLIPInteractionContainer.class).getHydrophobicInteractions().size() > 0).count();
}
Also used: FunctionalResidueParser(de.bioforscher.jstructure.efr.parser.FunctionalResidueParser) Start2FoldResidueAnnotation(de.bioforscher.jstructure.efr.model.Start2FoldResidueAnnotation) Files(java.nio.file.Files) Structure(de.bioforscher.jstructure.model.structure.Structure) IOException(java.io.IOException) StructureParser(de.bioforscher.jstructure.model.structure.StructureParser) Collectors(java.util.stream.Collectors) FunctionalResidueAnnotation(de.bioforscher.jstructure.efr.model.FunctionalResidueAnnotation) Start2FoldXmlParser(de.bioforscher.jstructure.efr.parser.Start2FoldXmlParser) Start2FoldConstants(de.bioforscher.jstructure.efr.Start2FoldConstants) ArrayList(java.util.ArrayList) List(java.util.List) AminoAcid(de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid) Chain(de.bioforscher.jstructure.model.structure.Chain) StandardFormat(de.bioforscher.jstructure.StandardFormat) Pattern(java.util.regex.Pattern) PLIPInteractionContainer(de.bioforscher.jstructure.feature.interaction.PLIPInteractionContainer) Comparator(java.util.Comparator)
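
Examples 62, 64, and 65 all read a semicolon-separated line whose third field holds a bracketed list of experiment ids. A minimal sketch of that parsing step, assuming the field looks like [1,2,3]; ExperimentIdSketch and parseExperimentIds are hypothetical names. Unlike the original one-liner, this variant also trims whitespace around each id and returns an empty list for [].

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper mirroring the experiment-id parsing done in handleLine.
final class ExperimentIdSketch {
    private ExperimentIdSketch() {}

    // Parses a field such as "[1,2,3]" into the list [1, 2, 3].
    static List<Integer> parseExperimentIds(String field) {
        // strip the brackets, as the original does with replaceAll
        String inner = field.replace("[", "").replace("]", "").trim();
        if (inner.isEmpty()) {
            return List.of();
        }
        return Arrays.stream(inner.split(","))
                .map(String::trim)            // tolerate "[4, 7]"
                .map(Integer::valueOf)
                .collect(Collectors.toList());
    }
}
```

Usage: for a line such as STF0001;1abc;[2,5], call parseExperimentIds(line.split(";")[2]).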

Example 63 with AminoAcid

Use of de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid in project jstructure by JonStargaryen.

From class ResidueGraphCalculations, method shortestPathsPassingThrough.

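/**
 * Determines the shortest paths that pass through one particular edge.
 * @param contact the edge, given as the pair of amino acids it connects
 * @return all shortest paths (at most one per unordered pair of nodes) that traverse this edge
 */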
private List<GraphPath<AminoAcid, DefaultEdge>> shortestPathsPassingThrough(Pair<AminoAcid, AminoAcid> contact) {
    AminoAcid aminoAcid1 = contact.getLeft();
    AminoAcid aminoAcid2 = contact.getRight();
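    // getPath returns null for unreachable pairs, so disconnected pairs are skipped
    return SetOperations.unorderedPairsOf(nodes).map(pair -> shortestPaths.get(pair.getLeft()).getPath(pair.getRight())).filter(Objects::nonNull).filter(graphPath -> {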
        int index1 = graphPath.getVertexList().indexOf(aminoAcid1);
        int index2 = graphPath.getVertexList().indexOf(aminoAcid2);
        return index1 != -1 && index2 != -1 && Math.abs(index1 - index2) == 1;
    }).collect(Collectors.toList());
}
Also used: GraphPath(org.jgrapht.GraphPath) java.util(java.util) DefaultEdge(org.jgrapht.graph.DefaultEdge) ShortestPathAlgorithm(org.jgrapht.alg.interfaces.ShortestPathAlgorithm) ResidueIdentifier(de.bioforscher.jstructure.model.identifier.ResidueIdentifier) Group(de.bioforscher.jstructure.model.structure.Group) SetOperations(de.bioforscher.jstructure.mathematics.SetOperations) AminoAcid(de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid) Collectors(java.util.stream.Collectors) Pair(de.bioforscher.jstructure.mathematics.Pair) DijkstraShortestPath(org.jgrapht.alg.shortestpath.DijkstraShortestPath)
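
shortestPathsPassingThrough takes one shortest path per unordered pair of nodes and keeps it if the two residues of the contact appear next to each other on that path. A dependency-free sketch of the same filter, assuming a graph given as an adjacency map of integer vertex ids; it swaps JGraphT's Dijkstra for breadth-first search, which yields shortest paths when every edge has unit weight. EdgePathSketch, bfsPath, and pathsThroughEdge are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical analogue of shortestPathsPassingThrough on an unweighted adjacency map.
final class EdgePathSketch {
    private EdgePathSketch() {}

    // One shortest path from source to target (both inclusive), or null if unreachable.
    static List<Integer> bfsPath(Map<Integer, List<Integer>> adjacency, int source, int target) {
        Map<Integer, Integer> predecessor = new HashMap<>();
        Deque<Integer> queue = new ArrayDeque<>();
        predecessor.put(source, source);
        queue.add(source);
        while (!queue.isEmpty()) {
            int current = queue.poll();
            if (current == target) {
                break;
            }
            for (int next : adjacency.getOrDefault(current, List.of())) {
                if (!predecessor.containsKey(next)) {
                    predecessor.put(next, current);
                    queue.add(next);
                }
            }
        }
        if (!predecessor.containsKey(target)) {
            return null;
        }
        // walk the predecessor chain back from target to source
        LinkedList<Integer> path = new LinkedList<>();
        for (int v = target; v != source; v = predecessor.get(v)) {
            path.addFirst(v);
        }
        path.addFirst(source);
        return path;
    }

    // Keeps the shortest path of each unordered pair if vertices a and b are adjacent on it.
    static List<List<Integer>> pathsThroughEdge(Map<Integer, List<Integer>> adjacency, int a, int b) {
        List<Integer> nodes = new ArrayList<>(adjacency.keySet());
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i++) {
            for (int j = i + 1; j < nodes.size(); j++) {
                List<Integer> path = bfsPath(adjacency, nodes.get(i), nodes.get(j));
                if (path == null) {
                    continue; // disconnected pair
                }
                int indexA = path.indexOf(a);
                int indexB = path.indexOf(b);
                if (indexA != -1 && indexB != -1 && Math.abs(indexA - indexB) == 1) {
                    result.add(path);
                }
            }
        }
        return result;
    }
}
```

Like the original, only a single shortest path is considered per pair, so when several equally short paths exist the count depends on which one the search returns.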

Example 64 with AminoAcid

Use of de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid in project jstructure by JonStargaryen.

From class A07_WriteStructuralInformationByContactCsv, method handleLine.

private static Optional<String> handleLine(String line) {
    try {
        System.out.println(line);
        String[] split = line.split(";");
        String entryId = split[0];
        String pdbId = split[1];
        List<Integer> experimentIds = Pattern.compile(",").splitAsStream(split[2].replaceAll("\\[", "").replaceAll("]", "")).map(Integer::valueOf).collect(Collectors.toList());
        // boolean sane = split[6].equalsIgnoreCase("true");
        Structure structure = StructureParser.fromPdbId(pdbId).parse();
        Chain chain = structure.chains().findFirst().get();
        LinearAlgebra.PrimitiveDoubleArrayLinearAlgebra centroid = chain.calculate().centroid();
        Path start2foldXml = Start2FoldConstants.XML_DIRECTORY.resolve(entryId + ".xml");
        Start2FoldXmlParser.parseStability(chain, start2foldXml);
        Start2FoldXmlParser.parseSpecificExperiment(chain, start2foldXml, experimentIds);
        List<AminoAcid> earlyFoldingResidues = chain.aminoAcids().filter(aminoAcid -> aminoAcid.getFeature(Start2FoldResidueAnnotation.class).isEarly()).collect(Collectors.toList());
        List<Integer> functionalResidueNumbers = Start2FoldConstants.extractFunctionalResidueNumbers(split);
        List<AminoAcid> functionalResidues = new ArrayList<>();
        // do nothing if no annotation of functional residues exists
        if (!functionalResidueNumbers.isEmpty()) {
            FunctionalResidueParser.parse(chain, functionalResidueNumbers);
            chain.aminoAcids().filter(aminoAcid -> aminoAcid.getFeature(FunctionalResidueAnnotation.class).isFunctional()).forEach(functionalResidues::add);
        }
        List<AminoAcid> strongResidues = chain.aminoAcids().filter(aminoAcid -> aminoAcid.getFeature(Start2FoldResidueAnnotation.class).isStrong()).collect(Collectors.toList());
        List<AminoAcid> orderedResidues = chain.aminoAcids().filter(aminoAcid -> !aminoAcid.getFeature(GenericSecondaryStructure.class).getSecondaryStructure().isCoilType()).collect(Collectors.toList());
        List<AminoAcid> buriedResidues = chain.aminoAcids().filter(aminoAcid -> aminoAcid.getFeature(AccessibleSurfaceArea.class).isBuried()).collect(Collectors.toList());
        List<AminoAcid> residuesInEarlyFoldingSecondaryStructureElements = chain.aminoAcids().filter(aminoAcid -> !aminoAcid.getFeature(GenericSecondaryStructure.class).getSecondaryStructure().isCoilType()).filter(aminoAcid -> {
            GenericSecondaryStructure.SecondaryStructureElement surroundingSecondaryStructureElement = aminoAcid.getFeature(GenericSecondaryStructure.class).getSurroundingSecondaryStructureElement(aminoAcid);
            List<AminoAcid> surroundingAminoAcids = chain.getAminoAcids().subList(surroundingSecondaryStructureElement.getStart(), surroundingSecondaryStructureElement.getEnd() + 1);
            return surroundingAminoAcids.stream().anyMatch(earlyFoldingResidues::contains);
        }).collect(Collectors.toList());
        List<AminoAcid> aromaticResidues = chain.aminoAcids().filter(AminoAcid.Filter.AROMATIC).collect(Collectors.toList());
        List<ContactStructuralInformation> contactStructuralInformation = StructuralInformationParserService.getInstance().parseContactStructuralInformation(Start2FoldConstants.DATA_DIRECTORY.resolve("si").resolve("raw").resolve(entryId.toUpperCase() + ".out"), chain, earlyFoldingResidues);
        ResidueGraph conventionalProteinGraph = ResidueGraph.createResidueGraph(chain, ContactDefinitionFactory.createAlphaCarbonContactDefinition(8.0));
        ResidueGraphCalculations residueGraphCalculations = new ResidueGraphCalculations(conventionalProteinGraph);
        try {
            EvolutionaryCouplingParser.parsePlmScore(contactStructuralInformation, Jsoup.parse(Start2FoldConstants.newInputStream(Start2FoldConstants.COUPLING_DIRECTORY.resolve(entryId + "_ec.html")), "UTF-8", ""), chain.getAminoAcids().size());
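        } catch (Exception e) {
            // evolutionary couplings are optional: if parsing fails, contacts are reported without PLM scores (see ecAnnotation)
        }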
        boolean ecAnnotation = contactStructuralInformation.stream().anyMatch(csi -> csi.getPlmScore() != 0.0);
        PLIPInteractionContainer plipInteractionContainer = chain.getFeature(PLIPInteractionContainer.class);
        System.out.println("efr: " + (earlyFoldingResidues.size() > 0) + " strong: " + (strongResidues.size() > 0) + " functional: " + (functionalResidues.size() > 0) + " couplings: " + ecAnnotation);
        return Optional.of(contactStructuralInformation.stream().map(contact -> {
            AminoAcid aminoAcid1 = chain.select().residueNumber(contact.getResidueIdentifier1()).asAminoAcid();
            AminoAcid aminoAcid2 = chain.select().residueNumber(contact.getResidueIdentifier2()).asAminoAcid();
            Pair<AminoAcid, AminoAcid> pair = new Pair<>(aminoAcid1, aminoAcid2);
            ResidueTopologicPropertiesContainer residueTopologicPropertiesContainer1 = aminoAcid1.getFeature(ResidueTopologicPropertiesContainer.class);
            ResidueTopologicPropertiesContainer residueTopologicPropertiesContainer2 = aminoAcid2.getFeature(ResidueTopologicPropertiesContainer.class);
            LinearAlgebra.PrimitiveDoubleArrayLinearAlgebra contactCentroid = aminoAcid1.calculate().centroid().add(aminoAcid2.calculate().centroid()).divide(2);
            return pdbId + "," + "A" + "," + contact.getResidueIdentifier1() + "," + contact.getAa1() + "," + contact.getResidueIdentifier2() + "," + contact.getAa2() + "," + contact.getContactDistanceBin() + "," + (contact.getContactDistanceBin() == ContactDistanceBin.LONG) + "," + (contact.getContactDistanceBin() == ContactDistanceBin.MEDIUM) + "," + (contact.getContactDistanceBin() == ContactDistanceBin.SHORT) + "," + StandardFormat.format(contactCentroid.distance(centroid)) + "," + StandardFormat.format(contact.getAverageRmsdIncrease()) + "," + StandardFormat.format(contact.getAverageTmScoreIncrease()) + "," + StandardFormat.format(contact.getAverageQIncrease()) + "," + StandardFormat.format(contact.getMaximumRmsdIncrease()) + "," + StandardFormat.format(contact.getMaximumTmScoreIncrease()) + "," + StandardFormat.format(contact.getMaximumQIncrease()) + "," + StandardFormat.format(contact.getAverageRmsdIncreaseZScore()) + "," + contact.getFractionOfTopScoringContacts() + "," + StandardFormat.format(contact.getPlmScore()) + "," + contact.getCouplingRank() + "," + contact.istop02() + "," + contact.isTop04() + "," + contact.isTop06() + "," + contact.isTop08() + "," + contact.isTop10() + "," + contact.isTop12() + "," + contact.isTop14() + "," + contact.isTop16() + "," + StandardFormat.format(residueGraphCalculations.betweenness(pair)) + "," + StandardFormat.format(0.5 * residueTopologicPropertiesContainer1.getConventional().getBetweenness() + 0.5 * residueTopologicPropertiesContainer2.getConventional().getBetweenness()) + "," + StandardFormat.format(0.5 * residueTopologicPropertiesContainer1.getConventional().getCloseness() + 0.5 * residueTopologicPropertiesContainer2.getConventional().getCloseness()) + "," + StandardFormat.format(0.5 * residueTopologicPropertiesContainer1.getConventional().getClusteringCoefficient() + 0.5 * residueTopologicPropertiesContainer2.getConventional().getClusteringCoefficient()) + "," + 
plipInteractionContainer.getHydrogenBonds().stream().anyMatch(hydrogenBond -> isContact(hydrogenBond, aminoAcid1, aminoAcid2)) + "," + plipInteractionContainer.getHydrophobicInteractions().stream().anyMatch(hydrophobicInteraction -> isContact(hydrophobicInteraction, aminoAcid1, aminoAcid2)) + "," + contact.isEarlyFoldingResidue() + "," + contact.isEarlyFoldingContact() + "," + residueIsInCollection(functionalResidues, contact.getResidueIdentifier1(), contact.getResidueIdentifier2()) + "," + contactIsInCollection(functionalResidues, contact.getResidueIdentifier1(), contact.getResidueIdentifier2()) + "," + residueIsInCollection(strongResidues, contact.getResidueIdentifier1(), contact.getResidueIdentifier2()) + "," + contactIsInCollection(strongResidues, contact.getResidueIdentifier1(), contact.getResidueIdentifier2()) + "," + residueIsInCollection(buriedResidues, contact.getResidueIdentifier1(), contact.getResidueIdentifier2()) + "," + contactIsInCollection(buriedResidues, contact.getResidueIdentifier1(), contact.getResidueIdentifier2()) + "," + residueIsInCollection(orderedResidues, contact.getResidueIdentifier1(), contact.getResidueIdentifier2()) + "," + contactIsInCollection(orderedResidues, contact.getResidueIdentifier1(), contact.getResidueIdentifier2()) + "," + residueIsInCollection(residuesInEarlyFoldingSecondaryStructureElements, contact.getResidueIdentifier1(), contact.getResidueIdentifier2()) + "," + contactIsInCollection(residuesInEarlyFoldingSecondaryStructureElements, contact.getResidueIdentifier1(), contact.getResidueIdentifier2()) + "," + residueIsInCollection(aromaticResidues, contact.getResidueIdentifier1(), contact.getResidueIdentifier2()) + "," + contactIsInCollection(aromaticResidues, contact.getResidueIdentifier1(), contact.getResidueIdentifier2()) + "," + (earlyFoldingResidues.size() > 0) + "," + (strongResidues.size() > 0) + "," + (functionalResidues.size() > 0) + "," + ecAnnotation;
        }).collect(Collectors.joining(System.lineSeparator())));
    } catch (Exception e) {
        logger.info("calculation failed for {}\nby: {}", line, e.getMessage());
        return Optional.empty();
    }
}
Also used: FunctionalResidueParser(de.bioforscher.jstructure.efr.parser.FunctionalResidueParser) LinearAlgebra(de.bioforscher.jstructure.mathematics.LinearAlgebra) StructuralInformationParserService(de.bioforscher.jstructure.efr.parser.StructuralInformationParserService) PLIPInteraction(de.bioforscher.jstructure.feature.interaction.PLIPInteraction) ResidueIdentifier(de.bioforscher.jstructure.model.identifier.ResidueIdentifier) LoggerFactory(org.slf4j.LoggerFactory) Structure(de.bioforscher.jstructure.model.structure.Structure) GenericSecondaryStructure(de.bioforscher.jstructure.feature.sse.GenericSecondaryStructure) StructureParser(de.bioforscher.jstructure.model.structure.StructureParser) ArrayList(java.util.ArrayList) ContactDistanceBin(de.bioforscher.jstructure.efr.model.ContactDistanceBin) Group(de.bioforscher.jstructure.model.structure.Group) AminoAcid(de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid) Chain(de.bioforscher.jstructure.model.structure.Chain) StandardFormat(de.bioforscher.jstructure.StandardFormat) Path(java.nio.file.Path) Logger(org.slf4j.Logger) Start2FoldResidueAnnotation(de.bioforscher.jstructure.efr.model.Start2FoldResidueAnnotation) Files(java.nio.file.Files) IOException(java.io.IOException) Collectors(java.util.stream.Collectors) FunctionalResidueAnnotation(de.bioforscher.jstructure.efr.model.FunctionalResidueAnnotation) Start2FoldXmlParser(de.bioforscher.jstructure.efr.parser.Start2FoldXmlParser) Pair(de.bioforscher.jstructure.mathematics.Pair) Start2FoldConstants(de.bioforscher.jstructure.efr.Start2FoldConstants) ResidueGraphCalculations(de.bioforscher.jstructure.graph.ResidueGraphCalculations) List(java.util.List) AccessibleSurfaceArea(de.bioforscher.jstructure.feature.asa.AccessibleSurfaceArea) ResidueTopologicPropertiesContainer(de.bioforscher.jstructure.graph.ResidueTopologicPropertiesContainer) ResidueGraph(de.bioforscher.jstructure.graph.ResidueGraph) EvolutionaryCouplingParser(de.bioforscher.jstructure.efr.parser.EvolutionaryCouplingParser) Optional(java.util.Optional) Jsoup(org.jsoup.Jsoup) Pattern(java.util.regex.Pattern) ContactStructuralInformation(de.bioforscher.jstructure.efr.model.si.ContactStructuralInformation) PLIPInteractionContainer(de.bioforscher.jstructure.feature.interaction.PLIPInteractionContainer) ContactDefinitionFactory(de.bioforscher.jstructure.graph.contact.definition.ContactDefinitionFactory)
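
For each contact, handleLine averages the centroids of the two residues and reports how far that midpoint lies from the centroid of the whole chain. A plain double[] sketch of that arithmetic, assuming Cartesian coordinates of equal dimension; ContactCentroidSketch and its methods are hypothetical and do not use jstructure's LinearAlgebra API.

```java
// Hypothetical plain-array version of the contactCentroid computation in handleLine.
final class ContactCentroidSketch {
    private ContactCentroidSketch() {}

    // Component-wise midpoint of two residue centroids.
    static double[] midpoint(double[] centroid1, double[] centroid2) {
        double[] mid = new double[centroid1.length];
        for (int k = 0; k < mid.length; k++) {
            mid[k] = (centroid1[k] + centroid2[k]) / 2.0;
        }
        return mid;
    }

    // Euclidean distance between two points.
    static double distance(double[] p, double[] q) {
        double sum = 0.0;
        for (int k = 0; k < p.length; k++) {
            double d = p[k] - q[k];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Distance of the contact midpoint from the chain centroid.
    static double contactDistanceToChainCentroid(double[] residue1, double[] residue2, double[] chainCentroid) {
        return distance(midpoint(residue1, residue2), chainCentroid);
    }
}
```

For residue centroids at (0, 0, 0) and (6, 8, 0) and a chain centroid at the origin, the midpoint is (3, 4, 0) and the reported distance is 5.0.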

Example 65 with AminoAcid

Use of de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid in project jstructure by JonStargaryen.

From class A01_CreateContactMaps, method handleFile.

private static void handleFile(String line) {
    String[] split = line.split(";");
    String stfId = split[0];
    String pdbId = split[1];
    List<Integer> experimentIds = Pattern.compile(",").splitAsStream(split[2].replaceAll("\\[", "").replaceAll("]", "")).map(Integer::valueOf).collect(Collectors.toList());
    Structure structure = StructureParser.fromPath(BASE_PATH.resolve("pdb").resolve(stfId + ".pdb")).parse();
    Chain chain = structure.chainsWithAminoAcids().findFirst().get();
    String sequence = chain.getAminoAcidSequence();
    String secondaryStructureString = chain.aminoAcids().map(aminoAcid -> aminoAcid.getFeature(GenericSecondaryStructure.class)).map(GenericSecondaryStructure::getSecondaryStructure).map(SecondaryStructureType::getReducedRepresentation).collect(Collectors.joining()).toUpperCase();
    Start2FoldConstants.write(BASE_PATH.resolve("reconstruction").resolve("fasta").resolve(stfId + ".fasta"), ">" + stfId + System.lineSeparator() + sequence);
    Start2FoldConstants.write(BASE_PATH.resolve("reconstruction").resolve("sse").resolve(stfId + ".sse"), ">" + stfId + System.lineSeparator() + secondaryStructureString);
    Start2FoldXmlParser.parseSpecificExperiment(chain, Start2FoldConstants.XML_DIRECTORY.resolve(stfId + ".xml"), experimentIds);
    List<AminoAcid> aminoAcids = chain.aminoAcids().collect(Collectors.toList());
    List<AminoAcid> earlyFoldingResidues = chain.aminoAcids().filter(aminoAcid -> aminoAcid.getFeature(Start2FoldResidueAnnotation.class).isEarly()).collect(Collectors.toList());
    List<Pair<AminoAcid, AminoAcid>> contacts = SetOperations.unorderedPairsOf(aminoAcids).filter(pair -> areNonCovalentGroups(pair.getLeft(), pair.getRight())).filter(pair -> ProteinGraphFactory.InteractionScheme.CALPHA8.areInContact(pair.getLeft(), pair.getRight())).collect(Collectors.toList());
    List<Pair<AminoAcid, AminoAcid>> earlyFoldingContacts = contacts.stream().filter(pair -> earlyFoldingResidues.contains(pair.getLeft()) && earlyFoldingResidues.contains(pair.getRight())).collect(Collectors.toList());
    String percentage = StandardFormat.formatToInteger(100 * earlyFoldingContacts.size() / (double) contacts.size());
    System.out.println("fraction of EFR contacts is " + percentage + "%: " + earlyFoldingContacts.size() + " " + contacts.size());
    Start2FoldConstants.write(MAP_PATH.resolve(stfId + "-sampled-100-1.rr"), composeRRString(contacts, sequence));
    Start2FoldConstants.write(MAP_PATH.resolve(stfId + "-efr-" + percentage + "-1.rr"), composeRRString(earlyFoldingContacts, sequence));
    for (int i = 5; i < 100; i = i + 5) {
        int numberOfContactsToSelect = (int) (i / (double) 100 * contacts.size());
        for (int j = 1; j < 6; j++) {
            Collections.shuffle(contacts);
            List<Pair<AminoAcid, AminoAcid>> selectedContacts = contacts.subList(0, numberOfContactsToSelect);
            Start2FoldConstants.write(MAP_PATH.resolve(stfId + "-random-" + i + "-" + j + ".rr"), composeRRString(selectedContacts, sequence));
        }
    }
    // create samplings of random residues
    for (int i = 5; i < 100; i = i + 5) {
        int numberOfResiduesToSelect = (int) (i / (double) 100 * aminoAcids.size());
        for (int j = 1; j < 6; j++) {
            Collections.shuffle(aminoAcids);
            List<AminoAcid> selectedAminoAcids = aminoAcids.subList(0, numberOfResiduesToSelect);
            Start2FoldConstants.write(MAP_PATH.resolve(stfId + "-residues-" + i + "-" + j + ".rr"), composeRRString(contacts.stream().filter(contact -> selectedAminoAcids.contains(contact.getLeft()) && selectedAminoAcids.contains(contact.getRight())).collect(Collectors.toList()), sequence));
        }
    }
    // create samplings of comparable nature of EFR contacts
    for (int j = 1; j < 6; j++) {
        int numberOfResiduesToSelect = earlyFoldingResidues.size();
        List<AminoAcid> interactingResidues = getInteractingResidues(aminoAcids, contacts, numberOfResiduesToSelect);
        Start2FoldConstants.write(MAP_PATH.resolve(stfId + "-interacting-" + percentage + "-" + j + ".rr"), composeRRString(contacts.stream().filter(contact -> interactingResidues.contains(contact.getLeft()) && interactingResidues.contains(contact.getRight())).collect(Collectors.toList()), sequence));
    }
    // create bin samplings of comparable nature of EFR contacts
    for (int i = 5; i < 100; i = i + 5) {
        int numberOfResiduesToSelect = (int) (i / (double) 100 * aminoAcids.size());
        for (int j = 1; j < 6; j++) {
            List<AminoAcid> interactingResidues = getInteractingResidues(aminoAcids, contacts, numberOfResiduesToSelect);
            // include the bin percentage i in the file name; otherwise every bin overwrites the same five files
            Start2FoldConstants.write(MAP_PATH.resolve(stfId + "-interacting2-" + i + "-" + j + ".rr"), composeRRString(contacts.stream().filter(contact -> interactingResidues.contains(contact.getLeft()) && interactingResidues.contains(contact.getRight())).collect(Collectors.toList()), sequence));
        }
    }
}
Also used: Files(java.nio.file.Files) Structure(de.bioforscher.jstructure.model.structure.Structure) IOException(java.io.IOException) GenericSecondaryStructure(de.bioforscher.jstructure.feature.sse.GenericSecondaryStructure) StructureParser(de.bioforscher.jstructure.model.structure.StructureParser) Collectors(java.util.stream.Collectors) Pair(de.bioforscher.jstructure.mathematics.Pair) ArrayList(java.util.ArrayList) Start2FoldResidueAnnotation(de.bioforscher.start2fold.model.Start2FoldResidueAnnotation) List(java.util.List) Start2FoldConstants(de.bioforscher.start2fold.Start2FoldConstants) Paths(java.nio.file.Paths) SecondaryStructureType(de.bioforscher.jstructure.feature.sse.SecondaryStructureType) Group(de.bioforscher.jstructure.model.structure.Group) ProteinGraphFactory(de.bioforscher.jstructure.feature.graphs.ProteinGraphFactory) SetOperations(de.bioforscher.jstructure.mathematics.SetOperations) AminoAcid(de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid) Start2FoldXmlParser(de.bioforscher.start2fold.parser.Start2FoldXmlParser) Chain(de.bioforscher.jstructure.model.structure.Chain) Optional(java.util.Optional) StandardFormat(de.bioforscher.jstructure.StandardFormat) Pattern(java.util.regex.Pattern) Path(java.nio.file.Path) Collections(java.util.Collections)
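
The sampling loops in handleFile shuffle a list and keep the first i percent of its elements, repeating the draw five times per percentage. A minimal sketch of a single draw, assuming a caller-supplied java.util.Random so that runs are reproducible; ContactSamplingSketch and its methods are hypothetical names. The sketch shuffles a copy, whereas handleFile shuffles contacts and aminoAcids in place.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the percentage-based random sampling in handleFile.
final class ContactSamplingSketch {
    private ContactSamplingSketch() {}

    // Number of items kept for a percentage, truncated like the (int) cast in handleFile.
    static int sampleSize(int percentage, int total) {
        return (int) (percentage / 100.0 * total);
    }

    // Shuffles a copy with the given Random and keeps its first sampleSize elements.
    static <T> List<T> sample(List<T> items, int percentage, Random random) {
        List<T> copy = new ArrayList<>(items);
        Collections.shuffle(copy, random);
        return new ArrayList<>(copy.subList(0, sampleSize(percentage, copy.size())));
    }

    // File name encoding both percentage and replicate, so different percentages never collide.
    static String fileName(String id, String kind, int percentage, int replicate) {
        return id + "-" + kind + "-" + percentage + "-" + replicate + ".rr";
    }
}
```

Copying into a new ArrayList detaches the sample from the shuffled list, so a later shuffle cannot change a sample that is still in use.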

Aggregations

AminoAcid (de.bioforscher.jstructure.model.structure.aminoacid.AminoAcid): 66
Chain (de.bioforscher.jstructure.model.structure.Chain): 40
Collectors (java.util.stream.Collectors): 40
IOException (java.io.IOException): 36
Files (java.nio.file.Files): 35
List (java.util.List): 31
StandardFormat (de.bioforscher.jstructure.StandardFormat): 26
StructureParser (de.bioforscher.jstructure.model.structure.StructureParser): 26
Path (java.nio.file.Path): 25
Structure (de.bioforscher.jstructure.model.structure.Structure): 23
Pattern (java.util.regex.Pattern): 17
Logger (org.slf4j.Logger): 16
LoggerFactory (org.slf4j.LoggerFactory): 16
Start2FoldResidueAnnotation (de.bioforscher.jstructure.efr.model.Start2FoldResidueAnnotation): 15
UncheckedIOException (java.io.UncheckedIOException): 14
ArrayList (java.util.ArrayList): 14
Stream (java.util.stream.Stream): 14
Start2FoldResidueAnnotation (de.bioforscher.start2fold.model.Start2FoldResidueAnnotation): 13
Optional (java.util.Optional): 13
Pair (de.bioforscher.jstructure.mathematics.Pair): 11