Example 6 with Structure

use of org.biojava.nbio.structure.Structure in project mm-dev by sbl-sdsc.

the class ShapeTypeDemo method getShapeData.

private static Row getShapeData(Tuple2<String, Structure> t) {
    String key = t._1;
    Structure structure = t._2;
    return RowFactory.create(
        // primary key for this dataset
        key,
        // structure.getChainByIndex(0).getSeqResSequence(),
        calcShape(structure));
}
Also used : Structure(org.biojava.nbio.structure.Structure)

Example 7 with Structure

use of org.biojava.nbio.structure.Structure in project mm-dev by sbl-sdsc.

the class ShapeTypeDemo method main.

public static void main(String[] args) throws IOException {
    String path = MmtfReader.getMmtfReducedPath();
    if (args.length != 1) {
        System.err.println("Usage: " + ShapeTypeDemo.class.getSimpleName() + " <dataset output file>");
        System.exit(1);
    }
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(ShapeTypeDemo.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    long start = System.nanoTime();
    // load representative PDB chains; the PISCES filter below uses a 90% sequence identity cutoff
    int sequenceIdentity = 90;
    JavaPairRDD<String, StructureDataInterface> pdb = MmtfReader.readSequenceFile(path, sc)
        // extract polymer chains
        .flatMapToPair(new StructureToPolymerChains())
        // get representative subset
        .filter(new Pisces(sequenceIdentity, 2.5));
    // get a data set with sequence info
    Dataset<Row> seqData = PolymerSequenceExtractor.getDataset(pdb);
    // convert to BioJava data structure
    JavaPairRDD<String, Structure> structures = pdb.mapValues(new StructureToBioJava());
    // calculate shape data and convert to dataset
    JavaRDD<Row> rows = structures.map(t -> getShapeData(t));
    Dataset<Row> data = JavaRDDToDataset.getDataset(rows, "structureChainId", "shape");
    // there are only a few symmetric chains; leave them out
    data = data.filter("shape != 'EXCLUDE'");
    // join calculated data with the sequence data
    data = seqData.join(data, "structureChainId").cache();
    data.show(10);
    // create a Word2Vec representation of the protein sequences
    ProteinSequenceEncoder encoder = new ProteinSequenceEncoder(data);
    // create 2-grams
    int n = 2;
    // window size of 25 residues for Word2Vec
    int windowSize = 25;
    // dimension of feature vector
    int vectorSize = 50;
    data = encoder.overlappingNgramWord2VecEncode(n, windowSize, vectorSize).cache();
    // save data in .parquet file
    data.write().mode("overwrite").format("parquet").save(args[0]);
    long end = System.nanoTime();
    System.out.println((end - start) / 1E9 + " sec.");
    sc.close();
}
Also used : ProteinSequenceEncoder(edu.sdsc.mmtf.spark.ml.ProteinSequenceEncoder) StructureDataInterface(org.rcsb.mmtf.api.StructureDataInterface) Pisces(edu.sdsc.mmtf.spark.webfilters.Pisces) StructureToBioJava(edu.sdsc.mmtf.spark.mappers.StructureToBioJava) StructureToPolymerChains(edu.sdsc.mmtf.spark.mappers.StructureToPolymerChains) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) Row(org.apache.spark.sql.Row) Structure(org.biojava.nbio.structure.Structure) SparkConf(org.apache.spark.SparkConf)
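
The main method above asks the encoder for overlapping 2-grams before the Word2Vec step. As a rough illustration of what "overlapping n-grams" usually means for a protein sequence, here is a minimal plain-Java sketch. It assumes the n-grams are consecutive substrings shifted by one residue at a time; the class and method names are hypothetical and this is not the ProteinSequenceEncoder implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of mm-dev: illustrates overlapping n-grams only.
class OverlappingNgramSketch {

    // Splits a sequence into n-grams that overlap by n - 1 characters
    // (assumption: the window advances one residue at a time).
    static List<String> overlappingNgrams(String sequence, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> ngrams = new ArrayList<>();
        for (int i = 0; i + n <= sequence.length(); i++) {
            ngrams.add(sequence.substring(i, i + n));
        }
        return ngrams;
    }

    public static void main(String[] args) {
        // prints [MK, KV, VL, LA]
        System.out.println(overlappingNgrams("MKVLA", 2));
    }
}
```

Under this assumption, a sequence of length L yields L - n + 1 n-grams, and a sequence shorter than n yields none.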

Example 8 with Structure

use of org.biojava.nbio.structure.Structure in project mm-dev by sbl-sdsc.

the class BioJavaStructureToDssp2 method call.

// SecStrucCalc is not serializable, so it is created inside call() instead of being held as a field:
// private SecStrucCalc calculator = new SecStrucCalc();
@Override
public Iterator<String> call(Iterator<Structure> structures) throws Exception {
    SecStrucCalc calculator = new SecStrucCalc();
    Stream<Structure> structureStream = StreamSupport.stream(Spliterators.spliteratorUnknownSize(structures, Spliterator.NONNULL), false);
    Stream<String> secStructureStream = structureStream.map(s -> calculateSecStructure(s, calculator));
    return secStructureStream.iterator();
}
Also used : SecStrucCalc(org.biojava.nbio.structure.secstruc.SecStrucCalc) Structure(org.biojava.nbio.structure.Structure)
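
The call method above wraps an Iterator in a sequential Stream, maps each element, and hands back an Iterator again. The same pattern can be tried with JDK classes alone. The sketch below uses strings in place of Structure objects; the class name, method name, and sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical stand-in for the per-partition mapping pattern shown above.
class IteratorMappingSketch {

    // Wraps the iterator in a sequential stream, applies fn to each element,
    // and exposes the result as an iterator again.
    static <T, R> Iterator<R> mapIterator(Iterator<T> input, Function<T, R> fn) {
        Stream<T> stream = StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(input, Spliterator.NONNULL),
                false); // false = sequential, as in the original method
        return stream.map(fn).iterator();
    }

    public static void main(String[] args) {
        Iterator<String> input = Arrays.asList("HHHH", "EE", "C").iterator();
        Iterator<Integer> lengths = mapIterator(input, String::length);
        while (lengths.hasNext()) {
            System.out.println(lengths.next()); // prints 4, then 2, then 1
        }
    }
}
```

Because the returned iterator is backed by the stream, elements are mapped as they are consumed rather than collected up front, which is presumably why the original method returns secStructureStream.iterator() directly.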

Aggregations

Structure (org.biojava.nbio.structure.Structure): 8
IOException (java.io.IOException): 4
ByteArrayInputStream (java.io.ByteArrayInputStream): 3
FileInputStream (java.io.FileInputStream): 3
InputStream (java.io.InputStream): 3
GZIPInputStream (java.util.zip.GZIPInputStream): 3
MmtfStructureWriter (org.biojava.nbio.structure.io.mmtf.MmtfStructureWriter): 3
SecStrucCalc (org.biojava.nbio.structure.secstruc.SecStrucCalc): 3
AdapterToStructureData (org.rcsb.mmtf.encoder.AdapterToStructureData): 3
AbstractFeatureProvider (de.bioforscher.jstructure.model.feature.AbstractFeatureProvider): 2
FeatureProviderRegistry (de.bioforscher.jstructure.model.feature.FeatureProviderRegistry): 2
Protein (de.bioforscher.jstructure.model.structure.Protein): 2
ProteinParser (de.bioforscher.jstructure.parser.ProteinParser): 2
Collectors (java.util.stream.Collectors): 2
Group (org.biojava.nbio.structure.Group): 2
GroupType (org.biojava.nbio.structure.GroupType): 2
StructureException (org.biojava.nbio.structure.StructureException): 2
FileParsingParameters (org.biojava.nbio.structure.io.FileParsingParameters): 2
MMCIFFileReader (org.biojava.nbio.structure.io.MMCIFFileReader): 2
PDBFileReader (org.biojava.nbio.structure.io.PDBFileReader): 2