Example 51 with StructureDataInterface

Use of org.rcsb.mmtf.api.StructureDataInterface in project mm-dev by sbl-sdsc.

The class DownloadSwissModelFiles, method main.

/**
 * Downloads SWISS-MODEL homology models for a hard-coded list of
 * UniProt IDs and prints the structure data of each model.
 *
 * @throws FileNotFoundException
 */
public static void main(String[] args) throws FileNotFoundException {
    // instantiate Spark
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("DownloadSwissModelFiles");
    JavaSparkContext sc = new JavaSparkContext(conf);
    List<String> uniProtIds = Arrays.asList("P22629", "Q9H2C2", "Q8WXK3");
    // List<String> uniProtIds = Arrays.asList("P07900");
    // read PDB files recursively starting the specified directory
    // TODO: Empty structure record for Q8WXK3
    JavaPairRDD<String, StructureDataInterface> structures = MmtfImporter.downloadSwissModelsByUniProtIds(uniProtIds, sc);
    structures.foreach(t -> TraverseStructureHierarchy.printStructureData(t._2));
    // save as an MMTF-Hadoop Sequence File
    // MmtfWriter.writeSequenceFile(mmtfPath, sc, structures);
    // close Spark
    sc.close();
}
Also used : JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) StructureDataInterface(org.rcsb.mmtf.api.StructureDataInterface) SparkConf(org.apache.spark.SparkConf)
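
The TODO above flags an empty structure record for Q8WXK3. A minimal sketch of screening out such entries before writing, assuming the getNumModels() and getNumAtoms() accessors of StructureDataInterface:

// keep only entries that contain at least one model and one atom
JavaPairRDD<String, StructureDataInterface> nonEmpty = structures.filter(t -> t._2.getNumModels() > 0 && t._2.getNumAtoms() > 0);
// the filtered RDD can then be passed to MmtfWriter.writeSequenceFile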

Example 52 with StructureDataInterface

Use of org.rcsb.mmtf.api.StructureDataInterface in project mm-dev by sbl-sdsc.

The class PdbToMmtfFull, method main.

/**
 * Converts a directory containing PDB files into an MMTF-Hadoop Sequence file.
 * The input directory is traversed recursively to find PDB files.
 *
 * <p> Example files from Gremlin website:
 * https://gremlin2.bakerlab.org/meta/aah4043_final.zip
 *
 * @param args args[0] <path-to-pdb_files>, args[1] <path-to-mmtf-hadoop-file>
 *
 * @throws FileNotFoundException
 */
public static void main(String[] args) throws FileNotFoundException {
    if (args.length != 2) {
        System.out.println("Usage: PdbToMmtfFull <path-to-pdb_files> <path-to-mmtf-hadoop-file>");
        System.exit(1);
    }
    // path to input directory
    String pdbPath = args[0];
    // path to output directory
    String mmtfPath = args[1];
    // instantiate Spark
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("PdbToMmtfFull");
    JavaSparkContext sc = new JavaSparkContext(conf);
    // read PDB files recursively starting the specified directory
    JavaPairRDD<String, StructureDataInterface> structures = MmtfImporter.importPdbFiles(pdbPath, sc);
    structures.foreach(t -> TraverseStructureHierarchy.printStructureData(t._2));
    // save as an MMTF-Hadoop Sequence File
    MmtfWriter.writeSequenceFile(mmtfPath, sc, structures);
    // close Spark
    sc.close();
}
Also used : JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) StructureDataInterface(org.rcsb.mmtf.api.StructureDataInterface) SparkConf(org.apache.spark.SparkConf)
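
To verify the written file, it can be read back with MmtfReader.readSequenceFile, as shown in Example 55 below; a minimal sketch reusing the sc and mmtfPath variables from the snippet above:

// read the sequence file back and report the structure count
JavaPairRDD<String, StructureDataInterface> copy = MmtfReader.readSequenceFile(mmtfPath, sc);
System.out.println("Structures written: " + copy.count());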

Example 53 with StructureDataInterface

Use of org.rcsb.mmtf.api.StructureDataInterface in project mm-dev by sbl-sdsc.

The class RosettaToMmtfFull, method main.

/**
 * Converts a directory containing Rosetta-style PDB files into an MMTF-Hadoop Sequence file.
 * The input directory is traversed recursively to find PDB files.
 *
 * <p> Example files from Gremlin website:
 * https://gremlin2.bakerlab.org/meta/aah4043_final.zip
 *
 * @param args args[0] <path-to-pdb_files>, args[1] <path-to-mmtf-hadoop-file>
 *
 * @throws FileNotFoundException
 */
public static void main(String[] args) throws FileNotFoundException {
    if (args.length != 2) {
        System.out.println("Usage: RosettaToMmtfFull <path-to-pdb_files> <path-to-mmtf-hadoop-file>");
        System.exit(1);
    }
    // path to input directory
    String pdbPath = args[0];
    // path to output directory
    String mmtfPath = args[1];
    // instantiate Spark
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("RosettaToMmtfFull");
    JavaSparkContext sc = new JavaSparkContext(conf);
    // read PDB files recursively starting the specified directory
    JavaPairRDD<String, StructureDataInterface> structures = MmtfImporter.importPdbFiles(pdbPath, sc);
    // save as an MMTF-Hadoop Sequence File
    MmtfWriter.writeSequenceFile(mmtfPath, sc, structures);
    // close Spark
    sc.close();
}
Also used : JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) StructureDataInterface(org.rcsb.mmtf.api.StructureDataInterface) SparkConf(org.apache.spark.SparkConf)
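
A common variant is to split each structure into individual polymer chains before writing, using the StructureToPolymerChains mapper listed under Aggregations below; a sketch under that assumption, with a hypothetical "_chains" output path:

// emit one record per polymer chain instead of one per structure
JavaPairRDD<String, StructureDataInterface> chains = structures.flatMapToPair(new StructureToPolymerChains());
// hypothetical separate output path for the per-chain records
MmtfWriter.writeSequenceFile(mmtfPath + "_chains", sc, chains);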

Example 54 with StructureDataInterface

Use of org.rcsb.mmtf.api.StructureDataInterface in project mm-dev by sbl-sdsc.

The class SwissModelDatasetToStructure, method main.

public static void main(String[] args) throws IOException {
    SparkSession spark = SparkSession.builder().master("local[*]").appName(SwissModelDatasetToStructure.class.getSimpleName()).getOrCreate();
    JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());
    List<String> uniProtIds = Arrays.asList("P36575", "P24539", "O00244", "P18846", "Q9UII2");
    Dataset<Row> ds = SwissModelDataset.getSwissModels(uniProtIds);
    ds.show();
    ds = ds.filter("qmean > -2.5 AND coverage > 0.5");
    List<String> urls = ds.select("coordinates").as(Encoders.STRING()).collectAsList();
    System.out.println(urls);
    JavaPairRDD<String, StructureDataInterface> models = MmtfImporter.downloadSwissModelsByUrls(urls, sc);
    models.foreach(t -> System.out.println(t._2.getEntitySequence(0)));
    spark.close();
}
Also used : SparkSession(org.apache.spark.sql.SparkSession) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) Row(org.apache.spark.sql.Row) StructureDataInterface(org.rcsb.mmtf.api.StructureDataInterface)
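
The filtered models could also be persisted for later runs with the writer used in Examples 52 and 53; a minimal sketch, assuming MmtfWriter (edu.sdsc.mmtf.spark.io.MmtfWriter) is imported and using a hypothetical output path:

// hypothetical local output path for the MMTF-Hadoop Sequence file
String mmtfPath = "./swiss_model_mmtf";
MmtfWriter.writeSequenceFile(mmtfPath, sc, models);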

Example 55 with StructureDataInterface

Use of org.rcsb.mmtf.api.StructureDataInterface in project mm-dev by sbl-sdsc.

The class TestRosettaMmtf, method main.

/**
 * Test: Read MMTF-Hadoop Sequence file.
 *
 * @param args args[0] <path-to-mmtf-hadoop-sequence-file>
 *
 * @throws FileNotFoundException
 */
public static void main(String[] args) throws FileNotFoundException {
    // instantiate Spark
    // TODO set to local[1] !!!!
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("TestRosettaMmtf");
    JavaSparkContext sc = new JavaSparkContext(conf);
    long start = System.nanoTime();
    // read PDB files recursively starting the specified directory
    JavaPairRDD<String, StructureDataInterface> structures = MmtfReader.readSequenceFile(args[0], sc);
    // total:  639 structures
    // structures = structures.filter(new ContainsDnaChain()); //  ?
    // structures = structures.filter(new ContainsLProteinChain()); // 639?
    // structures = structures.filter(new ContainsGroup("ZN")); // 0
    // structures = structures.filter(new ContainsGroup("ATP")); //
    // debug: print structure data
    // structures.foreach(t -> TraverseStructureHierarchy.demo(t._2));
    // structures.foreach(t -> System.out.println(t._1));
    System.out.println(structures.map(t -> t._2.getNumEntities()).reduce((a, b) -> a + b));
    System.out.println("Number of structures read: " + structures.count());
    long end = System.nanoTime();
    System.out.println("Time: " + (end - start) / 1E9 + " sec.");
    // close Spark
    sc.close();
}
Also used : MmtfImporter(edu.sdsc.mmtf.spark.io.MmtfImporter) Arrays(java.util.Arrays) ContainsLProteinChain(edu.sdsc.mmtf.spark.filters.ContainsLProteinChain) SparkConf(org.apache.spark.SparkConf) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) MmtfWriter(edu.sdsc.mmtf.spark.io.MmtfWriter) JavaPairRDD(org.apache.spark.api.java.JavaPairRDD) FileNotFoundException(java.io.FileNotFoundException) ContainsGroup(edu.sdsc.mmtf.spark.filters.ContainsGroup) TraverseStructureHierarchy(edu.sdsc.mmtf.spark.io.demos.TraverseStructureHierarchy) List(java.util.List) ContainsDnaChain(edu.sdsc.mmtf.spark.filters.ContainsDnaChain) StructureDataInterface(org.rcsb.mmtf.api.StructureDataInterface) ContainsDProteinChain(edu.sdsc.mmtf.spark.filters.ContainsDProteinChain) MmtfReader(edu.sdsc.mmtf.spark.io.MmtfReader) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) StructureDataInterface(org.rcsb.mmtf.api.StructureDataInterface) SparkConf(org.apache.spark.SparkConf)
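
Enabling one of the commented-out filters only requires applying it to the RDD; a minimal sketch using the ContainsLProteinChain filter that is already imported above:

// keep only structures that contain at least one L-protein chain
JavaPairRDD<String, StructureDataInterface> filtered = structures.filter(new ContainsLProteinChain());
System.out.println("Structures with an L-protein chain: " + filtered.count());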

Aggregations

StructureDataInterface (org.rcsb.mmtf.api.StructureDataInterface): 102
JavaSparkContext (org.apache.spark.api.java.JavaSparkContext): 60
SparkConf (org.apache.spark.SparkConf): 58
Row (org.apache.spark.sql.Row): 27
StructureToPolymerChains (edu.sdsc.mmtf.spark.mappers.StructureToPolymerChains): 22
Test (org.junit.Test): 20
Pisces (edu.sdsc.mmtf.spark.webfilters.Pisces): 19
ArrayList (java.util.ArrayList): 12
ProteinSequenceEncoder (edu.sdsc.mmtf.spark.ml.ProteinSequenceEncoder): 10
ColumnarStructure (edu.sdsc.mmtf.spark.utils.ColumnarStructure): 10
Tuple2 (scala.Tuple2): 9
Path (java.nio.file.Path): 7
HashSet (java.util.HashSet): 7
AdapterToStructureData (org.rcsb.mmtf.encoder.AdapterToStructureData): 7
JavaPairRDD (org.apache.spark.api.java.JavaPairRDD): 6
ContainsLProteinChain (edu.sdsc.mmtf.spark.filters.ContainsLProteinChain): 5
List (java.util.List): 5
Resolution (edu.sdsc.mmtf.spark.filters.Resolution): 4
MmtfReader (edu.sdsc.mmtf.spark.io.MmtfReader): 4
File (java.io.File): 4