Example 1 with ContainsGroup

use of edu.sdsc.mmtf.spark.filters.ContainsGroup in project mm-dev by sbl-sdsc.

the class TestRosettaMmtf method main.

/**
 * Test: Read MMTF-Hadoop Sequence file.
 *
 * @param args args[0] <path-to-mmtf-hadoop-sequence-file>
 *
 * @throws FileNotFoundException
 */
public static void main(String[] args) throws FileNotFoundException {
    // instantiate Spark
    // TODO set to local[1] !!!!
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("TestRosettaMmtf");
    JavaSparkContext sc = new JavaSparkContext(conf);
    long start = System.nanoTime();
    // read structures from the MMTF-Hadoop sequence file at the specified path
    JavaPairRDD<String, StructureDataInterface> structures = MmtfReader.readSequenceFile(args[0], sc);
    // total:  639 structures
    // structures = structures.filter(new ContainsDnaChain()); //  ?
    // structures = structures.filter(new ContainsLProteinChain()); // 639?
    // structures = structures.filter(new ContainsGroup("ZN")); // 0
    // structures = structures.filter(new ContainsGroup("ATP")); //
    // debug: print structure data
    // structures.foreach(t -> TraverseStructureHierarchy.demo(t._2));
    // structures.foreach(t -> System.out.println(t._1));
    System.out.println(structures.map(t -> t._2.getNumEntities()).reduce((a, b) -> a + b));
    System.out.println("Number of structures read: " + structures.count());
    long end = System.nanoTime();
    System.out.println("Time: " + (end - start) / 1E9 + " sec.");
    // close Spark
    sc.close();
}
Also used : MmtfImporter(edu.sdsc.mmtf.spark.io.MmtfImporter) Arrays(java.util.Arrays) ContainsLProteinChain(edu.sdsc.mmtf.spark.filters.ContainsLProteinChain) SparkConf(org.apache.spark.SparkConf) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) MmtfWriter(edu.sdsc.mmtf.spark.io.MmtfWriter) JavaPairRDD(org.apache.spark.api.java.JavaPairRDD) FileNotFoundException(java.io.FileNotFoundException) ContainsGroup(edu.sdsc.mmtf.spark.filters.ContainsGroup) TraverseStructureHierarchy(edu.sdsc.mmtf.spark.io.demos.TraverseStructureHierarchy) List(java.util.List) ContainsDnaChain(edu.sdsc.mmtf.spark.filters.ContainsDnaChain) StructureDataInterface(org.rcsb.mmtf.api.StructureDataInterface) ContainsDProteinChain(edu.sdsc.mmtf.spark.filters.ContainsDProteinChain) MmtfReader(edu.sdsc.mmtf.spark.io.MmtfReader)
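
The `map(...).reduce((a, b) -> a + b)` call in the example above sums a per-structure count across the whole data set. A minimal sketch of the same sum pattern, using plain Java streams instead of Spark so that it runs without a cluster; the class name `EntityCountSketch` and the sample counts are hypothetical and not taken from the project:

```java
import java.util.Arrays;
import java.util.List;

public class EntityCountSketch {

    // Sums the per-structure counts; the identity value 0 makes an empty list yield 0.
    static int totalEntities(List<Integer> perStructureCounts) {
        return perStructureCounts.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        // Hypothetical entity counts for three structures (illustration only).
        List<Integer> counts = Arrays.asList(3, 1, 4);
        System.out.println(totalEntities(counts));
    }
}
```

One difference worth keeping in mind: Spark's `reduce` has no identity value and fails on an empty RDD, so if a filter removes every structure, `fold` with a zero value may be the safer call.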

Example 2 with ContainsGroup

use of edu.sdsc.mmtf.spark.filters.ContainsGroup in project mmtf-spark by sbl-sdsc.

the class FilterByGroups method main.

public static void main(String[] args) throws FileNotFoundException {
    String path = MmtfReader.getMmtfReducedPath();
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(FilterByGroups.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    // find all structures that contain both ATP and MG
    long count = MmtfReader.readSequenceFile(path, sc)
            .filter(new ContainsGroup("ATP"))
            .filter(new ContainsGroup("MG"))
            .count();
    System.out.println("Structures with ATP + MG: " + count);
    sc.close();
}
Also used : JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) SparkConf(org.apache.spark.SparkConf) ContainsGroup(edu.sdsc.mmtf.spark.filters.ContainsGroup)
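
Chaining two `ContainsGroup` filters, as in the example above, keeps only structures that pass both, which amounts to a logical AND over the group names. The sketch below illustrates that semantics with plain Java collections. It is not the library's implementation: the class `GroupFilterSketch`, its methods, and the structure IDs and group sets are all hypothetical stand-ins for the Spark RDD and `StructureDataInterface`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class GroupFilterSketch {

    // Returns true only if every queried group name occurs in the structure.
    static boolean containsAllGroups(Set<String> groupsInStructure, String... query) {
        return groupsInStructure.containsAll(Arrays.asList(query));
    }

    // Keeps the IDs of structures that contain all queried groups, in input order.
    static List<String> filterIds(Map<String, Set<String>> structures, String... query) {
        return structures.entrySet().stream()
                .filter(e -> containsAllGroups(e.getValue(), query))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical structure IDs mapped to the group names they contain.
        Map<String, Set<String>> structures = new LinkedHashMap<>();
        structures.put("S1", new HashSet<>(Arrays.asList("ALA", "ATP", "MG")));
        structures.put("S2", new HashSet<>(Arrays.asList("GLY", "ATP")));
        structures.put("S3", new HashSet<>(Arrays.asList("MG", "HOH")));

        System.out.println("Structures with ATP + MG: " + filterIds(structures, "ATP", "MG"));
    }
}
```

Only `S1` contains both groups, so it is the only ID that survives the filter; `S2` and `S3` each lack one of the two.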

Example 3 with ContainsGroup

use of edu.sdsc.mmtf.spark.filters.ContainsGroup in project mm-dev by sbl-sdsc.

the class TestSwissModelMmtf method main.

/**
 * Test: Read MMTF-Hadoop Sequence file.
 *
 * @param args args[0] <path-to-mmtf-hadoop-sequence-file>
 *
 * @throws FileNotFoundException
 */
public static void main(String[] args) throws FileNotFoundException {
    // instantiate Spark
    // TODO set to local[1] !!!!
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("TestSwissModelMmtf");
    JavaSparkContext sc = new JavaSparkContext(conf);
    long start = System.nanoTime();
    // read structures from the MMTF-Hadoop sequence file at the specified path
    JavaPairRDD<String, StructureDataInterface> structures = MmtfReader.readSequenceFile(args[0], sc);
    // total: 6022 structures
    // structures = structures.filter(new ContainsDnaChain()); // 3 ?
    // structures = structures.filter(new ContainsLProteinChain()); // 6022 ?
    // structures = structures.filter(new ContainsGroup("ZN")); // 228
    structures = structures.filter(new ContainsGroup("ATP")); // 228
    // debug: print structure data
    // structures.foreach(t -> TraverseStructureHierarchy.demo(t._2));
    // structures.foreach(t -> System.out.println(t._1));
    // System.out.println(structures.map(t -> t._2.getNumGroups()).reduce((a, b) -> a+b));
    System.out.println("Number of structures read: " + structures.count());
    long end = System.nanoTime();
    System.out.println("Time: " + (end - start) / 1E9 + " sec.");
    // close Spark
    sc.close();
}
Also used : JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) StructureDataInterface(org.rcsb.mmtf.api.StructureDataInterface) SparkConf(org.apache.spark.SparkConf) ContainsGroup(edu.sdsc.mmtf.spark.filters.ContainsGroup)

Aggregations

ContainsGroup (edu.sdsc.mmtf.spark.filters.ContainsGroup) 3
SparkConf (org.apache.spark.SparkConf) 3
JavaSparkContext (org.apache.spark.api.java.JavaSparkContext) 3
StructureDataInterface (org.rcsb.mmtf.api.StructureDataInterface) 2
ContainsDProteinChain (edu.sdsc.mmtf.spark.filters.ContainsDProteinChain) 1
ContainsDnaChain (edu.sdsc.mmtf.spark.filters.ContainsDnaChain) 1
ContainsLProteinChain (edu.sdsc.mmtf.spark.filters.ContainsLProteinChain) 1
MmtfImporter (edu.sdsc.mmtf.spark.io.MmtfImporter) 1
MmtfReader (edu.sdsc.mmtf.spark.io.MmtfReader) 1
MmtfWriter (edu.sdsc.mmtf.spark.io.MmtfWriter) 1
TraverseStructureHierarchy (edu.sdsc.mmtf.spark.io.demos.TraverseStructureHierarchy) 1
FileNotFoundException (java.io.FileNotFoundException) 1
Arrays (java.util.Arrays) 1
List (java.util.List) 1
JavaPairRDD (org.apache.spark.api.java.JavaPairRDD) 1