
Example 1 with ContainsDnaChain

Use of edu.sdsc.mmtf.spark.filters.ContainsDnaChain in project mm-dev by sbl-sdsc.

From the class TestRosettaMmtf, method main.

/**
 * Test: read an MMTF-Hadoop sequence file.
 *
 * @param args args[0] <path-to-mmtf-hadoop-sequence-file>
 *
 * @throws FileNotFoundException
 */
public static void main(String[] args) throws FileNotFoundException {
    // instantiate Spark
    // TODO set to local[1] !!!!
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(TestRosettaMmtf.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    long start = System.nanoTime();
    // read the MMTF-Hadoop sequence file given in args[0]
    JavaPairRDD<String, StructureDataInterface> structures = MmtfReader.readSequenceFile(args[0], sc);
    // total:  639 structures
    // structures = structures.filter(new ContainsDnaChain()); //  ?
    // structures = structures.filter(new ContainsLProteinChain()); // 639?
    // structures = structures.filter(new ContainsGroup("ZN")); // 0
    // structures = structures.filter(new ContainsGroup("ATP")); //
    // debug: print structure data
    // structures.foreach(t -> TraverseStructureHierarchy.demo(t._2));
    // structures.foreach(t -> System.out.println(t._1));
    // sum the number of entities over all structures
    System.out.println("Total number of entities: " + structures.map(t -> t._2.getNumEntities()).reduce((a, b) -> a + b));
    System.out.println("Number of structures read: " + structures.count());
    long end = System.nanoTime();
    System.out.println("Time: " + (end - start) / 1E9 + " sec.");
    // close Spark
    sc.close();
}
Also used : MmtfImporter(edu.sdsc.mmtf.spark.io.MmtfImporter) Arrays(java.util.Arrays) ContainsLProteinChain(edu.sdsc.mmtf.spark.filters.ContainsLProteinChain) SparkConf(org.apache.spark.SparkConf) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) MmtfWriter(edu.sdsc.mmtf.spark.io.MmtfWriter) JavaPairRDD(org.apache.spark.api.java.JavaPairRDD) FileNotFoundException(java.io.FileNotFoundException) ContainsGroup(edu.sdsc.mmtf.spark.filters.ContainsGroup) TraverseStructureHierarchy(edu.sdsc.mmtf.spark.io.demos.TraverseStructureHierarchy) List(java.util.List) ContainsDnaChain(edu.sdsc.mmtf.spark.filters.ContainsDnaChain) StructureDataInterface(org.rcsb.mmtf.api.StructureDataInterface) ContainsDProteinChain(edu.sdsc.mmtf.spark.filters.ContainsDProteinChain) MmtfReader(edu.sdsc.mmtf.spark.io.MmtfReader)
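Example 1 totals the entities with a map/reduce and times the run with System.nanoTime(). The sketch below reproduces that pattern with plain Java streams so it runs without Spark; EntryStub is a hypothetical stand-in for a (PDB ID, StructureDataInterface) pair, and the IDs and counts are made up. One design note: Spark's reduce throws on an empty RDD, whereas a reduction that starts from a zero value (as below, or JavaRDD.fold in Spark) simply returns 0.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java analogue of Example 1's map/reduce over structures.
// EntryStub is hypothetical; it is not part of mmtf-spark.
class EntityCountSketch {

    static final class EntryStub {
        final String pdbId;
        final int numEntities;

        EntryStub(String pdbId, int numEntities) {
            this.pdbId = pdbId;
            this.numEntities = numEntities;
        }
    }

    // Mirrors structures.map(t -> t._2.getNumEntities()).reduce((a, b) -> a + b),
    // but starts from 0 so an empty input yields 0 instead of failing.
    static int totalEntities(List<EntryStub> entries) {
        return entries.stream()
                .map(e -> e.numEntities)
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<EntryStub> entries = Arrays.asList(
                new EntryStub("1ABC", 2),   // hypothetical IDs and entity counts
                new EntryStub("2XYZ", 3));
        long start = System.nanoTime();
        int total = totalEntities(entries);
        long end = System.nanoTime();
        System.out.println("Total number of entities: " + total);
        System.out.println("Number of structures read: " + entries.size());
        System.out.println("Time: " + (end - start) / 1E9 + " sec.");
    }
}
```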

Example 2 with ContainsDnaChain

Use of edu.sdsc.mmtf.spark.filters.ContainsDnaChain in project mmtf-spark by sbl-sdsc.

From the class FilterProteinDnaComplexes, method main.

public static void main(String[] args) throws FileNotFoundException {
    String path = MmtfReader.getMmtfReducedPath();
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(FilterProteinDnaComplexes.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    long count = MmtfReader
            .readSequenceFile(path, sc)                    // read MMTF-Hadoop sequence file
            .filter(new ContainsLProteinChain())           // retain PDB entries that contain L-peptide chains
            .filter(new ContainsDnaChain())                // retain PDB entries that contain DNA chains
            .filter(new NotFilter(new ContainsRnaChain())) // filter out any RNA-containing entries
            .count();
    System.out.println("# L-peptide/DNA complexes: " + count);
    sc.close();
}
Also used : ContainsDnaChain(edu.sdsc.mmtf.spark.filters.ContainsDnaChain) ContainsRnaChain(edu.sdsc.mmtf.spark.filters.ContainsRnaChain) NotFilter(edu.sdsc.mmtf.spark.filters.NotFilter) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) SparkConf(org.apache.spark.SparkConf) ContainsLProteinChain(edu.sdsc.mmtf.spark.filters.ContainsLProteinChain)
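Each successive filter keeps only the entries that pass it, so the chain in Example 2 acts as a logical AND: L-protein AND DNA AND NOT RNA. A minimal Spark-free sketch of that logic, using java.util.function.Predicate with negate() in the role of NotFilter, is shown below; EntryStub, contains, and the chain-type labels such as "L-PROTEIN" are hypothetical stand-ins, not the mmtf-spark API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Spark-free model of chaining entry filters as in Example 2.
// EntryStub and the chain-type labels are hypothetical.
class ComplexFilterSketch {

    static final class EntryStub {
        final String pdbId;
        final Set<String> chainTypes; // e.g. "L-PROTEIN", "DNA", "RNA"

        EntryStub(String pdbId, String... chainTypes) {
            this.pdbId = pdbId;
            this.chainTypes = new HashSet<>(Arrays.asList(chainTypes));
        }
    }

    // Hypothetical filter factory: "entry contains at least one chain of this type".
    static Predicate<EntryStub> contains(String chainType) {
        return e -> e.chainTypes.contains(chainType);
    }

    // Chained filter() calls behave like a logical AND;
    // negate() inverts a predicate the way NotFilter inverts a filter.
    static List<String> proteinDnaComplexes(List<EntryStub> entries) {
        return entries.stream()
                .filter(contains("L-PROTEIN"))
                .filter(contains("DNA"))
                .filter(contains("RNA").negate())
                .map(e -> e.pdbId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<EntryStub> entries = Arrays.asList(
                new EntryStub("P1", "L-PROTEIN", "DNA"),        // kept
                new EntryStub("P2", "L-PROTEIN", "DNA", "RNA"), // dropped: contains RNA
                new EntryStub("P3", "L-PROTEIN"),               // dropped: no DNA
                new EntryStub("P4", "DNA"));                    // dropped: no protein
        System.out.println("# L-peptide/DNA complexes: " + proteinDnaComplexes(entries).size());
    }
}
```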

Example 3 with ContainsDnaChain

Use of edu.sdsc.mmtf.spark.filters.ContainsDnaChain in project mmtf-spark by sbl-sdsc.

From the class NotFilterExample, method main.

public static void main(String[] args) throws FileNotFoundException {
    String path = MmtfReader.getMmtfReducedPath();
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(NotFilterExample.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    long count = MmtfReader
            .readSequenceFile(path, sc)                   // read MMTF-Hadoop sequence file
            .filter(new ContainsLProteinChain())          // retain PDB entries that contain at least one L-peptide chain
            .filter(new NotFilter(new ContainsDnaChain())) // should not contain any DNA chains
            .count();
    System.out.println("# PDB entries with L-protein and without DNA chains: " + count);
    sc.close();
}
Also used : ContainsDnaChain(edu.sdsc.mmtf.spark.filters.ContainsDnaChain) NotFilter(edu.sdsc.mmtf.spark.filters.NotFilter) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) SparkConf(org.apache.spark.SparkConf) ContainsLProteinChain(edu.sdsc.mmtf.spark.filters.ContainsLProteinChain)
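NotFilter wraps another filter and keeps exactly the entries that the wrapped filter rejects. It also helps to separate "contains at least one L-protein chain" from "contains only L-protein chains": as we understand it, the no-argument ContainsLProteinChain() applies the first, looser test, so Example 3's count can include entries that also carry, say, RNA chains. The sketch below models both ideas without Spark; EntryStub, NotFilterStub, and the chain-type labels are hypothetical stand-ins, not the mmtf-spark classes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Spark-free model of Example 3. EntryStub, NotFilterStub and the
// chain-type labels are hypothetical stand-ins.
class NotFilterSketch {

    static final class EntryStub {
        final Set<String> chainTypes;

        EntryStub(String... chainTypes) {
            this.chainTypes = new HashSet<>(Arrays.asList(chainTypes));
        }
    }

    // Wraps another filter and inverts its answer (the idea behind NotFilter).
    static final class NotFilterStub implements Predicate<EntryStub> {
        private final Predicate<EntryStub> inner;

        NotFilterStub(Predicate<EntryStub> inner) { this.inner = inner; }

        @Override
        public boolean test(EntryStub e) { return !inner.test(e); }
    }

    // Non-exclusive test: at least one L-protein chain.
    static final Predicate<EntryStub> HAS_L_PROTEIN = e -> e.chainTypes.contains("L-PROTEIN");
    // Exclusive test, shown for contrast: L-protein chains only.
    static final Predicate<EntryStub> ONLY_L_PROTEIN =
            e -> !e.chainTypes.isEmpty() && e.chainTypes.stream().allMatch("L-PROTEIN"::equals);
    static final Predicate<EntryStub> HAS_DNA = e -> e.chainTypes.contains("DNA");

    static long countProteinWithoutDna(List<EntryStub> entries) {
        return entries.stream()
                .filter(HAS_L_PROTEIN)
                .filter(new NotFilterStub(HAS_DNA))
                .count();
    }

    static long countOnlyProtein(List<EntryStub> entries) {
        return entries.stream().filter(ONLY_L_PROTEIN).count();
    }

    public static void main(String[] args) {
        List<EntryStub> entries = Arrays.asList(
                new EntryStub("L-PROTEIN"),         // passes both tests
                new EntryStub("L-PROTEIN", "RNA"),  // no DNA, but not protein-only
                new EntryStub("L-PROTEIN", "DNA")); // excluded by the NotFilterStub
        System.out.println("L-protein, no DNA: " + countProteinWithoutDna(entries));
        System.out.println("L-protein only:    " + countOnlyProtein(entries));
    }
}
```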

Aggregations

ContainsDnaChain (edu.sdsc.mmtf.spark.filters.ContainsDnaChain): 3
ContainsLProteinChain (edu.sdsc.mmtf.spark.filters.ContainsLProteinChain): 3
SparkConf (org.apache.spark.SparkConf): 3
JavaSparkContext (org.apache.spark.api.java.JavaSparkContext): 3
NotFilter (edu.sdsc.mmtf.spark.filters.NotFilter): 2
ContainsDProteinChain (edu.sdsc.mmtf.spark.filters.ContainsDProteinChain): 1
ContainsGroup (edu.sdsc.mmtf.spark.filters.ContainsGroup): 1
ContainsRnaChain (edu.sdsc.mmtf.spark.filters.ContainsRnaChain): 1
MmtfImporter (edu.sdsc.mmtf.spark.io.MmtfImporter): 1
MmtfReader (edu.sdsc.mmtf.spark.io.MmtfReader): 1
MmtfWriter (edu.sdsc.mmtf.spark.io.MmtfWriter): 1
TraverseStructureHierarchy (edu.sdsc.mmtf.spark.io.demos.TraverseStructureHierarchy): 1
FileNotFoundException (java.io.FileNotFoundException): 1
Arrays (java.util.Arrays): 1
List (java.util.List): 1
JavaPairRDD (org.apache.spark.api.java.JavaPairRDD): 1
StructureDataInterface (org.rcsb.mmtf.api.StructureDataInterface): 1