
Example 1 with ContainsLProteinChain

use of edu.sdsc.mmtf.spark.filters.ContainsLProteinChain in project mmtf-spark by sbl-sdsc.

the class FilterByPolymerChainType method main.

public static void main(String[] args) throws FileNotFoundException {
    String path = MmtfReader.getMmtfReducedPath();
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(FilterByPolymerChainType.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    long count = MmtfReader.readSequenceFile(path, sc) // read MMTF Hadoop sequence file
            // retain entries that contain DNA or RNA chains
            .filter(new ContainsPolymerChainType(ContainsPolymerChainType.DNA_LINKING, ContainsPolymerChainType.RNA_LINKING))
            // drop entries that contain L-protein or D-saccharide chains
            .filter(new NotFilter(new ContainsLProteinChain()))
            .filter(new NotFilter(new ContainsDSaccharideChain()))
            .count();
    System.out.println("# pure DNA and RNA entries: " + count);
    sc.close();
}
Also used : ContainsPolymerChainType(edu.sdsc.mmtf.spark.filters.ContainsPolymerChainType) NotFilter(edu.sdsc.mmtf.spark.filters.NotFilter) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) ContainsDSaccharideChain(edu.sdsc.mmtf.spark.filters.ContainsDSaccharideChain) SparkConf(org.apache.spark.SparkConf) ContainsLProteinChain(edu.sdsc.mmtf.spark.filters.ContainsLProteinChain)

Example 2 with ContainsLProteinChain

use of edu.sdsc.mmtf.spark.filters.ContainsLProteinChain in project mmtf-spark by sbl-sdsc.

the class FilterExclusivelyByLProteins method main.

public static void main(String[] args) throws FileNotFoundException {
    String path = MmtfReader.getMmtfReducedPath();
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(FilterExclusivelyByLProteins.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    boolean exclusive = true;
    long count = MmtfReader.readSequenceFile(path, sc) // read MMTF Hadoop sequence file
            .filter(new ContainsLProteinChain(exclusive))
            .count();
    System.out.println("# L-proteins: " + count);
    sc.close();
}
Also used : JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) SparkConf(org.apache.spark.SparkConf) ContainsLProteinChain(edu.sdsc.mmtf.spark.filters.ContainsLProteinChain)
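
The `exclusive` flag in Example 2 separates "contains at least one L-protein chain" from "contains only L-protein chains". The following is a minimal sketch of that distinction in plain Java, without Spark or mmtf-spark. The `ChainType` enum and both helper methods are hypothetical stand-ins introduced for illustration. They are not the library's implementation, and treating an entry with no chains as a non-match is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

public class ExclusiveFilterSketch {
    // Hypothetical stand-in for the polymer types of the chains in one entry.
    enum ChainType { L_PROTEIN, D_PROTEIN, DNA, RNA }

    // Non-exclusive match: at least one chain is an L-protein.
    static boolean containsLProtein(List<ChainType> chains) {
        return chains.contains(ChainType.L_PROTEIN);
    }

    // Exclusive match: the entry has chains and every chain is an L-protein.
    static boolean containsOnlyLProtein(List<ChainType> chains) {
        return !chains.isEmpty()
                && chains.stream().allMatch(c -> c == ChainType.L_PROTEIN);
    }

    public static void main(String[] args) {
        List<ChainType> complex = Arrays.asList(ChainType.L_PROTEIN, ChainType.DNA);
        List<ChainType> pure = Arrays.asList(ChainType.L_PROTEIN, ChainType.L_PROTEIN);

        System.out.println(containsLProtein(complex));     // true
        System.out.println(containsOnlyLProtein(complex)); // false
        System.out.println(containsOnlyLProtein(pure));    // true
    }
}
```

Under these assumptions, a protein/DNA complex passes the non-exclusive check but fails the exclusive one, which is why Example 2 reports its count as "L-proteins" rather than "entries with L-protein chains".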

Example 3 with ContainsLProteinChain

use of edu.sdsc.mmtf.spark.filters.ContainsLProteinChain in project mm-dev by sbl-sdsc.

the class TestRosettaMmtf method main.

/**
 * Test: Read MMTF-Hadoop Sequence file.
 *
 * @param args args[0] <path-to-mmtf-hadoop-sequence-file>
 *
 * @throws FileNotFoundException
 */
public static void main(String[] args) throws FileNotFoundException {
    // instantiate Spark
    // TODO set to local[1] !!!!
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("TestSwissModelMmtf");
    JavaSparkContext sc = new JavaSparkContext(conf);
    long start = System.nanoTime();
    // read the MMTF-Hadoop sequence file given as the first argument
    JavaPairRDD<String, StructureDataInterface> structures = MmtfReader.readSequenceFile(args[0], sc);
    // total:  639 structures
    // structures = structures.filter(new ContainsDnaChain()); //  ?
    // structures = structures.filter(new ContainsLProteinChain()); // 639?
    // structures = structures.filter(new ContainsGroup("ZN")); // 0
    // structures = structures.filter(new ContainsGroup("ATP")); //
    // debug: print structure data
    // structures.foreach(t -> TraverseStructureHierarchy.demo(t._2));
    // structures.foreach(t -> System.out.println(t._1));
    System.out.println(structures.map(t -> t._2.getNumEntities()).reduce((a, b) -> a + b));
    System.out.println("Number of structures read: " + structures.count());
    long end = System.nanoTime();
    System.out.println("Time: " + (end - start) / 1E9 + " sec.");
    // close Spark
    sc.close();
}
Also used : MmtfImporter(edu.sdsc.mmtf.spark.io.MmtfImporter) Arrays(java.util.Arrays) ContainsLProteinChain(edu.sdsc.mmtf.spark.filters.ContainsLProteinChain) SparkConf(org.apache.spark.SparkConf) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) MmtfWriter(edu.sdsc.mmtf.spark.io.MmtfWriter) JavaPairRDD(org.apache.spark.api.java.JavaPairRDD) FileNotFoundException(java.io.FileNotFoundException) ContainsGroup(edu.sdsc.mmtf.spark.filters.ContainsGroup) TraverseStructureHierarchy(edu.sdsc.mmtf.spark.io.demos.TraverseStructureHierarchy) List(java.util.List) ContainsDnaChain(edu.sdsc.mmtf.spark.filters.ContainsDnaChain) StructureDataInterface(org.rcsb.mmtf.api.StructureDataInterface) ContainsDProteinChain(edu.sdsc.mmtf.spark.filters.ContainsDProteinChain) MmtfReader(edu.sdsc.mmtf.spark.io.MmtfReader)

Example 4 with ContainsLProteinChain

use of edu.sdsc.mmtf.spark.filters.ContainsLProteinChain in project mmtf-spark by sbl-sdsc.

the class FilterProteinDnaComplexes method main.

public static void main(String[] args) throws FileNotFoundException {
    String path = MmtfReader.getMmtfReducedPath();
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(FilterProteinDnaComplexes.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    long count = MmtfReader.readSequenceFile(path, sc) // read MMTF Hadoop sequence file
            // retain PDB entries that contain L-peptide chains
            .filter(new ContainsLProteinChain())
            // retain PDB entries that contain DNA chains
            .filter(new ContainsDnaChain())
            // filter out RNA-containing entries
            .filter(new NotFilter(new ContainsRnaChain()))
            .count();
    System.out.println("# L-peptide/DNA complexes: " + count);
    sc.close();
}
Also used : ContainsDnaChain(edu.sdsc.mmtf.spark.filters.ContainsDnaChain) ContainsRnaChain(edu.sdsc.mmtf.spark.filters.ContainsRnaChain) NotFilter(edu.sdsc.mmtf.spark.filters.NotFilter) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) SparkConf(org.apache.spark.SparkConf) ContainsLProteinChain(edu.sdsc.mmtf.spark.filters.ContainsLProteinChain)

Example 5 with ContainsLProteinChain

use of edu.sdsc.mmtf.spark.filters.ContainsLProteinChain in project mmtf-spark by sbl-sdsc.

the class NotFilterExample method main.

public static void main(String[] args) throws FileNotFoundException {
    String path = MmtfReader.getMmtfReducedPath();
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(NotFilterExample.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    long count = MmtfReader.readSequenceFile(path, sc) // read MMTF Hadoop sequence file
            // retain PDB entries that contain L-peptide chains (no exclusive flag is passed)
            .filter(new ContainsLProteinChain())
            // should not contain any DNA chains
            .filter(new NotFilter(new ContainsDnaChain()))
            .count();
    System.out.println("# PDB entries with L-protein and without DNA chains: " + count);
    sc.close();
}
Also used : ContainsDnaChain(edu.sdsc.mmtf.spark.filters.ContainsDnaChain) NotFilter(edu.sdsc.mmtf.spark.filters.NotFilter) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) SparkConf(org.apache.spark.SparkConf) ContainsLProteinChain(edu.sdsc.mmtf.spark.filters.ContainsLProteinChain)
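
Example 5 chains a positive filter with a negated one. The following is a minimal sketch of that composition pattern using `java.util.function.Predicate` on an in-memory list instead of a Spark RDD. The `Entry` class, its fields, the `not` helper, and the sample IDs are hypothetical and exist only for illustration. The `not` helper mimics the role `NotFilter` plays above and is not the mmtf-spark class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class NotFilterSketch {
    // Hypothetical entry: an ID plus two flags describing its chains.
    static final class Entry {
        final String id;
        final boolean hasLProtein;
        final boolean hasDna;

        Entry(String id, boolean hasLProtein, boolean hasDna) {
            this.id = id;
            this.hasLProtein = hasLProtein;
            this.hasDna = hasDna;
        }
    }

    // Wraps a filter and inverts its result.
    static <T> Predicate<T> not(Predicate<T> filter) {
        return filter.negate();
    }

    // Counts entries that have an L-protein chain and no DNA chain.
    static long countLProteinWithoutDna(List<Entry> entries) {
        Predicate<Entry> containsLProtein = e -> e.hasLProtein;
        Predicate<Entry> containsDna = e -> e.hasDna;
        return entries.stream()
                .filter(containsLProtein)
                .filter(not(containsDna))
                .count();
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("entryA", true, false),   // protein only
                new Entry("entryB", true, true),    // protein/DNA complex
                new Entry("entryC", false, true));  // DNA only
        System.out.println(countLProteinWithoutDna(entries)); // 1
    }
}
```

Only the protein-only entry survives both filters. Chaining two `filter` calls is equivalent to a logical AND of the two conditions.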

Aggregations

ContainsLProteinChain (edu.sdsc.mmtf.spark.filters.ContainsLProteinChain): 9
SparkConf (org.apache.spark.SparkConf): 9
JavaSparkContext (org.apache.spark.api.java.JavaSparkContext): 9
StructureDataInterface (org.rcsb.mmtf.api.StructureDataInterface): 5
Row (org.apache.spark.sql.Row): 4
ContainsDnaChain (edu.sdsc.mmtf.spark.filters.ContainsDnaChain): 3
NotFilter (edu.sdsc.mmtf.spark.filters.NotFilter): 3
Resolution (edu.sdsc.mmtf.spark.filters.Resolution): 2
InteractionFilter (edu.sdsc.mmtf.spark.interactions.InteractionFilter): 2
StructureToPolymerChains (edu.sdsc.mmtf.spark.mappers.StructureToPolymerChains): 2
SimpleDateFormat (java.text.SimpleDateFormat): 2
HashSet (java.util.HashSet): 2
CommandLine (org.apache.commons.cli.CommandLine): 2
ContainsDProteinChain (edu.sdsc.mmtf.spark.filters.ContainsDProteinChain): 1
ContainsDSaccharideChain (edu.sdsc.mmtf.spark.filters.ContainsDSaccharideChain): 1
ContainsGroup (edu.sdsc.mmtf.spark.filters.ContainsGroup): 1
ContainsPolymerChainType (edu.sdsc.mmtf.spark.filters.ContainsPolymerChainType): 1
ContainsRnaChain (edu.sdsc.mmtf.spark.filters.ContainsRnaChain): 1
MmtfImporter (edu.sdsc.mmtf.spark.io.MmtfImporter): 1
MmtfReader (edu.sdsc.mmtf.spark.io.MmtfReader): 1