Example 1 with PolymerComposition

Use of edu.sdsc.mmtf.spark.filters.PolymerComposition in the project mmtf-spark by sbl-sdsc.

The method main of the class PolyPeptideChainStatistics.

public static void main(String[] args) throws FileNotFoundException {
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(PolyPeptideChainStatistics.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);

    JavaDoubleRDD chainLengths = MmtfReader
            // read PDB from MMTF-Hadoop sequence file
            .readReducedSequenceFile(sc)
            // split (flatmap) into unique polymer chains
            .flatMapToPair(new StructureToPolymerChains(false, true))
            // only consider chains that contain the 20 standard amino acids
            .filter(new PolymerComposition(PolymerComposition.AMINO_ACIDS_20))
            // get the number of groups (residues) in each chain using a lambda expression
            .mapToDouble(t -> t._2.getNumGroups());

    System.out.println("Protein chains length statistics for proteins in the PDB with the 20 standard amino acids:");
    System.out.println(chainLengths.stats());

    sc.close();
}
Also used: StructureToPolymerChains (edu.sdsc.mmtf.spark.mappers.StructureToPolymerChains), JavaSparkContext (org.apache.spark.api.java.JavaSparkContext), SparkConf (org.apache.spark.SparkConf), JavaDoubleRDD (org.apache.spark.api.java.JavaDoubleRDD), PolymerComposition (edu.sdsc.mmtf.spark.filters.PolymerComposition)
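
The filter, map, and statistics steps of the pipeline above can be tried without Spark or the PDB data. The following is a minimal sketch in plain Java streams, not the mmtf-spark implementation. The class ChainLengthSketch and its methods are hypothetical names chosen for illustration, a chain is modeled simply as a list of three-letter residue codes, and it assumes the composition filter keeps a chain only when every residue is one of the 20 standard amino acids.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ChainLengthSketch {
    // Three-letter codes of the 20 standard amino acids.
    static final Set<String> AMINO_ACIDS_20 = new HashSet<>(Arrays.asList(
            "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
            "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"));

    // Hypothetical stand-in for the composition filter (an assumption, not the library code):
    // true only if every residue in the chain is a standard amino acid.
    static boolean hasOnlyStandardResidues(List<String> chain) {
        return AMINO_ACIDS_20.containsAll(chain);
    }

    // Filter the chains, map each remaining chain to its length, and summarize the lengths.
    static DoubleSummaryStatistics chainLengthStats(List<List<String>> chains) {
        return chains.stream()
                .filter(ChainLengthSketch::hasOnlyStandardResidues)
                .mapToDouble(List::size)
                .summaryStatistics();
    }

    public static void main(String[] args) {
        List<List<String>> chains = Arrays.asList(
                Arrays.asList("MET", "ALA", "GLY"),
                // MSE (selenomethionine) is not a standard amino acid, so this chain is dropped
                Arrays.asList("GLY", "MSE", "LEU", "VAL"),
                Arrays.asList("SER", "THR", "TRP", "TYR", "VAL"));

        DoubleSummaryStatistics stats = chainLengthStats(chains);
        System.out.println(stats.getCount() + " chains, mean length " + stats.getAverage());
    }
}
```

In the Spark version the same three steps run on a distributed RDD, and stats() plays the role that summaryStatistics() plays here.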
