Use of edu.sdsc.mmtf.spark.filters.PolymerComposition in project mmtf-spark by sbl-sdsc.
Class PolyPeptideChainStatistics, method main.
public static void main(String[] args) throws FileNotFoundException {
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(PolyPeptideChainStatistics.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);

    JavaDoubleRDD chainLengths = MmtfReader
            // read PDB from MMTF-Hadoop sequence file
            .readReducedSequenceFile(sc)
            // split (flatmap) into unique polymer chains
            .flatMapToPair(new StructureToPolymerChains(false, true))
            // only consider chains that contain the 20 standard amino acids
            .filter(new PolymerComposition(PolymerComposition.AMINO_ACIDS_20))
            // get the number of groups (residues) in each chain using a lambda expression
            .mapToDouble(t -> t._2.getNumGroups());

    System.out.println("Protein chains length statistics for proteins in the PDB with the 20 standard amino acids:");
    System.out.println(chainLengths.stats());

    sc.close();
}
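The filter-then-summarize pattern used above can be tried without a Spark cluster or the MMTF data. The following is only a plain-Java sketch of the same idea, not the mmtf-spark implementation: chains are stood in for by hypothetical one-letter sequence strings, the class name ChainLengthStatsSketch and the helpers isStandardOnly and lengthStats are made up for illustration, and java.util.DoubleSummaryStatistics plays the role of the statistics object returned by chainLengths.stats().

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;

// Hypothetical stand-in for the Spark pipeline: no mmtf-spark classes are used here.
public class ChainLengthStatsSketch {

    // One-letter codes of the 20 standard amino acids.
    static final String STANDARD_20 = "ACDEFGHIKLMNPQRSTVWY";

    // True if every residue in the sequence is one of the 20 standard amino acids.
    // (Assumption: this mimics the intent of the PolymerComposition filter, not its code.)
    static boolean isStandardOnly(String sequence) {
        return sequence.chars().allMatch(c -> STANDARD_20.indexOf(c) >= 0);
    }

    // Keep only standard-amino-acid chains, map each to its length, and summarize.
    static DoubleSummaryStatistics lengthStats(List<String> sequences) {
        return sequences.stream()
                .filter(ChainLengthStatsSketch::isStandardOnly)
                .mapToDouble(String::length)
                .summaryStatistics();
    }

    public static void main(String[] args) {
        // Made-up example chains; "ACGU" and "GSHMX" contain non-standard letters.
        List<String> chains = List.of("MKTAYIAK", "ACGU", "GSHMX", "AVLI");
        DoubleSummaryStatistics stats = lengthStats(chains);
        System.out.println("count=" + stats.getCount());
        System.out.println("min=" + stats.getMin());
        System.out.println("max=" + stats.getMax());
        System.out.println("mean=" + stats.getAverage());
    }
}
```

As in the Spark version, the filter runs before the length mapping, so chains with non-standard residues never contribute to the statistics.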
Aggregations