Example 26 with StructureToPolymerChains

Use of edu.sdsc.mmtf.spark.mappers.StructureToPolymerChains in project mm-dev by sbl-sdsc, in the main method of class Driver1.

public static void main(String[] args) throws IOException {
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(Driver1.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    long start = System.nanoTime();
    // download query structure
    // List<String> queryId = Arrays.asList("2O9U");
    List<String> queryId = Arrays.asList("1STP");
    JavaPairRDD<String, StructureDataInterface> query = MmtfReader.downloadReducedMmtfFiles(queryId, sc).flatMapToPair(new StructureToPolymerChains(false, true));
    // Examples similar: 4N6T, 2CH9, 3UL5, 3KVP
    // Examples dissimilar: 5O5I, 1STP
    // List<String> targetId = Arrays.asList("4N6T", "2CH9", "3UL5", "3KVP", "1STP", "5O5I");
    List<String> targetId = Arrays.asList("4OKA");
    JavaPairRDD<String, StructureDataInterface> target = MmtfReader.downloadReducedMmtfFiles(targetId, sc).flatMapToPair(new StructureToPolymerChains(false, true));
    // two standard algorithms
    // String alignmentAlgorithm = CeMain.algorithmName;
    // String alignmentAlgorithm = FatCatRigid.algorithmName;
    String alignmentAlgorithm = "exhaustive";
    // calculate alignments
    Dataset<Row> alignments = StructureAligner.getQueryVsAllAlignments(query, target, alignmentAlgorithm).cache();
    // write all alignments as a single CSV partition to the output directory given in args[0]
    alignments.coalesce(1).write().mode("overwrite").format("csv").save(args[0]);
    // show results
    int count = (int) alignments.count();
    alignments.sort(col("tm").desc()).show(count);
    System.out.println("Pairs: " + count);
    long end = System.nanoTime();
    // guard against division by zero when no alignments were produced
    if (count > 0) {
        System.out.println("Time per alignment: " + TimeUnit.NANOSECONDS.toMillis((end - start) / count) + " msec.");
    }
    System.out.println("Time: " + TimeUnit.NANOSECONDS.toSeconds(end - start) + " sec.");
    sc.close();
}
Also used: StructureToPolymerChains (edu.sdsc.mmtf.spark.mappers.StructureToPolymerChains), JavaSparkContext (org.apache.spark.api.java.JavaSparkContext), StructureDataInterface (org.rcsb.mmtf.api.StructureDataInterface), Row (org.apache.spark.sql.Row), SparkConf (org.apache.spark.SparkConf)
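
The per-alignment timing in main divides the elapsed nanoseconds by the number of alignments, which only works when that number is nonzero. A minimal, Spark-free sketch of the same arithmetic is shown here; the class name TimingSketch and the helper millisPerItem are hypothetical and not part of mm-dev, and the zero count merely stands in for an empty alignment result.

```java
import java.util.OptionalLong;
import java.util.concurrent.TimeUnit;

public class TimingSketch {
    // Hypothetical helper (not from mm-dev): average elapsed time per item
    // in milliseconds, or empty when nothing was processed.
    static OptionalLong millisPerItem(long elapsedNanos, long itemCount) {
        if (itemCount <= 0) {
            return OptionalLong.empty();
        }
        // Integer division in nanoseconds first, then conversion to milliseconds.
        return OptionalLong.of(TimeUnit.NANOSECONDS.toMillis(elapsedNanos / itemCount));
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long count = 0; // assumption: stands in for an alignment count of zero
        long elapsed = System.nanoTime() - start;
        OptionalLong perItem = millisPerItem(elapsed, count);
        System.out.println(perItem.isPresent()
                ? "Time per alignment: " + perItem.getAsLong() + " msec."
                : "No alignments produced.");
    }
}
```

Returning an OptionalLong rather than a sentinel value makes the empty case explicit to the caller; a plain if-check before the division would serve equally well.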

Aggregations

StructureToPolymerChains (edu.sdsc.mmtf.spark.mappers.StructureToPolymerChains): 26
JavaSparkContext (org.apache.spark.api.java.JavaSparkContext): 23
SparkConf (org.apache.spark.SparkConf): 22
StructureDataInterface (org.rcsb.mmtf.api.StructureDataInterface): 22
Row (org.apache.spark.sql.Row): 18
Pisces (edu.sdsc.mmtf.spark.webfilters.Pisces): 15
ProteinSequenceEncoder (edu.sdsc.mmtf.spark.ml.ProteinSequenceEncoder): 10
Path (java.nio.file.Path): 3
Test (org.junit.Test): 3
ContainsLProteinChain (edu.sdsc.mmtf.spark.filters.ContainsLProteinChain): 2
StructureToBioJava (edu.sdsc.mmtf.spark.mappers.StructureToBioJava): 2
PolymerComposition (edu.sdsc.mmtf.spark.filters.PolymerComposition): 1
PdbjMineSearch (edu.sdsc.mmtf.spark.webfilters.PdbjMineSearch): 1
SequenceSimilarity (edu.sdsc.mmtf.spark.webfilters.SequenceSimilarity): 1
JavaDoubleRDD (org.apache.spark.api.java.JavaDoubleRDD): 1
SparkSession (org.apache.spark.sql.SparkSession): 1
StructField (org.apache.spark.sql.types.StructField): 1
StructType (org.apache.spark.sql.types.StructType): 1
Structure (org.biojava.nbio.structure.Structure): 1