
Example 41 with StructureDataInterface

Use of org.rcsb.mmtf.api.StructureDataInterface in project mmtf-spark by sbl-sdsc.

Class FullToReducedSequenceFile, method main.

/**
 * Converts a full MMTF Hadoop Sequence File to a reduced representation.
 * @param args args[0] input directory (full),
 * args[1] output directory (reduced)
 * @throws FileNotFoundException
 */
public static void main(String[] args) throws FileNotFoundException {
    if (args.length != 2) {
        System.out.println("Usage: FullToReducedSequenceFile <path_to_full> <path_to_reduced>");
        System.exit(-1);
    }
    String fullPath = args[0];
    long start = System.nanoTime();
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(FullToReducedSequenceFile.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    // read the full PDB in MMTF format and convert each structure to the reduced representation
    JavaPairRDD<String, StructureDataInterface> pdb = MmtfReader.readSequenceFile(fullPath, sc)
            .mapValues(s -> ReducedEncoder.getReduced(s));
    String reducedPath = args[1];
    MmtfWriter.writeSequenceFile(reducedPath, sc, pdb);
    System.out.println("# structures converted: " + pdb.count());
    long end = System.nanoTime();
    System.out.println("Time:     " + (end - start) / 1E9 + " sec.");
    sc.close();
}
Also used: JavaSparkContext (org.apache.spark.api.java.JavaSparkContext), StructureDataInterface (org.rcsb.mmtf.api.StructureDataInterface), SparkConf (org.apache.spark.SparkConf)
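The reduction step can be pictured as a per-atom filter. The toy sketch below is not ReducedEncoder; it assumes, for illustration only, a reduced form that keeps the C-alpha (CA) atom of each amino-acid residue, the P atom of each nucleotide, and every non-polymer atom except water. The Atom class and its kind field are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ReducedSketch {
    // Hypothetical minimal atom record; not an MMTF type.
    static final class Atom {
        final String name;  // atom name, e.g. "CA"
        final String group; // residue or ligand name, e.g. "ALA", "HOH"
        final String kind;  // assumed classification: "protein", "nucleic" or "other"

        Atom(String name, String group, String kind) {
            this.name = name;
            this.group = group;
            this.kind = kind;
        }
    }

    // Assumed reduction rule (illustrative only): CA for proteins,
    // P for nucleic acids, all other atoms except water.
    static boolean keep(Atom a) {
        switch (a.kind) {
            case "protein":
                return a.name.equals("CA");
            case "nucleic":
                return a.name.equals("P");
            default:
                return !a.group.equals("HOH");
        }
    }

    // Returns the atoms that survive the reduction, in their original order.
    static List<Atom> reduce(List<Atom> atoms) {
        List<Atom> out = new ArrayList<>();
        for (Atom a : atoms) {
            if (keep(a)) {
                out.add(a);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Atom> atoms = List.of(
                new Atom("N", "ALA", "protein"),
                new Atom("CA", "ALA", "protein"),
                new Atom("P", "DG", "nucleic"),
                new Atom("C1'", "DG", "nucleic"),
                new Atom("C2", "BTN", "other"),
                new Atom("O", "HOH", "other"));
        for (Atom a : reduce(atoms)) {
            System.out.println(a.group + " " + a.name);
        }
    }
}
```

Running main prints ALA CA, DG P and BTN C2, one per line; the water oxygen and the non-CA/non-P polymer atoms are dropped.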

Example 42 with StructureDataInterface

Use of org.rcsb.mmtf.api.StructureDataInterface in project mmtf-spark by sbl-sdsc.

Class AuthorSearchDemo, method main.

public static void main(String[] args) throws IOException {
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(AuthorSearchDemo.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    // query to find PDB structures for Doudna, J.A. as a deposition (audit) author
    // or as an author in the primary PDB citation
    String sqlQuery = "SELECT pdbid FROM audit_author "
            + "WHERE name LIKE 'Doudna%J.A.%' "
            + "UNION "
            + "SELECT pdbid FROM citation_author "
            + "WHERE citation_id = 'primary' AND name LIKE 'Doudna%J.A.%'";
    // read PDB and filter by author
    JavaPairRDD<String, StructureDataInterface> pdb = MmtfReader.readReducedSequenceFile(sc).filter(new PdbjMineSearch(sqlQuery));
    System.out.println("Number of entries matching query: " + pdb.count());
    sc.close();
}
Also used: PdbjMineSearch (edu.sdsc.mmtf.spark.webfilters.PdbjMineSearch), JavaSparkContext (org.apache.spark.api.java.JavaSparkContext), StructureDataInterface (org.rcsb.mmtf.api.StructureDataInterface), SparkConf (org.apache.spark.SparkConf)
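The query combines two result sets with UNION, which also removes duplicate IDs, and matches names with LIKE, where % stands for any run of characters. As a self-contained illustration of those two semantics only (not of how PdbjMineSearch or the PDBj Mine database work internally), the sketch below translates a LIKE pattern into a regular expression and unions the matches from two small in-memory tables. The row layout and the IDs ID01 to ID04 are made-up placeholders.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class LikeUnionSketch {
    // Translates a SQL LIKE pattern into a regex: '%' matches any run of
    // characters, '_' matches exactly one, everything else is literal.
    // (Escape characters are ignored in this sketch.)
    static Pattern likeToRegex(String like) {
        StringBuilder sb = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                sb.append(".*");
            } else if (c == '_') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // auditAuthor rows:    {pdbid, name}
    // citationAuthor rows: {pdbid, citation_id, name}
    static Set<String> search(String[][] auditAuthor, String[][] citationAuthor, String like) {
        Pattern p = likeToRegex(like);
        Set<String> ids = new TreeSet<>(); // a set mirrors UNION's duplicate removal
        for (String[] r : auditAuthor) {
            if (p.matcher(r[1]).matches()) {
                ids.add(r[0]);
            }
        }
        for (String[] r : citationAuthor) {
            if (r[1].equals("primary") && p.matcher(r[2]).matches()) {
                ids.add(r[0]);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String[][] audit = {
                {"ID01", "Doudna, J.A."},
                {"ID02", "Smith, J."}
        };
        String[][] citation = {
                {"ID01", "primary", "Doudna, J.A."},
                {"ID03", "primary", "Doudna, J.A."},
                {"ID04", "1", "Doudna, J.A."}
        };
        System.out.println(search(audit, citation, "Doudna%J.A.%"));
    }
}
```

With these placeholder rows it prints [ID01, ID03]: ID02 fails the name pattern, ID04 appears only in a non-primary citation, and ID01, which matches in both tables, is reported once.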

Example 43 with StructureDataInterface

Use of org.rcsb.mmtf.api.StructureDataInterface in project mmtf-spark by sbl-sdsc.

Class CreateRepresentativeSet, method main.

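/**
 * Creates a representative set of protein chains (at most 40% pairwise sequence
 * identity, 2.5 Å resolution cutoff) and saves it as an MMTF Hadoop Sequence File.
 * @param args not used
 * @throws IOException
 */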
public static void main(String[] args) throws IOException {
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(CreateRepresentativeSet.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    // filter by representative protein chains at 40% sequence identity
    // and 2.5 Å resolution using the Pisces filter. Any pair of protein
    // chains in the representative set will have <= 40% sequence identity.
    int sequenceIdentity = 40;
    double resolution = 2.5;
    // read PDB, split entries into polymer chains, and filter by Pisces filter
    JavaPairRDD<String, StructureDataInterface> pdb = MmtfReader.readReducedSequenceFile(sc)
            .flatMapToPair(new StructureToPolymerChains())
            .filter(new Pisces(sequenceIdentity, resolution));
    System.out.println("# representative chains: " + pdb.count());
    // coalesce partitions to avoid saving many small files
    pdb = pdb.coalesce(12);
    // save representative set
    String path = MmtfReader.getMmtfReducedPath();
    MmtfWriter.writeSequenceFile(path + "_representatives_i40_r2.5", sc, pdb);
    sc.close();
}
Also used: Pisces (edu.sdsc.mmtf.spark.webfilters.Pisces), StructureToPolymerChains (edu.sdsc.mmtf.spark.mappers.StructureToPolymerChains), JavaSparkContext (org.apache.spark.api.java.JavaSparkContext), StructureDataInterface (org.rcsb.mmtf.api.StructureDataInterface), SparkConf (org.apache.spark.SparkConf)
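The guarantee in the comment (no two kept chains above 40% identity) is the kind produced by greedy culling: drop chains that fail the resolution cutoff, then accept chains best-resolution-first only if they stay at or below the identity threshold against every chain already accepted. The sketch below illustrates that idea with a hypothetical precomputed identity table and made-up chain IDs. It is not the Pisces class, which, judging by its webfilters package, presumably relies on precomputed culled lists from a remote service rather than computing identities itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class CullingSketch {
    // Hypothetical chain record: an identifier plus its resolution in Å (lower is better).
    static final class Chain {
        final String id;
        final double resolution;

        Chain(String id, double resolution) {
            this.id = id;
            this.resolution = resolution;
        }
    }

    // Looks up an assumed precomputed percent identity for a chain pair,
    // stored under "a|b" in either order; missing pairs count as 0%.
    static double identity(Map<String, Double> table, String a, String b) {
        Double v = table.get(a + "|" + b);
        if (v == null) {
            v = table.get(b + "|" + a);
        }
        return v == null ? 0.0 : v;
    }

    // Greedy culling: best resolution first; keep a chain only if its identity
    // to every chain already kept is <= maxIdentity.
    static List<String> cull(List<Chain> chains, Map<String, Double> table,
                             double maxIdentity, double maxResolution) {
        List<Chain> sorted = new ArrayList<>(chains);
        sorted.sort(Comparator.comparingDouble(c -> c.resolution));
        List<String> kept = new ArrayList<>();
        for (Chain c : sorted) {
            if (c.resolution > maxResolution) {
                continue; // fails the resolution cutoff
            }
            boolean redundant = false;
            for (String k : kept) {
                if (identity(table, c.id, k) > maxIdentity) {
                    redundant = true;
                    break;
                }
            }
            if (!redundant) {
                kept.add(c.id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Chain> chains = List.of(
                new Chain("X.A", 1.8), new Chain("Y.A", 2.0),
                new Chain("Z.A", 2.2), new Chain("W.A", 3.1));
        Map<String, Double> table = Map.of("X.A|Y.A", 95.0, "X.A|Z.A", 30.0);
        System.out.println(cull(chains, table, 40, 2.5));
    }
}
```

In main, Y.A is dropped as more than 40% identical to the better-resolved X.A, W.A fails the 2.5 Å cutoff, and the sketch prints [X.A, Z.A]. Processing in resolution order means that, within each redundant group, the best-resolved chain is the one retained.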

Example 44 with StructureDataInterface

Use of org.rcsb.mmtf.api.StructureDataInterface in project mmtf-spark by sbl-sdsc.

Class ColumnarStructureTest, method testGetAtomNames.

@Test
public void testGetAtomNames() {
    StructureDataInterface s = pdb.values().first();
    ColumnarStructure cs = new ColumnarStructure(s, true);
    assertEquals("CG2", cs.getAtomNames()[900]);
}
Also used: ColumnarStructure (edu.sdsc.mmtf.spark.utils.ColumnarStructure), StructureDataInterface (org.rcsb.mmtf.api.StructureDataInterface), Test (org.junit.Test)
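ColumnarStructure exposes per-atom data as flat arrays, which is why the test can index atom 900 directly. As a rough sketch of how such a column could be built (an assumption about the idea, not the class's actual code): MMTF-style data stores atom names once per group type and lists each group by its type index, so expanding every group's names in order yields one entry per atom. The method name atomNames and both input arrays are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class AtomNameColumnSketch {
    // groupTypeAtomNames[t]: atom names of group type t, stored once per type.
    // groupTypes[g]:         type index of the g-th group, in storage order.
    // Returns one atom name per atom, in storage order (a columnar array).
    static String[] atomNames(String[][] groupTypeAtomNames, int[] groupTypes) {
        List<String> names = new ArrayList<>();
        for (int t : groupTypes) {
            for (String name : groupTypeAtomNames[t]) {
                names.add(name);
            }
        }
        return names.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[][] types = {
                {"N", "CA", "C", "O"},      // type 0: GLY heavy atoms
                {"N", "CA", "C", "O", "CB"} // type 1: ALA heavy atoms
        };
        int[] groups = {0, 1, 0}; // GLY, ALA, GLY
        String[] names = atomNames(types, groups);
        System.out.println(names.length + " " + names[8]);
    }
}
```

Here main prints 13 CB: the three groups contribute 4 + 5 + 4 atoms, and index 8 is the CB of the alanine.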

Example 45 with StructureDataInterface

Use of org.rcsb.mmtf.api.StructureDataInterface in project mmtf-spark by sbl-sdsc.

Class ColumnarStructureTest, method testIsPolymer.

@Test
public void testIsPolymer() {
    StructureDataInterface s = pdb.values().first();
    ColumnarStructure cs = new ColumnarStructure(s, true);
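    // atom 100 belongs to polymer chain A
    assertEquals(true, cs.isPolymer()[100]);
    // atom 901 belongs to the BTN (biotin) ligand
    assertEquals(false, cs.isPolymer()[901]);
    // atom 917 belongs to a water molecule (HOH)
    assertEquals(false, cs.isPolymer()[917]);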
}
Also used: ColumnarStructure (edu.sdsc.mmtf.spark.utils.ColumnarStructure), StructureDataInterface (org.rcsb.mmtf.api.StructureDataInterface), Test (org.junit.Test)
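The isPolymer() assertions check another per-atom column, this time a boolean one. A hedged sketch of how it could be derived, assuming each chain is tagged as polymer or not (in MMTF, the entity type carries this information) and the number of atoms per chain is known. The method name and the atom counts in main are made up, chosen only so the indices line up with the ones in the test above.

```java
import java.util.Arrays;

public class PolymerColumnSketch {
    // chainIsPolymer[c]: whether chain c belongs to a polymer entity (assumed input)
    // atomsPerChain[c]:  number of atoms in chain c, in storage order (assumed input)
    // Returns one flag per atom by copying each chain's flag over its atom range.
    static boolean[] isPolymer(boolean[] chainIsPolymer, int[] atomsPerChain) {
        int total = Arrays.stream(atomsPerChain).sum();
        boolean[] flags = new boolean[total];
        int offset = 0;
        for (int c = 0; c < atomsPerChain.length; c++) {
            Arrays.fill(flags, offset, offset + atomsPerChain[c], chainIsPolymer[c]);
            offset += atomsPerChain[c];
        }
        return flags;
    }

    public static void main(String[] args) {
        // hypothetical layout: a polymer chain, a 16-atom ligand, then waters
        boolean[] polymer = {true, false, false};
        int[] atoms = {901, 16, 50};
        boolean[] flags = isPolymer(polymer, atoms);
        System.out.println(flags[100] + " " + flags[901] + " " + flags[917]);
    }
}
```

With this made-up layout, main prints true false false, matching the pattern of the three assertions.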

Aggregations

StructureDataInterface (org.rcsb.mmtf.api.StructureDataInterface): 102
JavaSparkContext (org.apache.spark.api.java.JavaSparkContext): 60
SparkConf (org.apache.spark.SparkConf): 58
Row (org.apache.spark.sql.Row): 27
StructureToPolymerChains (edu.sdsc.mmtf.spark.mappers.StructureToPolymerChains): 22
Test (org.junit.Test): 20
Pisces (edu.sdsc.mmtf.spark.webfilters.Pisces): 19
ArrayList (java.util.ArrayList): 12
ProteinSequenceEncoder (edu.sdsc.mmtf.spark.ml.ProteinSequenceEncoder): 10
ColumnarStructure (edu.sdsc.mmtf.spark.utils.ColumnarStructure): 10
Tuple2 (scala.Tuple2): 9
Path (java.nio.file.Path): 7
HashSet (java.util.HashSet): 7
AdapterToStructureData (org.rcsb.mmtf.encoder.AdapterToStructureData): 7
JavaPairRDD (org.apache.spark.api.java.JavaPairRDD): 6
ContainsLProteinChain (edu.sdsc.mmtf.spark.filters.ContainsLProteinChain): 5
List (java.util.List): 5
Resolution (edu.sdsc.mmtf.spark.filters.Resolution): 4
MmtfReader (edu.sdsc.mmtf.spark.io.MmtfReader): 4
File (java.io.File): 4