
Example 36 with StructureDataInterface

use of org.rcsb.mmtf.api.StructureDataInterface in project mmtf-spark by sbl-sdsc.

the class RepartitionHadoopSequenceFile method main.

/**
 * Repartitions an MMTF-Hadoop Sequence file.
 *
 * @param args
 *            args[0] path to input Hadoop Sequence file, args[1] path to
 *            output Hadoop Sequence File, args[2] number of partitions
 * @throws IOException
 */
public static void main(String[] args) throws IOException {
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(RepartitionHadoopSequenceFile.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    long start = System.nanoTime();
    if (args.length != 3) {
        System.out.println("Usage: RepartitionHadoopSequenceFile <input-path> <output-path> <number-of-partitions>");
        System.exit(1);
    }
    String inputPath = args[0];
    String outputPath = args[1];
    int numPartitions = Integer.parseInt(args[2]);
    JavaPairRDD<String, StructureDataInterface> pdb = MmtfReader.readSequenceFile(inputPath, sc);
    pdb = pdb.repartition(numPartitions);
    MmtfWriter.writeSequenceFile(outputPath, sc, pdb);
    long end = System.nanoTime();
    System.out.println("Time: " + TimeUnit.NANOSECONDS.toSeconds(end - start) + " sec.");
    sc.close();
}
Also used : JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) StructureDataInterface(org.rcsb.mmtf.api.StructureDataInterface) SparkConf(org.apache.spark.SparkConf)
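
If the goal is only to lower the partition count, Spark's coalesce avoids the full shuffle that repartition performs. The sketch below is a minimal variant of the example above; the class name CoalesceHadoopSequenceFile is hypothetical, while the MmtfReader and MmtfWriter calls are the same ones used throughout these examples.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.rcsb.mmtf.api.StructureDataInterface;
import edu.sdsc.mmtf.spark.io.MmtfReader;
import edu.sdsc.mmtf.spark.io.MmtfWriter;

public class CoalesceHadoopSequenceFile {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("CoalesceHadoopSequenceFile");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaPairRDD<String, StructureDataInterface> pdb = MmtfReader.readSequenceFile(args[0], sc);
        // coalesce merges existing partitions without shuffling the data;
        // it can only decrease the partition count
        pdb = pdb.coalesce(Integer.parseInt(args[2]));
        MmtfWriter.writeSequenceFile(args[1], sc, pdb);
        sc.close();
    }
}

Note that coalesce without a shuffle can only decrease the number of partitions; repartition is still needed to increase it.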

Example 37 with StructureDataInterface

use of org.rcsb.mmtf.api.StructureDataInterface in project mmtf-spark by sbl-sdsc.

the class MmcifToMmtfFull method main.

/**
 * Converts a directory containing .cif files into an MMTF-Hadoop Sequence file.
 * The input directory is traversed recursively to find .cif files.
 *
 * @param args args[0] <input-path-to-cif_files>, args[1] <output-path-to-mmtf-hadoop-file>
 *
 * @throws FileNotFoundException
 */
public static void main(String[] args) throws FileNotFoundException {
    if (args.length != 2) {
        System.out.println("Usage: MmcifToMmtfFull <input-path-to-cif_files> <output-path-to-mmtf-hadoop-file>");
        System.exit(1);
    }
    // path to input directory
    String cifPath = args[0];
    // path to output directory
    String mmtfPath = args[1];
    // instantiate Spark
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("MmcifToMmtfFull");
    JavaSparkContext sc = new JavaSparkContext(conf);
    // read cif files recursively starting from the specified top level directory
    JavaPairRDD<String, StructureDataInterface> structures = MmtfImporter.importMmcifFiles(cifPath, sc);
    // save as an MMTF-Hadoop Sequence File
    MmtfWriter.writeSequenceFile(mmtfPath, sc, structures);
    System.out.println(structures.count() + " structures written to: " + mmtfPath);
    // close Spark
    sc.close();
}
Also used : JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) StructureDataInterface(org.rcsb.mmtf.api.StructureDataInterface) SparkConf(org.apache.spark.SparkConf)
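
A quick way to validate such a conversion is to read the freshly written file back with MmtfReader and spot-check the contents. A minimal sketch, assuming the same readSequenceFile signature used in the other examples; the class name VerifyMmtfHadoopFile is hypothetical.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.rcsb.mmtf.api.StructureDataInterface;
import edu.sdsc.mmtf.spark.io.MmtfReader;

public class VerifyMmtfHadoopFile {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("VerifyMmtfHadoopFile");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // read the freshly written MMTF-Hadoop Sequence file back in
        JavaPairRDD<String, StructureDataInterface> structures = MmtfReader.readSequenceFile(args[0], sc);
        // print a few structure IDs as a spot check
        structures.keys().take(5).forEach(System.out::println);
        System.out.println("# structures read back: " + structures.count());
        sc.close();
    }
}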

Example 38 with StructureDataInterface

use of org.rcsb.mmtf.api.StructureDataInterface in project mmtf-spark by sbl-sdsc.

the class MmtfBenchmark method main.

public static void main(String[] args) throws FileNotFoundException {
    long start = System.nanoTime();
    if (args.length != 1) {
        System.out.println("Usage: MmtfBenchmark <mmtf-hadoop-sequence-file>");
        System.exit(1);
    }
    // instantiate Spark. Each Spark application needs these two lines of code.
    SparkConf conf = new SparkConf().setAppName(MmtfBenchmark.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    // read all PDB entries from a local Hadoop sequence file
    String path = args[0];
    JavaPairRDD<String, StructureDataInterface> pdb = MmtfReader.readSequenceFile(path, sc);
    System.out.println("# structures: " + pdb.count());
    // close Spark
    sc.close();
    long end = System.nanoTime();
    System.out.println((end - start) / 1E9 + " sec.");
}
Also used : JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) StructureDataInterface(org.rcsb.mmtf.api.StructureDataInterface) SparkConf(org.apache.spark.SparkConf)
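
Because the single count() above measures read, decode, and counting in one pass, caching the RDD lets a benchmark separate the one-time read/decode cost from subsequent queries. A hedged sketch along the same lines; MmtfCachedBenchmark is a hypothetical class name.

import java.util.concurrent.TimeUnit;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.rcsb.mmtf.api.StructureDataInterface;
import edu.sdsc.mmtf.spark.io.MmtfReader;

public class MmtfCachedBenchmark {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("MmtfCachedBenchmark");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaPairRDD<String, StructureDataInterface> pdb = MmtfReader.readSequenceFile(args[0], sc).cache();
        long start = System.nanoTime();
        // the first action pays the full read + decode cost
        System.out.println("# structures (cold): " + pdb.count());
        long mid = System.nanoTime();
        // the second action runs against the cached partitions
        System.out.println("# structures (cached): " + pdb.count());
        long end = System.nanoTime();
        System.out.println("cold: " + TimeUnit.NANOSECONDS.toSeconds(mid - start) + " sec, cached: "
                + TimeUnit.NANOSECONDS.toSeconds(end - mid) + " sec.");
        sc.close();
    }
}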

Example 39 with StructureDataInterface

use of org.rcsb.mmtf.api.StructureDataInterface in project mmtf-spark by sbl-sdsc.

the class PdbToMmtfFull method main.

/**
 * Converts a directory containing PDB files into an MMTF-Hadoop Sequence file.
 * The input directory is traversed recursively to find PDB files.
 *
 * @param args args[0] <input-path-to-pdb_files>, args[1] <output-path-to-mmtf-hadoop-file>
 *
 * @throws FileNotFoundException
 */
public static void main(String[] args) throws FileNotFoundException {
    if (args.length != 2) {
        System.out.println("Usage: PdbToMmtfFull <input-path-to-pdb_files> <output-path-to-mmtf-hadoop-file>");
        System.exit(1);
    }
    // path to input directory
    String pdbPath = args[0];
    // path to output directory
    String mmtfPath = args[1];
    // instantiate Spark
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("PdbToMmtfFull");
    JavaSparkContext sc = new JavaSparkContext(conf);
    // read PDB files recursively starting from the specified top level directory
    JavaPairRDD<String, StructureDataInterface> structures = MmtfImporter.importPdbFiles(pdbPath, sc);
    // save as an MMTF-Hadoop Sequence File
    MmtfWriter.writeSequenceFile(mmtfPath, sc, structures);
    System.out.println(structures.count() + " structures written to: " + mmtfPath);
    // close Spark
    sc.close();
}
Also used : JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) StructureDataInterface(org.rcsb.mmtf.api.StructureDataInterface) SparkConf(org.apache.spark.SparkConf)
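
A common next step in mmtf-spark pipelines is to split each imported entry into its polymer chains with StructureToPolymerChains (listed under Aggregations below) before writing. A sketch under that assumption; the class name PdbToMmtfChains is hypothetical.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.rcsb.mmtf.api.StructureDataInterface;
import edu.sdsc.mmtf.spark.io.MmtfImporter;
import edu.sdsc.mmtf.spark.io.MmtfWriter;
import edu.sdsc.mmtf.spark.mappers.StructureToPolymerChains;

public class PdbToMmtfChains {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("PdbToMmtfChains");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // import PDB files, then flatten each entry into its polymer chains
        JavaPairRDD<String, StructureDataInterface> chains = MmtfImporter
                .importPdbFiles(args[0], sc)
                .flatMapToPair(new StructureToPolymerChains());
        MmtfWriter.writeSequenceFile(args[1], sc, chains);
        System.out.println(chains.count() + " chains written to: " + args[1]);
        sc.close();
    }
}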

Example 40 with StructureDataInterface

use of org.rcsb.mmtf.api.StructureDataInterface in project mmtf-spark by sbl-sdsc.

the class ReadLocalMmtfHadoopFile method main.

public static void main(String[] args) {
    if (args.length != 1) {
        System.err.println("Usage: " + ReadLocalMmtfHadoopFile.class.getSimpleName() + " <inputFilePath>");
        System.exit(1);
    }
    // instantiate Spark. Each Spark application needs these two lines of code.
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName(ReadLocalMmtfHadoopFile.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    // read a local MMTF file
    JavaPairRDD<String, StructureDataInterface> pdb = MmtfReader.readSequenceFile(args[0], sc);
    System.out.println("# structures: " + pdb.count());
    // print structural details
    pdb = pdb.sample(false, 0.01);
    pdb.foreach(t -> TraverseStructureHierarchy.printStructureData(t._2));
    // close Spark
    sc.close();
}
Also used : JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) StructureDataInterface(org.rcsb.mmtf.api.StructureDataInterface) SparkConf(org.apache.spark.SparkConf)
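
Instead of a random 1% sample, a non-redundant subset can be selected with the Pisces web filter, which also appears under Aggregations. A minimal sketch, assuming a Pisces(sequenceIdentity, resolution) constructor that throws IOException while fetching the cull list; ReadRepresentativeSubset is a hypothetical class name.

import java.io.IOException;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.rcsb.mmtf.api.StructureDataInterface;
import edu.sdsc.mmtf.spark.io.MmtfReader;
import edu.sdsc.mmtf.spark.webfilters.Pisces;

public class ReadRepresentativeSubset {
    public static void main(String[] args) throws IOException {
        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("ReadRepresentativeSubset");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaPairRDD<String, StructureDataInterface> pdb = MmtfReader.readSequenceFile(args[0], sc)
                // keep a non-redundant subset: <= 40% sequence identity, <= 2.5 A resolution
                // (assumed Pisces(sequenceIdentity, resolution) constructor)
                .filter(new Pisces(40, 2.5));
        System.out.println("# representative structures: " + pdb.count());
        sc.close();
    }
}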

Aggregations

StructureDataInterface (org.rcsb.mmtf.api.StructureDataInterface) 102
JavaSparkContext (org.apache.spark.api.java.JavaSparkContext) 60
SparkConf (org.apache.spark.SparkConf) 58
Row (org.apache.spark.sql.Row) 27
StructureToPolymerChains (edu.sdsc.mmtf.spark.mappers.StructureToPolymerChains) 22
Test (org.junit.Test) 20
Pisces (edu.sdsc.mmtf.spark.webfilters.Pisces) 19
ArrayList (java.util.ArrayList) 12
ProteinSequenceEncoder (edu.sdsc.mmtf.spark.ml.ProteinSequenceEncoder) 10
ColumnarStructure (edu.sdsc.mmtf.spark.utils.ColumnarStructure) 10
Tuple2 (scala.Tuple2) 9
Path (java.nio.file.Path) 7
HashSet (java.util.HashSet) 7
AdapterToStructureData (org.rcsb.mmtf.encoder.AdapterToStructureData) 7
JavaPairRDD (org.apache.spark.api.java.JavaPairRDD) 6
ContainsLProteinChain (edu.sdsc.mmtf.spark.filters.ContainsLProteinChain) 5
List (java.util.List) 5
Resolution (edu.sdsc.mmtf.spark.filters.Resolution) 4
MmtfReader (edu.sdsc.mmtf.spark.io.MmtfReader) 4
File (java.io.File) 4