
Example 1 with PDFParser

Use of org.apache.tika.parser.pdf.PDFParser in project tika by apache.

From class DisplayMetInstance, method getMet.

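public static Metadata getMet(URL url) throws IOException, SAXException, TikaException {
    Metadata met = new Metadata();
    PDFParser parser = new PDFParser();
    // try-with-resources closes the URL stream once parsing is done
    try (InputStream in = url.openStream()) {
        // -1 disables BodyContentHandler's default 100,000-character write limit;
        // the extracted text is discarded, only the metadata is returned
        parser.parse(in, new BodyContentHandler(-1), met, new ParseContext());
    }
    return met;
}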
Also used: BodyContentHandler (org.apache.tika.sax.BodyContentHandler), PDFParser (org.apache.tika.parser.pdf.PDFParser), Metadata (org.apache.tika.metadata.Metadata), ParseContext (org.apache.tika.parser.ParseContext)
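To see what PDFParser actually puts into the Metadata object, the getMet pattern can be wrapped in a small command-line program. This is a minimal sketch, assuming tika-core and Tika's PDF parser module are on the classpath; the class name PdfMetadataDump and its helper describe are hypothetical, not part of Tika. It reads a local file instead of a URL and uses Metadata.names() and Metadata.getValues() to list every key the parser populated.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper class, not part of Tika
public class PdfMetadataDump {

    // Render every populated key as "name = value", sorted by name;
    // multi-valued keys are joined with ", "
    public static List<String> describe(Metadata met) {
        String[] names = met.names();
        Arrays.sort(names);
        List<String> lines = new ArrayList<>();
        for (String name : names) {
            lines.add(name + " = " + String.join(", ", met.getValues(name)));
        }
        return lines;
    }

    // Usage: java PdfMetadataDump path/to/file.pdf
    public static void main(String[] args) throws Exception {
        Metadata met = new Metadata();
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // -1 lifts the write limit; the body text itself is ignored here
            new PDFParser().parse(in, new BodyContentHandler(-1), met, new ParseContext());
        }
        describe(met).forEach(System.out::println);
    }
}
```

Which keys appear (for example dc:title or xmpTPg:NPages) depends on the PDF and on the Tika version, so treat specific key names as illustrative.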

Example 2 with PDFParser

Use of org.apache.tika.parser.pdf.PDFParser in project tika by apache.

From class JournalParser, method parse.

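public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    // Closing TemporaryResources deletes the temporary file spooled below
    try (TemporaryResources tmp = new TemporaryResources()) {
        TikaInputStream tis = TikaInputStream.get(stream, tmp);
        // GROBID works on a file path, so spool the input to a local file
        File tmpFile = tis.getFile();
        GrobidRESTParser grobidParser = new GrobidRESTParser();
        grobidParser.parse(tmpFile.getAbsolutePath(), handler, metadata, context);
        // Run the regular PDF parser over the same file for text and metadata
        PDFParser parser = new PDFParser();
        try (InputStream pdf = new FileInputStream(tmpFile)) {
            parser.parse(pdf, handler, metadata, context);
        }
    }
}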
Also used: PDFParser (org.apache.tika.parser.pdf.PDFParser), TemporaryResources (org.apache.tika.io.TemporaryResources), TikaInputStream (org.apache.tika.io.TikaInputStream), File (java.io.File), FileInputStream (java.io.FileInputStream)
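JournalParser relies on TikaInputStream.getFile() spooling an arbitrary input stream to a temporary file that both GROBID and PDFParser can read. The sketch below isolates that lifecycle, assuming the tika-core API used in the example above (TikaInputStream.get(InputStream, TemporaryResources) and getFile()); the class TempFileSpool and its method are hypothetical. It spools some bytes, checks the file while the TemporaryResources is still open, then checks again after it has been closed.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;

import org.apache.tika.io.TemporaryResources;
import org.apache.tika.io.TikaInputStream;

// Hypothetical demo class, not part of Tika
public class TempFileSpool {

    // Returns {file present with the right length while open, file present after close}
    public static boolean[] spoolAndCleanUp(byte[] data) throws IOException {
        File spooled;
        boolean presentWhileOpen;
        try (TemporaryResources tmp = new TemporaryResources()) {
            TikaInputStream tis = TikaInputStream.get(new ByteArrayInputStream(data), tmp);
            // getFile() copies the in-memory stream into a temp file owned by tmp
            spooled = tis.getFile();
            presentWhileOpen = spooled.exists() && spooled.length() == data.length;
        }
        // tmp.close() has run here, which should have deleted the temp file
        return new boolean[] { presentWhileOpen, spooled.exists() };
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = spoolAndCleanUp(new byte[] { 1, 2, 3 });
        System.out.println("present while open: " + result[0]);
        System.out.println("present after close: " + result[1]);
    }
}
```

If the file is still present after close, the TemporaryResources was never closed; that is the situation in which repeated parse calls would accumulate temporary files.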

Aggregations

PDFParser (org.apache.tika.parser.pdf.PDFParser): 2
File (java.io.File): 1
FileInputStream (java.io.FileInputStream): 1
TemporaryResources (org.apache.tika.io.TemporaryResources): 1
TikaInputStream (org.apache.tika.io.TikaInputStream): 1
Metadata (org.apache.tika.metadata.Metadata): 1
ParseContext (org.apache.tika.parser.ParseContext): 1
BodyContentHandler (org.apache.tika.sax.BodyContentHandler): 1