Example 6 with PDFParser

Use of org.apache.pdfbox.pdfparser.PDFParser in project tutorials by eugenp.

The class PDF2TextExample, method generateTxtFromPDF.

private static void generateTxtFromPDF(String filename) throws IOException {
    File f = new File(filename);
    String parsedText;
    // Open the PDF for random access; try-with-resources closes it on every path.
    try (RandomAccessFile source = new RandomAccessFile(f, "r")) {
        // Parse the file into a low-level COSDocument (PDFBox 2.x API).
        PDFParser parser = new PDFParser(source);
        parser.parse();
        COSDocument cosDoc = parser.getDocument();
        // Closing the PDDocument also closes the COSDocument it wraps.
        try (PDDocument pdDoc = new PDDocument(cosDoc)) {
            PDFTextStripper pdfStripper = new PDFTextStripper();
            parsedText = pdfStripper.getText(pdDoc);
        }
    }
    // Write the extracted text; the output directory must already exist.
    try (PrintWriter pw = new PrintWriter("src/output/pdf.txt")) {
        pw.print(parsedText);
    }
}
Also used: RandomAccessFile(org.apache.pdfbox.io.RandomAccessFile) PDFParser(org.apache.pdfbox.pdfparser.PDFParser) PDDocument(org.apache.pdfbox.pdmodel.PDDocument) COSDocument(org.apache.pdfbox.cos.COSDocument) File(java.io.File) PDFTextStripper(org.apache.pdfbox.text.PDFTextStripper) PrintWriter(java.io.PrintWriter)
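
The method above writes to the fixed path src/output/pdf.txt, and PrintWriter fails with a FileNotFoundException when the parent directory is missing. One way to make that last step more robust is sketched below using only the JDK. The class name TextFileWriter and the helper writeText are hypothetical and not part of the eugenp project. UTF-8 output is an assumption, whereas the original uses the platform default charset.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of the original example.
final class TextFileWriter {

    // Writes text to the target path as UTF-8 (an assumption; the original
    // relies on the platform default charset), creating any missing parent
    // directories first. An existing file is overwritten.
    static Path writeText(String text, String target) throws IOException {
        Path path = Paths.get(target);
        Path parent = path.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return Files.write(path, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Placeholder string standing in for the text extracted from a PDF.
        Path out = writeText("parsed text goes here", "src/output/pdf.txt");
        System.out.println("Wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

In generateTxtFromPDF, a call such as writeText(parsedText, "src/output/pdf.txt") could stand in for the PrintWriter block, which leaves the PDF parsing steps unchanged.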

Aggregations

PDFParser (org.apache.pdfbox.pdfparser.PDFParser): 6
IOException (java.io.IOException): 3
ScratchFile (org.apache.pdfbox.io.ScratchFile): 3
COSDocument (org.apache.pdfbox.cos.COSDocument): 2
RandomAccessRead (org.apache.pdfbox.io.RandomAccessRead): 2
PDDocument (org.apache.pdfbox.pdmodel.PDDocument): 2
File (java.io.File): 1
PrintWriter (java.io.PrintWriter): 1
RandomAccessBuffer (org.apache.pdfbox.io.RandomAccessBuffer): 1
RandomAccessBufferedFileInputStream (org.apache.pdfbox.io.RandomAccessBufferedFileInputStream): 1
RandomAccessFile (org.apache.pdfbox.io.RandomAccessFile): 1
PDFTextStripper (org.apache.pdfbox.text.PDFTextStripper): 1
PDFTextStripper (org.apache.pdfbox.util.PDFTextStripper): 1