Example 1 with TextExtractionStrategy

Use of com.itextpdf.text.pdf.parser.TextExtractionStrategy in the tutorials project by eugenp.

Class PDF2WordExample, method generateDocFromPDF.

private static void generateDocFromPDF(String filename) throws IOException {
    PdfReader reader = new PdfReader(filename);
    // try-with-resources closes the Word document even if extraction fails
    try (XWPFDocument doc = new XWPFDocument()) {
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        // iText page numbers are 1-based
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            TextExtractionStrategy strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
            String text = strategy.getResultantText();
            // One paragraph per PDF page, followed by a page break
            XWPFParagraph p = doc.createParagraph();
            XWPFRun run = p.createRun();
            run.setText(text);
            run.addBreak(BreakType.PAGE);
        }
        // Open the output file only after all pages have been extracted
        try (FileOutputStream out = new FileOutputStream("src/output/pdf.docx")) {
            doc.write(out);
        }
    } finally {
        // Release the PDF reader whether or not the conversion succeeded
        reader.close();
    }
}
Also used: PdfReader(com.itextpdf.text.pdf.PdfReader) PdfReaderContentParser(com.itextpdf.text.pdf.parser.PdfReaderContentParser) SimpleTextExtractionStrategy(com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy) TextExtractionStrategy(com.itextpdf.text.pdf.parser.TextExtractionStrategy) FileOutputStream(java.io.FileOutputStream) IOException(java.io.IOException) BreakType(org.apache.poi.xwpf.usermodel.BreakType) XWPFDocument(org.apache.poi.xwpf.usermodel.XWPFDocument) XWPFParagraph(org.apache.poi.xwpf.usermodel.XWPFParagraph) XWPFRun(org.apache.poi.xwpf.usermodel.XWPFRun)
