Example 1 with PDDocument

Use of org.apache.pdfbox.pdmodel.PDDocument in project paper2ebook by ogrisel.

Class Transformer, method main.

public static void main(String[] args) throws IOException, COSVisitorException {
    // Expect an input path and, optionally, an output path.
    if (args.length < 1 || args.length > 2) {
        System.err.println("Usage: java -jar paper2ebook-*.jar input.pdf [output.pdf]");
        return;
    }
    String original_pdf = args[0];
    Transformer transformer = new Transformer(PDDocument.load(original_pdf));
    PDDocument output = transformer.extract();
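    if (args.length == 1) {
        // No output path given: drop the last four characters of the input
        // path (assumed to be ".pdf") and append "_ebook.pdf".
        String orig_no_pdf = original_pdf.substring(0, original_pdf.length() - 4);
        output.save(orig_no_pdf + "_ebook.pdf");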
    } else {
        output.save(args[1]);
    }
}
Also used: PDDocument (org.apache.pdfbox.pdmodel.PDDocument)
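
Example 1 builds the default output name by dropping the last four characters of the input path, which presumes a ".pdf" suffix. A minimal sketch of a more defensive variant, in plain Java with no PDFBox dependency, might look like the following. The class OutputNames and the method ebookName are hypothetical names for illustration and are not part of paper2ebook.

```java
final class OutputNames {
    private OutputNames() {
    }

    /**
     * Returns a default e-book file name for the given input path.
     * Hypothetical helper: strips a trailing ".pdf" (any letter case) only
     * when it is actually present, then appends "_ebook.pdf".
     */
    static String ebookName(String inputPath) {
        String suffix = ".pdf";
        int start = inputPath.length() - suffix.length();
        // regionMatches returns false for a negative offset, so inputs
        // shorter than the suffix are handled without an extra check.
        boolean hasPdfSuffix = inputPath.regionMatches(true, start, suffix, 0, suffix.length());
        String base = hasPdfSuffix ? inputPath.substring(0, start) : inputPath;
        return base + "_ebook.pdf";
    }
}
```

With this helper, ebookName("paper.pdf") yields "paper_ebook.pdf", while an input without the suffix, such as "notes", yields "notes_ebook.pdf" instead of a truncated name.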

Example 2 with PDDocument

Use of org.apache.pdfbox.pdmodel.PDDocument in project mustangproject by ZUGFeRD.

Class ZUGFeRDExporterFromA1Factory, method loadFromPDFA1.

/**
 * Makes a PDF/A-3a-compliant document from a PDF/A-1-compliant document.
 * This works on the metadata level only; it will not, for example, convert
 * graphics to JPEG 2000.
 *
 * @param pdfBinary binary content of a PDF/A-1-compliant document
 */
public ZUGFeRDExporter loadFromPDFA1(byte[] pdfBinary) throws IOException, TransformerException {
    ensurePDFIsValidA1(new ByteArrayDataSource(new ByteArrayInputStream(pdfBinary)));
    PDDocument doc = PDDocument.load(pdfBinary);
    makePDFA3compliant(doc);
    return new ZUGFeRDExporter(doc);
}
Also used: ByteArrayInputStream (java.io.ByteArrayInputStream), PDDocument (org.apache.pdfbox.pdmodel.PDDocument), ByteArrayDataSource (org.apache.pdfbox.preflight.utils.ByteArrayDataSource)

Example 3 with PDDocument

Use of org.apache.pdfbox.pdmodel.PDDocument in project camel by apache.

Class PdfProducer, method doExtractText.

private String doExtractText(Exchange exchange) throws IOException, CryptographyException, InvalidPasswordException, BadSecurityHandlerException {
    LOG.debug("Got {} operation, going to extract text from provided pdf.", pdfConfiguration.getOperation());
    PDDocument document = exchange.getIn().getBody(PDDocument.class);
    if (document.isEncrypted()) {
        DecryptionMaterial decryptionMaterial = exchange.getIn().getHeader(DECRYPTION_MATERIAL_HEADER_NAME, DecryptionMaterial.class);
        if (decryptionMaterial == null) {
            throw new IllegalArgumentException(String.format("%s header is expected for %s operation " + "on encrypted document", DECRYPTION_MATERIAL_HEADER_NAME, pdfConfiguration.getOperation()));
        }
        document.openProtection(decryptionMaterial);
    }
    PDFTextStripper pdfTextStripper = new PDFTextStripper();
    return pdfTextStripper.getText(document);
}
Also used: DecryptionMaterial (org.apache.pdfbox.pdmodel.encryption.DecryptionMaterial), PDDocument (org.apache.pdfbox.pdmodel.PDDocument), PDFTextStripper (org.apache.pdfbox.util.PDFTextStripper)

Example 4 with PDDocument

Use of org.apache.pdfbox.pdmodel.PDDocument in project camel by apache.

Class PdfAppendTest, method testAppend.

@Test
public void testAppend() throws Exception {
    final String originalText = "Test";
    final String textToAppend = "Append";
    PDDocument document = new PDDocument();
    PDPage page = new PDPage(PDPage.PAGE_SIZE_A4);
    document.addPage(page);
    PDPageContentStream contentStream = new PDPageContentStream(document, page);
    contentStream.setFont(PDType1Font.HELVETICA, 12);
    contentStream.beginText();
    contentStream.moveTextPositionByAmount(20, 400);
    contentStream.drawString(originalText);
    contentStream.endText();
    contentStream.close();
    template.sendBodyAndHeader("direct:start", textToAppend, PdfHeaderConstants.PDF_DOCUMENT_HEADER_NAME, document);
    resultEndpoint.setExpectedMessageCount(1);
    resultEndpoint.expectedMessagesMatches(new Predicate() {

        @Override
        public boolean matches(Exchange exchange) {
            Object body = exchange.getIn().getBody();
            assertThat(body, instanceOf(ByteArrayOutputStream.class));
            try {
                PDDocument doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) body).toByteArray()));
                PDFTextStripper pdfTextStripper = new PDFTextStripper();
                String text = pdfTextStripper.getText(doc);
                assertEquals(2, doc.getNumberOfPages());
                assertThat(text, containsString(originalText));
                assertThat(text, containsString(textToAppend));
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            return true;
        }
    });
    resultEndpoint.assertIsSatisfied();
}
Also used: Exchange (org.apache.camel.Exchange), PDPage (org.apache.pdfbox.pdmodel.PDPage), ByteArrayInputStream (java.io.ByteArrayInputStream), PDDocument (org.apache.pdfbox.pdmodel.PDDocument), PDPageContentStream (org.apache.pdfbox.pdmodel.edit.PDPageContentStream), Matchers.containsString (org.hamcrest.Matchers.containsString), IOException (java.io.IOException), Predicate (org.apache.camel.Predicate), PDFTextStripper (org.apache.pdfbox.util.PDFTextStripper), Test (org.junit.Test)
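
The predicate in Example 4 takes the ByteArrayOutputStream it receives as the message body and re-opens its bytes through a ByteArrayInputStream before loading them. The sketch below shows that in-memory round trip in isolation, using plain text instead of a PDF so that it needs only the JDK. The class InMemoryRoundTrip and its methods are hypothetical and do not appear in the Camel sources.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class InMemoryRoundTrip {
    private InMemoryRoundTrip() {
    }

    /** Captures the text in an in-memory output stream, as a producer might. */
    static ByteArrayOutputStream write(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(bytes, 0, bytes.length);
        return out;
    }

    /** Re-opens the captured bytes as an input stream and reads them back. */
    static String readBack(ByteArrayOutputStream out) {
        // toByteArray() copies the buffer, so the reader is independent of
        // any later writes to the output stream.
        ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            copy.write(buffer, 0, n);
        }
        return new String(copy.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

In the test above, the same pattern feeds the copied bytes to PDDocument.load rather than decoding them as text.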

Example 5 with PDDocument

Use of org.apache.pdfbox.pdmodel.PDDocument in project camel by apache.

Class PdfAppendTest, method testAppendEncrypted.

@Test
public void testAppendEncrypted() throws Exception {
    final String originalText = "Test";
    final String textToAppend = "Append";
    PDDocument document = new PDDocument();
    PDPage page = new PDPage(PDPage.PAGE_SIZE_A4);
    document.addPage(page);
    PDPageContentStream contentStream = new PDPageContentStream(document, page);
    contentStream.setFont(PDType1Font.HELVETICA, 12);
    contentStream.beginText();
    contentStream.moveTextPositionByAmount(20, 400);
    contentStream.drawString(originalText);
    contentStream.endText();
    contentStream.close();
    final String ownerPass = "ownerPass";
    final String userPass = "userPass";
    AccessPermission accessPermission = new AccessPermission();
    accessPermission.setCanExtractContent(false);
    StandardProtectionPolicy protectionPolicy = new StandardProtectionPolicy(ownerPass, userPass, accessPermission);
    protectionPolicy.setEncryptionKeyLength(128);
    document.protect(protectionPolicy);
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    document.save(output);
    // Encryption happens after saving.
    PDDocument encryptedDocument = PDDocument.load(new ByteArrayInputStream(output.toByteArray()));
    Map<String, Object> headers = new HashMap<String, Object>();
    headers.put(PdfHeaderConstants.PDF_DOCUMENT_HEADER_NAME, encryptedDocument);
    headers.put(PdfHeaderConstants.DECRYPTION_MATERIAL_HEADER_NAME, new StandardDecryptionMaterial(userPass));
    template.sendBodyAndHeaders("direct:start", textToAppend, headers);
    resultEndpoint.setExpectedMessageCount(1);
    resultEndpoint.expectedMessagesMatches(new Predicate() {

        @Override
        public boolean matches(Exchange exchange) {
            Object body = exchange.getIn().getBody();
            assertThat(body, instanceOf(ByteArrayOutputStream.class));
            try {
                PDDocument doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) body).toByteArray()));
                PDFTextStripper pdfTextStripper = new PDFTextStripper();
                String text = pdfTextStripper.getText(doc);
                assertEquals(2, doc.getNumberOfPages());
                assertThat(text, containsString(originalText));
                assertThat(text, containsString(textToAppend));
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            return true;
        }
    });
    resultEndpoint.assertIsSatisfied();
}
Also used: PDPage (org.apache.pdfbox.pdmodel.PDPage), HashMap (java.util.HashMap), StandardProtectionPolicy (org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy), AccessPermission (org.apache.pdfbox.pdmodel.encryption.AccessPermission), StandardDecryptionMaterial (org.apache.pdfbox.pdmodel.encryption.StandardDecryptionMaterial), Matchers.containsString (org.hamcrest.Matchers.containsString), ByteArrayOutputStream (java.io.ByteArrayOutputStream), IOException (java.io.IOException), Predicate (org.apache.camel.Predicate), Exchange (org.apache.camel.Exchange), ByteArrayInputStream (java.io.ByteArrayInputStream), PDDocument (org.apache.pdfbox.pdmodel.PDDocument), PDPageContentStream (org.apache.pdfbox.pdmodel.edit.PDPageContentStream), PDFTextStripper (org.apache.pdfbox.util.PDFTextStripper), Test (org.junit.Test)

Aggregations

PDDocument (org.apache.pdfbox.pdmodel.PDDocument): 305
File (java.io.File): 127
PDPage (org.apache.pdfbox.pdmodel.PDPage): 93
Test (org.junit.Test): 65
IOException (java.io.IOException): 61
PDPageContentStream (org.apache.pdfbox.pdmodel.PDPageContentStream): 49
InputStream (java.io.InputStream): 43
BufferedImage (java.awt.image.BufferedImage): 34
PDDocumentCatalog (org.apache.pdfbox.pdmodel.PDDocumentCatalog): 27
ByteArrayInputStream (java.io.ByteArrayInputStream): 25
PDRectangle (org.apache.pdfbox.pdmodel.common.PDRectangle): 25
PDFont (org.apache.pdfbox.pdmodel.font.PDFont): 24
ByteArrayOutputStream (java.io.ByteArrayOutputStream): 22
FileInputStream (java.io.FileInputStream): 21
PDFRenderer (org.apache.pdfbox.rendering.PDFRenderer): 21
PDAcroForm (org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm): 20
PDResources (org.apache.pdfbox.pdmodel.PDResources): 19
PDFTextStripper (org.apache.pdfbox.text.PDFTextStripper): 18
COSDictionary (org.apache.pdfbox.cos.COSDictionary): 17
FileOutputStream (java.io.FileOutputStream): 15