Example 6 with PasswordProvider

use of org.apache.tika.parser.PasswordProvider in project tika by apache.

the class PDFParser method getPassword.

private String getPassword(Metadata metadata, ParseContext context) {
    String password = null;
    // Did they supply a new style Password Provider?
    PasswordProvider passwordProvider = context.get(PasswordProvider.class);
    if (passwordProvider != null) {
        password = passwordProvider.getPassword(metadata);
    }
    // Fall back on the old style metadata if set
    if (password == null && metadata.get(PASSWORD) != null) {
        password = metadata.get(PASSWORD);
    }
    // If no password is given, use an empty string as the default
    if (password == null) {
        password = "";
    }
    return password;
}
Also used : COSString(org.apache.pdfbox.cos.COSString) PasswordProvider(org.apache.tika.parser.PasswordProvider)
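
The lookup order in getPassword (provider first, then the legacy metadata entry, then the empty string) can be tried in isolation. The sketch below is only an illustration under stated assumptions: SimplePasswordProvider, PASSWORD_KEY and the Map used as metadata are hypothetical stand-ins, not Tika's PasswordProvider, PASSWORD property or Metadata class.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-alone sketch of the fallback order; none of these types are Tika's own. */
final class PasswordResolutionSketch {

    /** Hypothetical stand-in for a single-method password provider. */
    interface SimplePasswordProvider {
        String getPassword(Map<String, String> metadata);
    }

    /** Hypothetical metadata key playing the role of PASSWORD. */
    static final String PASSWORD_KEY = "password";

    static String resolve(SimplePasswordProvider provider, Map<String, String> metadata) {
        String password = null;
        // 1. Prefer the provider when one was supplied.
        if (provider != null) {
            password = provider.getPassword(metadata);
        }
        // 2. Fall back on a password stored in the metadata.
        if (password == null) {
            password = metadata.get(PASSWORD_KEY);
        }
        // 3. Default to the empty string so callers never see null.
        return password == null ? "" : password;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(PASSWORD_KEY, "from-metadata");

        System.out.println(resolve(m -> "from-provider", metadata)); // provider wins
        System.out.println(resolve(m -> null, metadata));            // provider returned null
        System.out.println(resolve(null, metadata));                 // no provider at all
        System.out.println("[" + resolve(null, new HashMap<>()) + "]"); // nothing set anywhere
    }
}
```

A provider that returns null is treated the same as a missing provider, so the metadata entry still gets a chance before the empty-string default applies.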

Example 7 with PasswordProvider

use of org.apache.tika.parser.PasswordProvider in project tika by apache.

the class PDFParserTest method testLegacyAccessChecking.

//Access checker tests
@Test
public void testLegacyAccessChecking() throws Exception {
    //test that default behavior doesn't throw AccessPermissionException
    for (String file : new String[] { "testPDF_no_extract_no_accessibility_owner_empty.pdf", "testPDF_no_extract_yes_accessibility_owner_empty.pdf" }) {
        String xml = getXML(file).xml;
        assertContains("Hello World", xml);
    }
    //now try with the user password
    PasswordProvider provider = new PasswordProvider() {

        @Override
        public String getPassword(Metadata metadata) {
            return "user";
        }
    };
    ParseContext context = new ParseContext();
    context.set(PasswordProvider.class, provider);
    Parser parser = new AutoDetectParser();
    for (String path : new String[] { "testPDF_no_extract_no_accessibility_owner_user.pdf", "testPDF_no_extract_yes_accessibility_owner_user.pdf" }) {
        assertContains("Hello World", getXML(path, context).xml);
    }
}
Also used : Metadata(org.apache.tika.metadata.Metadata) ParseContext(org.apache.tika.parser.ParseContext) AutoDetectParser(org.apache.tika.parser.AutoDetectParser) PasswordProvider(org.apache.tika.parser.PasswordProvider) Parser(org.apache.tika.parser.Parser) CompositeParser(org.apache.tika.parser.CompositeParser) TesseractOCRParser(org.apache.tika.parser.ocr.TesseractOCRParser) Test(org.junit.Test) TikaTest(org.apache.tika.TikaTest)

Example 8 with PasswordProvider

use of org.apache.tika.parser.PasswordProvider in project tika by apache.

the class PDFParserTest method testAccessCheckingOwnerPassword.

@Test
public void testAccessCheckingOwnerPassword() throws Exception {
    ParseContext context = new ParseContext();
    PDFParserConfig config = new PDFParserConfig();
    //don't allow extraction, not even for accessibility
    config.setAccessChecker(new AccessChecker(true));
    PasswordProvider passwordProvider = new PasswordProvider() {

        @Override
        public String getPassword(Metadata metadata) {
            return "owner";
        }
    };
    context.set(PasswordProvider.class, passwordProvider);
    context.set(PDFParserConfig.class, config);
    //with owner's password, text can be extracted, no matter the AccessChecker's settings
    for (String path : new String[] { "testPDF_no_extract_no_accessibility_owner_user.pdf", "testPDF_no_extract_yes_accessibility_owner_user.pdf", "testPDF_no_extract_no_accessibility_owner_empty.pdf", "testPDF_no_extract_yes_accessibility_owner_empty.pdf" }) {
        assertContains("Hello World", getXML(path, context).xml);
    }
    //really, with owner's password, all extraction is allowed
    config.setAccessChecker(new AccessChecker(false));
    for (String path : new String[] { "testPDF_no_extract_no_accessibility_owner_user.pdf", "testPDF_no_extract_yes_accessibility_owner_user.pdf", "testPDF_no_extract_no_accessibility_owner_empty.pdf", "testPDF_no_extract_yes_accessibility_owner_empty.pdf" }) {
        assertContains("Hello World", getXML(path, context).xml);
    }
}
Also used : ParseContext(org.apache.tika.parser.ParseContext) Metadata(org.apache.tika.metadata.Metadata) PasswordProvider(org.apache.tika.parser.PasswordProvider) Test(org.junit.Test) TikaTest(org.apache.tika.TikaTest)

Example 9 with PasswordProvider

use of org.apache.tika.parser.PasswordProvider in project tika by apache.

the class RFC822ParserTest method testEncryptedZipAttachment.

/**
 * Test TIKA-1028 - If the mail contains an encrypted attachment (or
 * an attachment that otherwise triggers an error), parsing should carry
 * on for the remainder regardless
 */
@Test
public void testEncryptedZipAttachment() throws Exception {
    Parser parser = new RFC822Parser();
    Metadata metadata = new Metadata();
    ParseContext context = new ParseContext();
    context.set(Parser.class, new AutoDetectParser());
    InputStream stream = getStream("test-documents/testRFC822_encrypted_zip");
    ContentHandler handler = new BodyContentHandler();
    parser.parse(stream, handler, metadata, context);
    // Check we got the metadata
    assertEquals("Juha Haaga <juha.haaga@gmail.com>", metadata.get(Metadata.MESSAGE_FROM));
    assertEquals("Test mail for Tika", metadata.get(TikaCoreProperties.TITLE));
    // Check we got the message text, for both Plain Text and HTML
    assertContains("Includes encrypted zip file", handler.toString());
    assertContains("password is \"test\".", handler.toString());
    assertContains("This is the Plain Text part", handler.toString());
    assertContains("This is the HTML part", handler.toString());
    // We won't get the contents of the zip file, but we will get the name
    assertContains("text.txt", handler.toString());
    assertNotContained("ENCRYPTED ZIP FILES", handler.toString());
    // Try again, this time with the password supplied
    // Check that we also get the zip's contents as well
    context.set(PasswordProvider.class, new PasswordProvider() {

        @Override
        public String getPassword(Metadata metadata) {
            return "test";
        }
    });
    stream = getStream("test-documents/testRFC822_encrypted_zip");
    handler = new BodyContentHandler();
    parser.parse(stream, handler, metadata, context);
    assertContains("Includes encrypted zip file", handler.toString());
    assertContains("password is \"test\".", handler.toString());
    assertContains("This is the Plain Text part", handler.toString());
    assertContains("This is the HTML part", handler.toString());
    // We do get the name of the file in the encrypted zip file
    assertContains("text.txt", handler.toString());
    // TODO Upgrade to a version of Commons Compress with Encryption
    //  support, then verify we get the contents of the text file
    //  held within the encrypted zip
    // No Zip Encryption support yet
    assumeTrue(false);
    assertContains("TEST DATA FOR TIKA.", handler.toString());
    assertContains("ENCRYPTED ZIP FILES", handler.toString());
    assertContains("TIKA-1028", handler.toString());
}
Also used : BodyContentHandler(org.apache.tika.sax.BodyContentHandler) ByteArrayInputStream(java.io.ByteArrayInputStream) TikaInputStream(org.apache.tika.io.TikaInputStream) InputStream(java.io.InputStream) Metadata(org.apache.tika.metadata.Metadata) ParseContext(org.apache.tika.parser.ParseContext) AutoDetectParser(org.apache.tika.parser.AutoDetectParser) PasswordProvider(org.apache.tika.parser.PasswordProvider) ContentHandler(org.xml.sax.ContentHandler) XHTMLContentHandler(org.apache.tika.sax.XHTMLContentHandler) Parser(org.apache.tika.parser.Parser) TesseractOCRParserTest(org.apache.tika.parser.ocr.TesseractOCRParserTest) Test(org.junit.Test) TikaTest(org.apache.tika.TikaTest)

Example 10 with PasswordProvider

use of org.apache.tika.parser.PasswordProvider in project tika by apache.

the class JackcessParserTest method testPassword.

@Test
public void testPassword() throws Exception {
    ParseContext c = new ParseContext();
    c.set(PasswordProvider.class, new PasswordProvider() {

        @Override
        public String getPassword(Metadata metadata) {
            return "tika";
        }
    });
    Parser p = new AutoDetectParser();
    String content = null;
    try (InputStream is = this.getResourceAsStream("/test-documents/testAccess2_encrypted.accdb")) {
        content = getText(is, p, c);
    }
    assertContains("red and brown", content);
    //now try wrong password
    c.set(PasswordProvider.class, new PasswordProvider() {

        @Override
        public String getPassword(Metadata metadata) {
            return "WRONG";
        }
    });
    boolean ex = false;
    try (InputStream is = this.getResourceAsStream("/test-documents/testAccess2_encrypted.accdb")) {
        getText(is, p, c);
    } catch (EncryptedDocumentException e) {
        ex = true;
    }
    assertTrue("failed to throw encrypted document exception for wrong password", ex);
    //now try null
    c.set(PasswordProvider.class, new PasswordProvider() {

        @Override
        public String getPassword(Metadata metadata) {
            return null;
        }
    });
    ex = false;
    try (InputStream is = this.getResourceAsStream("/test-documents/testAccess2_encrypted.accdb")) {
        getText(is, p, c);
    } catch (EncryptedDocumentException e) {
        ex = true;
    }
    assertTrue("failed to throw encrypted document exception for null password", ex);
    //now try missing password provider
    c = new ParseContext();
    ex = false;
    try (InputStream is = this.getResourceAsStream("/test-documents/testAccess2_encrypted.accdb")) {
        getText(is, p, c);
    } catch (EncryptedDocumentException e) {
        ex = true;
    }
    assertTrue("failed to throw encrypted document exception for missing password provider", ex);
    //now try password on file that doesn't need a password
    c = new ParseContext();
    c.set(PasswordProvider.class, new PasswordProvider() {

        @Override
        public String getPassword(Metadata metadata) {
            return "tika";
        }
    });
    ex = false;
    try (InputStream is = this.getResourceAsStream("/test-documents/testAccess2.accdb")) {
        content = getText(is, p, c);
    } catch (EncryptedDocumentException e) {
        ex = true;
    }
    assertFalse("shouldn't have thrown encrypted document exception for opening unencrypted file that doesn't need password", ex);
    assertContains("red and brown", content);
}
Also used : EncryptedDocumentException(org.apache.tika.exception.EncryptedDocumentException) InputStream(java.io.InputStream) ParseContext(org.apache.tika.parser.ParseContext) Metadata(org.apache.tika.metadata.Metadata) AutoDetectParser(org.apache.tika.parser.AutoDetectParser) PasswordProvider(org.apache.tika.parser.PasswordProvider) Parser(org.apache.tika.parser.Parser) Test(org.junit.Test) TikaTest(org.apache.tika.TikaTest)
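
testPassword repeats the same "set a flag in the catch block, then assert on it" pattern for each failure case. One possible way to factor that out is a small helper that reports whether an action threw. This is a minimal sketch under assumptions: EncryptedException, Action and open are hypothetical stand-ins for EncryptedDocumentException and the real parse call, so the block runs without Tika on the classpath.

```java
/** Stand-alone sketch of a "did it throw?" helper; the types here are not Tika's. */
final class EncryptedCheckSketch {

    /** Hypothetical stand-in for EncryptedDocumentException. */
    static class EncryptedException extends Exception {
        EncryptedException(String message) {
            super(message);
        }
    }

    /** Hypothetical action that may fail because a document is encrypted. */
    interface Action {
        void run() throws EncryptedException;
    }

    /** Returns true only if the action throws EncryptedException. */
    static boolean throwsEncrypted(Action action) {
        try {
            action.run();
            return false;
        } catch (EncryptedException e) {
            return true;
        }
    }

    /** Hypothetical parse step: the only accepted password is "tika". */
    static String open(String password) throws EncryptedException {
        if (!"tika".equals(password)) {
            throw new EncryptedException("wrong or missing password");
        }
        return "red and brown";
    }

    public static void main(String[] args) {
        System.out.println(throwsEncrypted(() -> open("tika")));  // correct password
        System.out.println(throwsEncrypted(() -> open("WRONG"))); // wrong password
        System.out.println(throwsEncrypted(() -> open(null)));    // null password
    }
}
```

With such a helper each failure case becomes a single assertion. Projects on JUnit 4.13 or later could also consider Assert.assertThrows for the same purpose.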

Aggregations

PasswordProvider (org.apache.tika.parser.PasswordProvider) 16
Metadata (org.apache.tika.metadata.Metadata) 12
Test (org.junit.Test) 11
TikaTest (org.apache.tika.TikaTest) 10
EncryptedDocumentException (org.apache.tika.exception.EncryptedDocumentException) 10
ParseContext (org.apache.tika.parser.ParseContext) 10
AutoDetectParser (org.apache.tika.parser.AutoDetectParser) 9
Parser (org.apache.tika.parser.Parser) 9
InputStream (java.io.InputStream) 8
BodyContentHandler (org.apache.tika.sax.BodyContentHandler) 7
ContentHandler (org.xml.sax.ContentHandler) 6
TikaInputStream (org.apache.tika.io.TikaInputStream) 5
HashMap (java.util.HashMap) 3
Map (java.util.Map) 3
CompositeParser (org.apache.tika.parser.CompositeParser) 3
TesseractOCRParser (org.apache.tika.parser.ocr.TesseractOCRParser) 3
XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler) 3
IOException (java.io.IOException) 2
TikaException (org.apache.tika.exception.TikaException) 2
CryptCodecProvider (com.healthmarketscience.jackcess.CryptCodecProvider) 1