Example 1 with EncryptedDocumentException

Use of org.apache.tika.exception.EncryptedDocumentException in project tika by apache.

The class SXWPFExtractorTest, method testEncrypted.

@Test
public void testEncrypted() throws Exception {
    Map<String, String> tests = new HashMap<String, String>();
    tests.put("testWORD_protected_passtika.docx", "This is an encrypted Word 2007 File");
    Parser parser = new AutoDetectParser();
    Metadata m = new Metadata();
    PasswordProvider passwordProvider = new PasswordProvider() {

        @Override
        public String getPassword(Metadata metadata) {
            return "tika";
        }
    };
    OfficeParserConfig opc = new OfficeParserConfig();
    opc.setUseSAXDocxExtractor(true);
    ParseContext passwordContext = new ParseContext();
    passwordContext.set(org.apache.tika.parser.PasswordProvider.class, passwordProvider);
    passwordContext.set(OfficeParserConfig.class, opc);
    for (Map.Entry<String, String> e : tests.entrySet()) {
        assertContains(e.getValue(), getXML(e.getKey(), passwordContext).xml);
    }
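    // Now try with no password. parseContext is not declared in this snippet;
    // it is presumably a field of the test class with no PasswordProvider set.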
    for (Map.Entry<String, String> e : tests.entrySet()) {
        boolean exc = false;
        try {
            getXML(e.getKey(), parseContext);
        } catch (EncryptedDocumentException ex) {
            exc = true;
        }
        assertTrue(exc);
    }
}
Also used : EncryptedDocumentException(org.apache.tika.exception.EncryptedDocumentException) HashMap(java.util.HashMap) Metadata(org.apache.tika.metadata.Metadata) PasswordProvider(org.apache.tika.parser.PasswordProvider) Parser(org.apache.tika.parser.Parser) AutoDetectParser(org.apache.tika.parser.AutoDetectParser) OfficeParserConfig(org.apache.tika.parser.microsoft.OfficeParserConfig) ParseContext(org.apache.tika.parser.ParseContext) Map(java.util.Map) Test(org.junit.Test) TikaTest(org.apache.tika.TikaTest)
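
The example above registers a PasswordProvider in a ParseContext under its class object, and a parser later looks it up by the same key. The following is a minimal sketch of that class-keyed lookup idea, not Tika's actual implementation. Context, PasswordSource and resolvePassword are hypothetical stand-ins introduced here so the snippet runs without Tika on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

public class TypedContextSketch {

    /** Hypothetical stand-in for a password callback; not the Tika interface. */
    interface PasswordSource {
        String getPassword();
    }

    /** Simplified, hypothetical context that stores one object per class key. */
    static class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            // Class.cast passes null through, so a missing entry yields null.
            return key.cast(entries.get(key));
        }
    }

    /** Returns the registered password, or null when no source was registered. */
    static String resolvePassword(Context context) {
        PasswordSource source = context.get(PasswordSource.class);
        return source == null ? null : source.getPassword();
    }

    public static void main(String[] args) {
        Context withPassword = new Context();
        withPassword.set(PasswordSource.class, () -> "tika");
        System.out.println(resolvePassword(withPassword));
        System.out.println(resolvePassword(new Context()));
    }
}
```

Keying on the class object lets unrelated settings share one context object, and a caller that registers nothing simply gets null back, which is the "no password" case the tests on this page exercise.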

Example 2 with EncryptedDocumentException

Use of org.apache.tika.exception.EncryptedDocumentException in project tika by apache.

The class OOXMLParserTest, method testEncrypted.

@Test
public void testEncrypted() throws Exception {
    Map<String, String> tests = new HashMap<String, String>();
    tests.put("testWORD_protected_passtika.docx", "This is an encrypted Word 2007 File");
    tests.put("testPPT_protected_passtika.pptx", "This is an encrypted PowerPoint 2007 slide.");
    tests.put("testEXCEL_protected_passtika.xlsx", "This is an Encrypted Excel spreadsheet.");
    Parser parser = new AutoDetectParser();
    Metadata m = new Metadata();
    PasswordProvider passwordProvider = new PasswordProvider() {

        @Override
        public String getPassword(Metadata metadata) {
            return "tika";
        }
    };
    ParseContext passwordContext = new ParseContext();
    passwordContext.set(org.apache.tika.parser.PasswordProvider.class, passwordProvider);
    for (Map.Entry<String, String> e : tests.entrySet()) {
        try (InputStream is = getTestDocument(e.getKey())) {
            ContentHandler handler = new BodyContentHandler();
            parser.parse(is, handler, m, passwordContext);
            assertContains(e.getValue(), handler.toString());
        }
    }
    ParseContext context = new ParseContext();
    //now try with no password
    for (Map.Entry<String, String> e : tests.entrySet()) {
        boolean exc = false;
        try (InputStream is = getTestDocument(e.getKey())) {
            ContentHandler handler = new BodyContentHandler();
            parser.parse(is, handler, m, context);
        } catch (EncryptedDocumentException ex) {
            exc = true;
        }
        assertTrue(exc);
    }
}
Also used : BodyContentHandler(org.apache.tika.sax.BodyContentHandler) EncryptedDocumentException(org.apache.tika.exception.EncryptedDocumentException) HashMap(java.util.HashMap) TikaInputStream(org.apache.tika.io.TikaInputStream) InputStream(java.io.InputStream) Metadata(org.apache.tika.metadata.Metadata) PasswordProvider(org.apache.tika.parser.PasswordProvider) ContentHandler(org.xml.sax.ContentHandler) Parser(org.apache.tika.parser.Parser) OfficeParser(org.apache.tika.parser.microsoft.OfficeParser) AutoDetectParser(org.apache.tika.parser.AutoDetectParser) EmptyParser(org.apache.tika.parser.EmptyParser) ParseContext(org.apache.tika.parser.ParseContext) Map(java.util.Map) ExcelParserTest(org.apache.tika.parser.microsoft.ExcelParserTest) Test(org.junit.Test) TikaTest(org.apache.tika.TikaTest) WordParserTest(org.apache.tika.parser.microsoft.WordParserTest)

Example 3 with EncryptedDocumentException

Use of org.apache.tika.exception.EncryptedDocumentException in project tika by apache.

The class ExcelParserTest, method testExcelParserPassword.

@Test
public void testExcelParserPassword() throws Exception {
    try (InputStream input = ExcelParserTest.class.getResourceAsStream("/test-documents/testEXCEL_protected_passtika.xls")) {
        Metadata metadata = new Metadata();
        ContentHandler handler = new BodyContentHandler();
        ParseContext context = new ParseContext();
        context.set(Locale.class, Locale.US);
        new OfficeParser().parse(input, handler, metadata, context);
        fail("Document is encrypted, shouldn't parse");
    } catch (EncryptedDocumentException e) {
        // Good: parsing without a password is expected to fail
    }
    // Try again, this time with the password
    try (InputStream input = ExcelParserTest.class.getResourceAsStream("/test-documents/testEXCEL_protected_passtika.xls")) {
        Metadata metadata = new Metadata();
        ContentHandler handler = new BodyContentHandler();
        ParseContext context = new ParseContext();
        context.set(Locale.class, Locale.US);
        context.set(PasswordProvider.class, new PasswordProvider() {

            @Override
            public String getPassword(Metadata metadata) {
                return "tika";
            }
        });
        new OfficeParser().parse(input, handler, metadata, context);
        assertEquals("application/vnd.ms-excel", metadata.get(Metadata.CONTENT_TYPE));
        assertEquals(null, metadata.get(TikaCoreProperties.TITLE));
        assertEquals("Antoni", metadata.get(TikaCoreProperties.CREATOR));
        assertEquals("2011-11-25T09:52:48Z", metadata.get(TikaCoreProperties.CREATED));
        String content = handler.toString();
        assertContains("This is an Encrypted Excel spreadsheet", content);
        assertNotContained("9.0", content);
    }
}
Also used : BodyContentHandler(org.apache.tika.sax.BodyContentHandler) EncryptedDocumentException(org.apache.tika.exception.EncryptedDocumentException) InputStream(java.io.InputStream) Metadata(org.apache.tika.metadata.Metadata) ParseContext(org.apache.tika.parser.ParseContext) PasswordProvider(org.apache.tika.parser.PasswordProvider) ContentHandler(org.xml.sax.ContentHandler) Test(org.junit.Test) TikaTest(org.apache.tika.TikaTest)

Example 4 with EncryptedDocumentException

Use of org.apache.tika.exception.EncryptedDocumentException in project tika by apache.

The class JackcessParser, method parse.

@Override
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    TikaInputStream tis = TikaInputStream.get(stream);
    Database db = null;
    XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    xhtml.startDocument();
    String password = null;
    PasswordProvider passwordProvider = context.get(PasswordProvider.class);
    if (passwordProvider != null) {
        password = passwordProvider.getPassword(metadata);
    }
    try {
        if (password == null) {
            //do this to ensure encryption/wrong password exception vs. more generic
            //"need right codec" error message.
            db = new DatabaseBuilder(tis.getFile()).setCodecProvider(new CryptCodecProvider()).setReadOnly(true).open();
        } else {
            db = new DatabaseBuilder(tis.getFile()).setCodecProvider(new CryptCodecProvider(password)).setReadOnly(true).open();
        }
        //just in case
        db.setLinkResolver(IGNORE_LINK_RESOLVER);
        JackcessExtractor ex = new JackcessExtractor(metadata, context, locale);
        ex.parse(db, xhtml);
    } catch (IllegalStateException e) {
        if (e.getMessage() != null && e.getMessage().contains("Incorrect password")) {
            throw new EncryptedDocumentException(e);
        }
        throw e;
    } finally {
        if (db != null) {
            try {
                db.close();
            } catch (IOException e) {
                //swallow = silent close
            }
        }
    }
    xhtml.endDocument();
}
Also used : DatabaseBuilder(com.healthmarketscience.jackcess.DatabaseBuilder) CryptCodecProvider(com.healthmarketscience.jackcess.CryptCodecProvider) EncryptedDocumentException(org.apache.tika.exception.EncryptedDocumentException) Database(com.healthmarketscience.jackcess.Database) TikaInputStream(org.apache.tika.io.TikaInputStream) IOException(java.io.IOException) XHTMLContentHandler(org.apache.tika.sax.XHTMLContentHandler) PasswordProvider(org.apache.tika.parser.PasswordProvider)
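
The parse method above catches a generic IllegalStateException from the underlying library and rethrows it as EncryptedDocumentException only when the message indicates a wrong password. The following is a minimal sketch of that translation pattern, assuming stand-in types. The nested EncryptedDocumentException class and the open and openOrTranslate methods are hypothetical and exist only so the snippet runs without Tika or Jackcess.

```java
public class ExceptionTranslationSketch {

    /** Hypothetical stand-in for org.apache.tika.exception.EncryptedDocumentException. */
    static class EncryptedDocumentException extends Exception {
        EncryptedDocumentException(Throwable cause) {
            super("Document is encrypted", cause);
        }
    }

    /** Hypothetical library call that fails with a generic unchecked exception. */
    static String open(String password) {
        if (!"tika".equals(password)) {
            throw new IllegalStateException("Incorrect password provided");
        }
        return "opened";
    }

    /** Translates the password failure into the domain exception; rethrows anything else. */
    static String openOrTranslate(String password) throws EncryptedDocumentException {
        try {
            return open(password);
        } catch (IllegalStateException e) {
            if (e.getMessage() != null && e.getMessage().contains("Incorrect password")) {
                // Keep the original exception as the cause for diagnostics.
                throw new EncryptedDocumentException(e);
            }
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(openOrTranslate("tika"));
        try {
            openOrTranslate(null);
        } catch (EncryptedDocumentException e) {
            System.out.println("encrypted: " + e.getCause().getMessage());
        }
    }
}
```

Matching on the message text is fragile if the library changes its wording, so this approach is best treated as a fallback for libraries that expose no dedicated exception type.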

Example 5 with EncryptedDocumentException

Use of org.apache.tika.exception.EncryptedDocumentException in project tika by apache.

The class RarParser, method parse.

@Override
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    xhtml.startDocument();
    EmbeddedDocumentExtractor extractor = EmbeddedDocumentUtil.getEmbeddedDocumentExtractor(context);
    Archive rar = null;
    try (TemporaryResources tmp = new TemporaryResources()) {
        TikaInputStream tis = TikaInputStream.get(stream, tmp);
        rar = new Archive(tis.getFile());
        if (rar.isEncrypted()) {
            throw new EncryptedDocumentException();
        }
        //Without this BodyContentHandler does not work
        xhtml.element("div", " ");
        FileHeader header = rar.nextFileHeader();
        while (header != null && !Thread.currentThread().isInterrupted()) {
            if (!header.isDirectory()) {
                try (InputStream subFile = rar.getInputStream(header)) {
                    Metadata entrydata = PackageParser.handleEntryMetadata("".equals(header.getFileNameW()) ? header.getFileNameString() : header.getFileNameW(), header.getCTime(), header.getMTime(), header.getFullUnpackSize(), xhtml);
                    if (extractor.shouldParseEmbedded(entrydata)) {
                        extractor.parseEmbedded(subFile, handler, entrydata, true);
                    }
                }
            }
            header = rar.nextFileHeader();
        }
    } catch (RarException e) {
        throw new TikaException("RarParser Exception", e);
    } finally {
        if (rar != null)
            rar.close();
    }
    xhtml.endDocument();
}
Also used : Archive(com.github.junrar.Archive) EncryptedDocumentException(org.apache.tika.exception.EncryptedDocumentException) TikaException(org.apache.tika.exception.TikaException) EmbeddedDocumentExtractor(org.apache.tika.extractor.EmbeddedDocumentExtractor) TikaInputStream(org.apache.tika.io.TikaInputStream) InputStream(java.io.InputStream) TemporaryResources(org.apache.tika.io.TemporaryResources) Metadata(org.apache.tika.metadata.Metadata) RarException(com.github.junrar.exception.RarException) XHTMLContentHandler(org.apache.tika.sax.XHTMLContentHandler) FileHeader(com.github.junrar.rarfile.FileHeader)

Aggregations

EncryptedDocumentException (org.apache.tika.exception.EncryptedDocumentException) 16
PasswordProvider (org.apache.tika.parser.PasswordProvider) 10
Metadata (org.apache.tika.metadata.Metadata) 9
TikaInputStream (org.apache.tika.io.TikaInputStream) 8
InputStream (java.io.InputStream) 7
BodyContentHandler (org.apache.tika.sax.BodyContentHandler) 7
Test (org.junit.Test) 7
TikaTest (org.apache.tika.TikaTest) 6
TikaException (org.apache.tika.exception.TikaException) 6
AutoDetectParser (org.apache.tika.parser.AutoDetectParser) 6
ParseContext (org.apache.tika.parser.ParseContext) 6
Parser (org.apache.tika.parser.Parser) 6
ContentHandler (org.xml.sax.ContentHandler) 5
TemporaryResources (org.apache.tika.io.TemporaryResources) 4
HashMap (java.util.HashMap) 3
Map (java.util.Map) 3
GeneralSecurityException (java.security.GeneralSecurityException) 2
ZipArchiveEntry (org.apache.commons.compress.archivers.zip.ZipArchiveEntry) 2
CloseShieldInputStream (org.apache.commons.io.input.CloseShieldInputStream) 2
XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler) 2