
Example 1 with PasswordProvider

Use of org.apache.tika.parser.PasswordProvider in the Apache Tika project.

From class TikaResource, method fillMetadata.

@SuppressWarnings("serial")
public static void fillMetadata(Parser parser, Metadata metadata, ParseContext context, MultivaluedMap<String, String> httpHeaders) {
    String fileName = detectFilename(httpHeaders);
    if (fileName != null) {
        metadata.set(TikaMetadataKeys.RESOURCE_NAME_KEY, fileName);
    }
    String contentTypeHeader = httpHeaders.getFirst(HttpHeaders.CONTENT_TYPE);
    javax.ws.rs.core.MediaType mediaType = contentTypeHeader == null ? null : javax.ws.rs.core.MediaType.valueOf(contentTypeHeader);
    if (mediaType != null && "xml".equals(mediaType.getSubtype())) {
        mediaType = null;
    }
    if (mediaType != null && mediaType.equals(javax.ws.rs.core.MediaType.APPLICATION_OCTET_STREAM_TYPE)) {
        mediaType = null;
    }
    if (mediaType != null) {
        metadata.add(org.apache.tika.metadata.HttpHeaders.CONTENT_TYPE, mediaType.toString());
        final Detector detector = getDetector(parser);
        setDetector(parser, new Detector() {

            public MediaType detect(InputStream inputStream, Metadata metadata) throws IOException {
                String ct = metadata.get(org.apache.tika.metadata.HttpHeaders.CONTENT_TYPE);
                //make sure never to return null -- TIKA-1845
                MediaType type = null;
                if (ct != null) {
                    //this can return null if ct is not a valid mime type
                    type = MediaType.parse(ct);
                }
                if (type != null) {
                    return type;
                } else {
                    return detector.detect(inputStream, metadata);
                }
            }
        });
    }
    final String password = httpHeaders.getFirst("Password");
    if (password != null) {
        context.set(PasswordProvider.class, new PasswordProvider() {

            @Override
            public String getPassword(Metadata metadata) {
                return password;
            }
        });
    }
}
Also used : Detector(org.apache.tika.detect.Detector) InputStream(java.io.InputStream) Metadata(org.apache.tika.metadata.Metadata) MediaType(org.apache.tika.mime.MediaType) IOException(java.io.IOException) PasswordProvider(org.apache.tika.parser.PasswordProvider)
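fillMetadata applies two header-driven rules. It registers a PasswordProvider only when a `Password` header is present. It also treats the `Content-Type` header as a detection hint unless the subtype is `xml` or the type is `application/octet-stream`, and it falls back to the original detector so that detection never returns null (TIKA-1845). The sketch below models both rules with JDK types only. `PasswordSource` and `HeaderHints` are hypothetical names. `PasswordSource` only imitates the one-method shape of Tika's `PasswordProvider` and uses a plain `Map` in place of `Metadata`. For brevity, header names are matched case-sensitively, which real HTTP header maps usually avoid. Parameters such as `charset` are also dropped from the hint, whereas the original keeps the full `mediaType.toString()`.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in that mimics the shape of Tika's PasswordProvider (not the real interface).
interface PasswordSource {
    String getPassword(Map<String, String> metadata);
}

final class HeaderHints {
    private HeaderHints() {}

    // Returns a provider only when a "Password" header is present, as fillMetadata does.
    static PasswordSource passwordFrom(Map<String, String> headers) {
        final String password = headers.get("Password");
        if (password == null) {
            return null; // no provider registered: encrypted documents should then fail to parse
        }
        return metadata -> password;
    }

    // Returns the bare "type/subtype" hint, or null when the header should be ignored.
    static String contentTypeHint(Map<String, String> headers) {
        String ct = headers.get("Content-Type");
        if (ct == null) {
            return null;
        }
        // Drop parameters such as "; charset=UTF-8" before inspecting type and subtype.
        String base = ct.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        int slash = base.indexOf('/');
        if (slash <= 0 || slash == base.length() - 1) {
            return null; // not a usable type/subtype pair
        }
        String subtype = base.substring(slash + 1);
        if ("xml".equals(subtype) || "application/octet-stream".equals(base)) {
            return null; // too generic to trust, so let detection decide
        }
        return base;
    }

    // Never returns null as long as the fallback detector does not: hint first, detector second.
    static String detect(Map<String, String> headers, Supplier<String> fallbackDetector) {
        String hint = contentTypeHint(headers);
        return hint != null ? hint : fallbackDetector.get();
    }
}
```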

Example 2 with PasswordProvider

Use of org.apache.tika.parser.PasswordProvider in the Apache Tika project.

From class SXWPFExtractorTest, method testEncrypted.

@Test
public void testEncrypted() throws Exception {
    Map<String, String> tests = new HashMap<String, String>();
    tests.put("testWORD_protected_passtika.docx", "This is an encrypted Word 2007 File");
    Parser parser = new AutoDetectParser();
    Metadata m = new Metadata();
    PasswordProvider passwordProvider = new PasswordProvider() {

        @Override
        public String getPassword(Metadata metadata) {
            return "tika";
        }
    };
    OfficeParserConfig opc = new OfficeParserConfig();
    opc.setUseSAXDocxExtractor(true);
    ParseContext passwordContext = new ParseContext();
    passwordContext.set(org.apache.tika.parser.PasswordProvider.class, passwordProvider);
    passwordContext.set(OfficeParserConfig.class, opc);
    for (Map.Entry<String, String> e : tests.entrySet()) {
        assertContains(e.getValue(), getXML(e.getKey(), passwordContext).xml);
    }
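    //now try with no password (parseContext is not declared in this method; presumably a test-class field without a PasswordProvider)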
    for (Map.Entry<String, String> e : tests.entrySet()) {
        boolean exc = false;
        try {
            getXML(e.getKey(), parseContext);
        } catch (EncryptedDocumentException ex) {
            exc = true;
        }
        assertTrue(exc);
    }
}
Also used : EncryptedDocumentException(org.apache.tika.exception.EncryptedDocumentException) HashMap(java.util.HashMap) Metadata(org.apache.tika.metadata.Metadata) PasswordProvider(org.apache.tika.parser.PasswordProvider) Parser(org.apache.tika.parser.Parser) AutoDetectParser(org.apache.tika.parser.AutoDetectParser) OfficeParserConfig(org.apache.tika.parser.microsoft.OfficeParserConfig) ParseContext(org.apache.tika.parser.ParseContext) Map(java.util.Map) Test(org.junit.Test) TikaTest(org.apache.tika.TikaTest)
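This test fills a single context with two unrelated entries, `set(PasswordProvider.class, ...)` and `set(OfficeParserConfig.class, ...)`, and parsers later look each one up by the same class key. The sketch below shows that class-keyed container idea with plain JDK types. `TypedContext` is a hypothetical name, and this is a simplified illustration rather than the source of Tika's `ParseContext`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified class-keyed context (an illustration, not Tika's ParseContext).
final class TypedContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    // Stores the value under its declared type; a null value removes the entry.
    <T> void set(Class<T> key, T value) {
        if (value == null) {
            entries.remove(key);
        } else {
            entries.put(key, value);
        }
    }

    // Returns the stored instance cast to the key type, or null if nothing is registered.
    <T> T get(Class<T> key) {
        return key.cast(entries.get(key));
    }

    // Like get(Class), but returns defaultValue when the key is absent.
    <T> T get(Class<T> key, T defaultValue) {
        T value = get(key);
        return value != null ? value : defaultValue;
    }
}
```

Because the key carries the type, `get` needs no unchecked cast at the call site. A provider registered under `PasswordProvider.class` is looked up the same way, with `context.get(PasswordProvider.class)`, in Example 5.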

Example 3 with PasswordProvider

Use of org.apache.tika.parser.PasswordProvider in the Apache Tika project.

From class OOXMLParserTest, method testEncrypted.

@Test
public void testEncrypted() throws Exception {
    Map<String, String> tests = new HashMap<String, String>();
    tests.put("testWORD_protected_passtika.docx", "This is an encrypted Word 2007 File");
    tests.put("testPPT_protected_passtika.pptx", "This is an encrypted PowerPoint 2007 slide.");
    tests.put("testEXCEL_protected_passtika.xlsx", "This is an Encrypted Excel spreadsheet.");
    Parser parser = new AutoDetectParser();
    Metadata m = new Metadata();
    PasswordProvider passwordProvider = new PasswordProvider() {

        @Override
        public String getPassword(Metadata metadata) {
            return "tika";
        }
    };
    ParseContext passwordContext = new ParseContext();
    passwordContext.set(org.apache.tika.parser.PasswordProvider.class, passwordProvider);
    for (Map.Entry<String, String> e : tests.entrySet()) {
        try (InputStream is = getTestDocument(e.getKey())) {
            ContentHandler handler = new BodyContentHandler();
            parser.parse(is, handler, m, passwordContext);
            assertContains(e.getValue(), handler.toString());
        }
    }
    ParseContext context = new ParseContext();
    //now try with no password
    for (Map.Entry<String, String> e : tests.entrySet()) {
        boolean exc = false;
        try (InputStream is = getTestDocument(e.getKey())) {
            ContentHandler handler = new BodyContentHandler();
            parser.parse(is, handler, m, context);
        } catch (EncryptedDocumentException ex) {
            exc = true;
        }
        assertTrue(exc);
    }
}
Also used : BodyContentHandler(org.apache.tika.sax.BodyContentHandler) EncryptedDocumentException(org.apache.tika.exception.EncryptedDocumentException) HashMap(java.util.HashMap) TikaInputStream(org.apache.tika.io.TikaInputStream) InputStream(java.io.InputStream) Metadata(org.apache.tika.metadata.Metadata) PasswordProvider(org.apache.tika.parser.PasswordProvider) ContentHandler(org.xml.sax.ContentHandler) Parser(org.apache.tika.parser.Parser) OfficeParser(org.apache.tika.parser.microsoft.OfficeParser) AutoDetectParser(org.apache.tika.parser.AutoDetectParser) EmptyParser(org.apache.tika.parser.EmptyParser) ParseContext(org.apache.tika.parser.ParseContext) Map(java.util.Map) ExcelParserTest(org.apache.tika.parser.microsoft.ExcelParserTest) Test(org.junit.Test) TikaTest(org.apache.tika.TikaTest) WordParserTest(org.apache.tika.parser.microsoft.WordParserTest)
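The no-password loops in Examples 2 and 3 record the expected failure in a `boolean exc` flag. A small helper can express the same check more directly, and it also fails loudly when an exception of a different type escapes. The sketch below uses only the JDK. `Expect.expectThrows` and `ThrowingRunnable` are hypothetical names chosen here; JUnit 4.13+ (`Assert.assertThrows`) and JUnit 5 (`Assertions.assertThrows`) ship comparable assertions.

```java
// Hypothetical functional interface: a block of code that may throw any exception.
@FunctionalInterface
interface ThrowingRunnable {
    void run() throws Exception;
}

final class Expect {
    private Expect() {}

    // Runs the block and returns the thrown exception when it has the expected type.
    // Fails when nothing is thrown, or when an exception of another type is thrown.
    static <T extends Throwable> T expectThrows(Class<T> expected, ThrowingRunnable block) {
        try {
            block.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("expected " + expected.getName() + " but got " + t, t);
        }
        throw new AssertionError("expected " + expected.getName() + " but nothing was thrown");
    }
}
```

With such a helper, the body of the second loop in Example 3 could become `expectThrows(EncryptedDocumentException.class, () -> { try (InputStream is = getTestDocument(e.getKey())) { parser.parse(is, new BodyContentHandler(), m, context); } });`.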

Example 4 with PasswordProvider

Use of org.apache.tika.parser.PasswordProvider in the Apache Tika project.

From class ExcelParserTest, method testExcelParserPassword.

@Test
public void testExcelParserPassword() throws Exception {
    try (InputStream input = ExcelParserTest.class.getResourceAsStream("/test-documents/testEXCEL_protected_passtika.xls")) {
        Metadata metadata = new Metadata();
        ContentHandler handler = new BodyContentHandler();
        ParseContext context = new ParseContext();
        context.set(Locale.class, Locale.US);
        new OfficeParser().parse(input, handler, metadata, context);
        fail("Document is encrypted, shouldn't parse");
    } catch (EncryptedDocumentException e) {
        // Good: no PasswordProvider was set, so this exception is expected
    }
    // Try again, this time with the password
    try (InputStream input = ExcelParserTest.class.getResourceAsStream("/test-documents/testEXCEL_protected_passtika.xls")) {
        Metadata metadata = new Metadata();
        ContentHandler handler = new BodyContentHandler();
        ParseContext context = new ParseContext();
        context.set(Locale.class, Locale.US);
        context.set(PasswordProvider.class, new PasswordProvider() {

            @Override
            public String getPassword(Metadata metadata) {
                return "tika";
            }
        });
        new OfficeParser().parse(input, handler, metadata, context);
        assertEquals("application/vnd.ms-excel", metadata.get(Metadata.CONTENT_TYPE));
        assertEquals(null, metadata.get(TikaCoreProperties.TITLE));
        assertEquals("Antoni", metadata.get(TikaCoreProperties.CREATOR));
        assertEquals("2011-11-25T09:52:48Z", metadata.get(TikaCoreProperties.CREATED));
        String content = handler.toString();
        assertContains("This is an Encrypted Excel spreadsheet", content);
        assertNotContained("9.0", content);
    }
}
Also used : BodyContentHandler(org.apache.tika.sax.BodyContentHandler) EncryptedDocumentException(org.apache.tika.exception.EncryptedDocumentException) InputStream(java.io.InputStream) Metadata(org.apache.tika.metadata.Metadata) ParseContext(org.apache.tika.parser.ParseContext) PasswordProvider(org.apache.tika.parser.PasswordProvider) ContentHandler(org.xml.sax.ContentHandler) Test(org.junit.Test) TikaTest(org.apache.tika.TikaTest)
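Example 4 parses the same file twice. The first attempt has no password and is expected to throw `EncryptedDocumentException`; the second uses a `PasswordProvider` that returns "tika". The same idea extends to trying several candidate passwords in turn. The sketch below is a hypothetical JDK-only model, not a Tika API: `DocumentOpener` stands in for one parse attempt, `WrongPasswordException` stands in for `EncryptedDocumentException`, and a `null` candidate means "no password".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for EncryptedDocumentException.
class WrongPasswordException extends Exception {
    WrongPasswordException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for a single parse attempt with a given password (null = none).
@FunctionalInterface
interface DocumentOpener {
    String open(String password) throws WrongPasswordException;
}

final class PasswordRetry {
    private PasswordRetry() {}

    // Tries each candidate in order and returns the first successful result;
    // if every candidate fails, rethrows the last failure.
    static String openWithCandidates(DocumentOpener opener, List<String> candidates)
            throws WrongPasswordException {
        WrongPasswordException last = null;
        for (String candidate : candidates) {
            try {
                return opener.open(candidate);
            } catch (WrongPasswordException e) {
                last = e; // remember the failure and move on to the next candidate
            }
        }
        if (last != null) {
            throw last;
        }
        throw new WrongPasswordException("no candidate passwords supplied");
    }

    // Varargs convenience; Arrays.asList (unlike List.of) accepts null elements.
    static String openWithCandidates(DocumentOpener opener, String... candidates)
            throws WrongPasswordException {
        return openWithCandidates(opener, Arrays.asList(candidates));
    }
}
```

Example 4 reopens the resource before its second parse because a stream consumed by a failed attempt cannot simply be reread. A real `DocumentOpener` would need to open a fresh stream on every call in the same way.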

Example 5 with PasswordProvider

Use of org.apache.tika.parser.PasswordProvider in the Apache Tika project.

From class JackcessParser, method parse.

@Override
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    TikaInputStream tis = TikaInputStream.get(stream);
    Database db = null;
    XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    xhtml.startDocument();
    String password = null;
    PasswordProvider passwordProvider = context.get(PasswordProvider.class);
    if (passwordProvider != null) {
        password = passwordProvider.getPassword(metadata);
    }
    try {
        if (password == null) {
            //do this to ensure encryption/wrong password exception vs. more generic
            //"need right codec" error message.
            db = new DatabaseBuilder(tis.getFile()).setCodecProvider(new CryptCodecProvider()).setReadOnly(true).open();
        } else {
            db = new DatabaseBuilder(tis.getFile()).setCodecProvider(new CryptCodecProvider(password)).setReadOnly(true).open();
        }
        //just in case
        db.setLinkResolver(IGNORE_LINK_RESOLVER);
        JackcessExtractor ex = new JackcessExtractor(metadata, context, locale);
        ex.parse(db, xhtml);
    } catch (IllegalStateException e) {
        if (e.getMessage() != null && e.getMessage().contains("Incorrect password")) {
            throw new EncryptedDocumentException(e);
        }
        throw e;
    } finally {
        if (db != null) {
            try {
                db.close();
            } catch (IOException e) {
                //swallow = silent close
            }
        }
    }
    xhtml.endDocument();
}
Also used : DatabaseBuilder(com.healthmarketscience.jackcess.DatabaseBuilder) CryptCodecProvider(com.healthmarketscience.jackcess.CryptCodecProvider) EncryptedDocumentException(org.apache.tika.exception.EncryptedDocumentException) Database(com.healthmarketscience.jackcess.Database) TikaInputStream(org.apache.tika.io.TikaInputStream) IOException(java.io.IOException) XHTMLContentHandler(org.apache.tika.sax.XHTMLContentHandler) PasswordProvider(org.apache.tika.parser.PasswordProvider)
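JackcessParser converts an unchecked `IllegalStateException` whose message contains "Incorrect password" into Tika's checked `EncryptedDocumentException`, and rethrows any other `IllegalStateException` unchanged. The sketch below isolates that translation step using only JDK types. `EncryptedException` and `PasswordTranslation` are hypothetical stand-ins for `EncryptedDocumentException` and the Jackcess `DatabaseBuilder` call. Matching on message text depends on the library's exact wording, and the `null` check guards against exceptions thrown without a message.

```java
import java.util.function.Function;

// Hypothetical checked exception standing in for Tika's EncryptedDocumentException.
class EncryptedException extends Exception {
    EncryptedException(Throwable cause) {
        super(cause);
    }
}

final class PasswordTranslation {
    private PasswordTranslation() {}

    // Calls the opener with the password (which may be null when no provider was set)
    // and translates an "Incorrect password" IllegalStateException into a checked exception.
    static <R> R openTranslating(Function<String, R> opener, String password)
            throws EncryptedException {
        try {
            return opener.apply(password);
        } catch (IllegalStateException e) {
            if (e.getMessage() != null && e.getMessage().contains("Incorrect password")) {
                throw new EncryptedException(e);
            }
            throw e; // unrelated state errors propagate unchanged
        }
    }
}
```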

Aggregations

PasswordProvider (org.apache.tika.parser.PasswordProvider): 17
Metadata (org.apache.tika.metadata.Metadata): 13
ParseContext (org.apache.tika.parser.ParseContext): 11
Test (org.junit.Test): 11
TikaTest (org.apache.tika.TikaTest): 10
EncryptedDocumentException (org.apache.tika.exception.EncryptedDocumentException): 10
InputStream (java.io.InputStream): 9
AutoDetectParser (org.apache.tika.parser.AutoDetectParser): 9
Parser (org.apache.tika.parser.Parser): 9
BodyContentHandler (org.apache.tika.sax.BodyContentHandler): 7
TikaInputStream (org.apache.tika.io.TikaInputStream): 6
ContentHandler (org.xml.sax.ContentHandler): 6
IOException (java.io.IOException): 3
HashMap (java.util.HashMap): 3
Map (java.util.Map): 3
TikaException (org.apache.tika.exception.TikaException): 3
CompositeParser (org.apache.tika.parser.CompositeParser): 3
TesseractOCRParser (org.apache.tika.parser.ocr.TesseractOCRParser): 3
XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler): 3
BufferedInputStream (java.io.BufferedInputStream): 2