Example 46 with ZipArchiveEntry

Use of org.apache.commons.compress.archivers.zip.ZipArchiveEntry in project tika by apache.

Class IWorkPackageParser, method parse.

public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    ZipArchiveInputStream zip = new ZipArchiveInputStream(stream);
    ZipArchiveEntry entry = zip.getNextZipEntry();
    while (entry != null) {
        if (!IWORK_CONTENT_ENTRIES.contains(entry.getName())) {
            entry = zip.getNextZipEntry();
            continue;
        }
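        // Buffer the entry so type detection can peek at its first bytes,
        // then rewind to the start before the entry is actually parsed.
        InputStream entryStream = new BufferedInputStream(zip, 4096);
        entryStream.mark(4096);
        IWORKDocumentType type = IWORKDocumentType.detectType(entryStream);
        entryStream.reset();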
        if (type != null) {
            XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
            ContentHandler contentHandler;
            switch(type) {
                case KEYNOTE:
                    contentHandler = new KeynoteContentHandler(xhtml, metadata);
                    break;
                case NUMBERS:
                    contentHandler = new NumbersContentHandler(xhtml, metadata);
                    break;
                case PAGES:
                    contentHandler = new PagesContentHandler(xhtml, metadata);
                    break;
                case ENCRYPTED:
                    // We can't do anything for the file right now
                    contentHandler = null;
                    break;
                default:
                    throw new TikaException("Unhandled iWorks file " + type);
            }
            metadata.add(Metadata.CONTENT_TYPE, type.getType().toString());
            xhtml.startDocument();
            if (contentHandler != null) {
                context.getSAXParser().parse(new CloseShieldInputStream(entryStream), new OfflineContentHandler(contentHandler));
            }
            xhtml.endDocument();
        }
        entry = zip.getNextZipEntry();
    }
    // Don't close the zip InputStream (TIKA-1117).
}
Also used: TikaException(org.apache.tika.exception.TikaException) ZipArchiveInputStream(org.apache.commons.compress.archivers.zip.ZipArchiveInputStream) BufferedInputStream(java.io.BufferedInputStream) CloseShieldInputStream(org.apache.commons.io.input.CloseShieldInputStream) InputStream(java.io.InputStream) XHTMLContentHandler(org.apache.tika.sax.XHTMLContentHandler) OfflineContentHandler(org.apache.tika.sax.OfflineContentHandler) ContentHandler(org.xml.sax.ContentHandler) ZipArchiveEntry(org.apache.commons.compress.archivers.zip.ZipArchiveEntry)
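
The parse method above combines three stream idioms: skipping unwanted entries, peeking at an entry's first bytes with mark/reset, and shielding the zip stream from being closed by a downstream consumer. The following is a minimal sketch of those idioms using only the JDK (java.util.zip instead of Commons Compress), not the Tika implementation. The class ZipPeekSketch, the helper NoCloseInputStream (a stand-in for CloseShieldInputStream), the methods sampleZip and peekAndRead, and the entry names are all hypothetical and exist only for illustration. It assumes Java 9 or later for readNBytes and readAllBytes.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipPeekSketch {

    /** Hypothetical stand-in for CloseShieldInputStream: close() does nothing. */
    static final class NoCloseInputStream extends FilterInputStream {
        NoCloseInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Intentionally empty: the underlying zip stream must stay open.
        }
    }

    /** Builds a small in-memory zip with two entries (illustrative data only). */
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("skip.txt"));
            out.write("ignored".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("index.xml"));
            out.write("<doc>hello</doc>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    /**
     * Finds the entry named {@code wanted}, peeks at its first four bytes,
     * rewinds, and reads the whole entry. Returns "peeked|content",
     * or null if no such entry exists.
     */
    public static String peekAndRead(InputStream stream, String wanted) throws IOException {
        ZipInputStream zip = new ZipInputStream(stream);
        for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
            if (!wanted.equals(entry.getName())) {
                continue; // the loop update moves on to the next entry
            }
            InputStream entryStream = new BufferedInputStream(zip, 4096);
            entryStream.mark(4096);                               // remember the entry start
            byte[] head = new byte[4];
            int n = entryStream.readNBytes(head, 0, head.length); // peek
            entryStream.reset();                                  // rewind to the mark
            // The consumer may close its input; the shield turns that into a no-op.
            try (InputStream shielded = new NoCloseInputStream(entryStream)) {
                String body = new String(shielded.readAllBytes(), StandardCharsets.UTF_8);
                return new String(head, 0, n, StandardCharsets.UTF_8) + "|" + body;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(peekAndRead(new ByteArrayInputStream(sampleZip()), "index.xml"));
    }
}
```

The mark limit matches the buffer size, so reset() stays valid as long as the peek reads no more than 4096 bytes, which mirrors the 4096 used in the example above.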

Aggregations

ZipArchiveEntry (org.apache.commons.compress.archivers.zip.ZipArchiveEntry): 46
ZipFile (org.apache.commons.compress.archivers.zip.ZipFile): 21
IOException (java.io.IOException): 13
File (java.io.File): 12
FileInputStream (java.io.FileInputStream): 10
InputStream (java.io.InputStream): 10
Path (java.nio.file.Path): 10
Test (org.junit.Test): 8
BufferedInputStream (java.io.BufferedInputStream): 7
ZipArchiveInputStream (org.apache.commons.compress.archivers.zip.ZipArchiveInputStream): 7
ZipArchiveOutputStream (org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream): 6
FileOutputStream (java.io.FileOutputStream): 5
ArrayList (java.util.ArrayList): 5
ZipInputStream (java.util.zip.ZipInputStream): 5
ImageInfo (com.github.hmdev.info.ImageInfo): 4
SectionInfo (com.github.hmdev.info.SectionInfo): 4
BufferedWriter (java.io.BufferedWriter): 4
ByteArrayInputStream (java.io.ByteArrayInputStream): 4
ByteArrayOutputStream (java.io.ByteArrayOutputStream): 4
FileNotFoundException (java.io.FileNotFoundException): 4