Example 16 with Entry

use of org.apache.poi.poifs.filesystem.Entry in project tika by apache.

the class POIFSContainerDetector method processCompObjFormatType.

/**
 * Is this one of the formats that use CompObj to store all of
 * their data, e.g. Star Draw, Star Impress or (older) Works?
 * If not, it's likely an embedded resource.
 */
private static MediaType processCompObjFormatType(DirectoryEntry root) {
    try {
        Entry e = root.getEntry("\u0001CompObj");
        if (e != null && e.isDocumentEntry()) {
            DocumentNode dn = (DocumentNode) e;
            DocumentInputStream stream = new DocumentInputStream(dn);
            byte[] bytes = IOUtils.toByteArray(stream);
            /*
             * This array contains a string with a normal ASCII name of the
             * application used to create this file. We want to search for that
             * name.
             */
            if (arrayContains(bytes, MS_GRAPH_CHART_BYTES)) {
                return MS_GRAPH_CHART;
            } else if (arrayContains(bytes, STAR_DRAW)) {
                return SDA;
            } else if (arrayContains(bytes, STAR_IMPRESS)) {
                return SDD;
            } else if (arrayContains(bytes, WORKS_QUILL96)) {
                return WPS;
            }
        }
    } catch (Exception e) {
        /*
         * "root.getEntry" can throw FileNotFoundException, and the code
         * inside the "if" can, in theory, throw IOExceptions. In practice,
         * exceptions are unlikely to occur.
         *
         * Swallow all of them. If any occur, we just assume that we can't
         * distinguish between Draw and Impress and return something safe:
         * x-tika-msoffice
         */
    }
    return OLE;
}
Also used : Entry(org.apache.poi.poifs.filesystem.Entry) DirectoryEntry(org.apache.poi.poifs.filesystem.DirectoryEntry) DocumentNode(org.apache.poi.poifs.filesystem.DocumentNode) DocumentInputStream(org.apache.poi.poifs.filesystem.DocumentInputStream) IOException(java.io.IOException)
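
The method above relies on an arrayContains helper that is not shown in this excerpt. As a rough illustration only, a byte-subsequence search of that kind could be sketched as follows. The class name ByteSearch and the sample marker bytes are hypothetical and are not taken from the tika sources; the real helper may be implemented differently.

```java
import java.nio.charset.StandardCharsets;

public class ByteSearch {

    /**
     * Returns true if needle occurs as a contiguous run inside haystack.
     * Hypothetical stand-in for the arrayContains helper used above.
     */
    public static boolean arrayContains(byte[] haystack, byte[] needle) {
        if (needle.length == 0) {
            return true; // an empty pattern trivially matches
        }
        // Try every start offset at which the pattern could still fit.
        for (int start = 0; start <= haystack.length - needle.length; start++) {
            boolean match = true;
            for (int i = 0; i < needle.length; i++) {
                if (haystack[start + i] != needle[i]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up stream content: an application name surrounded by binary padding.
        byte[] stream = "\u0001\u0000junkStarDraw 3.0\u0000".getBytes(StandardCharsets.US_ASCII);
        byte[] marker = "StarDraw".getBytes(StandardCharsets.US_ASCII);
        byte[] other = "StarImpress".getBytes(StandardCharsets.US_ASCII);
        System.out.println(arrayContains(stream, marker));
        System.out.println(arrayContains(stream, other));
    }
}
```

A naive scan like this is quadratic in the worst case, which is usually acceptable for a small metadata stream where the patterns are short application names.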

Example 17 with Entry

use of org.apache.poi.poifs.filesystem.Entry in project OpenOLAT by OpenOLAT.

the class WordDocument method readContent.

@Override
protected FileContent readContent(VFSLeaf leaf) throws IOException, DocumentException {
    LimitedContentWriter sb = new LimitedContentWriter((int) leaf.getSize(), FileDocumentFactory.getMaxFileSize());
    try (InputStream bis = new BufferedInputStream(leaf.getInputStream())) {
        POIFSFileSystem filesystem = new POIFSFileSystem(bis);
        Iterator<?> entries = filesystem.getRoot().getEntries();
        while (entries.hasNext()) {
            Entry entry = (Entry) entries.next();
            String name = entry.getName();
            if (!(entry instanceof DocumentEntry)) {
                // Skip directory entries
            } else if ("WordDocument".equals(name)) {
                collectWordDocument(leaf, filesystem, sb);
            }
        }
        return new FileContent(sb.toString());
    } catch (Exception e) {
        log.warn("could not read in word document: " + leaf + " please check, that this is not an docx/rtf/html file!");
        throw new DocumentException(e.getMessage());
    }
}
Also used : LimitedContentWriter(org.olat.core.util.io.LimitedContentWriter) Entry(org.apache.poi.poifs.filesystem.Entry) DocumentEntry(org.apache.poi.poifs.filesystem.DocumentEntry) BufferedInputStream(java.io.BufferedInputStream) InputStream(java.io.InputStream) POIFSFileSystem(org.apache.poi.poifs.filesystem.POIFSFileSystem) IOException(java.io.IOException) OldWordFileFormatException(org.apache.poi.hwpf.OldWordFileFormatException)

Example 18 with Entry

use of org.apache.poi.poifs.filesystem.Entry in project openolat by klemens.

the class WordDocument method readContent.

@Override
protected FileContent readContent(VFSLeaf leaf) throws IOException, DocumentException {
    LimitedContentWriter sb = new LimitedContentWriter((int) leaf.getSize(), FileDocumentFactory.getMaxFileSize());
    try (InputStream bis = new BufferedInputStream(leaf.getInputStream())) {
        POIFSFileSystem filesystem = new POIFSFileSystem(bis);
        Iterator<?> entries = filesystem.getRoot().getEntries();
        while (entries.hasNext()) {
            Entry entry = (Entry) entries.next();
            String name = entry.getName();
            if (!(entry instanceof DocumentEntry)) {
                // Skip directory entries
            } else if ("WordDocument".equals(name)) {
                collectWordDocument(leaf, filesystem, sb);
            }
        }
        return new FileContent(sb.toString());
    } catch (Exception e) {
        log.warn("could not read in word document: " + leaf + " please check, that this is not an docx/rtf/html file!");
        throw new DocumentException(e.getMessage());
    }
}
Also used : LimitedContentWriter(org.olat.core.util.io.LimitedContentWriter) Entry(org.apache.poi.poifs.filesystem.Entry) DocumentEntry(org.apache.poi.poifs.filesystem.DocumentEntry) BufferedInputStream(java.io.BufferedInputStream) InputStream(java.io.InputStream) POIFSFileSystem(org.apache.poi.poifs.filesystem.POIFSFileSystem) IOException(java.io.IOException) OldWordFileFormatException(org.apache.poi.hwpf.OldWordFileFormatException)

Example 19 with Entry

use of org.apache.poi.poifs.filesystem.Entry in project poi by apache.

the class OLE2ExtractorFactory method getEmbededDocsTextExtractors.

/**
 * Returns an array of text extractors, one for each of
 *  the embedded documents in the file (if there are any).
 * If there are no embedded documents, you'll get back an
 *  empty array. Otherwise, you'll get one open
 *  {@link POITextExtractor} for each embedded file.
 */
public static POITextExtractor[] getEmbededDocsTextExtractors(POIOLE2TextExtractor ext) throws IOException {
    // All the embedded directories we spotted
    List<Entry> dirs = new ArrayList<Entry>();
    // For anything else not directly held in as a POIFS directory
    List<InputStream> nonPOIFS = new ArrayList<InputStream>();
    // Find all the embedded directories
    DirectoryEntry root = ext.getRoot();
    if (root == null) {
        throw new IllegalStateException("The extractor didn't know which POIFS it came from!");
    }
    if (ext instanceof ExcelExtractor) {
        // These are in MBD... under the root
        Iterator<Entry> it = root.getEntries();
        while (it.hasNext()) {
            Entry entry = it.next();
            if (entry.getName().startsWith("MBD")) {
                dirs.add(entry);
            }
        }
    } else {
        // Ask Scratchpad, or fail trying
        Class<?> cls = getScratchpadClass();
        try {
            Method m = cls.getDeclaredMethod("identifyEmbeddedResources", POIOLE2TextExtractor.class, List.class, List.class);
            m.invoke(null, ext, dirs, nonPOIFS);
        } catch (Exception e) {
            throw new IllegalArgumentException("Error checking for Scratchpad embedded resources", e);
        }
    }
    // Create the extractors
    if (dirs.size() == 0 && nonPOIFS.size() == 0) {
        return new POITextExtractor[0];
    }
    ArrayList<POITextExtractor> e = new ArrayList<POITextExtractor>();
    for (Entry dir : dirs) {
        e.add(createExtractor((DirectoryNode) dir));
    }
    for (InputStream nonPOIF : nonPOIFS) {
        try {
            e.add(createExtractor(nonPOIF));
        } catch (IllegalArgumentException ie) {
            // Ignore, just means it didn't contain
            //  a format we support as yet
            LOGGER.log(POILogger.WARN, ie);
        } catch (Exception xe) {
            // Ignore, invalid format
            LOGGER.log(POILogger.WARN, xe);
        }
    }
    return e.toArray(new POITextExtractor[e.size()]);
}
Also used : InputStream(java.io.InputStream) ArrayList(java.util.ArrayList) DirectoryNode(org.apache.poi.poifs.filesystem.DirectoryNode) Method(java.lang.reflect.Method) DirectoryEntry(org.apache.poi.poifs.filesystem.DirectoryEntry) IOException(java.io.IOException) OldExcelFormatException(org.apache.poi.hssf.OldExcelFormatException) Entry(org.apache.poi.poifs.filesystem.Entry) POITextExtractor(org.apache.poi.POITextExtractor) ExcelExtractor(org.apache.poi.hssf.extractor.ExcelExtractor) EventBasedExcelExtractor(org.apache.poi.hssf.extractor.EventBasedExcelExtractor)
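
The non-Excel branch above looks up a static helper by name and invokes it reflectively, so the calling code does not need a compile-time dependency on the module that provides it. The general pattern can be sketched with the standard library alone. Everything named here (OptionalHelperDemo, EmbeddedFinder, collect, the String-based signature) is a hypothetical stand-in and not POI's API. Unlike the method above, this sketch returns false when the helper class is missing instead of failing.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class OptionalHelperDemo {

    /** Hypothetical stand-in for a helper class that lives in an optional module. */
    public static class EmbeddedFinder {
        public static void identifyEmbeddedResources(String source, List<String> dirs, List<String> streams) {
            dirs.add(source + "/ObjectPool");
            streams.add(source + "/attachment.bin");
        }
    }

    /**
     * Looks up the helper class by name and calls its static method reflectively.
     * Returns false if the class is not on the classpath (a choice made for this sketch).
     */
    public static boolean collect(String className, String source, List<String> dirs, List<String> streams) {
        try {
            Class<?> cls = Class.forName(className);
            // Generic parameters are erased, so the lookup uses the raw List type.
            Method m = cls.getDeclaredMethod("identifyEmbeddedResources", String.class, List.class, List.class);
            // The first argument is null because the target method is static.
            m.invoke(null, source, dirs, streams);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Error calling optional helper", e);
        }
    }

    public static void main(String[] args) {
        List<String> dirs = new ArrayList<>();
        List<String> streams = new ArrayList<>();
        boolean found = collect(EmbeddedFinder.class.getName(), "doc", dirs, streams);
        System.out.println(found + " " + dirs + " " + streams);
        System.out.println(collect("no.such.Helper", "doc", dirs, streams));
    }
}
```

Passing the output lists in as parameters, as the original call does with dirs and nonPOIFS, lets the helper report two kinds of results without needing a shared return type.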

Example 20 with Entry

use of org.apache.poi.poifs.filesystem.Entry in project poi by apache.

the class PropertySet method write.

/**
 * Writes a property set to a document in a POI filesystem directory.
 *
 * @param dir The directory in the POI filesystem to write the document to.
 * @param name The document's name. If there is already a document with the
 * same name in the directory the latter will be overwritten.
 *
 * @throws WritingNotSupportedException if the filesystem doesn't support writing
 * @throws IOException if the old entry can't be deleted or the new entry be written
 */
public void write(final DirectoryEntry dir, final String name) throws WritingNotSupportedException, IOException {
    /* If there is already an entry with the same name, remove it. */
    if (dir.hasEntry(name)) {
        final Entry e = dir.getEntry(name);
        e.delete();
    }
    /* Create the new entry. */
    dir.createDocument(name, toInputStream());
}
Also used : Entry(org.apache.poi.poifs.filesystem.Entry) DirectoryEntry(org.apache.poi.poifs.filesystem.DirectoryEntry)

Aggregations

Entry (org.apache.poi.poifs.filesystem.Entry) 24
DirectoryEntry (org.apache.poi.poifs.filesystem.DirectoryEntry) 12
IOException (java.io.IOException) 9
DirectoryNode (org.apache.poi.poifs.filesystem.DirectoryNode) 9
FileNotFoundException (java.io.FileNotFoundException) 6
InputStream (java.io.InputStream) 6
DocumentEntry (org.apache.poi.poifs.filesystem.DocumentEntry) 6
DocumentInputStream (org.apache.poi.poifs.filesystem.DocumentInputStream) 6
DocumentNode (org.apache.poi.poifs.filesystem.DocumentNode) 4
POIFSFileSystem (org.apache.poi.poifs.filesystem.POIFSFileSystem) 4
ArrayList (java.util.ArrayList) 3
AttachmentChunks (org.apache.poi.hsmf.datatypes.AttachmentChunks) 3
HWPFDocument (org.apache.poi.hwpf.HWPFDocument) 3
OldWordFileFormatException (org.apache.poi.hwpf.OldWordFileFormatException) 3
BufferedInputStream (java.io.BufferedInputStream) 2
ByteArrayInputStream (java.io.ByteArrayInputStream) 2
FileInputStream (java.io.FileInputStream) 2
POITextExtractor (org.apache.poi.POITextExtractor) 2
HSLFSlideShow (org.apache.poi.hslf.usermodel.HSLFSlideShow) 2
MAPIMessage (org.apache.poi.hsmf.MAPIMessage) 2