Example 1 with VBAMacroReader

Use of org.apache.poi.poifs.macros.VBAMacroReader in project poi by apache.

In the class TestBugs, method getMacrosFromHSLF.

//It isn't pretty, but it works...
private Map<String, String> getMacrosFromHSLF(String fileName) throws IOException {
    InputStream is = null;
    NPOIFSFileSystem npoifs = null;
    try {
        is = new FileInputStream(POIDataSamples.getSlideShowInstance().getFile(fileName));
        npoifs = new NPOIFSFileSystem(is);
        //TODO: should we run the VBAMacroReader on this npoifs?
        //TBD: We know that ppt typically don't store macros in the regular place,
        //but _can_ they?
        HSLFSlideShow ppt = new HSLFSlideShow(npoifs);
        //get macro persist id
        DocInfoListContainer list = (DocInfoListContainer) ppt.getDocumentRecord().findFirstOfType(RecordTypes.List.typeID);
        VBAInfoContainer vbaInfo = (VBAInfoContainer) list.findFirstOfType(RecordTypes.VBAInfo.typeID);
        VBAInfoAtom vbaAtom = (VBAInfoAtom) vbaInfo.findFirstOfType(RecordTypes.VBAInfoAtom.typeID);
        long persistId = vbaAtom.getPersistIdRef();
        for (HSLFObjectData objData : ppt.getEmbeddedObjects()) {
            if (objData.getExOleObjStg().getPersistId() == persistId) {
                VBAMacroReader mr = new VBAMacroReader(objData.getData());
                try {
                    return mr.readMacros();
                } finally {
                    mr.close();
                }
            }
        }
        ppt.close();
    } finally {
        IOUtils.closeQuietly(npoifs);
        IOUtils.closeQuietly(is);
    }
    return null;
}
Also used : VBAInfoContainer(org.apache.poi.hslf.record.VBAInfoContainer) NPOIFSFileSystem(org.apache.poi.poifs.filesystem.NPOIFSFileSystem) VBAInfoAtom(org.apache.poi.hslf.record.VBAInfoAtom) FileInputStream(java.io.FileInputStream) InputStream(java.io.InputStream) VBAMacroReader(org.apache.poi.poifs.macros.VBAMacroReader) DocInfoListContainer(org.apache.poi.hslf.record.DocInfoListContainer)
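
In Example 1, the return inside the loop leaves the method before the ppt.close() call is reached, so only the closeQuietly calls in the finally block run on that path. The following is a minimal sketch, not POI code, of how try-with-resources closes every declared resource even on an early return. The Tracked class and the findFirst method are hypothetical stand-ins introduced only for illustration; they play the roles of the file system, the slide show, and the persist-id search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EarlyReturnClose {

    /** Hypothetical stand-in for a closeable resource; records when it is closed. */
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    /** Returns the first candidate equal to wanted, or null when none matches. */
    static String findFirst(List<String> candidates, String wanted, List<String> log) {
        // Resources are closed in reverse declaration order when the block exits,
        // whether by return, by exception, or by falling off the end.
        try (Tracked outer = new Tracked("outer", log);
             Tracked inner = new Tracked("inner", log)) {
            for (String candidate : candidates) {
                if (candidate.equals(wanted)) {
                    return candidate; // both resources are still closed on this path
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        String hit = findFirst(Arrays.asList("a", "b", "c"), "b", log);
        System.out.println(hit + " " + log);
    }
}
```

Whether the same shape suits the real test depends on the POI version in use and on which of its classes implement Closeable there, so treat this as a pattern rather than a drop-in replacement.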

Example 2 with VBAMacroReader

Use of org.apache.poi.poifs.macros.VBAMacroReader in project tika by apache.

In the class OfficeParser, method extractMacros.

/**
 * Helper to extract macros from an NPOIFS/vbaProject.bin
 *
 * As of POI-3.15-final, there are still some bugs in VBAMacroReader.
 * For now, we are swallowing NPE and other runtime exceptions.
 *
 * @param fs NPOIFS to extract from
 * @param xhtml SAX writer
 * @param embeddedDocumentExtractor extractor for embedded documents
 * @throws IOException if an I/O error occurs while extracting the embedded doc
 * @throws SAXException if writing to xhtml fails
 */
public static void extractMacros(NPOIFSFileSystem fs, ContentHandler xhtml, EmbeddedDocumentExtractor embeddedDocumentExtractor) throws IOException, SAXException {
    VBAMacroReader reader = null;
    Map<String, String> macros = null;
    try {
        reader = new VBAMacroReader(fs);
        macros = reader.readMacros();
    } catch (Exception e) {
        //swallow
        return;
    }
    for (Map.Entry<String, String> e : macros.entrySet()) {
        Metadata m = new Metadata();
        m.set(Metadata.EMBEDDED_RESOURCE_TYPE, TikaCoreProperties.EmbeddedResourceType.MACRO.toString());
        m.set(Metadata.CONTENT_TYPE, "text/x-vbasic");
        if (embeddedDocumentExtractor.shouldParseEmbedded(m)) {
            embeddedDocumentExtractor.parseEmbedded(new ByteArrayInputStream(e.getValue().getBytes(StandardCharsets.UTF_8)), xhtml, m, true);
        }
    }
}
Also used : ByteArrayInputStream(java.io.ByteArrayInputStream) VBAMacroReader(org.apache.poi.poifs.macros.VBAMacroReader) Metadata(org.apache.tika.metadata.Metadata) Map(java.util.Map) GeneralSecurityException(java.security.GeneralSecurityException) TikaException(org.apache.tika.exception.TikaException) IOException(java.io.IOException) SAXException(org.xml.sax.SAXException) EncryptedDocumentException(org.apache.tika.exception.EncryptedDocumentException)
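
Example 2 combines two ideas: any exception from the reader is swallowed so that a parser bug does not abort extraction, and each macro's source text is then turned into UTF-8 bytes. The sketch below isolates that pattern using only the JDK. The class MacroReadSketch and its methods readOrEmpty and encode are hypothetical names, and the Callable is an assumed stand-in for a call such as reader.readMacros(). It also assumes the map keys are module names, and it returns an empty map where the original simply returns.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

public class MacroReadSketch {

    /** Runs the reader and returns an empty map instead of propagating any exception. */
    static Map<String, String> readOrEmpty(Callable<Map<String, String>> reader) {
        try {
            Map<String, String> macros = reader.call();
            return macros == null ? Collections.<String, String>emptyMap() : macros;
        } catch (Exception e) {
            // Deliberately swallowed, mirroring the catch block in extractMacros.
            return Collections.emptyMap();
        }
    }

    /** Encodes each macro's source text as UTF-8 bytes, keyed by the original map key. */
    static Map<String, byte[]> encode(Map<String, String> macros) {
        Map<String, byte[]> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : macros.entrySet()) {
            out.put(entry.getKey(), entry.getValue().getBytes(StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        // A reader that succeeds, and one that fails with a simulated parser bug.
        Map<String, String> ok =
                readOrEmpty(() -> Collections.singletonMap("Module1", "Sub Hello()\nEnd Sub"));
        Map<String, String> failed =
                readOrEmpty(() -> { throw new NullPointerException("simulated parser bug"); });
        System.out.println(encode(ok).get("Module1").length + " bytes, failed empty: " + failed.isEmpty());
    }
}
```

Swallowing every exception hides real failures as well as known bugs, so a production variant would probably log the exception before returning the empty map.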

Aggregations

VBAMacroReader (org.apache.poi.poifs.macros.VBAMacroReader): 2
ByteArrayInputStream (java.io.ByteArrayInputStream): 1
FileInputStream (java.io.FileInputStream): 1
IOException (java.io.IOException): 1
InputStream (java.io.InputStream): 1
GeneralSecurityException (java.security.GeneralSecurityException): 1
Map (java.util.Map): 1
DocInfoListContainer (org.apache.poi.hslf.record.DocInfoListContainer): 1
VBAInfoAtom (org.apache.poi.hslf.record.VBAInfoAtom): 1
VBAInfoContainer (org.apache.poi.hslf.record.VBAInfoContainer): 1
NPOIFSFileSystem (org.apache.poi.poifs.filesystem.NPOIFSFileSystem): 1
EncryptedDocumentException (org.apache.tika.exception.EncryptedDocumentException): 1
TikaException (org.apache.tika.exception.TikaException): 1
Metadata (org.apache.tika.metadata.Metadata): 1
SAXException (org.xml.sax.SAXException): 1