
Example 1 with OLEShape

Use of org.apache.poi.hslf.model.OLEShape in project poi by apache.

In class DataExtraction, method main.

public static void main(String[] args) throws Exception {
    if (args.length == 0) {
        usage();
        return;
    }
    FileInputStream is = new FileInputStream(args[0]);
    HSLFSlideShow ppt = new HSLFSlideShow(is);
    is.close();
    //extract all sound files embedded in this presentation
    HSLFSoundData[] sound = ppt.getSoundData();
    for (int i = 0; i < sound.length; i++) {
        //*.wav
        String type = sound[i].getSoundType();
        //typically file name
        String name = sound[i].getSoundName();
        //raw bytes
        byte[] data = sound[i].getData();
        //save the sound on disk
        FileOutputStream out = new FileOutputStream(name + type);
        out.write(data);
        out.close();
    }
    int oleIdx = -1, picIdx = -1;
    for (HSLFSlide slide : ppt.getSlides()) {
        //extract embedded OLE documents
        for (HSLFShape shape : slide.getShapes()) {
            if (shape instanceof OLEShape) {
                oleIdx++;
                OLEShape ole = (OLEShape) shape;
                HSLFObjectData data = ole.getObjectData();
                String name = ole.getInstanceName();
                if ("Worksheet".equals(name)) {
                    //read xls
                    @SuppressWarnings({ "unused", "resource" }) HSSFWorkbook wb = new HSSFWorkbook(data.getData());
                } else if ("Document".equals(name)) {
                    HWPFDocument doc = new HWPFDocument(data.getData());
                    //read the word document
                    Range r = doc.getRange();
                    for (int k = 0; k < r.numParagraphs(); k++) {
                        Paragraph p = r.getParagraph(k);
                        System.out.println(p.text());
                    }
                    //save on disk
                    FileOutputStream out = new FileOutputStream(name + "-(" + (oleIdx) + ").doc");
                    doc.write(out);
                    out.close();
                    doc.close();
                } else {
                    FileOutputStream out = new FileOutputStream(ole.getProgID() + "-" + (oleIdx + 1) + ".dat");
                    InputStream dis = data.getData();
                    byte[] chunk = new byte[2048];
                    int count;
                    while ((count = dis.read(chunk)) >= 0) {
                        out.write(chunk, 0, count);
                    }
                    //close the embedded object's stream (the presentation stream 'is' was already closed above)
                    dis.close();
                    out.close();
                }
            } else if (shape instanceof HSLFPictureShape) {
                //Pictures
                picIdx++;
                HSLFPictureShape p = (HSLFPictureShape) shape;
                HSLFPictureData data = p.getPictureData();
                String ext = data.getType().extension;
                FileOutputStream out = new FileOutputStream("pict-" + picIdx + ext);
                out.write(data.getData());
                out.close();
            }
        }
    }
    ppt.close();
}
Also used : FileInputStream(java.io.FileInputStream) InputStream(java.io.InputStream) HSLFObjectData(org.apache.poi.hslf.usermodel.HSLFObjectData) Range(org.apache.poi.hwpf.usermodel.Range) HSLFSlideShow(org.apache.poi.hslf.usermodel.HSLFSlideShow) OLEShape(org.apache.poi.hslf.model.OLEShape) HSSFWorkbook(org.apache.poi.hssf.usermodel.HSSFWorkbook) Paragraph(org.apache.poi.hwpf.usermodel.Paragraph) HWPFDocument(org.apache.poi.hwpf.HWPFDocument) HSLFShape(org.apache.poi.hslf.usermodel.HSLFShape) HSLFPictureShape(org.apache.poi.hslf.usermodel.HSLFPictureShape) FileOutputStream(java.io.FileOutputStream) HSLFSoundData(org.apache.poi.hslf.usermodel.HSLFSoundData) HSLFPictureData(org.apache.poi.hslf.usermodel.HSLFPictureData) HSLFSlide(org.apache.poi.hslf.usermodel.HSLFSlide)
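
The fallback branch of Example 1 copies an embedded object's bytes to disk in 2048-byte chunks and closes each stream by hand. The following is a minimal sketch of the same copy loop using only java.io, with try-with-resources so that each stream is closed by the block that opened it. The class name StreamCopy, the copy helper, and the 5000-byte in-memory payload are illustrative assumptions and are not part of POI; in real use the input would be the stream returned by HSLFObjectData.getData().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper class, not part of POI.
final class StreamCopy {

    // Copies every byte from in to out in 2048-byte chunks and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[2048];
        long total = 0;
        int count;
        // read() returns -1 at end of stream, which ends the loop
        while ((count = in.read(chunk)) >= 0) {
            out.write(chunk, 0, count);
            total += count;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the raw bytes of an embedded object (assumption for this sketch).
        byte[] payload = new byte[5000];
        // Both streams are closed automatically, even if copy() throws.
        try (InputStream in = new ByteArrayInputStream(payload);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            long copied = copy(in, out);
            System.out.println(copied + " bytes copied");
        }
    }
}
```

Declaring the streams in the try header removes the need to track which variable to close, which is the kind of mix-up that is easy to make when several streams are open in one method.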

Example 2 with OLEShape

Use of org.apache.poi.hslf.model.OLEShape in project poi by apache.

In class TestExtractor, method test52991.

/**
 * A PowerPoint file with embedded PowerPoint files
 */
@Test
public void test52991() throws IOException {
    PowerPointExtractor ppe = openExtractor("badzip.ppt");
    for (OLEShape shape : ppe.getOLEShapes()) {
        IOUtils.copy(shape.getObjectData().getData(), new ByteArrayOutputStream());
    }
    ppe.close();
}
Also used : ByteArrayOutputStream(java.io.ByteArrayOutputStream) OLEShape(org.apache.poi.hslf.model.OLEShape) Test(org.junit.Test)

Example 3 with OLEShape

Use of org.apache.poi.hslf.model.OLEShape in project poi by apache.

In class HSLFShapeFactory, method createFrame.

private static HSLFShape createFrame(EscherContainerRecord spContainer, ShapeContainer<HSLFShape, HSLFTextParagraph> parent) {
    InteractiveInfo info = getClientDataRecord(spContainer, RecordTypes.InteractiveInfo.typeID);
    if (info != null && info.getInteractiveInfoAtom() != null) {
        switch(info.getInteractiveInfoAtom().getAction()) {
            case InteractiveInfoAtom.ACTION_OLE:
                return new OLEShape(spContainer, parent);
            case InteractiveInfoAtom.ACTION_MEDIA:
                return new MovieShape(spContainer, parent);
            default:
                break;
        }
    }
    ExObjRefAtom oes = getClientDataRecord(spContainer, RecordTypes.ExObjRefAtom.typeID);
    return (oes != null) ? new OLEShape(spContainer, parent) : new HSLFPictureShape(spContainer, parent);
}
Also used : ExObjRefAtom(org.apache.poi.hslf.record.ExObjRefAtom) MovieShape(org.apache.poi.hslf.model.MovieShape) InteractiveInfo(org.apache.poi.hslf.record.InteractiveInfo) OLEShape(org.apache.poi.hslf.model.OLEShape)

Example 4 with OLEShape

Use of org.apache.poi.hslf.model.OLEShape in project tika by apache.

In class HSLFExtractor, method handleSlideEmbeddedResources.

private void handleSlideEmbeddedResources(HSLFSlide slide, XHTMLContentHandler xhtml) throws TikaException, SAXException, IOException {
    List<HSLFShape> shapes;
    try {
        shapes = slide.getShapes();
    } catch (NullPointerException e) {
        // Sometimes HSLF hits problems
        // Please open POI bugs for any you come across!
        EmbeddedDocumentUtil.recordEmbeddedStreamException(e, parentMetadata);
        return;
    }
    for (HSLFShape shape : shapes) {
        if (shape instanceof OLEShape) {
            OLEShape oleShape = (OLEShape) shape;
            HSLFObjectData data = null;
            try {
                data = oleShape.getObjectData();
            } catch (NullPointerException e) {
                /* getObjectData sometimes throws an NPE. */
                EmbeddedDocumentUtil.recordEmbeddedStreamException(e, parentMetadata);
                continue;
            }
            if (data != null) {
                String objID = Integer.toString(oleShape.getObjectID());
                // Embedded Object: add a <div
                // class="embedded" id="X"/> so consumer can see where
                // in the main text each embedded document
                // occurred:
                AttributesImpl attributes = new AttributesImpl();
                attributes.addAttribute("", "class", "class", "CDATA", "embedded");
                attributes.addAttribute("", "id", "id", "CDATA", objID);
                xhtml.startElement("div", attributes);
                xhtml.endElement("div");
                InputStream dataStream = null;
                try {
                    dataStream = data.getData();
                } catch (Exception e) {
                    EmbeddedDocumentUtil.recordEmbeddedStreamException(e, parentMetadata);
                    continue;
                }
                try (TikaInputStream stream = TikaInputStream.get(dataStream)) {
                    String mediaType = null;
                    if ("Excel.Chart.8".equals(oleShape.getProgID())) {
                        mediaType = "application/vnd.ms-excel";
                    } else {
                        MediaType mt = getTikaConfig().getDetector().detect(stream, new Metadata());
                        mediaType = mt.toString();
                    }
                    if (mediaType.equals("application/x-tika-msoffice-embedded; format=comp_obj")) {
                        try (NPOIFSFileSystem npoifs = new NPOIFSFileSystem(new CloseShieldInputStream(stream))) {
                            handleEmbeddedOfficeDoc(npoifs.getRoot(), objID, xhtml);
                        }
                    } else {
                        handleEmbeddedResource(stream, objID, objID, mediaType, xhtml, false);
                    }
                } catch (IOException e) {
                    EmbeddedDocumentUtil.recordEmbeddedStreamException(e, parentMetadata);
                }
            }
        }
    }
}
Also used : TikaInputStream(org.apache.tika.io.TikaInputStream) CloseShieldInputStream(org.apache.tika.io.CloseShieldInputStream) InputStream(java.io.InputStream) Metadata(org.apache.tika.metadata.Metadata) IOException(java.io.IOException) HSLFObjectData(org.apache.poi.hslf.usermodel.HSLFObjectData) OLEShape(org.apache.poi.hslf.model.OLEShape) TikaException(org.apache.tika.exception.TikaException) SAXException(org.xml.sax.SAXException) NPOIFSFileSystem(org.apache.poi.poifs.filesystem.NPOIFSFileSystem) HSLFShape(org.apache.poi.hslf.usermodel.HSLFShape) AttributesImpl(org.xml.sax.helpers.AttributesImpl) MediaType(org.apache.tika.mime.MediaType)

Example 5 with OLEShape

Use of org.apache.poi.hslf.model.OLEShape in project poi by apache.

In class TestExtractor, method testExtractFromOwnEmbeded.

/**
 * A PowerPoint file with embedded PowerPoint files
 */
@Test
public void testExtractFromOwnEmbeded() throws IOException {
    PowerPointExtractor ppe = openExtractor("ppt_with_embeded.ppt");
    List<OLEShape> shapes = ppe.getOLEShapes();
    assertEquals("Expected 6 ole shapes", 6, shapes.size());
    int num_ppt = 0, num_doc = 0, num_xls = 0;
    for (OLEShape ole : shapes) {
        String name = ole.getInstanceName();
        InputStream data = ole.getObjectData().getData();
        if ("Worksheet".equals(name)) {
            HSSFWorkbook wb = new HSSFWorkbook(data);
            num_xls++;
            wb.close();
        } else if ("Document".equals(name)) {
            HWPFDocument doc = new HWPFDocument(data);
            num_doc++;
            doc.close();
        } else if ("Presentation".equals(name)) {
            num_ppt++;
            HSLFSlideShow ppt = new HSLFSlideShow(data);
            ppt.close();
        }
        data.close();
    }
    assertEquals("Expected 2 embedded Word Documents", 2, num_doc);
    assertEquals("Expected 2 embedded Excel Spreadsheets", 2, num_xls);
    assertEquals("Expected 2 embedded PowerPoint Presentations", 2, num_ppt);
    ppe.close();
}
Also used : HWPFDocument(org.apache.poi.hwpf.HWPFDocument) FileInputStream(java.io.FileInputStream) InputStream(java.io.InputStream) HSLFSlideShow(org.apache.poi.hslf.usermodel.HSLFSlideShow) OLEShape(org.apache.poi.hslf.model.OLEShape) HSSFWorkbook(org.apache.poi.hssf.usermodel.HSSFWorkbook) Test(org.junit.Test)

Aggregations

OLEShape (org.apache.poi.hslf.model.OLEShape): 5
InputStream (java.io.InputStream): 3
FileInputStream (java.io.FileInputStream): 2
HSLFObjectData (org.apache.poi.hslf.usermodel.HSLFObjectData): 2
HSLFShape (org.apache.poi.hslf.usermodel.HSLFShape): 2
HSLFSlideShow (org.apache.poi.hslf.usermodel.HSLFSlideShow): 2
HSSFWorkbook (org.apache.poi.hssf.usermodel.HSSFWorkbook): 2
HWPFDocument (org.apache.poi.hwpf.HWPFDocument): 2
Test (org.junit.Test): 2
ByteArrayOutputStream (java.io.ByteArrayOutputStream): 1
FileOutputStream (java.io.FileOutputStream): 1
IOException (java.io.IOException): 1
MovieShape (org.apache.poi.hslf.model.MovieShape): 1
ExObjRefAtom (org.apache.poi.hslf.record.ExObjRefAtom): 1
InteractiveInfo (org.apache.poi.hslf.record.InteractiveInfo): 1
HSLFPictureData (org.apache.poi.hslf.usermodel.HSLFPictureData): 1
HSLFPictureShape (org.apache.poi.hslf.usermodel.HSLFPictureShape): 1
HSLFSlide (org.apache.poi.hslf.usermodel.HSLFSlide): 1
HSLFSoundData (org.apache.poi.hslf.usermodel.HSLFSoundData): 1
Paragraph (org.apache.poi.hwpf.usermodel.Paragraph): 1