
Example 1 with HWPFDocument

Use of org.apache.poi.hwpf.HWPFDocument in the Apache POI project.

From class DataExtraction, method main.

public static void main(String[] args) throws Exception {
    if (args.length == 0) {
        usage();
        return;
    }
    FileInputStream is = new FileInputStream(args[0]);
    HSLFSlideShow ppt = new HSLFSlideShow(is);
    is.close();
    //extract all sound files embedded in this presentation
    HSLFSoundData[] sound = ppt.getSoundData();
    for (int i = 0; i < sound.length; i++) {
        //*.wav
        String type = sound[i].getSoundType();
        //typically file name
        String name = sound[i].getSoundName();
        //raw bytes
        byte[] data = sound[i].getData();
        //save the sound on disk
        FileOutputStream out = new FileOutputStream(name + type);
        out.write(data);
        out.close();
    }
    int oleIdx = -1, picIdx = -1;
    for (HSLFSlide slide : ppt.getSlides()) {
        //extract embedded OLE documents
        for (HSLFShape shape : slide.getShapes()) {
            if (shape instanceof OLEShape) {
                oleIdx++;
                OLEShape ole = (OLEShape) shape;
                HSLFObjectData data = ole.getObjectData();
                String name = ole.getInstanceName();
                if ("Worksheet".equals(name)) {
                    //read xls
                    HSSFWorkbook wb = new HSSFWorkbook(data.getData());
                    wb.close();
                } else if ("Document".equals(name)) {
                    HWPFDocument doc = new HWPFDocument(data.getData());
                    //read the word document
                    Range r = doc.getRange();
                    for (int k = 0; k < r.numParagraphs(); k++) {
                        Paragraph p = r.getParagraph(k);
                        System.out.println(p.text());
                    }
                    //save on disk
                    FileOutputStream out = new FileOutputStream(name + "-(" + (oleIdx) + ").doc");
                    doc.write(out);
                    out.close();
                    doc.close();
                } else {
                    FileOutputStream out = new FileOutputStream(ole.getProgID() + "-" + (oleIdx + 1) + ".dat");
                    InputStream dis = data.getData();
                    byte[] chunk = new byte[2048];
                    int count;
                    while ((count = dis.read(chunk)) >= 0) {
                        out.write(chunk, 0, count);
                    }
                    dis.close();
                    out.close();
                }
            } else if (shape instanceof HSLFPictureShape) {
                //Pictures
                picIdx++;
                HSLFPictureShape p = (HSLFPictureShape) shape;
                HSLFPictureData data = p.getPictureData();
                String ext = data.getType().extension;
                FileOutputStream out = new FileOutputStream("pict-" + picIdx + ext);
                out.write(data.getData());
                out.close();
            }
        }
    }
    ppt.close();
}
Also used: FileInputStream (java.io.FileInputStream), InputStream (java.io.InputStream), HSLFObjectData (org.apache.poi.hslf.usermodel.HSLFObjectData), Range (org.apache.poi.hwpf.usermodel.Range), HSLFSlideShow (org.apache.poi.hslf.usermodel.HSLFSlideShow), OLEShape (org.apache.poi.hslf.model.OLEShape), HSSFWorkbook (org.apache.poi.hssf.usermodel.HSSFWorkbook), Paragraph (org.apache.poi.hwpf.usermodel.Paragraph), HWPFDocument (org.apache.poi.hwpf.HWPFDocument), HSLFShape (org.apache.poi.hslf.usermodel.HSLFShape), HSLFPictureShape (org.apache.poi.hslf.usermodel.HSLFPictureShape), FileOutputStream (java.io.FileOutputStream), HSLFSoundData (org.apache.poi.hslf.usermodel.HSLFSoundData), HSLFPictureData (org.apache.poi.hslf.usermodel.HSLFPictureData), HSLFSlide (org.apache.poi.hslf.usermodel.HSLFSlide)
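The final else branch in Example 1 copies an unrecognized OLE object to disk through a 2048-byte buffer. Below is a minimal JDK-only sketch of that copy loop, assuming you want both streams closed even when a read or write throws. `ChunkedCopy.copyChunked` is a hypothetical helper for illustration, not part of POI.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ChunkedCopy {
    /**
     * Hypothetical helper: copies everything from in to out through a fixed-size
     * buffer and returns the byte count. try-with-resources closes both streams,
     * also when read() or write() throws.
     */
    public static long copyChunked(InputStream in, OutputStream out) throws IOException {
        long total = 0;
        try (InputStream src = in; OutputStream dst = out) {
            byte[] chunk = new byte[2048];
            int count;
            // read() returns -1 at end of stream
            while ((count = src.read(chunk)) != -1) {
                dst.write(chunk, 0, count);
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[5000]; // spans three 2048-byte chunks
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long n = copyChunked(new ByteArrayInputStream(payload), sink);
        System.out.println(n + " bytes copied"); // prints "5000 bytes copied"
    }
}
```

On Java 9 and later, `InputStream.transferTo(OutputStream)` performs the same copy; the explicit loop is kept here to mirror the example.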

Example 2 with HWPFDocument

Use of org.apache.poi.hwpf.HWPFDocument in the Apache POI project.

From class LoadEmbedded, method loadEmbedded.

public static void loadEmbedded(XSSFWorkbook workbook) throws IOException, InvalidFormatException, OpenXML4JException, XmlException {
    for (PackagePart pPart : workbook.getAllEmbedds()) {
        String contentType = pPart.getContentType();
        if (contentType.equals("application/vnd.ms-excel")) {
            // Excel Workbook - binary (OLE2CDF) file format
            HSSFWorkbook embeddedWorkbook = new HSSFWorkbook(pPart.getInputStream());
            embeddedWorkbook.close();
        } else if (contentType.equals("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")) {
            // Excel Workbook - OpenXML file format
            XSSFWorkbook embeddedWorkbook = new XSSFWorkbook(pPart.getInputStream());
            embeddedWorkbook.close();
        } else if (contentType.equals("application/msword")) {
            // Word Document - binary (OLE2CDF) file format
            HWPFDocument document = new HWPFDocument(pPart.getInputStream());
            document.close();
        } else if (contentType.equals("application/vnd.openxmlformats-officedocument.wordprocessingml.document")) {
            // Word Document - OpenXML file format
            XWPFDocument document = new XWPFDocument(pPart.getInputStream());
            document.close();
        } else if (contentType.equals("application/vnd.ms-powerpoint")) {
            // PowerPoint Document - binary file format
            HSLFSlideShow slideShow = new HSLFSlideShow(pPart.getInputStream());
            slideShow.close();
        } else if (contentType.equals("application/vnd.openxmlformats-officedocument.presentationml.presentation")) {
            // PowerPoint Document - OpenXML file format
            XMLSlideShow slideShow = new XMLSlideShow(pPart.getInputStream());
            slideShow.close();
        } else {
            // Any other type of embedded object.
            System.out.println("Unknown Embedded Document: " + contentType);
            InputStream inputStream = pPart.getInputStream();
            inputStream.close();
        }
    }
}
Also used: HWPFDocument (org.apache.poi.hwpf.HWPFDocument), InputStream (java.io.InputStream), XMLSlideShow (org.apache.poi.xslf.usermodel.XMLSlideShow), XSSFWorkbook (org.apache.poi.xssf.usermodel.XSSFWorkbook), XWPFDocument (org.apache.poi.xwpf.usermodel.XWPFDocument), PackagePart (org.apache.poi.openxml4j.opc.PackagePart), HSLFSlideShow (org.apache.poi.hslf.usermodel.HSLFSlideShow), HSSFWorkbook (org.apache.poi.hssf.usermodel.HSSFWorkbook)
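`loadEmbedded` picks a POI class by comparing the part's MIME content type in an if/else chain. A table-driven lookup keeps the list of content types in one place. The sketch below is JDK-only (Java 9+ for `Map.of`) and maps each content type to the name of the POI class used above; `EmbeddedTypes` and `handlerFor` are hypothetical names, and in real code the map values could be functions that build the document from `pPart.getInputStream()` instead of strings.

```java
import java.util.Map;

public class EmbeddedTypes {
    // Content types and POI classes as used in the if/else chain of loadEmbedded.
    private static final Map<String, String> HANDLERS = Map.of(
        "application/vnd.ms-excel", "HSSFWorkbook",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", "XSSFWorkbook",
        "application/msword", "HWPFDocument",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document", "XWPFDocument",
        "application/vnd.ms-powerpoint", "HSLFSlideShow",
        "application/vnd.openxmlformats-officedocument.presentationml.presentation", "XMLSlideShow");

    /** Returns the POI class name for a content type, or null for "any other type". */
    public static String handlerFor(String contentType) {
        // Map.of maps reject null keys in get(), so guard explicitly.
        return contentType == null ? null : HANDLERS.get(contentType);
    }

    public static void main(String[] args) {
        System.out.println(handlerFor("application/msword")); // HWPFDocument
        System.out.println(handlerFor("image/png"));          // null
    }
}
```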

Example 3 with HWPFDocument

Use of org.apache.poi.hwpf.HWPFDocument in the Apache POI project.

From class EmbeddedObjects, method main.

public static void main(String[] args) throws Exception {
    XSSFWorkbook workbook = new XSSFWorkbook(args[0]);
    for (PackagePart pPart : workbook.getAllEmbedds()) {
        String contentType = pPart.getContentType();
        InputStream is = pPart.getInputStream();
        Closeable document;
        if (contentType.equals("application/vnd.ms-excel")) {
            // Excel Workbook - binary (OLE2CDF) file format
            document = new HSSFWorkbook(is);
        } else if (contentType.equals("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")) {
            // Excel Workbook - OpenXML file format
            document = new XSSFWorkbook(is);
        } else if (contentType.equals("application/msword")) {
            // Word Document - binary (OLE2CDF) file format
            document = new HWPFDocument(is);
        } else if (contentType.equals("application/vnd.openxmlformats-officedocument.wordprocessingml.document")) {
            // Word Document - OpenXML file format
            document = new XWPFDocument(is);
        } else if (contentType.equals("application/vnd.ms-powerpoint")) {
            // PowerPoint Document - binary file format
            document = new HSLFSlideShow(is);
        } else if (contentType.equals("application/vnd.openxmlformats-officedocument.presentationml.presentation")) {
            // PowerPoint Document - OpenXML file format
            document = new XMLSlideShow(is);
        } else {
            // Any other type of embedded object.
            document = is;
        }
        document.close();
        is.close();
    }
    workbook.close();
}
Also used: HWPFDocument (org.apache.poi.hwpf.HWPFDocument), InputStream (java.io.InputStream), Closeable (java.io.Closeable), XMLSlideShow (org.apache.poi.xslf.usermodel.XMLSlideShow), XSSFWorkbook (org.apache.poi.xssf.usermodel.XSSFWorkbook), XWPFDocument (org.apache.poi.xwpf.usermodel.XWPFDocument), PackagePart (org.apache.poi.openxml4j.opc.PackagePart), HSLFSlideShow (org.apache.poi.hslf.usermodel.HSLFSlideShow), HSSFWorkbook (org.apache.poi.hssf.usermodel.HSSFWorkbook)
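In the `main` method of Example 3, if one of the constructors throws (for instance on a corrupt embedded part), neither `document.close()` nor `is.close()` runs. A minimal sketch of the per-part body using try-with-resources, assuming a hypothetical `Parser` interface that stands in for POI constructors such as `new HWPFDocument(is)` so that it runs with the JDK alone:

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

public class EmbeddedCloseSketch {
    /** Hypothetical stand-in for a POI constructor such as new HWPFDocument(in). */
    public interface Parser {
        Closeable parse(InputStream in) throws IOException;
    }

    /**
     * Parses one embedded part. The document is closed first, then the stream;
     * the stream is still closed if parse() throws, and a null document is skipped.
     */
    public static void processPart(InputStream raw, Parser parser) throws IOException {
        try (InputStream is = raw; Closeable document = parser.parse(is)) {
            // inspect the document here
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(new byte[0]) {
            @Override
            public void close() {
                System.out.println("stream closed");
            }
        };
        try {
            processPart(in, s -> { throw new IOException("corrupt part"); });
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
        // prints "stream closed", then "caught: corrupt part"
    }
}
```

Resources declared in one try header are closed in reverse order, so the document is closed before its input stream, the same order as in the original loop.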

Example 4 with HWPFDocument

Use of org.apache.poi.hwpf.HWPFDocument in the Apache POI project.

From class CharacterRun, method getDropDownListDefaultItemIndex.

public Integer getDropDownListDefaultItemIndex() {
    if (getDocument() instanceof HWPFDocument) {
        char c = _text.charAt(_start);
        if (c == 0x01) {
            NilPICFAndBinData data = new NilPICFAndBinData(((HWPFDocument) getDocument()).getDataStream(), getPicOffset());
            FFData ffData = new FFData(data.getBinData(), 0);
            return Integer.valueOf(ffData.getDefaultDropDownItemIndex());
        }
    }
    return null;
}
Also used: HWPFDocument (org.apache.poi.hwpf.HWPFDocument), NilPICFAndBinData (org.apache.poi.hwpf.model.NilPICFAndBinData), FFData (org.apache.poi.hwpf.model.FFData)

Example 5 with HWPFDocument

Use of org.apache.poi.hwpf.HWPFDocument in the Apache POI project.

From class CharacterRun, method getDropDownListValues.

public String[] getDropDownListValues() {
    if (getDocument() instanceof HWPFDocument) {
        char c = _text.charAt(_start);
        if (c == 0x01) {
            NilPICFAndBinData data = new NilPICFAndBinData(((HWPFDocument) getDocument()).getDataStream(), getPicOffset());
            FFData ffData = new FFData(data.getBinData(), 0);
            String[] values = ffData.getDropList();
            return values;
        }
    }
    return null;
}
Also used: HWPFDocument (org.apache.poi.hwpf.HWPFDocument), NilPICFAndBinData (org.apache.poi.hwpf.model.NilPICFAndBinData), FFData (org.apache.poi.hwpf.model.FFData)
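Both `getDropDownListValues()` and `getDropDownListDefaultItemIndex()` return `null` unless the run starts with the 0x01 form-field character in an `HWPFDocument`, so callers have to handle that case. A small sketch that combines the two to get the default entry of a drop-down, assuming a hypothetical `DropDownRun` interface that mirrors just those two `CharacterRun` methods so the example runs without POI. The bounds check on the index is a defensive assumption, not documented POI behavior.

```java
public class DropDownDefault {
    /** Hypothetical view of the two CharacterRun methods shown above. */
    public interface DropDownRun {
        String[] getDropDownListValues();
        Integer getDropDownListDefaultItemIndex();
    }

    /** Returns the default drop-down entry, or null if the run is not a drop-down. */
    public static String defaultValue(DropDownRun run) {
        String[] values = run.getDropDownListValues();
        Integer index = run.getDropDownListDefaultItemIndex();
        if (values == null || index == null) {
            return null; // not a form-field run in an HWPFDocument
        }
        // Defensive: do not assume the stored index is within bounds.
        if (index < 0 || index >= values.length) {
            return null;
        }
        return values[index];
    }

    public static void main(String[] args) {
        DropDownRun run = new DropDownRun() {
            public String[] getDropDownListValues() { return new String[] {"Red", "Green", "Blue"}; }
            public Integer getDropDownListDefaultItemIndex() { return 1; }
        };
        System.out.println(defaultValue(run)); // Green
    }
}
```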

Aggregations

HWPFDocument (org.apache.poi.hwpf.HWPFDocument): 126
Test (org.junit.Test): 66
InputStream (java.io.InputStream): 15
FileInputStream (java.io.FileInputStream): 10
Range (org.apache.poi.hwpf.usermodel.Range): 9
ByteArrayInputStream (java.io.ByteArrayInputStream): 8
HSLFSlideShow (org.apache.poi.hslf.usermodel.HSLFSlideShow): 7
HSSFWorkbook (org.apache.poi.hssf.usermodel.HSSFWorkbook): 7
WordExtractor (org.apache.poi.hwpf.extractor.WordExtractor): 7
ByteArrayOutputStream (java.io.ByteArrayOutputStream): 6
PicturesTable (org.apache.poi.hwpf.model.PicturesTable): 6
Bookmark (org.apache.poi.hwpf.usermodel.Bookmark): 6
NPOIFSFileSystem (org.apache.poi.poifs.filesystem.NPOIFSFileSystem): 6
File (java.io.File): 4
FileOutputStream (java.io.FileOutputStream): 4
Transformer (javax.xml.transform.Transformer): 4
DOMSource (javax.xml.transform.dom.DOMSource): 4
Picture (org.apache.poi.hwpf.usermodel.Picture): 4
DirectoryNode (org.apache.poi.poifs.filesystem.DirectoryNode): 4
POIFSFileSystem (org.apache.poi.poifs.filesystem.POIFSFileSystem): 4