Example 26 with HWPFDocument

use of org.apache.poi.hwpf.HWPFDocument in project poi by apache.

the class TestTableRow method testInnerTableCellsDetection.

@Test
public void testInnerTableCellsDetection() throws IOException {
    InputStream is = SAMPLES.openResourceAsStream("innertable.doc");
    HWPFDocument hwpfDocument = new HWPFDocument(is);
    is.close();
    // Range of the main document text; one call is enough.
    Range documentRange = hwpfDocument.getRange();
    Paragraph startOfInnerTable = documentRange.getParagraph(6);
    Table innerTable = documentRange.getTable(startOfInnerTable);
    assertEquals(2, innerTable.numRows());
    TableRow tableRow = innerTable.getRow(0);
    assertEquals(2, tableRow.numCells());
    hwpfDocument.close();
}
Also used : HWPFDocument(org.apache.poi.hwpf.HWPFDocument) InputStream(java.io.InputStream) Test(org.junit.Test)

Example 27 with HWPFDocument

use of org.apache.poi.hwpf.HWPFDocument in project poi by apache.

the class EmbeddedObjects method main.

@SuppressWarnings("unused")
public static void main(String[] args) throws Exception {
    POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream(args[0]));
    HSSFWorkbook workbook = new HSSFWorkbook(fs);
    for (HSSFObjectData obj : workbook.getAllEmbeddedObjects()) {
        // The OLE2 class name of the object
        String oleName = obj.getOLE2ClassName();
        DirectoryNode dn = (obj.hasDirectoryEntry()) ? (DirectoryNode) obj.getDirectory() : null;
        Closeable document = null;
        if (oleName.equals("Worksheet")) {
            document = new HSSFWorkbook(dn, fs, false);
        } else if (oleName.equals("Document")) {
            document = new HWPFDocument(dn);
        } else if (oleName.equals("Presentation")) {
            document = new HSLFSlideShow(dn);
        } else {
            if (dn != null) {
                // The DirectoryEntry is a DirectoryNode. Examine its entries to find out what it is.
                for (Entry entry : dn) {
                    String name = entry.getName();
                }
            } else {
                // There is no DirectoryEntry
                // Recover the object's data from the HSSFObjectData instance.
                byte[] objectData = obj.getObjectData();
            }
        }
        if (document != null) {
            document.close();
        }
    }
    workbook.close();
}
Also used : HWPFDocument(org.apache.poi.hwpf.HWPFDocument) Entry(org.apache.poi.poifs.filesystem.Entry) POIFSFileSystem(org.apache.poi.poifs.filesystem.POIFSFileSystem) Closeable(java.io.Closeable) HSSFObjectData(org.apache.poi.hssf.usermodel.HSSFObjectData) DirectoryNode(org.apache.poi.poifs.filesystem.DirectoryNode) HSLFSlideShow(org.apache.poi.hslf.usermodel.HSLFSlideShow) FileInputStream(java.io.FileInputStream) HSSFWorkbook(org.apache.poi.hssf.usermodel.HSSFWorkbook)

Example 28 with HWPFDocument

use of org.apache.poi.hwpf.HWPFDocument in project poi by apache.

the class LoadEmbedded method loadEmbedded.

public static void loadEmbedded(HSSFWorkbook workbook) throws IOException {
    for (HSSFObjectData obj : workbook.getAllEmbeddedObjects()) {
        // The OLE2 class name of the object
        String oleName = obj.getOLE2ClassName();
        if (oleName.equals("Worksheet")) {
            DirectoryNode dn = (DirectoryNode) obj.getDirectory();
            HSSFWorkbook embeddedWorkbook = new HSSFWorkbook(dn, false);
            embeddedWorkbook.close();
        } else if (oleName.equals("Document")) {
            DirectoryNode dn = (DirectoryNode) obj.getDirectory();
            HWPFDocument embeddedWordDocument = new HWPFDocument(dn);
            embeddedWordDocument.close();
        } else if (oleName.equals("Presentation")) {
            DirectoryNode dn = (DirectoryNode) obj.getDirectory();
            SlideShow<?, ?> embeddedSlideShow = new HSLFSlideShow(dn);
            embeddedSlideShow.close();
        } else {
            if (obj.hasDirectoryEntry()) {
                // The DirectoryEntry is a DirectoryNode. Examine its entries to find out what it is.
                DirectoryNode dn = (DirectoryNode) obj.getDirectory();
                for (Entry entry : dn) {
                    // System.out.println(oleName + "." + entry.getName());
                }
            } else {
                // There is no DirectoryEntry
                // Recover the object's data from the HSSFObjectData instance.
                byte[] objectData = obj.getObjectData();
            }
        }
    }
}
Also used : HWPFDocument(org.apache.poi.hwpf.HWPFDocument) Entry(org.apache.poi.poifs.filesystem.Entry) HSSFObjectData(org.apache.poi.hssf.usermodel.HSSFObjectData) DirectoryNode(org.apache.poi.poifs.filesystem.DirectoryNode) HSLFSlideShow(org.apache.poi.hslf.usermodel.HSLFSlideShow) HSSFWorkbook(org.apache.poi.hssf.usermodel.HSSFWorkbook)

Example 29 with HWPFDocument

use of org.apache.poi.hwpf.HWPFDocument in project poi by apache.

the class AbstractWordConverter method processCharacters.

protected boolean processCharacters(final HWPFDocumentCore wordDocument, final int currentTableLevel, final Range range, final Element block) {
    if (range == null)
        return false;
    boolean haveAnyText = false;
    /*
     * The text can contain fields, bookmarks, and possibly other structures
     * (the code below allows extension). Those structures can overlap, so
     * we should either process char-by-char (slow) or find a correct way to
     * reconstruct the structure of the range -- sergey
     */
    List<Structure> structures = new LinkedList<Structure>();
    if (wordDocument instanceof HWPFDocument) {
        final HWPFDocument doc = (HWPFDocument) wordDocument;
        Map<Integer, List<Bookmark>> rangeBookmarks = doc.getBookmarks().getBookmarksStartedBetween(range.getStartOffset(), range.getEndOffset());
        if (rangeBookmarks != null) {
            for (List<Bookmark> lists : rangeBookmarks.values()) {
                for (Bookmark bookmark : lists) {
                    if (!bookmarkStack.contains(bookmark))
                        addToStructures(structures, new Structure(bookmark));
                }
            }
        }
        // TODO: dead fields?
        int skipUntil = -1;
        for (int c = 0; c < range.numCharacterRuns(); c++) {
            CharacterRun characterRun = range.getCharacterRun(c);
            if (characterRun == null)
                throw new AssertionError();
            if (characterRun.getStartOffset() < skipUntil)
                continue;
            String text = characterRun.text();
            if (text == null || text.length() == 0 || text.charAt(0) != FIELD_BEGIN_MARK)
                continue;
            Field aliveField = doc.getFields().getFieldByStartOffset(FieldsDocumentPart.MAIN, characterRun.getStartOffset());
            if (aliveField != null) {
                addToStructures(structures, new Structure(aliveField));
            } else {
                int[] separatorEnd = tryDeadField_lookupFieldSeparatorEnd(wordDocument, range, c);
                if (separatorEnd != null) {
                    addToStructures(structures, new Structure(new DeadFieldBoundaries(c, separatorEnd[0], separatorEnd[1]), characterRun.getStartOffset(), range.getCharacterRun(separatorEnd[1]).getEndOffset()));
                    c = separatorEnd[1];
                }
            }
        }
    }
    structures = new ArrayList<Structure>(structures);
    Collections.sort(structures);
    int previous = range.getStartOffset();
    for (Structure structure : structures) {
        if (structure.start != previous) {
            Range subrange = new Range(previous, structure.start, range) {

                @Override
                public String toString() {
                    return "BetweenStructuresSubrange " + super.toString();
                }
            };
            processCharacters(wordDocument, currentTableLevel, subrange, block);
        }
        if (structure.structure instanceof Bookmark) {
            // other bookmarks with same boundaries
            List<Bookmark> bookmarks = new LinkedList<Bookmark>();
            for (Bookmark bookmark : ((HWPFDocument) wordDocument).getBookmarks().getBookmarksStartedBetween(structure.start, structure.start + 1).values().iterator().next()) {
                if (bookmark.getStart() == structure.start && bookmark.getEnd() == structure.end) {
                    bookmarks.add(bookmark);
                }
            }
            bookmarkStack.addAll(bookmarks);
            try {
                int end = Math.min(range.getEndOffset(), structure.end);
                Range subrange = new Range(structure.start, end, range) {

                    @Override
                    public String toString() {
                        return "BookmarksSubrange " + super.toString();
                    }
                };
                processBookmarks(wordDocument, block, subrange, currentTableLevel, bookmarks);
            } finally {
                bookmarkStack.removeAll(bookmarks);
            }
        } else if (structure.structure instanceof Field) {
            Field field = (Field) structure.structure;
            processField((HWPFDocument) wordDocument, range, currentTableLevel, field, block);
        } else if (structure.structure instanceof DeadFieldBoundaries) {
            DeadFieldBoundaries boundaries = (DeadFieldBoundaries) structure.structure;
            processDeadField(wordDocument, block, range, currentTableLevel, boundaries.beginMark, boundaries.separatorMark, boundaries.endMark);
        } else {
            throw new UnsupportedOperationException("NYI: " + structure.structure.getClass());
        }
        previous = Math.min(range.getEndOffset(), structure.end);
    }
    if (previous != range.getStartOffset()) {
        if (previous > range.getEndOffset()) {
            logger.log(POILogger.WARN, "Latest structure in ", range, " ended at #" + previous, " after range boundaries [", range.getStartOffset() + "; " + range.getEndOffset(), ")");
            return true;
        }
        if (previous < range.getEndOffset()) {
            Range subrange = new Range(previous, range.getEndOffset(), range) {

                @Override
                public String toString() {
                    return "AfterStructureSubrange " + super.toString();
                }
            };
            processCharacters(wordDocument, currentTableLevel, subrange, block);
        }
        return true;
    }
    for (int c = 0; c < range.numCharacterRuns(); c++) {
        CharacterRun characterRun = range.getCharacterRun(c);
        if (characterRun == null)
            throw new AssertionError();
        if (wordDocument instanceof HWPFDocument && ((HWPFDocument) wordDocument).getPicturesTable().hasPicture(characterRun)) {
            HWPFDocument newFormat = (HWPFDocument) wordDocument;
            Picture picture = newFormat.getPicturesTable().extractPicture(characterRun, true);
            processImage(block, characterRun.text().charAt(0) == 0x01, picture);
            continue;
        }
        String text = characterRun.text();
        if (text.isEmpty())
            continue;
        if (characterRun.isSpecialCharacter()) {
            if (text.charAt(0) == SPECCHAR_AUTONUMBERED_FOOTNOTE_REFERENCE && (wordDocument instanceof HWPFDocument)) {
                HWPFDocument doc = (HWPFDocument) wordDocument;
                processNoteAnchor(doc, characterRun, block);
                continue;
            }
            if (text.charAt(0) == SPECCHAR_DRAWN_OBJECT && (wordDocument instanceof HWPFDocument)) {
                HWPFDocument doc = (HWPFDocument) wordDocument;
                processDrawnObject(doc, characterRun, block);
                continue;
            }
            if (characterRun.isOle2() && (wordDocument instanceof HWPFDocument)) {
                HWPFDocument doc = (HWPFDocument) wordDocument;
                processOle2(doc, characterRun, block);
                continue;
            }
            if (characterRun.isSymbol() && (wordDocument instanceof HWPFDocument)) {
                HWPFDocument doc = (HWPFDocument) wordDocument;
                processSymbol(doc, characterRun, block);
                continue;
            }
        }
        if (text.charAt(0) == FIELD_BEGIN_MARK) {
            if (wordDocument instanceof HWPFDocument) {
                Field aliveField = ((HWPFDocument) wordDocument).getFields().getFieldByStartOffset(FieldsDocumentPart.MAIN, characterRun.getStartOffset());
                if (aliveField != null) {
                    processField(((HWPFDocument) wordDocument), range, currentTableLevel, aliveField, block);
                    int continueAfter = aliveField.getFieldEndOffset();
                    while (c < range.numCharacterRuns() && range.getCharacterRun(c).getEndOffset() <= continueAfter) c++;
                    if (c < range.numCharacterRuns())
                        c--;
                    continue;
                }
            }
            // Skip ahead past a dead field, if one was found; both outcomes move on to the next run.
            c = tryDeadField(wordDocument, range, currentTableLevel, c, block);
            continue;
        }
        if (text.charAt(0) == FIELD_SEPARATOR_MARK) {
            // shall not appear without FIELD_BEGIN_MARK
            continue;
        }
        if (text.charAt(0) == FIELD_END_MARK) {
            // shall not appear without FIELD_BEGIN_MARK
            continue;
        }
        if (characterRun.isSpecialCharacter() || characterRun.isObj() || characterRun.isOle2()) {
            continue;
        }
        if (text.endsWith("\r") || (text.charAt(text.length() - 1) == BEL_MARK && currentTableLevel != Integer.MIN_VALUE))
            text = text.substring(0, text.length() - 1);
        {
            // line breaks
            StringBuilder stringBuilder = new StringBuilder();
            for (char charChar : text.toCharArray()) {
                if (charChar == 11) {
                    if (stringBuilder.length() > 0) {
                        outputCharacters(block, characterRun, stringBuilder.toString());
                        stringBuilder.setLength(0);
                    }
                    processLineBreak(block, characterRun);
                } else if (charChar == 30) {
                    // Non-breaking hyphens are stored as ASCII 30
                    stringBuilder.append(UNICODECHAR_NONBREAKING_HYPHEN);
                } else if (charChar == 31) {
                    // Non-required (optional) hyphens are replaced with a zero-width space
                    stringBuilder.append(UNICODECHAR_ZERO_WIDTH_SPACE);
                } else if (charChar >= 0x20 || charChar == 0x09 || charChar == 0x0A || charChar == 0x0D) {
                    stringBuilder.append(charChar);
                }
            }
            if (stringBuilder.length() > 0) {
                outputCharacters(block, characterRun, stringBuilder.toString());
                stringBuilder.setLength(0);
            }
        }
        haveAnyText |= text.trim().length() != 0;
    }
    return haveAnyText;
}
Also used : CharacterRun(org.apache.poi.hwpf.usermodel.CharacterRun) Range(org.apache.poi.hwpf.usermodel.Range) LinkedList(java.util.LinkedList) HWPFDocument(org.apache.poi.hwpf.HWPFDocument) Field(org.apache.poi.hwpf.usermodel.Field) Bookmark(org.apache.poi.hwpf.usermodel.Bookmark) Picture(org.apache.poi.hwpf.usermodel.Picture) ArrayList(java.util.ArrayList) List(java.util.List) HWPFList(org.apache.poi.hwpf.usermodel.HWPFList)
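
The control-character handling in the "line breaks" block of processCharacters can be tried in isolation. The following is a minimal sketch that uses only the JDK; it is not the converter's actual code. The class name ControlCharSketch and the method normalize are hypothetical. Emitting '\n' for a line break is an assumption made for illustration (the converter calls processLineBreak instead). The two constant values are assumed to be U+2011 and U+200B, chosen to match the names UNICODECHAR_NONBREAKING_HYPHEN and UNICODECHAR_ZERO_WIDTH_SPACE.

```java
class ControlCharSketch {

    // Assumed values: U+2011 (non-breaking hyphen) and U+200B (zero-width space).
    static final char NONBREAKING_HYPHEN = (char) 0x2011;
    static final char ZERO_WIDTH_SPACE = (char) 0x200B;

    /**
     * Maps the control characters found in a character run's text to
     * plain-text equivalents, following the same branch order as the
     * loop in processCharacters.
     */
    static String normalize(String text) {
        StringBuilder out = new StringBuilder();
        for (char ch : text.toCharArray()) {
            if (ch == 11) {
                // Vertical tab marks a manual line break (assumption: emit '\n').
                out.append('\n');
            } else if (ch == 30) {
                // Non-breaking hyphens are stored as ASCII 30.
                out.append(NONBREAKING_HYPHEN);
            } else if (ch == 31) {
                // Optional hyphens are stored as ASCII 31.
                out.append(ZERO_WIDTH_SPACE);
            } else if (ch >= 0x20 || ch == 0x09 || ch == 0x0A || ch == 0x0D) {
                // Printable characters, tab, LF and CR pass through unchanged.
                out.append(ch);
            }
            // Any other control character (for example BEL, 0x07) is dropped.
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "first" + (char) 11 + "second" + (char) 30 + "half" + (char) 7;
        String clean = normalize(raw);
        // Print the code point of each resulting character for inspection.
        for (char ch : clean.toCharArray()) {
            System.out.print((int) ch + " ");
        }
        System.out.println();
    }
}
```

Keeping the mapping in a pure function of the input string makes it easy to check separately from the converter, which interleaves the same branches with calls to outputCharacters and processLineBreak.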

Example 30 with HWPFDocument

use of org.apache.poi.hwpf.HWPFDocument in project poi by apache.

the class HWPFLister method dumpBookmarks.

private void dumpBookmarks() {
    if (!(_doc instanceof HWPFDocument)) {
        System.out.println("Word 95 not supported so far");
        return;
    }
    HWPFDocument document = (HWPFDocument) _doc;
    Bookmarks bookmarks = document.getBookmarks();
    for (int b = 0; b < bookmarks.getBookmarksCount(); b++) {
        Bookmark bookmark = bookmarks.getBookmark(b);
        System.out.println("[" + bookmark.getStart() + "; " + bookmark.getEnd() + "): " + bookmark.getName());
    }
}
Also used : HWPFDocument(org.apache.poi.hwpf.HWPFDocument) Bookmarks(org.apache.poi.hwpf.usermodel.Bookmarks) Bookmark(org.apache.poi.hwpf.usermodel.Bookmark)

Aggregations

HWPFDocument (org.apache.poi.hwpf.HWPFDocument): 126
Test (org.junit.Test): 66
InputStream (java.io.InputStream): 15
FileInputStream (java.io.FileInputStream): 10
Range (org.apache.poi.hwpf.usermodel.Range): 9
ByteArrayInputStream (java.io.ByteArrayInputStream): 8
HSLFSlideShow (org.apache.poi.hslf.usermodel.HSLFSlideShow): 7
HSSFWorkbook (org.apache.poi.hssf.usermodel.HSSFWorkbook): 7
WordExtractor (org.apache.poi.hwpf.extractor.WordExtractor): 7
ByteArrayOutputStream (java.io.ByteArrayOutputStream): 6
PicturesTable (org.apache.poi.hwpf.model.PicturesTable): 6
Bookmark (org.apache.poi.hwpf.usermodel.Bookmark): 6
NPOIFSFileSystem (org.apache.poi.poifs.filesystem.NPOIFSFileSystem): 6
File (java.io.File): 4
FileOutputStream (java.io.FileOutputStream): 4
Transformer (javax.xml.transform.Transformer): 4
DOMSource (javax.xml.transform.dom.DOMSource): 4
Picture (org.apache.poi.hwpf.usermodel.Picture): 4
DirectoryNode (org.apache.poi.poifs.filesystem.DirectoryNode): 4
POIFSFileSystem (org.apache.poi.poifs.filesystem.POIFSFileSystem): 4