Example 1 with POIXMLTextExtractor

Use of org.apache.poi.ooxml.extractor.POIXMLTextExtractor in the project yyl_example by Relucent.

From the class WordExample, method toString.

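/**
 * Extracts the plain text of a Word document held in memory.
 * Handles both the legacy binary format (.doc) and the XML-based format (.docx).
 */
public static String toString(byte[] content) throws IOException {
    try (InputStream input = new ByteArrayInputStream(content)) {
        // ofMagic is a helper that is not shown in this excerpt; it appears to
        // classify the content by its leading "magic" bytes.
        FileMagic magic = ofMagic(content);
        // .doc: legacy Word files are OLE2 compound documents
        if (FileMagic.OLE2.equals(magic)) {
            try (WordExtractor extractor = new WordExtractor(input)) {
                return extractor.getText();
            }
        }
        // .docx: anything that is not OLE2 is treated as an OOXML package
        OPCPackage opcPackage = OPCPackage.open(input);
        try (POIXMLTextExtractor extractor = new XWPFWordExtractor(opcPackage)) {
            return extractor.getText();
        }
    } catch (XmlException | OpenXML4JException e) {
        // Wrap POI/XMLBeans checked exceptions so callers only handle IOException
        throw new IOException(e);
    }
}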
Also used: OpenXML4JException(org.apache.poi.openxml4j.exceptions.OpenXML4JException) POIXMLTextExtractor(org.apache.poi.ooxml.extractor.POIXMLTextExtractor) ByteArrayInputStream(java.io.ByteArrayInputStream) InputStream(java.io.InputStream) XmlException(org.apache.xmlbeans.XmlException) FileMagic(org.apache.poi.poifs.filesystem.FileMagic) XWPFWordExtractor(org.apache.poi.xwpf.extractor.XWPFWordExtractor) IOException(java.io.IOException) OPCPackage(org.apache.poi.openxml4j.opc.OPCPackage) WordExtractor(org.apache.poi.hwpf.extractor.WordExtractor)
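
The method above branches on the file's leading bytes before choosing an extractor. As a rough illustration of that idea only, here is a standard-library sketch of byte-signature sniffing. The class MagicSniff, its Kind enum and the detect method are hypothetical names invented for this example; they are not part of Apache POI and are not the ofMagic helper from the project. The sketch relies on two well-known signatures: OLE2 compound documents start with D0 CF 11 E0 A1 B1 1A E1, and .docx files are ZIP archives, which start with "PK" 03 04.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Apache POI.
final class MagicSniff {

    enum Kind { OLE2, ZIP_CONTAINER, UNKNOWN }

    // Signature of an OLE2 compound document (legacy .doc, .xls, .ppt).
    private static final byte[] OLE2_HEADER = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header "PK\003\004"; OOXML files such as .docx are ZIP archives.
    private static final byte[] ZIP_HEADER = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies the content by comparing its first bytes with the known signatures.
    static Kind detect(byte[] content) {
        if (startsWith(content, OLE2_HEADER)) {
            return Kind.OLE2;
        }
        if (startsWith(content, ZIP_HEADER)) {
            return Kind.ZIP_CONTAINER;
        }
        return Kind.UNKNOWN;
    }

    // True when data is at least as long as prefix and begins with it.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        byte[] zipLike = { 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00 };
        System.out.println(detect(zipLike)); // prints ZIP_CONTAINER
        System.out.println(detect("plain text".getBytes(StandardCharsets.US_ASCII))); // prints UNKNOWN
    }
}
```

A ZIP signature only shows that the content is some ZIP archive, not that it is a valid .docx, which is why the sketch reports ZIP_CONTAINER rather than a Word-specific result. In the original method the final check is left to OPCPackage.open and XWPFWordExtractor, which throw if the package is not what they expect.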

Aggregations

ByteArrayInputStream (java.io.ByteArrayInputStream): 1
IOException (java.io.IOException): 1
InputStream (java.io.InputStream): 1
WordExtractor (org.apache.poi.hwpf.extractor.WordExtractor): 1
POIXMLTextExtractor (org.apache.poi.ooxml.extractor.POIXMLTextExtractor): 1
OpenXML4JException (org.apache.poi.openxml4j.exceptions.OpenXML4JException): 1
OPCPackage (org.apache.poi.openxml4j.opc.OPCPackage): 1
FileMagic (org.apache.poi.poifs.filesystem.FileMagic): 1
XWPFWordExtractor (org.apache.poi.xwpf.extractor.XWPFWordExtractor): 1
XmlException (org.apache.xmlbeans.XmlException): 1