Example 1 with OOXMLWordAndPowerPointTextHandler

Use of org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler in project tika by apache.

From class XWPFEventBasedWordExtractor, method handlePart.

private void handlePart(PackagePart packagePart, XWPFListManager xwpfListManager, StringBuilder buffer) throws IOException, SAXException {
    // Hyperlink relationships for this part, keyed by relationship id
    Map<String, String> hyperlinks = loadHyperlinkRelationships(packagePart);
    try (InputStream stream = packagePart.getInputStream()) {
        XMLReader reader = SAXHelper.newXMLReader();
        // Stream the part's XML through Tika's handler, which writes the extracted text into buffer
        reader.setContentHandler(new OOXMLWordAndPowerPointTextHandler(new XWPFToTextContentHandler(buffer), hyperlinks));
        // CloseShieldInputStream stops the parser from closing the part stream; try-with-resources closes it instead
        reader.parse(new InputSource(new CloseShieldInputStream(stream)));
    } catch (ParserConfigurationException e) {
        LOG.warn("Can't configure XMLReader", e);
    }
}
Also used: InputSource (org.xml.sax.InputSource), OOXMLWordAndPowerPointTextHandler (org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler), CloseShieldInputStream (org.apache.commons.io.input.CloseShieldInputStream), InputStream (java.io.InputStream), ParserConfigurationException (javax.xml.parsers.ParserConfigurationException), XMLReader (org.xml.sax.XMLReader)
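handlePart depends on Tika and POI classes, so it cannot run on its own. The sketch below shows the same event-based pattern using only the JDK: a namespace-aware SAX XMLReader streams a WordprocessingML part into a ContentHandler that appends text to a StringBuilder, and the input is wrapped so the parser cannot close it. The names SimpleDocxTextSketch, ParagraphTextHandler, NonClosingInputStream and extractPart are hypothetical. ParagraphTextHandler is a deliberately tiny stand-in for OOXMLWordAndPowerPointTextHandler: it only keeps the text of w:t elements and ends each w:p with a newline, ignoring hyperlinks, lists, tables and everything else the real handler deals with.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SimpleDocxTextSketch {

    // WordprocessingML main namespace (ECMA-376 transitional)
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical, heavily simplified text handler: text of <w:t> runs, one line per <w:p>
    static class ParagraphTextHandler extends DefaultHandler {
        private final StringBuilder buffer;
        private boolean inText = false;

        ParagraphTextHandler(StringBuilder buffer) {
            this.buffer = buffer;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (W_NS.equals(uri) && "t".equals(localName)) {
                inText = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (W_NS.equals(uri)) {
                if ("t".equals(localName)) {
                    inText = false;
                } else if ("p".equals(localName)) {
                    buffer.append('\n'); // paragraph boundary
                }
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inText) {
                buffer.append(ch, start, length);
            }
        }
    }

    // JDK-only stand-in for commons-io CloseShieldInputStream: close() is ignored
    static class NonClosingInputStream extends FilterInputStream {
        NonClosingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // intentionally leave the wrapped stream open; its owner closes it
        }
    }

    static void extractPart(InputStream stream, StringBuilder buffer) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so uri/localName are populated
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new ParagraphTextHandler(buffer));
        reader.parse(new InputSource(new NonClosingInputStream(stream)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t xml:space=\"preserve\"> world</w:t></w:r></w:p>"
                + "<w:p><w:r><w:t>Second</w:t></w:r></w:p>"
                + "</w:body></w:document>";
        StringBuilder buffer = new StringBuilder();
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            extractPart(in, buffer);
        }
        System.out.print(buffer);
    }
}
```

Running main prints two lines, "Hello world" and "Second". As in handlePart, the caller's try-with-resources owns the stream, and the close-ignoring wrapper only prevents the parser from closing it early.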
