Example 11 with OfflineContentHandler

Use of org.apache.tika.sax.OfflineContentHandler in the project tika by apache.

From the class OpenDocumentContentParser, method parseInternal.

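void parseInternal(InputStream stream, final ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    // Map OpenDocument elements onto the caller's handler.
    DefaultHandler dh = new OpenDocumentElementMappingContentHandler(handler, MAPPINGS);
    SAXParser parser = context.getSAXParser();
    // CloseShieldInputStream keeps the parser from closing the caller's stream;
    // OfflineContentHandler returns an empty stream from resolveEntity, so no external resources are fetched.
    parser.parse(new CloseShieldInputStream(stream), new OfflineContentHandler(new NSNormalizerContentHandler(dh)));
}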
Also used: OfflineContentHandler (org.apache.tika.sax.OfflineContentHandler), SAXParser (javax.xml.parsers.SAXParser), CloseShieldInputStream (org.apache.commons.io.input.CloseShieldInputStream), DefaultHandler (org.xml.sax.helpers.DefaultHandler)
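
The offline-parsing idea in parseInternal can be tried without Tika on the classpath. The sketch below is a JDK-only stand-in, not the Tika source: OfflineParsingSketch and OfflineTextHandler are hypothetical names, and the handler only imitates the documented behaviour of OfflineContentHandler (answering resolveEntity with an empty stream) while collecting character data. The DOCTYPE URL is a placeholder that is never meant to be fetched.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class OfflineParsingSketch {

    /** Hypothetical stand-in for OfflineContentHandler: collects text and never opens external entities. */
    static class OfflineTextHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int entityRequests = 0;

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            entityRequests++;
            // Hand the parser an empty stream instead of opening systemId.
            return new InputSource(new ByteArrayInputStream(new byte[0]));
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses the given XML string with the offline handler and returns the handler. */
    static OfflineTextHandler parse(String xml)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        OfflineTextHandler handler = new OfflineTextHandler();
        InputStream stream = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        // SAXParser.parse registers the DefaultHandler as the EntityResolver as well.
        parser.parse(stream, handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // The external DTD reference is a placeholder; the handler intercepts it.
        String xml = "<!DOCTYPE doc SYSTEM \"http://example.invalid/doc.dtd\">"
                + "<doc>hello</doc>";
        OfflineTextHandler h = parse(xml);
        System.out.println(h.text + " (entity requests intercepted: " + h.entityRequests + ")");
    }
}
```

In the Tika snippet the same effect comes from wrapping the real handler chain in OfflineContentHandler, so downstream handlers stay unaware of the entity blocking.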

Aggregations

OfflineContentHandler (org.apache.tika.sax.OfflineContentHandler): 11
CloseShieldInputStream (org.apache.commons.io.input.CloseShieldInputStream): 9
TikaException (org.apache.tika.exception.TikaException): 7
EmbeddedContentHandler (org.apache.tika.sax.EmbeddedContentHandler): 6
XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler): 5
SAXException (org.xml.sax.SAXException): 5
InputStream (java.io.InputStream): 3
SAXParser (javax.xml.parsers.SAXParser): 3
TaggedContentHandler (org.apache.tika.sax.TaggedContentHandler): 3
BufferedInputStream (java.io.BufferedInputStream): 1
IOException (java.io.IOException): 1
SAXParserFactory (javax.xml.parsers.SAXParserFactory): 1
ZipArchiveEntry (org.apache.commons.compress.archivers.zip.ZipArchiveEntry): 1
ZipArchiveInputStream (org.apache.commons.compress.archivers.zip.ZipArchiveInputStream): 1
InvalidFormatException (org.apache.poi.openxml4j.exceptions.InvalidFormatException): 1
PackagePart (org.apache.poi.openxml4j.opc.PackagePart): 1
PackageRelationship (org.apache.poi.openxml4j.opc.PackageRelationship): 1
PackageRelationshipCollection (org.apache.poi.openxml4j.opc.PackageRelationshipCollection): 1
CloseShieldInputStream (org.apache.tika.io.CloseShieldInputStream): 1
ParseContext (org.apache.tika.parser.ParseContext): 1