Search in sources :

Example 1 with TaggedContentHandler

use of org.apache.tika.sax.TaggedContentHandler in project tika by apache.

the class XMLParser method parse.

public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    if (metadata.get(Metadata.CONTENT_TYPE) == null) {
        metadata.set(Metadata.CONTENT_TYPE, "application/xml");
    }
    final XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    xhtml.startDocument();
    xhtml.startElement("p");
    TaggedContentHandler tagged = new TaggedContentHandler(handler);
    try {
        context.getSAXParser().parse(new CloseShieldInputStream(stream), new OfflineContentHandler(new EmbeddedContentHandler(getContentHandler(tagged, metadata, context))));
    } catch (SAXException e) {
        tagged.throwIfCauseOf(e);
        throw new TikaException("XML parse error", e);
    } finally {
        xhtml.endElement("p");
        xhtml.endDocument();
    }
}
Also used : OfflineContentHandler(org.apache.tika.sax.OfflineContentHandler) TikaException(org.apache.tika.exception.TikaException) TaggedContentHandler(org.apache.tika.sax.TaggedContentHandler) EmbeddedContentHandler(org.apache.tika.sax.EmbeddedContentHandler) XHTMLContentHandler(org.apache.tika.sax.XHTMLContentHandler) CloseShieldInputStream(org.apache.commons.io.input.CloseShieldInputStream) SAXException(org.xml.sax.SAXException)

Example 2 with TaggedContentHandler

use of org.apache.tika.sax.TaggedContentHandler in project tika by apache.

the class CompositeParser method parse.

/**
     * Delegates the call to the matching component parser.
     * <p>
     * Potential {@link RuntimeException}s, {@link IOException}s and
     * {@link SAXException}s unrelated to the given input stream and content
     * handler are automatically wrapped into {@link TikaException}s to better
     * honor the {@link Parser} contract.
     */
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    Parser parser = getParser(metadata, context);
    TemporaryResources tmp = new TemporaryResources();
    try {
        TikaInputStream taggedStream = TikaInputStream.get(stream, tmp);
        TaggedContentHandler taggedHandler = handler != null ? new TaggedContentHandler(handler) : null;
        if (parser instanceof ParserDecorator) {
            metadata.add("X-Parsed-By", ((ParserDecorator) parser).getWrappedParser().getClass().getName());
        } else {
            metadata.add("X-Parsed-By", parser.getClass().getName());
        }
        try {
            parser.parse(taggedStream, taggedHandler, metadata, context);
        } catch (RuntimeException e) {
            throw new TikaException("Unexpected RuntimeException from " + parser, e);
        } catch (IOException e) {
            taggedStream.throwIfCauseOf(e);
            throw new TikaException("TIKA-198: Illegal IOException from " + parser, e);
        } catch (SAXException e) {
            if (taggedHandler != null)
                taggedHandler.throwIfCauseOf(e);
            throw new TikaException("TIKA-237: Illegal SAXException from " + parser, e);
        }
    } finally {
        tmp.dispose();
    }
}
Also used : TikaException(org.apache.tika.exception.TikaException) TemporaryResources(org.apache.tika.io.TemporaryResources) TikaInputStream(org.apache.tika.io.TikaInputStream) TaggedContentHandler(org.apache.tika.sax.TaggedContentHandler) IOException(java.io.IOException) SAXException(org.xml.sax.SAXException)

Example 3 with TaggedContentHandler

use of org.apache.tika.sax.TaggedContentHandler in project tika by apache.

the class DIFParser method parse.

@Override
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    // TODO Auto-generated method stub
    final XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    xhtml.startDocument();
    xhtml.startElement("p");
    TaggedContentHandler tagged = new TaggedContentHandler(handler);
    try {
        context.getSAXParser().parse(new CloseShieldInputStream(stream), new OfflineContentHandler(new EmbeddedContentHandler(getContentHandler(tagged, metadata, context))));
    } catch (SAXException e) {
        tagged.throwIfCauseOf(e);
        throw new TikaException("XML parse error", e);
    } finally {
        xhtml.endElement("p");
        xhtml.endDocument();
    }
}
Also used : OfflineContentHandler(org.apache.tika.sax.OfflineContentHandler) TikaException(org.apache.tika.exception.TikaException) TaggedContentHandler(org.apache.tika.sax.TaggedContentHandler) EmbeddedContentHandler(org.apache.tika.sax.EmbeddedContentHandler) XHTMLContentHandler(org.apache.tika.sax.XHTMLContentHandler) CloseShieldInputStream(org.apache.commons.io.input.CloseShieldInputStream) SAXException(org.xml.sax.SAXException)

Example 4 with TaggedContentHandler

use of org.apache.tika.sax.TaggedContentHandler in project tika by apache.

the class AbstractXML2003Parser method parse.

@Override
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    setContentType(metadata);
    final XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    xhtml.startDocument();
    TaggedContentHandler tagged = new TaggedContentHandler(xhtml);
    try {
        context.getSAXParser().parse(new CloseShieldInputStream(stream), new OfflineContentHandler(new EmbeddedContentHandler(getContentHandler(tagged, metadata, context))));
    } catch (SAXException e) {
        tagged.throwIfCauseOf(e);
        throw new TikaException("XML parse error", e);
    } finally {
        xhtml.endDocument();
    }
}
Also used : OfflineContentHandler(org.apache.tika.sax.OfflineContentHandler) TikaException(org.apache.tika.exception.TikaException) TaggedContentHandler(org.apache.tika.sax.TaggedContentHandler) EmbeddedContentHandler(org.apache.tika.sax.EmbeddedContentHandler) XHTMLContentHandler(org.apache.tika.sax.XHTMLContentHandler) CloseShieldInputStream(org.apache.commons.io.input.CloseShieldInputStream) SAXException(org.xml.sax.SAXException)

Aggregations

TikaException (org.apache.tika.exception.TikaException)4 TaggedContentHandler (org.apache.tika.sax.TaggedContentHandler)4 SAXException (org.xml.sax.SAXException)4 CloseShieldInputStream (org.apache.commons.io.input.CloseShieldInputStream)3 EmbeddedContentHandler (org.apache.tika.sax.EmbeddedContentHandler)3 OfflineContentHandler (org.apache.tika.sax.OfflineContentHandler)3 XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler)3 IOException (java.io.IOException)1 TemporaryResources (org.apache.tika.io.TemporaryResources)1 TikaInputStream (org.apache.tika.io.TikaInputStream)1