Example 1 with TaggedInputStream

Use of org.apache.commons.io.input.TaggedInputStream in the Apache Tika project.

Class RTFParser, method parse.

public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    metadata.set(Metadata.CONTENT_TYPE, "application/rtf");
    TaggedInputStream tagged = new TaggedInputStream(stream);
    try {
        XHTMLContentHandler xhtmlHandler = new XHTMLContentHandler(handler, metadata);
        RTFEmbObjHandler embObjHandler = new RTFEmbObjHandler(xhtmlHandler, metadata, context, getMemoryLimitInKb());
        final TextExtractor ert = new TextExtractor(xhtmlHandler, metadata, embObjHandler);
        // Read through the tagged wrapper so throwIfCauseOf can recognize stream failures.
        ert.extract(tagged);
    } catch (IOException e) {
        tagged.throwIfCauseOf(e);
        throw new TikaException("Error parsing an RTF document", e);
    }
}
Also used: TikaException (org.apache.tika.exception.TikaException), TaggedInputStream (org.apache.commons.io.input.TaggedInputStream), IOException (java.io.IOException), XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler)
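
The catch block in the example relies on the tagging idea: an IOException raised by a read on the wrapper is marked with that wrapper's unique tag, so the caller can later tell "the underlying stream failed" apart from "the parser failed". The following is a minimal JDK-only sketch of that idea, not the Commons IO source. `MiniTaggedStream`, `MarkedIOException`, `FailingStream`, and `consume` are hypothetical names introduced here for illustration, and `IllegalStateException` stands in for TikaException.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.UUID;

public class TaggedStreamSketch {

    /** Hypothetical stand-in for a tagged exception: carries the tag of the stream that threw it. */
    static class MarkedIOException extends IOException {
        final Object tag;

        MarkedIOException(IOException cause, Object tag) {
            super(cause.getMessage(), cause);
            this.tag = tag;
        }
    }

    /** Hypothetical stand-in for a tagged stream: marks every IOException raised by its own reads. */
    static class MiniTaggedStream extends FilterInputStream {
        private final Object tag = UUID.randomUUID();

        MiniTaggedStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            try {
                return super.read();
            } catch (IOException e) {
                throw new MarkedIOException(e, tag);
            }
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            try {
                return super.read(b, off, len);
            } catch (IOException e) {
                throw new MarkedIOException(e, tag);
            }
        }

        /** Rethrows the original exception only if it was marked by this stream; otherwise does nothing. */
        void throwIfCauseOf(Throwable t) throws IOException {
            if (t instanceof MarkedIOException && tag.equals(((MarkedIOException) t).tag)) {
                throw (IOException) t.getCause();
            }
        }
    }

    /** A stream whose reads always fail, used to simulate a broken source. */
    static class FailingStream extends InputStream {
        @Override
        public int read() throws IOException {
            throw new IOException("disk error");
        }
    }

    /** Mirrors the catch block above: stream failures pass through, anything else is wrapped. */
    static String consume(InputStream raw, boolean readThroughWrapper) {
        MiniTaggedStream tagged = new MiniTaggedStream(raw);
        try {
            try {
                InputStream source = readThroughWrapper ? tagged : raw;
                source.read();
                return "ok";
            } catch (IOException e) {
                tagged.throwIfCauseOf(e);
                // Not caused by the tagged stream: report it as a parse problem.
                throw new IllegalStateException("Error parsing document", e);
            }
        } catch (IOException e) {
            return "stream failure: " + e.getMessage();
        } catch (IllegalStateException e) {
            return "parse failure: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(consume(new FailingStream(), true));   // stream failure: disk error
        System.out.println(consume(new FailingStream(), false));  // parse failure: Error parsing document
        System.out.println(consume(new ByteArrayInputStream(new byte[] {1}), true)); // ok
    }
}
```

In this sketch the second call shows why the consumer has to read from the wrapper rather than the raw stream: an exception thrown by the raw stream carries no tag, so `throwIfCauseOf` does nothing and the failure is reported as a parse error.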
