Example 1 with ExpandedTitleContentHandler

Use of org.apache.tika.sax.ExpandedTitleContentHandler in the Apache Tika project.

Class TikaResource, method produceOutput.

private StreamingOutput produceOutput(final InputStream is, final MultivaluedMap<String, String> httpHeaders, final UriInfo info, final String format) {
    final Parser parser = createParser();
    final Metadata metadata = new Metadata();
    final ParseContext context = new ParseContext();
    fillMetadata(parser, metadata, context, httpHeaders);
    fillParseContext(context, httpHeaders, parser);
    logRequest(LOG, info, metadata);
    return new StreamingOutput() {

        @Override
        public void write(OutputStream outputStream) throws IOException, WebApplicationException {
            Writer writer = new OutputStreamWriter(outputStream, UTF_8);
            ContentHandler content;
            try {
                // Build an identity TransformerHandler that serializes incoming SAX events.
                SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
                TransformerHandler handler = factory.newTransformerHandler();
                // Serialize with the requested output method, indented, as UTF-8.
                handler.getTransformer().setOutputProperty(OutputKeys.METHOD, format);
                handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
                handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, UTF_8.name());
                handler.setResult(new StreamResult(writer));
                // Route the parser's SAX events through ExpandedTitleContentHandler to the serializer.
                content = new ExpandedTitleContentHandler(handler);
            } catch (TransformerConfigurationException e) {
                throw new WebApplicationException(e);
            }
            parse(parser, LOG, info.getPath(), is, content, metadata, context);
        }
    };
}
Also used: TransformerHandler (javax.xml.transform.sax.TransformerHandler), StreamResult (javax.xml.transform.stream.StreamResult), TransformerConfigurationException (javax.xml.transform.TransformerConfigurationException), WebApplicationException (javax.ws.rs.WebApplicationException), OutputStream (java.io.OutputStream), Metadata (org.apache.tika.metadata.Metadata), SAXTransformerFactory (javax.xml.transform.sax.SAXTransformerFactory), StreamingOutput (javax.ws.rs.core.StreamingOutput), BoilerpipeContentHandler (org.apache.tika.parser.html.BoilerpipeContentHandler), ExpandedTitleContentHandler (org.apache.tika.sax.ExpandedTitleContentHandler), BodyContentHandler (org.apache.tika.sax.BodyContentHandler), ContentHandler (org.xml.sax.ContentHandler), RichTextContentHandler (org.apache.tika.sax.RichTextContentHandler), Parser (org.apache.tika.parser.Parser), HtmlParser (org.apache.tika.parser.html.HtmlParser), AutoDetectParser (org.apache.tika.parser.AutoDetectParser), DigestingParser (org.apache.tika.parser.DigestingParser), ParseContext (org.apache.tika.parser.ParseContext), OutputStreamWriter (java.io.OutputStreamWriter), Writer (java.io.Writer)

Example 2 with ExpandedTitleContentHandler

Use of org.apache.tika.sax.ExpandedTitleContentHandler in the Apache Camel project.

Class TikaProducer, method getContentHandler.

private ContentHandler getContentHandler(TikaConfiguration configuration, OutputStream outputStream) throws TransformerConfigurationException, UnsupportedEncodingException {
    ContentHandler result = null;
    TikaParseOutputFormat outputFormat = configuration.getTikaParseOutputFormat();
    switch (outputFormat) {
        case xml:
            // Serialize the parser's SAX events as XML.
            result = getTransformerHandler(outputStream, "xml", true);
            break;
        case text:
            // Plain text of the document body.
            result = new BodyContentHandler(new OutputStreamWriter(outputStream, this.encoding));
            break;
        case textMain:
            // Main content only, extracted via Boilerpipe.
            result = new BoilerpipeContentHandler(new OutputStreamWriter(outputStream, this.encoding));
            break;
        case html:
            // HTML serialization, wrapped in ExpandedTitleContentHandler.
            result = new ExpandedTitleContentHandler(getTransformerHandler(outputStream, "html", true));
            break;
        default:
            // Report the format that was actually switched on.
            throw new IllegalArgumentException(String.format("Unknown format %s", outputFormat));
    }
    return result;
}
Also used: BodyContentHandler (org.apache.tika.sax.BodyContentHandler), OutputStreamWriter (java.io.OutputStreamWriter), BoilerpipeContentHandler (org.apache.tika.parser.html.BoilerpipeContentHandler), ContentHandler (org.xml.sax.ContentHandler), ExpandedTitleContentHandler (org.apache.tika.sax.ExpandedTitleContentHandler)
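
Example 2 calls a getTransformerHandler helper whose body is not shown on this page. The following is a minimal sketch of what such a helper might look like, using only JDK classes and mirroring the factory setup from Example 1. The class name TransformerHandlerSketch, the render method, and the helper body itself are assumptions made for illustration; they are not Camel's actual code, and the real helper may differ (for example in how it handles encoding).

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

public class TransformerHandlerSketch {

    // Hypothetical stand-in for the getTransformerHandler helper used in Example 2.
    // Returns a handler that serializes the SAX events it receives to the given stream.
    static TransformerHandler getTransformerHandler(OutputStream out, String method, boolean indent)
            throws TransformerConfigurationException {
        // The JDK's default TransformerFactory supports the SAX feature, so this cast is the usual idiom.
        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, method);
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
        handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, StandardCharsets.UTF_8.name());
        handler.setResult(new StreamResult(out));
        return handler;
    }

    // Illustration only: fires the SAX events for a single <p> element, as a parser would,
    // and returns the serialized result as a string.
    static String render(String method, String text) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        TransformerHandler handler = getTransformerHandler(out, method, false);
        char[] chars = text.toCharArray();
        handler.startDocument();
        handler.startElement("", "p", "p", new AttributesImpl());
        handler.characters(chars, 0, chars.length);
        handler.endElement("", "p", "p");
        handler.endDocument();
        return out.toString(StandardCharsets.UTF_8.name());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("xml", "hello"));
        System.out.println(render("html", "a < b"));
    }
}
```

In the examples on this page the SAX events come from a Tika parser rather than from hand-written calls, and for HTML output the handler is additionally wrapped in ExpandedTitleContentHandler before being passed to the parser.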

Aggregations

OutputStreamWriter (java.io.OutputStreamWriter): 2
BoilerpipeContentHandler (org.apache.tika.parser.html.BoilerpipeContentHandler): 2
BodyContentHandler (org.apache.tika.sax.BodyContentHandler): 2
ExpandedTitleContentHandler (org.apache.tika.sax.ExpandedTitleContentHandler): 2
ContentHandler (org.xml.sax.ContentHandler): 2
OutputStream (java.io.OutputStream): 1
Writer (java.io.Writer): 1
WebApplicationException (javax.ws.rs.WebApplicationException): 1
StreamingOutput (javax.ws.rs.core.StreamingOutput): 1
TransformerConfigurationException (javax.xml.transform.TransformerConfigurationException): 1
SAXTransformerFactory (javax.xml.transform.sax.SAXTransformerFactory): 1
TransformerHandler (javax.xml.transform.sax.TransformerHandler): 1
StreamResult (javax.xml.transform.stream.StreamResult): 1
Metadata (org.apache.tika.metadata.Metadata): 1
AutoDetectParser (org.apache.tika.parser.AutoDetectParser): 1
DigestingParser (org.apache.tika.parser.DigestingParser): 1
ParseContext (org.apache.tika.parser.ParseContext): 1
Parser (org.apache.tika.parser.Parser): 1
HtmlParser (org.apache.tika.parser.html.HtmlParser): 1
RichTextContentHandler (org.apache.tika.sax.RichTextContentHandler): 1