Example 26 with SAXTransformerFactory

Use of javax.xml.transform.sax.SAXTransformerFactory in project tika by apache.

The class TikaGUI, method getHtmlHandler.

/**
 * Creates and returns a content handler that turns XHTML input to
 * simplified HTML output that can be correctly parsed and displayed
 * by {@link JEditorPane}.
 * <p>
 * The returned content handler is set to output <code>html</code>
 * to the given writer. The XHTML namespace is removed from the output
 * to prevent the serializer from using the &lt;tag/&gt; empty element
 * syntax that causes extra "&gt;" characters to be displayed.
 * The &lt;head&gt; tags are dropped to prevent the serializer from
 * generating a &lt;META&gt; content type tag that makes
 * {@link JEditorPane} fail thinking that the document character set
 * is inconsistent.
 * <p>
 * Additionally, it will use ImageSavingParser to re-write embedded:(image)
 * image links to be file:///(temporary file) so that they can be loaded.
 *
 * @param writer output writer
 * @return HTML content handler
 * @throws TransformerConfigurationException if an error occurs
 */
private ContentHandler getHtmlHandler(Writer writer) throws TransformerConfigurationException {
    SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
    TransformerHandler handler = factory.newTransformerHandler();
    handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
    handler.setResult(new StreamResult(writer));
    return new ContentHandlerDecorator(handler) {

        @Override
        public void startElement(String uri, String localName, String name, Attributes atts) throws SAXException {
            if (XHTMLContentHandler.XHTML.equals(uri)) {
                uri = null;
            }
            if (!"head".equals(localName)) {
                if ("img".equals(localName)) {
                    AttributesImpl newAttrs;
                    if (atts instanceof AttributesImpl) {
                        newAttrs = (AttributesImpl) atts;
                    } else {
                        newAttrs = new AttributesImpl(atts);
                    }
                    for (int i = 0; i < newAttrs.getLength(); i++) {
                        if ("src".equals(newAttrs.getLocalName(i))) {
                            String src = newAttrs.getValue(i);
                            if (src.startsWith("embedded:")) {
                                String filename = src.substring(src.indexOf(':') + 1);
                                try {
                                    File img = imageParser.requestSave(filename);
                                    String newSrc = img.toURI().toString();
                                    newAttrs.setValue(i, newSrc);
                                } catch (IOException e) {
                                    System.err.println("Error creating temp image file " + filename);
                                    // The html viewer will show a broken image too to alert them
                                }
                            }
                        }
                    }
                    super.startElement(uri, localName, name, newAttrs);
                } else {
                    super.startElement(uri, localName, name, atts);
                }
            }
        }

        @Override
        public void endElement(String uri, String localName, String name) throws SAXException {
            if (XHTMLContentHandler.XHTML.equals(uri)) {
                uri = null;
            }
            if (!"head".equals(localName)) {
                super.endElement(uri, localName, name);
            }
        }

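        @Override
        public void startPrefixMapping(String prefix, String uri) {
            // Intentionally empty: prefix mappings are not forwarded,
            // so the XHTML namespace is not declared in the output.
        }

        @Override
        public void endPrefixMapping(String prefix) {
            // Intentionally empty: matches the suppressed startPrefixMapping.
        }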
    };
}
Also used : TransformerHandler(javax.xml.transform.sax.TransformerHandler) AttributesImpl(org.xml.sax.helpers.AttributesImpl) StreamResult(javax.xml.transform.stream.StreamResult) SAXTransformerFactory(javax.xml.transform.sax.SAXTransformerFactory) Attributes(org.xml.sax.Attributes) IOException(java.io.IOException) ContentHandlerDecorator(org.apache.tika.sax.ContentHandlerDecorator) File(java.io.File)
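
The filtering idea in getHtmlHandler can be tried without Tika on the classpath. The following is a minimal sketch, not the project's code: it assumes the JDK's XMLFilterImpl as a stand-in for Tika's ContentHandlerDecorator, replaces the XHTML namespace with an empty string rather than null, and leaves out the image rewriting. The class name HtmlHandlerSketch and the methods htmlHandler and render are hypothetical names chosen for illustration.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

class HtmlHandlerSketch {
    // The XHTML namespace URI (assumed here to match XHTMLContentHandler.XHTML).
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Builds a handler that serializes SAX events as HTML, minus <head> tags and namespaces.
    static ContentHandler htmlHandler(Writer writer) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
        handler.setResult(new StreamResult(writer));

        // XMLFilterImpl forwards every event to its content handler unless overridden.
        XMLFilterImpl filter = new XMLFilterImpl() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) throws SAXException {
                if (!"head".equals(localName)) {
                    super.startElement(XHTML.equals(uri) ? "" : uri, localName, qName, atts);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName)
                    throws SAXException {
                if (!"head".equals(localName)) {
                    super.endElement(XHTML.equals(uri) ? "" : uri, localName, qName);
                }
            }

            @Override
            public void startPrefixMapping(String prefix, String uri) {
                // Not forwarded, so no xmlns declaration reaches the serializer.
            }

            @Override
            public void endPrefixMapping(String prefix) {
                // Not forwarded, to match startPrefixMapping.
            }
        };
        filter.setContentHandler(handler);
        return filter;
    }

    // Feeds a tiny XHTML document through the handler and returns the serialized HTML.
    static String render() throws Exception {
        StringWriter out = new StringWriter();
        ContentHandler h = htmlHandler(out);
        AttributesImpl none = new AttributesImpl();
        h.startDocument();
        h.startElement(XHTML, "html", "html", none);
        h.startElement(XHTML, "head", "head", none);
        h.endElement(XHTML, "head", "head");
        h.startElement(XHTML, "body", "body", none);
        h.startElement(XHTML, "p", "p", none);
        h.characters("hi".toCharArray(), 0, 2);
        h.endElement(XHTML, "p", "p");
        h.endElement(XHTML, "body", "body");
        h.endElement(XHTML, "html", "html");
        h.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render());
    }
}
```

Only the head start and end tags are filtered, as in the original method; any children of head would still pass through. Exact whitespace in the output depends on the serializer bundled with the JDK.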

Example 27 with SAXTransformerFactory

Use of javax.xml.transform.sax.SAXTransformerFactory in project tika by apache.

The class TikaGUI, method getXmlContentHandler.

private ContentHandler getXmlContentHandler(Writer writer) throws TransformerConfigurationException {
    SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
    TransformerHandler handler = factory.newTransformerHandler();
    handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
    handler.setResult(new StreamResult(writer));
    return handler;
}
Also used : TransformerHandler(javax.xml.transform.sax.TransformerHandler) StreamResult(javax.xml.transform.stream.StreamResult) SAXTransformerFactory(javax.xml.transform.sax.SAXTransformerFactory)
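
A handler built this way is an identity transformer: whatever SAX events it receives are serialized to the writer. The sketch below shows one possible way to drive such a handler by hand, using only JDK classes. The class name XmlHandlerSketch and the method serialize are hypothetical, and the OMIT_XML_DECLARATION setting is an addition for readable output that the original method does not make.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

class XmlHandlerSketch {
    // Serializes a single <p> element containing the given text and returns the XML.
    static String serialize(String text) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        // Assumption for this sketch only: skip the <?xml ...?> declaration.
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter writer = new StringWriter();
        handler.setResult(new StreamResult(writer));

        // Emit SAX events by hand; a parser would normally produce these.
        char[] chars = text.toCharArray();
        handler.startDocument();
        handler.startElement("", "p", "p", new AttributesImpl());
        handler.characters(chars, 0, chars.length);
        handler.endElement("", "p", "p");
        handler.endDocument();
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize("hello"));
    }
}
```

In TikaGUI the events would come from a parser rather than from direct calls, but the handler setup is the same as in getXmlContentHandler.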

Aggregations

SAXTransformerFactory (javax.xml.transform.sax.SAXTransformerFactory): 27
TransformerHandler (javax.xml.transform.sax.TransformerHandler): 22
StreamResult (javax.xml.transform.stream.StreamResult): 22
AttributesImpl (org.xml.sax.helpers.AttributesImpl): 7
InputStream (java.io.InputStream): 6
StringWriter (java.io.StringWriter): 6
Metadata (org.apache.tika.metadata.Metadata): 6
File (java.io.File): 5
IOException (java.io.IOException): 5
Transformer (javax.xml.transform.Transformer): 5
ParseContext (org.apache.tika.parser.ParseContext): 5
OutputStream (java.io.OutputStream): 4
OutputStreamWriter (java.io.OutputStreamWriter): 4
TikaTest (org.apache.tika.TikaTest): 4
AutoDetectParser (org.apache.tika.parser.AutoDetectParser): 4
Test (org.junit.Test): 4
ByteArrayOutputStream (java.io.ByteArrayOutputStream): 3
FileOutputStream (java.io.FileOutputStream): 3
Map (java.util.Map): 3
TransformerConfigurationException (javax.xml.transform.TransformerConfigurationException): 3