Example 26 with XHTMLContentHandler

Use of org.apache.tika.sax.XHTMLContentHandler in project jackrabbit by apache.

The class BlockingParser, method parse.

@Override
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws SAXException {
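    // Wait here for as long as the parser is blocked.
    waitIfBlocked();
    // Emit a fixed one-paragraph document; the input stream is not read.
    XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);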
    xhtml.startDocument();
    xhtml.element("p", "The quick brown fox jumped over the lazy dog.");
    xhtml.endDocument();
}
Also used: XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler)

Example 27 with XHTMLContentHandler

Use of org.apache.tika.sax.XHTMLContentHandler in project jackrabbit-oak by apache.

The class HtmlRepresentation, method render.

public void render(Tree tree, HttpServletResponse response) throws IOException {
    try {
        XHTMLContentHandler xhtml = startResponse(response, tree.getPath());
        xhtml.startDocument();
        xhtml.startElement("dl");
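        // Each property becomes a <dt> name followed by a <dd> value.
        for (PropertyState property : tree.getProperties()) {
            xhtml.element("dt", property.getName());
            // Multi-valued properties are listed in an <ol> inside the <dd>.
            if (property.isArray()) {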
                xhtml.startElement("dd");
                xhtml.startElement("ol");
                for (String value : property.getValue(STRINGS)) {
                    xhtml.element("li", value);
                }
                xhtml.endElement("ol");
                xhtml.endElement("dd");
            } else {
                xhtml.element("dd", property.getValue(STRING));
            }
        }
        // Each child node becomes a <dt> name and a <dd> holding a link to the child.
        for (Tree child : tree.getChildren()) {
            String name = child.getName();
            xhtml.element("dt", name);
            xhtml.startElement("dd");
            xhtml.startElement("a", "href", response.encodeRedirectURL(URLEncoder.encode(name, Charsets.UTF_8.name()) + "/"));
            xhtml.characters(child.getPath());
            xhtml.endElement("a");
            xhtml.endElement("dd");
        }
        xhtml.endElement("dl");
        xhtml.endDocument();
    } catch (SAXException e) {
        throw new IOException(e);
    }
}
Also used: Tree (org.apache.jackrabbit.oak.api.Tree), IOException (java.io.IOException), XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler), PropertyState (org.apache.jackrabbit.oak.api.PropertyState), SAXException (org.xml.sax.SAXException)

Example 28 with XHTMLContentHandler

Use of org.apache.tika.sax.XHTMLContentHandler in project jackrabbit-oak by apache.

The class HtmlRepresentation, method startResponse.

private XHTMLContentHandler startResponse(HttpServletResponse response, String title) throws IOException {
    try {
        response.setContentType("text/html");
        response.setCharacterEncoding("UTF-8");
        // Build a TransformerHandler that serializes SAX events to the response as HTML.
        SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        Transformer transformer = handler.getTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        handler.setResult(new StreamResult(response.getOutputStream()));
        // The title is handed to XHTMLContentHandler through Metadata.
        Metadata metadata = new Metadata();
        metadata.set(Metadata.TITLE, title);
        return new XHTMLContentHandler(handler, metadata);
    } catch (TransformerConfigurationException e) {
        throw new IOException(e);
    }
}
Also used: TransformerHandler (javax.xml.transform.sax.TransformerHandler), Transformer (javax.xml.transform.Transformer), StreamResult (javax.xml.transform.stream.StreamResult), TransformerConfigurationException (javax.xml.transform.TransformerConfigurationException), SAXTransformerFactory (javax.xml.transform.sax.SAXTransformerFactory), Metadata (org.apache.tika.metadata.Metadata), IOException (java.io.IOException), XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler)
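
The serialization half of startResponse can be tried without a servlet container or Tika. The following is a minimal sketch, assuming only the JDK: it configures a TransformerHandler the same way as Example 28, but writes to a StringWriter instead of the servlet response and sends the SAX events by hand. The class name SaxHtmlSketch and the methods renderDefinition and element are hypothetical names for illustration; the element helper is a hand-written stand-in for the convenience that XHTMLContentHandler.element offers in the examples above, not a copy of Tika's implementation.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class SaxHtmlSketch {

    // Hypothetical helper: serializes one <dt>/<dd> pair inside a <dl> to an HTML string.
    public static String renderDefinition(String term, String description)
            throws TransformerConfigurationException, SAXException {
        // Same setup as Example 28: a TransformerHandler with no stylesheet
        // passes SAX events straight through to the serializer.
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        Transformer transformer = handler.getTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        // Assumption for this sketch: write to memory rather than a servlet response.
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        handler.startDocument();
        handler.startElement("", "dl", "dl", new AttributesImpl());
        element(handler, "dt", term);
        element(handler, "dd", description);
        handler.endElement("", "dl", "dl");
        handler.endDocument();
        return out.toString();
    }

    // Emits <name>text</name> as three SAX events; the serializer escapes the text.
    private static void element(TransformerHandler handler, String name, String text)
            throws SAXException {
        handler.startElement("", name, name, new AttributesImpl());
        char[] chars = text.toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endElement("", name, name);
    }
}
```

Calling SaxHtmlSketch.renderDefinition("name", "a < b") returns the serialized markup as a string; escaping of the `<` in the text is handled by the serializer, so the caller passes raw text.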

Example 29 with XHTMLContentHandler

Use of org.apache.tika.sax.XHTMLContentHandler in project tika by apache.

The class DummyParser, method parse.

public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    // Copy this parser's preconfigured metadata entries into the caller's Metadata.
    for (Entry<String, String> m : this.metadata.entrySet()) {
        metadata.add(m.getKey(), m.getValue());
    }
    XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    xhtml.startDocument();
    // Write the preconfigured text, if any, as character data.
    if (xmlText != null) {
        xhtml.characters(xmlText.toCharArray(), 0, xmlText.length());
    }
    xhtml.endDocument();
}
Also used: XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler)

Example 30 with XHTMLContentHandler

Use of org.apache.tika.sax.XHTMLContentHandler in project tika by apache.

The class MockParser, method parse.

@Override
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    Document doc = null;
    try {
        DocumentBuilder docBuilder = context.getDocumentBuilder();
        doc = docBuilder.parse(stream);
    } catch (SAXException e) {
        // Rethrow as an IOException to distinguish SAX errors while reading
        // the input from SAX errors while writing the output.
        throw new IOExceptionWithCause(e);
    }
    Node root = doc.getDocumentElement();
    NodeList actions = root.getChildNodes();
    XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    xhtml.startDocument();
    // Pass each child node of the root element to executeAction in document order.
    for (int i = 0; i < actions.getLength(); i++) {
        executeAction(actions.item(i), metadata, context, xhtml);
    }
    xhtml.endDocument();
}
Also used: IOExceptionWithCause (org.apache.tika.io.IOExceptionWithCause), DocumentBuilder (javax.xml.parsers.DocumentBuilder), Node (org.w3c.dom.Node), NodeList (org.w3c.dom.NodeList), Document (org.w3c.dom.Document), XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler), SAXException (org.xml.sax.SAXException)
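
The read-then-iterate pattern in MockParser.parse can be exercised with the JDK alone. The sketch below is an assumption-laden illustration, not Tika code: it obtains a DocumentBuilder from DocumentBuilderFactory rather than from a ParseContext, wraps parse failures in a plain IOException, and only collects the names of the root's child elements instead of executing them. The class name ActionListSketch, the method actionNames, and the element names in the usage note are hypothetical. One detail worth noting is that getChildNodes also returns text nodes (including whitespace between elements), so the sketch filters on Node.ELEMENT_NODE.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ActionListSketch {

    // Hypothetical helper: returns the names of the root element's direct child elements.
    public static List<String> actionNames(String xml) throws IOException {
        InputStream stream = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Document doc;
        try {
            // Assumption for this sketch: a default JDK DocumentBuilder.
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(stream);
        } catch (SAXException | ParserConfigurationException e) {
            // Surface read-side XML problems as an IOException.
            throw new IOException(e);
        }
        NodeList children = doc.getDocumentElement().getChildNodes();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip text, comment, and other non-element nodes.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }
}
```

For an input such as `<mock><write>hello</write><metadata>x</metadata></mock>` (made-up element names), the method returns the child element names in document order.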

Aggregations

XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler): 72
TikaException (org.apache.tika.exception.TikaException): 26
TikaInputStream (org.apache.tika.io.TikaInputStream): 22
TemporaryResources (org.apache.tika.io.TemporaryResources): 14
CloseShieldInputStream (org.apache.commons.io.input.CloseShieldInputStream): 13
IOException (java.io.IOException): 12
SAXException (org.xml.sax.SAXException): 9
File (java.io.File): 6
EmbeddedDocumentExtractor (org.apache.tika.extractor.EmbeddedDocumentExtractor): 6
Metadata (org.apache.tika.metadata.Metadata): 6
BufferedInputStream (java.io.BufferedInputStream): 5
InputStream (java.io.InputStream): 5
EmbeddedContentHandler (org.apache.tika.sax.EmbeddedContentHandler): 5
ByteArrayInputStream (java.io.ByteArrayInputStream): 4
Charset (java.nio.charset.Charset): 4
ArrayList (java.util.ArrayList): 4
Map (java.util.Map): 4
MediaType (org.apache.tika.mime.MediaType): 4
OfflineContentHandler (org.apache.tika.sax.OfflineContentHandler): 4
InputStreamReader (java.io.InputStreamReader): 3