
Example 11 with DOMParser

Use of org.cyberneko.html.parsers.DOMParser in project ofbiz-framework by apache.

From class UelFunctions, method readHtmlDocument:

public static Document readHtmlDocument(String str) {
    Document document = null;
    try {
        // Resolve the OFBiz location string to a URL (null if it cannot be found).
        URL url = FlexibleLocation.resolveLocation(str);
        if (url != null) {
            // NekoHTML's DOMParser tolerates malformed HTML and builds a W3C DOM.
            DOMParser parser = new DOMParser();
            // Turn off namespace processing for plain HTML input.
            parser.setFeature("http://xml.org/sax/features/namespaces", false);
            parser.parse(url.toExternalForm());
            document = parser.getDocument();
        } else {
            Debug.logError("Unable to locate HTML document " + str, module);
        }
    } catch (IOException | SAXException e) {
        // Log and fall through: callers receive null on any failure.
        Debug.logError(e, "Error while reading HTML document " + str, module);
    }
    return document;
}
Also used: DOMParser (org.cyberneko.html.parsers.DOMParser), IOException (java.io.IOException), Document (org.w3c.dom.Document), URL (java.net.URL), SAXException (org.xml.sax.SAXException)
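Example 11 depends on OFBiz helpers (FlexibleLocation, Debug, module) and on NekoHTML, so it cannot run on its own. The sketch below keeps the same shape (resolve a location, parse with namespace processing off, log and return null on failure) but swaps in the JDK's built-in XML parser (javax.xml.parsers.DocumentBuilderFactory) and java.util.logging. That substitution is an assumption for illustration: unlike NekoHTML, an XML parser rejects ordinary tag-soup HTML, so this version only accepts well-formed XHTML. The class XhtmlReader and method readXhtmlDocument are hypothetical, and the location string is treated as a plain URL rather than an OFBiz location.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public final class XhtmlReader {
    private static final Logger LOG = Logger.getLogger(XhtmlReader.class.getName());

    private XhtmlReader() {}

    /** Returns the parsed document, or null if the location cannot be read or parsed. */
    public static Document readXhtmlDocument(String location) {
        try {
            // Hypothetical stand-in for FlexibleLocation.resolveLocation: treat the string as a URL.
            URL url = new URL(location);
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // JDK counterpart of setting "http://xml.org/sax/features/namespaces" to false.
            factory.setNamespaceAware(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(url.toExternalForm());
        } catch (MalformedURLException e) {
            LOG.log(Level.SEVERE, "Unable to locate document " + location, e);
        } catch (IOException | SAXException | ParserConfigurationException e) {
            LOG.log(Level.SEVERE, "Error while reading document " + location, e);
        }
        return null;
    }
}
```

For example, passing a file: URL of a well-formed XHTML file returns a Document, while a malformed URL or a missing file returns null after logging, matching the null-on-failure contract of readHtmlDocument.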

Example 12 with DOMParser

Use of org.cyberneko.html.parsers.DOMParser in project nimbus by nimbus-org.

From class DOMHTMLConverter, method toDOM:

protected Document toDOM(InputStream is) throws ConvertException {
    DOMParser parser = new DOMParser();
    try {
        final InputSource inputSource = new InputSource(is);
        if (characterEncodingToObject != null) {
            // Map the configured IANA encoding name to a Java name; fall back to the raw name.
            String encoding = (String) IANA2JAVA_ENCODING_MAP.get(characterEncodingToObject);
            if (encoding == null) {
                encoding = characterEncodingToObject;
            }
            inputSource.setEncoding(encoding);
        }
        if (isSynchronizedDomParse) {
            // Locking on the parser's Class object serializes parsing across all DOMParser instances.
            final Object lock = parser.getClass();
            synchronized (lock) {
                parser.parse(inputSource);
            }
        } else {
            parser.parse(inputSource);
        }
        return parser.getDocument();
    } catch (SAXException e) {
        // Wrap parse failures in the converter's own exception type.
        throw new ConvertException(e);
    } catch (IOException e) {
        throw new ConvertException(e);
    }
}
Also used: DOMParser (org.cyberneko.html.parsers.DOMParser)
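Example 12 has two optional behaviors: translating a configured IANA encoding name before handing it to the parser, and serializing all parses behind one shared lock. A minimal JDK-only sketch of both follows. The class ParseOptions, its methods, the ParseAction interface, and the two map entries are hypothetical, since the actual contents of IANA2JAVA_ENCODING_MAP are not shown in the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Map;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public final class ParseOptions {
    // Hypothetical entries for illustration; the real IANA2JAVA_ENCODING_MAP is not shown.
    private static final Map<String, String> IANA_TO_JAVA =
            Map.of("Shift_JIS", "SJIS", "EUC-JP", "EUC_JP");

    // A single JVM-wide lock, playing the role of parser.getClass() in toDOM.
    private static final Object PARSE_LOCK = new Object();

    /** A parse step that may fail the same ways DOMParser.parse(InputSource) can. */
    @FunctionalInterface
    public interface ParseAction<T> {
        T run() throws IOException, SAXException;
    }

    private ParseOptions() {}

    /** Looks up a Java encoding name (ianaName must be non-null), falling back to the name itself. */
    public static String resolveEncoding(String ianaName) {
        return IANA_TO_JAVA.getOrDefault(ianaName, ianaName);
    }

    /** Builds an InputSource, setting an encoding only when one is configured. */
    public static InputSource inputSourceFor(byte[] bytes, String configuredEncoding) {
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        if (configuredEncoding != null) {
            source.setEncoding(resolveEncoding(configuredEncoding));
        }
        return source;
    }

    /** Runs the parse step, optionally holding the shared lock so serialized parses never overlap. */
    public static <T> T runParse(boolean serialize, ParseAction<T> action)
            throws IOException, SAXException {
        if (serialize) {
            synchronized (PARSE_LOCK) {
                return action.run();
            }
        }
        return action.run();
    }
}
```

The shared lock trades parallelism for isolation: it only pays off if something shared between parser instances is not safe to use from several threads at once, which is presumably why toDOM makes it configurable.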

Aggregations

DOMParser (org.cyberneko.html.parsers.DOMParser): 12
Document (org.w3c.dom.Document): 10
IOException (java.io.IOException): 8
InputSource (org.xml.sax.InputSource): 8
SAXException (org.xml.sax.SAXException): 7
Node (org.w3c.dom.Node): 6
NodeList (org.w3c.dom.NodeList): 5
StringReader (java.io.StringReader): 4
ArrayList (java.util.ArrayList): 3
CrawlerSystemException (org.codelibs.fess.crawler.exception.CrawlerSystemException): 3
BufferedInputStream (java.io.BufferedInputStream): 2
InputStream (java.io.InputStream): 2
UnsupportedEncodingException (java.io.UnsupportedEncodingException): 2
URL (java.net.URL): 2
LinkedHashMap (java.util.LinkedHashMap): 2
Map (java.util.Map): 2
TransformerException (javax.xml.transform.TransformerException): 2
CrawlingAccessException (org.codelibs.fess.crawler.exception.CrawlingAccessException): 2
NamedNodeMap (org.w3c.dom.NamedNodeMap): 2
ProgressIndicator (com.intellij.openapi.progress.ProgressIndicator): 1
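Several of the aggregated examples also use Node and NodeList, typically to walk the Document returned by parser.getDocument(). The recursive traversal below is a generic sketch, not taken from any of the listed projects; the class LinkCollector and the choice of collecting anchor href values are hypothetical. The tag comparison is case-insensitive because HTML parsers may normalize element-name case (NekoHTML, for instance, upper-cases element names by default).

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public final class LinkCollector {
    private LinkCollector() {}

    /** Collects the href attribute of every anchor element under root, in document order. */
    public static List<String> collectHrefs(Node root) {
        List<String> hrefs = new ArrayList<>();
        walk(root, hrefs);
        return hrefs;
    }

    private static void walk(Node node, List<String> hrefs) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            // Case-insensitive match, since HTML parsers may normalize tag-name case.
            if ("a".equalsIgnoreCase(element.getTagName()) && element.hasAttribute("href")) {
                hrefs.add(element.getAttribute("href"));
            }
        }
        // Depth-first recursion over the child NodeList.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), hrefs);
        }
    }
}
```

Because Document is itself a Node, the result of getDocument() can be passed straight to collectHrefs.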