Example 1 with SAXParser

Use of org.cyberneko.html.parsers.SAXParser in project Java-readability by basis-technology-corp.

Class NekoJsoupParser, method parse.

public Document parse(String data, String baseUri) throws SAXException, IOException {
    InputSource source = new InputSource();
    source.setCharacterStream(new StringReader(data));
    SAXParser nekoParser = new SAXParser();
    Document document = new Document(baseUri);
    nekoParser.setContentHandler(new Handler(document));
    nekoParser.setErrorHandler(new LocalErrorHandler());
    nekoParser.parse(source);
    return document;
}
Also used: InputSource (org.xml.sax.InputSource), StringReader (java.io.StringReader), SAXParser (org.cyberneko.html.parsers.SAXParser), DefaultHandler (org.xml.sax.helpers.DefaultHandler), ErrorHandler (org.xml.sax.ErrorHandler), Document (org.jsoup.nodes.Document)

Example 2 with SAXParser

Use of org.cyberneko.html.parsers.SAXParser in project Java-readability by basis-technology-corp.

Class NekoJsoupParser, method parse.

public Document parse(InputStream data, String baseUri) throws SAXException, IOException {
    InputSource source = new InputSource();
    source.setByteStream(data);
    SAXParser nekoParser = new SAXParser();
    Document document = new Document(baseUri);
    nekoParser.setContentHandler(new Handler(document));
    nekoParser.setErrorHandler(new LocalErrorHandler());
    nekoParser.parse(source);
    return document;
}
Also used: InputSource (org.xml.sax.InputSource), SAXParser (org.cyberneko.html.parsers.SAXParser), DefaultHandler (org.xml.sax.helpers.DefaultHandler), ErrorHandler (org.xml.sax.ErrorHandler), Document (org.jsoup.nodes.Document)

Example 3 with SAXParser

Use of org.cyberneko.html.parsers.SAXParser in project gocd by gocd.

Class HtmlSaxParserContext, method createParser.

@Override
protected AbstractSAXParser createParser() throws SAXException {
    SAXParser parser = new SAXParser();
    try {
        parser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");
        parser.setProperty("http://cyberneko.org/html/properties/names/attrs", "lower");
        return parser;
    } catch (SAXException ex) {
        // Pass the original exception as the cause so its stack trace is not lost.
        throw new SAXException("Problem while creating HTML SAX Parser: " + ex, ex);
    }
}
Also used: SAXParser (org.cyberneko.html.parsers.SAXParser), AbstractSAXParser (org.apache.xerces.parsers.AbstractSAXParser), SAXException (org.xml.sax.SAXException)

Example 4 with SAXParser

Use of org.cyberneko.html.parsers.SAXParser in project gradle by gradle.

Class ApacheDirectoryListingParser, method parse.

public List<String> parse(URI baseURI, InputStream content, String contentType) throws Exception {
    baseURI = addTrailingSlashes(baseURI);
    if (contentType == null || !contentType.startsWith("text/html")) {
        throw new ResourceException(baseURI, String.format("Unsupported ContentType %s for directory listing '%s'", contentType, baseURI));
    }
    String contentEncoding = UriTextResource.extractCharacterEncoding(contentType, "utf-8");
    final Reader htmlText = new InputStreamReader(content, contentEncoding);
    final InputSource inputSource = new InputSource(htmlText);
    final SAXParser htmlParser = new SAXParser();
    final AnchorListerHandler anchorListerHandler = new AnchorListerHandler();
    htmlParser.setContentHandler(anchorListerHandler);
    htmlParser.parse(inputSource);
    List<String> hrefs = anchorListerHandler.getHrefs();
    List<URI> uris = resolveURIs(baseURI, hrefs);
    return filterNonDirectChilds(baseURI, uris);
}
Also used: InputSource (org.xml.sax.InputSource), InputStreamReader (java.io.InputStreamReader), Reader (java.io.Reader), SAXParser (org.cyberneko.html.parsers.SAXParser), ResourceException (org.gradle.api.resources.ResourceException), URI (java.net.URI)
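
The AnchorListerHandler used in Example 4 is not shown on this page. The following is a minimal sketch of what such a content handler might look like, not the Gradle implementation. The names AnchorCollector, AnchorHandler and collectHrefs are hypothetical. To stay runnable without extra dependencies, the sketch uses the JDK's built-in XML SAX parser in place of the NekoHTML SAXParser, so it assumes well-formed XHTML input; with NekoHTML on the classpath, the same handler could be passed to setContentHandler as in the example above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class AnchorCollector {

    // Hypothetical stand-in for AnchorListerHandler:
    // records the href attribute of every <a> element it sees.
    static class AnchorHandler extends DefaultHandler {
        private final List<String> hrefs = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if ("a".equalsIgnoreCase(qName)) {
                String href = attributes.getValue("href");
                if (href != null) {
                    hrefs.add(href);
                }
            }
        }

        List<String> getHrefs() {
            return hrefs;
        }
    }

    // Assumption: the input is well-formed XHTML, because the JDK XML parser
    // (unlike NekoHTML) rejects tag soup.
    static List<String> collectHrefs(String xhtml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        AnchorHandler handler = new AnchorHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xhtml)));
        return handler.getHrefs();
    }

    public static void main(String[] args) throws Exception {
        String listing = "<html><body><a href=\"foo/\">foo</a><a href=\"bar.jar\">bar</a></body></html>";
        System.out.println(collectHrefs(listing)); // prints [foo/, bar.jar]
    }
}
```

The handler only collects raw href strings. Resolving them against the base URI and filtering out anything that is not a direct child is left to the caller, as in the parse method above.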

Example 5 with SAXParser

Use of org.cyberneko.html.parsers.SAXParser in project nokogiri by sparklemotion.

Class HtmlSaxParserContext, method createParser.

@Override
protected AbstractSAXParser createParser() throws SAXException {
    SAXParser parser = new SAXParser();
    try {
        parser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");
        parser.setProperty("http://cyberneko.org/html/properties/names/attrs", "lower");
        return parser;
    } catch (SAXException ex) {
        // Pass the original exception as the cause so its stack trace is not lost.
        throw new SAXException("Problem while creating HTML SAX Parser: " + ex, ex);
    }
}
Also used: SAXParser (org.cyberneko.html.parsers.SAXParser), AbstractSAXParser (org.apache.xerces.parsers.AbstractSAXParser), SAXException (org.xml.sax.SAXException)

Aggregations

SAXParser (org.cyberneko.html.parsers.SAXParser): 5
InputSource (org.xml.sax.InputSource): 3
AbstractSAXParser (org.apache.xerces.parsers.AbstractSAXParser): 2
Document (org.jsoup.nodes.Document): 2
ErrorHandler (org.xml.sax.ErrorHandler): 2
SAXException (org.xml.sax.SAXException): 2
DefaultHandler (org.xml.sax.helpers.DefaultHandler): 2
InputStreamReader (java.io.InputStreamReader): 1
Reader (java.io.Reader): 1
StringReader (java.io.StringReader): 1
URI (java.net.URI): 1
ResourceException (org.gradle.api.resources.ResourceException): 1