
Example 6 with ParserException

Use of org.htmlparser.util.ParserException in the lucida project by claritylab.

From the class HTMLConverter, method html2text.

/**
 * Converts an HTML document into plain text.
 *
 * @param html HTML document
 * @return plain text (empty if the document has no text content), or
 *         <code>null</code> if the conversion failed
 */
public static synchronized String html2text(String html) {
    // configure the StringBean that collects the document text
    StringBean sb = new StringBean();
    // do not include link URLs in the output
    sb.setLinks(false);
    // replace non-breaking spaces with regular spaces
    sb.setReplaceNonBreakingSpaces(true);
    // collapse sequences of whitespace
    sb.setCollapse(true);
    Parser parser = new Parser();
    try {
        parser.setInputHTML(html);
        parser.visitAllNodesWith(sb);
    } catch (ParserException e) {
        return null;
    }
    String docText = sb.getStrings();
    // no content
    if (docText == null)
        docText = "";
    return docText;
}
Also used: ParserException (org.htmlparser.util.ParserException), StringBean (org.htmlparser.beans.StringBean), Parser (org.htmlparser.Parser)
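
The method above has three distinct outcomes for a caller: null when a ParserException was caught, an empty string when the document yielded no text, and the extracted text otherwise. The following is a minimal sketch of how a caller might tell these apart. The class Html2TextCaller, the enum Outcome and the method classify are hypothetical names introduced for illustration and are not part of lucida or htmlparser. The sketch uses fixed values instead of calling the parser so that it runs without the htmlparser library on the classpath.

```java
public class Html2TextCaller {

    /** Hypothetical caller-side view of the three possible results. */
    enum Outcome { FAILED, EMPTY, TEXT }

    /**
     * Classifies a value returned by a method with the same contract as
     * html2text: null on failure, "" for no content, text otherwise.
     */
    static Outcome classify(String docText) {
        if (docText == null) {
            return Outcome.FAILED; // a ParserException was caught
        }
        if (docText.isEmpty()) {
            return Outcome.EMPTY;  // parsed fine, but no text was found
        }
        return Outcome.TEXT;
    }

    public static void main(String[] args) {
        // In real use the argument would be HTMLConverter.html2text(html);
        // fixed values stand in for it here.
        System.out.println(classify(null));
        System.out.println(classify(""));
        System.out.println(classify("Hello world"));
    }
}
```

Checking for null before using the result matters because the failure case and the no-content case are otherwise easy to confuse.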

Aggregations

ParserException (org.htmlparser.util.ParserException): 6
Parser (org.htmlparser.Parser): 5
IOException (java.io.IOException): 3
StringBean (org.htmlparser.beans.StringBean): 2
URL (java.net.URL): 1
LinkedList (java.util.LinkedList): 1
PatternSyntaxException (java.util.regex.PatternSyntaxException): 1
ConsumerPriceIndex (name.abuchen.portfolio.model.ConsumerPriceIndex): 1
NodeClassFilter (org.htmlparser.filters.NodeClassFilter): 1
Lexer (org.htmlparser.lexer.Lexer): 1
LinkTag (org.htmlparser.tags.LinkTag): 1
NodeList (org.htmlparser.util.NodeList): 1