Example 6 with HtmlContentHandler

Use of edu.uci.ics.crawler4j.parser.HtmlContentHandler in the crawler4j project by yasserg.

Class HtmlContentHandlerTest, method testTableInBody.

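@Test
public void testTableInBody() throws Exception {
    // parseHtml is a helper of the test class (not shown here).
    // Note that the input omits the closing </table> and </body> tags.
    HtmlContentHandler handler = parseHtml("<html><body><table><tr><th>Hello</th><th>there</th></tr>"
            + "<tr><td>mr</td><td>bear</td></tr></html>");
    // The text of all four cells is expected to be joined by single spaces.
    assertEquals("Hello there mr bear", handler.getBodyText());
}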
Also used: HtmlContentHandler (edu.uci.ics.crawler4j.parser.HtmlContentHandler), Test (org.junit.Test)
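
The assertion in this test expects the text of every table cell, joined by single spaces. As a rough illustration of that expected output only, here is a minimal sketch that needs neither crawler4j nor Tika. The class TableTextSketch and its bodyText method are hypothetical names invented for this example, and the naive tag skipping is an assumption made for brevity; it is not how HtmlContentHandler works internally, and it ignores entities, comments, scripts and text outside the body.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not part of crawler4j or Tika.
final class TableTextSketch {

    // Collects the text found between tags and joins the non-blank runs
    // with single spaces.
    static String bodyText(String html) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideTag = false;
        for (char c : html.toCharArray()) {
            if (c == '<') {
                // A tag starts: store the text run collected so far.
                flush(current, runs);
                insideTag = true;
            } else if (c == '>') {
                insideTag = false;
            } else if (!insideTag) {
                current.append(c);
            }
        }
        flush(current, runs);
        return String.join(" ", runs);
    }

    // Adds the trimmed run to the list if it is not blank, then resets the buffer.
    private static void flush(StringBuilder current, List<String> runs) {
        String text = current.toString().trim();
        if (!text.isEmpty()) {
            runs.add(text);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        String html = "<html><body><table><tr><th>Hello</th><th>there</th></tr>"
                + "<tr><td>mr</td><td>bear</td></tr></html>";
        System.out.println(bodyText(html)); // prints "Hello there mr bear"
    }
}
```

Because the sketch only skips anything between '<' and '>', the missing closing tags in the test input make no difference to its result.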

Aggregations

HtmlContentHandler (edu.uci.ics.crawler4j.parser.HtmlContentHandler): 6
Test (org.junit.Test): 5
AllTagMapper (edu.uci.ics.crawler4j.parser.AllTagMapper): 1
ExtractedUrlAnchorPair (edu.uci.ics.crawler4j.parser.ExtractedUrlAnchorPair): 1
ByteArrayInputStream (java.io.ByteArrayInputStream): 1
Metadata (org.apache.tika.metadata.Metadata): 1