Example 11 with Document

Use of org.jsoup.nodes.Document in project jsoup by jhy.

From class ParseTest, method testBaiduVariant.

@Test
public void testBaiduVariant() throws IOException {
    // tests <meta charset> when preceded by another <meta>
    File in = getFile("/htmltests/baidu-variant.html");
    // The HTTP charset is gb2312, but it is deliberately NOT passed here (null), to test http-equiv parsing
    Document doc = Jsoup.parse(in, null, "http://www.baidu.com/");
    // check that the charset was auto-detected from the <meta> tag
    assertEquals("GB2312", doc.outputSettings().charset().displayName());
    // the title is Baidu's slogan, roughly "Just Baidu it and you'll know"
    assertEquals("<title>百度一下,你就知道</title>", doc.select("title").outerHtml());
}
Also used: Document(org.jsoup.nodes.Document) Test(org.junit.Test)
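To make the auto-detection in this test concrete, here is a minimal JDK-only sketch of sniffing a charset declaration from <meta> tags. It is a simplified regex approximation, not jsoup's own detection code; the class name MetaCharsetSniffer and the sample markup are hypothetical. It handles both <meta charset="..."> and the http-equiv form content="text/html; charset=...", and it skips a preceding <meta> that declares no charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a regex approximation of <meta> charset sniffing, not jsoup's implementation.
public class MetaCharsetSniffer {
    // Finds charset=NAME inside a single <meta ...> tag ([^>]+ cannot cross into the next tag).
    // Covers <meta charset="gb2312"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=gb2312">.
    // The name must start with a letter or digit to be a legal Java charset name.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]+charset=[\"']?([A-Za-z0-9][\\w.:+-]*)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset declared by the first <meta> tag that declares a supported one. */
    public static Optional<Charset> sniff(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so the ASCII tag syntax
        // survives regardless of the document's real encoding.
        String prefix = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(prefix);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Optional.of(Charset.forName(m.group(1)));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A first <meta> without a charset, followed by <meta charset>,
        // mirroring the situation the test describes.
        String html = "<html><head><meta name=\"keywords\" content=\"search\">"
                + "<meta charset=\"gb2312\"><title>t</title></head></html>";
        Optional<Charset> cs = sniff(html.getBytes(StandardCharsets.US_ASCII));
        System.out.println(cs.map(Charset::displayName).orElse("none")); // GB2312
    }
}
```

A real parser also has to handle declarations that disagree with the HTTP header or with a byte-order mark; this sketch only looks at the markup.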

Example 12 with Document

Use of org.jsoup.nodes.Document in project jsoup by jhy.

From class UrlConnectTest, method sendHeadRequest.

@Test
public void sendHeadRequest() throws IOException {
    String url = "http://direct.infohound.net/tools/parse-xml.xml";
    Connection con = Jsoup.connect(url).method(Connection.Method.HEAD);
    final Connection.Response response = con.execute();
    assertEquals("text/xml", response.header("Content-Type"));
    // a HEAD response should have no body
    assertEquals("", response.body());
    Document doc = response.parse();
    assertEquals("", doc.text());
}
Also used: Connection(org.jsoup.Connection) Document(org.jsoup.nodes.Document) Test(org.junit.Test)
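The same HEAD semantics can be observed with the JDK alone. This sketch is a stand-in, not jsoup's code: it starts a local com.sun.net.httpserver.HttpServer (the path /parse-xml.xml and the XML body are hypothetical placeholders for the remote endpoint) and sends requests with HttpURLConnection. A HEAD request gets the Content-Type header back but an empty body, while GET on the same resource returns the body.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class HeadRequestDemo {
    /** Hypothetical local server: answers GET with an XML body and HEAD with headers only. */
    public static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/parse-xml.xml", exchange -> {
            byte[] body = "<doc>hello</doc>".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/xml");
            boolean isHead = "HEAD".equals(exchange.getRequestMethod());
            // A length of -1 tells HttpServer that no body follows.
            exchange.sendResponseHeaders(200, isHead ? -1 : body.length);
            if (!isHead) {
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            }
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Sends a request with the given method; returns {Content-Type, body length in bytes}. */
    public static String[] request(String url, String method) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
        con.setRequestMethod(method);
        try (InputStream in = con.getInputStream()) {
            return new String[] {con.getContentType(), String.valueOf(in.readAllBytes().length)};
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/parse-xml.xml";
        try {
            String[] head = request(url, "HEAD");
            System.out.println(head[0] + ", body bytes: " + head[1]); // text/xml, body bytes: 0
        } finally {
            server.stop(0);
        }
    }
}
```

Binding to port 0 lets the OS pick a free port, so the sketch does not collide with anything already listening.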

Example 13 with Document

Use of org.jsoup.nodes.Document in project jsoup by jhy.

From class UrlConnectTest, method ignores500tExceptionIfSoConfigured.

@Test
public void ignores500tExceptionIfSoConfigured() throws IOException {
    Connection con = Jsoup.connect("http://direct.infohound.net/tools/500.pl").ignoreHttpErrors(true);
    Connection.Response res = con.execute();
    Document doc = res.parse();
    assertEquals(500, res.statusCode());
    assertEquals("Application Error", res.statusMessage());
    assertEquals("Woops", doc.select("h1").first().text());
}
Also used: Connection(org.jsoup.Connection) Document(org.jsoup.nodes.Document) Test(org.junit.Test)
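ignoreHttpErrors(true) stops jsoup from throwing an HttpStatusException on error statuses, so the error page can still be parsed. For comparison, here is a minimal JDK-only sketch of the same idea: HttpURLConnection.getInputStream() throws an IOException for 4xx/5xx statuses, so the body has to be read from getErrorStream() instead. The local server, its /500 path, and the "<h1>Woops</h1>" page are hypothetical stand-ins for the remote script. HttpServer chooses its own reason phrase, so this sketch does not reproduce the "Application Error" status message.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class ErrorBodyDemo {
    /** Status code plus body text of a response. */
    public static final class Result {
        public final int status;
        public final String body;
        Result(int status, String body) { this.status = status; this.body = body; }
    }

    /** Hypothetical local server: /500 always fails with a small HTML error page. */
    public static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/500", exchange -> {
            byte[] body = "<h1>Woops</h1>".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html");
            exchange.sendResponseHeaders(500, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Fetches a URL and returns its body even when the status is 4xx or 5xx. */
    public static Result fetchIgnoringHttpErrors(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            int status = con.getResponseCode(); // sends the request; does not throw for 500
            // getInputStream() would throw an IOException for error statuses
            InputStream stream = status >= 400 ? con.getErrorStream() : con.getInputStream();
            String body = "";
            if (stream != null) { // getErrorStream() returns null when there is no body
                try (InputStream in = stream) {
                    body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
            return new Result(status, body);
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            Result r = fetchIgnoringHttpErrors("http://127.0.0.1:" + server.getAddress().getPort() + "/500");
            System.out.println(r.status + " " + r.body); // 500 <h1>Woops</h1>
        } finally {
            server.stop(0);
        }
    }
}
```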

Example 14 with Document

Use of org.jsoup.nodes.Document in project jsoup by jhy.

From class UrlConnectTest, method fetchToW3c.

@Test
public void fetchToW3c() throws IOException {
    String url = "https://jsoup.org";
    Document doc = Jsoup.connect(url).get();
    W3CDom dom = new W3CDom();
    org.w3c.dom.Document wDoc = dom.fromJsoup(doc);
    assertEquals(url, wDoc.getDocumentURI());
    String html = dom.asString(wDoc);
    assertTrue(html.contains("jsoup"));
}
Also used: W3CDom(org.jsoup.helper.W3CDom) Document(org.jsoup.nodes.Document) Test(org.junit.Test)
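The test converts a jsoup Document into an org.w3c.dom.Document and serializes it back to a string. Here is a JDK-only sketch of the W3C side of that round trip: it builds a small org.w3c.dom.Document by hand (standing in for what W3CDom.fromJsoup would produce), records the document URI, and serializes it with a javax.xml.transform Transformer using the html output method. It does not perform the jsoup conversion itself; the class name W3cDomSketch and the placeholder content are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class W3cDomSketch {
    /** Builds a tiny W3C DOM by hand, standing in for the output of a jsoup-to-W3C conversion. */
    public static Document build(String documentUri) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element title = doc.createElement("title");
        title.setTextContent("jsoup"); // placeholder content, not the real page
        html.appendChild(title);
        doc.appendChild(html);
        // Record where the document came from, which getDocumentURI() then reports
        doc.setDocumentURI(documentUri);
        return doc;
    }

    /** Serializes a W3C document to a string using the JDK's built-in Transformer. */
    public static String asString(Document doc) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = build("https://jsoup.org");
        System.out.println(doc.getDocumentURI()); // https://jsoup.org
        System.out.println(asString(doc));
    }
}
```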

Example 15 with Document

Use of org.jsoup.nodes.Document in project jsoup by jhy.

From class UrlConnectTest, method baseHrefCorrectAfterHttpEquiv.

@Test
public void baseHrefCorrectAfterHttpEquiv() throws IOException {
    // https://github.com/jhy/jsoup/issues/440
    Connection.Response res = Jsoup.connect("http://direct.infohound.net/tools/charset-base.html").execute();
    Document doc = res.parse();
    assertEquals("http://example.com/foo.jpg", doc.select("img").first().absUrl("src"));
}
Also used: Connection(org.jsoup.Connection) Document(org.jsoup.nodes.Document) Test(org.junit.Test)
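The assertion checks that the img src resolves against the document's <base href> rather than against the page URL. Below is a minimal sketch of that resolution rule using java.net.URI; it is not jsoup's own absUrl implementation. The markup it models, assumed here to be <base href="http://example.com/"> with <img src="foo.jpg">, is inferred from the expected value and is hypothetical, as is the class name BaseHrefResolver.

```java
import java.net.URI;

public class BaseHrefResolver {
    /**
     * Resolves an attribute value such as img/@src the way a document with a
     * <base href> does: the base href is first resolved against the page URL,
     * and the attribute is then resolved against that effective base.
     */
    public static String absUrl(String pageUrl, String baseHref, String relative) {
        URI pageUri = URI.create(pageUrl);
        // With no <base href>, the page URL itself is the base.
        URI base = (baseHref == null) ? pageUri : pageUri.resolve(baseHref);
        return base.resolve(relative).toString();
    }

    public static void main(String[] args) {
        String page = "http://direct.infohound.net/tools/charset-base.html";
        // Assumed markup (hypothetical): <base href="http://example.com/"> ... <img src="foo.jpg">
        System.out.println(absUrl(page, "http://example.com/", "foo.jpg")); // http://example.com/foo.jpg
        // Without a base href, the same src resolves against the page URL instead:
        System.out.println(absUrl(page, null, "foo.jpg")); // http://direct.infohound.net/tools/foo.jpg
    }
}
```

Resolving the base href against the page URL first also covers relative base values such as "/img/".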

Aggregations

Document (org.jsoup.nodes.Document): 391
Test (org.junit.Test): 194
Element (org.jsoup.nodes.Element): 153
IOException (java.io.IOException): 100
File (java.io.File): 81
Elements (org.jsoup.select.Elements): 70
ElementHandlerImpl (org.asqatasun.ruleimplementation.ElementHandlerImpl): 51
Connection (org.jsoup.Connection): 37
ArrayList (java.util.ArrayList): 36
URL (java.net.URL): 24
HashMap (java.util.HashMap): 16
InputStream (java.io.InputStream): 13
List (java.util.List): 9
MalformedURLException (java.net.MalformedURLException): 8
Matcher (java.util.regex.Matcher): 7
Logger (org.slf4j.Logger): 7
Pattern (java.util.regex.Pattern): 6
HttpGet (org.apache.http.client.methods.HttpGet): 6
Jsoup (org.jsoup.Jsoup): 6
LoggerFactory (org.slf4j.LoggerFactory): 6