Example 81 with Document

use of org.jsoup.nodes.Document in project jsoup by jhy.

the class UrlConnectTest method invalidProxyFails.

@Test
public void invalidProxyFails() throws IOException {
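    boolean caught = false;
    String url = "https://jsoup.org";
    try {
        // The test assumes nothing is listening on localhost:8889, so the proxied request cannot connect
        Document doc = Jsoup.connect(url).proxy("localhost", 8889).get();
    } catch (IOException e) {
        // Only a ConnectException counts as the expected failure
        caught = e instanceof ConnectException;
    }
    assertTrue(caught);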
}
Also used : IOException(java.io.IOException) Document(org.jsoup.nodes.Document) ConnectException(java.net.ConnectException) Test(org.junit.Test)

Example 82 with Document

use of org.jsoup.nodes.Document in project jsoup by jhy.

the class UrlConnectTest method fetchHandlesXml.

@Test
public void fetchHandlesXml() throws IOException {
    // Should auto-detect XML and use the XML parser, unless the HTML parser is explicitly requested
    String xmlUrl = "http://direct.infohound.net/tools/parse-xml.xml";
    Connection con = Jsoup.connect(xmlUrl);
    Document doc = con.get();
    Connection.Request req = con.request();
    assertTrue(req.parser().getTreeBuilder() instanceof XmlTreeBuilder);
    assertEquals("<xml> <link> one </link> <table> Two </table> </xml>", StringUtil.normaliseWhitespace(doc.outerHtml()));
}
Also used : Connection(org.jsoup.Connection) Document(org.jsoup.nodes.Document) XmlTreeBuilder(org.jsoup.parser.XmlTreeBuilder) Test(org.junit.Test)

Example 83 with Document

use of org.jsoup.nodes.Document in project jsoup by jhy.

the class UrlConnectTest method fetchURIWithWihtespace.

@Test
public void fetchURIWithWihtespace() throws IOException {
    Connection con = Jsoup.connect("http://try.jsoup.org/#with whitespaces");
    Document doc = con.get();
    assertTrue(doc.title().contains("jsoup"));
}
Also used : Connection(org.jsoup.Connection) Document(org.jsoup.nodes.Document) Test(org.junit.Test)
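
The URL in the example above contains a raw space in its fragment. As a rough illustration of what escaping that space looks like, the sketch below uses only the JDK's java.net.URI, whose multi-argument constructors quote illegal characters. It is not jsoup's own code; the class name WhitespaceFragmentSketch and the helper quoteFragment are hypothetical names introduced for this illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class WhitespaceFragmentSketch {

    // Hypothetical helper: builds an http URI from its parts and lets
    // java.net.URI quote characters that are illegal in a URI, such as spaces.
    static String quoteFragment(String host, String path, String fragment) {
        try {
            URI uri = new URI("http", host, path, fragment);
            return uri.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Could not build URI", e);
        }
    }

    public static void main(String[] args) {
        // Same host and fragment as the test above
        System.out.println(quoteFragment("try.jsoup.org", "/", "with whitespaces"));
    }
}
```

Passing the same string to the single-argument new URI(String) constructor would instead throw a URISyntaxException, because that constructor expects an already-escaped URI.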

Example 84 with Document

use of org.jsoup.nodes.Document in project jsoup by jhy.

the class UrlConnectTest method handles200WithNoContent.

@Test
public void handles200WithNoContent() throws IOException {
    // browserUa is a user-agent string defined elsewhere in the test class
    // First pass: default HTML parser; an empty 200 response should still parse
    Connection con = Jsoup.connect("http://direct.infohound.net/tools/200-no-content.pl").userAgent(browserUa);
    Connection.Response res = con.execute();
    Document doc = res.parse();
    assertEquals(200, res.statusCode());
    // Second pass: the same request with the XML parser
    con = Jsoup.connect("http://direct.infohound.net/tools/200-no-content.pl").parser(Parser.xmlParser()).userAgent(browserUa);
    res = con.execute();
    doc = res.parse();
    assertEquals(200, res.statusCode());
}
Also used : Connection(org.jsoup.Connection) Document(org.jsoup.nodes.Document) Test(org.junit.Test)

Example 85 with Document

use of org.jsoup.nodes.Document in project jsoup by jhy.

the class UrlConnectTest method handlesUt8fInUrl.

@Test
public void handlesUt8fInUrl() throws IOException {
    String url = "http://direct.infohound.net/tools/test💩.html";
    String urlEscaped = "http://direct.infohound.net/tools/test%F0%9F%92%A9.html";
    Connection.Response res = Jsoup.connect(url).execute();
    Document doc = res.parse();
    assertEquals("💩!", doc.body().text());
    assertEquals(urlEscaped, doc.location());
}
Also used : Connection(org.jsoup.Connection) Document(org.jsoup.nodes.Document) Test(org.junit.Test)
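
The relationship between url and urlEscaped in the example above is UTF-8 percent-encoding: each byte of the non-ASCII character's UTF-8 form is written as %XX. The sketch below reproduces that mapping with the JDK only. It is a minimal illustration, not jsoup's implementation; the class name Utf8UrlEscapeSketch and the method escapeNonAscii are hypothetical, and the method deliberately leaves every ASCII character untouched.

```java
import java.nio.charset.StandardCharsets;

public class Utf8UrlEscapeSketch {

    // Hypothetical helper: percent-encodes every non-ASCII code point as its
    // UTF-8 bytes and copies ASCII characters through unchanged.
    static String escapeNonAscii(String url) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < url.length()) {
            int codePoint = url.codePointAt(i);
            int charCount = Character.charCount(codePoint); // 2 for surrogate pairs
            if (codePoint < 0x80) {
                out.append((char) codePoint);
            } else {
                byte[] utf8 = url.substring(i, i + charCount).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
            i += charCount;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same input as the test above
        System.out.println(escapeNonAscii("http://direct.infohound.net/tools/test💩.html"));
    }
}
```

Iterating by code point rather than by char matters here, because the emoji is stored as a surrogate pair and must be encoded as one four-byte UTF-8 sequence.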

Aggregations

Document (org.jsoup.nodes.Document) 405
Test (org.junit.Test) 194
Element (org.jsoup.nodes.Element) 164
IOException (java.io.IOException) 102
File (java.io.File) 81
Elements (org.jsoup.select.Elements) 78
ElementHandlerImpl (org.asqatasun.ruleimplementation.ElementHandlerImpl) 51
ArrayList (java.util.ArrayList) 41
Connection (org.jsoup.Connection) 38
URL (java.net.URL) 25
HashMap (java.util.HashMap) 17
InputStream (java.io.InputStream) 14
List (java.util.List) 10
MalformedURLException (java.net.MalformedURLException) 8
Logger (org.slf4j.Logger) 8
Matcher (java.util.regex.Matcher) 7
Jsoup (org.jsoup.Jsoup) 7
LoggerFactory (org.slf4j.LoggerFactory) 7
Pattern (java.util.regex.Pattern) 6
HttpGet (org.apache.http.client.methods.HttpGet) 6