
Example 6 with Document

Use of org.jsoup.nodes.Document in project jsoup by jhy.

From the class ParseTest, method testNewsHomepage.

@Test
public void testNewsHomepage() throws IOException {
    File in = getFile("/htmltests/news-com-au-home.html");
    Document doc = Jsoup.parse(in, "UTF-8", "http://www.news.com.au/");
    assertEquals("News.com.au | News from Australia and around the world online | NewsComAu", doc.title());
    assertEquals("Brace yourself for Metro meltdown", doc.select(".id1225817868581 h4").text().trim());
    Element a = doc.select("a[href=/entertainment/horoscopes]").first();
    assertEquals("/entertainment/horoscopes", a.attr("href"));
    assertEquals("http://www.news.com.au/entertainment/horoscopes", a.attr("abs:href"));
    Element hs = doc.select("a[href*=naughty-corners-are-a-bad-idea]").first();
    assertEquals("http://www.heraldsun.com.au/news/naughty-corners-are-a-bad-idea-for-kids/story-e6frf7jo-1225817899003", hs.attr("href"));
    assertEquals(hs.attr("href"), hs.attr("abs:href"));
}
Also used : Element(org.jsoup.nodes.Element) Document(org.jsoup.nodes.Document) Test(org.junit.Test)
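
The last four assertions in Example 6 rely on the "abs:" attribute prefix, which reports an href resolved against the base URI passed to Jsoup.parse. The sketch below illustrates the same resolution rule with java.net.URI only. It is not jsoup's implementation, and the class AbsUrlSketch and its resolve helper are hypothetical names introduced here for illustration.

```java
import java.net.URI;

public class AbsUrlSketch {
    // Hypothetical helper: resolve an href against a base URI,
    // approximating what attr("abs:href") reports in the test above.
    static String resolve(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.news.com.au/";
        // A root-relative link picks up the scheme and host of the base URI.
        System.out.println(resolve(base, "/entertainment/horoscopes"));
        // An already absolute link is returned unchanged.
        System.out.println(resolve(base,
            "http://www.heraldsun.com.au/news/naughty-corners-are-a-bad-idea-for-kids/story-e6frf7jo-1225817899003"));
    }
}
```

This matches the test's expectations: the horoscopes link gains the news.com.au host, while the heraldsun.com.au link is the same with and without the prefix.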

Example 7 with Document

Use of org.jsoup.nodes.Document in project jsoup by jhy.

From the class ParseTest, method testBinary.

@Test
public void testBinary() throws IOException {
    File in = getFile("/htmltests/thumb.jpg");
    Document doc = Jsoup.parse(in, "UTF-8");
    // a JPEG is not HTML, so the parsed text is not useful; the point is that parsing binary input does not throw
    assertTrue(doc.text().contains("gd-jpeg"));
}
Also used : Document(org.jsoup.nodes.Document) Test(org.junit.Test)

Example 8 with Document

Use of org.jsoup.nodes.Document in project jsoup by jhy.

From the class ParseTest, method testNytArticle.

@Test
public void testNytArticle() throws IOException {
    // has tags like <nyt_text>
    File in = getFile("/htmltests/nyt-article-1.html");
    // a null charset asks jsoup to detect the encoding from the document, falling back to UTF-8
    Document doc = Jsoup.parse(in, null, "http://www.nytimes.com/2010/07/26/business/global/26bp.html?hp");
    Element headline = doc.select("nyt_headline[version=1.0]").first();
    assertEquals("As BP Lays Out Future, It Will Not Include Hayward", headline.text());
}
Also used : Element(org.jsoup.nodes.Element) Document(org.jsoup.nodes.Document) Test(org.junit.Test)

Example 9 with Document

Use of org.jsoup.nodes.Document in project jsoup by jhy.

From the class ParseTest, method testBrokenHtml5CharsetWithASingleDoubleQuote.

@Test
public void testBrokenHtml5CharsetWithASingleDoubleQuote() throws IOException {
    InputStream in = inputStreamFrom("<html>\n" + "<head><meta charset=UTF-8\"></head>\n" + "<body></body>\n" + "</html>");
    Document doc = Jsoup.parse(in, null, "http://example.com/");
    assertEquals("UTF-8", doc.outputSettings().charset().displayName());
}
Also used : Document(org.jsoup.nodes.Document) Test(org.junit.Test)
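
Example 9 checks that a malformed declaration such as charset=UTF-8" still yields UTF-8. One way to tolerate that kind of input is to strip stray quotes before looking the name up. The sketch below shows the idea with java.nio.charset.Charset only. It is an assumption about how such cleanup could work, not jsoup's actual code, and CharsetSketch and cleanCharset are hypothetical names.

```java
import java.nio.charset.Charset;

public class CharsetSketch {
    // Hypothetical helper: remove stray quotes from a declared charset name and
    // return the canonical name if the JVM supports it, otherwise null.
    static String cleanCharset(String raw) {
        if (raw == null) return null;
        String cs = raw.trim().replaceAll("[\"']", "");
        if (cs.isEmpty()) return null;
        try {
            return Charset.isSupported(cs) ? Charset.forName(cs).name() : null;
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException for names with illegal characters.
            return null;
        }
    }

    public static void main(String[] args) {
        // The broken value from the test above: a single trailing double quote.
        System.out.println(cleanCharset("UTF-8\""));
    }
}
```

Returning null for an unusable name leaves the caller free to fall back to a default encoding.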

Example 10 with Document

Use of org.jsoup.nodes.Document in project jsoup by jhy.

From the class ParseTest, method testGoogleSearchIpod.

@Test
public void testGoogleSearchIpod() throws IOException {
    File in = getFile("/htmltests/google-ipod.html");
    Document doc = Jsoup.parse(in, "UTF-8", "http://www.google.com/search?hl=en&q=ipod&aq=f&oq=&aqi=g10");
    assertEquals("ipod - Google Search", doc.title());
    Elements results = doc.select("h3.r > a");
    assertEquals(12, results.size());
    assertEquals("http://news.google.com/news?hl=en&q=ipod&um=1&ie=UTF-8&ei=uYlKS4SbBoGg6gPf-5XXCw&sa=X&oi=news_group&ct=title&resnum=1&ved=0CCIQsQQwAA", results.get(0).attr("href"));
    assertEquals("http://www.apple.com/itunes/", results.get(1).attr("href"));
}
Also used : Document(org.jsoup.nodes.Document) Elements(org.jsoup.select.Elements) Test(org.junit.Test)

Aggregations

Document (org.jsoup.nodes.Document): 391
Test (org.junit.Test): 194
Element (org.jsoup.nodes.Element): 153
IOException (java.io.IOException): 100
File (java.io.File): 81
Elements (org.jsoup.select.Elements): 70
ElementHandlerImpl (org.asqatasun.ruleimplementation.ElementHandlerImpl): 51
Connection (org.jsoup.Connection): 37
ArrayList (java.util.ArrayList): 36
URL (java.net.URL): 24
HashMap (java.util.HashMap): 16
InputStream (java.io.InputStream): 13
List (java.util.List): 9
MalformedURLException (java.net.MalformedURLException): 8
Matcher (java.util.regex.Matcher): 7
Logger (org.slf4j.Logger): 7
Pattern (java.util.regex.Pattern): 6
HttpGet (org.apache.http.client.methods.HttpGet): 6
Jsoup (org.jsoup.Jsoup): 6
LoggerFactory (org.slf4j.LoggerFactory): 6