Example 56 with Document

use of org.jsoup.nodes.Document in project jsoup by jhy.

the class SelectorTest method testRelaxedTags.

@Test
public void testRelaxedTags() {
    Document doc = Jsoup.parse("<abc_def id=1>Hello</abc_def> <abc-def id=2>There</abc-def>");
    Elements el1 = doc.select("abc_def");
    assertEquals(1, el1.size());
    assertEquals("1", el1.first().id());
    Elements el2 = doc.select("abc-def");
    assertEquals(1, el2.size());
    assertEquals("2", el2.first().id());
}
Also used : Document(org.jsoup.nodes.Document) Test(org.junit.Test)

Example 57 with Document

use of org.jsoup.nodes.Document in project ZhihuDailyPurify by izzyleung.

the class NewsListFromZhihuObservable method convertToDailyNews.

private static Optional<DailyNews> convertToDailyNews(Pair<Story, Document> pair) {
    DailyNews result = null;
    Story story = pair.first;
    Document document = pair.second;
    String dailyTitle = story.getDailyTitle();
    List<Question> questions = getQuestions(document, dailyTitle);
    if (Stream.of(questions).allMatch(Question::isValidZhihuQuestion)) {
        result = new DailyNews();
        result.setDailyTitle(dailyTitle);
        result.setThumbnailUrl(story.getThumbnailUrl());
        result.setQuestions(questions);
    }
    return Optional.ofNullable(result);
}
Also used : Question(io.github.izzyleung.zhihudailypurify.bean.Question) DailyNews(io.github.izzyleung.zhihudailypurify.bean.DailyNews) Document(org.jsoup.nodes.Document) Story(io.github.izzyleung.zhihudailypurify.bean.Story)

Example 58 with Document

use of org.jsoup.nodes.Document in project jsoup by jhy.

the class ParseTest method testYahooJp.

@Test
public void testYahooJp() throws IOException {
    File in = getFile("/htmltests/yahoo-jp.html");
    // http charset is utf-8.
    Document doc = Jsoup.parse(in, "UTF-8", "http://www.yahoo.co.jp/index.html");
    assertEquals("Yahoo! JAPAN", doc.title());
    Element a = doc.select("a[href=t/2322m2]").first();
    // session put into <base>
    assertEquals("http://www.yahoo.co.jp/_ylh=X3oDMTB0NWxnaGxsBF9TAzIwNzcyOTYyNjUEdGlkAzEyBHRtcGwDZ2Ex/t/2322m2", a.attr("abs:href"));
    assertEquals("全国、人気の駅ランキング", a.text());
}
Also used : Element(org.jsoup.nodes.Element) Document(org.jsoup.nodes.Document) Test(org.junit.Test)

Example 59 with Document

use of org.jsoup.nodes.Document in project jsoup by jhy.

the class ParseTest method testHtml5Charset.

@Test
public void testHtml5Charset() throws IOException {
    // test that <meta charset="gb2312"> works
    File in = getFile("/htmltests/meta-charset-1.html");
    //gb2312, has html5 <meta charset>
    Document doc = Jsoup.parse(in, null, "http://example.com/");
    assertEquals("新", doc.text());
    assertEquals("GB2312", doc.outputSettings().charset().displayName());
    // double check, no charset, falls back to utf8 which is incorrect
    in = getFile("/htmltests/meta-charset-2.html");
    // gb2312, no charset
    doc = Jsoup.parse(in, null, "http://example.com");
    assertEquals("UTF-8", doc.outputSettings().charset().displayName());
    assertFalse("新".equals(doc.text()));
    // confirm fallback to utf8
    in = getFile("/htmltests/meta-charset-3.html");
    // utf8, no charset
    doc = Jsoup.parse(in, null, "http://example.com/");
    assertEquals("UTF-8", doc.outputSettings().charset().displayName());
    assertEquals("新", doc.text());
}
Also used : Document(org.jsoup.nodes.Document) Test(org.junit.Test)
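
The second case in the test above yields wrong text because GB2312 bytes end up decoded as UTF-8. The following is a minimal sketch of that mismatch using only the JDK. It is not jsoup's detection logic. The class name CharsetMismatchSketch and the helper decode are hypothetical, and the sketch assumes the JDK in use provides the GB2312 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchSketch {
    // Hypothetical helper: decode raw bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Assumption: this JDK ships the GB2312 charset.
        Charset gb2312 = Charset.forName("GB2312");
        // U+65B0 is the character asserted on in the test above.
        String original = "\u65B0";
        byte[] bytes = original.getBytes(gb2312);

        // Decoding with the charset the bytes were written in recovers the text.
        System.out.println(original.equals(decode(bytes, gb2312)));
        // Decoding the same bytes as UTF-8 does not recover the text.
        System.out.println(original.equals(decode(bytes, StandardCharsets.UTF_8)));
    }
}
```

This is why the test expects doc.text() to differ from the original character when no charset is declared and the fallback is UTF-8.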

Example 60 with Document

use of org.jsoup.nodes.Document in project jsoup by jhy.

the class ParseTest method testNewsHomepage.

@Test
public void testNewsHomepage() throws IOException {
    File in = getFile("/htmltests/news-com-au-home.html");
    Document doc = Jsoup.parse(in, "UTF-8", "http://www.news.com.au/");
    assertEquals("News.com.au | News from Australia and around the world online | NewsComAu", doc.title());
    assertEquals("Brace yourself for Metro meltdown", doc.select(".id1225817868581 h4").text().trim());
    Element a = doc.select("a[href=/entertainment/horoscopes]").first();
    assertEquals("/entertainment/horoscopes", a.attr("href"));
    assertEquals("http://www.news.com.au/entertainment/horoscopes", a.attr("abs:href"));
    Element hs = doc.select("a[href*=naughty-corners-are-a-bad-idea]").first();
    assertEquals("http://www.heraldsun.com.au/news/naughty-corners-are-a-bad-idea-for-kids/story-e6frf7jo-1225817899003", hs.attr("href"));
    assertEquals(hs.attr("href"), hs.attr("abs:href"));
}
Also used : Element(org.jsoup.nodes.Element) Document(org.jsoup.nodes.Document) Test(org.junit.Test)
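
The abs:href assertions above depend on each link being resolved against the base URI passed to Jsoup.parse. As a rough illustration, the same results can be reproduced with java.net.URI alone. Using java.net.URI is an assumption made for this sketch, since jsoup's own resolution code does not appear in the source. The class name AbsHrefSketch and the helper resolve are hypothetical.

```java
import java.net.URI;

public class AbsHrefSketch {
    // Hypothetical helper: resolve an href against a base URI.
    static String resolve(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.news.com.au/";
        // A site-relative href picks up the scheme and host of the base URI.
        System.out.println(resolve(base, "/entertainment/horoscopes"));
        // An href that is already absolute is returned unchanged.
        System.out.println(resolve(base, "http://www.heraldsun.com.au/news/naughty-corners-are-a-bad-idea-for-kids/story-e6frf7jo-1225817899003"));
    }
}
```

The second call mirrors the last assertion in the test, where attr("href") and attr("abs:href") are expected to be equal for an already absolute link.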

Aggregations

Document (org.jsoup.nodes.Document): 405
Test (org.junit.Test): 194
Element (org.jsoup.nodes.Element): 164
IOException (java.io.IOException): 102
File (java.io.File): 81
Elements (org.jsoup.select.Elements): 78
ElementHandlerImpl (org.asqatasun.ruleimplementation.ElementHandlerImpl): 51
ArrayList (java.util.ArrayList): 41
Connection (org.jsoup.Connection): 38
URL (java.net.URL): 25
HashMap (java.util.HashMap): 17
InputStream (java.io.InputStream): 14
List (java.util.List): 10
MalformedURLException (java.net.MalformedURLException): 8
Logger (org.slf4j.Logger): 8
Matcher (java.util.regex.Matcher): 7
Jsoup (org.jsoup.Jsoup): 7
LoggerFactory (org.slf4j.LoggerFactory): 7
Pattern (java.util.regex.Pattern): 6
HttpGet (org.apache.http.client.methods.HttpGet): 6