
Example 6 with HtmlListItem

Use of com.gargoylesoftware.htmlunit.html.HtmlListItem in project core by z1lc.

In class MovieEtl, method getObjects.

@Override
public List<Movie> getObjects() {
    // Build the "want to see" list URL for the configured user.
    URI wantToSeeUri = Unchecked.get(() -> new URIBuilder()
            .setScheme("https")
            .setHost("rottentomatoes.com")
            .setPath(String.format("user/id/%s/wts/", USER_ID))
            .setParameter("mediaType", "1")
            .setParameter("wtsni", "wts")
            .build());
    try (WebClient webClient = CommonProvider.getHtmlUnitWebClient()) {
        // Fetch the page through the project's retry helper.
        HtmlPage page = CommonProvider.retrying().get(() -> webClient.getPage(wantToSeeUri.toURL()));
        // Each movie is an <li> whose class attribute contains "bottom_divider".
        List<HtmlListItem> items = page.getByXPath("//li[contains(@class, 'bottom_divider')]");
        return items.stream().map(li -> {
            // The first <a> in the item supplies the movie's href and title.
            HtmlElement link = li.getElementsByTagName("a").get(0);
            String href = link.getAttribute("href");
            String title = link.getAttribute("title");
            // The Tomatometer score (e.g. "85%") is the text of the <span class="tMeterScore">.
            String score = li.getElementsByTagName("span").stream()
                    .filter(elem -> elem.getAttribute("class").equals("tMeterScore"))
                    .findFirst()
                    .orElseThrow()
                    .asNormalizedText();
            // The release year is optional; it stays null when the item text has none.
            Matcher matcher = yearRegex.matcher(li.asNormalizedText());
            Long year = matcher.find() ? Long.valueOf(matcher.group()) : null;
            return Movie.MovieBuilder.aMovie()
                    .withId(ID_ISSUER.getAndIncrement())
                    .withTitle(title)
                    .withUrl("https://www.rottentomatoes.com/" + href)
                    .withRating(Long.valueOf(score.replace("%", "")))
                    .withYear(year)
                    .build();
        }).collect(Collectors.toList());
    }
}
Also used: Etl(com.robertsanek.data.etl.Etl) URIBuilder(org.apache.http.client.utils.URIBuilder) HtmlPage(com.gargoylesoftware.htmlunit.html.HtmlPage) Unchecked(com.robertsanek.util.Unchecked) Collectors(java.util.stream.Collectors) CommonProvider(com.robertsanek.util.CommonProvider) HtmlListItem(com.gargoylesoftware.htmlunit.html.HtmlListItem) HtmlElement(com.gargoylesoftware.htmlunit.html.HtmlElement) AtomicLong(java.util.concurrent.atomic.AtomicLong) List(java.util.List) Matcher(java.util.regex.Matcher) DoNotRun(com.robertsanek.data.etl.DoNotRun) WebClient(com.gargoylesoftware.htmlunit.WebClient) URI(java.net.URI) Pattern(java.util.regex.Pattern)
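The string handling in getObjects() can be exercised without fetching a page. The following is a minimal, stdlib-only sketch under stated assumptions, not the project's code: the class MovieParsingSketch and its helper methods are hypothetical; the real yearRegex field is not shown in the example, so the year pattern here is an assumed one; java.net.URI stands in for Apache's URIBuilder; and URI.resolve() stands in for the string concatenation used to build the movie URL (if href starts with "/", concatenating it onto a base ending in "/" yields a double slash, which resolve() avoids). HTML parsing is left out; the inputs ("12345", "/m/some_movie", and so on) are made-up strings of the kind HtmlUnit would return.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, stdlib-only sketch of the string handling in MovieEtl.getObjects(). */
public class MovieParsingSketch {

    // Assumption: the real yearRegex field is not shown in the example; this
    // hypothetical pattern matches a standalone four-digit year from 1900 to 2099.
    static final Pattern YEAR_REGEX = Pattern.compile("\\b(19|20)\\d{2}\\b");

    /** Builds the "want to see" URL with java.net.URI in place of URIBuilder. */
    public static URI wantToSeeUri(String userId) {
        try {
            // Arguments: scheme, authority, path, query, fragment. The path must be
            // absolute (leading "/") when an authority is given.
            return new URI("https", "rottentomatoes.com",
                    "/user/id/" + userId + "/wts/", "mediaType=1&wtsni=wts", null);
        } catch (URISyntaxException e) {
            // Rethrow as unchecked so callers need no throws clause.
            throw new IllegalArgumentException(e);
        }
    }

    /** Turns a Tomatometer label such as "85%" into 85. */
    public static long parseScore(String score) {
        return Long.parseLong(score.replace("%", "").trim());
    }

    /** Returns the first year found in the item text, or null when there is none. */
    public static Long parseYear(String itemText) {
        Matcher matcher = YEAR_REGEX.matcher(itemText);
        return matcher.find() ? Long.valueOf(matcher.group()) : null;
    }

    /** Resolves an href against the site root; a leading "/" does not double up. */
    public static String absoluteUrl(String href) {
        return URI.create("https://www.rottentomatoes.com/").resolve(href).toString();
    }

    public static void main(String[] args) {
        System.out.println(wantToSeeUri("12345"));              // https://rottentomatoes.com/user/id/12345/wts/?mediaType=1&wtsni=wts
        System.out.println(parseScore("85%"));                  // 85
        System.out.println(parseYear("Some Movie (2019) 85%")); // 2019
        System.out.println(absoluteUrl("/m/some_movie"));       // https://www.rottentomatoes.com/m/some_movie
    }
}
```

Returning a nullable Long from parseYear mirrors the original, which passes a possibly null year to withYear(); an OptionalLong would be the more explicit alternative if the builder accepted one.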

Aggregations

HtmlListItem (com.gargoylesoftware.htmlunit.html.HtmlListItem): 6
HtmlElement (com.gargoylesoftware.htmlunit.html.HtmlElement): 5
HtmlPage (com.gargoylesoftware.htmlunit.html.HtmlPage): 5
HtmlAnchor (com.gargoylesoftware.htmlunit.html.HtmlAnchor): 3
HtmlDivision (com.gargoylesoftware.htmlunit.html.HtmlDivision): 3
HtmlHeading2 (com.gargoylesoftware.htmlunit.html.HtmlHeading2): 3
HtmlParagraph (com.gargoylesoftware.htmlunit.html.HtmlParagraph): 3
HtmlSection (com.gargoylesoftware.htmlunit.html.HtmlSection): 3
HtmlUnorderedList (com.gargoylesoftware.htmlunit.html.HtmlUnorderedList): 3
HtmlDefinitionDescription (com.gargoylesoftware.htmlunit.html.HtmlDefinitionDescription): 2
HtmlDefinitionList (com.gargoylesoftware.htmlunit.html.HtmlDefinitionList): 2
HtmlDefinitionTerm (com.gargoylesoftware.htmlunit.html.HtmlDefinitionTerm): 2
HtmlOrderedList (com.gargoylesoftware.htmlunit.html.HtmlOrderedList): 2
WebClient (com.gargoylesoftware.htmlunit.WebClient): 1
DomNode (com.gargoylesoftware.htmlunit.html.DomNode): 1
HtmlHeading4 (com.gargoylesoftware.htmlunit.html.HtmlHeading4): 1
HtmlInput (com.gargoylesoftware.htmlunit.html.HtmlInput): 1
HtmlPreformattedText (com.gargoylesoftware.htmlunit.html.HtmlPreformattedText): 1
HtmlTable (com.gargoylesoftware.htmlunit.html.HtmlTable): 1
HtmlTableDataCell (com.gargoylesoftware.htmlunit.html.HtmlTableDataCell): 1