Example 6 with PlainText

Use of us.codecraft.webmagic.selector.PlainText in project webmagic by code4craft, in the method handleResponse of the class HttpClientDownloader.

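protected Page handleResponse(Request request, String charset, HttpResponse httpResponse, Task task) throws IOException {
    // Read the response body as text using the given charset.
    String content = getContent(charset, httpResponse);
    Page page = new Page();
    page.setRawText(content);
    // Wrap the request URL in a PlainText so it can be handled as a Selectable.
    page.setUrl(new PlainText(request.getUrl()));
    page.setRequest(request);
    // Propagate the HTTP status code of the response to the page.
    page.setStatusCode(httpResponse.getStatusLine().getStatusCode());
    return page;
}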
Also used: PlainText (us.codecraft.webmagic.selector.PlainText), Page (us.codecraft.webmagic.Page)

Example 7 with PlainText

Use of us.codecraft.webmagic.selector.PlainText in project webmagic by code4craft, in the method download of the class PhantomJSDownloader.

@Override
public Page download(Request request, Task task) {
    if (logger.isInfoEnabled()) {
        logger.info("downloading page: " + request.getUrl());
    }
    String content = getPage(request);
    if (content.contains("HTTP request failed")) {
        // The first attempt failed: retry up to getRetryNum() more times.
        for (int i = 1; i <= getRetryNum(); i++) {
            content = getPage(request);
            if (!content.contains("HTTP request failed")) {
                break;
            }
        }
        if (content.contains("HTTP request failed")) {
            // All attempts failed: return a page that carries only the request.
            Page page = new Page();
            page.setRequest(request);
            return page;
        }
    }
    Page page = new Page();
    page.setRawText(content);
    // Wrap the request URL in a PlainText so it can be handled as a Selectable.
    page.setUrl(new PlainText(request.getUrl()));
    page.setRequest(request);
    // The status code is hard-coded to 200 here rather than read from a response.
    page.setStatusCode(200);
    return page;
}
Also used: PlainText (us.codecraft.webmagic.selector.PlainText), Page (us.codecraft.webmagic.Page)
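
The retry logic in the download method above can be looked at in isolation. The following is a minimal sketch and not part of webmagic: the class RetrySketch, the method fetchWithRetry, and the Supplier-based fetcher are hypothetical names introduced for illustration. Unlike the original, it also treats a null result as a failure, which is an added assumption.

```java
import java.util.function.Supplier;

public class RetrySketch {
    // Same marker string that the example above checks for.
    public static final String FAILURE_MARKER = "HTTP request failed";

    // Assumption: null counts as a failure too; the original only checks the marker.
    public static boolean isFailure(String content) {
        return content == null || content.contains(FAILURE_MARKER);
    }

    /**
     * Calls fetch once, then up to retryNum more times while the result
     * still looks like a failure. Returns null if every attempt failed.
     */
    public static String fetchWithRetry(Supplier<String> fetch, int retryNum) {
        String content = fetch.get();
        for (int i = 0; i < retryNum && isFailure(content); i++) {
            content = fetch.get();
        }
        return isFailure(content) ? null : content;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated fetcher: fails on the first two calls, then succeeds.
        Supplier<String> flaky =
                () -> ++calls[0] < 3 ? FAILURE_MARKER : "<html>ok</html>";
        System.out.println(fetchWithRetry(flaky, 3));
        System.out.println("attempts: " + calls[0]);
    }
}
```

Returning null on total failure lets the caller decide what to do, much as the original returns a Page that carries only the request when all attempts fail.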

Aggregations

PlainText (us.codecraft.webmagic.selector.PlainText): 7
Page (us.codecraft.webmagic.Page): 6
Request (us.codecraft.webmagic.Request): 3
Html (us.codecraft.webmagic.selector.Html): 3
Map (java.util.Map): 1
Ignore (org.junit.Ignore): 1
Test (org.junit.Test): 1
Cookie (org.openqa.selenium.Cookie): 1
WebDriver (org.openqa.selenium.WebDriver): 1
WebElement (org.openqa.selenium.WebElement): 1
Site (us.codecraft.webmagic.Site): 1