Example 1 with Proxy

Use of us.codecraft.webmagic.proxy.Proxy in the webmagic project by code4craft.

The download method of the HttpClientDownloader class.

@Override
public Page download(Request request, Task task) {
    Site site = null;
    if (task != null) {
        site = task.getSite();
    }
    Set<Integer> acceptStatCode;
    String charset = null;
    Map<String, String> headers = null;
    if (site != null) {
        acceptStatCode = site.getAcceptStatCode();
        charset = site.getCharset();
        headers = site.getHeaders();
    } else {
        acceptStatCode = WMCollections.newHashSet(200);
    }
    logger.info("downloading page {}", request.getUrl());
    CloseableHttpResponse httpResponse = null;
    int statusCode = 0;
    try {
        HttpHost proxyHost = null;
        //TODO
        Proxy proxy = null;
        // Prefer a proxy taken from the pool when the pool is enabled;
        // otherwise fall back to the single fixed proxy configured on the site.
        if (site != null && site.getHttpProxyPool() != null && site.getHttpProxyPool().isEnable()) {
            proxy = site.getHttpProxyFromPool();
            proxyHost = proxy.getHttpHost();
        } else if (site != null && site.getHttpProxy() != null) {
            proxyHost = site.getHttpProxy();
        }
        HttpUriRequest httpUriRequest = getHttpUriRequest(request, site, headers, proxyHost);
        httpResponse = getHttpClient(site, proxy).execute(httpUriRequest);
        statusCode = httpResponse.getStatusLine().getStatusCode();
        request.putExtra(Request.STATUS_CODE, statusCode);
        if (statusAccept(acceptStatCode, statusCode)) {
            Page page = handleResponse(request, charset, httpResponse, task);
            onSuccess(request);
            return page;
        } else {
            logger.warn("get page {} error, status code {} ", request.getUrl(), statusCode);
            return null;
        }
    } catch (IOException e) {
        logger.warn("download page {} error", request.getUrl(), e);
        if (site != null && site.getCycleRetryTimes() > 0) {
            return addToCycleRetry(request, site);
        }
        onError(request);
        return null;
    } finally {
        if (httpResponse != null) {
            //ensure the connection is released back to pool
            EntityUtils.consumeQuietly(httpResponse.getEntity());
        }
        request.putExtra(Request.STATUS_CODE, statusCode);
        if (site != null && site.getHttpProxyPool() != null && site.getHttpProxyPool().isEnable()) {
            site.returnHttpProxyToPool((HttpHost) request.getExtra(Request.PROXY), (Integer) request.getExtra(Request.STATUS_CODE));
        }
    }
}
Also used: Site (us.codecraft.webmagic.Site), HttpUriRequest (org.apache.http.client.methods.HttpUriRequest), Proxy (us.codecraft.webmagic.proxy.Proxy), HttpHost (org.apache.http.HttpHost), CloseableHttpResponse (org.apache.http.client.methods.CloseableHttpResponse), Page (us.codecraft.webmagic.Page), IOException (java.io.IOException)
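
The borrow-and-return pattern in the method above (take a proxy from the pool, run the request, hand the proxy back in a finally block together with the status code) can be sketched without any webmagic or HttpClient dependency. Everything below is hypothetical and for illustration only: ProxyAddress, SimpleProxyPool, Fetcher and download are stand-ins, not webmagic classes. One deliberate difference from the listing: this sketch falls back to the fixed proxy when the pool is enabled but empty, rather than dereferencing a null proxy.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

public class ProxySelectionSketch {

    /** Hypothetical stand-in for a proxy endpoint (not a webmagic class). */
    public static final class ProxyAddress {
        public final String host;
        public final int port;

        public ProxyAddress(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Hypothetical, simplified pool: hands proxies out and takes them back. */
    public static final class SimpleProxyPool {
        private final Deque<ProxyAddress> idle = new ArrayDeque<>();
        private final boolean enabled;
        public int lastReturnedStatus = -1;

        public SimpleProxyPool(boolean enabled) {
            this.enabled = enabled;
        }

        public boolean isEnabled() {
            return enabled;
        }

        public void add(ProxyAddress proxy) {
            idle.addLast(proxy);
        }

        /** Returns null when no proxy is idle. */
        public ProxyAddress borrow() {
            return idle.pollFirst();
        }

        /** Records the status code seen with this proxy and makes it idle again. */
        public void giveBack(ProxyAddress proxy, int statusCode) {
            lastReturnedStatus = statusCode;
            idle.addLast(proxy);
        }

        public int idleCount() {
            return idle.size();
        }
    }

    /** Hypothetical callback standing in for the real HTTP call. */
    public interface Fetcher {
        int fetch(String url, ProxyAddress proxyOrNull) throws IOException;
    }

    /**
     * Picks a proxy (pool first, then the fixed proxy, then none), runs the
     * fetch, and always returns a borrowed proxy to the pool.
     * Returns the HTTP status code, or 0 if the fetch threw an IOException.
     */
    public static int download(String url, SimpleProxyPool pool,
                               ProxyAddress fixedProxy, Fetcher fetcher) {
        ProxyAddress borrowed = null;
        int statusCode = 0;
        try {
            ProxyAddress proxy = fixedProxy;
            if (pool != null && pool.isEnabled()) {
                borrowed = pool.borrow();
                if (borrowed != null) {
                    proxy = borrowed; // pooled proxy takes precedence
                }
            }
            statusCode = fetcher.fetch(url, proxy);
            return statusCode;
        } catch (IOException e) {
            return statusCode; // still 0: no response was received
        } finally {
            // Runs on success and on failure, so the proxy is never leaked.
            if (borrowed != null) {
                pool.giveBack(borrowed, statusCode);
            }
        }
    }

    public static void main(String[] args) {
        SimpleProxyPool pool = new SimpleProxyPool(true);
        pool.add(new ProxyAddress("127.0.0.1", 8080));
        int status = download("http://example.com", pool, null, (url, proxy) -> 200);
        System.out.println("status=" + status + ", idle=" + pool.idleCount());
    }
}
```

Tracking the borrowed proxy in a local variable, instead of reading it back from the request's extras as the listing does, keeps the finally block independent of whatever the request-building step stored.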
