Example 1 with ProxyBean

Use of com.cdeledu.crawler.common.bean.ProxyBean in project wechat by dllwh.

In class HtmlUnitHandler, method getWebClient.

/**
 * ----------------------------------------------------- Fields start
 */
/**
 * ----------------------------------------------------- Fields end
 */
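/**
 * ----------------------------------------------- [Private methods]
 */
/**
 * Builds a WebClient that simulates a specific browser.
 * @param crawlPara crawl settings: browser version, optional proxy, and whether JavaScript is enabled
 * @return the configured WebClient
 */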
private WebClient getWebClient(CrawlParameter crawlPara) {
    /**
     * Simulate a browser; IE, Chrome, Firefox and others can be selected
     */
    WebClient webClient = null;
    BrowserVersion browser = crawlPara.getBrowse();
    if (null == crawlPara.getProxy()) {
        webClient = new WebClient(browser);
    } else {
        // Proxy server configuration: this constructor takes only the proxy host and port;
        // no username or password is set here
        ProxyBean proxy = crawlPara.getProxy();
        webClient = new WebClient(browser, proxy.getProxyHost(), proxy.getProxyPort());
    }
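    /**
     * Configure the WebClient options
     */
    // Enable or disable the JavaScript interpreter (enabled by default; required for some dynamic pages)
    webClient.getOptions().setJavaScriptEnabled(crawlPara.isUseJs());
    // Disable CSS support so stylesheets are not requested and applied a second time for rendering
    webClient.getOptions().setCssEnabled(false);
    // Enable client-side redirects
    // webClient.getOptions().setRedirectEnabled(true);
    // Accept insecure SSL connections (skips certificate validation)
    webClient.getOptions().setUseInsecureSSL(true);
    // Do not throw an exception when a JavaScript error occurs
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    // Do not throw an exception when the response has a failing HTTP status code
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    // Connection timeout in milliseconds: 10 s here; 0 means wait indefinitely
    webClient.getOptions().setTimeout(10 * 1000);
    // Use an Ajax controller that resynchronizes asynchronous Ajax calls
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    // JavaScript execution timeout in milliseconds: 600 s here
    webClient.setJavaScriptTimeout(600 * 1000);
    // Disable native ActiveX support
    webClient.getOptions().setActiveXNative(false);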
    return webClient;
}
Also used: ProxyBean (com.cdeledu.crawler.common.bean.ProxyBean), NicelyResynchronizingAjaxController (com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController), WebClient (com.gargoylesoftware.htmlunit.WebClient), BrowserVersion (com.gargoylesoftware.htmlunit.BrowserVersion)
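
The ProxyBean class itself is not shown in this example; only getProxyHost() and getProxyPort() appear in the call above. The following is a minimal sketch of what such a bean might look like, assuming an immutable holder with a String host and an int port. The wrapper class ProxyBeanSketch, the constructor validation, and the describe helper are hypothetical and not taken from the wechat project. The helper mirrors the null check in getWebClient, where a null proxy means a direct connection.

```java
import java.util.Objects;

/** Illustrative sketch only; not the project's actual ProxyBean. */
public class ProxyBeanSketch {

    /** Hypothetical proxy holder exposing the two getters used by getWebClient. */
    public static final class ProxyBean {
        private final String proxyHost;
        private final int proxyPort;

        public ProxyBean(String proxyHost, int proxyPort) {
            Objects.requireNonNull(proxyHost, "proxyHost");
            if (proxyHost.trim().isEmpty()) {
                throw new IllegalArgumentException("proxyHost must not be blank");
            }
            if (proxyPort < 1 || proxyPort > 65535) {
                throw new IllegalArgumentException("proxyPort out of range: " + proxyPort);
            }
            this.proxyHost = proxyHost;
            this.proxyPort = proxyPort;
        }

        public String getProxyHost() {
            return proxyHost;
        }

        public int getProxyPort() {
            return proxyPort;
        }
    }

    /** Returns "direct" for a null proxy, otherwise "host:port". */
    public static String describe(ProxyBean proxy) {
        if (proxy == null) {
            return "direct";
        }
        return proxy.getProxyHost() + ":" + proxy.getProxyPort();
    }

    public static void main(String[] args) {
        System.out.println(describe(null));                            // prints "direct"
        System.out.println(describe(new ProxyBean("127.0.0.1", 8080))); // prints "127.0.0.1:8080"
    }
}
```

Validating the host and port in the constructor means a malformed proxy fails when the bean is built rather than later, when the WebClient first tries to connect.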

Aggregations

ProxyBean (com.cdeledu.crawler.common.bean.ProxyBean): 1
BrowserVersion (com.gargoylesoftware.htmlunit.BrowserVersion): 1
NicelyResynchronizingAjaxController (com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController): 1
WebClient (com.gargoylesoftware.htmlunit.WebClient): 1