
Example 6 with Site

Use of us.codecraft.webmagic.Site in project webmagic by code4craft.

Class FilePipelineTest, method before.

@BeforeClass
public static void before() {
    resultItems = new ResultItems();
    resultItems.put("content", "webmagic 爬虫工具");
    Request request = new Request("http://www.baidu.com");
    resultItems.setRequest(request);
    task = new Task() {

        @Override
        public String getUUID() {
            return UUID.randomUUID().toString();
        }

        @Override
        public Site getSite() {
            return null;
        }
    };
}
Also used: Site (us.codecraft.webmagic.Site), Task (us.codecraft.webmagic.Task), ResultItems (us.codecraft.webmagic.ResultItems), Request (us.codecraft.webmagic.Request), BeforeClass (org.junit.BeforeClass)

Example 7 with Site

Use of us.codecraft.webmagic.Site in project webmagic by code4craft.

Class HttpClientDownloaderTest, method test_set_site_cookie.

@Test
public void test_set_site_cookie() throws Exception {
    HttpServer server = httpServer(13423);
    server.get(eq(cookie("cookie"), "cookie-webmagic")).response("ok");
    Runner.running(server, new Runnable() {

        @Override
        public void run() throws Exception {
            HttpClientDownloader httpClientDownloader = new HttpClientDownloader();
            Request request = new Request();
            request.setUrl("http://127.0.0.1:13423");
            Site site = Site.me().addCookie("cookie", "cookie-webmagic").setDomain("127.0.0.1");
            Page page = httpClientDownloader.download(request, site.toTask());
            assertThat(page.getRawText()).isEqualTo("ok");
        }
    });
}
Also used: Site (us.codecraft.webmagic.Site), Runnable (com.github.dreamhead.moco.Runnable), HttpServer (com.github.dreamhead.moco.HttpServer), HttpUriRequest (org.apache.http.client.methods.HttpUriRequest), Request (us.codecraft.webmagic.Request), Page (us.codecraft.webmagic.Page), IOException (java.io.IOException), UnsupportedEncodingException (java.io.UnsupportedEncodingException), Test (org.junit.Test)
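
The test above asserts that the mock server receives a cookie named "cookie" with the value "cookie-webmagic". On the wire, cookies travel in a single Cookie request header of the form name=value; name2=value2. The sketch below shows only that formatting step using the standard library. CookieHeaderSketch and toCookieHeader are hypothetical names introduced for illustration; this is not how HttpClientDownloader is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of webmagic: formats cookies as a Cookie header value.
final class CookieHeaderSketch {

    // Joins name/value pairs as "name=value; name2=value2", in map iteration order.
    static String toCookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the header is deterministic.
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("cookie", "cookie-webmagic");
        System.out.println("Cookie: " + toCookieHeader(cookies));
        // prints "Cookie: cookie=cookie-webmagic"
    }
}
```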

Example 8 with Site

Use of us.codecraft.webmagic.Site in project webmagic by code4craft.

Class SeleniumDownloader, method download.

@Override
public Page download(Request request, Task task) {
    checkInit();
    WebDriver webDriver;
    try {
        webDriver = webDriverPool.get();
    } catch (InterruptedException e) {
        logger.warn("interrupted", e);
        return null;
    }
    logger.info("downloading page " + request.getUrl());
    webDriver.get(request.getUrl());
    try {
        Thread.sleep(sleepTime);
    } catch (InterruptedException e) {
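        // Restore the interrupt flag so callers can still observe the interruption.
        logger.warn("interrupted while waiting for page to load", e);
        Thread.currentThread().interrupt();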
    }
    WebDriver.Options manage = webDriver.manage();
    Site site = task.getSite();
    if (site.getCookies() != null) {
        for (Map.Entry<String, String> cookieEntry : site.getCookies().entrySet()) {
            Cookie cookie = new Cookie(cookieEntry.getKey(), cookieEntry.getValue());
            manage.addCookie(cookie);
        }
    }
    /*
     * TODO You can add mouse event or other processes
     *
     * @author: bob.li.0718@gmail.com
     */
    WebElement webElement = webDriver.findElement(By.xpath("/html"));
    String content = webElement.getAttribute("outerHTML");
    Page page = new Page();
    page.setRawText(content);
    page.setHtml(new Html(content, request.getUrl()));
    page.setUrl(new PlainText(request.getUrl()));
    page.setRequest(request);
    webDriverPool.returnToPool(webDriver);
    return page;
}
Also used: WebDriver (org.openqa.selenium.WebDriver), Site (us.codecraft.webmagic.Site), Cookie (org.openqa.selenium.Cookie), PlainText (us.codecraft.webmagic.selector.PlainText), Html (us.codecraft.webmagic.selector.Html), Page (us.codecraft.webmagic.Page), WebElement (org.openqa.selenium.WebElement), Map (java.util.Map)
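
In the download method above, webDriverPool.returnToPool(webDriver) is reached only if every call before it completes normally, so an exception from the driver would leave that instance checked out. One common way to guarantee the return is a try/finally around the work. The sketch below shows the pattern with the standard library only. PoolSketch, withResource, and idleCount are hypothetical names, and the pool is a stand-in for illustration, not webmagic's WebDriverPool.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical stand-in for a resource pool; not the webmagic WebDriverPool.
final class PoolSketch<T> {
    private final BlockingQueue<T> idle;

    PoolSketch(Collection<T> resources) {
        this.idle = new LinkedBlockingQueue<>(resources);
    }

    // Borrows a resource, runs the work, and always returns the resource.
    <R> R withResource(Function<T, R> work) {
        T resource;
        try {
            resource = idle.take(); // blocks until a resource is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            return null;
        }
        try {
            return work.apply(resource);
        } finally {
            idle.add(resource); // runs even when work throws
        }
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        PoolSketch<String> pool = new PoolSketch<>(Arrays.asList("driver-1"));
        try {
            pool.withResource(driver -> {
                throw new IllegalStateException("page load failed");
            });
        } catch (IllegalStateException expected) {
            // The failure propagates, but the resource is back in the pool.
        }
        System.out.println(pool.idleCount()); // prints "1"
    }
}
```

Returning null on interruption mirrors what the download method does when it cannot obtain a driver; a caller that needs to distinguish the two cases could throw instead.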

Aggregations

Site (us.codecraft.webmagic.Site): 8
Request (us.codecraft.webmagic.Request): 6
Test (org.junit.Test): 5
Page (us.codecraft.webmagic.Page): 5
HttpServer (com.github.dreamhead.moco.HttpServer): 3
Runnable (com.github.dreamhead.moco.Runnable): 3
IOException (java.io.IOException): 3
HttpUriRequest (org.apache.http.client.methods.HttpUriRequest): 3
Task (us.codecraft.webmagic.Task): 3
UnsupportedEncodingException (java.io.UnsupportedEncodingException): 2
Map (java.util.Map): 2
Ignore (org.junit.Ignore): 2
ResultItems (us.codecraft.webmagic.ResultItems): 2
ArrayList (java.util.ArrayList): 1
HashedMap (org.apache.commons.collections.map.HashedMap): 1
CloseableHttpResponse (org.apache.http.client.methods.CloseableHttpResponse): 1
CloseableHttpClient (org.apache.http.impl.client.CloseableHttpClient): 1
BeforeClass (org.junit.BeforeClass): 1
Cookie (org.openqa.selenium.Cookie): 1
WebDriver (org.openqa.selenium.WebDriver): 1