Example 36 with Request

use of us.codecraft.webmagic.Request in project webmagic by code4craft.

the class HttpClientDownloaderTest method test_set_site_cookie.

@Test
public void test_set_site_cookie() throws Exception {
    HttpServer server = httpServer(13423);
    // The mock server answers "ok" only when the request carries cookie=cookie-webmagic.
    server.get(eq(cookie("cookie"), "cookie-webmagic")).response("ok");
    Runner.running(server, new Runnable() {

        @Override
        public void run() throws Exception {
            HttpClientDownloader httpClientDownloader = new HttpClientDownloader();
            Request request = new Request();
            request.setUrl("http://127.0.0.1:13423");
            Site site = Site.me().addCookie("cookie", "cookie-webmagic").setDomain("127.0.0.1");
            Page page = httpClientDownloader.download(request, site.toTask());
            assertThat(page.getRawText()).isEqualTo("ok");
        }
    });
}
Also used : Site(us.codecraft.webmagic.Site) Runnable(com.github.dreamhead.moco.Runnable) HttpServer(com.github.dreamhead.moco.HttpServer) HttpUriRequest(org.apache.http.client.methods.HttpUriRequest) Request(us.codecraft.webmagic.Request) Page(us.codecraft.webmagic.Page) IOException(java.io.IOException) UnsupportedEncodingException(java.io.UnsupportedEncodingException) Test(org.junit.Test)

Example 37 with Request

use of us.codecraft.webmagic.Request in project webmagic by code4craft.

the class HttpClientDownloaderTest method test_disableCookieManagement.

@Test
public void test_disableCookieManagement() throws Exception {
    HttpServer server = httpServer(13423);
    // The mock server answers "ok" only when the request does NOT carry cookie=cookie-webmagic.
    server.get(not(eq(cookie("cookie"), "cookie-webmagic"))).response("ok");
    Runner.running(server, new Runnable() {

        @Override
        public void run() throws Exception {
            HttpClientDownloader httpClientDownloader = new HttpClientDownloader();
            Request request = new Request();
            request.setUrl("http://127.0.0.1:13423");
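            // The cookie is set on the request, but cookie management is disabled on the Site,
            // so the test expects the cookie not to reach the server.
            request.addCookie("cookie", "cookie-webmagic");
            Page page = httpClientDownloader.download(request, Site.me().setDisableCookieManagement(true).toTask());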
            assertThat(page.getRawText()).isEqualTo("ok");
        }
    });
}
Also used : Runnable(com.github.dreamhead.moco.Runnable) HttpServer(com.github.dreamhead.moco.HttpServer) HttpUriRequest(org.apache.http.client.methods.HttpUriRequest) Request(us.codecraft.webmagic.Request) Page(us.codecraft.webmagic.Page) IOException(java.io.IOException) UnsupportedEncodingException(java.io.UnsupportedEncodingException) Test(org.junit.Test)

Example 38 with Request

use of us.codecraft.webmagic.Request in project webmagic by code4craft.

the class HttpClientDownloaderTest method test_download_binary_content.

@Test
public void test_download_binary_content() throws Exception {
    HttpServer server = httpServer(13423);
    server.response("binary");
    Runner.running(server, new Runnable() {

        @Override
        public void run() throws Exception {
            final HttpClientDownloader httpClientDownloader = new HttpClientDownloader();
            Request request = new Request();
            // Request the body as raw bytes; the assertions below expect no decoded text.
            request.setBinaryContent(true);
            request.setUrl("http://127.0.0.1:13423/");
            Page page = httpClientDownloader.download(request, Site.me().toTask());
            assertThat(page.getRawText()).isNull();
            assertThat(page.getBytes()).isEqualTo("binary".getBytes());
        }
    });
}
Also used : Runnable(com.github.dreamhead.moco.Runnable) HttpServer(com.github.dreamhead.moco.HttpServer) HttpUriRequest(org.apache.http.client.methods.HttpUriRequest) Request(us.codecraft.webmagic.Request) Page(us.codecraft.webmagic.Page) IOException(java.io.IOException) UnsupportedEncodingException(java.io.UnsupportedEncodingException) Test(org.junit.Test)

Example 39 with Request

use of us.codecraft.webmagic.Request in project webmagic by code4craft.

the class HttpClientDownloaderTest method test_download_fail.

@Test
public void test_download_fail() {
    HttpClientDownloader httpClientDownloader = new HttpClientDownloader();
    Task task = Site.me().setDomain("localhost").setCycleRetryTimes(5).toTask();
    // PAGE_ALWAYS_NOT_EXISTS is defined outside this snippet.
    Request request = new Request(PAGE_ALWAYS_NOT_EXISTS);
    Page page = httpClientDownloader.download(request, task);
    assertThat(page.isDownloadSuccess()).isFalse();
}
Also used : Task(us.codecraft.webmagic.Task) HttpUriRequest(org.apache.http.client.methods.HttpUriRequest) Request(us.codecraft.webmagic.Request) Page(us.codecraft.webmagic.Page) Test(org.junit.Test)

Example 40 with Request

use of us.codecraft.webmagic.Request in project webmagic by code4craft.

the class ZipCodePageProcessor method processProvince.

private void processProvince(Page page) {
    // XPath alone cannot pinpoint the rows here, so a regex is used as a filter; rows that do not match it are dropped
    List<String> districts = page.getHtml().xpath("//body/table/tbody/tr[@bgcolor=\"#ffffff\"]").all();
    Pattern pattern = Pattern.compile("<td>([^<>]+)</td>.*?href=\"(.*?)\"", Pattern.DOTALL);
    for (String district : districts) {
        Matcher matcher = pattern.matcher(district);
        while (matcher.find()) {
            String title = matcher.group(1);
            String link = matcher.group(2);
            Request request = new Request(link).setPriority(1).putExtra("province", page.getRequest().getExtra("province")).putExtra("district", title);
            page.addTargetRequest(request);
        }
    }
}
Also used : Pattern(java.util.regex.Pattern) Matcher(java.util.regex.Matcher) Request(us.codecraft.webmagic.Request)
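
The regex filtering step in processProvince can be tried in isolation, without WebMagic on the classpath. The sketch below reuses the pattern from the method above; the class name DistrictLinkExtractor, the extract helper, and the sample row markup are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DistrictLinkExtractor {

    // Same pattern as in processProvince: a plain-text <td> cell,
    // then (lazily) the next href attribute value after it.
    private static final Pattern DISTRICT_LINK =
            Pattern.compile("<td>([^<>]+)</td>.*?href=\"(.*?)\"", Pattern.DOTALL);

    /** Returns {title, link} pairs found in the HTML of one table row. */
    static List<String[]> extract(String rowHtml) {
        List<String[]> pairs = new ArrayList<>();
        Matcher matcher = DISTRICT_LINK.matcher(rowHtml);
        while (matcher.find()) {
            pairs.add(new String[] {matcher.group(1), matcher.group(2)});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical row markup, made up for this example.
        String row = "<tr bgcolor=\"#ffffff\">"
                + "<td>District A</td><td><a href=\"/a.html\">100000</a></td>"
                + "<td>District B</td><td><a href=\"/b.html\">100001</a></td>"
                + "</tr>";
        for (String[] pair : extract(row)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

A row whose cells contain no plain-text title before a link yields an empty list, which is the filtering effect the comment in processProvince describes. In the crawler itself, each pair is turned into a Request with the title stored under the "district" extra.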

Aggregations

Request (us.codecraft.webmagic.Request): 45
Test (org.junit.Test): 32
Page (us.codecraft.webmagic.Page): 22
HttpUriRequest (org.apache.http.client.methods.HttpUriRequest): 13
HttpServer (com.github.dreamhead.moco.HttpServer): 12
Runnable (com.github.dreamhead.moco.Runnable): 12
IOException (java.io.IOException): 12
UnsupportedEncodingException (java.io.UnsupportedEncodingException): 11
Task (us.codecraft.webmagic.Task): 10
Ignore (org.junit.Ignore): 8
Site (us.codecraft.webmagic.Site): 6
PlainText (us.codecraft.webmagic.selector.PlainText): 6
DuplicateRemover (us.codecraft.webmagic.scheduler.component.DuplicateRemover): 4
Matcher (java.util.regex.Matcher): 2
ResultItems (us.codecraft.webmagic.ResultItems): 2
HashSetDuplicateRemover (us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover): 2
JSONObject (com.alibaba.fastjson.JSONObject): 1
URI (java.net.URI): 1
ArrayList (java.util.ArrayList): 1
Map (java.util.Map): 1