
Example 26 with Request

Use of us.codecraft.webmagic.Request in project webmagic by code4craft.

In class DuplicateRemovedSchedulerTest, method test_duplicate_removed_for_get_request:

@Test
public void test_duplicate_removed_for_get_request() throws Exception {
    // Replace the scheduler's duplicate remover with a mock so its calls can be verified.
    DuplicateRemover duplicateRemover = Mockito.mock(DuplicateRemover.class);
    duplicateRemovedScheduler.setDuplicateRemover(duplicateRemover);
    Request request = new Request("https://www.google.com/");
    request.setMethod(HttpConstant.Method.GET);
    duplicateRemovedScheduler.push(request, null);
    // Pushing a GET request should trigger exactly one duplicate check.
    verify(duplicateRemover, times(1)).isDuplicate(any(Request.class), any(Task.class));
}
Also used: Task (us.codecraft.webmagic.Task), Request (us.codecraft.webmagic.Request), DuplicateRemover (us.codecraft.webmagic.scheduler.component.DuplicateRemover), Test (org.junit.Test)

Example 27 with Request

Use of us.codecraft.webmagic.Request in project webmagic by code4craft.

In class HttpClientDownloaderTest, method testCycleTriedTimes:

@Test
public void testCycleTriedTimes() {
    HttpClientDownloader httpClientDownloader = new HttpClientDownloader();
    // Site configured with up to 5 cycle retries for failed downloads.
    Task task = Site.me().setDomain("localhost").setCycleRetryTimes(5).toTask();
    Request request = new Request(PAGE_ALWAYS_NOT_EXISTS);
    Page page = httpClientDownloader.download(request, task);
    // The failed download should re-queue the request as a target request.
    assertThat(page.getTargetRequests().size() > 0).isTrue();
    assertThat((Integer) page.getTargetRequests().get(0).getExtra(Request.CYCLE_TRIED_TIMES)).isEqualTo(1);
    // Downloading the re-queued request fails again and increments the retry counter.
    page = httpClientDownloader.download(page.getTargetRequests().get(0), task);
    assertThat((Integer) page.getTargetRequests().get(0).getExtra(Request.CYCLE_TRIED_TIMES)).isEqualTo(2);
}
Also used: Task (us.codecraft.webmagic.Task), Request (us.codecraft.webmagic.Request), Page (us.codecraft.webmagic.Page), Test (org.junit.Test)

Aggregations

Request (us.codecraft.webmagic.Request): 27
Test (org.junit.Test): 19
Page (us.codecraft.webmagic.Page): 9
Ignore (org.junit.Ignore): 8
Task (us.codecraft.webmagic.Task): 8
Site (us.codecraft.webmagic.Site): 4
DuplicateRemover (us.codecraft.webmagic.scheduler.component.DuplicateRemover): 4
HttpServer (com.github.dreamhead.moco.HttpServer): 3
Runnable (com.github.dreamhead.moco.Runnable): 3
IOException (java.io.IOException): 3
PlainText (us.codecraft.webmagic.selector.PlainText): 3
UnsupportedEncodingException (java.io.UnsupportedEncodingException): 2
HashSetDuplicateRemover (us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover): 2
Html (us.codecraft.webmagic.selector.Html): 2
Matcher (java.util.regex.Matcher): 1
Pattern (java.util.regex.Pattern): 1
CloseableHttpResponse (org.apache.http.client.methods.CloseableHttpResponse): 1
RequestBuilder (org.apache.http.client.methods.RequestBuilder): 1
CloseableHttpClient (org.apache.http.impl.client.CloseableHttpClient): 1
BeforeClass (org.junit.BeforeClass): 1