Example 6 with Request

Use of us.codecraft.webmagic.Request in project webmagic by code4craft.

In class ModelPageProcessorTest, method getMockPage.

private Page getMockPage() throws IOException {
    Page page = new Page();
    page.setRawText(IOUtils.toString(getClass().getClassLoader().getResourceAsStream("html/mock-webmagic.html")));
    page.setRequest(new Request("http://webmagic.io/list/0"));
    page.setUrl(new PlainText("http://webmagic.io/list/0"));
    return page;
}
Also used : PlainText(us.codecraft.webmagic.selector.PlainText) Request(us.codecraft.webmagic.Request) Page(us.codecraft.webmagic.Page)

Example 7 with Request

Use of us.codecraft.webmagic.Request in project webmagic by code4craft.

In class ModelPageProcessorTest, method testMultiModel_should_not_skip_when_match.

@Test
public void testMultiModel_should_not_skip_when_match() throws Exception {
    Page page = new Page();
    page.setRawText("<div foo='foo'></div>");
    page.setRequest(new Request("http://codecraft.us/foo"));
    page.setUrl(PlainText.create("http://codecraft.us/foo"));
    ModelPageProcessor modelPageProcessor = ModelPageProcessor.create(null, ModelFoo.class, ModelBar.class);
    modelPageProcessor.process(page);
    assertThat(page.getResultItems().isSkip()).isFalse();
}
Also used : Request(us.codecraft.webmagic.Request) Page(us.codecraft.webmagic.Page) Test(org.junit.Test)

Example 8 with Request

Use of us.codecraft.webmagic.Request in project webmagic by code4craft.

In class ModelPageProcessorTest, method testExtractLinks.

@Test
public void testExtractLinks() throws Exception {
    ModelPageProcessor modelPageProcessor = ModelPageProcessor.create(null, MockModel.class);
    Page page = getMockPage();
    modelPageProcessor.process(page);
    assertThat(page.getTargetRequests()).containsExactly(new Request("http://webmagic.io/list/1"), new Request("http://webmagic.io/list/2"), new Request("http://webmagic.io/post/1"), new Request("http://webmagic.io/post/2"));
}
Also used : Request(us.codecraft.webmagic.Request) Page(us.codecraft.webmagic.Page) Test(org.junit.Test)

Example 9 with Request

Use of us.codecraft.webmagic.Request in project webmagic by code4craft.

In class BloomFilterDuplicateRemoverTest, method testMemory.

@Ignore("long time")
@Test
public void testMemory() throws Exception {
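    // Number of requests to feed into each duplicate remover.
    int times = 5000000;
    // Bloom filter sized for `times` insertions with a 0.5% target false-positive rate.
    DuplicateRemover duplicateRemover = new BloomFilterDuplicateRemover(times, 0.005);
    // Rough baseline only: freeMemory() deltas are affected by GC and heap growth.
    long freeMemory = Runtime.getRuntime().freeMemory();
    long time = System.currentTimeMillis();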
    for (int i = 0; i < times; i++) {
        duplicateRemover.isDuplicate(new Request(String.valueOf(i)), null);
    }
    System.out.println("Time used by bloomfilter:" + (System.currentTimeMillis() - time));
    System.out.println("Memory used by bloomfilter:" + (freeMemory - Runtime.getRuntime().freeMemory()));
    duplicateRemover = new HashSetDuplicateRemover();
    System.gc();
    freeMemory = Runtime.getRuntime().freeMemory();
    time = System.currentTimeMillis();
    for (int i = 0; i < times; i++) {
        duplicateRemover.isDuplicate(new Request(String.valueOf(i)), null);
    }
    System.out.println("Time used by hashset:" + (System.currentTimeMillis() - time));
    System.out.println("Memory used by hashset:" + (freeMemory - Runtime.getRuntime().freeMemory()));
}
Also used : HashSetDuplicateRemover(us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover) Request(us.codecraft.webmagic.Request) DuplicateRemover(us.codecraft.webmagic.scheduler.component.DuplicateRemover) Ignore(org.junit.Ignore) Test(org.junit.Test)
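
The test above compares a Bloom filter against a hash set for duplicate detection. As a rough illustration of why a Bloom filter can use less memory, here is a minimal sketch built on java.util.BitSet. It is not how webmagic implements BloomFilterDuplicateRemover; the class name TinyBloomFilter, the method addAndCheck, and the double-hashing scheme are all assumptions made for this example.

```java
import java.util.BitSet;

// Hypothetical illustration class; not part of webmagic.
final class TinyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    TinyBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Marks the value as seen. Returns true if every bit for the value was
    // already set, meaning the value is probably a duplicate (false positives
    // are possible), and false if the value is definitely new.
    boolean addAndCheck(String value) {
        int h1 = value.hashCode();
        // Second hash derived from the first; forced odd so it is never zero.
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
        boolean allSet = true;
        for (int i = 0; i < hashes; i++) {
            // floorMod keeps the index non-negative even if the sum overflows.
            int index = Math.floorMod(h1 + i * h2, size);
            if (!bits.get(index)) {
                allSet = false;
                bits.set(index);
            }
        }
        return allSet;
    }
}
```

The filter stores only a fixed number of bits regardless of how long the URLs are, whereas a hash set keeps every string. The trade-off is that a new value can occasionally be reported as a duplicate.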

Example 10 with Request

Use of us.codecraft.webmagic.Request in project webmagic by code4craft.

In class BloomFilterDuplicateRemoverTest, method testRemove.

@Test
public void testRemove() throws Exception {
    BloomFilterDuplicateRemover bloomFilterDuplicateRemover = new BloomFilterDuplicateRemover(10);
    boolean isDuplicate = bloomFilterDuplicateRemover.isDuplicate(new Request("a"), null);
    assertThat(isDuplicate).isFalse();
    isDuplicate = bloomFilterDuplicateRemover.isDuplicate(new Request("a"), null);
    assertThat(isDuplicate).isTrue();
    isDuplicate = bloomFilterDuplicateRemover.isDuplicate(new Request("b"), null);
    assertThat(isDuplicate).isFalse();
    isDuplicate = bloomFilterDuplicateRemover.isDuplicate(new Request("b"), null);
    assertThat(isDuplicate).isTrue();
}
Also used : Request(us.codecraft.webmagic.Request) Test(org.junit.Test)
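
The assertions in testRemove describe a simple contract: the first check of a URL reports false, and every later check of the same URL reports true. A minimal sketch of that contract using only the JDK might look like the following. The class SimpleDuplicateRemover and its String-based isDuplicate method are hypothetical and are not webmagic's DuplicateRemover interface.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration class; not part of webmagic.
final class SimpleDuplicateRemover {
    // Thread-safe set of URLs that have already been offered.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    // Returns false the first time a URL is offered and true on every later call.
    boolean isDuplicate(String url) {
        // Set.add returns true only when the element was not already present.
        return !seen.add(url);
    }

    // Number of distinct URLs recorded so far.
    int size() {
        return seen.size();
    }
}
```

Checking and recording happen in a single add call, so two threads offering the same URL at once cannot both see it as new.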

Aggregations

Request (us.codecraft.webmagic.Request) 27
Test (org.junit.Test) 19
Page (us.codecraft.webmagic.Page) 9
Ignore (org.junit.Ignore) 8
Task (us.codecraft.webmagic.Task) 8
Site (us.codecraft.webmagic.Site) 4
DuplicateRemover (us.codecraft.webmagic.scheduler.component.DuplicateRemover) 4
HttpServer (com.github.dreamhead.moco.HttpServer) 3
Runnable (com.github.dreamhead.moco.Runnable) 3
IOException (java.io.IOException) 3
PlainText (us.codecraft.webmagic.selector.PlainText) 3
UnsupportedEncodingException (java.io.UnsupportedEncodingException) 2
HashSetDuplicateRemover (us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover) 2
Html (us.codecraft.webmagic.selector.Html) 2
Matcher (java.util.regex.Matcher) 1
Pattern (java.util.regex.Pattern) 1
CloseableHttpResponse (org.apache.http.client.methods.CloseableHttpResponse) 1
RequestBuilder (org.apache.http.client.methods.RequestBuilder) 1
CloseableHttpClient (org.apache.http.impl.client.CloseableHttpClient) 1
BeforeClass (org.junit.BeforeClass) 1