Example 41 with Request

Use of us.codecraft.webmagic.Request in project webmagic by code4craft.

In the class ProcessorBenchmark, method test:

@Ignore
@Test
public void test() {
    ModelPageProcessor modelPageProcessor = ModelPageProcessor.create(Site.me(), OschinaBlog.class);
    Page page = new Page();
    page.setRequest(new Request("http://my.oschina.net/flashsword/blog"));
    page.setUrl(new PlainText("http://my.oschina.net/flashsword/blog"));
    page.setHtml(new Html(html));
    // First timed pass: 1000 calls on the same page, including any JIT warm-up cost.
    long time = System.currentTimeMillis();
    for (int i = 0; i < 1000; i++) {
        modelPageProcessor.process(page);
    }
    System.out.println(System.currentTimeMillis() - time);
    // Second timed pass over the same page, for comparison with the first.
    time = System.currentTimeMillis();
    for (int i = 0; i < 1000; i++) {
        modelPageProcessor.process(page);
    }
    System.out.println(System.currentTimeMillis() - time);
}
Also used : PlainText(us.codecraft.webmagic.selector.PlainText) Request(us.codecraft.webmagic.Request) Html(us.codecraft.webmagic.selector.Html) Page(us.codecraft.webmagic.Page) Ignore(org.junit.Ignore) Test(org.junit.Test)
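
The benchmark above times the same loop twice and prints both durations. A minimal, library-free sketch of that two-pass pattern follows; the class name TwoPassTimer and the helper timeMillis are hypothetical and not part of webmagic, and the workload is a stand-in for modelPageProcessor.process(page).

```java
final class TwoPassTimer {

    /** Runs the task the given number of times and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Stand-in workload (an assumption): build and discard a short string.
        StringBuilder sink = new StringBuilder();
        Runnable work = () -> {
            sink.setLength(0);
            sink.append("http://webmagic.io/list/").append(0);
        };

        // First pass includes any warm-up cost; second pass repeats the same work.
        System.out.println("first pass ms:  " + timeMillis(work, 1000));
        System.out.println("second pass ms: " + timeMillis(work, 1000));
    }
}
```

Comparing the two printed values gives only a rough indication; a harness such as JMH is the usual choice when precise numbers matter.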

Example 42 with Request

Use of us.codecraft.webmagic.Request in project webmagic by code4craft.

In the class RedisSchedulerTest, method test:

@Ignore("environment depended")
@Test
public void test() {
    Task task = new Task() {

        @Override
        public String getUUID() {
            return "1";
        }

        @Override
        public Site getSite() {
            return null;
        }
    };
    Request request = new Request("http://www.ibm.com/developerworks/cn/java/j-javadev2-22/");
    request.putExtra("1", "2");
    redisScheduler.push(request, task);
    Request poll = redisScheduler.poll(task);
    assertThat(poll).isEqualTo(request);
}
Also used : Task(us.codecraft.webmagic.Task) Request(us.codecraft.webmagic.Request) Ignore(org.junit.Ignore) Test(org.junit.Test)

Example 43 with Request

Use of us.codecraft.webmagic.Request in project webmagic by code4craft.

In the class RequestUtilsTest, method test_generate_range:

@Test
public void test_generate_range() throws Exception {
    List<Request> requests = RequestUtils.from("http://angularjs.cn/api/article/latest?p=[1-3]&s=20");
    assertThat(requests).containsExactly(
            new Request("http://angularjs.cn/api/article/latest?p=1&s=20"),
            new Request("http://angularjs.cn/api/article/latest?p=2&s=20"),
            new Request("http://angularjs.cn/api/article/latest?p=3&s=20"));
}
Also used : Request(us.codecraft.webmagic.Request) Test(org.junit.Test)

Example 44 with Request

Use of us.codecraft.webmagic.Request in project webmagic by code4craft.

In the class RequestUtils, method from:

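public static List<Request> from(String exp) {
    // p4Range is defined elsewhere in RequestUtils (not shown here);
    // groups 1 and 2 hold the lower and upper bounds of the range.
    Matcher matcher = p4Range.matcher(exp);
    if (!matcher.find()) {
        // No range in the expression: treat it as a single URL.
        return Collections.singletonList(new Request(exp));
    }
    int rangeFrom = Integer.parseInt(matcher.group(1));
    int rangeTo = Integer.parseInt(matcher.group(2));
    if (rangeFrom > rangeTo) {
        // Inverted range yields no requests.
        return Collections.emptyList();
    }
    List<Request> requests = new ArrayList<Request>(rangeTo - rangeFrom + 1);
    for (int i = rangeFrom; i <= rangeTo; i++) {
        // Substitute the current number for the range placeholder.
        requests.add(new Request(matcher.replaceAll(String.valueOf(i))));
    }
    return requests;
}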
Also used : Matcher(java.util.regex.Matcher) Request(us.codecraft.webmagic.Request) ArrayList(java.util.ArrayList)
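
The range expansion in Example 44 can be tried without webmagic on the classpath. The sketch below returns plain strings instead of Request objects; the class name RangeUrls, the method expand, and the regular expression are assumptions, since the listing does not show how p4Range is defined.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RangeUrls {

    // Assumed pattern for a placeholder such as [1-3]; the real p4Range is not shown in the source.
    private static final Pattern RANGE = Pattern.compile("\\[(\\d+)-(\\d+)\\]");

    /** Expands a URL expression containing [from-to] into one URL per number in the range. */
    static List<String> expand(String exp) {
        Matcher matcher = RANGE.matcher(exp);
        if (!matcher.find()) {
            // No range placeholder: the expression is a single URL.
            return Collections.singletonList(exp);
        }
        int from = Integer.parseInt(matcher.group(1));
        int to = Integer.parseInt(matcher.group(2));
        if (from > to) {
            // Inverted range yields nothing.
            return Collections.emptyList();
        }
        List<String> urls = new ArrayList<>(to - from + 1);
        for (int i = from; i <= to; i++) {
            // replaceAll resets the matcher and substitutes every placeholder match.
            urls.add(matcher.replaceAll(String.valueOf(i)));
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : expand("http://angularjs.cn/api/article/latest?p=[1-3]&s=20")) {
            System.out.println(url);
        }
    }
}
```

As in the original method, replaceAll substitutes every placeholder in the expression with the same number, so an expression with two ranges would not produce a cross product.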

Example 45 with Request

Use of us.codecraft.webmagic.Request in project webmagic by code4craft.

In the class PageMocker, method getMockPage:

public Page getMockPage() throws IOException {
    Page page = new Page();
    page.setRawText(IOUtils.toString(PageMocker.class.getClassLoader().getResourceAsStream("html/mock-webmagic.html")));
    page.setRequest(new Request("http://webmagic.io/list/0"));
    page.setUrl(new PlainText("http://webmagic.io/list/0"));
    return page;
}
Also used : PlainText(us.codecraft.webmagic.selector.PlainText) Request(us.codecraft.webmagic.Request) Page(us.codecraft.webmagic.Page)

Aggregations

Request (us.codecraft.webmagic.Request): 45
Test (org.junit.Test): 32
Page (us.codecraft.webmagic.Page): 22
HttpUriRequest (org.apache.http.client.methods.HttpUriRequest): 13
HttpServer (com.github.dreamhead.moco.HttpServer): 12
Runnable (com.github.dreamhead.moco.Runnable): 12
IOException (java.io.IOException): 12
UnsupportedEncodingException (java.io.UnsupportedEncodingException): 11
Task (us.codecraft.webmagic.Task): 10
Ignore (org.junit.Ignore): 8
Site (us.codecraft.webmagic.Site): 6
PlainText (us.codecraft.webmagic.selector.PlainText): 6
DuplicateRemover (us.codecraft.webmagic.scheduler.component.DuplicateRemover): 4
Matcher (java.util.regex.Matcher): 2
ResultItems (us.codecraft.webmagic.ResultItems): 2
HashSetDuplicateRemover (us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover): 2
JSONObject (com.alibaba.fastjson.JSONObject): 1
URI (java.net.URI): 1
ArrayList (java.util.ArrayList): 1
Map (java.util.Map): 1