
Example 1 with Scheduler

Use of us.codecraft.webmagic.scheduler.Scheduler in the webmagic project by code4craft.

Class SpiderTest, method testRound.

private void testRound() {
    Spider spider = Spider.create(new PageProcessor() {

        // Not referenced in this processor; the scheduler below keeps its own counter.
        private AtomicInteger count = new AtomicInteger();

        @Override
        public void process(Page page) {
            // Skip result handling; this test only exercises scheduling.
            page.setSkip(true);
        }

        @Override
        public Site getSite() {
            return Site.me().setSleepTime(0);
        }
    }).setDownloader(new Downloader() {

        @Override
        public Page download(Request request, Task task) {
            // Stub downloader: return an empty page without any network access.
            return new Page().setRawText("");
        }

        @Override
        public void setThread(int threadNum) {
        }
    }).setScheduler(new Scheduler() {

        private AtomicInteger count = new AtomicInteger();

        private Random random = new Random();

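        @Override
        public void push(Request request, Task task) {
            // Intentionally a no-op: pushed requests are discarded.
        }

        @Override
        public synchronized Request poll(Task task) {
            // Stop handing out requests after 1000 polls so the spider can finish.
            if (count.incrementAndGet() > 1000) {
                return null;
            }
            // Return null on roughly 9% of polls (values 91..99) to mimic a momentarily empty queue.
            if (random.nextInt(100) > 90) {
                return null;
            }
            return new Request("test");
        }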
    }).thread(10);
    spider.run();
}
Also used: PageProcessor (us.codecraft.webmagic.processor.PageProcessor), SimplePageProcessor (us.codecraft.webmagic.processor.SimplePageProcessor), Random (java.util.Random), AtomicInteger (java.util.concurrent.atomic.AtomicInteger), Scheduler (us.codecraft.webmagic.scheduler.Scheduler), Downloader (us.codecraft.webmagic.downloader.Downloader)
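
The poll logic of the test scheduler above can be tried in isolation. What follows is a minimal sketch, not webmagic code: FlakyPollSketch is a hypothetical stand-in that returns a plain String instead of a Request, drops the Task argument, and takes a seed so that runs are repeatable.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the anonymous Scheduler in testRound; not a webmagic class.
class FlakyPollSketch {

    private final AtomicInteger count = new AtomicInteger();

    private final Random random;

    FlakyPollSketch(long seed) {
        this.random = new Random(seed);
    }

    // Same decision logic as poll(Task) in the test, returning the URL string directly.
    synchronized String poll() {
        if (count.incrementAndGet() > 1000) {
            return null; // hard cap: nothing is served after 1000 polls
        }
        if (random.nextInt(100) > 90) {
            return null; // values 91..99: an "empty" answer on about 9% of polls
        }
        return "test";
    }

    public static void main(String[] args) {
        FlakyPollSketch scheduler = new FlakyPollSketch(42L);
        int served = 0;
        int empty = 0;
        for (int i = 0; i < 1000; i++) {
            if (scheduler.poll() != null) {
                served++;
            } else {
                empty++;
            }
        }
        System.out.println("served=" + served + " empty=" + empty);
        // Every poll past the cap returns null.
        System.out.println("after cap: " + scheduler.poll()); // prints "after cap: null"
    }
}
```

Because the counter advances on every call, the cap limits the number of polls, not the number of requests served; the served total is therefore somewhat below 1000.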

Example 2 with Scheduler

Use of us.codecraft.webmagic.scheduler.Scheduler in the webmagic project by code4craft.

Class Spider, method setScheduler.

/**
 * Set the scheduler for this Spider. Requests still queued in the
 * previous scheduler are moved into the new one.
 *
 * @param scheduler scheduler
 * @return this
 * @see Scheduler
 * @since 0.2.1
 */
public Spider setScheduler(Scheduler scheduler) {
    checkIfRunning();
    Scheduler oldScheduler = this.scheduler;
    this.scheduler = scheduler;
    if (oldScheduler != null) {
        Request request;
        while ((request = oldScheduler.poll(this)) != null) {
            this.scheduler.push(request, this);
        }
    }
    return this;
}
Also used: Scheduler (us.codecraft.webmagic.scheduler.Scheduler), QueueScheduler (us.codecraft.webmagic.scheduler.QueueScheduler)
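
The drain loop in setScheduler can be illustrated without the library. The sketch below assumes simplified stand-in types: MiniScheduler, FifoScheduler and SchedulerSwapSketch are hypothetical names, and the stand-in interface works on plain URL strings, whereas the real Scheduler methods take Request and Task arguments.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class SchedulerSwapSketch {

    // Move every pending entry from oldScheduler into newScheduler, in poll order.
    // Returns the number of entries moved.
    static int migrate(MiniScheduler oldScheduler, MiniScheduler newScheduler) {
        int moved = 0;
        if (oldScheduler != null) {
            String url;
            while ((url = oldScheduler.poll()) != null) {
                newScheduler.push(url);
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        MiniScheduler oldScheduler = new FifoScheduler();
        oldScheduler.push("a");
        oldScheduler.push("b");
        oldScheduler.push("c");

        MiniScheduler newScheduler = new FifoScheduler();
        System.out.println(migrate(oldScheduler, newScheduler)); // prints 3
        System.out.println(newScheduler.poll());                 // prints a
        System.out.println(oldScheduler.poll());                 // prints null
    }
}

// Hypothetical, simplified counterpart of the Scheduler interface.
interface MiniScheduler {

    void push(String url);

    String poll(); // null when nothing is pending
}

// Hypothetical in-memory FIFO implementation used only for this sketch.
class FifoScheduler implements MiniScheduler {

    private final Queue<String> queue = new ArrayDeque<>();

    @Override
    public synchronized void push(String url) {
        queue.add(url);
    }

    @Override
    public synchronized String poll() {
        return queue.poll();
    }
}
```

The loop ends only when poll returns null, so it relies on the old scheduler eventually reporting that it is empty.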

Aggregations

Scheduler (us.codecraft.webmagic.scheduler.Scheduler): 2
Random (java.util.Random): 1
AtomicInteger (java.util.concurrent.atomic.AtomicInteger): 1
Downloader (us.codecraft.webmagic.downloader.Downloader): 1
PageProcessor (us.codecraft.webmagic.processor.PageProcessor): 1
SimplePageProcessor (us.codecraft.webmagic.processor.SimplePageProcessor): 1
QueueScheduler (us.codecraft.webmagic.scheduler.QueueScheduler): 1