Example 1 with SegmentResolver

Use of com.virjar.vscrawler.core.seed.SegmentResolver in project vscrawler by virjar.

From the main method of class FutureCrawler.

public static void main(String[] args) {
    VSCrawler vsCrawler = VSCrawlerBuilder.create().setStopWhileTaskEmptyDuration(2000).setSegmentResolver(new SegmentResolver() {

        @Override
        public long resolveSegmentKey(long activeTime) {
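            // Segment by minute, so links are re-crawled every minute. This is only for testing;
            // in practice the interval should not be this short, and segmenting by day is recommended.
            // Milliseconds are zeroed too, so every timestamp within the same minute maps to one key.
            return new DateTime(activeTime).withSecondOfMinute(0).withMillisOfSecond(0).getMillis();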
        }
    }).setProcessor(new SeedProcessor() {

        @Override
        public void process(Seed seed, CrawlerSession crawlerSession, GrabResult crawlResult) {
            // Create a copy of the seed
            Seed copy = seed.copy();
            // Make the copy become active one minute from now
            copy.setActiveTimeStamp(DateTime.now().plusMinutes(1).getMillis());
            // Return the new seed
            crawlResult.addSeed(copy);
        }
    }).build();
    // All current demos clear the task queue first; otherwise data from different crawlers could get mixed up
    vsCrawler.clearTask();
    vsCrawler.pushSeed("https://www.meitulu.com/item/6892.htm");
    vsCrawler.start();
}
Also used : VSCrawler(com.virjar.vscrawler.core.VSCrawler) SegmentResolver(com.virjar.vscrawler.core.seed.SegmentResolver) Seed(com.virjar.vscrawler.core.seed.Seed) GrabResult(com.virjar.vscrawler.core.processor.GrabResult) SeedProcessor(com.virjar.vscrawler.core.processor.SeedProcessor) CrawlerSession(com.virjar.vscrawler.core.net.session.CrawlerSession) DateTime(org.joda.time.DateTime)
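
The resolver above maps a seed's active time to a segment key by flooring the timestamp to a time boundary. The following is a minimal, standalone sketch of that flooring idea only, assuming java.time in place of Joda-Time and UTC day boundaries. The class SegmentKeySketch and its two helper methods are hypothetical names used for illustration and are not part of vscrawler.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class SegmentKeySketch {

    // Hypothetical helper: floor an epoch-millisecond timestamp to the start of its minute.
    static long minuteSegmentKey(long activeTimeMillis) {
        return Instant.ofEpochMilli(activeTimeMillis)
                .truncatedTo(ChronoUnit.MINUTES)
                .toEpochMilli();
    }

    // Hypothetical helper: floor to the start of the UTC day (assumption: UTC, not the local zone).
    static long daySegmentKey(long activeTimeMillis) {
        return Instant.ofEpochMilli(activeTimeMillis)
                .truncatedTo(ChronoUnit.DAYS)
                .toEpochMilli();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Timestamps that fall in the same minute (or day) share the same key.
        System.out.println("minute key: " + minuteSegmentKey(now));
        System.out.println("day key:    " + daySegmentKey(now));
    }
}
```

Two timestamps get the same key exactly when they fall inside the same window, which is why a coarser unit such as a day yields fewer, larger segments than a minute.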
