use of us.codecraft.webmagic.Spider in project webmagic by code4craft.
the class ZipCodePageProcessor method main.
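public static void main(String[] args) {
    // setScheduler(...) replaces the deprecated scheduler(...) alias; behavior is unchanged.
    Spider spider = Spider.create(new ZipCodePageProcessor())
            .setScheduler(new PriorityScheduler())
            .addUrl("http://www.ip138.com/post/");
    // run() blocks the calling thread until the crawl finishes.
    spider.run();
}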
use of us.codecraft.webmagic.Spider in project webmagic by code4craft.
the class OschinaBlogPageProcesser method main.
public static void main(String[] args) throws JMException {
    // Queue-based scheduler that deduplicates URLs with a Bloom filter.
    Spider spider = Spider.create(new OschinaBlogPageProcesser())
            .setScheduler(new QueueScheduler()
                    .setDuplicateRemover(new BloomFilterDuplicateRemover(2000)));
    // Expose the spider's status over JMX before starting it.
    SpiderMonitor.instance().register(spider);
    spider.run();
}
use of us.codecraft.webmagic.Spider in project webmagic by code4craft.
the class BaiduBaikePageProcessor method main.
public static void main(String[] args) {
    // Single download: fetch one page synchronously and return its extracted results.
    Spider spider = Spider.create(new BaiduBaikePageProcessor()).thread(2);
    String urlTemplate = "http://baike.baidu.com/search/word?word=%s&pic=1&sug=1&enc=utf8";
    ResultItems resultItems = spider.<ResultItems>get(String.format(urlTemplate, "水力发电"));
    System.out.println(resultItems);
    // Multiple downloads: fetch a batch of URLs and collect the results of each page.
    List<String> list = new ArrayList<String>();
    list.add(String.format(urlTemplate, "风力发电"));
    list.add(String.format(urlTemplate, "太阳能"));
    list.add(String.format(urlTemplate, "地热发电"));
    list.add(String.format(urlTemplate, "地热发电"));
    List<ResultItems> resultItemses = spider.<ResultItems>getAll(list);
    for (ResultItems items : resultItemses) {
        System.out.println(items.getAll());
    }
    // Release the spider's resources once all downloads are done.
    spider.close();
}
use of us.codecraft.webmagic.Spider in project webmagic by code4craft.
the class MonitorExample method main.
public static void main(String[] args) throws Exception {
    Spider zhihuSpider = Spider.create(new ZhihuPageProcessor())
            .addUrl("http://my.oschina.net/flashsword/blog");
    Spider githubSpider = Spider.create(new GithubRepoPageProcessor())
            .addUrl("https://github.com/code4craft");
    // Register both spiders with the monitor before they start.
    SpiderMonitor.instance().register(zhihuSpider);
    SpiderMonitor.instance().register(githubSpider);
    // start() runs each spider asynchronously, so both crawl concurrently.
    zhihuSpider.start();
    githubSpider.start();
}
use of us.codecraft.webmagic.Spider in project webmagic by code4craft.
the class SpiderMonitor method register.
/**
 * Registers the given spiders for monitoring: attaches a listener to each
 * spider and exposes its status as an MBean.
 *
 * @param spiders the spiders to monitor
 * @return this monitor, to allow call chaining
 * @throws JMException if an MBean cannot be registered
 */
public synchronized SpiderMonitor register(Spider... spiders) throws JMException {
    for (Spider spider : spiders) {
        MonitorSpiderListener monitorSpiderListener = new MonitorSpiderListener();
        // Create the listener list if the spider has none yet; otherwise append to it.
        if (spider.getSpiderListeners() == null) {
            List<SpiderListener> spiderListeners = new ArrayList<SpiderListener>();
            spiderListeners.add(monitorSpiderListener);
            spider.setSpiderListeners(spiderListeners);
        } else {
            spider.getSpiderListeners().add(monitorSpiderListener);
        }
        SpiderStatusMXBean spiderStatusMBean = getSpiderStatusMBean(spider, monitorSpiderListener);
        registerMBean(spiderStatusMBean);
        spiderStatuses.add(spiderStatusMBean);
    }
    return this;
}
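The register method above combines two steps: attaching a listener to each spider, then publishing a status bean over JMX. The following is a minimal, standalone sketch of that pattern using only the JDK. MiniSpider, Listener, CrawlStatus, CrawlStatusMXBean, and the sketch.monitor object-name domain are hypothetical stand-ins, not WebMagic classes, and the use of the platform MBeanServer is an assumption, since the source does not show what registerMBean does internally.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class MonitorSketch {

    /** Hypothetical callback interface, standing in for SpiderListener. */
    public interface Listener {
        void onSuccess();
    }

    /** Hypothetical stand-in for Spider; its listener list starts out null. */
    public static class MiniSpider {
        private List<Listener> listeners;

        public List<Listener> getListeners() { return listeners; }

        public void setListeners(List<Listener> listeners) { this.listeners = listeners; }

        /** Simulates one successfully processed page by notifying every listener. */
        public void pageDone() {
            if (listeners != null) {
                for (Listener listener : listeners) {
                    listener.onSuccess();
                }
            }
        }
    }

    /** JMX treats a public interface whose name ends in "MXBean" as an MXBean. */
    public interface CrawlStatusMXBean {
        int getSuccessCount();
    }

    /** Counts successes as a listener and exposes the count as a JMX attribute. */
    public static class CrawlStatus implements CrawlStatusMXBean, Listener {
        private final AtomicInteger success = new AtomicInteger();

        @Override
        public void onSuccess() { success.incrementAndGet(); }

        @Override
        public int getSuccessCount() { return success.get(); }
    }

    /** Attaches a status listener to the spider and registers it as an MBean. */
    static ObjectName register(MiniSpider spider, String name) throws JMException {
        CrawlStatus status = new CrawlStatus();
        // Same null-check-then-add shape as SpiderMonitor.register.
        if (spider.getListeners() == null) {
            List<Listener> list = new ArrayList<>();
            list.add(status);
            spider.setListeners(list);
        } else {
            spider.getListeners().add(status);
        }
        // Assumption: the status bean is published on the platform MBeanServer.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName objectName = new ObjectName("sketch.monitor:name=" + name);
        server.registerMBean(status, objectName);
        return objectName;
    }

    /** Registers a spider, simulates two pages, reads the count back over JMX. */
    static int demo(String name) {
        try {
            MiniSpider spider = new MiniSpider();
            ObjectName objectName = register(spider, name);
            spider.pageDone();
            spider.pageDone();
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            int count = (Integer) server.getAttribute(objectName, "SuccessCount");
            server.unregisterMBean(objectName); // clean up so the name can be reused
            return count;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("success count read via JMX: " + demo("example"));
    }
}
```

Reading the attribute back through the MBeanServer, rather than from the object directly, is what a JMX client such as JConsole would do. That is presumably why the examples above register each spider before starting it.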