Example 6 with ResultItems

Use of us.codecraft.webmagic.ResultItems in project vscrawler by virjar.

The class WebMagicPipelineDelegator, method saveItem.

@Override
public void saveItem(GrabResult grabResult, Seed seed) {
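    // Wrap each extracted entity in its own ResultItems and hand it to the WebMagic pipeline
    for (Object str : grabResult.allEntityResult()) {
        ResultItems resultItems = new ResultItems();
        resultItems.setRequest(CovertUtil.convertSeed(seed));
        // Character data is passed on as a string; any other object goes through handleJsonObject
        if (str instanceof CharSequence) {
            handleJson(resultItems, str.toString());
        } else {
            handleJsonObject(resultItems, str);
        }
        try {
            // null is passed as the Task argument
            webMagicPipeline.process(resultItems, null);
        } catch (Exception e) {
            // Log and continue so that one failing item does not stop the rest
            log.error("error when process result", e);
        }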
    }
}
Also used: ResultItems (us.codecraft.webmagic.ResultItems), JSONObject (com.alibaba.fastjson.JSONObject)

Example 7 with ResultItems

Use of us.codecraft.webmagic.ResultItems in project webmagic by code4craft.

The class BaiduBaikePageProcessor, method main.

public static void main(String[] args) {
    // Fetch a single page synchronously
    Spider spider = Spider.create(new BaiduBaikePageProcessor()).thread(2);
    String urlTemplate = "http://baike.baidu.com/search/word?word=%s&pic=1&sug=1&enc=utf8";
    ResultItems resultItems = spider.<ResultItems>get(String.format(urlTemplate, "水力发电"));
    System.out.println(resultItems);
    // Fetch several pages in one call
    List<String> list = new ArrayList<String>();
    list.add(String.format(urlTemplate, "风力发电"));
    list.add(String.format(urlTemplate, "太阳能"));
    list.add(String.format(urlTemplate, "地热发电"));
    list.add(String.format(urlTemplate, "地热发电"));
    List<ResultItems> resultItemses = spider.<ResultItems>getAll(list);
    for (ResultItems resultItemse : resultItemses) {
        System.out.println(resultItemse.getAll());
    }
    spider.close();
}
Also used: ResultItems (us.codecraft.webmagic.ResultItems), Spider (us.codecraft.webmagic.Spider), ArrayList (java.util.ArrayList)

Example 8 with ResultItems

Use of us.codecraft.webmagic.ResultItems in project webmagic by code4craft.

The class PhantomJSPageProcessor, method main.

public static void main(String[] args) throws Exception {
    PhantomJSDownloader phantomDownloader = new PhantomJSDownloader().setRetryNum(3);
    CollectorPipeline<ResultItems> collectorPipeline = new ResultItemsCollectorPipeline();
    // %B6%AC%D7%B0 is the GBK encoding of "冬装" (winter clothing)
    Spider.create(new PhantomJSPageProcessor())
            .addUrl("http://s.taobao.com/search?q=%B6%AC%D7%B0&sort=sale-desc")
            .setDownloader(phantomDownloader)
            .addPipeline(collectorPipeline)
            .thread((Runtime.getRuntime().availableProcessors() - 1) << 1)
            .run();
    List<ResultItems> resultItemsList = collectorPipeline.getCollected();
    System.out.println(resultItemsList.get(0).get("html").toString());
}
Also used: PhantomJSDownloader (us.codecraft.webmagic.downloader.PhantomJSDownloader), ResultItemsCollectorPipeline (us.codecraft.webmagic.pipeline.ResultItemsCollectorPipeline), ResultItems (us.codecraft.webmagic.ResultItems)

Aggregations

ResultItems (us.codecraft.webmagic.ResultItems): 8
ArrayList (java.util.ArrayList): 3
Spider (us.codecraft.webmagic.Spider): 3
Request (us.codecraft.webmagic.Request): 2
Site (us.codecraft.webmagic.Site): 2
Task (us.codecraft.webmagic.Task): 2
JSONObject (com.alibaba.fastjson.JSONObject): 1
BeforeClass (org.junit.BeforeClass): 1
Test (org.junit.Test): 1
Page (us.codecraft.webmagic.Page): 1
MockGithubDownloader (us.codecraft.webmagic.downloader.MockGithubDownloader): 1
PhantomJSDownloader (us.codecraft.webmagic.downloader.PhantomJSDownloader): 1
Pipeline (us.codecraft.webmagic.pipeline.Pipeline): 1
ResultItemsCollectorPipeline (us.codecraft.webmagic.pipeline.ResultItemsCollectorPipeline): 1
PageProcessor (us.codecraft.webmagic.processor.PageProcessor): 1
BaiduBaikePageProcessor (us.codecraft.webmagic.processor.example.BaiduBaikePageProcessor): 1
PlainText (us.codecraft.webmagic.selector.PlainText): 1