Example 1 with SitemapUrl

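Use of org.codelibs.fess.crawler.entity.SitemapUrl in project fess-crawler by codelibs.

From class SitemapsHelper, method parseTextSitemaps. The method reads a plain-text sitemap (one URL per line) and adds each non-blank line that starts with http:// or https:// to a URL-set SitemapSet as a SitemapUrl.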

protected SitemapSet parseTextSitemaps(final InputStream in) {
    // Every entry in a text sitemap is a page URL, so the result is typed as a URL set.
    final SitemapSet sitemapSet = new SitemapSet();
    sitemapSet.setType(SitemapSet.URLSET);
    try {
        // Decode the stream as UTF-8. The reader is not closed here, so the caller still owns `in`.
        final BufferedReader br = new BufferedReader(new InputStreamReader(in, Constants.UTF_8));
        String line;
        while ((line = br.readLine()) != null) {
            final String url = line.trim();
            // Skip blank lines and anything that is not an absolute http(s) URL.
            if (StringUtil.isNotBlank(url) && (url.startsWith("http://") || url.startsWith("https://"))) {
                final SitemapUrl sitemapUrl = new SitemapUrl();
                sitemapUrl.setLoc(url);
                sitemapSet.addSitemap(sitemapUrl);
            }
        }
        return sitemapSet;
    } catch (final Exception e) {
        // Any failure (I/O or otherwise) is reported as a SitemapsException.
        throw new SitemapsException("Could not parse Text Sitemaps.", e);
    }
}
Also used: SitemapUrl (org.codelibs.fess.crawler.entity.SitemapUrl), InputStreamReader (java.io.InputStreamReader), SitemapSet (org.codelibs.fess.crawler.entity.SitemapSet), BufferedReader (java.io.BufferedReader), SitemapsException (org.codelibs.fess.crawler.exception.SitemapsException), CrawlingAccessException (org.codelibs.fess.crawler.exception.CrawlingAccessException)
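To try the same filtering without fess-crawler on the classpath, it can be approximated with the JDK alone. This is a minimal sketch, not the library's code: the class name TextSitemapSketch and the method parseTextSitemap are hypothetical, the result is a plain List<String> instead of SitemapUrl objects in a SitemapSet, and I/O errors are wrapped in UncheckedIOException rather than SitemapsException.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone approximation of SitemapsHelper.parseTextSitemaps.
class TextSitemapSketch {

    // Returns the URLs a text sitemap would contribute, using the same filter as the original.
    static List<String> parseTextSitemap(final InputStream in) {
        final List<String> urls = new ArrayList<>();
        // Decode as UTF-8; like the original, the reader is left open for the caller to close `in`.
        final BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        try {
            String line;
            while ((line = br.readLine()) != null) {
                final String url = line.trim();
                // Keep only non-blank lines that start with http:// or https://.
                if (!url.isEmpty() && (url.startsWith("http://") || url.startsWith("https://"))) {
                    urls.add(url);
                }
            }
        } catch (final IOException e) {
            // Assumption: an unchecked wrapper stands in for SitemapsException.
            throw new UncheckedIOException("Could not parse text sitemap.", e);
        }
        return urls;
    }

    public static void main(final String[] args) {
        final String body = "https://example.com/\n"
                + "  http://example.com/about  \n"
                + "\n"
                + "# not a URL\n"
                + "ftp://example.com/file\n";
        final List<String> urls = parseTextSitemap(
                new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
        System.out.println(urls);
    }
}
```

Running main prints [https://example.com/, http://example.com/about]: surrounding whitespace is trimmed, while the blank line, the comment-like line and the ftp:// line are dropped without any error, just as lines that fail the check are skipped in parseTextSitemaps.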
