
Example 1 with SitemapsException

Use of org.codelibs.fess.crawler.exception.SitemapsException in the fess-crawler project by codelibs.

Class SitemapsHelper, method parseTextSitemaps.

protected SitemapSet parseTextSitemaps(final InputStream in) {
    final SitemapSet sitemapSet = new SitemapSet();
    sitemapSet.setType(SitemapSet.URLSET);
    try {
        // Note: the reader is not closed inside this method.
        final BufferedReader br = new BufferedReader(new InputStreamReader(in, Constants.UTF_8));
        String line;
        // A text sitemap lists one URL per line.
        while ((line = br.readLine()) != null) {
            final String url = line.trim();
            // Keep only non-blank lines that look like http(s) URLs.
            if (StringUtil.isNotBlank(url) && (url.startsWith("http://") || url.startsWith("https://"))) {
                final SitemapUrl sitemapUrl = new SitemapUrl();
                sitemapUrl.setLoc(url);
                sitemapSet.addSitemap(sitemapUrl);
            }
        }
        return sitemapSet;
    } catch (final Exception e) {
        // Wrap any failure, keeping the original exception as the cause.
        throw new SitemapsException("Could not parse Text Sitemaps.", e);
    }
}
Also used: SitemapUrl (org.codelibs.fess.crawler.entity.SitemapUrl), InputStreamReader (java.io.InputStreamReader), SitemapSet (org.codelibs.fess.crawler.entity.SitemapSet), BufferedReader (java.io.BufferedReader), SitemapsException (org.codelibs.fess.crawler.exception.SitemapsException), CrawlingAccessException (org.codelibs.fess.crawler.exception.CrawlingAccessException)
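
A minimal stand-alone sketch of the same line-by-line filtering, for readers without fess-crawler on the classpath. The class name TextSitemapSketch is hypothetical, and the sketch swaps the project types for JDK ones: it returns a plain List<String> instead of a SitemapSet and throws UncheckedIOException instead of SitemapsException. Unlike the method above, it closes the reader with try-with-resources, which may or may not suit a caller that owns the stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of fess-crawler.
final class TextSitemapSketch {

    // Collects every trimmed line that starts with http:// or https://.
    static List<String> parse(final InputStream in) {
        final List<String> urls = new ArrayList<>();
        // try-with-resources closes the reader and the wrapped stream.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                final String url = line.trim();
                if (url.startsWith("http://") || url.startsWith("https://")) {
                    urls.add(url);
                }
            }
        } catch (final IOException e) {
            // Wrap the checked exception and keep the cause (assumed stand-in for SitemapsException).
            throw new UncheckedIOException("Could not parse text sitemap.", e);
        }
        return urls;
    }

    public static void main(final String[] args) {
        final String body = "http://example.com/a\n\n  https://example.com/b  \nftp://example.com/c\n";
        final InputStream in = new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
        // prints [http://example.com/a, https://example.com/b]
        System.out.println(parse(in));
    }
}
```

The blank line and the ftp:// line are skipped because neither passes the prefix check.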

Example 2 with SitemapsException

Use of org.codelibs.fess.crawler.exception.SitemapsException in the fess-crawler project by codelibs.

Class SitemapsHelper, method parseXmlSitemaps.

protected SitemapSet parseXmlSitemaps(final InputStream in) {
    // The handler accumulates the parsed entries as SAX events arrive.
    final XmlSitemapsHandler handler = new XmlSitemapsHandler();
    try {
        // Default factory settings: not namespace-aware, non-validating.
        final SAXParserFactory spfactory = SAXParserFactory.newInstance();
        final SAXParser parser = spfactory.newSAXParser();
        parser.parse(in, handler);
    } catch (final Exception e) {
        // Wrap any failure, keeping the original exception as the cause.
        throw new SitemapsException("Could not parse XML Sitemaps.", e);
    }
    return handler.getSitemapSet();
}
Also used: SAXParser (javax.xml.parsers.SAXParser), SAXParserFactory (javax.xml.parsers.SAXParserFactory), SitemapsException (org.codelibs.fess.crawler.exception.SitemapsException), CrawlingAccessException (org.codelibs.fess.crawler.exception.CrawlingAccessException)
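
The source of XmlSitemapsHandler is not shown here, so the following is only a rough sketch of what a SAX handler for a sitemap might look like, not the project's implementation. The names XmlSitemapSketch and LocHandler are hypothetical. The handler collects just the text of each loc element, and IllegalStateException stands in for SitemapsException.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of fess-crawler.
final class XmlSitemapSketch {

    // Collects the text content of every <loc> element.
    static final class LocHandler extends DefaultHandler {
        final List<String> locs = new ArrayList<>();
        private StringBuilder buf; // non-null only while inside <loc>

        @Override
        public void startElement(final String uri, final String localName, final String qName, final Attributes attributes) {
            // The default factory is not namespace-aware, so match on qName.
            if ("loc".equals(qName)) {
                buf = new StringBuilder();
            }
        }

        @Override
        public void characters(final char[] ch, final int start, final int length) {
            // characters() may be called several times per element, so append.
            if (buf != null) {
                buf.append(ch, start, length);
            }
        }

        @Override
        public void endElement(final String uri, final String localName, final String qName) {
            if ("loc".equals(qName) && buf != null) {
                locs.add(buf.toString().trim());
                buf = null;
            }
        }
    }

    static List<String> parse(final InputStream in) {
        final LocHandler handler = new LocHandler();
        try {
            final SAXParserFactory factory = SAXParserFactory.newInstance();
            final SAXParser parser = factory.newSAXParser();
            parser.parse(in, handler);
        } catch (final Exception e) {
            // Assumed stand-in for SitemapsException: unchecked, cause preserved.
            throw new IllegalStateException("Could not parse XML sitemap.", e);
        }
        return handler.locs;
    }

    public static void main(final String[] args) {
        final String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<url><loc>http://example.com/a</loc></url>"
                + "<url><loc> http://example.com/b </loc></url>"
                + "</urlset>";
        // prints [http://example.com/a, http://example.com/b]
        System.out.println(parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
    }
}
```

Malformed input makes the parser throw, and the catch block turns that into a single unchecked exception with the parser error attached as the cause, which is the same wrap-and-rethrow shape as the method above.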

Example 3 with SitemapsException

Use of org.codelibs.fess.crawler.exception.SitemapsException in the fess-crawler project by codelibs.

Class SitemapsHelper, method parseXmlSitemapsIndex.

protected SitemapSet parseXmlSitemapsIndex(final InputStream in) {
    // Same flow as parseXmlSitemaps, with a handler for sitemap index files.
    final XmlSitemapsIndexHandler handler = new XmlSitemapsIndexHandler();
    try {
        final SAXParserFactory spfactory = SAXParserFactory.newInstance();
        final SAXParser parser = spfactory.newSAXParser();
        parser.parse(in, handler);
    } catch (final Exception e) {
        // Wrap any failure, keeping the original exception as the cause.
        throw new SitemapsException("Could not parse XML Sitemaps Index.", e);
    }
    return handler.getSitemapSet();
}
Also used: SAXParser (javax.xml.parsers.SAXParser), SAXParserFactory (javax.xml.parsers.SAXParserFactory), SitemapsException (org.codelibs.fess.crawler.exception.SitemapsException), CrawlingAccessException (org.codelibs.fess.crawler.exception.CrawlingAccessException)

Aggregations

CrawlingAccessException (org.codelibs.fess.crawler.exception.CrawlingAccessException): 3
SitemapsException (org.codelibs.fess.crawler.exception.SitemapsException): 3
SAXParser (javax.xml.parsers.SAXParser): 2
SAXParserFactory (javax.xml.parsers.SAXParserFactory): 2
BufferedReader (java.io.BufferedReader): 1
InputStreamReader (java.io.InputStreamReader): 1
SitemapSet (org.codelibs.fess.crawler.entity.SitemapSet): 1
SitemapUrl (org.codelibs.fess.crawler.entity.SitemapUrl): 1