Example 6 with SitemapSet

Use of org.codelibs.fess.crawler.entity.SitemapSet in project fess-crawler by codelibs.

From the class SitemapsHelperTest, method test_parseXmlSitemapsIndexGz.

public void test_parseXmlSitemapsIndexGz() {
    // Load the gzip-compressed sitemap index from the test resources and parse it.
    final InputStream in = ResourceUtil.getResourceAsStream("sitemaps/sitemap2.xml.gz");
    final SitemapSet sitemapSet = sitemapsHelper.parse(in);
    final Sitemap[] sitemaps = sitemapSet.getSitemaps();
    // The result is an index with two child sitemaps, not a URL set.
    assertEquals(2, sitemaps.length);
    assertFalse(sitemapSet.isUrlSet());
    assertTrue(sitemapSet.isIndex());
    // Each entry exposes its lastmod and loc values.
    assertEquals("2004-10-01T18:23:17+00:00", sitemaps[0].getLastmod());
    assertEquals("http://www.example.com/sitemap1.xml.gz", sitemaps[0].getLoc());
    assertEquals("2005-01-01", sitemaps[1].getLastmod());
    assertEquals("http://www.example.com/sitemap2.xml.gz", sitemaps[1].getLoc());
}
Also used: Sitemap (org.codelibs.fess.crawler.entity.Sitemap), ByteArrayInputStream (java.io.ByteArrayInputStream), InputStream (java.io.InputStream), SitemapSet (org.codelibs.fess.crawler.entity.SitemapSet)
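
The test above checks that a gzip-compressed sitemap index is recognized as an index rather than a URL set. As a rough illustration of that distinction, the following sketch uses only JDK classes: it decompresses the stream, parses the XML, and looks at the root element name. It is not how SitemapsHelper is implemented; the class GzSitemapIndexSketch, its Result holder, and the methods parseGz and gzip are hypothetical names introduced for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class GzSitemapIndexSketch {

    /** Hypothetical result holder; NOT the fess-crawler SitemapSet. */
    public static final class Result {
        public final boolean index;      // true when the root element is <sitemapindex>
        public final List<String> locs;  // text of every <loc> element, in document order

        Result(final boolean index, final List<String> locs) {
            this.index = index;
            this.locs = locs;
        }
    }

    /** Decompresses the stream, parses the XML, and collects the <loc> values. */
    public static Result parseGz(final InputStream gz) {
        try (InputStream in = new GZIPInputStream(gz)) {
            final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations, since sitemap content is usually untrusted.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            final Document doc = factory.newDocumentBuilder().parse(in);
            final boolean index = "sitemapindex".equals(doc.getDocumentElement().getTagName());
            final NodeList nodes = doc.getElementsByTagName("loc");
            final List<String> locs = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                locs.add(nodes.item(i).getTextContent().trim());
            }
            return new Result(index, locs);
        } catch (final Exception e) {
            throw new IllegalStateException("Could not parse gzip-compressed sitemap", e);
        }
    }

    /** Helper for the demo: gzip-compresses a string in memory. */
    public static byte[] gzip(final String text) {
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(final String[] args) {
        final String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<sitemap><loc>http://www.example.com/sitemap1.xml.gz</loc></sitemap>"
                + "<sitemap><loc>http://www.example.com/sitemap2.xml.gz</loc></sitemap>"
                + "</sitemapindex>";
        final Result result = parseGz(new ByteArrayInputStream(gzip(xml)));
        System.out.println("index=" + result.index + ", locs=" + result.locs);
    }
}
```

The sketch reads the whole document into a DOM, which is simple but may be unsuitable for very large sitemaps; a streaming parser would be the usual alternative there.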

Example 7 with SitemapSet

Use of org.codelibs.fess.crawler.entity.SitemapSet in project fess-crawler by codelibs.

From the class SitemapsHelperTest, method test_parseTextSitemaps.

public void test_parseTextSitemaps() {
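    // Load the plain-text sitemap from the test resources and parse it.
    final InputStream in = ResourceUtil.getResourceAsStream("sitemaps/sitemap1.txt");
    final SitemapSet sitemapSet = sitemapsHelper.parse(in);
    final Sitemap[] sitemaps = sitemapSet.getSitemaps();
    // The result is a URL set with five entries; only loc is populated for each.
    assertEquals(5, sitemaps.length);
    assertTrue(sitemapSet.isUrlSet());
    assertFalse(sitemapSet.isIndex());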
    assertNull(sitemaps[0].getLastmod());
    assertEquals("http://www.example.com/", sitemaps[0].getLoc());
    assertNull(((SitemapUrl) sitemaps[0]).getChangefreq());
    assertNull(((SitemapUrl) sitemaps[0]).getPriority());
    assertNull(sitemaps[1].getLastmod());
    assertEquals("http://www.example.com/catalog?item=12&desc=vacation_hawaii", sitemaps[1].getLoc());
    assertNull(((SitemapUrl) sitemaps[1]).getChangefreq());
    assertNull(((SitemapUrl) sitemaps[1]).getPriority());
    assertNull(sitemaps[2].getLastmod());
    assertEquals("http://www.example.com/catalog?item=73&desc=vacation_new_zealand", sitemaps[2].getLoc());
    assertNull(((SitemapUrl) sitemaps[2]).getChangefreq());
    assertNull(((SitemapUrl) sitemaps[2]).getPriority());
    assertNull(sitemaps[3].getLastmod());
    assertEquals("http://www.example.com/catalog?item=74&desc=vacation_newfoundland", sitemaps[3].getLoc());
    assertNull(((SitemapUrl) sitemaps[3]).getChangefreq());
    assertNull(((SitemapUrl) sitemaps[3]).getPriority());
    assertNull(sitemaps[4].getLastmod());
    assertEquals("http://www.example.com/catalog?item=83&desc=vacation_usa", sitemaps[4].getLoc());
    assertNull(((SitemapUrl) sitemaps[4]).getChangefreq());
    assertNull(((SitemapUrl) sitemaps[4]).getPriority());
}
Also used: Sitemap (org.codelibs.fess.crawler.entity.Sitemap), ByteArrayInputStream (java.io.ByteArrayInputStream), InputStream (java.io.InputStream), SitemapSet (org.codelibs.fess.crawler.entity.SitemapSet)
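
In the text sitemap test above, every entry has a loc but no lastmod, changefreq, or priority, which matches the plain-text sitemap format of one URL per line. The sketch below shows one way such a file could be read with JDK classes only. It is not the SitemapsHelper implementation; the class TextSitemapSketch and the method parseText are hypothetical, and skipping lines that do not start with http:// or https:// is an assumption made for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TextSitemapSketch {

    /**
     * Reads a plain-text sitemap (one URL per line, UTF-8) and returns the URLs in order.
     * Assumption for this sketch: blank lines and lines that are not http(s) URLs are skipped.
     */
    public static List<String> parseText(final InputStream in) {
        final List<String> locs = new ArrayList<>();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                final String url = line.trim();
                if (url.startsWith("http://") || url.startsWith("https://")) {
                    locs.add(url);
                }
            }
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return locs;
    }

    public static void main(final String[] args) {
        final String text = "http://www.example.com/\n"
                + "http://www.example.com/catalog?item=12&desc=vacation_hawaii\n";
        final List<String> locs =
                parseText(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        for (final String loc : locs) {
            System.out.println(loc);
        }
    }
}
```

Because the text format carries nothing but URLs, a parser has no values to supply for the optional fields, which is consistent with the assertNull checks in the test.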

Aggregations

SitemapSet (org.codelibs.fess.crawler.entity.SitemapSet): 7
InputStream (java.io.InputStream): 6
Sitemap (org.codelibs.fess.crawler.entity.Sitemap): 6
ByteArrayInputStream (java.io.ByteArrayInputStream): 5
BufferedReader (java.io.BufferedReader): 1
IOException (java.io.IOException): 1
InputStreamReader (java.io.InputStreamReader): 1
LinkedHashSet (java.util.LinkedHashSet): 1
IORuntimeException (org.codelibs.core.exception.IORuntimeException): 1
RequestData (org.codelibs.fess.crawler.entity.RequestData): 1
SitemapUrl (org.codelibs.fess.crawler.entity.SitemapUrl): 1
ChildUrlsException (org.codelibs.fess.crawler.exception.ChildUrlsException): 1
CrawlingAccessException (org.codelibs.fess.crawler.exception.CrawlingAccessException): 1
SitemapsException (org.codelibs.fess.crawler.exception.SitemapsException): 1
SitemapsHelper (org.codelibs.fess.crawler.helper.SitemapsHelper): 1