
Example 1 with CrawlerWebServer

Use of org.codelibs.fess.crawler.util.CrawlerWebServer in project fess-crawler by codelibs.

From the class HcHttpClientTest, method test_processRobotsTxt.

public void test_processRobotsTxt() {
    final CrawlerWebServer server = new CrawlerWebServer(7070);
    server.start();
    final String url = "http://localhost:7070/hoge.html";
    try {
        final CrawlerContext crawlerContext = new CrawlerContext();
        final String sessionId = "id1";
        urlFilter.init(sessionId);
        crawlerContext.setUrlFilter(urlFilter);
        CrawlingParameterUtil.setCrawlerContext(crawlerContext);
        httpClient.init();
        httpClient.processRobotsTxt(url);
        assertEquals(1, crawlerContext.getRobotsTxtUrlSet().size());
        assertTrue(crawlerContext.getRobotsTxtUrlSet().contains("http://localhost:7070/robots.txt"));
        assertFalse(urlFilter.match("http://localhost:7070/admin/"));
        assertFalse(urlFilter.match("http://localhost:7070/websvn/"));
    } finally {
        server.stop();
    }
}
Also used: CrawlerWebServer (org.codelibs.fess.crawler.util.CrawlerWebServer), CrawlerContext (org.codelibs.fess.crawler.CrawlerContext)
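
The test above expects the robots.txt address to share the scheme, host, and port of the page URL. A minimal sketch of that derivation using only java.net.URI follows; the class and method names are hypothetical and are not part of fess-crawler, whose actual implementation may differ.

```java
import java.net.URI;

public class RobotsTxtUrlSketch {

    // Hypothetical helper: returns the robots.txt URL for the site hosting pageUrl.
    static String robotsTxtUrl(final String pageUrl) {
        final URI uri = URI.create(pageUrl);
        // Resolving an absolute path keeps scheme, host, and port and replaces the path.
        return uri.resolve("/robots.txt").toString();
    }

    public static void main(final String[] args) {
        System.out.println(robotsTxtUrl("http://localhost:7070/hoge.html"));
    }
}
```

For the URL used in the test, this yields the same http://localhost:7070/robots.txt string that the assertion checks for.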

Example 2 with CrawlerWebServer

Use of org.codelibs.fess.crawler.util.CrawlerWebServer in project fess-crawler by codelibs.

From the class HcHttpClientTest, method test_doGet.

public void test_doGet() {
    final CrawlerWebServer server = new CrawlerWebServer(7070);
    server.start();
    final String url = "http://localhost:7070/";
    try {
        final ResponseData responseData = httpClient.doGet(url);
        assertEquals(200, responseData.getHttpStatusCode());
    } finally {
        server.stop();
    }
}
Also used: CrawlerWebServer (org.codelibs.fess.crawler.util.CrawlerWebServer), ResponseData (org.codelibs.fess.crawler.entity.ResponseData)
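
The start / try / finally / stop shape used in these tests can be reproduced without fess-crawler. The sketch below is a stand-in that assumes the JDK's built-in com.sun.net.httpserver.HttpServer in place of CrawlerWebServer and HttpURLConnection in place of the crawler's HTTP client; the class and method names are hypothetical. It binds to port 0 so the operating system picks a free port, which avoids the clash that a fixed port such as 7070 can cause when tests run in parallel.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class LocalServerSketch {

    // Starts a throwaway local server, issues one GET, and returns the status code.
    static int fetchStatus() throws IOException {
        // Port 0 asks the OS for any free port.
        final HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", exchange -> {
            final byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            final int port = server.getAddress().getPort();
            final HttpURLConnection conn = (HttpURLConnection) URI
                    .create("http://localhost:" + port + "/")
                    .toURL()
                    .openConnection(Proxy.NO_PROXY);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } finally {
            // Always release the port, even if the request or an assertion fails.
            server.stop(0);
        }
    }

    public static void main(final String[] args) throws IOException {
        System.out.println(fetchStatus());
    }
}
```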

Example 3 with CrawlerWebServer

Use of org.codelibs.fess.crawler.util.CrawlerWebServer in project fess-crawler by codelibs.

From the class HcHttpClientTest, method test_doHead.

public void test_doHead() throws Exception {
    final CrawlerWebServer server = new CrawlerWebServer(7070);
    server.start();
    final String url = "http://localhost:7070/";
    try {
        final ResponseData responseData = httpClient.doHead(url);
        Thread.sleep(100);
        assertNotNull(responseData.getLastModified());
        assertTrue(responseData.getLastModified().getTime() < new Date().getTime());
    } finally {
        server.stop();
    }
}
Also used: CrawlerWebServer (org.codelibs.fess.crawler.util.CrawlerWebServer), ResponseData (org.codelibs.fess.crawler.entity.ResponseData), Date (java.util.Date)

Example 4 with CrawlerWebServer

Use of org.codelibs.fess.crawler.util.CrawlerWebServer in project fess-crawler by codelibs.

From the class CrawlerTest, method test_execute_bg.

public void test_execute_bg() throws Exception {
    final CrawlerWebServer server = new CrawlerWebServer(7070);
    server.start();
    try {
        final String url = "http://localhost:7070/";
        final int maxCount = 50;
        final int numOfThread = 10;
        final File file = File.createTempFile("crawler-", "");
        file.delete();
        file.mkdirs();
        file.deleteOnExit();
        fileTransformer.setPath(file.getAbsolutePath());
        crawler.setBackground(true);
        ((UrlFilterImpl) crawler.urlFilter).setIncludeFilteringPattern("$1$2$3.*");
        crawler.addUrl(url);
        crawler.getCrawlerContext().setMaxAccessCount(maxCount);
        crawler.getCrawlerContext().setNumOfThread(numOfThread);
        final String sessionId = crawler.execute();
        Thread.sleep(3000);
        assertEquals(CrawlerStatus.RUNNING, crawler.crawlerContext.getStatus());
        crawler.awaitTermination();
        assertEquals(maxCount, dataService.getCount(sessionId));
        dataService.delete(sessionId);
    } finally {
        server.stop();
    }
}
Also used: UrlFilterImpl (org.codelibs.fess.crawler.filter.impl.UrlFilterImpl), CrawlerWebServer (org.codelibs.fess.crawler.util.CrawlerWebServer), File (java.io.File)
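
The test above builds its scratch directory by creating a temp file, deleting it, and calling mkdirs() on the same path. As an alternative sketch, java.nio.file.Files.createTempDirectory creates the directory in a single call; the class and method names below are hypothetical, and this is not how the fess-crawler test is written.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempDirSketch {

    // Hypothetical helper: creates a fresh directory whose name starts with "crawler-".
    static Path newCrawlerTempDir() throws IOException {
        return Files.createTempDirectory("crawler-");
    }

    public static void main(final String[] args) throws IOException {
        final Path dir = newCrawlerTempDir();
        System.out.println(dir.toAbsolutePath());
        // Clean up the empty directory once it is no longer needed.
        Files.delete(dir);
    }
}
```

The resulting path could then be handed to a call such as fileTransformer.setPath(dir.toAbsolutePath().toString()), assuming that setter accepts any absolute directory path as it does in the test.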

Example 5 with CrawlerWebServer

Use of org.codelibs.fess.crawler.util.CrawlerWebServer in project fess-crawler by codelibs.

From the class WebDriverClientTest, method test_doHead.

public void test_doHead() throws Exception {
    File docRootDir = new File(ResourceUtil.getBuildDir("ajax/index.html"), "ajax");
    final CrawlerWebServer server = new CrawlerWebServer(7070, docRootDir);
    final String url = "http://localhost:7070/";
    try {
        server.start();
        final ResponseData responseData = webDriverClient.execute(RequestDataBuilder.newRequestData().head().url(url).build());
        Thread.sleep(100);
        assertNotNull(responseData.getLastModified());
        assertTrue(responseData.getLastModified().getTime() < SystemUtil.currentTimeMillis());
    } finally {
        server.stop();
    }
}
Also used: CrawlerWebServer (org.codelibs.fess.crawler.util.CrawlerWebServer), ResponseData (org.codelibs.fess.crawler.entity.ResponseData), File (java.io.File)

Aggregations

CrawlerWebServer (org.codelibs.fess.crawler.util.CrawlerWebServer): 12
File (java.io.File): 9
ResponseData (org.codelibs.fess.crawler.entity.ResponseData): 4
UrlFilterImpl (org.codelibs.fess.crawler.filter.impl.UrlFilterImpl): 3
UrlQueue (org.codelibs.fess.crawler.entity.UrlQueue): 2
Date (java.util.Date): 1
Crawler (org.codelibs.fess.crawler.Crawler): 1
CrawlerContext (org.codelibs.fess.crawler.CrawlerContext): 1
RequestData (org.codelibs.fess.crawler.entity.RequestData): 1