Example 21 with StandardCrawlerContainer

Use of org.codelibs.fess.crawler.container.StandardCrawlerContainer in project fess-crawler by codelibs.

From the class WebDriverClientTest, method setUp.

@Override
protected void setUp() throws Exception {
    super.setUp();
    CrawlerPooledObjectFactory<CrawlerWebDriver> pooledObjectFactory = new CrawlerPooledObjectFactory<>();
    pooledObjectFactory.setComponentName("webDriver");
    pooledObjectFactory.setOnDestroyListener(p -> {
        final CrawlerWebDriver driver = p.getObject();
        driver.quit();
    });
    final StandardCrawlerContainer container = new StandardCrawlerContainer();
    container.prototype("webDriver", CrawlerWebDriver.class)
            .singleton("mimeTypeHelper", MimeTypeHelperImpl.class)
            .singleton("pooledObjectFactory", pooledObjectFactory)
            .singleton("webDriverPool", new GenericObjectPool<>(pooledObjectFactory), null, pool -> {
                pool.close();
            })
            .<AOnClickAction>singleton("aOnClickAction", AOnClickAction.class)
            .<FormAction>singleton("formAction", FormAction.class)
            .<WebDriverClient>singleton("webDriverClient", WebDriverClient.class, client -> {
                AOnClickAction aOnClick = container.getComponent("aOnClickAction");
                aOnClick.setName("aOnClick");
                aOnClick.setCssQuery("a");
                client.addUrlAction(aOnClick);
                FormAction formAction = container.getComponent("formAction");
                formAction.setName("form");
                formAction.setCssQuery("form");
                client.addUrlAction(formAction);
            });
    webDriverClient = container.getComponent("webDriverClient");
}
Also used : StandardCrawlerContainer(org.codelibs.fess.crawler.container.StandardCrawlerContainer) MimeTypeHelperImpl(org.codelibs.fess.crawler.helper.impl.MimeTypeHelperImpl) Iterator(java.util.Iterator) CrawlerPooledObjectFactory(org.codelibs.fess.crawler.pool.CrawlerPooledObjectFactory) ResourceUtil(org.codelibs.core.io.ResourceUtil) PlainTestCase(org.dbflute.utflute.core.PlainTestCase) CrawlerWebDriver(org.codelibs.fess.crawler.client.http.webdriver.CrawlerWebDriver) Resource(javax.annotation.Resource) GenericObjectPool(org.apache.commons.pool2.impl.GenericObjectPool) Set(java.util.Set) File(java.io.File) AOnClickAction(org.codelibs.fess.crawler.client.http.action.AOnClickAction) SystemUtil(org.codelibs.core.lang.SystemUtil) Constants(org.codelibs.fess.crawler.Constants) RequestData(org.codelibs.fess.crawler.entity.RequestData) FormAction(org.codelibs.fess.crawler.client.http.action.FormAction) WebDriverClient(org.codelibs.fess.crawler.client.http.WebDriverClient) CrawlerWebServer(org.codelibs.fess.crawler.util.CrawlerWebServer) RequestDataBuilder(org.codelibs.fess.crawler.builder.RequestDataBuilder) InputStreamUtil(org.codelibs.core.io.InputStreamUtil) ResponseData(org.codelibs.fess.crawler.entity.ResponseData)

Example 22 with StandardCrawlerContainer

Use of org.codelibs.fess.crawler.container.StandardCrawlerContainer in project fess-crawler by codelibs.

From the class RuleManagerImplTest, method setUp.

@Override
protected void setUp() throws Exception {
    super.setUp();
    StandardCrawlerContainer container = new StandardCrawlerContainer()
            .singleton("sitemapsHelper", SitemapsHelper.class)
            .singleton("sitemapsRule", SitemapsRule.class)
            .singleton("fileRule", RegexRule.class)
            .singleton("ruleManager", RuleManagerImpl.class);
    ruleManager = container.getComponent("ruleManager");
    SitemapsRule sitemapsRule = container.getComponent("sitemapsRule");
    sitemapsRule.setRuleId("sitemapsRule");
    sitemapsRule.addRule("url", ".*sitemap.*");
    ruleManager.addRule(sitemapsRule);
    RegexRule fileRule = container.getComponent("fileRule");
    fileRule.setRuleId("fileRule");
    fileRule.setDefaultRule(true);
    ruleManager.addRule(fileRule);
}
Also used : StandardCrawlerContainer(org.codelibs.fess.crawler.container.StandardCrawlerContainer) SitemapsHelper(org.codelibs.fess.crawler.helper.SitemapsHelper)

Example 23 with StandardCrawlerContainer

Use of org.codelibs.fess.crawler.container.StandardCrawlerContainer in project fess-crawler by codelibs.

From the class TextTransformerTest, method setUp.

@Override
protected void setUp() throws Exception {
    super.setUp();
    StandardCrawlerContainer container = new StandardCrawlerContainer()
            .singleton("extractorFactory", ExtractorFactory.class)
            .singleton("textTransformer", TextTransformer.class)
            .singleton("tikaExtractor", TikaExtractor.class);
    textTransformer = container.getComponent("textTransformer");
    textTransformer.setName("textTransformer");
    ExtractorFactory extractorFactory = container.getComponent("extractorFactory");
    TikaExtractor tikaExtractor = container.getComponent("tikaExtractor");
    extractorFactory.addExtractor("text/plain", tikaExtractor);
    extractorFactory.addExtractor("text/html", tikaExtractor);
}
Also used : ExtractorFactory(org.codelibs.fess.crawler.extractor.ExtractorFactory) StandardCrawlerContainer(org.codelibs.fess.crawler.container.StandardCrawlerContainer) TikaExtractor(org.codelibs.fess.crawler.extractor.impl.TikaExtractor)

Example 24 with StandardCrawlerContainer

Use of org.codelibs.fess.crawler.container.StandardCrawlerContainer in project fess-crawler by codelibs.

From the class EmlExtractorTest, method setUp.

@Override
protected void setUp() throws Exception {
    super.setUp();
    StandardCrawlerContainer container = new StandardCrawlerContainer()
            .singleton("emlExtractor", EmlExtractor.class);
    container.singleton("mimeTypeHelper", MimeTypeHelperImpl.class)
            .singleton("tikaExtractor", TikaExtractor.class)
            .singleton("zipExtractor", ZipExtractor.class)
            .<ExtractorFactory>singleton("extractorFactory", ExtractorFactory.class, factory -> {
                TikaExtractor tikaExtractor = container.getComponent("tikaExtractor");
                factory.addExtractor("application/pdf", tikaExtractor);
            });
    emlExtractor = container.getComponent("emlExtractor");
}
Also used : ExtractorFactory(org.codelibs.fess.crawler.extractor.ExtractorFactory) StandardCrawlerContainer(org.codelibs.fess.crawler.container.StandardCrawlerContainer) MimeTypeHelperImpl(org.codelibs.fess.crawler.helper.impl.MimeTypeHelperImpl)
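
The examples above register components by name, optionally pass an initializer lambda, and then look components up with getComponent. A minimal, self-contained sketch of that pattern follows. MiniContainer is a hypothetical stand-in written for illustration and is not the fess-crawler implementation; in particular, creating each instance eagerly at registration time is an assumption, and StandardCrawlerContainer may behave differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical stand-in for illustration only; NOT the fess-crawler class. */
public class MiniContainer {
    // Registered singletons, keyed by component name.
    private final Map<String, Object> singletons = new LinkedHashMap<>();

    /** Registers a singleton created through its public no-arg constructor. */
    public <T> MiniContainer singleton(String name, Class<T> type) {
        return singleton(name, type, null);
    }

    /** Same as above, but runs the initializer on the new instance first. */
    public <T> MiniContainer singleton(String name, Class<T> type, Consumer<T> initializer) {
        try {
            // Assumption: the instance is created eagerly at registration time.
            T instance = type.getDeclaredConstructor().newInstance();
            if (initializer != null) {
                initializer.accept(instance);
            }
            singletons.put(name, instance);
            return this; // returning this enables chained registration
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create component " + name, e);
        }
    }

    /** Looks up a component by name; returns null if it was never registered. */
    @SuppressWarnings("unchecked")
    public <T> T getComponent(String name) {
        return (T) singletons.get(name);
    }

    public static void main(String[] args) {
        MiniContainer container = new MiniContainer();
        container.singleton("prefix", StringBuilder.class, sb -> sb.append("crawler"))
                .singleton("message", StringBuilder.class, sb -> {
                    // The initializer can look up components registered earlier.
                    StringBuilder prefix = container.getComponent("prefix");
                    sb.append(prefix).append("-ready");
                });
        StringBuilder message = container.getComponent("message");
        System.out.println(message); // prints "crawler-ready"
    }
}
```

As in the setUp methods above, the container is declared in its own statement before any lambda refers to it, so the variable is definitely assigned and effectively final when the lambda captures it.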

Example 25 with StandardCrawlerContainer

Use of org.codelibs.fess.crawler.container.StandardCrawlerContainer in project fess-crawler by codelibs.

From the class HtmlExtractorTest, method setUp.

@Override
protected void setUp() throws Exception {
    super.setUp();
    StandardCrawlerContainer container = new StandardCrawlerContainer().singleton("htmlExtractor", HtmlExtractor.class);
    htmlExtractor = container.getComponent("htmlExtractor");
}
Also used : StandardCrawlerContainer(org.codelibs.fess.crawler.container.StandardCrawlerContainer)

Aggregations

StandardCrawlerContainer (org.codelibs.fess.crawler.container.StandardCrawlerContainer): 32
MimeTypeHelperImpl (org.codelibs.fess.crawler.helper.impl.MimeTypeHelperImpl): 9
ExtractorFactory (org.codelibs.fess.crawler.extractor.ExtractorFactory): 7
TikaExtractor (org.codelibs.fess.crawler.extractor.impl.TikaExtractor): 3
MemoryDataHelper (org.codelibs.fess.crawler.helper.MemoryDataHelper): 3
SitemapsHelper (org.codelibs.fess.crawler.helper.SitemapsHelper): 3
UrlFilterServiceImpl (org.codelibs.fess.crawler.service.impl.UrlFilterServiceImpl): 3
File (java.io.File): 2
ArchiveStreamFactory (org.apache.commons.compress.archivers.ArchiveStreamFactory): 2
ResourceUtil (org.codelibs.core.io.ResourceUtil): 2
HcHttpClient (org.codelibs.fess.crawler.client.http.HcHttpClient): 2
RobotsTxtHelper (org.codelibs.fess.crawler.helper.RobotsTxtHelper): 2
CrawlerWebServer (org.codelibs.fess.crawler.util.CrawlerWebServer): 2
PlainTestCase (org.dbflute.utflute.core.PlainTestCase): 2
Iterator (java.util.Iterator): 1
Map (java.util.Map): 1
Set (java.util.Set): 1
TimeUnit (java.util.concurrent.TimeUnit): 1
Resource (javax.annotation.Resource): 1
GenericObjectPool (org.apache.commons.pool2.impl.GenericObjectPool): 1