Example 11 with StandardCrawlerContainer

Use of org.codelibs.fess.crawler.container.StandardCrawlerContainer in project fess-crawler by codelibs.

Class MsWordExtractorTest, method setUp.

@Override
protected void setUp() throws Exception {
    super.setUp();
    StandardCrawlerContainer container = new StandardCrawlerContainer().singleton("msWordExtractor", MsWordExtractor.class);
    msWordExtractor = container.getComponent("msWordExtractor");
}
Also used : StandardCrawlerContainer(org.codelibs.fess.crawler.container.StandardCrawlerContainer)

Example 12 with StandardCrawlerContainer

Use of org.codelibs.fess.crawler.container.StandardCrawlerContainer in project fess-crawler by codelibs.

Class TarExtractorTest, method setUp.

@Override
protected void setUp() throws Exception {
    super.setUp();
    StandardCrawlerContainer container = new StandardCrawlerContainer();
    // Register each component as a singleton; the initializer lambda
    // maps MIME types to extractors on the factory.
    container.singleton("archiveStreamFactory", ArchiveStreamFactory.class)
            .singleton("compressorStreamFactory", CompressorStreamFactory.class)
            .singleton("mimeTypeHelper", MimeTypeHelperImpl.class)
            .singleton("tikaExtractor", TikaExtractor.class)
            .singleton("tarExtractor", TarExtractor.class)
            .<ExtractorFactory>singleton("extractorFactory", ExtractorFactory.class, factory -> {
                TikaExtractor tikaExtractor = container.getComponent("tikaExtractor");
                TarExtractor tarExtractor = container.getComponent("tarExtractor");
                factory.addExtractor("text/plain", tikaExtractor);
                factory.addExtractor("text/html", tikaExtractor);
                factory.addExtractor("application/tar", tarExtractor);
            });
    tarExtractor = container.getComponent("tarExtractor");
}
Also used : ArchiveStreamFactory(org.apache.commons.compress.archivers.ArchiveStreamFactory) ExtractorFactory(org.codelibs.fess.crawler.extractor.ExtractorFactory) StandardCrawlerContainer(org.codelibs.fess.crawler.container.StandardCrawlerContainer) MimeTypeHelperImpl(org.codelibs.fess.crawler.helper.impl.MimeTypeHelperImpl)

Example 13 with StandardCrawlerContainer

Use of org.codelibs.fess.crawler.container.StandardCrawlerContainer in project fess-crawler by codelibs.

Class XmlExtractorTest, method setUp.

@Override
protected void setUp() throws Exception {
    super.setUp();
    StandardCrawlerContainer container = new StandardCrawlerContainer().singleton("xmlExtractor", XmlExtractor.class);
    xmlExtractor = container.getComponent("xmlExtractor");
}
Also used : StandardCrawlerContainer(org.codelibs.fess.crawler.container.StandardCrawlerContainer)

Example 14 with StandardCrawlerContainer

Use of org.codelibs.fess.crawler.container.StandardCrawlerContainer in project fess-crawler by codelibs.

Class ZipExtractorTest, method setUp.

@Override
protected void setUp() throws Exception {
    super.setUp();
    StandardCrawlerContainer container = new StandardCrawlerContainer();
    // Register each component as a singleton; the initializer lambda
    // maps MIME types to extractors on the factory.
    container.singleton("archiveStreamFactory", ArchiveStreamFactory.class)
            .singleton("compressorStreamFactory", CompressorStreamFactory.class)
            .singleton("mimeTypeHelper", MimeTypeHelperImpl.class)
            .singleton("tikaExtractor", TikaExtractor.class)
            .singleton("zipExtractor", ZipExtractor.class)
            .<ExtractorFactory>singleton("extractorFactory", ExtractorFactory.class, factory -> {
                TikaExtractor tikaExtractor = container.getComponent("tikaExtractor");
                ZipExtractor zipExtractor = container.getComponent("zipExtractor");
                factory.addExtractor("text/plain", tikaExtractor);
                factory.addExtractor("text/html", tikaExtractor);
                factory.addExtractor("application/zip", zipExtractor);
            });
    zipExtractor = container.getComponent("zipExtractor");
}
Also used : ArchiveStreamFactory(org.apache.commons.compress.archivers.ArchiveStreamFactory) ExtractorFactory(org.codelibs.fess.crawler.extractor.ExtractorFactory) StandardCrawlerContainer(org.codelibs.fess.crawler.container.StandardCrawlerContainer) MimeTypeHelperImpl(org.codelibs.fess.crawler.helper.impl.MimeTypeHelperImpl)

Example 15 with StandardCrawlerContainer

Use of org.codelibs.fess.crawler.container.StandardCrawlerContainer in project fess-crawler by codelibs.

Class CustomUrlFilterImplTest, method setUp.

@Override
protected void setUp() throws Exception {
    super.setUp();
    StandardCrawlerContainer container = new StandardCrawlerContainer()
            .singleton("dataHelper", MemoryDataHelper.class)
            .singleton("urlFilterService", UrlFilterServiceImpl.class)
            .singleton("includeFilter", UrlFilterImpl.class)
            .singleton("excludeFilter", UrlFilterImpl.class)
            .singleton("domainFilter", UrlFilterImpl.class);
    includeFilter = container.getComponent("includeFilter");
    includeFilter.setIncludeFilteringPattern("$1$2$3.*");
    excludeFilter = container.getComponent("excludeFilter");
    excludeFilter.setExcludeFilteringPattern("$1$2$3.*");
    domainFilter = container.getComponent("domainFilter");
    domainFilter.setIncludeFilteringPattern("http://$2/.*");
    domainFilter.setExcludeFilteringPattern("http://$2/.*");
}
Also used : UrlFilterServiceImpl(org.codelibs.fess.crawler.service.impl.UrlFilterServiceImpl) StandardCrawlerContainer(org.codelibs.fess.crawler.container.StandardCrawlerContainer)
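
The setUp methods above share one pattern: register components under string names with a chained singleton(...) call, optionally pass an initializer lambda, then look components up with getComponent(...). The following is a minimal sketch of how a container with that shape could work. MiniContainer, Greeter, and Registry are hypothetical names introduced for illustration, and the eager instantiation shown here is an assumption; this is not the implementation of StandardCrawlerContainer, which may behave differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class MiniContainerSketch {

    // Hypothetical stand-in for a name-keyed singleton container.
    static class MiniContainer {
        private final Map<String, Object> components = new HashMap<>();

        // Registers one instance of the given type under the given name.
        <T> MiniContainer singleton(String name, Class<T> type) {
            return singleton(name, type, instance -> { });
        }

        // Same as above, but runs the initializer on the new instance first.
        // Assumption: the instance is created eagerly, at registration time.
        <T> MiniContainer singleton(String name, Class<T> type, Consumer<T> initializer) {
            try {
                T instance = type.getDeclaredConstructor().newInstance();
                initializer.accept(instance);
                components.put(name, instance);
                return this; // returning this is what allows call chaining
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot create component: " + name, e);
            }
        }

        // Returns the registered instance, or null if the name is unknown.
        @SuppressWarnings("unchecked")
        <T> T getComponent(String name) {
            return (T) components.get(name);
        }
    }

    // Hypothetical components used only to exercise the container.
    static class Greeter {
        String greet() {
            return "hello";
        }
    }

    static class Registry {
        final List<String> names = new ArrayList<>();

        void add(String name) {
            names.add(name);
        }
    }

    public static void main(String[] args) {
        MiniContainer container = new MiniContainer();
        container.singleton("greeter", Greeter.class)
                .<Registry>singleton("registry", Registry.class, reg -> {
                    // The initializer can look up components registered earlier in the chain.
                    Greeter greeter = container.getComponent("greeter");
                    reg.add(greeter.greet());
                });
        Registry registry = container.getComponent("registry");
        System.out.println(registry.names); // prints [hello]
    }
}
```

In this sketch the order of registration matters: the initializer for "registry" can only resolve "greeter" because "greeter" was registered earlier in the chain, which mirrors how the extractorFactory initializers above look up extractors registered before them.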

Aggregations

StandardCrawlerContainer (org.codelibs.fess.crawler.container.StandardCrawlerContainer) 32
MimeTypeHelperImpl (org.codelibs.fess.crawler.helper.impl.MimeTypeHelperImpl) 9
ExtractorFactory (org.codelibs.fess.crawler.extractor.ExtractorFactory) 7
TikaExtractor (org.codelibs.fess.crawler.extractor.impl.TikaExtractor) 3
MemoryDataHelper (org.codelibs.fess.crawler.helper.MemoryDataHelper) 3
SitemapsHelper (org.codelibs.fess.crawler.helper.SitemapsHelper) 3
UrlFilterServiceImpl (org.codelibs.fess.crawler.service.impl.UrlFilterServiceImpl) 3
File (java.io.File) 2
ArchiveStreamFactory (org.apache.commons.compress.archivers.ArchiveStreamFactory) 2
ResourceUtil (org.codelibs.core.io.ResourceUtil) 2
HcHttpClient (org.codelibs.fess.crawler.client.http.HcHttpClient) 2
RobotsTxtHelper (org.codelibs.fess.crawler.helper.RobotsTxtHelper) 2
CrawlerWebServer (org.codelibs.fess.crawler.util.CrawlerWebServer) 2
PlainTestCase (org.dbflute.utflute.core.PlainTestCase) 2
Iterator (java.util.Iterator) 1
Map (java.util.Map) 1
Set (java.util.Set) 1
TimeUnit (java.util.concurrent.TimeUnit) 1
Resource (javax.annotation.Resource) 1
GenericObjectPool (org.apache.commons.pool2.impl.GenericObjectPool) 1