Example 1 with FileConfig

Use of org.codelibs.fess.es.config.exentity.FileConfig in project fess by codelibs.

The class ApiAdminFileconfigAction, method put$setting.

// PUT /api/admin/fileconfig/setting
@Execute
public JsonResponse<ApiResult> put$setting(final CreateBody body) {
    validateApi(body, messages -> {
    });
    body.crudMode = CrudMode.CREATE;
    final FileConfig fileConfig = getFileConfig(body).map(entity -> {
        try {
            fileConfigService.store(entity);
        } catch (final Exception e) {
            throwValidationErrorApi(messages -> messages.addErrorsCrudFailedToCreateCrudTable(GLOBAL, buildThrowableMessage(e)));
        }
        return entity;
    }).orElseGet(() -> {
        throwValidationErrorApi(messages -> messages.addErrorsCrudFailedToCreateInstance(GLOBAL));
        return null;
    });
    return asJson(new ApiUpdateResponse().id(fileConfig.getId()).created(true).status(Status.OK).result());
}
Also used : Constants(org.codelibs.fess.Constants) StreamUtil.stream(org.codelibs.core.stream.StreamUtil.stream) FessApiAdminAction(org.codelibs.fess.app.web.api.admin.FessApiAdminAction) AdminFileconfigAction.getFileConfig(org.codelibs.fess.app.web.admin.fileconfig.AdminFileconfigAction.getFileConfig) Resource(javax.annotation.Resource) StringUtil(org.codelibs.core.lang.StringUtil) ApiConfigResponse(org.codelibs.fess.app.web.api.ApiResult.ApiConfigResponse) JsonResponse(org.lastaflute.web.response.JsonResponse) PermissionHelper(org.codelibs.fess.helper.PermissionHelper) Collectors(java.util.stream.Collectors) ApiResult(org.codelibs.fess.app.web.api.ApiResult) ApiUpdateResponse(org.codelibs.fess.app.web.api.ApiResult.ApiUpdateResponse) FileConfigPager(org.codelibs.fess.app.pager.FileConfigPager) Status(org.codelibs.fess.app.web.api.ApiResult.Status) List(java.util.List) CrudMode(org.codelibs.fess.app.web.CrudMode) ComponentUtil(org.codelibs.fess.util.ComponentUtil) FileConfigService(org.codelibs.fess.app.service.FileConfigService) Execute(org.lastaflute.web.Execute) ApiResponse(org.codelibs.fess.app.web.api.ApiResult.ApiResponse) FileConfig(org.codelibs.fess.es.config.exentity.FileConfig)
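
The map/orElseGet chain in put$setting only reaches fileConfig.getId() safely if throwValidationErrorApi always throws; that method is not shown in this excerpt, so treat it as an assumption. Under that assumption the pattern can be sketched with java.util.Optional standing in for OptionalEntity. The names CreateOrFailSketch, ValidationFailure and requireEntity are hypothetical and exist only for illustration.

```java
import java.util.Optional;

final class CreateOrFailSketch {

    /** Hypothetical exception standing in for whatever throwValidationErrorApi raises. */
    static final class ValidationFailure extends RuntimeException {
        ValidationFailure(final String message) {
            super(message);
        }
    }

    /** Returns the entity when present; otherwise fails instead of returning null. */
    static String requireEntity(final Optional<String> candidate) {
        return candidate.orElseThrow(() -> new ValidationFailure("failed to create instance"));
    }

    public static void main(final String[] args) {
        System.out.println(requireEntity(Optional.of("config-1")));
        try {
            requireEntity(Optional.empty());
        } catch (final ValidationFailure e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Using orElseThrow makes the "never returns null" intent explicit, whereas the excerpt relies on the helper throwing before the trailing return null is reached.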

Example 2 with FileConfig

Use of org.codelibs.fess.es.config.exentity.FileConfig in project fess by codelibs.

The class ApiAdminFileconfigAction, method createEditBody.

protected EditBody createEditBody(final FileConfig entity) {
    final EditBody body = new EditBody();
    copyBeanToBean(entity, body, copyOp -> {
        copyOp.excludeNull();
        copyOp.exclude(Constants.PERMISSIONS, Constants.VIRTUAL_HOSTS);
    });
    final PermissionHelper permissionHelper = ComponentUtil.getPermissionHelper();
    body.permissions = stream(entity.getPermissions()).get(stream -> stream.map(s -> permissionHelper.decode(s)).filter(StringUtil::isNotBlank).distinct().collect(Collectors.joining("\n")));
    body.virtualHosts = stream(entity.getVirtualHosts()).get(stream -> stream.filter(StringUtil::isNotBlank).distinct().map(String::trim).collect(Collectors.joining("\n")));
    return body;
}
Also used : Constants(org.codelibs.fess.Constants) StreamUtil.stream(org.codelibs.core.stream.StreamUtil.stream) FessApiAdminAction(org.codelibs.fess.app.web.api.admin.FessApiAdminAction) AdminFileconfigAction.getFileConfig(org.codelibs.fess.app.web.admin.fileconfig.AdminFileconfigAction.getFileConfig) Resource(javax.annotation.Resource) StringUtil(org.codelibs.core.lang.StringUtil) ApiConfigResponse(org.codelibs.fess.app.web.api.ApiResult.ApiConfigResponse) JsonResponse(org.lastaflute.web.response.JsonResponse) PermissionHelper(org.codelibs.fess.helper.PermissionHelper) Collectors(java.util.stream.Collectors) ApiResult(org.codelibs.fess.app.web.api.ApiResult) ApiUpdateResponse(org.codelibs.fess.app.web.api.ApiResult.ApiUpdateResponse) FileConfigPager(org.codelibs.fess.app.pager.FileConfigPager) Status(org.codelibs.fess.app.web.api.ApiResult.Status) List(java.util.List) CrudMode(org.codelibs.fess.app.web.CrudMode) ComponentUtil(org.codelibs.fess.util.ComponentUtil) FileConfigService(org.codelibs.fess.app.service.FileConfigService) Execute(org.lastaflute.web.Execute) ApiResponse(org.codelibs.fess.app.web.api.ApiResult.ApiResponse) FileConfig(org.codelibs.fess.es.config.exentity.FileConfig)

Example 3 with FileConfig

Use of org.codelibs.fess.es.config.exentity.FileConfig in project fess by codelibs.

The class AdminWizardAction, method crawlingConfigInternal.

protected String crawlingConfigInternal(final CrawlingConfigForm form) {
    String configName = form.crawlingConfigName;
    String configPath = form.crawlingConfigPath.trim();
    if (StringUtil.isBlank(configName)) {
        configName = StringUtils.abbreviate(configPath, 30);
    }
    // normalize
    final StringBuilder buf = new StringBuilder(1000);
    for (int i = 0; i < configPath.length(); i++) {
        final char c = configPath.charAt(i);
        if (c == '\\') {
            buf.append('/');
        } else if (c == ' ') {
            buf.append("%20");
        } else if (CharUtil.isUrlChar(c)) {
            buf.append(c);
        } else {
            try {
                buf.append(URLEncoder.encode(String.valueOf(c), Constants.UTF_8));
            } catch (final UnsupportedEncodingException e) {
                // ignored: the character is dropped if the encoding is unsupported
            }
        }
    }
    configPath = convertCrawlingPath(buf.toString());
    final String username = systemHelper.getUsername();
    final long now = systemHelper.getCurrentTimeAsLong();
    try {
        if (isWebCrawlingPath(configPath)) {
            // web
            final WebConfig wConfig = new WebConfig();
            wConfig.setAvailable(Constants.T);
            wConfig.setBoost(1.0f);
            wConfig.setCreatedBy(username);
            wConfig.setCreatedTime(now);
            if (form.depth != null) {
                wConfig.setDepth(form.depth);
            }
            wConfig.setExcludedDocUrls(getDefaultString("default.config.web.excludedDocUrls", StringUtil.EMPTY));
            wConfig.setExcludedUrls(getDefaultString("default.config.web.excludedUrls", StringUtil.EMPTY));
            wConfig.setIncludedDocUrls(getDefaultString("default.config.web.includedDocUrls", StringUtil.EMPTY));
            wConfig.setIncludedUrls(getDefaultString("default.config.web.includedUrls", StringUtil.EMPTY));
            wConfig.setIntervalTime(getDefaultInteger("default.config.web.intervalTime", Constants.DEFAULT_INTERVAL_TIME_FOR_WEB));
            if (form.maxAccessCount != null) {
                wConfig.setMaxAccessCount(form.maxAccessCount);
            }
            wConfig.setName(configName);
            wConfig.setNumOfThread(getDefaultInteger("default.config.web.numOfThread", Constants.DEFAULT_NUM_OF_THREAD_FOR_WEB));
            wConfig.setSortOrder(getDefaultInteger("default.config.web.sortOrder", 1));
            wConfig.setUpdatedBy(username);
            wConfig.setUpdatedTime(now);
            wConfig.setUrls(configPath);
            wConfig.setUserAgent(getDefaultString("default.config.web.userAgent", fessConfig.getUserAgentName()));
            wConfig.setPermissions(ComponentUtil.getFessConfig().getSearchDefaultDisplayEncodedPermissions());
            webConfigService.store(wConfig);
        } else {
            // file
            final FileConfig fConfig = new FileConfig();
            fConfig.setAvailable(Constants.T);
            fConfig.setBoost(1.0f);
            fConfig.setCreatedBy(username);
            fConfig.setCreatedTime(now);
            if (form.depth != null) {
                fConfig.setDepth(form.depth);
            }
            fConfig.setExcludedDocPaths(getDefaultString("default.config.file.excludedDocPaths", StringUtil.EMPTY));
            fConfig.setExcludedPaths(getDefaultString("default.config.file.excludedPaths", StringUtil.EMPTY));
            fConfig.setIncludedDocPaths(getDefaultString("default.config.file.includedDocPaths", StringUtil.EMPTY));
            fConfig.setIncludedPaths(getDefaultString("default.config.file.includedPaths", StringUtil.EMPTY));
            fConfig.setIntervalTime(getDefaultInteger("default.config.file.intervalTime", Constants.DEFAULT_INTERVAL_TIME_FOR_FS));
            if (form.maxAccessCount != null) {
                fConfig.setMaxAccessCount(form.maxAccessCount);
            }
            fConfig.setName(configName);
            fConfig.setNumOfThread(getDefaultInteger("default.config.file.numOfThread", Constants.DEFAULT_NUM_OF_THREAD_FOR_FS));
            fConfig.setSortOrder(getDefaultInteger("default.config.file.sortOrder", 1));
            fConfig.setUpdatedBy(username);
            fConfig.setUpdatedTime(now);
            fConfig.setPaths(configPath);
            fConfig.setPermissions(ComponentUtil.getFessConfig().getSearchDefaultDisplayEncodedPermissions());
            fileConfigService.store(fConfig);
        }
        return configName;
    } catch (final Exception e) {
        logger.error("Failed to create crawling config: {}", form.crawlingConfigPath, e);
        throwValidationError(messages -> messages.addErrorsFailedToCreateCrawlingConfigAtWizard(GLOBAL), () -> asHtml(path_AdminWizard_AdminWizardConfigJsp));
        return null;
    }
}
Also used : Constants(org.codelibs.fess.Constants) WebConfigService(org.codelibs.fess.app.service.WebConfigService) StringUtils(org.apache.commons.lang3.StringUtils) ActionRuntime(org.lastaflute.web.ruts.process.ActionRuntime) FessAdminAction(org.codelibs.fess.app.web.base.FessAdminAction) LaJobUnique(org.lastaflute.job.key.LaJobUnique) CharUtil(org.codelibs.fess.crawler.util.CharUtil) FileConfig(org.codelibs.fess.es.config.exentity.FileConfig) WebConfig(org.codelibs.fess.es.config.exentity.WebConfig) HtmlResponse(org.lastaflute.web.response.HtmlResponse) ScheduledJob(org.codelibs.fess.es.config.exentity.ScheduledJob) Secured(org.codelibs.fess.annotation.Secured) DynamicProperties(org.codelibs.core.misc.DynamicProperties) ProcessHelper(org.codelibs.fess.helper.ProcessHelper) Resource(javax.annotation.Resource) StringUtil(org.codelibs.core.lang.StringUtil) URLEncoder(java.net.URLEncoder) List(java.util.List) Logger(org.apache.logging.log4j.Logger) ComponentUtil(org.codelibs.fess.util.ComponentUtil) ScheduledJobService(org.codelibs.fess.app.service.ScheduledJobService) JobManager(org.lastaflute.job.JobManager) FileConfigService(org.codelibs.fess.app.service.FileConfigService) Execute(org.lastaflute.web.Execute) UnsupportedEncodingException(java.io.UnsupportedEncodingException) LogManager(org.apache.logging.log4j.LogManager)
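
The normalization loop in crawlingConfigInternal can be tried in isolation. What follows is a minimal sketch, not the fess implementation: PathNormalizerSketch and its isUrlChar method are hypothetical stand-ins, because the character set accepted by the real CharUtil.isUrlChar is not shown in this excerpt. Here it is assumed to be the RFC 3986 unreserved and reserved characters plus '%'. Unlike the excerpt, an encoding failure is rethrown rather than swallowed.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class PathNormalizerSketch {

    // Assumption: characters that may appear in a URL unescaped (not the real CharUtil set).
    private static final String URL_PUNCTUATION = "-._~:/?#[]@!$&'()*+,;=%";

    /** Hypothetical stand-in for CharUtil.isUrlChar. */
    static boolean isUrlChar(final char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || URL_PUNCTUATION.indexOf(c) >= 0;
    }

    /** Converts backslashes to slashes, spaces to %20, and percent-encodes other non-URL chars. */
    static String normalize(final String path) {
        final StringBuilder buf = new StringBuilder(path.length() + 16);
        for (int i = 0; i < path.length(); i++) {
            final char c = path.charAt(i);
            if (c == '\\') {
                buf.append('/');
            } else if (c == ' ') {
                buf.append("%20");
            } else if (isUrlChar(c)) {
                buf.append(c);
            } else {
                try {
                    buf.append(URLEncoder.encode(String.valueOf(c), "UTF-8"));
                } catch (final UnsupportedEncodingException e) {
                    // UTF-8 is a required charset, so this is not expected to happen.
                    throw new IllegalStateException(e);
                }
            }
        }
        return buf.toString();
    }

    public static void main(final String[] args) {
        System.out.println(normalize("C:\\My Docs\\report.txt"));
    }
}
```

The space is handled before the general case because URLEncoder.encode would turn it into '+', which is form encoding rather than path encoding.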

Example 4 with FileConfig

Use of org.codelibs.fess.es.config.exentity.FileConfig in project fess by codelibs.

The class AdminFileconfigAction, method getFileConfig.

public static OptionalEntity<FileConfig> getFileConfig(final CreateForm form) {
    final SystemHelper systemHelper = ComponentUtil.getSystemHelper();
    final String username = systemHelper.getUsername();
    final long currentTime = systemHelper.getCurrentTimeAsLong();
    return getEntity(form, username, currentTime).map(entity -> {
        entity.setUpdatedBy(username);
        entity.setUpdatedTime(currentTime);
        copyBeanToBean(form, entity, op -> op.exclude(Stream.concat(Stream.of(Constants.COMMON_CONVERSION_RULE), Stream.of(Constants.PERMISSIONS, Constants.VIRTUAL_HOSTS)).toArray(n -> new String[n])));
        final PermissionHelper permissionHelper = ComponentUtil.getPermissionHelper();
        entity.setPermissions(split(form.permissions, "\n").get(stream -> stream.map(s -> permissionHelper.encode(s)).filter(StringUtil::isNotBlank).distinct().toArray(n -> new String[n])));
        entity.setVirtualHosts(split(form.virtualHosts, "\n").get(stream -> stream.filter(StringUtil::isNotBlank).distinct().map(String::trim).toArray(n -> new String[n])));
        return entity;
    });
}
Also used : Constants(org.codelibs.fess.Constants) OptionalThing(org.dbflute.optional.OptionalThing) PermissionHelper(org.codelibs.fess.helper.PermissionHelper) ActionRuntime(org.lastaflute.web.ruts.process.ActionRuntime) RenderDataUtil(org.codelibs.fess.util.RenderDataUtil) StreamUtil.split(org.codelibs.core.stream.StreamUtil.split) CrudMode(org.codelibs.fess.app.web.CrudMode) FessAdminAction(org.codelibs.fess.app.web.base.FessAdminAction) RenderData(org.lastaflute.web.response.render.RenderData) RoleTypeService(org.codelibs.fess.app.service.RoleTypeService) FileConfig(org.codelibs.fess.es.config.exentity.FileConfig) HtmlResponse(org.lastaflute.web.response.HtmlResponse) ConfigType(org.codelibs.fess.es.config.exentity.CrawlingConfig.ConfigType) Secured(org.codelibs.fess.annotation.Secured) StreamUtil.stream(org.codelibs.core.stream.StreamUtil.stream) OptionalEntity(org.dbflute.optional.OptionalEntity) Resource(javax.annotation.Resource) StringUtil(org.codelibs.core.lang.StringUtil) Collectors(java.util.stream.Collectors) FileConfigPager(org.codelibs.fess.app.pager.FileConfigPager) LabelTypeService(org.codelibs.fess.app.service.LabelTypeService) Stream(java.util.stream.Stream) ComponentUtil(org.codelibs.fess.util.ComponentUtil) SystemHelper(org.codelibs.fess.helper.SystemHelper) FileConfigService(org.codelibs.fess.app.service.FileConfigService) Execute(org.lastaflute.web.Execute)
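
getFileConfig turns a newline-separated form field into an array of distinct, non-blank values. A minimal sketch of that step with plain java.util.stream is given here; LineListSketch and toDistinctLines are hypothetical names, and the sketch deliberately trims before deduplicating. In the excerpt the virtual hosts are deduplicated before String::trim is applied, so two entries that differ only in surrounding whitespace would both survive there.

```java
import java.util.Arrays;

final class LineListSketch {

    /** Splits on newlines, trims each entry, drops empty ones and removes duplicates. */
    static String[] toDistinctLines(final String text) {
        if (text == null) {
            return new String[0];
        }
        return Arrays.stream(text.split("\n"))
                .map(String::trim)           // trim first so " a" and "a" compare equal
                .filter(s -> !s.isEmpty())   // skip blank lines
                .distinct()                  // keep first occurrence, preserve order
                .toArray(String[]::new);
    }

    public static void main(final String[] args) {
        System.out.println(Arrays.toString(toDistinctLines("host1\n host1\n\nhost2")));
    }
}
```

Whether the ordering of trim and distinct matters in practice depends on how the form input is produced; the sketch only shows the alternative.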

Example 5 with FileConfig

Use of org.codelibs.fess.es.config.exentity.FileConfig in project fess by codelibs.

The class WebFsIndexHelper, method doCrawl.

protected void doCrawl(final String sessionId, final List<WebConfig> webConfigList, final List<FileConfig> fileConfigList) {
    final int multiprocessCrawlingCount = ComponentUtil.getFessConfig().getCrawlingThreadCount();
    final SystemHelper systemHelper = ComponentUtil.getSystemHelper();
    final FessConfig fessConfig = ComponentUtil.getFessConfig();
    final long startTime = System.currentTimeMillis();
    final List<String> sessionIdList = new ArrayList<>();
    crawlerList.clear();
    final List<String> crawlerStatusList = new ArrayList<>();
    // Web
    for (final WebConfig webConfig : webConfigList) {
        final String sid = ComponentUtil.getCrawlingConfigHelper().store(sessionId, webConfig);
        // create crawler
        final Crawler crawler = ComponentUtil.getComponent(Crawler.class);
        crawler.setSessionId(sid);
        sessionIdList.add(sid);
        final String urlsStr = webConfig.getUrls();
        if (StringUtil.isBlank(urlsStr)) {
            logger.warn("No target urls. Skipped");
            break;
        }
        // interval time
        final int intervalTime = webConfig.getIntervalTime() != null ? webConfig.getIntervalTime() : Constants.DEFAULT_INTERVAL_TIME_FOR_WEB;
        ((FessIntervalController) crawler.getIntervalController()).setDelayMillisForWaitingNewUrl(intervalTime);
        final String includedUrlsStr = webConfig.getIncludedUrls() != null ? webConfig.getIncludedUrls() : StringUtil.EMPTY;
        final String excludedUrlsStr = webConfig.getExcludedUrls() != null ? webConfig.getExcludedUrls() : StringUtil.EMPTY;
        // num of threads
        final CrawlerContext crawlerContext = crawler.getCrawlerContext();
        final int numOfThread = webConfig.getNumOfThread() != null ? webConfig.getNumOfThread() : Constants.DEFAULT_NUM_OF_THREAD_FOR_WEB;
        crawlerContext.setNumOfThread(numOfThread);
        // depth
        final int depth = webConfig.getDepth() != null ? webConfig.getDepth() : -1;
        crawlerContext.setMaxDepth(depth);
        // max count
        final long maxCount = webConfig.getMaxAccessCount() != null ? webConfig.getMaxAccessCount() : maxAccessCount;
        crawlerContext.setMaxAccessCount(maxCount);
        webConfig.initializeClientFactory(() -> crawler.getClientFactory());
        final Map<String, String> configParamMap = webConfig.getConfigParameterMap(ConfigName.CONFIG);
        if (Constants.TRUE.equalsIgnoreCase(configParamMap.get(Config.CLEANUP_ALL))) {
            deleteCrawlData(sid);
        } else if (Constants.TRUE.equalsIgnoreCase(configParamMap.get(Config.CLEANUP_URL_FILTERS))) {
            final EsUrlFilterService urlFilterService = ComponentUtil.getComponent(EsUrlFilterService.class);
            try {
                urlFilterService.delete(sid);
            } catch (final Exception e) {
                logger.warn("Failed to delete url filters for {}", sid);
            }
        }
        final DuplicateHostHelper duplicateHostHelper = ComponentUtil.getDuplicateHostHelper();
        // set urls
        split(urlsStr, "[\r\n]").of(stream -> stream.filter(StringUtil::isNotBlank).map(String::trim).distinct().forEach(urlValue -> {
            if (!urlValue.startsWith("#") && fessConfig.isValidCrawlerWebProtocol(urlValue)) {
                final String u = duplicateHostHelper.convert(urlValue);
                crawler.addUrl(u);
                if (logger.isInfoEnabled()) {
                    logger.info("Target URL: {}", u);
                }
            }
        }));
        // set included urls
        split(includedUrlsStr, "[\r\n]").of(stream -> stream.filter(StringUtil::isNotBlank).map(String::trim).forEach(urlValue -> {
            if (!urlValue.startsWith("#")) {
                crawler.addIncludeFilter(urlValue);
                if (logger.isInfoEnabled()) {
                    logger.info("Included URL: {}", urlValue);
                }
            }
        }));
        // set excluded urls
        split(excludedUrlsStr, "[\r\n]").of(stream -> stream.filter(StringUtil::isNotBlank).map(String::trim).forEach(urlValue -> {
            if (!urlValue.startsWith("#")) {
                crawler.addExcludeFilter(urlValue);
                if (logger.isInfoEnabled()) {
                    logger.info("Excluded URL: {}", urlValue);
                }
            }
        }));
        // failure url
        final List<String> excludedUrlList = ComponentUtil.getCrawlingConfigHelper().getExcludedUrlList(webConfig.getConfigId());
        if (excludedUrlList != null) {
            excludedUrlList.stream().filter(StringUtil::isNotBlank).map(String::trim).distinct().forEach(u -> {
                final String urlValue = Pattern.quote(u);
                crawler.addExcludeFilter(urlValue);
                if (logger.isInfoEnabled()) {
                    logger.info("Excluded URL from failures: {}", urlValue);
                }
            });
        }
        if (logger.isDebugEnabled()) {
            logger.debug("Crawling {}", urlsStr);
        }
        crawler.setBackground(true);
        crawler.setThreadPriority(crawlerPriority);
        crawlerList.add(crawler);
        crawlerStatusList.add(Constants.READY);
    }
    // File
    for (final FileConfig fileConfig : fileConfigList) {
        final String sid = ComponentUtil.getCrawlingConfigHelper().store(sessionId, fileConfig);
        // create crawler
        final Crawler crawler = ComponentUtil.getComponent(Crawler.class);
        crawler.setSessionId(sid);
        sessionIdList.add(sid);
        final String pathsStr = fileConfig.getPaths();
        if (StringUtil.isBlank(pathsStr)) {
            logger.warn("No target uris. Skipped");
            break;
        }
        final int intervalTime = fileConfig.getIntervalTime() != null ? fileConfig.getIntervalTime() : Constants.DEFAULT_INTERVAL_TIME_FOR_FS;
        ((FessIntervalController) crawler.getIntervalController()).setDelayMillisForWaitingNewUrl(intervalTime);
        final String includedPathsStr = fileConfig.getIncludedPaths() != null ? fileConfig.getIncludedPaths() : StringUtil.EMPTY;
        final String excludedPathsStr = fileConfig.getExcludedPaths() != null ? fileConfig.getExcludedPaths() : StringUtil.EMPTY;
        // num of threads
        final CrawlerContext crawlerContext = crawler.getCrawlerContext();
        final int numOfThread = fileConfig.getNumOfThread() != null ? fileConfig.getNumOfThread() : Constants.DEFAULT_NUM_OF_THREAD_FOR_FS;
        crawlerContext.setNumOfThread(numOfThread);
        // depth
        final int depth = fileConfig.getDepth() != null ? fileConfig.getDepth() : -1;
        crawlerContext.setMaxDepth(depth);
        // max count
        final long maxCount = fileConfig.getMaxAccessCount() != null ? fileConfig.getMaxAccessCount() : maxAccessCount;
        crawlerContext.setMaxAccessCount(maxCount);
        fileConfig.initializeClientFactory(() -> crawler.getClientFactory());
        final Map<String, String> configParamMap = fileConfig.getConfigParameterMap(ConfigName.CONFIG);
        if (Constants.TRUE.equalsIgnoreCase(configParamMap.get(Config.CLEANUP_ALL))) {
            deleteCrawlData(sid);
        } else if (Constants.TRUE.equalsIgnoreCase(configParamMap.get(Config.CLEANUP_URL_FILTERS))) {
            final EsUrlFilterService urlFilterService = ComponentUtil.getComponent(EsUrlFilterService.class);
            try {
                urlFilterService.delete(sid);
            } catch (final Exception e) {
                logger.warn("Failed to delete url filters for {}", sid);
            }
        }
        // set paths
        split(pathsStr, "[\r\n]").of(stream -> stream.filter(StringUtil::isNotBlank).map(String::trim).distinct().forEach(urlValue -> {
            if (!urlValue.startsWith("#")) {
                final String u;
                if (!fessConfig.isValidCrawlerFileProtocol(urlValue)) {
                    if (urlValue.startsWith("/")) {
                        u = "file:" + urlValue;
                    } else {
                        u = "file:/" + urlValue;
                    }
                } else {
                    u = urlValue;
                }
                crawler.addUrl(u);
                if (logger.isInfoEnabled()) {
                    logger.info("Target Path: {}", u);
                }
            }
        }));
        // set included paths
        final AtomicBoolean urlEncodeDisabled = new AtomicBoolean(false);
        split(includedPathsStr, "[\r\n]").of(stream -> stream.filter(StringUtil::isNotBlank).map(String::trim).forEach(line -> {
            if (!line.startsWith("#")) {
                final String urlValue;
                if (urlEncodeDisabled.get()) {
                    urlValue = line;
                    urlEncodeDisabled.set(false);
                } else {
                    urlValue = systemHelper.encodeUrlFilter(line);
                }
                crawler.addIncludeFilter(urlValue);
                if (logger.isInfoEnabled()) {
                    logger.info("Included Path: {}", urlValue);
                }
            } else if (line.startsWith("#DISABLE_URL_ENCODE")) {
                urlEncodeDisabled.set(true);
            }
        }));
        // set excluded paths
        urlEncodeDisabled.set(false);
        split(excludedPathsStr, "[\r\n]").of(stream -> stream.filter(StringUtil::isNotBlank).map(String::trim).forEach(line -> {
            if (!line.startsWith("#")) {
                final String urlValue;
                if (urlEncodeDisabled.get()) {
                    urlValue = line;
                    urlEncodeDisabled.set(false);
                } else {
                    urlValue = systemHelper.encodeUrlFilter(line);
                }
                crawler.addExcludeFilter(urlValue);
                if (logger.isInfoEnabled()) {
                    logger.info("Excluded Path: {}", urlValue);
                }
            } else if (line.startsWith("#DISABLE_URL_ENCODE")) {
                urlEncodeDisabled.set(true);
            }
        }));
        // failure url
        final List<String> excludedUrlList = ComponentUtil.getCrawlingConfigHelper().getExcludedUrlList(fileConfig.getConfigId());
        if (excludedUrlList != null) {
            excludedUrlList.stream().filter(StringUtil::isNotBlank).map(String::trim).distinct().forEach(u -> {
                final String urlValue = Pattern.quote(u);
                crawler.addExcludeFilter(urlValue);
                if (logger.isInfoEnabled()) {
                    logger.info("Excluded Path from failures: {}", urlValue);
                }
            });
        }
        if (logger.isDebugEnabled()) {
            logger.debug("Crawling {}", pathsStr);
        }
        crawler.setBackground(true);
        crawler.setThreadPriority(crawlerPriority);
        crawlerList.add(crawler);
        crawlerStatusList.add(Constants.READY);
    }
    // run index update
    final IndexUpdater indexUpdater = ComponentUtil.getIndexUpdater();
    indexUpdater.setName("IndexUpdater");
    indexUpdater.setPriority(indexUpdaterPriority);
    indexUpdater.setSessionIdList(sessionIdList);
    indexUpdater.setDaemon(true);
    indexUpdater.setCrawlerList(crawlerList);
    getAvailableBoostDocumentRuleList().forEach(rule -> {
        indexUpdater.addDocBoostMatcher(new org.codelibs.fess.indexer.DocBoostMatcher(rule));
    });
    indexUpdater.start();
    int startedCrawlerNum = 0;
    int activeCrawlerNum = 0;
    while (startedCrawlerNum < crawlerList.size()) {
        // Force to stop crawl
        if (systemHelper.isForceStop()) {
            for (final Crawler crawler : crawlerList) {
                crawler.stop();
            }
            break;
        }
        if (activeCrawlerNum < multiprocessCrawlingCount) {
            // start crawling
            crawlerList.get(startedCrawlerNum).execute();
            crawlerStatusList.set(startedCrawlerNum, Constants.RUNNING);
            startedCrawlerNum++;
            activeCrawlerNum++;
            ThreadUtil.sleep(crawlingExecutionInterval);
            continue;
        }
        // check status
        for (int i = 0; i < startedCrawlerNum; i++) {
            if (crawlerList.get(i).getCrawlerContext().getStatus() == CrawlerStatus.DONE && Constants.RUNNING.equals(crawlerStatusList.get(i))) {
                crawlerList.get(i).awaitTermination();
                crawlerStatusList.set(i, Constants.DONE);
                final String sid = crawlerList.get(i).getCrawlerContext().getSessionId();
                indexUpdater.addFinishedSessionId(sid);
                activeCrawlerNum--;
            }
        }
        ThreadUtil.sleep(crawlingExecutionInterval);
    }
    boolean finishedAll = false;
    while (!finishedAll) {
        finishedAll = true;
        for (int i = 0; i < crawlerList.size(); i++) {
            crawlerList.get(i).awaitTermination(crawlingExecutionInterval);
            if (crawlerList.get(i).getCrawlerContext().getStatus() == CrawlerStatus.DONE && !Constants.DONE.equals(crawlerStatusList.get(i))) {
                crawlerStatusList.set(i, Constants.DONE);
                final String sid = crawlerList.get(i).getCrawlerContext().getSessionId();
                indexUpdater.addFinishedSessionId(sid);
            }
            if (!Constants.DONE.equals(crawlerStatusList.get(i))) {
                finishedAll = false;
            }
        }
    }
    crawlerList.clear();
    crawlerStatusList.clear();
    // put crawling info
    final CrawlingInfoHelper crawlingInfoHelper = ComponentUtil.getCrawlingInfoHelper();
    final long execTime = System.currentTimeMillis() - startTime;
    crawlingInfoHelper.putToInfoMap(Constants.WEB_FS_CRAWLING_EXEC_TIME, Long.toString(execTime));
    if (logger.isInfoEnabled()) {
        logger.info("[EXEC TIME] crawling time: {}ms", execTime);
    }
    indexUpdater.setFinishCrawling(true);
    try {
        indexUpdater.join();
    } catch (final InterruptedException e) {
        logger.warn("Interrupted index update.", e);
    }
    crawlingInfoHelper.putToInfoMap(Constants.WEB_FS_INDEX_EXEC_TIME, Long.toString(indexUpdater.getExecuteTime()));
    crawlingInfoHelper.putToInfoMap(Constants.WEB_FS_INDEX_SIZE, Long.toString(indexUpdater.getDocumentSize()));
    if (systemHelper.isForceStop()) {
        return;
    }
    for (final String sid : sessionIdList) {
        // remove config
        ComponentUtil.getCrawlingConfigHelper().remove(sid);
        deleteCrawlData(sid);
    }
}
Also used : ThreadUtil(org.codelibs.core.lang.ThreadUtil) Constants(org.codelibs.fess.Constants) CrawlerContext(org.codelibs.fess.crawler.CrawlerContext) BoostDocumentRuleBhv(org.codelibs.fess.es.config.exbhv.BoostDocumentRuleBhv) EsUrlQueueService(org.codelibs.fess.crawler.service.impl.EsUrlQueueService) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) EsDataService(org.codelibs.fess.crawler.service.impl.EsDataService) IndexUpdater(org.codelibs.fess.indexer.IndexUpdater) ConfigName(org.codelibs.fess.es.config.exentity.CrawlingConfig.ConfigName) ArrayList(java.util.ArrayList) CrawlerStatus(org.codelibs.fess.crawler.CrawlerStatus) FessConfig(org.codelibs.fess.mylasta.direction.FessConfig) StreamUtil.split(org.codelibs.core.stream.StreamUtil.split) Map(java.util.Map) Config(org.codelibs.fess.es.config.exentity.CrawlingConfig.Param.Config) FileConfig(org.codelibs.fess.es.config.exentity.FileConfig) WebConfig(org.codelibs.fess.es.config.exentity.WebConfig) Crawler(org.codelibs.fess.crawler.Crawler) EsUrlFilterService(org.codelibs.fess.crawler.service.impl.EsUrlFilterService) StringUtil(org.codelibs.core.lang.StringUtil) BoostDocumentRule(org.codelibs.fess.es.config.exentity.BoostDocumentRule) List(java.util.List) Logger(org.apache.logging.log4j.Logger) ComponentUtil(org.codelibs.fess.util.ComponentUtil) Pattern(java.util.regex.Pattern) Collections(java.util.Collections) LogManager(org.apache.logging.log4j.LogManager) FessIntervalController(org.codelibs.fess.crawler.interval.FessIntervalController)
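
In doCrawl, the included and excluded path lists treat lines starting with "#" as comments, except that a "#DISABLE_URL_ENCODE" line switches off URL encoding for the next filter line only. That small state machine can be sketched on its own as below. FilterLineSketch and toFilters are hypothetical names, and the encoder argument is a placeholder for systemHelper.encodeUrlFilter, whose behavior is not shown in this excerpt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class FilterLineSketch {

    /** Builds the filter list from multi-line text, honoring #DISABLE_URL_ENCODE. */
    static List<String> toFilters(final String text, final UnaryOperator<String> encoder) {
        final List<String> filters = new ArrayList<>();
        boolean encodeDisabled = false;
        for (final String raw : text.split("[\r\n]")) {
            final String line = raw.trim();
            if (line.isEmpty()) {
                continue; // blank lines are skipped
            }
            if (!line.startsWith("#")) {
                // a filter line: use it verbatim if the directive preceded it, else encode it
                filters.add(encodeDisabled ? line : encoder.apply(line));
                encodeDisabled = false; // the directive applies to one filter line only
            } else if (line.startsWith("#DISABLE_URL_ENCODE")) {
                encodeDisabled = true;
            }
            // any other "#" line is a comment and leaves the flag untouched
        }
        return filters;
    }

    public static void main(final String[] args) {
        final String text = "/docs/my files/.*\n#DISABLE_URL_ENCODE\n/raw/my files/.*";
        // placeholder encoder: only replaces spaces
        System.out.println(toFilters(text, s -> s.replace(" ", "%20")));
    }
}
```

A local boolean is enough here because the loop is a plain for statement; the excerpt uses an AtomicBoolean because the flag is mutated from inside a lambda, where captured locals must be effectively final.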

Aggregations

FileConfig (org.codelibs.fess.es.config.exentity.FileConfig) 9
StringUtil (org.codelibs.core.lang.StringUtil) 7
Constants (org.codelibs.fess.Constants) 7
List (java.util.List) 6
ComponentUtil (org.codelibs.fess.util.ComponentUtil) 6
Execute (org.lastaflute.web.Execute) 6
Collectors (java.util.stream.Collectors) 5
Resource (javax.annotation.Resource) 5
FileConfigPager (org.codelibs.fess.app.pager.FileConfigPager) 5
FileConfigService (org.codelibs.fess.app.service.FileConfigService) 5
StreamUtil.stream (org.codelibs.core.stream.StreamUtil.stream) 4
CrudMode (org.codelibs.fess.app.web.CrudMode) 4
AdminFileconfigAction.getFileConfig (org.codelibs.fess.app.web.admin.fileconfig.AdminFileconfigAction.getFileConfig) 4
ApiResult (org.codelibs.fess.app.web.api.ApiResult) 4
ArrayList (java.util.ArrayList) 3
Map (java.util.Map) 3
LogManager (org.apache.logging.log4j.LogManager) 3
Logger (org.apache.logging.log4j.Logger) 3
StreamUtil.split (org.codelibs.core.stream.StreamUtil.split) 3
ApiConfigResponse (org.codelibs.fess.app.web.api.ApiResult.ApiConfigResponse) 3