
Example 1 with UrlAction

Use of org.codelibs.fess.crawler.client.http.action.UrlAction in project fess-crawler by codelibs.

From class WebDriverClient, method execute:

@Override
public ResponseData execute(final RequestData request) {
    WebDriver webDriver = null;
    try {
        // Borrow a browser instance from the pool; it is returned in the finally block.
        webDriver = webDriverPool.borrowObject();
        Map<String, String> paramMap = null;
        final String url = request.getUrl();
        final String metaData = request.getMetaData();
        if (StringUtil.isNotBlank(metaData)) {
            paramMap = parseParamMap(metaData);
        }
        // Load the page only if the driver is not already on this URL.
        if (!url.equals(webDriver.getCurrentUrl())) {
            webDriver.get(url);
        }
        if (logger.isDebugEnabled()) {
            logger.debug("Base URL: " + url + "\nContent: " + webDriver.getPageSource());
        }
        // The request metadata may name a UrlAction that drives the page before it is captured.
        if (paramMap != null) {
            final String processorName = paramMap.get(UrlAction.URL_ACTION);
            final UrlAction urlAction = urlActionMap.get(processorName);
            if (urlAction == null) {
                throw new CrawlerSystemException("Unknown processor: " + processorName);
            }
            urlAction.navigate(webDriver, paramMap);
        }
        final String source = webDriver.getPageSource();
        final ResponseData responseData = new ResponseData();
        responseData.setUrl(webDriver.getCurrentUrl());
        responseData.setMethod(request.getMethod().name());
        final String charSet = getCharSet(webDriver);
        // Encode once so the content length is the byte length of the body, not the character count.
        final byte[] body = source.getBytes(charSet);
        responseData.setContentLength(body.length);
        responseData.setCharSet(charSet);
        responseData.setHttpStatusCode(getStatusCode(webDriver));
        responseData.setLastModified(getLastModified(webDriver));
        responseData.setMimeType(getContentType(webDriver));
        responseData.setResponseBody(body);
        // Give every registered action a chance to collect data from the final page.
        for (final UrlAction urlAction : urlActionMap.values()) {
            urlAction.collect(url, webDriver, responseData);
        }
        return responseData;
    } catch (final Exception e) {
        throw new CrawlerSystemException("Failed to access " + request.getUrl(), e);
    } finally {
        if (webDriver != null) {
            try {
                webDriverPool.returnObject(webDriver);
            } catch (final Exception e) {
                logger.warn("Failed to return a WebDriver to the pool.", e);
            }
        }
    }
}
Also used: WebDriver (org.openqa.selenium.WebDriver), UrlAction (org.codelibs.fess.crawler.client.http.action.UrlAction), CrawlerSystemException (org.codelibs.fess.crawler.exception.CrawlerSystemException), ResponseData (org.codelibs.fess.crawler.entity.ResponseData)
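The metadata-driven dispatch in execute() can be modeled without Selenium. The following is a minimal, hypothetical sketch, not the fess-crawler API: PageAction, AppendAction, and UrlActionDispatchSketch are invented names, the page is a StringBuilder instead of a WebDriver, the response is a Map instead of ResponseData, and the metadata is assumed to be a URL-encoded query string. The listing does not show parseParamMap or the value of UrlAction.URL_ACTION, so the "urlAction" key used here is an assumption.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for UrlAction: a StringBuilder plays the page and a Map plays the response.
interface PageAction {
    void navigate(StringBuilder page, Map<String, String> params);

    void collect(String url, StringBuilder page, Map<String, String> response);
}

// Hypothetical action: "navigates" by appending a target, then records where it collected from.
class AppendAction implements PageAction {
    @Override
    public void navigate(StringBuilder page, Map<String, String> params) {
        page.append(" -> ").append(params.getOrDefault("target", ""));
    }

    @Override
    public void collect(String url, StringBuilder page, Map<String, String> response) {
        response.put("collectedFrom", url);
    }
}

class UrlActionDispatchSketch {
    // Hypothetical parameter name standing in for UrlAction.URL_ACTION.
    static final String URL_ACTION = "urlAction";

    // Assumes metadata such as "urlAction=append&target=%23next" (URL-encoded key=value pairs).
    static Map<String, String> parseParamMap(String metaData) {
        Map<String, String> params = new HashMap<>();
        for (String pair : metaData.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int idx = pair.indexOf('=');
            String key = idx < 0 ? pair : pair.substring(0, idx);
            String value = idx < 0 ? "" : pair.substring(idx + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                    URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    // Mirrors execute(): run the named action if metadata is present, then let every action collect.
    static Map<String, String> execute(String url, String metaData, Map<String, PageAction> actions) {
        StringBuilder page = new StringBuilder(url); // simulated page state after loading url
        if (metaData != null && !metaData.isBlank()) {
            Map<String, String> params = parseParamMap(metaData);
            String name = params.get(URL_ACTION);
            PageAction action = actions.get(name);
            if (action == null) {
                throw new IllegalStateException("Unknown processor: " + name);
            }
            action.navigate(page, params);
        }
        Map<String, String> response = new LinkedHashMap<>();
        response.put("body", page.toString());
        for (PageAction a : actions.values()) {
            a.collect(url, page, response);
        }
        return response;
    }
}
```

An unknown or missing action name fails fast, much like the CrawlerSystemException in the listing, while a request without metadata skips navigation but still lets every registered action collect.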

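In execute() the response body is the page source encoded with the detected charset via source.getBytes(charSet), so its size in bytes can differ from the character count of the source string whenever the page contains non-ASCII text. A minimal sketch that measures both (ContentLengthSketch, encodeBody, and contentLength are hypothetical names used only for illustration):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentLengthSketch {
    // Encodes the page source with the given charset, as the response body is built.
    static byte[] encodeBody(String source, Charset charset) {
        return source.getBytes(charset);
    }

    // Content length measured in bytes of the encoded body.
    static long contentLength(String source, Charset charset) {
        return encodeBody(source, charset).length;
    }

    public static void main(String[] args) {
        // "<p>" + three CJK characters (U+65E5 U+672C U+8A9E) + "</p>", written with escapes
        // so the file compiles regardless of the source encoding.
        String source = "<p>\u65e5\u672c\u8a9e</p>";
        System.out.println(source.length());                                  // 10 UTF-16 code units
        System.out.println(contentLength(source, StandardCharsets.UTF_8));    // 16 bytes
        System.out.println(contentLength(source, StandardCharsets.UTF_16LE)); // 20 bytes
    }
}
```

Each CJK character takes three bytes in UTF-8, so the same ten-character string yields a 16-byte body; a length taken from String.length() would understate it.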