Example 31 with CrawlerSystemException

Use of org.codelibs.fess.crawler.exception.CrawlerSystemException in project fess-crawler by codelibs: class HtmlTransformer, method storeData.

protected void storeData(final ResponseData responseData, final ResultData resultData) {
    try (final InputStream is = responseData.getResponseBody()) {
        final byte[] data = InputStreamUtil.getBytes(is);
        resultData.setData(data);
        resultData.setEncoding(responseData.getCharSet());
    } catch (final CrawlerSystemException e) {
        throw e;
    } catch (final Exception e) {
        throw new CrawlerSystemException("Could not store data.", e);
    }
}
Also used : BufferedInputStream(java.io.BufferedInputStream) InputStream(java.io.InputStream) CrawlerSystemException(org.codelibs.fess.crawler.exception.CrawlerSystemException) CrawlingAccessException(org.codelibs.fess.crawler.exception.CrawlingAccessException) TransformerException(javax.xml.transform.TransformerException) MalformedURLException(java.net.MalformedURLException) IOException(java.io.IOException) UnsupportedEncodingException(java.io.UnsupportedEncodingException)
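
The wrap-and-rethrow shape of storeData can be tried outside the crawler. The following is a minimal sketch, not fess-crawler code: SketchSystemException and StoreDataSketch.readAll are hypothetical names, and SketchSystemException stands in for CrawlerSystemException on the assumption that it is unchecked (the methods on this page throw it without a throws clause). It needs Java 9 or later for InputStream.readAllBytes.

```java
import java.io.InputStream;

// Hypothetical stand-in for CrawlerSystemException (assumed to be unchecked).
class SketchSystemException extends RuntimeException {
    SketchSystemException(final String message, final Throwable cause) {
        super(message, cause);
    }
}

final class StoreDataSketch {
    // Reads the whole stream and closes it. Any failure surfaces as
    // SketchSystemException, with the original exception kept as the cause.
    static byte[] readAll(final InputStream in) {
        try (InputStream is = in) {
            return is.readAllBytes();
        } catch (final SketchSystemException e) {
            // Already the domain exception: rethrow as-is instead of wrapping twice.
            throw e;
        } catch (final Exception e) {
            throw new SketchSystemException("Could not store data.", e);
        }
    }
}
```

Catching the domain exception first keeps its message and cause intact; without that clause the generic catch would nest it inside a second exception of the same type.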

Example 32 with CrawlerSystemException

Use of org.codelibs.fess.crawler.exception.CrawlerSystemException in project fess-crawler by codelibs: class XmlTransformer, method getData.

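/**
 * Returns the stored data. When no data class is configured, this is the XML
 * content as a String; otherwise it is a Map or an instance of the data class.
 *
 * @return the XML content as a String, a Map, or a bean of the data class
 */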
@Override
public Object getData(final AccessResultData<?> accessResultData) {
    if (dataClass == null) {
        // check transformer name
        if (!getName().equals(accessResultData.getTransformerName())) {
            throw new CrawlerSystemException("Transformer is invalid. Use " + accessResultData.getTransformerName() + ". This transformer is " + getName() + ".");
        }
        final byte[] data = accessResultData.getData();
        if (data == null) {
            return null;
        }
        final String encoding = accessResultData.getEncoding();
        try {
            return new String(data, encoding == null ? Constants.UTF_8 : encoding);
        } catch (final UnsupportedEncodingException e) {
            if (logger.isInfoEnabled()) {
                logger.info("Invalid charsetName: " + encoding + ". Changed to " + Constants.UTF_8, e);
            }
            return new String(data, Constants.UTF_8_CHARSET);
        }
    }
    final Map<String, Object> dataMap = XmlUtil.getDataMap(accessResultData);
    if (Map.class.equals(dataClass)) {
        return dataMap;
    }
    try {
        final Object obj = dataClass.newInstance();
        BeanUtil.copyMapToBean(dataMap, obj);
        return obj;
    } catch (final Exception e) {
        throw new CrawlerSystemException("Could not create/copy a data map to " + dataClass, e);
    }
}
Also used : CrawlerSystemException(org.codelibs.fess.crawler.exception.CrawlerSystemException) UnsupportedEncodingException(java.io.UnsupportedEncodingException) CrawlingAccessException(org.codelibs.fess.crawler.exception.CrawlingAccessException) TransformerException(javax.xml.transform.TransformerException)
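
The encoding fallback in getData (decode with the recorded encoding, fall back to UTF-8 when the name is missing or unusable) can be sketched on its own. This is an illustration under assumptions, not the project's implementation: DecodeSketch and decode are hypothetical names, and the sketch swaps in Charset.forName where the original uses the String(byte[], String) constructor, so the failure arrives as an IllegalArgumentException subclass rather than UnsupportedEncodingException.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeSketch {
    // Decodes data with the named encoding, falling back to UTF-8 when the
    // name is null, malformed, or not supported by this JVM.
    static String decode(final byte[] data, final String encoding) {
        if (data == null) {
            return null;
        }
        Charset charset = StandardCharsets.UTF_8;
        if (encoding != null) {
            try {
                charset = Charset.forName(encoding);
            } catch (final IllegalArgumentException e) {
                // IllegalCharsetNameException and UnsupportedCharsetException
                // both extend IllegalArgumentException; keep the UTF-8 default.
            }
        }
        return new String(data, charset);
    }
}
```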

Example 33 with CrawlerSystemException

Use of org.codelibs.fess.crawler.exception.CrawlerSystemException in project fess-crawler by codelibs: class HostIntervalController, method delayBeforeProcessing.

/*
 * (non-Javadoc)
 *
 * @see org.codelibs.fess.crawler.interval.impl.AbstractIntervalController#delayBeforeProcessing()
 */
@Override
protected void delayBeforeProcessing() {
    final UrlQueue<?> urlQueue = CrawlingParameterUtil.getUrlQueue();
    if (urlQueue == null) {
        return;
    }
    final String url = urlQueue.getUrl();
    if (StringUtil.isBlank(url) || url.startsWith("file:")) {
        // not target
        return;
    }
    try {
        final URL u = new URL(url);
        final String host = u.getHost();
        if (host == null) {
            return;
        }
        final AtomicLong lastTime = lastTimes.putIfAbsent(host, new AtomicLong(SystemUtil.currentTimeMillis()));
        if (lastTime == null) {
            return;
        }
        synchronized (lastTime) {
            while (true) {
                final long currentTime = SystemUtil.currentTimeMillis();
                final long delayTime = lastTime.get() + delayMillisBeforeProcessing - currentTime;
                if (delayTime <= 0) {
                    lastTime.set(currentTime);
                    break;
                }
                lastTime.wait(delayTime);
            }
        }
    } catch (final Exception e) {
        throw new CrawlerSystemException(e);
    }
}
Also used : AtomicLong(java.util.concurrent.atomic.AtomicLong) CrawlerSystemException(org.codelibs.fess.crawler.exception.CrawlerSystemException) URL(java.net.URL)
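
The per-host bookkeeping in delayBeforeProcessing (one timestamp per host, created with putIfAbsent, then compared against a fixed delay) is easier to follow without the blocking wait. The sketch below is a simplified model under assumptions, not the project's code: HostDelaySketch and remainingDelay are hypothetical names, the clock is passed in as a parameter, and the method returns the remaining delay instead of waiting on the monitor as the original does.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

final class HostDelaySketch {
    private final ConcurrentMap<String, AtomicLong> lastTimes = new ConcurrentHashMap<>();
    private final long delayMillis;

    HostDelaySketch(final long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Returns how many milliseconds a caller would still have to wait before
    // accessing host at time nowMillis. A result of 0 means "go now" and
    // records nowMillis as the host's latest access time.
    long remainingDelay(final String host, final long nowMillis) {
        final AtomicLong lastTime = lastTimes.putIfAbsent(host, new AtomicLong(nowMillis));
        if (lastTime == null) {
            return 0L; // first access to this host: nothing to wait for
        }
        synchronized (lastTime) {
            final long remaining = lastTime.get() + delayMillis - nowMillis;
            if (remaining <= 0) {
                lastTime.set(nowMillis);
                return 0L;
            }
            return remaining;
        }
    }
}
```

With a delay of 1000 ms, a first call for a host at time 0 returns 0, a second call for the same host at time 400 returns 600, and a call for a different host at time 400 returns 0 because each host has its own timestamp.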

Example 34 with CrawlerSystemException

Use of org.codelibs.fess.crawler.exception.CrawlerSystemException in project fess-crawler by codelibs: class BinaryTransformerTest, method test_getData_wrongName.

public void test_getData_wrongName() throws Exception {
    final AccessResultDataImpl accessResultData = new AccessResultDataImpl();
    accessResultData.setTransformerName("transformer");
    accessResultData.setData("xyz".getBytes());
    try {
        binaryTransformer.getData(accessResultData);
        fail();
    } catch (final CrawlerSystemException e) {
        // expected: the transformer name "transformer" does not match
    }
}
Also used : CrawlerSystemException(org.codelibs.fess.crawler.exception.CrawlerSystemException) AccessResultDataImpl(org.codelibs.fess.crawler.entity.AccessResultDataImpl)

Example 35 with CrawlerSystemException

Use of org.codelibs.fess.crawler.exception.CrawlerSystemException in project fess-crawler by codelibs: class FileTransformerTest, method test_getData_wrongName.

public void test_getData_wrongName() throws Exception {
    final AccessResultDataImpl accessResultDataImpl = new AccessResultDataImpl();
    accessResultDataImpl.setData("hoge.txt".getBytes());
    accessResultDataImpl.setEncoding(Constants.UTF_8);
    accessResultDataImpl.setTransformerName("transformer");
    setBaseDir();
    try {
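        // the return value is irrelevant: the call itself should throw
        fileTransformer.getData(accessResultDataImpl);
        fail();
    } catch (final CrawlerSystemException e) {
        // expected: the transformer name "transformer" does not match
    }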
}
Also used : CrawlerSystemException(org.codelibs.fess.crawler.exception.CrawlerSystemException) AccessResultDataImpl(org.codelibs.fess.crawler.entity.AccessResultDataImpl)
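
Both test_getData_wrongName methods use the try / fail() / catch idiom to assert that an exception is thrown. The same check can be factored into a small helper; the sketch below is a plain-Java illustration with hypothetical names (ExpectSketch, expectThrows) and is not part of fess-crawler or of any test framework.

```java
final class ExpectSketch {
    // Runs the action and returns the thrown exception if it has the expected
    // type. Fails with AssertionError if nothing is thrown or if the thrown
    // exception has a different type.
    static <T extends Throwable> T expectThrows(final Class<T> type, final Runnable action) {
        try {
            action.run();
        } catch (final Throwable t) {
            if (type.isInstance(t)) {
                return type.cast(t);
            }
            throw new AssertionError("Unexpected exception type: " + t.getClass().getName(), t);
        }
        throw new AssertionError("Expected " + type.getName() + " but nothing was thrown");
    }
}
```

Returning the caught exception lets a test go on to inspect its message or cause, which the empty catch blocks above cannot do.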

Aggregations

CrawlerSystemException (org.codelibs.fess.crawler.exception.CrawlerSystemException) 41
IOException (java.io.IOException) 16
CrawlingAccessException (org.codelibs.fess.crawler.exception.CrawlingAccessException) 13
File (java.io.File) 11
InputStream (java.io.InputStream) 11
UnsupportedEncodingException (java.io.UnsupportedEncodingException) 10
BufferedInputStream (java.io.BufferedInputStream) 9
ExtractException (org.codelibs.fess.crawler.exception.ExtractException) 9
ExtractData (org.codelibs.fess.crawler.entity.ExtractData) 8
ResponseData (org.codelibs.fess.crawler.entity.ResponseData) 8
Map (java.util.Map) 7
MaxLengthExceededException (org.codelibs.fess.crawler.exception.MaxLengthExceededException) 7
MalformedURLException (java.net.MalformedURLException) 6
HashMap (java.util.HashMap) 6
AccessResultDataImpl (org.codelibs.fess.crawler.entity.AccessResultDataImpl) 6
RequestData (org.codelibs.fess.crawler.entity.RequestData) 6
ResultData (org.codelibs.fess.crawler.entity.ResultData) 6
ChildUrlsException (org.codelibs.fess.crawler.exception.ChildUrlsException) 6
HashSet (java.util.HashSet) 5
TransformerException (javax.xml.transform.TransformerException) 5