Example 16 with ExtractException

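Use of org.codelibs.fess.crawler.exception.ExtractException in the fess-crawler project by codelibs.

Source: class AbstractXmlExtractor, method getEncoding(). The method peeks at the head of the stream (mark, read, reset) and looks for a declared encoding. If none is found, or the declared charset is not supported, it falls back to the extractor's default encoding. A failed reset() is rethrown as ExtractException.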

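protected String getEncoding(final BufferedInputStream bis) {
    // Buffer for the head of the document, where an encoding declaration is expected
    final byte[] b = new byte[preloadSizeForCharset];
    try {
        // Remember the current position so the stream can be rewound for the real parse
        bis.mark(preloadSizeForCharset);
        final int c = bis.read(b);
        if (c == -1) {
            // Empty stream: nothing to sniff, use the default encoding
            return encoding;
        }
        // Decode the head with the default encoding only so that it can be searched
        final String head = new String(b, 0, c, encoding);
        if (StringUtil.isBlank(head)) {
            return encoding;
        }
        // Look for a declared encoding; the pattern comes from getEncodingPattern()
        final Matcher matcher = getEncodingPattern().matcher(head);
        if (matcher.find()) {
            final String enc = matcher.group(1);
            // Only accept charsets the running JVM can actually decode
            if (Charset.isSupported(enc)) {
                return enc;
            }
        }
    } catch (final Exception e) {
        // Any failure while sniffing (I/O error, illegal charset name) falls back to the default
        if (logger.isInfoEnabled()) {
            logger.info("Use a default encoding: " + encoding, e);
        }
    } finally {
        try {
            // Rewind so the caller can read the document from the beginning
            bis.reset();
        } catch (final IOException e) {
            throw new ExtractException(e);
        }
    }
    return encoding;
}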
Also used: ExtractException (org.codelibs.fess.crawler.exception.ExtractException), Matcher (java.util.regex.Matcher), IOException (java.io.IOException), CrawlerSystemException (org.codelibs.fess.crawler.exception.CrawlerSystemException)
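The same mark/read/reset sniffing technique can be tried outside the crawler with the standalone sketch below. It is not the fess-crawler implementation. The class name XmlEncodingSniffer, the constants DEFAULT_ENCODING and PRELOAD_SIZE (stand-ins for the extractor's encoding and preloadSizeForCharset fields), and the regular expression are all hypothetical, because the real getEncodingPattern() is not shown in the source. To stay self-contained, it swaps the ExtractException wrapper for java.io.UncheckedIOException and drops the logging.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEncodingSniffer {
    // Hypothetical stand-in for getEncodingPattern(): captures encoding="..." in an XML declaration
    private static final Pattern ENCODING_PATTERN =
            Pattern.compile("<\\?xml\\s+[^>]*encoding\\s*=\\s*[\"']([\\w.:\\-]+)[\"']");

    // Hypothetical defaults standing in for the extractor's encoding and preloadSizeForCharset fields
    static final String DEFAULT_ENCODING = "UTF-8";
    static final int PRELOAD_SIZE = 1024;

    static String detectEncoding(BufferedInputStream bis) {
        byte[] head = new byte[PRELOAD_SIZE];
        // Remember where the document starts so it can be re-read after sniffing
        bis.mark(PRELOAD_SIZE);
        try {
            int n = bis.read(head);
            if (n > 0) {
                // Decode the head with the default encoding only to search it
                String text = new String(head, 0, n, DEFAULT_ENCODING);
                Matcher m = ENCODING_PATTERN.matcher(text);
                // Accept the declared charset only if this JVM supports it
                if (m.find() && Charset.isSupported(m.group(1))) {
                    return m.group(1);
                }
            }
        } catch (IOException | IllegalArgumentException e) {
            // Unreadable head or illegal charset name: fall through to the default
        } finally {
            try {
                // Rewind so the caller sees the whole document again
                bis.reset();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return DEFAULT_ENCODING;
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root/>"
                .getBytes(StandardCharsets.US_ASCII);
        BufferedInputStream bis = new BufferedInputStream(new ByteArrayInputStream(xml));
        System.out.println(detectEncoding(bis)); // ISO-8859-1
        // Because of reset(), sniffing again sees the same head and gives the same answer
        System.out.println(detectEncoding(bis)); // ISO-8859-1
    }
}
```

Because the method rewinds the stream in finally, a caller can sniff the encoding and then hand the same BufferedInputStream to an XML parser or an InputStreamReader without losing the first bytes.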

Aggregations

ExtractException (org.codelibs.fess.crawler.exception.ExtractException): 16
ExtractData (org.codelibs.fess.crawler.entity.ExtractData): 11
CrawlerSystemException (org.codelibs.fess.crawler.exception.CrawlerSystemException): 10
IOException (java.io.IOException): 9
File (java.io.File): 5
HashMap (java.util.HashMap): 5
Extractor (org.codelibs.fess.crawler.extractor.Extractor): 5
BufferedInputStream (java.io.BufferedInputStream): 4
InputStream (java.io.InputStream): 3
BufferedReader (java.io.BufferedReader): 2
BufferedWriter (java.io.BufferedWriter): 2
ByteArrayInputStream (java.io.ByteArrayInputStream): 2
FileInputStream (java.io.FileInputStream): 2
FileOutputStream (java.io.FileOutputStream): 2
InputStreamReader (java.io.InputStreamReader): 2
OutputStreamWriter (java.io.OutputStreamWriter): 2
Reader (java.io.Reader): 2
Map (java.util.Map): 2
MessagingException (javax.mail.MessagingException): 2
ArchiveInputStream (org.apache.commons.compress.archivers.ArchiveInputStream): 2