
Example 1 with CrawlerException

Use of org.asqatasun.crawler.exception.CrawlerException in project Asqatasun by Asqatasun.

The class AsqatasunCrawlJob, method initializeCrawlContext.

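/**
 * Parses the Heritrix configuration template found under crawlConfigFilePath,
 * applies the URLs and crawl parameters to it, and writes the result into
 * the output directory of the current job.
 *
 * @param urlList the URLs applied to the configuration document
 * @param crawlParameterSet the crawl parameters applied to the configuration document
 * @param heritrixFileName the name of the Heritrix configuration file
 * @return the configuration file written to the current job output directory
 * @throws CrawlerException if the template cannot be read, parsed or written
 */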
private File initializeCrawlContext(Collection<String> urlList, Set<Parameter> crawlParameterSet, String heritrixFileName) {
    buildOutputDirectory();
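    // Note: in and fw are never assigned in this method, so the
    // finally block that closes them has no effect.
    BufferedReader in = null;
    FileWriter fw = null;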
    try {
        LOGGER.debug("crawlConfigFilePath: " + crawlConfigFilePath + " for copy");
        String filepath = crawlConfigFilePath + "/" + heritrixFileName;
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(filepath);
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("filepath : " + filepath);
            for (Parameter param : crawlParameterSet) {
                LOGGER.debug(param.getParameterElement().getParameterElementCode() + " " + param.getValue());
            }
        }
        doc = setOptionToDocument(urlList, crawlParameterSet, doc);
        //write the content into xml file
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        DOMSource source = new DOMSource(doc);
        String resultFileName = currentJobOutputDir.getPath() + "/" + heritrixFileName;
        StreamResult result = new StreamResult(new File(resultFileName));
        transformer.transform(source, result);
    } catch (IOException | ParserConfigurationException | SAXException | TransformerException ex) {
        // TransformerConfigurationException is a subclass of TransformerException,
        // so it is covered by this single handler.
        LOGGER.error(ex);
        throw new CrawlerException(ex);
    } finally {
        if (in != null) {
            try {
                in.close();
            } catch (IOException ex) {
                LOGGER.error(ex);
                throw new CrawlerException(ex);
            }
        }
        if (fw != null) {
            try {
                fw.close();
            } catch (IOException ex) {
                LOGGER.error(ex);
                throw new CrawlerException(ex);
            }
        }
    }
    return new File(currentJobOutputDir.getPath() + "/" + heritrixFileName);
}
Also used: DOMSource (javax.xml.transform.dom.DOMSource), DocumentBuilderFactory (javax.xml.parsers.DocumentBuilderFactory), TransformerFactory (javax.xml.transform.TransformerFactory), Transformer (javax.xml.transform.Transformer), StreamResult (javax.xml.transform.stream.StreamResult), TransformerConfigurationException (javax.xml.transform.TransformerConfigurationException), FileWriter (java.io.FileWriter), IOException (java.io.IOException), Document (org.w3c.dom.Document), CrawlerException (org.asqatasun.crawler.exception.CrawlerException), SAXException (org.xml.sax.SAXException), DocumentBuilder (javax.xml.parsers.DocumentBuilder), BufferedReader (java.io.BufferedReader), Parameter (org.asqatasun.entity.parameterization.Parameter), ParserConfigurationException (javax.xml.parsers.ParserConfigurationException), File (java.io.File), TransformerException (javax.xml.transform.TransformerException)
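
The method above follows a common pattern: parse an XML template, modify the DOM, write it out with a Transformer, and rethrow any checked exception as a single unchecked one. The sketch below isolates that pattern so it can be run on its own. It is not Asqatasun code: the class CrawlConfigSketch, the method copyWithOverride and the exception ConfigCopyException are hypothetical names, ConfigCopyException merely stands in for CrawlerException (assumed here to be unchecked, since the method above throws it without declaring it), and the tag-name override is a simplified placeholder for whatever setOptionToDocument does.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class CrawlConfigSketch {

    /** Hypothetical stand-in for CrawlerException: wraps any checked failure. */
    static class ConfigCopyException extends RuntimeException {
        ConfigCopyException(Throwable cause) {
            super(cause);
        }
    }

    /**
     * Parses the template, sets the text of every element named tagName
     * to value, and writes the document to outputDir under the same file name.
     */
    static File copyWithOverride(File template, File outputDir, String tagName, String value) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(template);

            // Simplified placeholder for the option-setting step.
            NodeList nodes = doc.getElementsByTagName(tagName);
            for (int i = 0; i < nodes.getLength(); i++) {
                nodes.item(i).setTextContent(value);
            }

            // Serialize the modified DOM into the output directory.
            File target = new File(outputDir, template.getName());
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.transform(new DOMSource(doc), new StreamResult(target));
            return target;
        } catch (IOException | ParserConfigurationException | SAXException | TransformerException ex) {
            // One handler is enough: every failure is rethrown the same way.
            throw new ConfigCopyException(ex);
        }
    }
}
```

Because no reader or writer is opened by hand in this sketch, no finally block is needed. Callers see a single unchecked exception type whatever stage failed, with the original exception available through getCause().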
