Example 1 with AbstractHTMLCleaner

Use of org.asqatasun.contentadapter.html.AbstractHTMLCleaner in project Asqatasun by Asqatasun.

The method run of the class ContentsAdapterImpl.

private Collection<Content> run(Collection<Content> contentList) {
    Collection<Content> localResult = new ArrayList<>();
    for (Content content : contentList) {
        // Only handle content that was fetched successfully (HTTP status 200)
        if (content instanceof SSP && content.getHttpStatusCode() == 200) {
            Logger.getLogger(this.getClass()).debug("Adapting " + content.getURI());
            SSP ssp = (SSP) content;
            ssp.setDoctype(DocumentCaseInsensitiveAdapter.extractDoctypeDeclaration(ssp.getSource()));
            String dirtyHtml;
            if (xmlizeContent) {
                dirtyHtml = DocumentCaseInsensitiveAdapter.removeDoctypeDeclaration(ssp.getSource());
            } else {
                dirtyHtml = ssp.getSource();
            }
            htmlCleaner.setDirtyHTML(dirtyHtml);
            htmlCleaner.run();
            ssp.setAdaptedContent(htmlCleaner.getResult());
            htmlCleaner.setDirtyHTML(null);
            if (writeCleanHtmlInFile) {
                writeCleanDomInFile(ssp);
            }
            if (parseAndRetrievelRelatedContent) {
                htmlParser.setSSP(ssp);
                htmlParser.run();
            } else {
                Logger.getLogger(this.getClass()).debug("no Html parse executed for the current audit");
            }
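            // When xmlizing, clean the adapted content a second time with a fresh
            // cleaner and post-process the result with removeLowerCaseTags
            if (xmlizeContent) {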
                AbstractHTMLCleaner cleaner = new HTMLCleanerImpl();
                cleaner.setDirtyHTML(ssp.getAdaptedContent());
                cleaner.run();
                ssp.setAdaptedContent(DocumentCaseInsensitiveAdapter.removeLowerCaseTags(cleaner.getResult()));
            }
            localResult.add(ssp);
        }
    }
    return localResult;
}
Also used: SSP (org.asqatasun.entity.audit.SSP), HTMLCleanerImpl (org.asqatasun.contentadapter.html.HTMLCleanerImpl), Content (org.asqatasun.entity.audit.Content), AbstractHTMLCleaner (org.asqatasun.contentadapter.html.AbstractHTMLCleaner)
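
The snippet above drives the cleaner through a set-input / run / get-result lifecycle, optionally stripping the doctype first. The following is a minimal, self-contained sketch of that pattern only. It does not use the Asqatasun classes: CleanerSketch, SketchCleaner, TrimmingCleaner, extractDoctype, removeDoctype and adapt are all hypothetical names, and the doctype regex is an assumption rather than the behaviour of DocumentCaseInsensitiveAdapter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CleanerSketch {

    /** Hypothetical stand-in for a cleaner with a set-input / run / get-result lifecycle. */
    static abstract class SketchCleaner {
        protected String dirtyHtml;
        protected String result;

        void setDirtyHTML(String dirtyHtml) { this.dirtyHtml = dirtyHtml; }
        String getResult() { return result; }
        abstract void run();
    }

    /** Toy implementation: only trims the input; a real cleaner would repair the markup. */
    static class TrimmingCleaner extends SketchCleaner {
        @Override
        void run() {
            result = dirtyHtml == null ? "" : dirtyHtml.trim();
        }
    }

    // Assumption: a doctype is matched case-insensitively and contains no '>' character.
    private static final Pattern DOCTYPE = Pattern.compile("(?i)<!DOCTYPE[^>]*>");

    /** Returns the first doctype declaration found, or an empty string if there is none. */
    static String extractDoctype(String source) {
        Matcher m = DOCTYPE.matcher(source);
        return m.find() ? m.group() : "";
    }

    /** Returns the source with the first doctype declaration removed. */
    static String removeDoctype(String source) {
        return DOCTYPE.matcher(source).replaceFirst("");
    }

    /** Runs one clean pass, optionally stripping the doctype before cleaning. */
    static String adapt(SketchCleaner cleaner, String source, boolean stripDoctype) {
        String dirty = stripDoctype ? removeDoctype(source) : source;
        cleaner.setDirtyHTML(dirty);
        cleaner.run();
        String cleaned = cleaner.getResult();
        cleaner.setDirtyHTML(null); // drop the input reference once the pass is done
        return cleaned;
    }

    public static void main(String[] args) {
        String source = "<!DOCTYPE html>\n<html><body>Hi</body></html>\n";
        System.out.println(extractDoctype(source));
        System.out.println(adapt(new TrimmingCleaner(), source, true));
    }
}
```

Because the sketch resets the input after each pass, one cleaner instance can be reused across several documents, which mirrors how the shared htmlCleaner field is used in the loop above.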