
Example 1 with Export

Use of org.structr.core.Export in the structr project by structr.

From the class SourcePattern, method extract.

@Export
public void extract(final Map<String, Object> parameters) throws FrameworkException {
    final SourcePage page = getProperty(sourcePageProperty);
    if (page == null) {
        throw new FrameworkException(422, "Pattern has no source page, exiting.");
    }
    final String selector = getProperty(selectorProperty);
    if (selector == null) {
        throw new FrameworkException(422, "Pattern has no selector, exiting.");
    }
    final Long from = getProperty(fromProperty);
    final Long to = getProperty(toProperty);
    final List<SourcePattern> subPatterns = getProperty(subPatternsProperty);
    Document doc = null;
    NodeInterface parentObj = null;
    if (parameters.containsKey("object")) {
        parentObj = (NodeInterface) parameters.get("object");
    }
    if (parameters.containsKey("document")) {
        doc = (Document) parameters.get("document");
    } else {
        final String url = page.getProperty(SourcePage.url);
        if (url == null) {
            throw new FrameworkException(422, "This pattern's source page has no URL, exiting.");
        }
        // Get the content from the URL
        final String content = getContent(url);
        // Parse the fetched content with Jsoup; the elements are selected further below
        doc = Jsoup.parse(content);
    }
    final String mappedType = getProperty(mappedTypeProperty);
    if (mappedType == null) {
        throw new FrameworkException(422, "No mapped type given, exiting.");
    }
    final Elements parts = doc.select(selector);
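    // Loop through the elements found for this pattern. The indices are 1-based (they feed :nth-child below);
    // "from" defaults to 1 and "to" defaults to the number of matched elements, both inclusive.
    for (int i = (from != null ? from.intValue() : 1); i <= (to != null ? to.intValue() : parts.size()); i++) {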
        // If no object was given (from a higher-level pattern), create a new object of the given type
        final NodeInterface obj = (parentObj == null ? create(mappedType) : parentObj);
        if (subPatterns.size() > 0) {
            // Loop through the sub patterns of this pattern
            for (final SourcePattern subPattern : subPatterns) {
                final String subSelector = selector + ":nth-child(" + i + ") > " + subPattern.getProperty(SourcePattern.selectorProperty);
                final String subPatternMappedAttribute = subPattern.getProperty(SourcePattern.mappedAttributeProperty);
                final String subPatternMappedAttributeFunction = subPattern.getProperty(SourcePattern.mappedAttributeFunctionProperty);
                final SourcePage subPatternSubPage = subPattern.getProperty(SourcePattern.subPageProperty);
                extractAndSetValue(obj, doc, subSelector, mappedType, subPatternMappedAttribute, subPatternMappedAttributeFunction, subPatternSubPage);
            }
        } else {
            final String mappedAttribute = getProperty(mappedAttributeProperty);
            final String mappedAttributeFunction = getProperty(mappedAttributeFunctionProperty);
            extractAndSetValue(obj, doc, selector, mappedType, mappedAttribute, mappedAttributeFunction, null);
        }
    }
}
Also used: FrameworkException (org.structr.common.error.FrameworkException), Document (org.jsoup.nodes.Document), Elements (org.jsoup.select.Elements), NodeInterface (org.structr.core.graph.NodeInterface), Export (org.structr.core.Export)
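
The way extract builds one sub-selector per matched element can be looked at in isolation. The following is a minimal sketch, not code from structr: the class SubSelectorSketch and the helper subSelectors are hypothetical names, and matchCount stands in for parts.size(). It uses plain string handling only and does not call Jsoup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch class; not part of structr.
class SubSelectorSketch {

    // Mirrors the loop bounds of extract(): indices are 1-based and inclusive,
    // "from" defaults to 1 and "to" defaults to matchCount (assumed to be parts.size()).
    static List<String> subSelectors(final String selector, final String subSelector,
                                     final Long from, final Long to, final int matchCount) {
        final int start = (from != null ? from.intValue() : 1);
        final int end = (to != null ? to.intValue() : matchCount);
        final List<String> result = new ArrayList<>();
        for (int i = start; i <= end; i++) {
            // Same concatenation as in extract(): parent selector, position, direct child
            result.add(selector + ":nth-child(" + i + ") > " + subSelector);
        }
        return result;
    }

    public static void main(final String[] args) {
        // Start at the second element, no upper bound, four matched elements
        for (final String s : subSelectors("ul.items > li", "a.title", 2L, null, 4)) {
            System.out.println(s);
        }
    }
}
```

Note that in CSS, :nth-child(i) refers to an element's position among all children of its parent, so this construction lines up with the i-th match only when the matched elements are consecutive siblings starting at the first child.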
