
Example 1 with ContentExtractionException

Use of org.exist.contentextraction.ContentExtractionException in the project exist by eXist-db.

Class ContentFunctions, method streamContent.

private Sequence streamContent(ContentExtraction ce, BinaryValue binary, Sequence pathSeq, FunctionReference ref, Map<String, String> mappings, Sequence data) throws XPathException {
    NodePath[] paths = new NodePath[pathSeq.getItemCount()];
    int i = 0;
    for (SequenceIterator iter = pathSeq.iterate(); iter.hasNext(); i++) {
        String path = iter.nextItem().getStringValue();
        paths[i] = new NodePath(mappings, path, false);
    }
    ContentReceiver receiver = new ContentReceiver(context, paths, ref, data);
    try {
        ce.extractContentAndMetadata(binary, receiver);
    } catch (IOException | SAXException | ContentExtractionException ex) {
        LOG.error(ex.getMessage(), ex);
        throw new XPathException(this, ex.getMessage(), ex);
    }
    return receiver.getResult();
}
Also used: SequenceIterator (org.exist.xquery.value.SequenceIterator), XPathException (org.exist.xquery.XPathException), IOException (java.io.IOException), ContentReceiver (org.exist.contentextraction.ContentReceiver), ContentExtractionException (org.exist.contentextraction.ContentExtractionException), NodePath (org.exist.storage.NodePath), SAXException (org.xml.sax.SAXException)
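
The try/catch in streamContent follows a common pattern: several checked exceptions from the extraction layer are caught in one multi-catch and rethrown as a single query-level exception that keeps the original as its cause. The following is a minimal, self-contained sketch of that pattern only. ExtractionException, QueryException, and Extractor are hypothetical stand-ins for the eXist-db types, not part of its API.

```java
import java.io.IOException;

public class WrapExtractionErrors {

    // Hypothetical stand-in for ContentExtractionException.
    static class ExtractionException extends Exception {
        ExtractionException(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in for XPathException.
    static class QueryException extends Exception {
        QueryException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical extraction step that may fail with either checked exception.
    interface Extractor {
        String extract(byte[] data) throws IOException, ExtractionException;
    }

    // Runs the extractor and translates low-level failures into one exception type.
    static String run(Extractor extractor, byte[] data) throws QueryException {
        try {
            return extractor.extract(data);
        } catch (IOException | ExtractionException ex) {
            // Pass the original exception as the cause so its stack trace is preserved.
            throw new QueryException(ex.getMessage(), ex);
        }
    }

    public static void main(String[] args) {
        try {
            run(d -> { throw new ExtractionException("unsupported format"); }, new byte[0]);
        } catch (QueryException ex) {
            System.out.println(ex.getMessage() + " caused by " + ex.getCause().getClass().getSimpleName());
        }
    }
}
```

Callers then only need to handle one exception type, while the underlying cause remains available through getCause().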

Example 2 with ContentExtractionException

Use of org.exist.contentextraction.ContentExtractionException in the project exist by eXist-db.

Class ContentFunctions, method eval.

@Override
public Sequence eval(Sequence[] args, Sequence contextSequence) throws XPathException {
    // is argument the empty sequence?
    if (args[0].isEmpty()) {
        return Sequence.EMPTY_SEQUENCE;
    }
    ContentExtraction ce = new ContentExtraction();
    if (isCalledAs("stream-content")) {
        /* binary content */
        BinaryValue binary = (BinaryValue) args[0].itemAt(0);
        /* callback function */
        FunctionReference ref = (FunctionReference) args[2].itemAt(0);
        Map<String, String> mappings = new HashMap<>();
        if (args[3].hasOne()) {
            NodeValue namespaces = (NodeValue) args[3].itemAt(0);
            parseMappings(namespaces, mappings);
        }
        return streamContent(ce, binary, args[1], ref, mappings, args[4]);
    } else {
        try {
            if (isCalledAs("get-metadata")) {
                context.pushDocumentContext();
                try {
                    final MemTreeBuilder builder = context.getDocumentBuilder();
                    builder.startDocument();
                    builder.startElement(new QName("html", XHTML_NS), null);
                    builder.startElement(new QName("head", XHTML_NS), null);
                    final QName qnMeta = new QName("meta", XHTML_NS);
                    final Metadata metadata = ce.extractMetadata((BinaryValue) args[0].itemAt(0));
                    for (final String name : metadata.names()) {
                        for (final String value : metadata.getValues(name)) {
                            final AttributesImpl attributes = new AttributesImpl();
                            attributes.addAttribute("", "name", "name", "string", name);
                            attributes.addAttribute("", "content", "content", "string", value);
                            builder.startElement(qnMeta, attributes);
                            builder.endElement();
                        }
                    }
                    builder.endElement();
                    builder.endElement();
                    builder.endDocument();
                    return builder.getDocument();
                } finally {
                    context.popDocumentContext();
                }
            } else {
                final DocumentBuilderReceiver builder = new DocumentBuilderReceiver();
                builder.setSuppressWhitespace(false);
                final Metadata metadata = ce.extractContentAndMetadata((BinaryValue) args[0].itemAt(0), (ContentHandler) builder);
                return (NodeValue) builder.getDocument();
            }
        } catch (IOException | SAXException | ContentExtractionException ex) {
            LOG.error(ex.getMessage(), ex);
            throw new XPathException(this, ex.getMessage(), ex);
        }
    }
}
Also used: ContentExtraction (org.exist.contentextraction.ContentExtraction), NodeValue (org.exist.xquery.value.NodeValue), HashMap (java.util.HashMap), XPathException (org.exist.xquery.XPathException), QName (org.exist.dom.QName), Metadata (org.apache.tika.metadata.Metadata), BinaryValue (org.exist.xquery.value.BinaryValue), IOException (java.io.IOException), DocumentBuilderReceiver (org.exist.dom.memtree.DocumentBuilderReceiver), ContentExtractionException (org.exist.contentextraction.ContentExtractionException), SAXException (org.xml.sax.SAXException), AttributesImpl (org.xml.sax.helpers.AttributesImpl), MemTreeBuilder (org.exist.dom.memtree.MemTreeBuilder), FunctionReference (org.exist.xquery.value.FunctionReference)
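
The get-metadata branch of eval produces an html/head document with one meta element per metadata value. The sketch below reproduces that output shape with the standard JAXP DOM API instead of eXist-db's MemTreeBuilder, and with a plain Map in place of Tika's Metadata. The class name MetadataToXhtml is hypothetical, and the value of XHTML_NS is an assumption, since the constant's definition is not shown in the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class MetadataToXhtml {

    // Assumed value: the standard XHTML namespace URI.
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Builds <html><head><meta name=".." content=".."/>...</head></html>,
    // emitting one meta element for every value of every metadata name.
    static Document build(Map<String, List<String>> metadata) throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();

        Element html = doc.createElementNS(XHTML_NS, "html");
        Element head = doc.createElementNS(XHTML_NS, "head");
        doc.appendChild(html);
        html.appendChild(head);

        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            for (String value : entry.getValue()) {
                Element meta = doc.createElementNS(XHTML_NS, "meta");
                meta.setAttribute("name", entry.getKey());
                meta.setAttribute("content", value);
                head.appendChild(meta);
            }
        }
        return doc;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("Content-Type", List.of("application/pdf"));
        metadata.put("dc:creator", List.of("A", "B"));
        Document doc = build(metadata);
        System.out.println(doc.getElementsByTagNameNS(XHTML_NS, "meta").getLength());
    }
}
```

A multi-valued name such as dc:creator yields one meta element per value, which mirrors the nested loops over metadata.names() and metadata.getValues(name) in the example.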

Aggregations

IOException (java.io.IOException): 2
ContentExtractionException (org.exist.contentextraction.ContentExtractionException): 2
XPathException (org.exist.xquery.XPathException): 2
SAXException (org.xml.sax.SAXException): 2
HashMap (java.util.HashMap): 1
Metadata (org.apache.tika.metadata.Metadata): 1
ContentExtraction (org.exist.contentextraction.ContentExtraction): 1
ContentReceiver (org.exist.contentextraction.ContentReceiver): 1
QName (org.exist.dom.QName): 1
DocumentBuilderReceiver (org.exist.dom.memtree.DocumentBuilderReceiver): 1
MemTreeBuilder (org.exist.dom.memtree.MemTreeBuilder): 1
NodePath (org.exist.storage.NodePath): 1
BinaryValue (org.exist.xquery.value.BinaryValue): 1
FunctionReference (org.exist.xquery.value.FunctionReference): 1
NodeValue (org.exist.xquery.value.NodeValue): 1
SequenceIterator (org.exist.xquery.value.SequenceIterator): 1
AttributesImpl (org.xml.sax.helpers.AttributesImpl): 1