Example 1 with ConsumersManager

Use of org.apache.tika.batch.ConsumersManager in project tika by apache.

Class BatchProcessBuilder, method build.

/**
 * Builds a BatchProcess from runtime arguments and the
 * document element of a configuration file.  With the exception of the QueueBuilder,
 * the builders choose how to adjudicate between
 * runtime arguments and the elements in the configuration file.
 *
 * @param docElement document element of the XML config file
 * @param incomingRuntimeAttributes runtime arguments
 * @return the configured BatchProcess
 */
public BatchProcess build(Node docElement, Map<String, String> incomingRuntimeAttributes) {
    //key components
    long timeoutThresholdMillis = XMLDOMUtil.getLong("timeoutThresholdMillis", incomingRuntimeAttributes, docElement);
    long timeoutCheckPulseMillis = XMLDOMUtil.getLong("timeoutCheckPulseMillis", incomingRuntimeAttributes, docElement);
    long pauseOnEarlyTerminationMillis = XMLDOMUtil.getLong("pauseOnEarlyTerminationMillis", incomingRuntimeAttributes, docElement);
    int maxAliveTimeSeconds = XMLDOMUtil.getInt("maxAliveTimeSeconds", incomingRuntimeAttributes, docElement);
    FileResourceCrawler crawler = null;
    ConsumersManager consumersManager = null;
    StatusReporter reporter = null;
    Interrupter interrupter = null;
    /*
     * TODO: This is a bit smelly.  numConsumers needs to be used by both the crawler
     * and the consumers.  This copies the incomingRuntimeAttributes and then
     * supplies numConsumers from the command line (if it exists) or from the config file.
     * At least this creates an unmodifiable defensive copy of incomingRuntimeAttributes...
     */
    Map<String, String> runtimeAttributes = setNumConsumersInRuntimeAttributes(docElement, incomingRuntimeAttributes);
    //build queue
    ArrayBlockingQueue<FileResource> queue = buildQueue(docElement, runtimeAttributes);
    NodeList children = docElement.getChildNodes();
    Map<String, Node> keyNodes = new HashMap<String, Node>();
    for (int i = 0; i < children.getLength(); i++) {
        Node child = children.item(i);
        if (child.getNodeType() != Node.ELEMENT_NODE) {
            continue;
        }
        String nodeName = child.getNodeName();
        keyNodes.put(nodeName, child);
    }
    //build consumers
    consumersManager = buildConsumersManager(keyNodes.get("consumers"), runtimeAttributes, queue);
    //build crawler
    crawler = buildCrawler(queue, keyNodes.get("crawler"), runtimeAttributes);
    reporter = buildReporter(crawler, consumersManager, keyNodes.get("reporter"), runtimeAttributes);
    interrupter = buildInterrupter(keyNodes.get("interrupter"), runtimeAttributes);
    BatchProcess proc = new BatchProcess(crawler, consumersManager, reporter, interrupter);
    if (timeoutThresholdMillis > -1) {
        proc.setTimeoutThresholdMillis(timeoutThresholdMillis);
    }
    if (pauseOnEarlyTerminationMillis > -1) {
        proc.setPauseOnEarlyTerminationMillis(pauseOnEarlyTerminationMillis);
    }
    if (timeoutCheckPulseMillis > -1) {
        proc.setTimeoutCheckPulseMillis(timeoutCheckPulseMillis);
    }
    proc.setMaxAliveTimeSeconds(maxAliveTimeSeconds);
    return proc;
}
Also used : Interrupter(org.apache.tika.batch.Interrupter) FileResourceCrawler(org.apache.tika.batch.FileResourceCrawler) HashMap(java.util.HashMap) NodeList(org.w3c.dom.NodeList) Node(org.w3c.dom.Node) BatchProcess(org.apache.tika.batch.BatchProcess) FileResource(org.apache.tika.batch.FileResource) ConsumersManager(org.apache.tika.batch.ConsumersManager) StatusReporter(org.apache.tika.batch.StatusReporter)
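
The loop in Example 1 that fills keyNodes can be tried in isolation. What follows is a minimal sketch, not Tika code: the class name KeyNodesSketch, the helper names parseRoot and indexElementChildren, and the <config> root element are hypothetical. Only the standard JAXP and org.w3c.dom calls are real.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class KeyNodesSketch {

    // Maps each direct element child of parent by its node name.
    // Text, comment and other non-element nodes are skipped; if two
    // children share a name, the later one replaces the earlier one.
    public static Map<String, Node> indexElementChildren(Node parent) {
        Map<String, Node> keyNodes = new HashMap<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                keyNodes.put(child.getNodeName(), child);
            }
        }
        return keyNodes;
    }

    // Hypothetical helper: parses an XML string and returns its document element.
    public static Node parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical config layout; whitespace between elements becomes text nodes.
        String xml = "<config>\n  <crawler/>\n  <consumers/>\n  <reporter/>\n</config>";
        Map<String, Node> keyNodes = indexElementChildren(parseRoot(xml));
        System.out.println("element children: " + keyNodes.size());
        System.out.println("has consumers: " + keyNodes.containsKey("consumers"));
        // A missing element yields null, which the caller must be ready to handle.
        System.out.println("interrupter: " + keyNodes.get("interrupter"));
    }
}
```

Because lookups for absent elements return null, each builder that receives a node from such a map has to tolerate a null argument or fail with a clear message.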

Example 2 with ConsumersManager

Use of org.apache.tika.batch.ConsumersManager in project tika by apache.

Class MockConsumersBuilder, method build.

@Override
public ConsumersManager build(Node node, Map<String, String> runtimeAttributes, ArrayBlockingQueue<FileResource> queue) {
    ConsumersManager manager = super.build(node, runtimeAttributes, queue);
    boolean hangOnInit = runtimeAttributes.containsKey("hangOnInit");
    boolean hangOnShutdown = runtimeAttributes.containsKey("hangOnShutdown");
    return new MockConsumersManager(manager, hangOnInit, hangOnShutdown);
}
Also used : ConsumersManager(org.apache.tika.batch.ConsumersManager)

Example 3 with ConsumersManager

Use of org.apache.tika.batch.ConsumersManager in project tika by apache.

Class BasicTikaFSConsumersBuilder, method build.

@Override
public ConsumersManager build(Node node, Map<String, String> runtimeAttributes, ArrayBlockingQueue<FileResource> queue) {
    //figure out if we're building a recursiveParserWrapper
    boolean recursiveParserWrapper = false;
    String recursiveParserWrapperString = runtimeAttributes.get("recursiveParserWrapper");
    if (recursiveParserWrapperString != null) {
        recursiveParserWrapper = PropsUtil.getBoolean(recursiveParserWrapperString, recursiveParserWrapper);
    } else {
        Node recursiveParserWrapperNode = node.getAttributes().getNamedItem("recursiveParserWrapper");
        if (recursiveParserWrapperNode != null) {
            recursiveParserWrapper = PropsUtil.getBoolean(recursiveParserWrapperNode.getNodeValue(), recursiveParserWrapper);
        }
    }
    //how long to let the consumersManager run on init() and shutdown()
    Long consumersManagerMaxMillis = null;
    String consumersManagerMaxMillisString = runtimeAttributes.get("consumersManagerMaxMillis");
    if (consumersManagerMaxMillisString != null) {
        consumersManagerMaxMillis = PropsUtil.getLong(consumersManagerMaxMillisString, null);
    } else {
        Node consumersManagerMaxMillisNode = node.getAttributes().getNamedItem("consumersManagerMaxMillis");
        if (consumersManagerMaxMillisNode != null) {
            consumersManagerMaxMillis = PropsUtil.getLong(consumersManagerMaxMillisNode.getNodeValue(), null);
        }
    }
    TikaConfig config = null;
    String tikaConfigPath = runtimeAttributes.get("c");
    if (tikaConfigPath == null) {
        Node tikaConfigNode = node.getAttributes().getNamedItem("tikaConfig");
        if (tikaConfigNode != null) {
            tikaConfigPath = PropsUtil.getString(tikaConfigNode.getNodeValue(), null);
        }
    }
    if (tikaConfigPath != null) {
        try (InputStream is = Files.newInputStream(Paths.get(tikaConfigPath))) {
            config = new TikaConfig(is);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    } else {
        config = TikaConfig.getDefaultConfig();
    }
    List<FileResourceConsumer> consumers = new LinkedList<FileResourceConsumer>();
    int numConsumers = BatchProcessBuilder.getNumConsumers(runtimeAttributes);
    NodeList nodeList = node.getChildNodes();
    Node contentHandlerFactoryNode = null;
    Node parserFactoryNode = null;
    Node outputStreamFactoryNode = null;
    for (int i = 0; i < nodeList.getLength(); i++) {
        Node child = nodeList.item(i);
        String cn = child.getNodeName();
        if (cn.equals("parser")) {
            parserFactoryNode = child;
        } else if (cn.equals("contenthandler")) {
            contentHandlerFactoryNode = child;
        } else if (cn.equals("outputstream")) {
            outputStreamFactoryNode = child;
        }
    }
    if (contentHandlerFactoryNode == null || parserFactoryNode == null || outputStreamFactoryNode == null) {
        throw new RuntimeException("You must specify a ContentHandlerFactory, a ParserFactory and an OutputStreamFactory");
    }
    ContentHandlerFactory contentHandlerFactory = getContentHandlerFactory(contentHandlerFactoryNode, runtimeAttributes);
    ParserFactory parserFactory = getParserFactory(parserFactoryNode, runtimeAttributes);
    OutputStreamFactory outputStreamFactory = getOutputStreamFactory(outputStreamFactoryNode, runtimeAttributes, contentHandlerFactory, recursiveParserWrapper);
    if (recursiveParserWrapper) {
        for (int i = 0; i < numConsumers; i++) {
            FileResourceConsumer c = new RecursiveParserWrapperFSConsumer(queue, parserFactory, contentHandlerFactory, outputStreamFactory, config);
            consumers.add(c);
        }
    } else {
        for (int i = 0; i < numConsumers; i++) {
            FileResourceConsumer c = new BasicTikaFSConsumer(queue, parserFactory, contentHandlerFactory, outputStreamFactory, config);
            consumers.add(c);
        }
    }
    ConsumersManager manager = new FSConsumersManager(consumers);
    if (consumersManagerMaxMillis != null) {
        manager.setConsumersManagerMaxMillis(consumersManagerMaxMillis);
    }
    return manager;
}
Also used : ContentHandlerFactory(org.apache.tika.sax.ContentHandlerFactory) BasicContentHandlerFactory(org.apache.tika.sax.BasicContentHandlerFactory) FSConsumersManager(org.apache.tika.batch.fs.FSConsumersManager) RecursiveParserWrapperFSConsumer(org.apache.tika.batch.fs.RecursiveParserWrapperFSConsumer) TikaConfig(org.apache.tika.config.TikaConfig) InputStream(java.io.InputStream) Node(org.w3c.dom.Node) NodeList(org.w3c.dom.NodeList) FSOutputStreamFactory(org.apache.tika.batch.fs.FSOutputStreamFactory) OutputStreamFactory(org.apache.tika.batch.OutputStreamFactory) ParserFactory(org.apache.tika.batch.ParserFactory) LinkedList(java.util.LinkedList) ConsumersManager(org.apache.tika.batch.ConsumersManager) BasicTikaFSConsumer(org.apache.tika.batch.fs.BasicTikaFSConsumer) FileResourceConsumer(org.apache.tika.batch.FileResourceConsumer)
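
Example 3 settles each option by precedence: a runtime attribute wins, and the attribute on the XML node is consulted only when the runtime value is absent. A minimal sketch of that rule follows, assuming plain maps stand in for both sources. The class and method names are hypothetical, and Boolean.parseBoolean is swapped in for PropsUtil.getBoolean, whose parsing rules are not shown in the excerpt. The sketch also uses one key for both sources, whereas the Tika config path in Example 3 is read from "c" at runtime and from "tikaConfig" in the file.

```java
import java.util.HashMap;
import java.util.Map;

public class AttributePrecedenceSketch {

    // Returns the runtime value if present, else the config-file value, else null.
    public static String resolve(String key,
                                 Map<String, String> runtimeAttributes,
                                 Map<String, String> configAttributes) {
        String value = runtimeAttributes.get(key);
        if (value == null) {
            value = configAttributes.get(key);
        }
        return value;
    }

    // Boolean variant: falls back to defaultValue when neither source has the key.
    public static boolean resolveBoolean(String key,
                                         Map<String, String> runtimeAttributes,
                                         Map<String, String> configAttributes,
                                         boolean defaultValue) {
        String raw = resolve(key, runtimeAttributes, configAttributes);
        return raw == null ? defaultValue : Boolean.parseBoolean(raw);
    }

    public static void main(String[] args) {
        Map<String, String> runtime = new HashMap<>();
        Map<String, String> config = new HashMap<>();
        config.put("recursiveParserWrapper", "true");

        // Only the config file sets the flag, so its value is used.
        System.out.println(resolveBoolean("recursiveParserWrapper", runtime, config, false));

        // A runtime attribute overrides the config file.
        runtime.put("recursiveParserWrapper", "false");
        System.out.println(resolveBoolean("recursiveParserWrapper", runtime, config, false));
    }
}
```

Keeping the rule in one helper would avoid repeating the if/else pair for every option, at the cost of hiding which source supplied the value.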

Aggregations

ConsumersManager (org.apache.tika.batch.ConsumersManager) 3
Node (org.w3c.dom.Node) 2
NodeList (org.w3c.dom.NodeList) 2
InputStream (java.io.InputStream) 1
HashMap (java.util.HashMap) 1
LinkedList (java.util.LinkedList) 1
BatchProcess (org.apache.tika.batch.BatchProcess) 1
FileResource (org.apache.tika.batch.FileResource) 1
FileResourceConsumer (org.apache.tika.batch.FileResourceConsumer) 1
FileResourceCrawler (org.apache.tika.batch.FileResourceCrawler) 1
Interrupter (org.apache.tika.batch.Interrupter) 1
OutputStreamFactory (org.apache.tika.batch.OutputStreamFactory) 1
ParserFactory (org.apache.tika.batch.ParserFactory) 1
StatusReporter (org.apache.tika.batch.StatusReporter) 1
BasicTikaFSConsumer (org.apache.tika.batch.fs.BasicTikaFSConsumer) 1
FSConsumersManager (org.apache.tika.batch.fs.FSConsumersManager) 1
FSOutputStreamFactory (org.apache.tika.batch.fs.FSOutputStreamFactory) 1
RecursiveParserWrapperFSConsumer (org.apache.tika.batch.fs.RecursiveParserWrapperFSConsumer) 1
TikaConfig (org.apache.tika.config.TikaConfig) 1
BasicContentHandlerFactory (org.apache.tika.sax.BasicContentHandlerFactory) 1