Example 1 with ParserFactory

Use of org.apache.tika.batch.ParserFactory in the Apache Tika project.

From the class BasicTikaFSConsumersBuilder, method build.

@Override
public ConsumersManager build(Node node, Map<String, String> runtimeAttributes, ArrayBlockingQueue<FileResource> queue) {
    //figure out if we're building a recursiveParserWrapper
    boolean recursiveParserWrapper = false;
    String recursiveParserWrapperString = runtimeAttributes.get("recursiveParserWrapper");
    if (recursiveParserWrapperString != null) {
        recursiveParserWrapper = PropsUtil.getBoolean(recursiveParserWrapperString, recursiveParserWrapper);
    } else {
        Node recursiveParserWrapperNode = node.getAttributes().getNamedItem("recursiveParserWrapper");
        if (recursiveParserWrapperNode != null) {
            recursiveParserWrapper = PropsUtil.getBoolean(recursiveParserWrapperNode.getNodeValue(), recursiveParserWrapper);
        }
    }
    //how long to let the consumersManager run on init() and shutdown()
    Long consumersManagerMaxMillis = null;
    String consumersManagerMaxMillisString = runtimeAttributes.get("consumersManagerMaxMillis");
    if (consumersManagerMaxMillisString != null) {
        consumersManagerMaxMillis = PropsUtil.getLong(consumersManagerMaxMillisString, null);
    } else {
        Node consumersManagerMaxMillisNode = node.getAttributes().getNamedItem("consumersManagerMaxMillis");
        if (consumersManagerMaxMillisNode != null) {
            consumersManagerMaxMillis = PropsUtil.getLong(consumersManagerMaxMillisNode.getNodeValue(), null);
        }
    }
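    //load the TikaConfig: the runtime attribute "c" overrides the node's tikaConfig attribute;
    //if neither is set, fall back to the default config
    TikaConfig config = null;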
    String tikaConfigPath = runtimeAttributes.get("c");
    if (tikaConfigPath == null) {
        Node tikaConfigNode = node.getAttributes().getNamedItem("tikaConfig");
        if (tikaConfigNode != null) {
            tikaConfigPath = PropsUtil.getString(tikaConfigNode.getNodeValue(), null);
        }
    }
    if (tikaConfigPath != null) {
        try (InputStream is = Files.newInputStream(Paths.get(tikaConfigPath))) {
            config = new TikaConfig(is);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    } else {
        config = TikaConfig.getDefaultConfig();
    }
    List<FileResourceConsumer> consumers = new LinkedList<>();
    int numConsumers = BatchProcessBuilder.getNumConsumers(runtimeAttributes);
    NodeList nodeList = node.getChildNodes();
    Node contentHandlerFactoryNode = null;
    Node parserFactoryNode = null;
    Node outputStreamFactoryNode = null;
    for (int i = 0; i < nodeList.getLength(); i++) {
        Node child = nodeList.item(i);
        String cn = child.getNodeName();
        if (cn.equals("parser")) {
            parserFactoryNode = child;
        } else if (cn.equals("contenthandler")) {
            contentHandlerFactoryNode = child;
        } else if (cn.equals("outputstream")) {
            outputStreamFactoryNode = child;
        }
    }
    if (contentHandlerFactoryNode == null || parserFactoryNode == null || outputStreamFactoryNode == null) {
        throw new RuntimeException("You must specify a ContentHandlerFactory, a ParserFactory and an OutputStreamFactory");
    }
    ContentHandlerFactory contentHandlerFactory = getContentHandlerFactory(contentHandlerFactoryNode, runtimeAttributes);
    ParserFactory parserFactory = getParserFactory(parserFactoryNode, runtimeAttributes);
    OutputStreamFactory outputStreamFactory = getOutputStreamFactory(outputStreamFactoryNode, runtimeAttributes, contentHandlerFactory, recursiveParserWrapper);
    if (recursiveParserWrapper) {
        for (int i = 0; i < numConsumers; i++) {
            FileResourceConsumer c = new RecursiveParserWrapperFSConsumer(queue, parserFactory, contentHandlerFactory, outputStreamFactory, config);
            consumers.add(c);
        }
    } else {
        for (int i = 0; i < numConsumers; i++) {
            FileResourceConsumer c = new BasicTikaFSConsumer(queue, parserFactory, contentHandlerFactory, outputStreamFactory, config);
            consumers.add(c);
        }
    }
    ConsumersManager manager = new FSConsumersManager(consumers);
    if (consumersManagerMaxMillis != null) {
        manager.setConsumersManagerMaxMillis(consumersManagerMaxMillis);
    }
    return manager;
}
Also used: ContentHandlerFactory (org.apache.tika.sax.ContentHandlerFactory), BasicContentHandlerFactory (org.apache.tika.sax.BasicContentHandlerFactory), FSConsumersManager (org.apache.tika.batch.fs.FSConsumersManager), RecursiveParserWrapperFSConsumer (org.apache.tika.batch.fs.RecursiveParserWrapperFSConsumer), TikaConfig (org.apache.tika.config.TikaConfig), InputStream (java.io.InputStream), Node (org.w3c.dom.Node), NodeList (org.w3c.dom.NodeList), FSOutputStreamFactory (org.apache.tika.batch.fs.FSOutputStreamFactory), OutputStreamFactory (org.apache.tika.batch.OutputStreamFactory), ParserFactory (org.apache.tika.batch.ParserFactory), LinkedList (java.util.LinkedList), ConsumersManager (org.apache.tika.batch.ConsumersManager), BasicTikaFSConsumer (org.apache.tika.batch.fs.BasicTikaFSConsumer), FileResourceConsumer (org.apache.tika.batch.FileResourceConsumer)
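
The lookups at the top of build share one pattern: a value in the runtime attributes, when present, takes precedence over the attribute on the XML node. The following is a minimal sketch of that pattern using only the JDK's DOM API. The class AttrLookup, the methods resolve and parseRoot, and the element name "consumers" are hypothetical names chosen for illustration and are not part of Tika. Unlike the tikaConfig lookup above, which reads the runtime key "c" but the XML attribute "tikaConfig", this sketch assumes the same key on both sides.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;

public class AttrLookup {

    // Hypothetical helper: the runtime value wins, then the XML attribute, then the default.
    static String resolve(String key, Map<String, String> runtimeAttributes,
                          Node node, String defaultValue) {
        String runtimeValue = runtimeAttributes.get(key);
        if (runtimeValue != null) {
            return runtimeValue;
        }
        if (node.getAttributes() != null) {
            Node attr = node.getAttributes().getNamedItem(key);
            if (attr != null) {
                return attr.getNodeValue();
            }
        }
        return defaultValue;
    }

    // Parses an XML string and returns its root element (illustration only).
    static Node parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Node node = parseRoot("<consumers recursiveParserWrapper=\"true\"/>");
        Map<String, String> runtime = new HashMap<>();

        // No runtime value: the XML attribute is used.
        System.out.println(resolve("recursiveParserWrapper", runtime, node, "false")); // prints "true"

        // A runtime value overrides the XML attribute.
        runtime.put("recursiveParserWrapper", "false");
        System.out.println(resolve("recursiveParserWrapper", runtime, node, "true")); // prints "false"
    }
}
```

Pulling the precedence rule into one helper would keep the runtime-first ordering in a single place rather than repeating the if/else for each attribute.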

Example 2 with ParserFactory

Use of org.apache.tika.batch.ParserFactory in the Apache Tika project.

From the class AppParserFactoryBuilder, method build.

@Override
public ParserFactory build(Node node, Map<String, String> runtimeAttrs) {
    Map<String, String> localAttrs = XMLDOMUtil.mapifyAttrs(node, runtimeAttrs);
    String className = localAttrs.get("class");
    ParserFactory pf = ClassLoaderUtil.buildClass(ParserFactory.class, className);
    if (localAttrs.containsKey("parseRecursively")) {
        String bString = localAttrs.get("parseRecursively").toLowerCase(Locale.ENGLISH);
        if (bString.equals("true")) {
            pf.setParseRecursively(true);
        } else if (bString.equals("false")) {
            pf.setParseRecursively(false);
        } else {
            throw new RuntimeException("parseRecursively must have value of \"true\" or \"false\": " + bString);
        }
    }
    if (pf instanceof DigestingAutoDetectParserFactory) {
        DigestingParser.Digester d = buildDigester(localAttrs);
        ((DigestingAutoDetectParserFactory) pf).setDigester(d);
    }
    return pf;
}
Also used: ParserFactory (org.apache.tika.batch.ParserFactory), DigestingAutoDetectParserFactory (org.apache.tika.batch.DigestingAutoDetectParserFactory), DigestingParser (org.apache.tika.parser.DigestingParser)
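
The parseRecursively handling above accepts only "true" or "false" (case-insensitively) and rejects everything else, whereas Boolean.parseBoolean would silently treat any unrecognized string as false. A minimal, self-contained sketch of that strict check follows. StrictBoolean and parse are hypothetical names, and the sketch throws IllegalArgumentException where the original throws a plain RuntimeException.

```java
import java.util.Locale;

public class StrictBoolean {

    // Hypothetical helper: returns true/false only for "true"/"false" (any case),
    // and fails loudly for every other value instead of defaulting to false.
    static boolean parse(String attrName, String raw) {
        if (raw == null) {
            throw new IllegalArgumentException(attrName + " must not be null");
        }
        String normalized = raw.toLowerCase(Locale.ENGLISH);
        if (normalized.equals("true")) {
            return true;
        }
        if (normalized.equals("false")) {
            return false;
        }
        throw new IllegalArgumentException(
                attrName + " must have value of \"true\" or \"false\": " + raw);
    }

    public static void main(String[] args) {
        System.out.println(parse("parseRecursively", "TRUE"));  // prints "true"
        System.out.println(parse("parseRecursively", "false")); // prints "false"
        try {
            parse("parseRecursively", "yes");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting unknown values surfaces typos in the configuration at build time rather than letting them quietly disable recursive parsing.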

Example 3 with ParserFactory

Use of org.apache.tika.batch.ParserFactory in the Apache Tika project.

From the class ParserFactoryBuilder, method build.

@Override
public ParserFactory build(Node node, Map<String, String> runtimeAttrs) {
    Map<String, String> localAttrs = XMLDOMUtil.mapifyAttrs(node, runtimeAttrs);
    String className = localAttrs.get("class");
    ParserFactory pf = ClassLoaderUtil.buildClass(ParserFactory.class, className);
    if (localAttrs.containsKey("parseRecursively")) {
        String bString = localAttrs.get("parseRecursively").toLowerCase(Locale.ENGLISH);
        if (bString.equals("true")) {
            pf.setParseRecursively(true);
        } else if (bString.equals("false")) {
            pf.setParseRecursively(false);
        } else {
            throw new RuntimeException("parseRecursively must have value of \"true\" or \"false\": " + bString);
        }
    }
    return pf;
}
Also used: ParserFactory (org.apache.tika.batch.ParserFactory)
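
Both builders obtain the factory from a class name read out of the "class" attribute. The listing does not show how ClassLoaderUtil.buildClass works internally, so the following is only a sketch of how such a helper could be written with standard reflection, not Tika's implementation. ReflectiveBuilder is a hypothetical name, and the sketch assumes the target class has a no-argument constructor.

```java
import java.util.List;

public class ReflectiveBuilder {

    // Hypothetical stand-in for a "build by class name" helper:
    // loads the class, checks it is a subtype of the expected type,
    // and instantiates it through its no-argument constructor.
    static <T> T buildClass(Class<T> expectedType, String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return clazz.asSubclass(expectedType).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new RuntimeException(
                    "Could not build " + className + " as " + expectedType.getName(), e);
        }
    }

    public static void main(String[] args) {
        // ArrayList implements List and has a no-argument constructor.
        List<?> list = buildClass(List.class, "java.util.ArrayList");
        System.out.println(list.getClass().getName()); // prints "java.util.ArrayList"

        // String is not a List, so the type check fails.
        try {
            buildClass(List.class, "java.lang.String");
        } catch (RuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking the type with asSubclass before instantiation means a misconfigured class name fails with a clear message instead of a ClassCastException at the call site.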

Aggregations

ParserFactory (org.apache.tika.batch.ParserFactory): 3
InputStream (java.io.InputStream): 1
LinkedList (java.util.LinkedList): 1
ConsumersManager (org.apache.tika.batch.ConsumersManager): 1
DigestingAutoDetectParserFactory (org.apache.tika.batch.DigestingAutoDetectParserFactory): 1
FileResourceConsumer (org.apache.tika.batch.FileResourceConsumer): 1
OutputStreamFactory (org.apache.tika.batch.OutputStreamFactory): 1
BasicTikaFSConsumer (org.apache.tika.batch.fs.BasicTikaFSConsumer): 1
FSConsumersManager (org.apache.tika.batch.fs.FSConsumersManager): 1
FSOutputStreamFactory (org.apache.tika.batch.fs.FSOutputStreamFactory): 1
RecursiveParserWrapperFSConsumer (org.apache.tika.batch.fs.RecursiveParserWrapperFSConsumer): 1
TikaConfig (org.apache.tika.config.TikaConfig): 1
DigestingParser (org.apache.tika.parser.DigestingParser): 1
BasicContentHandlerFactory (org.apache.tika.sax.BasicContentHandlerFactory): 1
ContentHandlerFactory (org.apache.tika.sax.ContentHandlerFactory): 1
Node (org.w3c.dom.Node): 1
NodeList (org.w3c.dom.NodeList): 1