
Example 16 with CompositeParser

Use of org.apache.tika.parser.CompositeParser in the tika project by Apache.

From class TikaCLI, method displaySupportedTypes.

/**
 * Prints all the known media types, aliases and matching parser classes.
 */
private void displaySupportedTypes() {
    AutoDetectParser parser = new AutoDetectParser();
    MediaTypeRegistry registry = parser.getMediaTypeRegistry();
    // Maps each supported media type to the parser that handles it.
    Map<MediaType, Parser> parsers = parser.getParsers();
    for (MediaType type : registry.getTypes()) {
        System.out.println(type);
        for (MediaType alias : registry.getAliases(type)) {
            System.out.println("  alias:     " + alias);
        }
        MediaType supertype = registry.getSupertype(type);
        if (supertype != null) {
            System.out.println("  supertype: " + supertype);
        }
        Parser p = parsers.get(type);
        if (p != null) {
            // The entry may itself be a CompositeParser; if so, report the
            // parser it registers for this type (one level down) instead.
            if (p instanceof CompositeParser) {
                p = ((CompositeParser) p).getParsers().get(type);
            }
            System.out.println("  parser:    " + p.getClass().getName());
        }
    }
}
Also used: CompositeParser(org.apache.tika.parser.CompositeParser) AutoDetectParser(org.apache.tika.parser.AutoDetectParser) MediaType(org.apache.tika.mime.MediaType) MediaTypeRegistry(org.apache.tika.mime.MediaTypeRegistry) Parser(org.apache.tika.parser.Parser) DigestingParser(org.apache.tika.parser.DigestingParser) NetworkParser(org.apache.tika.parser.NetworkParser) ForkParser(org.apache.tika.fork.ForkParser)
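The lookup in displaySupportedTypes() unwraps only one level. If the parser found inside the composite is itself a composite, the composite's class name is printed rather than the class name of the parser that actually handles the type. Below is a minimal, Tika-free sketch of a loop that keeps unwrapping. It assumes the composite graph has no cycles. SimpleParser, Leaf, Composite, resolveOneLevel and resolveLeaf are hypothetical names chosen for illustration, not Tika API. Against Tika itself, the same loop would test `p instanceof CompositeParser` and call `getParsers()`, as the original does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, Tika-free model of the lookup in displaySupportedTypes().
// SimpleParser, Leaf and Composite are illustrative stand-ins, not Tika API.
final class LeafParserLookup {

    // Stand-in for org.apache.tika.parser.Parser.
    interface SimpleParser {
        String name();
    }

    // Stand-in for a concrete parser that handles a type directly.
    static final class Leaf implements SimpleParser {
        private final String name;

        Leaf(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }
    }

    // Stand-in for CompositeParser: maps media types to child parsers,
    // and a child may itself be a Composite.
    static final class Composite implements SimpleParser {
        private final String name;
        private final Map<String, SimpleParser> children = new HashMap<>();

        Composite(String name) {
            this.name = name;
        }

        Composite register(String type, SimpleParser child) {
            children.put(type, child);
            return this;
        }

        Map<String, SimpleParser> getParsers() {
            return children;
        }

        @Override
        public String name() {
            return name;
        }
    }

    // Mirrors the original snippet: unwrap at most one composite level.
    static String resolveOneLevel(Composite root, String type) {
        SimpleParser p = root.getParsers().get(type);
        if (p instanceof Composite) {
            p = ((Composite) p).getParsers().get(type);
        }
        return p == null ? null : p.name();
    }

    // Keeps unwrapping until a non-composite parser (or nothing) is found.
    // Assumes no composite maps a type back to itself or to an ancestor.
    static String resolveLeaf(Composite root, String type) {
        SimpleParser p = root.getParsers().get(type);
        while (p instanceof Composite) {
            p = ((Composite) p).getParsers().get(type);
        }
        return p == null ? null : p.name();
    }

    public static void main(String[] args) {
        Composite inner = new Composite("InnerComposite")
                .register("application/pdf", new Leaf("PdfLikeParser"));
        Composite root = new Composite("Root")
                .register("application/pdf", new Composite("MiddleComposite")
                        .register("application/pdf", inner));
        System.out.println(resolveOneLevel(root, "application/pdf")); // prints InnerComposite
        System.out.println(resolveLeaf(root, "application/pdf"));     // prints PdfLikeParser
    }
}
```

The loop adds no cost when nesting is only one level deep, because it then performs exactly the same lookups as the original `if`.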

Example 17 with CompositeParser

Use of org.apache.tika.parser.CompositeParser in the tika project by Apache.

From class TikaMimeTypes, method getMediaTypes.

protected List<MediaTypeDetails> getMediaTypes() {
    MediaTypeRegistry registry = TikaResource.getConfig().getMediaTypeRegistry();
    // Assumes the configured top-level parser is a CompositeParser;
    // any other parser type makes this cast throw ClassCastException.
    Map<MediaType, Parser> parsers = ((CompositeParser) TikaResource.getConfig().getParser()).getParsers();
    List<MediaTypeDetails> types = new ArrayList<>(registry.getTypes().size());
    for (MediaType type : registry.getTypes()) {
        MediaTypeDetails details = new MediaTypeDetails();
        details.type = type;
        details.aliases = registry.getAliases(type).toArray(new MediaType[0]);
        // Record the supertype, but skip the generic application/octet-stream.
        MediaType supertype = registry.getSupertype(type);
        if (supertype != null && !MediaType.OCTET_STREAM.equals(supertype)) {
            details.supertype = supertype;
        }
        Parser p = parsers.get(type);
        if (p != null) {
            // If the entry is itself a CompositeParser, report the parser it
            // registers for this type (one level down) instead.
            if (p instanceof CompositeParser) {
                p = ((CompositeParser) p).getParsers().get(type);
            }
            details.parser = p.getClass().getName();
        }
        types.add(details);
    }
    return types;
}
Also used: CompositeParser(org.apache.tika.parser.CompositeParser) ArrayList(java.util.ArrayList) MediaType(org.apache.tika.mime.MediaType) MediaTypeRegistry(org.apache.tika.mime.MediaTypeRegistry) Parser(org.apache.tika.parser.Parser)
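The following small sketch shows the shape of the records that getMediaTypes() builds. It includes the rule that drops an application/octet-stream supertype and leaves the parser empty when no parser handles a type. Media types are modelled as plain strings, and the registry and parser map as three Java maps. MediaTypeDetailsSketch, Details and describe are hypothetical stand-ins for MediaTypeRegistry, MediaType and TikaMimeTypes.MediaTypeDetails; they are not Tika API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Tika-free sketch of the getMediaTypes() loop.
// Media types are plain strings; none of these names are Tika API.
final class MediaTypeDetailsSketch {

    static final String OCTET_STREAM = "application/octet-stream";

    // Mirrors the fields filled in for TikaMimeTypes.MediaTypeDetails.
    static final class Details {
        String type;
        String[] aliases;
        String supertype; // stays null when the supertype is application/octet-stream
        String parser;    // stays null when no parser handles the type
    }

    // aliases: type -> its aliases (the keys define the known types)
    // supertypes: type -> supertype; parsers: type -> parser class name
    static List<Details> describe(Map<String, List<String>> aliases,
                                  Map<String, String> supertypes,
                                  Map<String, String> parsers) {
        List<Details> result = new ArrayList<>(aliases.size());
        for (Map.Entry<String, List<String>> e : aliases.entrySet()) {
            Details d = new Details();
            d.type = e.getKey();
            d.aliases = e.getValue().toArray(new String[0]);
            String supertype = supertypes.get(d.type);
            // Same filter as the original: skip the generic octet-stream supertype.
            if (supertype != null && !OCTET_STREAM.equals(supertype)) {
                d.supertype = supertype;
            }
            d.parser = parsers.get(d.type);
            result.add(d);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> aliases = new LinkedHashMap<>();
        aliases.put("application/pdf", Arrays.asList("application/x-pdf"));
        aliases.put("application/xhtml+xml", new ArrayList<>());
        Map<String, String> supertypes = new LinkedHashMap<>();
        supertypes.put("application/pdf", OCTET_STREAM);
        supertypes.put("application/xhtml+xml", "application/xml");
        Map<String, String> parsers = new LinkedHashMap<>();
        parsers.put("application/pdf", "PdfLikeParser");
        for (Details d : describe(aliases, supertypes, parsers)) {
            System.out.println(d.type + " supertype=" + d.supertype + " parser=" + d.parser);
        }
        // prints:
        // application/pdf supertype=null parser=PdfLikeParser
        // application/xhtml+xml supertype=application/xml parser=null
    }
}
```

Leaving a field null rather than filling it with a placeholder mirrors the original, where `details.supertype` and `details.parser` are assigned only when there is something to report.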

Aggregations

CompositeParser (org.apache.tika.parser.CompositeParser): 17
Parser (org.apache.tika.parser.Parser): 16
Test (org.junit.Test): 10
MediaType (org.apache.tika.mime.MediaType): 9
AutoDetectParser (org.apache.tika.parser.AutoDetectParser): 9
DefaultParser (org.apache.tika.parser.DefaultParser): 7
ParserDecorator (org.apache.tika.parser.ParserDecorator): 7
TikaConfig (org.apache.tika.config.TikaConfig): 6
EmptyParser (org.apache.tika.parser.EmptyParser): 5
XMLParser (org.apache.tika.parser.xml.XMLParser): 4
InputStream (java.io.InputStream): 3
TikaConfigTest (org.apache.tika.config.TikaConfigTest): 3
TikaException (org.apache.tika.exception.TikaException): 3
ForkParser (org.apache.tika.fork.ForkParser): 3
TikaInputStream (org.apache.tika.io.TikaInputStream): 3
ExecutableParser (org.apache.tika.parser.executable.ExecutableParser): 3
TesseractOCRParser (org.apache.tika.parser.ocr.TesseractOCRParser): 3
TikaTest (org.apache.tika.TikaTest): 2
Metadata (org.apache.tika.metadata.Metadata): 2
MediaTypeRegistry (org.apache.tika.mime.MediaTypeRegistry): 2