
Example 1 with MetadataList

Use of org.apache.tika.server.MetadataList in the Apache Tika project (tika by apache).

Source: class RecursiveMetadataResource, method parseMetadata.

private MetadataList parseMetadata(InputStream is, MultivaluedMap<String, String> httpHeaders, UriInfo info, String handlerTypeName) throws Exception {
    final Metadata metadata = new Metadata();
    final ParseContext context = new ParseContext();
    Parser parser = TikaResource.createParser();
    // TODO: parameterize choice of max chars/max embedded attachments
    // Map the requested handler name to a handler type, using DEFAULT_HANDLER_TYPE as the fallback.
    BasicContentHandlerFactory.HANDLER_TYPE type = BasicContentHandlerFactory.parseHandlerType(handlerTypeName, DEFAULT_HANDLER_TYPE);
    // Wrap the parser so each embedded document gets its own Metadata entry;
    // a write limit of -1 means extracted text is not truncated.
    RecursiveParserWrapper wrapper = new RecursiveParserWrapper(parser, new BasicContentHandlerFactory(type, -1));
    TikaResource.fillMetadata(parser, metadata, context, httpHeaders);
    // no need to add parser to parse recursively: the wrapper already handles embedded documents
    TikaResource.fillParseContext(context, httpHeaders, null);
    TikaResource.logRequest(LOG, info, metadata);
    TikaResource.parse(wrapper, LOG, info.getPath(), is, new LanguageHandler() {

        @Override
        public void endDocument() {
            // Store the detected language code in the top-level metadata.
            metadata.set("language", getLanguage().getLanguage());
        }
    }, metadata, context);
    // One Metadata object per document: the container plus each embedded file.
    return new MetadataList(wrapper.getMetadata());
}
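For readers without a Tika setup, the following plain-Java sketch models the idea behind `RecursiveParserWrapper` as used above: walk a container and its embedded documents and return one metadata record per document, with the container first, while honoring a character write limit where a negative value (like the `-1` above) means "no limit". Everything in it (`RecursiveMetadataSketch`, `Doc`, `parseRecursively`, the map keys) is hypothetical and not Tika's API, and the depth-first pre-order used for nested attachments is a choice made for this sketch; the real wrapper may order nested attachments differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model (not Tika's API): one metadata map per document,
// container first, embedded documents visited depth-first (pre-order).
public class RecursiveMetadataSketch {

    // A hypothetical document: a name, its extracted text, and embedded children.
    static final class Doc {
        final String name;
        final String text;
        final List<Doc> embedded = new ArrayList<>();

        Doc(String name, String text) {
            this.name = name;
            this.text = text;
        }
    }

    // Truncates text to writeLimit characters; a negative limit keeps all text.
    static String applyWriteLimit(String text, int writeLimit) {
        if (writeLimit < 0 || text.length() <= writeLimit) {
            return text;
        }
        return text.substring(0, writeLimit);
    }

    // Adds this document's map, then recurses into its embedded documents.
    static void collect(Doc doc, String path, int writeLimit, List<Map<String, String>> out) {
        Map<String, String> md = new HashMap<>();
        md.put("name", doc.name);
        md.put("path", path);
        md.put("content", applyWriteLimit(doc.text, writeLimit));
        out.add(md);
        for (Doc child : doc.embedded) {
            collect(child, path + "/" + child.name, writeLimit, out);
        }
    }

    // Entry point: returns the flattened list, container at index 0.
    static List<Map<String, String>> parseRecursively(Doc root, int writeLimit) {
        List<Map<String, String>> out = new ArrayList<>();
        collect(root, "/" + root.name, writeLimit, out);
        return out;
    }

    public static void main(String[] args) {
        Doc zip = new Doc("archive.zip", "");
        Doc report = new Doc("report.docx", "Quarterly report");
        report.embedded.add(new Doc("chart.png", "chart caption"));
        zip.embedded.add(report);
        zip.embedded.add(new Doc("notes.txt", "meeting notes"));

        for (Map<String, String> md : parseRecursively(zip, -1)) {
            System.out.println(md.get("path") + " -> " + md.get("content"));
        }
    }
}
```

Running `main` prints one line per document, starting with `/archive.zip`. Flattening into a list (rather than returning a tree) mirrors how the method above hands `wrapper.getMetadata()` to `MetadataList`: clients get a uniform sequence of records and can use the path-like key to reconstruct nesting if they need it.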
Also used: MetadataList (org.apache.tika.server.MetadataList), LanguageHandler (org.apache.tika.language.detect.LanguageHandler), BasicContentHandlerFactory (org.apache.tika.sax.BasicContentHandlerFactory), Metadata (org.apache.tika.metadata.Metadata), ParseContext (org.apache.tika.parser.ParseContext), RecursiveParserWrapper (org.apache.tika.parser.RecursiveParserWrapper), Parser (org.apache.tika.parser.Parser).
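The `parseHandlerType(handlerTypeName, DEFAULT_HANDLER_TYPE)` call in the method turns a request-supplied name into a handler type and passes `DEFAULT_HANDLER_TYPE` as the value to use when the name cannot be resolved. A minimal plain-Java sketch of that lookup-with-fallback pattern follows. The enum constants are modeled on `BasicContentHandlerFactory.HANDLER_TYPE`, but `HandlerTypeSketch`, its trimming and case-insensitive matching, and the accepted spellings are assumptions for illustration, not Tika's implementation.

```java
import java.util.Locale;

// Hypothetical stand-in for a handler-type lookup; not Tika's code.
public class HandlerTypeSketch {

    // Constant names modeled on BasicContentHandlerFactory.HANDLER_TYPE.
    enum HandlerType { BODY, IGNORE, TEXT, HTML, XML }

    // Returns the matching type, or defaultType when the name is null,
    // blank, or not recognized (assumed fallback behavior).
    static HandlerType parseHandlerType(String name, HandlerType defaultType) {
        if (name == null || name.trim().isEmpty()) {
            return defaultType;
        }
        switch (name.trim().toLowerCase(Locale.ROOT)) {
            case "body":
                return HandlerType.BODY;
            case "ignore":
                return HandlerType.IGNORE;
            case "text":
                return HandlerType.TEXT;
            case "html":
                return HandlerType.HTML;
            case "xml":
                return HandlerType.XML;
            default:
                return defaultType;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseHandlerType("HTML", HandlerType.XML)); // HTML
        System.out.println(parseHandlerType("pdf", HandlerType.XML));  // XML
        System.out.println(parseHandlerType(null, HandlerType.XML));   // XML
    }
}
```

Falling back to a default instead of throwing keeps a missing or misspelled request parameter from failing the whole parse; the trade-off is that typos are silently mapped to the default type.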
