
Example 61 with TikaConfig

Use of org.apache.tika.config.TikaConfig in project mylyn.docs by eclipse.

Class EPUBFileUtil, method getMimeType.

/**
 * Attempts to figure out the MIME-type for the file.
 *
 * @param file
 *            the file to determine MIME-type for
 * @return the MIME-type or <code>application/octet-stream</code>
 */
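public static String getMimeType(File file) {
    try {
        if (tika == null) {
            // Lazily create the shared configuration (this check is not thread-safe).
            tika = new TikaConfig();
        }
        Metadata metadata = new Metadata();
        // The file name gives the detector an extension hint in addition to the content.
        metadata.set(TikaMetadataKeys.RESOURCE_NAME_KEY, file.getName());
        // Close the stream after detection so the file handle is not leaked.
        try (TikaInputStream stream = TikaInputStream.get(file)) {
            MediaType detect = tika.getDetector().detect(stream, metadata);
            return detect.toString();
        }
    } catch (IOException | TikaException e) {
        throw new RuntimeException(e);
    }
}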
Also used: TikaException (org.apache.tika.exception.TikaException), TikaConfig (org.apache.tika.config.TikaConfig), Metadata (org.apache.tika.metadata.Metadata), MediaType (org.apache.tika.mime.MediaType), IOException (java.io.IOException)
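
The `tika == null` check in `getMimeType` initializes a shared field lazily without any synchronization, so two threads could each build their own configuration. One common way to make such one-time initialization thread-safe is the initialization-on-demand holder idiom. The sketch below is only an illustration under stated assumptions: `LazyConfigHolder` and `ExpensiveConfig` are hypothetical names, and `ExpensiveConfig` is a stand-in for `TikaConfig` so that the example runs with the JDK alone.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of mylyn.docs or Tika.
final class LazyConfigHolder {

    // Counts constructor calls so the single-initialization claim can be checked.
    private static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

    // Hypothetical stand-in for an expensive object such as TikaConfig.
    static final class ExpensiveConfig {
        ExpensiveConfig() {
            CONSTRUCTIONS.incrementAndGet();
        }
    }

    // The JVM initializes this nested class once, on first access,
    // and guarantees that the initialization is thread-safe.
    private static final class Holder {
        static final ExpensiveConfig INSTANCE = new ExpensiveConfig();
    }

    private LazyConfigHolder() {
    }

    // Returns the shared instance, creating it on the first call.
    static ExpensiveConfig get() {
        return Holder.INSTANCE;
    }

    // Number of times the stand-in has been constructed so far.
    static int constructionCount() {
        return CONSTRUCTIONS.get();
    }
}
```

Repeated calls to `LazyConfigHolder.get()` return the same object. With a real `TikaConfig`, the no-argument constructor declares checked exceptions, so the holder's initializer would have to wrap them, which is a trade-off against the explicit try/catch used in the example above.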

Example 62 with TikaConfig

Use of org.apache.tika.config.TikaConfig in project jackrabbit by apache.

Class SearchIndex, method createParser.

private Parser createParser() {
    // Step 1: resolve the configured path, first as a file, then as a classpath resource.
    URL url = null;
    if (tikaConfigPath != null) {
        File file = new File(tikaConfigPath);
        if (file.exists()) {
            try {
                url = file.toURI().toURL();
            } catch (MalformedURLException e) {
                log.warn("Invalid Tika configuration path: " + file, e);
            }
        } else {
            ClassLoader loader = SearchIndex.class.getClassLoader();
            url = loader.getResource(tikaConfigPath);
        }
    }
    // Step 2: fall back to the tika-config.xml resource next to SearchIndex.
    if (url == null) {
        url = SearchIndex.class.getResource("tika-config.xml");
    }
    // Step 3: load the configuration; a failure is logged and leaves config null.
    TikaConfig config = null;
    if (url != null) {
        try {
            config = new TikaConfig(url);
        } catch (Exception e) {
            log.warn("Tika configuration not available: " + url, e);
        }
    }
    // Step 4: last resort is Tika's built-in default configuration.
    if (config == null) {
        config = TikaConfig.getDefaultConfig();
    }
    // Run the parser in forked JVMs when a Java command is configured, otherwise in-process.
    if (forkJavaCommand != null) {
        ForkParser forkParser = new ForkParser(SearchIndex.class.getClassLoader(), new AutoDetectParser(config));
        forkParser.setJavaCommand(forkJavaCommand);
        forkParser.setPoolSize(extractorPoolSize);
        return forkParser;
    } else {
        return new AutoDetectParser(config);
    }
}
Also used: ForkParser (org.apache.tika.fork.ForkParser), MalformedURLException (java.net.MalformedURLException), TikaConfig (org.apache.tika.config.TikaConfig), AutoDetectParser (org.apache.tika.parser.AutoDetectParser), File (java.io.File), URL (java.net.URL), FileSystemException (org.apache.jackrabbit.core.fs.FileSystemException), SAXException (org.xml.sax.SAXException), JournalException (org.apache.jackrabbit.core.journal.JournalException), NoSuchItemStateException (org.apache.jackrabbit.core.state.NoSuchItemStateException), RepositoryException (javax.jcr.RepositoryException), IOException (java.io.IOException), ItemStateException (org.apache.jackrabbit.core.state.ItemStateException), ParserConfigurationException (javax.xml.parsers.ParserConfigurationException), InvalidQueryException (javax.jcr.query.InvalidQueryException)
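
The first part of `createParser` resolves a configuration path in a fixed order: an existing file wins, otherwise the path is tried as a classpath resource, and a `null` result tells the caller to fall back further. A minimal sketch of that lookup order in isolation is shown below, using only the JDK. `ConfigLocator` and `resolve` are hypothetical names introduced for illustration and are not part of Jackrabbit or Tika.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper that mirrors the path lookup order used in createParser.
final class ConfigLocator {

    private ConfigLocator() {
    }

    /**
     * Resolves a configuration path to a URL.
     * Returns null when the path is null or nothing is found, so the
     * caller can apply its own fallback (for example a bundled default).
     */
    static URL resolve(String path, ClassLoader loader) {
        if (path == null) {
            return null;
        }
        File file = new File(path);
        if (file.exists()) {
            try {
                // An existing file system entry takes precedence.
                return file.toURI().toURL();
            } catch (MalformedURLException e) {
                // Treat an unconvertible path like a missing one.
                return null;
            }
        }
        // Otherwise interpret the path as a classpath resource name.
        return loader.getResource(path);
    }
}
```

A caller would use it as `URL url = ConfigLocator.resolve(tikaConfigPath, SearchIndex.class.getClassLoader());` and continue with the default resource when the result is `null`. Keeping the lookup in a small static method makes the order easy to test without constructing a parser.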

Aggregations

TikaConfig (org.apache.tika.config.TikaConfig): 62
Test (org.junit.Test): 32
Metadata (org.apache.tika.metadata.Metadata): 26
AutoDetectParser (org.apache.tika.parser.AutoDetectParser): 20
TikaTest (org.apache.tika.TikaTest): 16
InputStream (java.io.InputStream): 12
Tika (org.apache.tika.Tika): 12
IOException (java.io.IOException): 10
URL (java.net.URL): 10
TikaException (org.apache.tika.exception.TikaException): 9
TikaInputStream (org.apache.tika.io.TikaInputStream): 9
ParseContext (org.apache.tika.parser.ParseContext): 9
Parser (org.apache.tika.parser.Parser): 9
MediaType (org.apache.tika.mime.MediaType): 8
CompositeParser (org.apache.tika.parser.CompositeParser): 8
ByteArrayInputStream (java.io.ByteArrayInputStream): 7
File (java.io.File): 6
TikaConfigTest (org.apache.tika.config.TikaConfigTest): 6
HashSet (java.util.HashSet): 5
SAXException (org.xml.sax.SAXException): 5