Example 6 with MimeTypes

use of org.apache.tika.mime.MimeTypes in project tika by apache.

the class TikaDetectorsTest method testGetHTML.

@Test
public void testGetHTML() throws Exception {
    Response response = WebClient.create(endPoint + DETECTORS_PATH).type("text/html").accept("text/html").get();
    String text = getStringFromInputStream((InputStream) response.getEntity());
    assertContains("<h2>DefaultDetector</h2>", text);
    assertContains("Composite", text);
    assertContains("<h3>OggDetector", text);
    assertContains("<h3>POIFSContainerDetector", text);
    assertContains("<h3>MimeTypes", text);
    assertContains(OggDetector.class.getName(), text);
    assertContains(POIFSContainerDetector.class.getName(), text);
    assertContains(ZipContainerDetector.class.getName(), text);
    assertContains(MimeTypes.class.getName(), text);
}
Also used : Response(javax.ws.rs.core.Response) POIFSContainerDetector(org.apache.tika.parser.microsoft.POIFSContainerDetector) ZipContainerDetector(org.apache.tika.parser.pkg.ZipContainerDetector) OggDetector(org.gagravarr.tika.OggDetector) MimeTypes(org.apache.tika.mime.MimeTypes) Test(org.junit.Test)

Example 7 with MimeTypes

use of org.apache.tika.mime.MimeTypes in project tika by apache.

the class MyFirstTika method parseUsingComponents.

public static String parseUsingComponents(String filename, TikaConfig tikaConfig, Metadata metadata) throws Exception {
    MimeTypes mimeRegistry = tikaConfig.getMimeRepository();
    System.out.println("Examining: [" + filename + "]");
    metadata.set(Metadata.RESOURCE_NAME_KEY, filename);
    // A null stream restricts detection to the metadata (here, the resource name)
    System.out.println("The MIME type (based on filename) is: [" + mimeRegistry.detect(null, metadata) + "]");
    // Close this stream once magic detection is done rather than leaking it
    try (InputStream magicStream = TikaInputStream.get(new File(filename))) {
        System.out.println("The MIME type (based on MAGIC) is: [" + mimeRegistry.detect(magicStream, metadata) + "]");
    }
    try (InputStream stream = TikaInputStream.get(new File(filename))) {
        Detector detector = tikaConfig.getDetector();
        System.out.println("The MIME type (based on the Detector interface) is: [" + detector.detect(stream, metadata) + "]");
        LanguageDetector langDetector = new OptimaizeLangDetector().loadModels();
        LanguageResult lang = langDetector.detect(FileUtils.readFileToString(new File(filename), UTF_8));
        System.out.println("The language of this content is: [" + lang.getLanguage() + "]");
        // Get a non-detecting parser that handles all the types it can
        Parser parser = tikaConfig.getParser();
        // Tell it what we think the content is; detect() resets the stream, so it can be reused
        MediaType type = detector.detect(stream, metadata);
        metadata.set(Metadata.CONTENT_TYPE, type.toString());
        // Have the file parsed to get the content and metadata
        ContentHandler handler = new BodyContentHandler();
        parser.parse(stream, handler, metadata, new ParseContext());
        return handler.toString();
    }
}
Also used : LanguageDetector(org.apache.tika.language.detect.LanguageDetector) BodyContentHandler(org.apache.tika.sax.BodyContentHandler) Detector(org.apache.tika.detect.Detector) OptimaizeLangDetector(org.apache.tika.langdetect.OptimaizeLangDetector) LanguageResult(org.apache.tika.language.detect.LanguageResult) TikaInputStream(org.apache.tika.io.TikaInputStream) InputStream(java.io.InputStream) ParseContext(org.apache.tika.parser.ParseContext) MediaType(org.apache.tika.mime.MediaType) MimeTypes(org.apache.tika.mime.MimeTypes) File(java.io.File) ContentHandler(org.xml.sax.ContentHandler) Parser(org.apache.tika.parser.Parser) AutoDetectParser(org.apache.tika.parser.AutoDetectParser)
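
Example 7 prints one type derived from the file name and another derived from the file's leading "magic" bytes. The difference can be illustrated with a minimal JDK-only sketch. It does not use Tika: the class DetectionSketch, its two methods, and the two-entry type table are hypothetical names chosen for illustration. The sketch also shows the mark/reset pattern that lets a stream be reused after its first bytes have been inspected.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical stand-in for a MIME registry; none of these names are Tika APIs.
class DetectionSketch {
    static final String FALLBACK = "application/octet-stream";
    static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    static final byte[] PNG_MAGIC = {(byte) 0x89, 'P', 'N', 'G'};

    // Name-based detection: only the file extension is consulted.
    static String detectByName(String filename) {
        String lower = filename.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".pdf")) return "application/pdf";
        if (lower.endsWith(".png")) return "image/png";
        return FALLBACK;
    }

    // Magic-based detection: peek at the leading bytes, then rewind the stream
    // so the caller can still read it from the start.
    static String detectByMagic(InputStream stream) throws IOException {
        if (!stream.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] head = new byte[8];
        stream.mark(head.length);
        int read;
        try {
            read = stream.readNBytes(head, 0, head.length);
        } finally {
            stream.reset();
        }
        if (startsWith(head, read, PDF_MAGIC)) return "application/pdf";
        if (startsWith(head, read, PNG_MAGIC)) return "image/png";
        return FALLBACK;
    }

    private static boolean startsWith(byte[] data, int length, byte[] prefix) {
        if (length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

With this sketch, a PDF that has been renamed to report.png is reported as image/png by detectByName but as application/pdf by detectByMagic, which is why the two lines printed in Example 7 can disagree.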

Example 8 with MimeTypes

use of org.apache.tika.mime.MimeTypes in project tika by apache.

the class EmbeddedDocumentUtil method getExtension.

public String getExtension(TikaInputStream is, Metadata metadata) {
    String mimeString = metadata.get(Metadata.CONTENT_TYPE);
    TikaConfig config = getConfig();
    MimeType mimeType = null;
    MimeTypes types = config.getMimeRepository();
    boolean detected = false;
    if (mimeString != null) {
        try {
            mimeType = types.forName(mimeString);
        } catch (MimeTypeException e) {
            // Ignored on purpose: an invalid content type falls through to detection below
        }
    }
    if (mimeType == null) {
        Detector detector = config.getDetector();
        try {
            MediaType mediaType = detector.detect(is, metadata);
            mimeType = types.forName(mediaType.toString());
            detected = true;
            is.reset();
        } catch (IOException | MimeTypeException e) {
            // Ignored on purpose: if no type was resolved, ".bin" is returned below
        }
    }
    if (mimeType != null) {
        if (detected) {
            //set or correct the mime type
            metadata.set(Metadata.CONTENT_TYPE, mimeType.toString());
        }
        return mimeType.getExtension();
    }
    return ".bin";
}
Also used : Detector(org.apache.tika.detect.Detector) TikaConfig(org.apache.tika.config.TikaConfig) MimeTypeException(org.apache.tika.mime.MimeTypeException) MediaType(org.apache.tika.mime.MediaType) IOException(java.io.IOException) MimeTypes(org.apache.tika.mime.MimeTypes) MimeType(org.apache.tika.mime.MimeType)
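
The fallback order in Example 8 (declared content type first, then detection, then a generic ".bin") can be sketched without Tika. This is a simplified illustration under stated assumptions: ExtensionFallbackSketch, extensionFor, and the two-entry EXTENSIONS table are hypothetical, and the detector is modeled as a plain Supplier rather than Tika's Detector interface.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the fallback order; the names here are not Tika APIs.
class ExtensionFallbackSketch {
    // Tiny illustrative table; a real registry would be far larger.
    static final Map<String, String> EXTENSIONS = Map.of(
            "application/pdf", ".pdf",
            "image/png", ".png");

    static String extensionFor(String declaredType, Supplier<String> detector) {
        // 1. Use the declared content type when the table knows it.
        //    (Map.of maps reject null lookups, hence the explicit null checks.)
        String ext = declaredType == null ? null : EXTENSIONS.get(declaredType);
        if (ext == null) {
            // 2. Otherwise ask the detector; it runs only when step 1 found nothing.
            String detected = detector.get();
            ext = detected == null ? null : EXTENSIONS.get(detected);
        }
        // 3. Fall back to a generic binary extension.
        return ext != null ? ext : ".bin";
    }
}
```

Passing the detector as a Supplier keeps detection lazy, mirroring how Example 8 only calls the detector when the declared content type did not resolve.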

Example 9 with MimeTypes

use of org.apache.tika.mime.MimeTypes in project tika by apache.

the class CustomMimeInfo method customMimeInfo.

public static String customMimeInfo() throws Exception {
    String path = "file:///path/to/prescription-type.xml";
    MimeTypes typeDatabase = MimeTypesFactory.create(new URL(path));
    Tika tika = new Tika(typeDatabase);
    String type = tika.detect("/path/to/prescription.xpd");
    return type;
}
Also used : MimeTypes(org.apache.tika.mime.MimeTypes) Tika(org.apache.tika.Tika) URL(java.net.URL)

Example 10 with MimeTypes

use of org.apache.tika.mime.MimeTypes in project tika by apache.

the class CustomMimeInfo method customCompositeDetector.

public static String customCompositeDetector() throws Exception {
    String path = "file:///path/to/prescription-type.xml";
    MimeTypes typeDatabase = MimeTypesFactory.create(new URL(path));
    Tika tika = new Tika(new CompositeDetector(typeDatabase, new EncryptedPrescriptionDetector()));
    String type = tika.detect("/path/to/tmp/prescription.xpd");
    return type;
}
Also used : CompositeDetector(org.apache.tika.detect.CompositeDetector) MimeTypes(org.apache.tika.mime.MimeTypes) Tika(org.apache.tika.Tika) URL(java.net.URL)

Aggregations

MimeTypes (org.apache.tika.mime.MimeTypes): 10
MimeType (org.apache.tika.mime.MimeType): 4
IOException (java.io.IOException): 3
TikaConfig (org.apache.tika.config.TikaConfig): 3
MediaType (org.apache.tika.mime.MediaType): 3
BufferedInputStream (java.io.BufferedInputStream): 2
File (java.io.File): 2
URL (java.net.URL): 2
Tika (org.apache.tika.Tika): 2
Detector (org.apache.tika.detect.Detector): 2
Metadata (org.apache.tika.metadata.Metadata): 2
MimeTypeException (org.apache.tika.mime.MimeTypeException): 2
CommonsMultipartFile (org.springframework.web.multipart.commons.CommonsMultipartFile): 2
BufferedReader (java.io.BufferedReader): 1
FileInputStream (java.io.FileInputStream): 1
InputStream (java.io.InputStream): 1
InputStreamReader (java.io.InputStreamReader): 1
HashSet (java.util.HashSet): 1
TreeSet (java.util.TreeSet): 1
Response (javax.ws.rs.core.Response): 1