Example 1 with OpusParser

Use of org.gagravarr.tika.OpusParser in the Apache Tika project.

Class AutoDetectParserTest, method testOggFlacAudio.

/**
 * Test to ensure that the Ogg Audio parsers (Vorbis, Opus, Flac etc)
 * have been correctly included, and are available
 */
@SuppressWarnings("deprecation")
@Test
public void testOggFlacAudio() throws Exception {
    // The four test files should all have similar test data
    String[] testFiles = new String[] { "testVORBIS.ogg", "testFLAC.flac", "testFLAC.oga", "testOPUS.opus" };
    MediaType[] mediaTypes = new MediaType[] { MediaType.parse(OGG_VORBIS), MediaType.parse(FLAC_NATIVE), MediaType.parse(OGG_FLAC), MediaType.parse(OGG_OPUS) };
    // Check we can load the parsers, and they claim to do the right things
    VorbisParser vParser = new VorbisParser();
    assertNotNull("Parser not found for " + mediaTypes[0], vParser.getSupportedTypes(new ParseContext()));
    FlacParser fParser = new FlacParser();
    assertNotNull("Parser not found for " + mediaTypes[1], fParser.getSupportedTypes(new ParseContext()));
    assertNotNull("Parser not found for " + mediaTypes[2], fParser.getSupportedTypes(new ParseContext()));
    OpusParser oParser = new OpusParser();
    assertNotNull("Parser not found for " + mediaTypes[3], oParser.getSupportedTypes(new ParseContext()));
    // Check we found the parser
    CompositeParser parser = (CompositeParser) tika.getParser();
    for (MediaType mt : mediaTypes) {
        assertNotNull("Parser not found for " + mt, parser.getParsers().get(mt));
    }
    // Have each file parsed, and check
    for (int i = 0; i < testFiles.length; i++) {
        String file = testFiles[i];
        try (InputStream input = AutoDetectParserTest.class.getResourceAsStream("/test-documents/" + file)) {
            if (input == null) {
                fail("Could not find test file " + file);
            }
            Metadata metadata = new Metadata();
            ContentHandler handler = new BodyContentHandler();
            new AutoDetectParser(tika).parse(input, handler, metadata);
            assertEquals("Incorrect content type for " + file, mediaTypes[i].toString(), metadata.get(Metadata.CONTENT_TYPE));
            // Check some of the common metadata
            // Old style metadata
            assertEquals("Test Artist", metadata.get(Metadata.AUTHOR));
            assertEquals("Test Title", metadata.get(Metadata.TITLE));
            // New style metadata
            assertEquals("Test Artist", metadata.get(TikaCoreProperties.CREATOR));
            assertEquals("Test Title", metadata.get(TikaCoreProperties.TITLE));
            // Check some of the XMPDM metadata
            if (!file.endsWith(".opus")) {
                assertEquals("Test Album", metadata.get(XMPDM.ALBUM));
            }
            assertEquals("Test Artist", metadata.get(XMPDM.ARTIST));
            assertEquals("Stereo", metadata.get(XMPDM.AUDIO_CHANNEL_TYPE));
            assertEquals("44100", metadata.get(XMPDM.AUDIO_SAMPLE_RATE));
            // Check some of the text
            String content = handler.toString();
            assertTrue(content.contains("Test Title"));
            assertTrue(content.contains("Test Artist"));
        }
    }
}
Also used: VorbisParser(org.gagravarr.tika.VorbisParser) ByteArrayInputStream(java.io.ByteArrayInputStream) InputStream(java.io.InputStream) Metadata(org.apache.tika.metadata.Metadata) BodyContentHandler(org.apache.tika.sax.BodyContentHandler) ContentHandler(org.xml.sax.ContentHandler) MediaType(org.apache.tika.mime.MediaType) FlacParser(org.gagravarr.tika.FlacParser) OpusParser(org.gagravarr.tika.OpusParser) Test(org.junit.Test)
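
The test above checks that each file is reported with its own content type even though three of the four share the Ogg container. As a rough illustration of what separates them at the byte level, here is a minimal standard-library sketch. It is not Tika's detection code: the class OggCodecSniffer, its methods, and the returned labels are hypothetical, and it assumes the first Ogg page has a one-entry segment table, so the codec identification packet begins at byte 28.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Tika or its Ogg parsers.
final class OggCodecSniffer {
    // Every Ogg page starts with the capture pattern "OggS".
    private static final byte[] OGG_CAPTURE = ascii("OggS");
    // Assumption: 27-byte page header plus a one-entry segment table,
    // so the first packet starts at byte 28.
    private static final int FIRST_PACKET_OFFSET = 28;

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Builds a magic value made of one marker byte followed by ASCII text.
    private static byte[] prefixed(int first, String rest) {
        byte[] tail = ascii(rest);
        byte[] out = new byte[tail.length + 1];
        out[0] = (byte) first;
        System.arraycopy(tail, 0, out, 1, tail.length);
        return out;
    }

    private static boolean matchesAt(byte[] data, int offset, byte[] magic) {
        if (data.length < offset + magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[offset + i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns "flac", "opus", "vorbis", "ogg-flac" or "unknown" (labels are made up here). */
    public static String sniff(byte[] header) {
        if (matchesAt(header, 0, ascii("fLaC"))) {
            return "flac"; // native FLAC stream, no Ogg container
        }
        if (!matchesAt(header, 0, OGG_CAPTURE)) {
            return "unknown";
        }
        if (matchesAt(header, FIRST_PACKET_OFFSET, ascii("OpusHead"))) {
            return "opus";
        }
        if (matchesAt(header, FIRST_PACKET_OFFSET, prefixed(0x01, "vorbis"))) {
            return "vorbis";
        }
        if (matchesAt(header, FIRST_PACKET_OFFSET, prefixed(0x7F, "FLAC"))) {
            return "ogg-flac";
        }
        return "unknown";
    }

    /** Builds a synthetic (not fully valid) first page: "OggS", zeroed fields, then the packet. */
    public static byte[] fakeFirstPage(byte[] packet) {
        byte[] page = new byte[FIRST_PACKET_OFFSET + packet.length];
        System.arraycopy(OGG_CAPTURE, 0, page, 0, OGG_CAPTURE.length);
        page[26] = 1;                    // number of segments
        page[27] = (byte) packet.length; // single lacing value
        System.arraycopy(packet, 0, page, FIRST_PACKET_OFFSET, packet.length);
        return page;
    }

    public static void main(String[] args) {
        System.out.println(sniff(fakeFirstPage(ascii("OpusHead")))); // prints "opus"
    }
}
```

In the real test, AutoDetectParser performs the detection and dispatches to VorbisParser, FlacParser or OpusParser; the sketch only shows why files with the same container can still be told apart.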

Example 2 with OpusParser

Use of org.gagravarr.tika.OpusParser in the Apache Tika project.

Class TikaParsersTest, method testGetHTML.

@Test
public void testGetHTML() throws Exception {
    for (boolean details : new boolean[] { false, true }) {
        Response response = WebClient.create(endPoint + getPath(details)).type("text/html").accept("text/html").get();
        String text = getStringFromInputStream((InputStream) response.getEntity());
        assertContains("<h2>DefaultParser</h2>", text);
        assertContains("Composite", text);
        assertContains("<h3>OpusParser", text);
        assertContains("<h3>PackageParser", text);
        assertContains("<h3>OOXMLParser", text);
        assertContains(OpusParser.class.getName(), text);
        assertContains(PackageParser.class.getName(), text);
        assertContains(OOXMLParser.class.getName(), text);
        if (details) {
            // Should have the mimetypes they handle
            assertContains("<li>text/plain", text);
            assertContains("<li>application/pdf", text);
            assertContains("<li>audio/ogg", text);
        } else {
            // Should not list the mimetypes they handle
            assertNotFound("text/plain", text);
            assertNotFound("application/pdf", text);
            assertNotFound("audio/ogg", text);
        }
    }
}
Also used: Response(javax.ws.rs.core.Response) OOXMLParser(org.apache.tika.parser.microsoft.ooxml.OOXMLParser) PackageParser(org.apache.tika.parser.pkg.PackageParser) OpusParser(org.gagravarr.tika.OpusParser) Test(org.junit.Test)

Aggregations

OpusParser (org.gagravarr.tika.OpusParser): 2
Test (org.junit.Test): 2
ByteArrayInputStream (java.io.ByteArrayInputStream): 1
InputStream (java.io.InputStream): 1
Response (javax.ws.rs.core.Response): 1
Metadata (org.apache.tika.metadata.Metadata): 1
MediaType (org.apache.tika.mime.MediaType): 1
OOXMLParser (org.apache.tika.parser.microsoft.ooxml.OOXMLParser): 1
PackageParser (org.apache.tika.parser.pkg.PackageParser): 1
BodyContentHandler (org.apache.tika.sax.BodyContentHandler): 1
FlacParser (org.gagravarr.tika.FlacParser): 1
VorbisParser (org.gagravarr.tika.VorbisParser): 1
ContentHandler (org.xml.sax.ContentHandler): 1