Example 36 with Tika

use of org.apache.tika.Tika in project tika by apache.

the class AudioParserTest method testAIFF.

@Test
public void testAIFF() throws Exception {
    String path = "/test-documents/testAIFF.aif";
    Metadata metadata = new Metadata();
    // parseToString returns the extracted text and fills in the metadata as a side effect
    String content = new Tika().parseToString(AudioParserTest.class.getResourceAsStream(path), metadata);
    // Detected media type and the audio properties reported by the parser
    assertEquals("audio/x-aiff", metadata.get(Metadata.CONTENT_TYPE));
    assertEquals("44100.0", metadata.get("samplerate"));
    assertEquals("2", metadata.get("channels"));
    assertEquals("16", metadata.get("bits"));
    assertEquals("PCM_SIGNED", metadata.get("encoding"));
    // The audio file yields no text content
    assertEquals("", content);
}
Also used : Metadata(org.apache.tika.metadata.Metadata) Tika(org.apache.tika.Tika) Test(org.junit.Test)

Example 37 with Tika

use of org.apache.tika.Tika in project tika by apache.

the class ObjectRecognitionParserTest method jpegTesorflowTest.

@Ignore("If tensorflow not available Ignore")
@Test
public void jpegTesorflowTest() throws IOException, TikaException, SAXException {
    // Build a Tika facade from the test configuration file
    try (InputStream stream = loader.getResourceAsStream(CONFIG_FILE)) {
        assert stream != null;
        Tika tika = new Tika(new TikaConfig(stream));
        Metadata metadata = new Metadata();
        try (InputStream imageStream = loader.getResourceAsStream(CAT_IMAGE)) {
            // Read all of the extracted text and join the lines into one string
            Reader reader = tika.parse(imageStream, metadata);
            List<String> lines = IOUtils.readLines(reader);
            String text = StringUtils.join(lines, " ");
            String[] expectedObjects = { "Egyptian cat", "tabby, tabby cat" };
            String metaValues = StringUtils.join(metadata.getValues(ObjectRecognitionParser.MD_KEY), " ");
            // Each expected label must appear in both the text and the metadata values
            for (String expectedObject : expectedObjects) {
                String message = "'" + expectedObject + "' must have been detected";
                Assert.assertTrue(message, text.contains(expectedObject));
                Assert.assertTrue(message, metaValues.contains(expectedObject));
            }
        }
    }
}
Also used : TikaConfig(org.apache.tika.config.TikaConfig) TikaInputStream(org.apache.tika.io.TikaInputStream) Metadata(org.apache.tika.metadata.Metadata) Tika(org.apache.tika.Tika) Ignore(org.junit.Ignore) Test(org.junit.Test)

Example 38 with Tika

use of org.apache.tika.Tika in project tika by apache.

the class ObjectRecognitionParserTest method testREST.

@Ignore("Configure Rest API service")
@Test
public void testREST() throws Exception {
    // Build a Tika facade from the REST test configuration file
    try (InputStream stream = loader.getResourceAsStream(CONFIG_REST_FILE)) {
        assert stream != null;
        Tika tika = new Tika(new TikaConfig(stream));
        Metadata metadata = new Metadata();
        try (InputStream imageStream = loader.getResourceAsStream(CAT_IMAGE)) {
            // Read all of the extracted text into one string
            Reader reader = tika.parse(imageStream, metadata);
            String text = IOUtils.toString(reader);
            String[] expectedObjects = { "Egyptian cat", "tabby, tabby cat" };
            String metaValues = StringUtils.join(metadata.getValues(ObjectRecognitionParser.MD_KEY), " ");
            // Each expected label must appear in both the text and the metadata values
            for (String expectedObject : expectedObjects) {
                String message = "'" + expectedObject + "' must have been detected";
                Assert.assertTrue(message, text.contains(expectedObject));
                Assert.assertTrue(message, metaValues.contains(expectedObject));
            }
        }
    }
}
Also used : TikaConfig(org.apache.tika.config.TikaConfig) TikaInputStream(org.apache.tika.io.TikaInputStream) Metadata(org.apache.tika.metadata.Metadata) Tika(org.apache.tika.Tika) Ignore(org.junit.Ignore) Test(org.junit.Test)

Example 39 with Tika

use of org.apache.tika.Tika in project tika by apache.

the class FLVParserTest method testFLV.

@Test
public void testFLV() throws Exception {
    String path = "/test-documents/testFLV.flv";
    Metadata metadata = new Metadata();
    String content = new Tika().parseToString(FLVParserTest.class.getResourceAsStream(path), metadata);
    assertEquals("", content);
    assertEquals("video/x-flv", metadata.get(Metadata.CONTENT_TYPE));
    assertEquals("true", metadata.get("hasVideo"));
    assertEquals("false", metadata.get("stereo"));
    assertEquals("true", metadata.get("hasAudio"));
    assertEquals("120.0", metadata.get("height"));
    assertEquals("16.0", metadata.get("audiosamplesize"));
}
Also used : Metadata(org.apache.tika.metadata.Metadata) Tika(org.apache.tika.Tika) Test(org.junit.Test)

Example 40 with Tika

use of org.apache.tika.Tika in project tika by apache.

the class OOXMLContainerExtractionTest method setUp.

@Before
public void setUp() {
    Tika tika = new Tika();
    extractor = new ParserContainerExtractor(tika.getParser(), tika.getDetector());
}
Also used : Tika(org.apache.tika.Tika) ParserContainerExtractor(org.apache.tika.extractor.ParserContainerExtractor) Before(org.junit.Before)

Aggregations

Tika (org.apache.tika.Tika) 54
Test (org.junit.Test) 32
Metadata (org.apache.tika.metadata.Metadata) 29
ByteArrayInputStream (java.io.ByteArrayInputStream) 14
TikaTest (org.apache.tika.TikaTest) 12
TikaConfig (org.apache.tika.config.TikaConfig) 12
File (java.io.File) 8
InputStream (java.io.InputStream) 7
URL (java.net.URL) 6
TikaInputStream (org.apache.tika.io.TikaInputStream) 5
IOException (java.io.IOException) 4
HashSet (java.util.HashSet) 4
Ignore (org.junit.Ignore) 4
FileInputStream (java.io.FileInputStream) 3
ArrayList (java.util.ArrayList) 3
HashMap (java.util.HashMap) 3
Content (org.apache.nutch.protocol.Content) 3
Before (org.junit.Before) 3
FileOutputStream (java.io.FileOutputStream) 2
UnsupportedEncodingException (java.io.UnsupportedEncodingException) 2