Example 1 with Tika

use of org.apache.tika.Tika in project tika by apache.

the class TikaEncodingDetectorTest method testEncodingDetectorConfigurability.

@Test
public void testEncodingDetectorConfigurability() throws Exception {
    // Load a config that, per its file name, omits the ICU4J encoding detector
    TikaConfig tikaConfig = new TikaConfig(getResourceAsStream("/org/apache/tika/config/TIKA-2273-no-icu4j-encoding-detector.xml"));
    AutoDetectParser p = new AutoDetectParser(tikaConfig);
    // Parsing the cp500 (EBCDIC) text file directly is expected to fail without ICU4J
    try {
        Metadata metadata = getXML("english.cp500.txt", p).metadata;
        fail("can't detect w/out ICU");
    } catch (TikaException e) {
        assertContains("Failed to detect", e.getMessage());
    }
    // The Tika facade built from the same config is expected to fail the same way
    Tika tika = new Tika(tikaConfig);
    try {
        String txt = tika.parseToString(getResourceAsFile("/test-documents/english.cp500.txt"));
        fail("can't detect w/out ICU");
    } catch (TikaException e) {
        assertContains("Failed to detect", e.getMessage());
    }
}
Also used : TikaException(org.apache.tika.exception.TikaException) Metadata(org.apache.tika.metadata.Metadata) AutoDetectParser(org.apache.tika.parser.AutoDetectParser) Tika(org.apache.tika.Tika) Test(org.junit.Test)

Example 2 with Tika

use of org.apache.tika.Tika in project tika by apache.

the class SimpleTextExtractor method main.

public static void main(String[] args) throws Exception {
    // Create a Tika instance with the default configuration
    Tika tika = new Tika();
    // Extract and print the text content of each file named on the command line
    for (String file : args) {
        String text = tika.parseToString(new File(file));
        System.out.print(text);
    }
}
Also used : Tika(org.apache.tika.Tika) File(java.io.File)

Example 3 with Tika

use of org.apache.tika.Tika in project tika by apache.

the class AudioParserTest method testAU.

@Test
public void testAU() throws Exception {
    String path = "/test-documents/testAU.au";
    Metadata metadata = new Metadata();
    // parseToString returns the extracted text and fills in the Metadata object
    String content = new Tika().parseToString(AudioParserTest.class.getResourceAsStream(path), metadata);
    assertEquals("audio/basic", metadata.get(Metadata.CONTENT_TYPE));
    assertEquals("44100.0", metadata.get("samplerate"));
    assertEquals("2", metadata.get("channels"));
    assertEquals("16", metadata.get("bits"));
    assertEquals("PCM_SIGNED", metadata.get("encoding"));
    assertEquals("", content);
}
Also used : Metadata(org.apache.tika.metadata.Metadata) Tika(org.apache.tika.Tika) Test(org.junit.Test)

Example 4 with Tika

use of org.apache.tika.Tika in project tika by apache.

the class MidiParserTest method testMID.

@Test
public void testMID() throws Exception {
    String path = "/test-documents/testMID.mid";
    Metadata metadata = new Metadata();
    // parseToString returns the extracted text and fills in the Metadata object
    String content = new Tika().parseToString(MidiParserTest.class.getResourceAsStream(path), metadata);
    assertEquals("audio/midi", metadata.get(Metadata.CONTENT_TYPE));
    assertEquals("2", metadata.get("tracks"));
    assertEquals("0", metadata.get("patches"));
    assertEquals("PPQ", metadata.get("divisionType"));
    assertContains("Untitled", content);
}
Also used : Metadata(org.apache.tika.metadata.Metadata) Tika(org.apache.tika.Tika) Test(org.junit.Test)

Example 5 with Tika

use of org.apache.tika.Tika in project tika by apache.

the class HtmlParserTest method XtestParseUTF8.

@Test
@Ignore("The file 'testXHTML_utf8.html' is not available for testing")
public void XtestParseUTF8() throws IOException, SAXException, TikaException {
    String path = "/test-documents/testXHTML_utf8.html";
    Metadata metadata = new Metadata();
    String content = new Tika().parseToString(HtmlParserTest.class.getResourceAsStream(path), metadata);
    assertTrue("Did not contain expected text:" + "Title : Tilte with UTF-8 chars öäå", content.contains("Title : Tilte with UTF-8 chars öäå"));
    assertTrue("Did not contain expected text:" + "Content with UTF-8 chars", content.contains("Content with UTF-8 chars"));
    assertTrue("Did not contain expected text:" + "åäö", content.contains("åäö"));
}
Also used : Metadata(org.apache.tika.metadata.Metadata) Tika(org.apache.tika.Tika) Ignore(org.junit.Ignore) Test(org.junit.Test) TikaTest(org.apache.tika.TikaTest)

Aggregations

Tika (org.apache.tika.Tika): 50
Test (org.junit.Test): 32
Metadata (org.apache.tika.metadata.Metadata): 28
TikaTest (org.apache.tika.TikaTest): 12
TikaConfig (org.apache.tika.config.TikaConfig): 12
ByteArrayInputStream (java.io.ByteArrayInputStream): 11
File (java.io.File): 6
InputStream (java.io.InputStream): 6
URL (java.net.URL): 5
HashSet (java.util.HashSet): 4
TikaInputStream (org.apache.tika.io.TikaInputStream): 4
Ignore (org.junit.Ignore): 4
FileInputStream (java.io.FileInputStream): 3
Before (org.junit.Before): 3
IOException (java.io.IOException): 2
ArrayList (java.util.ArrayList): 2
Response (javax.ws.rs.core.Response): 2
CompositeDetector (org.apache.tika.detect.CompositeDetector): 2
TikaException (org.apache.tika.exception.TikaException): 2
MimeTypes (org.apache.tika.mime.MimeTypes): 2