
Example 6 with ForkParser

Use of org.apache.tika.fork.ForkParser in project tika by apache.

From class ForkParserIntegrationTest, method testForkedPDFParsing.

/**
 * TIKA-808 - Ensure that parsing of our test PDFs works under
 * the Fork Parser, to ensure that complex parsing behaves correctly.
 */
@Test
public void testForkedPDFParsing() throws Exception {
    // Run the parser from tika.getParser() in a forked JVM (tika is a field of the test class)
    ForkParser parser = new ForkParser(ForkParserIntegrationTest.class.getClassLoader(), tika.getParser());
    // try-with-resources closes the test document stream even if parsing fails
    try (InputStream stream = ForkParserIntegrationTest.class.getResourceAsStream("/test-documents/testPDF.pdf")) {
        // BodyContentHandler collects only the text inside the XHTML <body>
        ContentHandler output = new BodyContentHandler();
        ParseContext context = new ParseContext();
        // Register an EmptyParser under Parser.class so embedded documents are not parsed
        context.set(Parser.class, new EmptyParser());
        parser.parse(stream, output, new Metadata(), context);
        String content = output.toString();
        assertContains("Apache Tika", content);
        assertContains("Tika - Content Analysis Toolkit", content);
        assertContains("incubator", content);
        assertContains("Apache Software Foundation", content);
    } finally {
        // Shut down the forked JVM
        parser.close();
    }
}
Also used: ForkParser (org.apache.tika.fork.ForkParser), BodyContentHandler (org.apache.tika.sax.BodyContentHandler), InputStream (java.io.InputStream), ParseContext (org.apache.tika.parser.ParseContext), Parser (org.apache.tika.parser.Parser), EmptyParser (org.apache.tika.parser.EmptyParser), Metadata (org.apache.tika.metadata.Metadata), ContentHandler (org.xml.sax.ContentHandler), Test (org.junit.Test)
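Example 6 registers `new EmptyParser()` under the `Parser.class` key of its `ParseContext`, while Example 7 passes an empty context. As we understand Tika's design, `ParseContext` is a small map keyed by class, and the parser used for embedded documents (such as attachments) is looked up under `Parser.class`, so an `EmptyParser` there means embedded content is skipped. The JDK-only sketch below models that class-keyed lookup with a fallback; `SimpleParser`, `IgnoringParser`, `EchoParser`, `SimpleContext`, and `ContextLookupDemo` are hypothetical names, not Tika classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins, NOT Tika's classes: a JDK-only model of how a
// class-keyed context lets a caller override the embedded-document parser.
interface SimpleParser {
    String parse(String input);
}

// Ignores its input, in the spirit of Tika's EmptyParser.
final class IgnoringParser implements SimpleParser {
    @Override
    public String parse(String input) {
        return "";
    }
}

// Returns its input unchanged.
final class EchoParser implements SimpleParser {
    @Override
    public String parse(String input) {
        return input;
    }
}

// Simplified context that stores values keyed by the class name of the key type.
final class SimpleContext {
    private final Map<String, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        if (value == null) {
            entries.remove(key.getName()); // null removes the entry
        } else {
            entries.put(key.getName(), value);
        }
    }

    <T> T get(Class<T> key, T defaultValue) {
        Object value = entries.get(key.getName());
        return value == null ? defaultValue : key.cast(value);
    }
}

final class ContextLookupDemo {
    // Parses an "embedded" payload with the parser registered in the context,
    // falling back to the given default when none was registered.
    static String parseEmbedded(String payload, SimpleContext context, SimpleParser fallback) {
        SimpleParser parser = context.get(SimpleParser.class, fallback);
        return parser.parse(payload);
    }

    public static void main(String[] args) {
        SimpleContext context = new SimpleContext();
        // Nothing registered: the fallback EchoParser is used
        System.out.println(parseEmbedded("attachment text", context, new EchoParser()));
        // IgnoringParser registered: the embedded payload is dropped
        context.set(SimpleParser.class, new IgnoringParser());
        System.out.println("[" + parseEmbedded("attachment text", context, new EchoParser()) + "]");
    }
}
```

Keying by class keeps the context open-ended: any component can store and retrieve its own settings without the context knowing about them in advance.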

Example 7 with ForkParser

Use of org.apache.tika.fork.ForkParser in project tika by apache.

From class ForkParserIntegrationTest, method testForkedTextParsing.

/**
 * Simple text parsing
 */
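@Test
public void testForkedTextParsing() throws Exception {
    // Run the parser from tika.getParser() in a forked JVM (tika is a field of the test class)
    ForkParser parser = new ForkParser(ForkParserIntegrationTest.class.getClassLoader(), tika.getParser());
    // try-with-resources closes the test document stream even if parsing fails
    try (InputStream stream = ForkParserIntegrationTest.class.getResourceAsStream("/test-documents/testTXT.txt")) {
        // BodyContentHandler collects only the text inside the XHTML <body>
        ContentHandler output = new BodyContentHandler();
        // Plain text needs no extra parse settings
        ParseContext context = new ParseContext();
        parser.parse(stream, output, new Metadata(), context);
        String content = output.toString();
        assertContains("Test d'indexation", content);
        assertContains("http://www.apache.org", content);
    } finally {
        // Shut down the forked JVM
        parser.close();
    }
}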
Also used: ForkParser (org.apache.tika.fork.ForkParser), BodyContentHandler (org.apache.tika.sax.BodyContentHandler), InputStream (java.io.InputStream), ParseContext (org.apache.tika.parser.ParseContext), Metadata (org.apache.tika.metadata.Metadata), ContentHandler (org.xml.sax.ContentHandler), Test (org.junit.Test)
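Both examples read the extracted text from a `BodyContentHandler`. Tika parsers report content as XHTML SAX events, and `BodyContentHandler` keeps only what appears inside the `<body>` element, which is why `output.toString()` returns the document text without `<head>` content such as the title. The JDK-only sketch below models that filtering with a plain SAX `DefaultHandler`; `BodyTextCollector` is a hypothetical class, it matches the `body` element by local name only, and it omits details of Tika's real handler such as its write limit.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, simplified model of BodyContentHandler (NOT Tika's code):
// collect only the character data that appears inside the <body> element.
final class BodyTextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private int bodyDepth = 0; // greater than 0 while inside <body>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("body".equals(elementName(localName, qName))) {
            bodyDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("body".equals(elementName(localName, qName)) && bodyDepth > 0) {
            bodyDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (bodyDepth > 0) {
            text.append(ch, start, length); // keep body text, drop everything else
        }
    }

    // Use the local name when the parser supplies one, otherwise the qualified name.
    private static String elementName(String localName, String qName) {
        return localName == null || localName.isEmpty() ? qName : localName;
    }

    @Override
    public String toString() {
        return text.toString();
    }

    // Parses an XHTML string and returns the text found inside <body>.
    static String bodyText(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            BodyTextCollector handler = new BodyTextCollector();
            parser.parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XHTML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Ignored</title></head>"
                + "<body><p>Test d'indexation</p></body></html>";
        System.out.println(bodyText(xhtml)); // the <title> text is not included
    }
}
```

Filtering at the SAX level means the handler never builds a tree of the document; it only decides, event by event, which characters to keep.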

Aggregations

ForkParser (org.apache.tika.fork.ForkParser): 7
InputStream (java.io.InputStream): 6
Metadata (org.apache.tika.metadata.Metadata): 6
ParseContext (org.apache.tika.parser.ParseContext): 6
BodyContentHandler (org.apache.tika.sax.BodyContentHandler): 6
Test (org.junit.Test): 6
ContentHandler (org.xml.sax.ContentHandler): 6
TikaException (org.apache.tika.exception.TikaException): 2
MediaType (org.apache.tika.mime.MediaType): 2
ByteArrayInputStream (java.io.ByteArrayInputStream): 1
File (java.io.File): 1
FileInputStream (java.io.FileInputStream): 1
IOException (java.io.IOException): 1
NotSerializableException (java.io.NotSerializableException): 1
StringWriter (java.io.StringWriter): 1
Writer (java.io.Writer): 1
MalformedURLException (java.net.MalformedURLException): 1
URL (java.net.URL): 1
JarInputStream (java.util.jar.JarInputStream): 1
RepositoryException (javax.jcr.RepositoryException): 1