Example 11 with ContainerExtractor

Use of org.apache.tika.extractor.ContainerExtractor in the Apache Tika project.

Class RTFParserTest, method testBinControlWord.

// TIKA-782
@Test
public void testBinControlWord() throws Exception {
    ByteCopyingHandler embHandler = new ByteCopyingHandler();
    try (TikaInputStream tis = TikaInputStream.get(getResourceAsStream("/test-documents/testBinControlWord.rtf"))) {
        ContainerExtractor ex = new ParserContainerExtractor();
        assertEquals(true, ex.isSupported(tis));
        // pass ex as the recursion extractor too, so nested containers are also unpacked
        ex.extract(tis, ex, embHandler);
    }
    assertEquals(1, embHandler.bytes.size());
    byte[] bytes = embHandler.bytes.get(0);
    assertEquals(10, bytes.length);
    // the fifth byte should be '}' (ASCII 125)
    assertEquals(125, (int) bytes[4]);
    // make sure that at least the last value is correct
    assertEquals(-1, (int) bytes[9]);
}
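
The last two assertions depend on Java's `byte` being signed: casting to `int` sign-extends, so a byte with all bits set compares equal to -1 rather than 255. The sketch below illustrates that behaviour with the standard library only. The class and method names are hypothetical, and the 0xFF value is an assumption inferred from the expected -1, since the contents of the test document are not shown here.

```java
// Hypothetical helper, not part of Tika: shows how byte values widen to int.
final class SignedByteDemo {

    // Widens a byte the same way the (int) cast in the test does (sign-extending).
    static int asSignedInt(byte b) {
        return (int) b;
    }

    // Masks off the sign extension to recover the unsigned value in 0..255.
    static int asUnsignedInt(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte closingBrace = (byte) '}';   // ASCII 125, fits in the positive range
        byte allBitsSet = (byte) 0xFF;    // assumed bit pattern behind the expected -1

        System.out.println(asSignedInt(closingBrace));  // 125
        System.out.println(asSignedInt(allBitsSet));    // -1
        System.out.println(asUnsignedInt(allBitsSet));  // 255
    }
}
```

When an assertion should read in unsigned terms, masking with `& 0xFF` before comparing is one option; the test above instead compares against the signed value directly.
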
Also used: TikaInputStream (org.apache.tika.io.TikaInputStream), ContainerExtractor (org.apache.tika.extractor.ContainerExtractor), ParserContainerExtractor (org.apache.tika.extractor.ParserContainerExtractor), Test (org.junit.Test), TikaTest (org.apache.tika.TikaTest)

Aggregations

ContainerExtractor (org.apache.tika.extractor.ContainerExtractor): 11
ParserContainerExtractor (org.apache.tika.extractor.ParserContainerExtractor): 11
Test (org.junit.Test): 11
TikaInputStream (org.apache.tika.io.TikaInputStream): 5
TikaTest (org.apache.tika.TikaTest): 4
TrackingHandler (org.apache.tika.TikaTest.TrackingHandler): 2
MediaType (org.apache.tika.mime.MediaType): 2
InputStream (java.io.InputStream): 1
HashSet (java.util.HashSet): 1
TesseractOCRParserTest (org.apache.tika.parser.ocr.TesseractOCRParserTest): 1