Example 11 with DefaultParser

Use of org.apache.tika.parser.DefaultParser in project tika by apache.

From the class TikaParserConfigTest, method testParserExcludeFromDefault.

@Test
public void testParserExcludeFromDefault() throws Exception {
    TikaConfig config = getConfig("TIKA-1558-blacklist.xml");
    assertNotNull(config.getParser());
    assertNotNull(config.getDetector());
    CompositeParser parser = (CompositeParser) config.getParser();
    MediaType PE_EXE = MediaType.application("x-msdownload");
    MediaType ELF = MediaType.application("x-elf");
    // Get the DefaultParser from the config
    ParserDecorator confWrappedParser = (ParserDecorator) parser.getParsers().get(MediaType.APPLICATION_XML);
    assertNotNull(confWrappedParser);
    DefaultParser confParser = (DefaultParser) confWrappedParser.getWrappedParser();
    // Get a fresh "default" DefaultParser
    DefaultParser normParser = new DefaultParser(config.getMediaTypeRegistry());
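    // The default one will offer the Executable Parser.
    // Note: `context` is not declared in this snippet; it is presumably a
    // ParseContext field on the test class or its base class.
    assertContains(PE_EXE, normParser.getSupportedTypes(context));
    assertContains(ELF, normParser.getSupportedTypes(context));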
    boolean hasExec = false;
    for (Parser p : normParser.getParsers().values()) {
        if (p instanceof ExecutableParser) {
            hasExec = true;
            break;
        }
    }
    assertTrue(hasExec);
    // The one from the config won't
    assertNotContained(PE_EXE, confParser.getSupportedTypes(context));
    assertNotContained(ELF, confParser.getSupportedTypes(context));
    for (Parser p : confParser.getParsers().values()) {
        if (p instanceof ExecutableParser) {
            fail("Shouldn't have the Executable Parser from config");
        }
    }
}
Also used: CompositeParser(org.apache.tika.parser.CompositeParser) ParserDecorator(org.apache.tika.parser.ParserDecorator) MediaType(org.apache.tika.mime.MediaType) ExecutableParser(org.apache.tika.parser.executable.ExecutableParser) DefaultParser(org.apache.tika.parser.DefaultParser) Parser(org.apache.tika.parser.Parser) XMLParser(org.apache.tika.parser.xml.XMLParser) EmptyParser(org.apache.tika.parser.EmptyParser) Test(org.junit.Test)
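
The exclusion behaviour this test checks can be modelled without Tika at all. The following is a minimal sketch in plain Java, not Tika's implementation: `ExcludeSketch`, `SimpleParser`, and `buildTypeMap` are hypothetical names invented for illustration, and parsers are reduced to a name plus the media types they handle. It only shows the idea that a parser left out of the default set contributes none of its types to the composite.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ExcludeSketch {
    /** Hypothetical stand-in for a parser: a name plus the media types it handles. */
    static final class SimpleParser {
        final String name;
        final Set<String> types;

        SimpleParser(String name, Set<String> types) {
            this.name = name;
            this.types = types;
        }
    }

    /** Builds a media type -> parser name map, skipping parsers whose name is excluded. */
    static Map<String, String> buildTypeMap(List<SimpleParser> parsers, Set<String> excluded) {
        Map<String, String> byType = new LinkedHashMap<>();
        for (SimpleParser p : parsers) {
            if (excluded.contains(p.name)) {
                continue; // an excluded parser contributes no types
            }
            for (String type : p.types) {
                byType.put(type, p.name);
            }
        }
        return byType;
    }

    public static void main(String[] args) {
        List<SimpleParser> all = List.of(
            new SimpleParser("ExecutableParser",
                Set.of("application/x-msdownload", "application/x-elf")),
            new SimpleParser("XMLParser", Set.of("application/xml")));

        // No exclusions: executable types are offered.
        Map<String, String> normal = buildTypeMap(all, Set.of());
        // ExecutableParser excluded: its types disappear, others remain.
        Map<String, String> configured = buildTypeMap(all, Set.of("ExecutableParser"));

        System.out.println(normal.containsKey("application/x-elf"));
        System.out.println(configured.containsKey("application/x-elf"));
        System.out.println(configured.containsKey("application/xml"));
    }
}
```

In the real test the same two checks appear as the `assertContains` and `assertNotContained` calls on `getSupportedTypes`, plus the loops over `getParsers().values()`.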

Example 12 with DefaultParser

Use of org.apache.tika.parser.DefaultParser in project tika by apache.

From the class TesseractOCRParserTest, method offersNoTypesIfNotFound.

/*
 * Check that if Tesseract is not found, the TesseractOCRParser claims not to support
 * any file types, so the standard image parser is called instead.
 */
@Test
public void offersNoTypesIfNotFound() throws Exception {
    TesseractOCRParser parser = new TesseractOCRParser();
    DefaultParser defaultParser = new DefaultParser();
    MediaType png = MediaType.image("png");
    // With an invalid path, will offer no types
    TesseractOCRConfig invalidConfig = new TesseractOCRConfig();
    invalidConfig.setTesseractPath("/made/up/path");
    ParseContext parseContext = new ParseContext();
    parseContext.set(TesseractOCRConfig.class, invalidConfig);
    // No types offered
    assertEquals(0, parser.getSupportedTypes(parseContext).size());
    // And DefaultParser won't use us
    assertEquals(ImageParser.class, defaultParser.getParsers(parseContext).get(png).getClass());
}
Also used: ParseContext(org.apache.tika.parser.ParseContext) MediaType(org.apache.tika.mime.MediaType) DefaultParser(org.apache.tika.parser.DefaultParser) Test(org.junit.Test) TikaTest(org.apache.tika.TikaTest)
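
The fallback described in the comment above (an unavailable OCR tool means no types are offered, so another parser handles the image) can be sketched in plain Java. This is an illustrative model under stated assumptions, not Tika's code: `FallbackSketch`, `ocrSupportedTypes`, and `chooseParser` are hypothetical names, and availability is simplified to "the configured path is an existing directory", whereas the real parser's availability check is not shown in the source.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Set;

public class FallbackSketch {
    /**
     * Types the hypothetical OCR parser offers. Simplifying assumption:
     * the tool counts as "found" only if toolPath is an existing directory.
     */
    static Set<String> ocrSupportedTypes(String toolPath) {
        if (toolPath == null || !Files.isDirectory(Paths.get(toolPath))) {
            return Collections.emptySet(); // tool not found: offer nothing
        }
        return Set.of("image/png", "image/jpeg");
    }

    /** Picks the OCR parser only if it offers the type; otherwise falls back. */
    static String chooseParser(String mediaType, String toolPath) {
        return ocrSupportedTypes(toolPath).contains(mediaType)
            ? "OcrParser"
            : "ImageParser";
    }

    public static void main(String[] args) {
        // With a made-up path the OCR parser offers no types,
        // so the plain image parser is chosen for PNG.
        System.out.println(ocrSupportedTypes("/made/up/path").size());
        System.out.println(chooseParser("image/png", "/made/up/path"));
    }
}
```

The key design point mirrored here is that the supported types are computed per call from the supplied configuration rather than fixed at construction, which is why the test passes a `ParseContext` carrying the invalid `TesseractOCRConfig`.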

Aggregations

DefaultParser (org.apache.tika.parser.DefaultParser) 12
Parser (org.apache.tika.parser.Parser) 10
Test (org.junit.Test) 8
CompositeParser (org.apache.tika.parser.CompositeParser) 7
MediaType (org.apache.tika.mime.MediaType) 5
EmptyParser (org.apache.tika.parser.EmptyParser) 4
ParserDecorator (org.apache.tika.parser.ParserDecorator) 4
TikaTest (org.apache.tika.TikaTest) 3
ParseContext (org.apache.tika.parser.ParseContext) 3
ExecutableParser (org.apache.tika.parser.executable.ExecutableParser) 3
XMLParser (org.apache.tika.parser.xml.XMLParser) 3
TikaException (org.apache.tika.exception.TikaException) 2
AutoDetectParser (org.apache.tika.parser.AutoDetectParser) 2
InputStream (java.io.InputStream) 1
StringWriter (java.io.StringWriter) 1
HashSet (java.util.HashSet) 1
Properties (java.util.Properties) 1
SolrException (org.apache.solr.common.SolrException) 1
NamedList (org.apache.solr.common.util.NamedList) 1
TikaConfig (org.apache.tika.config.TikaConfig) 1