
Example 6 with CompositeEncodingDetector

Use of org.apache.tika.detect.CompositeEncodingDetector in the Apache Tika project.

From the class TikaConfigSerializer, method addEncodingDetectors.

private static void addEncodingDetectors(Mode mode, Element rootElement, Document doc, TikaConfig config) throws Exception {
    EncodingDetector encDetector = config.getEncodingDetector();
    if (mode == Mode.MINIMAL && encDetector instanceof DefaultEncodingDetector) {
        // Don't output anything, all using defaults; leave an XML comment showing the equivalent config
        Node detComment = doc.createComment("for example: <encodingDetectors><encodingDetector class=\"" + "org.apache.tika.detect.DefaultEncodingDetector\"/></encodingDetectors>");
        rootElement.appendChild(detComment);
        return;
    }
    Element encDetectorsElement = doc.createElement("encodingDetectors");
    // Write a single entry when the default detector is serialized in CURRENT mode,
    // or when the configured detector is not a composite at all
    if ((mode == Mode.CURRENT && encDetector instanceof DefaultEncodingDetector) || !(encDetector instanceof CompositeEncodingDetector)) {
        Element encDetectorElement = doc.createElement("encodingDetector");
        encDetectorElement.setAttribute("class", encDetector.getClass().getCanonicalName());
        encDetectorsElement.appendChild(encDetectorElement);
    } else {
        // Otherwise expand the composite: one <encodingDetector> element per wrapped detector
        List<EncodingDetector> children = ((CompositeEncodingDetector) encDetector).getDetectors();
        for (EncodingDetector d : children) {
            Element encDetectorElement = doc.createElement("encodingDetector");
            encDetectorElement.setAttribute("class", d.getClass().getCanonicalName());
            encDetectorsElement.appendChild(encDetectorElement);
        }
    }
    rootElement.appendChild(encDetectorsElement);
}
Also used: DefaultEncodingDetector(org.apache.tika.detect.DefaultEncodingDetector), CompositeEncodingDetector(org.apache.tika.detect.CompositeEncodingDetector), EncodingDetector(org.apache.tika.detect.EncodingDetector), Node(org.w3c.dom.Node), Element(org.w3c.dom.Element)
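In the else-branch above, a composite detector becomes one <encodingDetector> child per wrapped detector. The standalone sketch below builds that same XML shape using only the JDK's DOM and javax.xml.transform APIs, so it runs without Tika on the classpath. It is an illustration under stated assumptions, not Tika's serializer: detectors are passed as plain class-name strings, and the class name EncodingDetectorsXmlSketch, the method toXml and the <properties> root element are assumptions chosen for the example.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class; not part of Tika.
public class EncodingDetectorsXmlSketch {

    // Writes one <encodingDetector class="..."/> element per class name, the same
    // shape addEncodingDetectors produces when it expands a composite detector.
    static String toXml(List<String> detectorClassNames) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("properties"); // assumed root element name
            doc.appendChild(root);

            Element detectors = doc.createElement("encodingDetectors");
            for (String className : detectorClassNames) {
                Element detector = doc.createElement("encodingDetector");
                detector.setAttribute("class", className);
                detectors.appendChild(detector);
            }
            root.appendChild(detectors);

            // Serialize the DOM tree to a string, omitting the <?xml ...?> declaration
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // Class names taken from the detectors listed in this page's usage summary
        System.out.println(toXml(List.of(
                "org.apache.tika.parser.html.HtmlEncodingDetector",
                "org.apache.tika.parser.txt.UniversalEncodingDetector",
                "org.apache.tika.parser.txt.Icu4jEncodingDetector")));
    }
}
```

Passing a one-element list gives the single-entry shape that the first branch of addEncodingDetectors writes.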

Example 7 with CompositeEncodingDetector

Use of org.apache.tika.detect.CompositeEncodingDetector in the Apache Tika project.

From the class TikaEncodingDetectorTest, method testEncodingDetectorsAreLoaded.

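@Test
public void testEncodingDetectorsAreLoaded() {
    // A TXTParser built without a TikaConfig falls back to DefaultEncodingDetector,
    // which is itself a subclass of CompositeEncodingDetector
    EncodingDetector encodingDetector = ((AbstractEncodingDetectorParser) new TXTParser()).getEncodingDetector();
    assertTrue(encodingDetector instanceof CompositeEncodingDetector);
}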
Also used: Icu4jEncodingDetector(org.apache.tika.parser.txt.Icu4jEncodingDetector), NonDetectingEncodingDetector(org.apache.tika.detect.NonDetectingEncodingDetector), UniversalEncodingDetector(org.apache.tika.parser.txt.UniversalEncodingDetector), CompositeEncodingDetector(org.apache.tika.detect.CompositeEncodingDetector), EncodingDetector(org.apache.tika.detect.EncodingDetector), HtmlEncodingDetector(org.apache.tika.parser.html.HtmlEncodingDetector), TXTParser(org.apache.tika.parser.txt.TXTParser), AbstractEncodingDetectorParser(org.apache.tika.parser.AbstractEncodingDetectorParser), Test(org.junit.Test)
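The test only checks that the detector is a composite. A composite encoding detector wraps an ordered list of child detectors; Tika's CompositeEncodingDetector tries its children in order, and this sketch assumes it stops at the first non-null charset a child reports. The example below shows that pattern without depending on Tika. CharsetGuesser, CompositeGuesser, UTF8_BOM and FALLBACK are hypothetical names, and detection is simplified to inspect a byte prefix instead of the real detect(InputStream, Metadata) signature.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class CompositeDetectorSketch {

    // Hypothetical, simplified stand-in for Tika's EncodingDetector: it inspects a
    // byte prefix instead of an InputStream plus Metadata, so no Tika jar is needed.
    interface CharsetGuesser {
        Charset detect(byte[] prefix);
    }

    // Composite: asks each child in order and returns the first non-null answer.
    static final class CompositeGuesser implements CharsetGuesser {
        private final List<CharsetGuesser> children;

        CompositeGuesser(List<CharsetGuesser> children) {
            this.children = List.copyOf(children);
        }

        List<CharsetGuesser> getChildren() {
            return children;
        }

        @Override
        public Charset detect(byte[] prefix) {
            for (CharsetGuesser child : children) {
                Charset result = child.detect(prefix);
                if (result != null) {
                    return result;
                }
            }
            return null; // no child recognized the input
        }
    }

    // Recognizes the UTF-8 byte-order mark EF BB BF; otherwise declines with null.
    static final CharsetGuesser UTF8_BOM = prefix ->
            prefix.length >= 3
                    && (prefix[0] & 0xFF) == 0xEF
                    && (prefix[1] & 0xFF) == 0xBB
                    && (prefix[2] & 0xFF) == 0xBF
                    ? StandardCharsets.UTF_8
                    : null;

    // Fallback that always answers with a fixed charset, similar in spirit to a
    // non-detecting detector; because it never declines, it belongs last.
    static final CharsetGuesser FALLBACK = prefix -> StandardCharsets.ISO_8859_1;

    public static void main(String[] args) {
        CharsetGuesser detector = new CompositeGuesser(List.of(UTF8_BOM, FALLBACK));
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detector.detect(withBom));
        System.out.println(detector.detect("hi".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Ordering is the main design decision in a composite: a child that always answers, like FALLBACK here, hides every detector listed after it.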

Aggregations

CompositeEncodingDetector (org.apache.tika.detect.CompositeEncodingDetector): 7
EncodingDetector (org.apache.tika.detect.EncodingDetector): 7
NonDetectingEncodingDetector (org.apache.tika.detect.NonDetectingEncodingDetector): 6
HtmlEncodingDetector (org.apache.tika.parser.html.HtmlEncodingDetector): 6
Icu4jEncodingDetector (org.apache.tika.parser.txt.Icu4jEncodingDetector): 6
UniversalEncodingDetector (org.apache.tika.parser.txt.UniversalEncodingDetector): 6
Test (org.junit.Test): 6
AbstractEncodingDetectorParser (org.apache.tika.parser.AbstractEncodingDetectorParser): 3
TXTParser (org.apache.tika.parser.txt.TXTParser): 3
ArrayList (java.util.ArrayList): 2
AutoDetectParser (org.apache.tika.parser.AutoDetectParser): 2
CompositeParser (org.apache.tika.parser.CompositeParser): 2
Parser (org.apache.tika.parser.Parser): 2
DefaultEncodingDetector (org.apache.tika.detect.DefaultEncodingDetector): 1
TikaException (org.apache.tika.exception.TikaException): 1
Metadata (org.apache.tika.metadata.Metadata): 1
Element (org.w3c.dom.Element): 1
Node (org.w3c.dom.Node): 1