Example 1 with IdentityHtmlMapper

Use of org.apache.tika.parser.html.IdentityHtmlMapper in the Apache Tika project.

From the class TIAParsingExample, method testHtmlMapper.

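public static void testHtmlMapper() throws Exception {
    // Empty placeholder input; a real caller would supply an HTML stream here.
    InputStream stream = new ByteArrayInputStream(new byte[0]);
    // DefaultHandler ignores every SAX event, so the parsed output is discarded.
    ContentHandler handler = new DefaultHandler();
    Metadata metadata = new Metadata();
    Parser parser = new AutoDetectParser();
    ParseContext context = new ParseContext();
    // Ask the HTML parser to use IdentityHtmlMapper instead of its default mapper.
    context.set(HtmlMapper.class, new IdentityHtmlMapper());
    parser.parse(stream, handler, metadata, context);
}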
Also used: IdentityHtmlMapper (org.apache.tika.parser.html.IdentityHtmlMapper), ByteArrayInputStream (java.io.ByteArrayInputStream), GZIPInputStream (java.util.zip.GZIPInputStream), TikaInputStream (org.apache.tika.io.TikaInputStream), FileInputStream (java.io.FileInputStream), InputStream (java.io.InputStream), Metadata (org.apache.tika.metadata.Metadata), ParseContext (org.apache.tika.parser.ParseContext), AutoDetectParser (org.apache.tika.parser.AutoDetectParser), BodyContentHandler (org.apache.tika.sax.BodyContentHandler), ContentHandler (org.xml.sax.ContentHandler), LinkContentHandler (org.apache.tika.sax.LinkContentHandler), TeeContentHandler (org.apache.tika.sax.TeeContentHandler), DefaultHandler (org.xml.sax.helpers.DefaultHandler), Parser (org.apache.tika.parser.Parser), XMLParser (org.apache.tika.parser.xml.XMLParser), HtmlParser (org.apache.tika.parser.html.HtmlParser), TXTParser (org.apache.tika.parser.txt.TXTParser), CompositeParser (org.apache.tika.parser.CompositeParser)
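
The example above only registers the mapper and does not show what changes as a result. As a rough illustration, the sketch below contrasts an identity-style mapping with an allow-list mapping, using only the JDK. It is not Tika code: MapperSketch, TagMapper, IDENTITY, ALLOW_LIST, SAFE and keptTags are hypothetical names, and the allow list is made up. The single method is loosely modeled on the mapSafeElement method of Tika's HtmlMapper, under the assumption that a null result means the tag itself is dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class MapperSketch {

    /** Hypothetical stand-in for a tag mapper; not Tika's actual interface. */
    interface TagMapper {
        /** Returns the output name for a tag, or null to drop the tag itself. */
        String mapSafeElement(String name);
    }

    /** Made-up allow list, used only for this illustration. */
    static final Set<String> SAFE = Set.of("p", "a", "h1");

    /** Identity-style mapping: every tag is kept, only lower-cased. */
    static final TagMapper IDENTITY = name -> name.toLowerCase(Locale.ENGLISH);

    /** Allow-list mapping: tags outside SAFE are dropped. */
    static final TagMapper ALLOW_LIST = name -> {
        String lower = name.toLowerCase(Locale.ENGLISH);
        return SAFE.contains(lower) ? lower : null;
    };

    /** Applies the mapper to each tag name and collects the survivors in order. */
    static List<String> keptTags(TagMapper mapper, List<String> tags) {
        List<String> kept = new ArrayList<>();
        for (String tag : tags) {
            String mapped = mapper.mapSafeElement(tag);
            if (mapped != null) {
                kept.add(mapped);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("DIV", "P", "Span", "a");
        System.out.println(keptTags(IDENTITY, tags));   // prints [div, p, span, a]
        System.out.println(keptTags(ALLOW_LIST, tags)); // prints [p, a]
    }
}
```

With the identity-style mapper all four tags survive, whereas the allow-list mapper keeps only p and a. That is the kind of difference a caller would expect to see after setting IdentityHtmlMapper on the ParseContext, provided the content handler actually records the elements it receives.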
