Example 1 with Renderer

Use of com.uwyn.jhighlight.renderer.Renderer in the Apache Tika project.

From the parse method of the class SourceCodeParser.

@Override
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    // CloseShieldInputStream keeps try-with-resources from closing the caller's stream.
    try (AutoDetectReader reader = new AutoDetectReader(new CloseShieldInputStream(stream), metadata, getEncodingDetector(context))) {
        Charset charset = reader.getCharset();
        String mediaType = metadata.get(Metadata.CONTENT_TYPE);
        String name = metadata.get(Metadata.RESOURCE_NAME_KEY);
        // Nothing is emitted unless both the content type and the resource name are known.
        if (mediaType != null && name != null) {
            MediaType type = MediaType.parse(mediaType);
            metadata.set(Metadata.CONTENT_TYPE, type.toString());
            metadata.set(Metadata.CONTENT_ENCODING, charset.name());
            StringBuilder out = new StringBuilder();
            String line;
            int nbLines = 0;
            // Single pass: buffer the source, collect authors, and count lines.
            while ((line = reader.readLine()) != null) {
                out.append(line).append(System.lineSeparator());
                String author = parserAuthor(line);
                if (author != null) {
                    metadata.add(TikaCoreProperties.CREATOR, author);
                }
                nbLines++;
            }
            metadata.set("LoC", String.valueOf(nbLines));
            // Pick a jhighlight Renderer for the media type and render the source as HTML
            // (false requests a complete HTML document rather than a fragment).
            Renderer renderer = getRenderer(type.toString());
            String codeAsHtml = renderer.highlight(name, out.toString(), charset.name(), false);
            // Feed the generated HTML through TagSoup so the handler receives SAX events.
            Schema schema = context.get(Schema.class, HTML_SCHEMA);
            org.ccil.cowan.tagsoup.Parser parser = new org.ccil.cowan.tagsoup.Parser();
            parser.setProperty(org.ccil.cowan.tagsoup.Parser.schemaProperty, schema);
            parser.setContentHandler(handler);
            parser.parse(new InputSource(new StringReader(codeAsHtml)));
        }
    }
}
Also used: InputSource (org.xml.sax.InputSource), HTMLSchema (org.ccil.cowan.tagsoup.HTMLSchema), Schema (org.ccil.cowan.tagsoup.Schema), Charset (java.nio.charset.Charset), AbstractEncodingDetectorParser (org.apache.tika.parser.AbstractEncodingDetectorParser), AutoDetectReader (org.apache.tika.detect.AutoDetectReader), Renderer (com.uwyn.jhighlight.renderer.Renderer), StringReader (java.io.StringReader), MediaType (org.apache.tika.mime.MediaType), CloseShieldInputStream (org.apache.commons.io.input.CloseShieldInputStream)
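
The read loop in the method above does three things in one pass: it buffers the text, counts lines, and asks a helper (parserAuthor, not shown in the snippet) for an author on each line. The following is a minimal standalone sketch of that loop using only the JDK. The class SourceScanSketch, the ScanResult type, and the "@author" regular expression are assumptions made for illustration; they are not taken from Tika, whose actual helper may behave differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SourceScanSketch {
    // Hypothetical pattern: captures the text after an "@author" tag, without trailing spaces.
    private static final Pattern AUTHOR =
            Pattern.compile("@author\\s+(.+?)\\s*$", Pattern.CASE_INSENSITIVE);

    /** Result of one pass over the source: buffered text, line count, authors found. */
    public static final class ScanResult {
        public final String text;
        public final int lineCount;
        public final List<String> authors;

        ScanResult(String text, int lineCount, List<String> authors) {
            this.text = text;
            this.lineCount = lineCount;
            this.authors = authors;
        }
    }

    /** Illustrative stand-in for parserAuthor: returns the author name, or null if none. */
    public static String parseAuthor(String line) {
        Matcher m = AUTHOR.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** Reads the source line by line, mirroring the structure of the loop above. */
    public static ScanResult scan(String source) {
        StringBuilder out = new StringBuilder();
        List<String> authors = new ArrayList<>();
        int nbLines = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(source))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append(System.lineSeparator());
                String author = parseAuthor(line);
                if (author != null) {
                    authors.add(author);
                }
                nbLines++;
            }
        } catch (IOException e) {
            // A StringReader should not fail; rethrow unchecked to keep the sketch simple.
            throw new UncheckedIOException(e);
        }
        return new ScanResult(out.toString(), nbLines, authors);
    }

    public static void main(String[] args) {
        ScanResult r = scan("/**\n * @author Jane Doe\n */\nclass A {}\n");
        System.out.println(r.lineCount + " lines, authors=" + r.authors);
    }
}
```

In the original method the line count and authors are written into Metadata ("LoC" and TikaCoreProperties.CREATOR) rather than returned; the sketch returns them in a plain object so it can run without Tika on the classpath.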
