Example 6 with HwmfPicture

Use of org.apache.poi.hwmf.usermodel.HwmfPicture in project tika by apache.

The class WMFParser, method parse.

@Override
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    xhtml.startDocument();
    try {
        HwmfPicture picture = new HwmfPicture(stream);
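        //Default charset; declared outside the loop so that a charset taken from a
        //createFontIndirect record still applies to the text records that follow
        Charset charset = LocaleUtil.CHARSET_1252;
        //to determine when to keep two text parts on the same line
        for (HwmfRecord record : picture.getRecords()) {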
            //This fix should be done within POI
            if (record.getRecordType().equals(HwmfRecordType.createFontIndirect)) {
                HwmfFont font = ((HwmfText.WmfCreateFontIndirect) record).getFont();
                charset = (font.getCharSet() == null || font.getCharSet().getCharset() == null) ? LocaleUtil.CHARSET_1252 : font.getCharSet().getCharset();
            }
            if (record.getRecordType().equals(HwmfRecordType.extTextOut)) {
                HwmfText.WmfExtTextOut textOut = (HwmfText.WmfExtTextOut) record;
                xhtml.startElement("p");
                xhtml.characters(textOut.getText(charset));
                xhtml.endElement("p");
            } else if (record.getRecordType().equals(HwmfRecordType.textOut)) {
                HwmfText.WmfTextOut textOut = (HwmfText.WmfTextOut) record;
                xhtml.startElement("p");
                xhtml.characters(textOut.getText(charset));
                xhtml.endElement("p");
            }
        }
    } catch (RecordFormatException e) {
        //POI's hwmfparser can throw these for "parse exceptions"
        throw new TikaException(e.getMessage(), e);
    } catch (RuntimeException e) {
        //wrap any other RuntimeException in a TikaException
        throw new TikaException(e.getMessage(), e);
    } catch (AssertionError e) {
        //POI's hwmfparser can throw these for parse exceptions
        throw new TikaException(e.getMessage(), e);
    }
    xhtml.endDocument();
}
Also used: TikaException (org.apache.tika.exception.TikaException), HwmfRecord (org.apache.poi.hwmf.record.HwmfRecord), Charset (java.nio.charset.Charset), HwmfText (org.apache.poi.hwmf.record.HwmfText), XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler), HwmfFont (org.apache.poi.hwmf.record.HwmfFont), HwmfPicture (org.apache.poi.hwmf.usermodel.HwmfPicture), RecordFormatException (org.apache.poi.util.RecordFormatException)
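
The parse method above tracks a "current" charset that falls back to a default whenever a font record supplies none, and uses it to decode later text records. The following is a minimal, library-free sketch of that pattern, not the Tika or POI implementation. The types Rec, FontRec and TextRec are hypothetical stand-ins for the POI record classes, and ISO-8859-1 is used as a stand-in for LocaleUtil.CHARSET_1252 so that the sketch depends only on the JDK. It assumes the charset is meant to carry over from a font record to the text records that follow it, which requires declaring the variable outside the loop.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CharsetTrackingSketch {

    // Hypothetical stand-in for a parsed record; not a POI type.
    interface Rec {}

    // Stand-in for a "create font" record; its charset may be null.
    static final class FontRec implements Rec {
        final Charset charset;
        FontRec(Charset charset) { this.charset = charset; }
    }

    // Stand-in for a text record that holds undecoded bytes.
    static final class TextRec implements Rec {
        final byte[] raw;
        TextRec(byte[] raw) { this.raw = raw; }
    }

    // Assumption: ISO-8859-1 substitutes for the windows-1252 default.
    static final Charset DEFAULT = StandardCharsets.ISO_8859_1;

    static List<String> extractText(List<Rec> records) {
        // Declared once, outside the loop, so the value survives across records.
        Charset charset = DEFAULT;
        List<String> out = new ArrayList<>();
        for (Rec r : records) {
            if (r instanceof FontRec) {
                // A font record changes the charset for later text records;
                // a missing charset resets it to the default.
                Charset c = ((FontRec) r).charset;
                charset = (c == null) ? DEFAULT : c;
            } else if (r instanceof TextRec) {
                out.add(new String(((TextRec) r).raw, charset));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 encoding of U+00E9
        List<Rec> records = Arrays.asList(
                new TextRec(bytes),                  // decoded with the default
                new FontRec(StandardCharsets.UTF_8), // switches to UTF-8
                new TextRec(bytes),                  // decoded as UTF-8
                new FontRec(null),                   // falls back to the default
                new TextRec(bytes));
        System.out.println(extractText(records));
    }
}
```

If the charset variable were instead re-initialized at the top of each iteration, the value set by a font record would be discarded before any text record could use it.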

Aggregations

HwmfPicture (org.apache.poi.hwmf.usermodel.HwmfPicture): 6
File (java.io.File): 5
FileInputStream (java.io.FileInputStream): 5
HwmfRecord (org.apache.poi.hwmf.record.HwmfRecord): 5
Test (org.junit.Test): 5
Ignore (org.junit.Ignore): 4
Charset (java.nio.charset.Charset): 3
HwmfFont (org.apache.poi.hwmf.record.HwmfFont): 3
HwmfText (org.apache.poi.hwmf.record.HwmfText): 3
Dimension (java.awt.Dimension): 2
Graphics2D (java.awt.Graphics2D): 2
BufferedImage (java.awt.image.BufferedImage): 2
Rectangle2D (java.awt.geom.Rectangle2D): 1
FileFilter (java.io.FileFilter): 1
IOException (java.io.IOException): 1
HwmfImageRecord (org.apache.poi.hwmf.record.HwmfFill.HwmfImageRecord): 1
RecordFormatException (org.apache.poi.util.RecordFormatException): 1
TikaException (org.apache.tika.exception.TikaException): 1
XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler): 1