
Example 1 with PSTFile

Use of com.pff.PSTFile in the Xponents project by OpenSextant.

Source: class OutlookPSTCrawler, method collect.

@Override
public void collect() throws IOException, ConfigException {
    //
    // Logic:  Traverse PST file.
    //     it contains mail, contacts, tasks, notes, other stuff?
    //
    // Replicate folder structure discovered.
    // Mail and date-oriented items should be filed by date. For now, YYYY-MM-DD is fine.
    //
    // For mail messages, review DefaultMailCrawler:
    //  - for each message
    //    save message to disk; create a parent folder to contain the message contents
    //    run text conversion individually on attachments.
    //
    //  - structure:
    //    ./Mail/
    //         2014-04-09/messageABC.eml
    //         2014-04-09/messageABC/attachment1.doc
    log.info("Traversing PST Folders for FILE={}", pst);
    try {
        PSTFile pstStore = new PSTFile(pst);
        processFolder(pstStore.getRootFolder());
    } catch (PSTException err) {
        throw new ConfigException("Failure with PST traversal", err);
    }
}
Also used: PSTFile (com.pff.PSTFile), PSTException (com.pff.PSTException), ConfigException (org.opensextant.ConfigException)
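The comments in collect() describe the intended output layout: replicate the PST folder structure, file each message under a YYYY-MM-DD directory, and give each message a sibling directory for its attachments. A minimal sketch of planning those paths with a depth-first walk follows. It is an illustration under stated assumptions, not the Xponents implementation: the Folder and Message records are hypothetical stand-ins (they are not java-libpst classes), and walk() only computes paths and writes nothing to disk. Records require Java 16 or later.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class PstLayoutSketch {

    // Hypothetical stand-ins for a mail folder tree; these are NOT java-libpst types.
    record Message(String id, LocalDate sent, List<String> attachments) {}
    record Folder(String name, List<Message> messages, List<Folder> subFolders) {}

    // Message body goes to <folderDir>/<YYYY-MM-DD>/<id>.eml
    // (LocalDate.toString() yields the ISO-8601 form, e.g. 2014-04-09).
    static Path messagePath(Path folderDir, Message m) {
        return folderDir.resolve(m.sent().toString()).resolve(m.id() + ".eml");
    }

    // Attachments go to <folderDir>/<YYYY-MM-DD>/<id>/<attachment name>.
    static Path attachmentPath(Path folderDir, Message m, String attachment) {
        return folderDir.resolve(m.sent().toString()).resolve(m.id()).resolve(attachment);
    }

    // Depth-first walk that mirrors the folder hierarchy under parentDir,
    // collecting every planned output path into `out`.
    static void walk(Folder folder, Path parentDir, List<Path> out) {
        Path dir = parentDir.resolve(folder.name());
        for (Message m : folder.messages()) {
            out.add(messagePath(dir, m));
            for (String a : m.attachments()) {
                out.add(attachmentPath(dir, m, a));
            }
        }
        for (Folder sub : folder.subFolders()) {
            walk(sub, dir, out);
        }
    }

    public static void main(String[] args) {
        // Mirrors the layout example in the collect() comments.
        Message msg = new Message("messageABC", LocalDate.of(2014, 4, 9), List.of("attachment1.doc"));
        Folder mail = new Folder("Mail", List.of(msg), List.of());
        List<Path> planned = new ArrayList<>();
        walk(mail, Paths.get("."), planned);
        planned.forEach(System.out::println);
    }
}
```

Separating path planning from writing keeps the traversal easy to test; a real crawler would create the directories and save each message and attachment at the planned locations.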

Example 2 with PSTFile

Use of com.pff.PSTFile in the Tika project by Apache.

Source: class OutlookPSTParser, method parse.

public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    // Use the delegate parser to parse the contained document
    EmbeddedDocumentExtractor embeddedExtractor = EmbeddedDocumentUtil.getEmbeddedDocumentExtractor(context);
    metadata.set(Metadata.CONTENT_TYPE, MS_OUTLOOK_PST_MIMETYPE.toString());
    XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    xhtml.startDocument();
    TikaInputStream in = TikaInputStream.get(stream);
    PSTFile pstFile = null;
    try {
        pstFile = new PSTFile(in.getFile().getPath());
        metadata.set(Metadata.CONTENT_LENGTH, valueOf(pstFile.getFileHandle().length()));
        boolean isValid = pstFile.getFileHandle().getFD().valid();
        metadata.set("isValid", valueOf(isValid));
        if (isValid) {
            parseFolder(xhtml, pstFile.getRootFolder(), embeddedExtractor);
        }
    } catch (Exception e) {
        throw new TikaException(e.getMessage(), e);
    } finally {
        if (pstFile != null && pstFile.getFileHandle() != null) {
            try {
                pstFile.getFileHandle().close();
            } catch (IOException e) {
                // swallow closing exception
            }
        }
    }
    xhtml.endDocument();
}
Also used: TikaException (org.apache.tika.exception.TikaException), EmbeddedDocumentExtractor (org.apache.tika.extractor.EmbeddedDocumentExtractor), PSTFile (com.pff.PSTFile), TikaInputStream (org.apache.tika.io.TikaInputStream), IOException (java.io.IOException), XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler), PSTException (com.pff.PSTException), SAXException (org.xml.sax.SAXException)
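The Tika parser reads the file length and descriptor validity through pstFile.getFileHandle() and closes the handle in a finally block that swallows any IOException. Assuming, as those calls suggest, that getFileHandle() returns a java.io.RandomAccessFile, the same checks can be sketched with plain java.io and try-with-resources. This is a stand-in, not Tika or java-libpst code: it opens an ordinary file rather than a PST, and FileHandleSketch and HandleInfo are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

public class FileHandleSketch {

    // Hypothetical holder for the two values the parser records as metadata.
    record HandleInfo(long length, boolean valid) {}

    // Opens the file read-only, reads its length and whether its descriptor is valid,
    // then closes it. try-with-resources closes the handle even if the body throws.
    static HandleInfo inspect(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            return new HandleInfo(raf.length(), raf.getFD().valid());
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file so the sketch runs without a real PST.
        File tmp = File.createTempFile("sketch", ".bin");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), new byte[] {1, 2, 3, 4});
        HandleInfo info = inspect(tmp);
        System.out.println("length=" + info.length() + " valid=" + info.valid());
    }
}
```

One design difference from the Tika snippet: try-with-resources does not silently swallow a failure in close(). If the body completed normally, the close() exception is thrown; if the body already threw, the close() exception is attached to it as a suppressed exception.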

Aggregations (number of examples above that list each class under "Also used")

PSTException (com.pff.PSTException): 2
PSTFile (com.pff.PSTFile): 2
IOException (java.io.IOException): 1
TikaException (org.apache.tika.exception.TikaException): 1
EmbeddedDocumentExtractor (org.apache.tika.extractor.EmbeddedDocumentExtractor): 1
TikaInputStream (org.apache.tika.io.TikaInputStream): 1
XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler): 1
ConfigException (org.opensextant.ConfigException): 1
SAXException (org.xml.sax.SAXException): 1