
Example 1 with XSLFRelation

Use of org.apache.poi.xslf.usermodel.XSLFRelation in the Apache Tika project, in the method trySXSLF of the class OOXMLExtractorFactory.

private static POIXMLTextExtractor trySXSLF(OPCPackage pkg) throws XmlException, OpenXML4JException, IOException {
    // Look for the standard (Transitional) core document relationship first
    PackageRelationshipCollection packageRelationshipCollection = pkg.getRelationshipsByType("http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument");
    if (packageRelationshipCollection.size() == 0) {
        // Fall back to the OOXML Strict relationship type
        packageRelationshipCollection = pkg.getRelationshipsByType("http://purl.oclc.org/ooxml/officeDocument/relationships/officeDocument");
    }
    if (packageRelationshipCollection.size() == 0) {
        // No core document found: this package is not handled here
        return null;
    }
    // Identify the package by the content type of its core part
    PackagePart corePart = pkg.getPart(packageRelationshipCollection.getRelationship(0));
    String targetContentType = corePart.getContentType();
    XSLFRelation[] xslfRelations = org.apache.poi.xslf.extractor.XSLFPowerPointExtractor.SUPPORTED_TYPES;
    for (XSLFRelation xslfRelation : xslfRelations) {
        if (xslfRelation.getContentType().equals(targetContentType)) {
            return new XSLFEventBasedPowerPointExtractor(pkg);
        }
    }
    // Theme files are not in SUPPORTED_TYPES, so check for them separately
    if (XSLFRelation.THEME_MANAGER.getContentType().equals(targetContentType)) {
        return new XSLFEventBasedPowerPointExtractor(pkg);
    }
    return null;
}
Also used: PackageRelationshipCollection (org.apache.poi.openxml4j.opc.PackageRelationshipCollection), XSLFRelation (org.apache.poi.xslf.usermodel.XSLFRelation), PackagePart (org.apache.poi.openxml4j.opc.PackagePart), XSLFEventBasedPowerPointExtractor (org.apache.tika.parser.microsoft.ooxml.xslf.XSLFEventBasedPowerPointExtractor)
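
The lookup order in trySXSLF (standard relationship type, then the Strict one, then a content-type match) can be tried out without POI on the classpath. The following is a minimal sketch under stated assumptions, not Tika's or POI's implementation: the class CoreDocumentLookup, its methods, and the Map standing in for OPCPackage are hypothetical, and the content type in main is only an illustrative value. The two relationship-type URIs are the ones from the example.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class CoreDocumentLookup {
    // Relationship type URIs copied from the example above.
    public static final String CORE_REL =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
    public static final String STRICT_REL =
        "http://purl.oclc.org/ooxml/officeDocument/relationships/officeDocument";

    /**
     * Hypothetical stand-in for OPCPackage: the map goes from a relationship
     * type to the content types of the parts that relationship targets.
     */
    public static Optional<String> coreContentType(Map<String, List<String>> relationships) {
        List<String> targets = relationships.getOrDefault(CORE_REL, List.of());
        if (targets.isEmpty()) {
            // Fall back to the OOXML Strict relationship type.
            targets = relationships.getOrDefault(STRICT_REL, List.of());
        }
        return targets.isEmpty() ? Optional.empty() : Optional.of(targets.get(0));
    }

    /** True if a core part exists and its content type is in the supported list. */
    public static boolean isSupported(Map<String, List<String>> relationships,
                                      List<String> supportedTypes) {
        return coreContentType(relationships).map(supportedTypes::contains).orElse(false);
    }

    public static void main(String[] args) {
        // Illustrative content type for a presentation's main part.
        String pptx =
            "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml";
        Map<String, List<String>> strictPackage = Map.of(STRICT_REL, List.of(pptx));
        System.out.println(isSupported(strictPackage, List.of(pptx))); // prints true
        System.out.println(isSupported(Map.of(), List.of(pptx)));      // prints false
    }
}
```

Returning Optional.empty() here plays the role of the null returns in trySXSLF: the caller learns that this handler does not apply and can try another one.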

Example 2 with XSLFRelation

Use of org.apache.poi.xslf.usermodel.XSLFRelation in the Apache POI project, in the method createExtractor of the class ExtractorFactory.

/**
 * Tries to determine the actual type of the file and produces a matching text extractor for it.
 *
 * @param pkg An {@link OPCPackage}.
 * @return A {@link POIXMLTextExtractor} for the given file.
 * @throws IOException If an error occurs while reading the file.
 * @throws OpenXML4JException If an error is found while parsing the OpenXML file format.
 * @throws XmlException If an XML parsing error occurs.
 * @throws IllegalArgumentException If no matching file type could be found.
 */
public static POIXMLTextExtractor createExtractor(OPCPackage pkg) throws IOException, OpenXML4JException, XmlException {
    try {
        // Check for the normal Office core document
        PackageRelationshipCollection core;
        core = pkg.getRelationshipsByType(CORE_DOCUMENT_REL);
        // If nothing was found, try some of the other OOXML-based core types
        if (core.size() == 0) {
            // Could it be an OOXML-Strict one?
            core = pkg.getRelationshipsByType(STRICT_DOCUMENT_REL);
        }
        if (core.size() == 0) {
            // Could it be a Visio one?
            core = pkg.getRelationshipsByType(VISIO_DOCUMENT_REL);
            if (core.size() == 1) {
                return new XDGFVisioExtractor(pkg);
            }
        }
        // Should just be a single core document, complain if not
        if (core.size() != 1) {
            throw new IllegalArgumentException("Invalid OOXML Package received - expected 1 core document, found " + core.size());
        }
        // Grab the core document part, and try to identify from that
        final PackagePart corePart = pkg.getPart(core.getRelationship(0));
        final String contentType = corePart.getContentType();
        // Is it XSSF?
        for (XSSFRelation rel : XSSFExcelExtractor.SUPPORTED_TYPES) {
            if (rel.getContentType().equals(contentType)) {
                if (getPreferEventExtractor()) {
                    return new XSSFEventBasedExcelExtractor(pkg);
                }
                return new XSSFExcelExtractor(pkg);
            }
        }
        // Is it XWPF?
        for (XWPFRelation rel : XWPFWordExtractor.SUPPORTED_TYPES) {
            if (rel.getContentType().equals(contentType)) {
                return new XWPFWordExtractor(pkg);
            }
        }
        // Is it XSLF?
        for (XSLFRelation rel : XSLFPowerPointExtractor.SUPPORTED_TYPES) {
            if (rel.getContentType().equals(contentType)) {
                return new XSLFPowerPointExtractor(pkg);
            }
        }
        // Special handling for slideshow theme files
        if (XSLFRelation.THEME_MANAGER.getContentType().equals(contentType)) {
            return new XSLFPowerPointExtractor(new XSLFSlideShow(pkg));
        }
        // How about xlsb?
        for (XSSFRelation rel : XSSFBEventBasedExcelExtractor.SUPPORTED_TYPES) {
            if (rel.getContentType().equals(contentType)) {
                return new XSSFBEventBasedExcelExtractor(pkg);
            }
        }
        throw new IllegalArgumentException("No supported documents found in the OOXML package (found " + contentType + ")");
    } catch (IOException | OpenXML4JException | XmlException | RuntimeException e) {
        // Ensure the package is closed again if an error occurs while opening it. Use revert()
        // rather than close() so the file is not rewritten, which is very likely unwanted for a text extractor.
        pkg.revert();
        throw e;
    }
}
Also used: XSSFRelation (org.apache.poi.xssf.usermodel.XSSFRelation), XDGFVisioExtractor (org.apache.poi.xdgf.extractor.XDGFVisioExtractor), XSSFBEventBasedExcelExtractor (org.apache.poi.xssf.extractor.XSSFBEventBasedExcelExtractor), PackageRelationshipCollection (org.apache.poi.openxml4j.opc.PackageRelationshipCollection), XSSFExcelExtractor (org.apache.poi.xssf.extractor.XSSFExcelExtractor), XWPFWordExtractor (org.apache.poi.xwpf.extractor.XWPFWordExtractor), IOException (java.io.IOException), PackagePart (org.apache.poi.openxml4j.opc.PackagePart), XSLFSlideShow (org.apache.poi.xslf.usermodel.XSLFSlideShow), XWPFRelation (org.apache.poi.xwpf.usermodel.XWPFRelation), OpenXML4JException (org.apache.poi.openxml4j.exceptions.OpenXML4JException), XSSFEventBasedExcelExtractor (org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor), XSLFPowerPointExtractor (org.apache.poi.xslf.extractor.XSLFPowerPointExtractor), XmlException (org.apache.xmlbeans.XmlException), XSLFRelation (org.apache.poi.xslf.usermodel.XSLFRelation)
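
The error handling in createExtractor follows one pattern: on any failure, revert the package so nothing is written back, then rethrow the original exception unchanged. The following is a minimal sketch of that pattern in isolation, using only the standard library. The class RevertOnFailure and the interfaces Revertible and Step are hypothetical stand-ins for illustration and are not part of POI.

```java
import java.io.IOException;

class RevertOnFailure {
    /** Hypothetical stand-in for a package that can discard pending changes. */
    interface Revertible {
        void revert();
    }

    /** Hypothetical unit of work that may fail with a checked or unchecked exception. */
    interface Step<T> {
        T run() throws IOException;
    }

    /** Runs the step; on failure, reverts the resource and rethrows the same exception. */
    public static <T> T runOrRevert(Revertible resource, Step<T> step) throws IOException {
        try {
            return step.run();
        } catch (IOException | RuntimeException e) {
            // Discard changes instead of closing, so nothing is written back.
            resource.revert();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        int[] reverts = {0};
        Revertible pkg = () -> reverts[0]++;

        // Success path: the resource is left untouched.
        System.out.println(runOrRevert(pkg, () -> "text"));

        // Failure path: the resource is reverted once, then the exception propagates.
        try {
            runOrRevert(pkg, () -> { throw new IllegalArgumentException("no match"); });
        } catch (IllegalArgumentException e) {
            System.out.println("reverts=" + reverts[0] + ", cause=" + e.getMessage());
        }
    }
}
```

Because the exception variable in a multi-catch clause is implicitly final, `throw e` rethrows the original exception with its precise type, so the method only needs to declare the checked types that the try block can actually throw.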

Aggregations

PackagePart (org.apache.poi.openxml4j.opc.PackagePart): 2
PackageRelationshipCollection (org.apache.poi.openxml4j.opc.PackageRelationshipCollection): 2
XSLFRelation (org.apache.poi.xslf.usermodel.XSLFRelation): 2
IOException (java.io.IOException): 1
OpenXML4JException (org.apache.poi.openxml4j.exceptions.OpenXML4JException): 1
XDGFVisioExtractor (org.apache.poi.xdgf.extractor.XDGFVisioExtractor): 1
XSLFPowerPointExtractor (org.apache.poi.xslf.extractor.XSLFPowerPointExtractor): 1
XSLFSlideShow (org.apache.poi.xslf.usermodel.XSLFSlideShow): 1
XSSFBEventBasedExcelExtractor (org.apache.poi.xssf.extractor.XSSFBEventBasedExcelExtractor): 1
XSSFEventBasedExcelExtractor (org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor): 1
XSSFExcelExtractor (org.apache.poi.xssf.extractor.XSSFExcelExtractor): 1
XSSFRelation (org.apache.poi.xssf.usermodel.XSSFRelation): 1
XWPFWordExtractor (org.apache.poi.xwpf.extractor.XWPFWordExtractor): 1
XWPFRelation (org.apache.poi.xwpf.usermodel.XWPFRelation): 1
XSLFEventBasedPowerPointExtractor (org.apache.tika.parser.microsoft.ooxml.xslf.XSLFEventBasedPowerPointExtractor): 1
XmlException (org.apache.xmlbeans.XmlException): 1