
Example 46 with PackageRelationship

Use of org.apache.poi.openxml4j.opc.PackageRelationship in the Apache Tika project.

Class SXWPFWordExtractorDecorator, method loadNumbering.

private XWPFNumbering loadNumbering(PackagePart packagePart) {
    try {
        PackageRelationshipCollection numberingParts = packagePart.getRelationshipsByType(XWPFRelation.NUMBERING.getRelation());
        if (numberingParts.size() > 0) {
            PackageRelationship numberingRelationShip = numberingParts.getRelationship(0);
            if (numberingRelationShip == null) {
                return null;
            }
            PackagePart numberingPart = packagePart.getRelatedPart(numberingRelationShip);
            if (numberingPart == null) {
                return null;
            }
            return new XWPFNumberingShim(numberingPart);
        }
    } catch (IOException | OpenXML4JException e) {
        // Deliberately swallowed: on failure, fall through and return null (no numbering)
    }
    return null;
}
Also used : PackageRelationship(org.apache.poi.openxml4j.opc.PackageRelationship) OpenXML4JException(org.apache.poi.openxml4j.exceptions.OpenXML4JException) PackageRelationshipCollection(org.apache.poi.openxml4j.opc.PackageRelationshipCollection) IOException(java.io.IOException) PackagePart(org.apache.poi.openxml4j.opc.PackagePart) XWPFNumberingShim(org.apache.tika.parser.microsoft.ooxml.xwpf.XWPFNumberingShim)

Example 47 with PackageRelationship

Use of org.apache.poi.openxml4j.opc.PackageRelationship in the Apache Tika project.

Class SXWPFWordExtractorDecorator, method loadStyles.

private XWPFStylesShim loadStyles(PackagePart packagePart) throws InvalidFormatException, TikaException, IOException, SAXException {
    PackageRelationshipCollection stylesParts = packagePart.getRelationshipsByType(XWPFRelation.STYLES.getRelation());
    if (stylesParts.size() > 0) {
        PackageRelationship stylesRelationShip = stylesParts.getRelationship(0);
        if (stylesRelationShip == null) {
            return null;
        }
        PackagePart stylesPart = packagePart.getRelatedPart(stylesRelationShip);
        if (stylesPart == null) {
            return null;
        }
        return new XWPFStylesShim(stylesPart, context);
    }
    return null;
}
Also used : PackageRelationship(org.apache.poi.openxml4j.opc.PackageRelationship) PackageRelationshipCollection(org.apache.poi.openxml4j.opc.PackageRelationshipCollection) XWPFStylesShim(org.apache.tika.parser.microsoft.ooxml.xwpf.XWPFStylesShim) PackagePart(org.apache.poi.openxml4j.opc.PackagePart)
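
Both loadNumbering and loadStyles follow the same shape: take the first relationship of a given type, resolve it to a part, and return null if either step yields nothing. A minimal sketch of that shape, written against plain generics rather than the POI types; `FirstRelatedSketch`, `firstRelated`, and its parameters are hypothetical names for illustration and are not part of POI or Tika:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class FirstRelatedSketch {

    /**
     * Resolves the first element of a relationship list, if any.
     * R stands in for a relationship type and P for a part type (both assumptions).
     */
    public static <R, P> Optional<P> firstRelated(List<R> relationships, Function<R, P> resolve) {
        // No relationships of the requested type: nothing to load
        if (relationships == null || relationships.isEmpty()) {
            return Optional.empty();
        }
        R first = relationships.get(0);
        if (first == null) {
            return Optional.empty();
        }
        // The resolver may itself return null when the target part is missing
        return Optional.ofNullable(resolve.apply(first));
    }
}
```

Returning an Optional instead of null is a design choice of this sketch only; the methods above return null directly.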

Example 48 with PackageRelationship

Use of org.apache.poi.openxml4j.opc.PackageRelationship in the Apache Tika project.

Class AbstractOOXMLExtractor, method loadLinkedRelationships.

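/**
 * This is used by the SAX docx and pptx decorators to load hyperlinks and
 * other linked objects.
 *
 * @param bodyPart part whose relationships are read
 * @param includeInternal whether hyperlinks with an internal target mode are included
 * @param metadata metadata on which an InvalidFormatException is recorded
 * @return map from relationship id to the hyperlink target URL or, for
 *         embedded relationships, the target file name
 */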
protected Map<String, String> loadLinkedRelationships(PackagePart bodyPart, boolean includeInternal, Metadata metadata) {
    Map<String, String> linkedRelationships = new HashMap<>();
    try {
        PackageRelationshipCollection prc = bodyPart.getRelationshipsByType(XWPFRelation.HYPERLINK.getRelation());
        for (int i = 0; i < prc.size(); i++) {
            PackageRelationship pr = prc.getRelationship(i);
            if (pr == null) {
                continue;
            }
            if (!includeInternal && TargetMode.INTERNAL.equals(pr.getTargetMode())) {
                continue;
            }
            String id = pr.getId();
            String url = (pr.getTargetURI() == null) ? null : pr.getTargetURI().toString();
            if (id != null && url != null) {
                linkedRelationships.put(id, url);
            }
        }
        for (String rel : EMBEDDED_RELATIONSHIPS) {
            prc = bodyPart.getRelationshipsByType(rel);
            for (int i = 0; i < prc.size(); i++) {
                PackageRelationship pr = prc.getRelationship(i);
                if (pr == null) {
                    continue;
                }
                String id = pr.getId();
                String uriString = (pr.getTargetURI() == null) ? null : pr.getTargetURI().toString();
                String fileName = uriString;
                if (pr.getTargetURI() != null) {
                    try {
                        fileName = FileHelper.getFilename(new File(fileName));
                    } catch (Exception e) {
                        fileName = uriString;
                    }
                }
                if (id != null) {
                    fileName = (fileName == null) ? "" : fileName;
                    linkedRelationships.put(id, fileName);
                }
            }
        }
    } catch (InvalidFormatException e) {
        EmbeddedDocumentUtil.recordEmbeddedStreamException(e, metadata);
    }
    return linkedRelationships;
}
Also used : PackageRelationship(org.apache.poi.openxml4j.opc.PackageRelationship) HashMap(java.util.HashMap) PackageRelationshipCollection(org.apache.poi.openxml4j.opc.PackageRelationshipCollection) File(java.io.File) InvalidFormatException(org.apache.poi.openxml4j.exceptions.InvalidFormatException) Ole10NativeException(org.apache.poi.poifs.filesystem.Ole10NativeException) TikaException(org.apache.tika.exception.TikaException) IOException(java.io.IOException) FileNotFoundException(java.io.FileNotFoundException) XmlException(org.apache.xmlbeans.XmlException) SAXException(org.xml.sax.SAXException)
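
The hyperlink half of loadLinkedRelationships reduces to: skip null entries, optionally skip internal targets, and map each id to its target string. The embedded half maps each id to a file name derived from the target. A minimal, dependency-free sketch of both ideas; `Rel`, `collectLinks`, and `fileNameOf` are hypothetical stand-ins, and `fileNameOf` only approximates what a helper such as FileHelper.getFilename might return:

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class LinkedTargetsSketch {

    /** Hypothetical stand-in for a relationship: id, target URI, internal flag. */
    public static final class Rel {
        final String id;
        final URI target;
        final boolean internal;

        public Rel(String id, URI target, boolean internal) {
            this.id = id;
            this.target = target;
            this.internal = internal;
        }
    }

    /** Maps relationship id to target string, optionally skipping internal targets. */
    public static Map<String, String> collectLinks(Iterable<Rel> rels, boolean includeInternal) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Rel r : rels) {
            if (r == null || r.id == null || r.target == null) {
                continue; // incomplete entries are ignored
            }
            if (!includeInternal && r.internal) {
                continue;
            }
            out.put(r.id, r.target.toString());
        }
        return out;
    }

    /** Last path segment of the target, the whole string if there is none, "" for null. */
    public static String fileNameOf(URI target) {
        if (target == null) {
            return "";
        }
        String s = target.toString();
        int slash = s.lastIndexOf('/');
        return (slash >= 0 && slash < s.length() - 1) ? s.substring(slash + 1) : s;
    }
}
```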

Example 49 with PackageRelationship

Use of org.apache.poi.openxml4j.opc.PackageRelationship in the Apache Tika project.

Class AbstractOOXMLExtractor, method handleThumbnail.

private void handleThumbnail(ContentHandler handler) {
    try {
        OPCPackage opcPackage = extractor.getPackage();
        for (PackageRelationship rel : opcPackage.getRelationshipsByType(PackageRelationshipTypes.THUMBNAIL)) {
            PackagePart tPart = opcPackage.getPart(rel);
            // try-with-resources closes the part's stream even if parsing throws
            try (InputStream tStream = tPart.getInputStream()) {
                Metadata thumbnailMetadata = new Metadata();
                String thumbName = tPart.getPartName().getName();
                thumbnailMetadata.set(Metadata.RESOURCE_NAME_KEY, thumbName);
                // Emit an empty placeholder <div class="embedded" id="..."/> for the thumbnail
                AttributesImpl attributes = new AttributesImpl();
                attributes.addAttribute(XHTML, "class", "class", "CDATA", "embedded");
                attributes.addAttribute(XHTML, "id", "id", "CDATA", thumbName);
                handler.startElement(XHTML, "div", "div", attributes);
                handler.endElement(XHTML, "div", "div");
                thumbnailMetadata.set(Metadata.EMBEDDED_RELATIONSHIP_ID, thumbName);
                thumbnailMetadata.set(Metadata.CONTENT_TYPE, tPart.getContentType());
                thumbnailMetadata.set(TikaCoreProperties.TITLE, thumbName);
                if (embeddedExtractor.shouldParseEmbedded(thumbnailMetadata)) {
                    embeddedExtractor.parseEmbedded(TikaInputStream.get(tStream), new EmbeddedContentHandler(handler), thumbnailMetadata, false);
                }
            }
        }
    } catch (Exception ex) {
        // Any failure while handling a thumbnail is ignored here
    }
}
Also used : PackageRelationship(org.apache.poi.openxml4j.opc.PackageRelationship) AttributesImpl(org.xml.sax.helpers.AttributesImpl) TikaInputStream(org.apache.tika.io.TikaInputStream) InputStream(java.io.InputStream) Metadata(org.apache.tika.metadata.Metadata) EmbeddedContentHandler(org.apache.tika.sax.EmbeddedContentHandler) PackagePart(org.apache.poi.openxml4j.opc.PackagePart) OPCPackage(org.apache.poi.openxml4j.opc.OPCPackage) Ole10NativeException(org.apache.poi.poifs.filesystem.Ole10NativeException) TikaException(org.apache.tika.exception.TikaException) InvalidFormatException(org.apache.poi.openxml4j.exceptions.InvalidFormatException) IOException(java.io.IOException) FileNotFoundException(java.io.FileNotFoundException) XmlException(org.apache.xmlbeans.XmlException) SAXException(org.xml.sax.SAXException)
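
handleThumbnail opens one stream per thumbnail part and hands it to a consumer that may throw. Closing such a stream reliably can be sketched with try-with-resources. This is an illustrative, dependency-free sketch; `StreamCloseSketch`, `TrackingStream`, `StreamConsumer`, and `processQuietly` are hypothetical names, not Tika or POI API:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamCloseSketch {

    /** In-memory stream that records whether close() was called. */
    public static final class TrackingStream extends ByteArrayInputStream {
        public boolean closed;

        public TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Hypothetical consumer that is allowed to throw, like an embedded parser. */
    public interface StreamConsumer {
        void accept(InputStream in) throws Exception;
    }

    /** Runs the consumer; the stream is closed whether or not the consumer throws. */
    public static boolean processQuietly(InputStream stream, StreamConsumer consumer) {
        try (InputStream in = stream) {
            consumer.accept(in);
            return true;
        } catch (Exception e) {
            // Best-effort handling: report failure instead of propagating
            return false;
        }
    }
}
```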

Aggregations

PackageRelationship (org.apache.poi.openxml4j.opc.PackageRelationship) 49
PackagePart (org.apache.poi.openxml4j.opc.PackagePart) 27
InvalidFormatException (org.apache.poi.openxml4j.exceptions.InvalidFormatException) 21
PackageRelationshipCollection (org.apache.poi.openxml4j.opc.PackageRelationshipCollection) 14
PackagePartName (org.apache.poi.openxml4j.opc.PackagePartName) 13
IOException (java.io.IOException) 10
POIXMLException (org.apache.poi.POIXMLException) 8
TikaException (org.apache.tika.exception.TikaException) 5
ArrayList (java.util.ArrayList) 4
OPCPackage (org.apache.poi.openxml4j.opc.OPCPackage) 4
XmlException (org.apache.xmlbeans.XmlException) 4
Test (org.junit.Test) 4
URI (java.net.URI) 3
HashMap (java.util.HashMap) 3
OpenXML4JException (org.apache.poi.openxml4j.exceptions.OpenXML4JException) 3
SAXException (org.xml.sax.SAXException) 3
ByteArrayOutputStream (java.io.ByteArrayOutputStream) 2
FileNotFoundException (java.io.FileNotFoundException) 2
InputStream (java.io.InputStream) 2
XMLSignatureException (javax.xml.crypto.dsig.XMLSignatureException) 2