Example 1 with CloseShieldInputStream

Use of org.apache.commons.io.input.CloseShieldInputStream in the sling project by apache.

Class ZipReader, method parse.

/**
 * @see org.apache.sling.jcr.contentloader.ContentReader#parse(java.io.InputStream, org.apache.sling.jcr.contentloader.ContentCreator)
 */
public void parse(InputStream ins, ContentCreator creator) throws IOException, RepositoryException {
    try {
        creator.createNode(null, NT_FOLDER, null);
        final ZipInputStream zis = new ZipInputStream(ins);
        ZipEntry entry;
        do {
            entry = zis.getNextEntry();
            if (entry != null) {
                if (!entry.isDirectory()) {
                    String name = entry.getName();
                    int pos = name.lastIndexOf('/');
                    if (pos != -1) {
                        creator.switchCurrentNode(name.substring(0, pos), NT_FOLDER);
                    }
                    // Shield zis so that a close() on the per-entry stream cannot close the whole ZIP stream.
                    creator.createFileAndResourceNode(name, new CloseShieldInputStream(zis), null, entry.getTime());
                    creator.finishNode();
                    creator.finishNode();
                    if (pos != -1) {
                        creator.finishNode();
                    }
                }
                zis.closeEntry();
            }
        } while (entry != null);
        creator.finishNode();
    } finally {
        if (ins != null) {
            try {
                ins.close();
            } catch (IOException ignore) {
            }
        }
    }
}
Also used: ZipInputStream(java.util.zip.ZipInputStream) ZipEntry(java.util.zip.ZipEntry) IOException(java.io.IOException) CloseShieldInputStream(org.apache.commons.io.input.CloseShieldInputStream)
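
The pattern in Example 1 can be reproduced with the JDK alone. The following is a minimal sketch, not the commons-io implementation: NonClosingInputStream is a hypothetical stand-in for CloseShieldInputStream, and readAndClose is a hypothetical consumer that closes whatever stream it receives. Because close() on the wrapper does nothing, the loop can keep calling getNextEntry() on the same ZipInputStream after each entry has been consumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class CloseShieldSketch {

    /** Hypothetical stand-in for CloseShieldInputStream: close() never reaches the wrapped stream. */
    static final class NonClosingInputStream extends FilterInputStream {
        NonClosingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Intentionally empty: the owner of the wrapped stream decides when to close it.
        }
    }

    /** Hypothetical consumer that reads a stream to its end and then closes it. */
    static String readAndClose(InputStream in) throws IOException {
        try (InputStream s = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = s.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Builds a small in-memory ZIP archive with two text entries. */
    static byte[] buildZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write("first".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("b.txt"));
            zos.write("second".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Reads every entry, handing each one to a consumer that closes its input. */
    static List<String> readAllEntries(byte[] zip) throws IOException {
        List<String> contents = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                // The wrapper absorbs the consumer's close(), so zis stays open for the next entry.
                contents.add(entry.getName() + "=" + readAndClose(new NonClosingInputStream(zis)));
                zis.closeEntry();
            }
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAllEntries(buildZip()));
    }
}
```

If zis were passed to readAndClose directly, the first entry's close() would close the archive stream and the next getNextEntry() call would fail.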

Example 2 with CloseShieldInputStream

Use of org.apache.commons.io.input.CloseShieldInputStream in the gradle project by gradle.

Class TarTaskOutputPacker, method unpack.

private void unpack(TaskOutputsInternal taskOutputs, TarInputStream tarInput, TaskOutputOriginReader readOriginAction) throws IOException {
    Map<String, TaskOutputFilePropertySpec> propertySpecs = Maps.uniqueIndex(taskOutputs.getFileProperties(), new Function<TaskFilePropertySpec, String>() {

        @Override
        public String apply(TaskFilePropertySpec propertySpec) {
            return propertySpec.getPropertyName();
        }
    });
    boolean originSeen = false;
    TarEntry entry;
    while ((entry = tarInput.getNextEntry()) != null) {
        String name = entry.getName();
        if (name.equals(METADATA_PATH)) {
            // handle origin metadata
            originSeen = true;
            readOriginAction.execute(new CloseShieldInputStream(tarInput));
        } else {
            // handle output property
            Matcher matcher = PROPERTY_PATH.matcher(name);
            if (!matcher.matches()) {
                throw new IllegalStateException("Cached result format error, invalid contents: " + name);
            }
            String propertyName = matcher.group(2);
            CacheableTaskOutputFilePropertySpec propertySpec = (CacheableTaskOutputFilePropertySpec) propertySpecs.get(propertyName);
            if (propertySpec == null) {
                throw new IllegalStateException(String.format("No output property '%s' registered", propertyName));
            }
            boolean outputMissing = matcher.group(1) != null;
            String childPath = matcher.group(3);
            unpackPropertyEntry(propertySpec, tarInput, entry, childPath, outputMissing);
        }
    }
    if (!originSeen) {
        throw new IllegalStateException("Cached result format error, no origin metadata was found.");
    }
}
Also used: CacheableTaskOutputFilePropertySpec(org.gradle.api.internal.tasks.CacheableTaskOutputFilePropertySpec) TaskOutputFilePropertySpec(org.gradle.api.internal.tasks.TaskOutputFilePropertySpec) Matcher(java.util.regex.Matcher) TaskFilePropertySpec(org.gradle.api.internal.tasks.TaskFilePropertySpec) TarEntry(org.apache.tools.tar.TarEntry) CloseShieldInputStream(org.apache.commons.io.input.CloseShieldInputStream)

Example 3 with CloseShieldInputStream

Use of org.apache.commons.io.input.CloseShieldInputStream in the tika project by apache.

Class XMLParser, method parse.

public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    if (metadata.get(Metadata.CONTENT_TYPE) == null) {
        metadata.set(Metadata.CONTENT_TYPE, "application/xml");
    }
    final XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    xhtml.startDocument();
    xhtml.startElement("p");
    TaggedContentHandler tagged = new TaggedContentHandler(handler);
    try {
        context.getSAXParser().parse(new CloseShieldInputStream(stream), new OfflineContentHandler(new EmbeddedContentHandler(getContentHandler(tagged, metadata, context))));
    } catch (SAXException e) {
        tagged.throwIfCauseOf(e);
        throw new TikaException("XML parse error", e);
    } finally {
        xhtml.endElement("p");
        xhtml.endDocument();
    }
}
Also used: OfflineContentHandler(org.apache.tika.sax.OfflineContentHandler) TikaException(org.apache.tika.exception.TikaException) TaggedContentHandler(org.apache.tika.sax.TaggedContentHandler) EmbeddedContentHandler(org.apache.tika.sax.EmbeddedContentHandler) XHTMLContentHandler(org.apache.tika.sax.XHTMLContentHandler) CloseShieldInputStream(org.apache.commons.io.input.CloseShieldInputStream) SAXException(org.xml.sax.SAXException)
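
Example 3 wraps a stream it did not open before handing it to a SAX parser. A plausible reason is that a parser may close its input once parsing finishes, while the caller still owns the stream. The sketch below illustrates this with JDK classes only. NonClosingInputStream (a stand-in for CloseShieldInputStream) and CloseTrackingInputStream are hypothetical helpers written for this illustration, not part of tika or commons-io.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class SaxShieldSketch {

    /** Hypothetical stand-in for CloseShieldInputStream: close() never reaches the wrapped stream. */
    static final class NonClosingInputStream extends FilterInputStream {
        NonClosingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Intentionally empty: the caller that owns the stream closes it.
        }
    }

    /** Hypothetical helper that records whether close() was ever called on it. */
    static final class CloseTrackingInputStream extends ByteArrayInputStream {
        boolean closed;

        CloseTrackingInputStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Parses the XML through the shield and reports whether the caller's stream was closed. */
    static boolean closedAfterShieldedParse(String xml) throws Exception {
        CloseTrackingInputStream source =
                new CloseTrackingInputStream(xml.getBytes(StandardCharsets.UTF_8));
        // Whatever the parser does with the stream it is given, it only ever sees the wrapper.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new NonClosingInputStream(source), new DefaultHandler());
        return source.closed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("closed by parser: " + closedAfterShieldedParse("<a><b/></a>"));
    }
}
```

The caller remains responsible for closing the original stream, as Examples 4 and 5 do with try-with-resources around the shielded parse.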

Example 4 with CloseShieldInputStream

Use of org.apache.commons.io.input.CloseShieldInputStream in the tika project by apache.

Class SXSLFPowerPointExtractorDecorator, method handleSlidePart.

private void handleSlidePart(PackagePart slidePart, XHTMLContentHandler xhtml) throws IOException, SAXException {
    Map<String, String> linkedRelationships = loadLinkedRelationships(slidePart, false, metadata);
    //        Map<String, String> hyperlinks = loadHyperlinkRelationships(packagePart);
    xhtml.startElement("div", "class", "slide-content");
    try (InputStream stream = slidePart.getInputStream()) {
        context.getSAXParser().parse(new CloseShieldInputStream(stream), new OfflineContentHandler(new EmbeddedContentHandler(new OOXMLWordAndPowerPointTextHandler(new OOXMLTikaBodyPartHandler(xhtml), linkedRelationships))));
    } catch (TikaException e) {
        metadata.add(TikaCoreProperties.TIKA_META_EXCEPTION_WARNING, ExceptionUtils.getStackTrace(e));
    }
    xhtml.endElement("div");
    handleBasicRelatedParts(XSLFRelation.SLIDE_LAYOUT.getRelation(), "slide-master-content", slidePart, new PlaceHolderSkipper(new OOXMLWordAndPowerPointTextHandler(new OOXMLTikaBodyPartHandler(xhtml), linkedRelationships)));
    handleBasicRelatedParts(XSLFRelation.NOTES.getRelation(), "slide-notes", slidePart, new OOXMLWordAndPowerPointTextHandler(new OOXMLTikaBodyPartHandler(xhtml), linkedRelationships));
    handleBasicRelatedParts(XSLFRelation.NOTES_MASTER.getRelation(), "slide-notes-master", slidePart, new OOXMLWordAndPowerPointTextHandler(new OOXMLTikaBodyPartHandler(xhtml), linkedRelationships));
    handleBasicRelatedParts(XSLFRelation.COMMENTS.getRelation(), null, slidePart, new XSLFCommentsHandler(xhtml));
}
Also used: OfflineContentHandler(org.apache.tika.sax.OfflineContentHandler) TikaException(org.apache.tika.exception.TikaException) CloseShieldInputStream(org.apache.commons.io.input.CloseShieldInputStream) InputStream(java.io.InputStream) EmbeddedContentHandler(org.apache.tika.sax.EmbeddedContentHandler)

Example 5 with CloseShieldInputStream

Use of org.apache.commons.io.input.CloseShieldInputStream in the tika project by apache.

Class XWPFEventBasedWordExtractor, method handlePart.

private void handlePart(PackagePart packagePart, XWPFListManager xwpfListManager, StringBuilder buffer) throws IOException, SAXException {
    Map<String, String> hyperlinks = loadHyperlinkRelationships(packagePart);
    try (InputStream stream = packagePart.getInputStream()) {
        XMLReader reader = SAXHelper.newXMLReader();
        reader.setContentHandler(new OOXMLWordAndPowerPointTextHandler(new XWPFToTextContentHandler(buffer), hyperlinks));
        reader.parse(new InputSource(new CloseShieldInputStream(stream)));
    } catch (ParserConfigurationException e) {
        LOG.warn("Can't configure XMLReader", e);
    }
}
Also used: InputSource(org.xml.sax.InputSource) OOXMLWordAndPowerPointTextHandler(org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler) CloseShieldInputStream(org.apache.commons.io.input.CloseShieldInputStream) InputStream(java.io.InputStream) ParserConfigurationException(javax.xml.parsers.ParserConfigurationException) XMLReader(org.xml.sax.XMLReader)

Aggregations

CloseShieldInputStream (org.apache.commons.io.input.CloseShieldInputStream) 35
XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler) 13
TikaException (org.apache.tika.exception.TikaException) 12
InputStream (java.io.InputStream) 8
OfflineContentHandler (org.apache.tika.sax.OfflineContentHandler) 8
TikaInputStream (org.apache.tika.io.TikaInputStream) 7
AutoDetectReader (org.apache.tika.detect.AutoDetectReader) 6
MediaType (org.apache.tika.mime.MediaType) 6
EmbeddedContentHandler (org.apache.tika.sax.EmbeddedContentHandler) 5
SAXException (org.xml.sax.SAXException) 5
BufferedInputStream (java.io.BufferedInputStream) 4
Charset (java.nio.charset.Charset) 4
TikaConfig (org.apache.tika.config.TikaConfig) 4
Matcher (java.util.regex.Matcher) 3
ZipArchiveEntry (org.apache.commons.compress.archivers.zip.ZipArchiveEntry) 3
ZipArchiveInputStream (org.apache.commons.compress.archivers.zip.ZipArchiveInputStream) 3
EmbeddedDocumentExtractor (org.apache.tika.extractor.EmbeddedDocumentExtractor) 3
Metadata (org.apache.tika.metadata.Metadata) 3
TaggedContentHandler (org.apache.tika.sax.TaggedContentHandler) 3
InputSource (org.xml.sax.InputSource) 3