Example 91 with TikaException

use of org.apache.tika.exception.TikaException in project tika by apache.

the class CryptoParser method parse.

public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    try {
        Cipher cipher;
        if (provider != null) {
            cipher = Cipher.getInstance(transformation, provider);
        } else {
            cipher = Cipher.getInstance(transformation);
        }
        Key key = context.get(Key.class);
        if (key == null) {
            throw new EncryptedDocumentException("No decryption key provided");
        }
        AlgorithmParameters params = context.get(AlgorithmParameters.class);
        SecureRandom random = context.get(SecureRandom.class);
        if (params != null && random != null) {
            cipher.init(Cipher.DECRYPT_MODE, key, params, random);
        } else if (params != null) {
            cipher.init(Cipher.DECRYPT_MODE, key, params);
        } else if (random != null) {
            cipher.init(Cipher.DECRYPT_MODE, key, random);
        } else {
            cipher.init(Cipher.DECRYPT_MODE, key);
        }
        super.parse(new CipherInputStream(stream, cipher), handler, metadata, context);
    } catch (GeneralSecurityException e) {
        throw new TikaException("Unable to decrypt document stream", e);
    }
}
Also used : EncryptedDocumentException(org.apache.tika.exception.EncryptedDocumentException) TikaException(org.apache.tika.exception.TikaException) CipherInputStream(javax.crypto.CipherInputStream) GeneralSecurityException(java.security.GeneralSecurityException) SecureRandom(java.security.SecureRandom) Cipher(javax.crypto.Cipher) Key(java.security.Key) AlgorithmParameters(java.security.AlgorithmParameters)
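
The decrypt-then-delegate pattern in the method above can be tried without Tika on the classpath. The following is only a minimal sketch under stated assumptions: the class `DecryptingStreamSketch` and its methods are hypothetical names, `IOException` stands in for `TikaException`, and only the simplest `cipher.init(Cipher.DECRYPT_MODE, key)` branch is shown. The `AES/ECB/PKCS5Padding` transformation is chosen solely because it needs no `AlgorithmParameters`; it is not a recommendation for real data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.Key;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.KeyGenerator;

// Hypothetical helper, not part of Tika.
final class DecryptingStreamSketch {

    // Transformation used for illustration only (no IV or parameters required).
    static final String TRANSFORMATION = "AES/ECB/PKCS5Padding";

    /**
     * Wraps an encrypted stream so that reads return plaintext.
     * Security failures are wrapped, mirroring how the parser above
     * wraps GeneralSecurityException in a TikaException.
     */
    static InputStream decrypting(InputStream encrypted, Key key) throws IOException {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, key);
            return new CipherInputStream(encrypted, cipher);
        } catch (GeneralSecurityException e) {
            throw new IOException("Unable to decrypt document stream", e);
        }
    }

    /** Encrypts the text with a fresh key, then reads it back through decrypting(). */
    static String roundTrip(String text) throws IOException, GeneralSecurityException {
        Key key = KeyGenerator.getInstance("AES").generateKey();
        Cipher encryptor = Cipher.getInstance(TRANSFORMATION);
        encryptor.init(Cipher.ENCRYPT_MODE, key);
        byte[] ciphertext = encryptor.doFinal(text.getBytes(StandardCharsets.UTF_8));
        try (InputStream in = decrypting(new ByteArrayInputStream(ciphertext), key)) {
            // readAllBytes requires Java 9 or later.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

In the parser itself the key is not passed as an argument; it is looked up with `context.get(Key.class)`, so a caller would be expected to place the key in the `ParseContext` before parsing.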

Example 92 with TikaException

use of org.apache.tika.exception.TikaException in project tika by apache.

the class ForkClient method sendObject.

/**
 * Serializes the object first into an in-memory buffer and then
 * writes it to the output stream with a preceding size integer.
 *
 * @param object object to be serialized
 * @param resources list of fork resources, used when adding proxies
 * @throws IOException if the object could not be serialized
 * @throws TikaException if the object is not serializable
 */
private void sendObject(Object object, List<ForkResource> resources) throws IOException, TikaException {
    int n = resources.size();
    if (object instanceof InputStream) {
        resources.add(new InputStreamResource((InputStream) object));
        object = new InputStreamProxy(n);
    } else if (object instanceof ContentHandler) {
        resources.add(new ContentHandlerResource((ContentHandler) object));
        object = new ContentHandlerProxy(n);
    } else if (object instanceof ClassLoader) {
        resources.add(new ClassLoaderResource((ClassLoader) object));
        object = new ClassLoaderProxy(n);
    }
    try {
        ForkObjectInputStream.sendObject(object, output);
    } catch (NotSerializableException nse) {
        // Build a more friendly error message for this
        throw new TikaException("Unable to serialize " + object.getClass().getSimpleName() + " to pass to the Forked Parser", nse);
    }
    waitForResponse(resources);
}
Also used : NotSerializableException(java.io.NotSerializableException) TikaException(org.apache.tika.exception.TikaException) DataInputStream(java.io.DataInputStream) InputStream(java.io.InputStream) ContentHandler(org.xml.sax.ContentHandler)
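
The Javadoc above describes serializing into an in-memory buffer and then writing the bytes with a preceding size integer. The body of `ForkObjectInputStream.sendObject` is not shown here, so the following is only a sketch of that general technique using the standard library. The class `LengthPrefixedObjects` and its `readObject` counterpart are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical helper, not the Tika implementation.
final class LengthPrefixedObjects {

    /**
     * Serializes the object into a buffer, then writes a 4-byte size
     * followed by the serialized bytes. Throws NotSerializableException
     * (an IOException) if the object cannot be serialized.
     */
    static void sendObject(Object object, DataOutputStream output) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream serializer = new ObjectOutputStream(buffer)) {
            serializer.writeObject(object);
        }
        byte[] data = buffer.toByteArray();
        output.writeInt(data.length); // size prefix
        output.write(data);           // payload
        output.flush();
    }

    /** Reads the size prefix, then exactly that many bytes, and deserializes them. */
    static Object readObject(DataInputStream input) throws IOException, ClassNotFoundException {
        int size = input.readInt();
        byte[] data = new byte[size];
        input.readFully(data);
        try (ObjectInputStream deserializer = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return deserializer.readObject();
        }
    }
}
```

Buffering first means a serialization failure is detected before any bytes reach the output stream, and the size prefix lets the receiver know how many bytes to read for each message.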

Example 93 with TikaException

use of org.apache.tika.exception.TikaException in project tika by apache.

the class ForkParser method parse.

public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    if (stream == null) {
        throw new NullPointerException("null stream");
    }
    Throwable t;
    boolean alive = false;
    ForkClient client = acquireClient();
    try {
        ContentHandler tee = new TeeContentHandler(handler, new MetadataContentHandler(metadata));
        t = client.call("parse", stream, tee, metadata, context);
        alive = true;
    } catch (TikaException te) {
        // Problem occurred on our side
        alive = true;
        throw te;
    } catch (IOException e) {
        // Problem occurred on the other side
        throw new TikaException("Failed to communicate with a forked parser process." + " The process has most likely crashed due to some error" + " like running out of memory. A new process will be" + " started for the next parsing request.", e);
    } finally {
        releaseClient(client, alive);
    }
    if (t instanceof IOException) {
        throw (IOException) t;
    } else if (t instanceof SAXException) {
        throw (SAXException) t;
    } else if (t instanceof TikaException) {
        throw (TikaException) t;
    } else if (t != null) {
        throw new TikaException("Unexpected error in forked server process", t);
    }
}
Also used : TikaException(org.apache.tika.exception.TikaException) IOException(java.io.IOException) TeeContentHandler(org.apache.tika.sax.TeeContentHandler) ContentHandler(org.xml.sax.ContentHandler) SAXException(org.xml.sax.SAXException)

Example 94 with TikaException

use of org.apache.tika.exception.TikaException in project tika by apache.

the class GeographicInformationParser method parse.

@Override
public void parse(InputStream inputStream, ContentHandler contentHandler, Metadata metadata, ParseContext parseContext) throws IOException, SAXException, TikaException {
    metadata.set(Metadata.CONTENT_TYPE, geoInfoType);
    DataStore dataStore = null;
    DefaultMetadata defaultMetadata = null;
    XHTMLContentHandler xhtmlContentHandler = new XHTMLContentHandler(contentHandler, metadata);
    TemporaryResources tmp = TikaInputStream.isTikaInputStream(inputStream) ? null : new TemporaryResources();
    try {
        TikaInputStream tikaInputStream = TikaInputStream.get(inputStream, tmp);
        File file = tikaInputStream.getFile();
        dataStore = DataStores.open(file);
        // A newly constructed object is never null, so no null check is needed here.
        defaultMetadata = new DefaultMetadata(dataStore.getMetadata());
        extract(xhtmlContentHandler, metadata, defaultMetadata);
    } catch (UnsupportedStorageException e) {
        throw new TikaException("UnsupportedStorageException", e);
    } catch (DataStoreException e) {
        throw new TikaException("DataStoreException", e);
    } finally {
        if (tmp != null) {
            tmp.dispose();
        }
    }
}
Also used : DataStoreException(org.apache.sis.storage.DataStoreException) TikaException(org.apache.tika.exception.TikaException) DataStore(org.apache.sis.storage.DataStore) DefaultMetadata(org.apache.sis.metadata.iso.DefaultMetadata) TemporaryResources(org.apache.tika.io.TemporaryResources) TikaInputStream(org.apache.tika.io.TikaInputStream) XHTMLContentHandler(org.apache.tika.sax.XHTMLContentHandler) File(java.io.File) UnsupportedStorageException(org.apache.sis.storage.UnsupportedStorageException)

Example 95 with TikaException

use of org.apache.tika.exception.TikaException in project tika by apache.

the class GribParser method parse.

public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    // Set the MIME type to grib2
    metadata.set(Metadata.CONTENT_TYPE, GRIB_MIME_TYPE);
    TikaInputStream tis = TikaInputStream.get(stream, new TemporaryResources());
    File gribFile = tis.getFile();
    try {
        NetcdfFile ncFile = NetcdfDataset.openFile(gribFile.getAbsolutePath(), null);
        // first parse out the set of global attributes
        for (Attribute attr : ncFile.getGlobalAttributes()) {
            Property property = resolveMetadataKey(attr.getFullName());
            if (attr.getDataType().isString()) {
                metadata.add(property, attr.getStringValue());
            } else if (attr.getDataType().isNumeric()) {
                int value = attr.getNumericValue().intValue();
                metadata.add(property, String.valueOf(value));
            }
        }
        XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
        xhtml.startDocument();
        xhtml.newline();
        xhtml.startElement("ul");
        xhtml.characters("dimensions:");
        xhtml.newline();
        for (Dimension dim : ncFile.getDimensions()) {
            xhtml.element("li", dim.getFullName() + "=" + String.valueOf(dim.getLength()) + ";");
            xhtml.newline();
        }
        xhtml.startElement("ul");
        xhtml.characters("variables:");
        xhtml.newline();
        for (Variable var : ncFile.getVariables()) {
            xhtml.element("p", String.valueOf(var.getDataType()) + var.getNameAndDimensions() + ";");
            for (Attribute element : var.getAttributes()) {
                xhtml.element("li", " :" + element + ";");
                xhtml.newline();
            }
        }
        xhtml.endElement("ul");
        xhtml.endElement("ul");
        xhtml.endDocument();
    } catch (IOException e) {
        throw new TikaException("NetCDF parse error", e);
    }
}
Also used : NetcdfFile(ucar.nc2.NetcdfFile) Variable(ucar.nc2.Variable) TikaException(org.apache.tika.exception.TikaException) Attribute(ucar.nc2.Attribute) TemporaryResources(org.apache.tika.io.TemporaryResources) TikaInputStream(org.apache.tika.io.TikaInputStream) Dimension(ucar.nc2.Dimension) IOException(java.io.IOException) XHTMLContentHandler(org.apache.tika.sax.XHTMLContentHandler) File(java.io.File) Property(org.apache.tika.metadata.Property)

Aggregations

TikaException (org.apache.tika.exception.TikaException): 144
IOException (java.io.IOException): 56
SAXException (org.xml.sax.SAXException): 44
InputStream (java.io.InputStream): 37
Metadata (org.apache.tika.metadata.Metadata): 35
TikaInputStream (org.apache.tika.io.TikaInputStream): 33
XHTMLContentHandler (org.apache.tika.sax.XHTMLContentHandler): 29
ParseContext (org.apache.tika.parser.ParseContext): 19
Test (org.junit.Test): 19
BodyContentHandler (org.apache.tika.sax.BodyContentHandler): 17
ContentHandler (org.xml.sax.ContentHandler): 17
CloseShieldInputStream (org.apache.commons.io.input.CloseShieldInputStream): 15
TemporaryResources (org.apache.tika.io.TemporaryResources): 15
MediaType (org.apache.tika.mime.MediaType): 14
Parser (org.apache.tika.parser.Parser): 14
AutoDetectParser (org.apache.tika.parser.AutoDetectParser): 13
ByteArrayInputStream (java.io.ByteArrayInputStream): 12
ArrayList (java.util.ArrayList): 11
File (java.io.File): 8
EmbeddedContentHandler (org.apache.tika.sax.EmbeddedContentHandler): 8