
Example 31 with Metadata

Use of org.apache.tika.metadata.Metadata in the Apache Tika project: class TestParsers, method testWORDxtraction.

@Test
public void testWORDxtraction() throws Exception {
    File file = getResourceAsFile("/test-documents/testWORD.doc");
    Parser parser = tika.getParser();
    Metadata metadata = new Metadata();
    try (InputStream stream = new FileInputStream(file)) {
        parser.parse(stream, new DefaultHandler(), metadata, new ParseContext());
    }
    assertEquals("Sample Word Document", metadata.get(TikaCoreProperties.TITLE));
}
Also used: FileInputStream(java.io.FileInputStream) InputStream(java.io.InputStream) Metadata(org.apache.tika.metadata.Metadata) ParseContext(org.apache.tika.parser.ParseContext) File(java.io.File) Parser(org.apache.tika.parser.Parser) DefaultHandler(org.xml.sax.helpers.DefaultHandler) Test(org.junit.Test)
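The assertion above reads a single value back out of the populated Metadata. Tika's Metadata behaves like a multi-valued string map: `set` replaces existing values, `add` appends, and `get` returns the first value. The stdlib-only sketch below models those semantics with a hypothetical `ToyMetadata` class so the lookup can be reasoned about without Tika on the classpath; it is an illustration under that assumption, not Tika's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for org.apache.tika.metadata.Metadata (illustration only).
final class ToyMetadata {
    // Insertion-ordered map from property name to all of its values.
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Replaces any existing values for the name with a single value.
    void set(String name, String value) {
        List<String> list = new ArrayList<>();
        list.add(value);
        values.put(name, list);
    }

    // Appends a value while keeping earlier ones.
    void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Returns the first value, or null when the name is absent.
    String get(String name) {
        List<String> list = values.get(name);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    // Returns every value stored under the name (empty array if none).
    String[] getValues(String name) {
        return values.getOrDefault(name, List.of()).toArray(new String[0]);
    }

    // Returns the property names in insertion order (a choice of this toy).
    String[] names() {
        return values.keySet().toArray(new String[0]);
    }
}
```

For example, after `add("author", "a")` and `add("author", "b")`, `get("author")` yields `"a"`, and a later `set("author", "c")` leaves exactly one value.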

Example 32 with Metadata

Use of org.apache.tika.metadata.Metadata in the Apache Tika project: class TensorflowImageRecParser, method recognise.

@Override
public List<RecognisedObject> recognise(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    Metadata md = new Metadata();
    parse(stream, handler, md, context);
    List<RecognisedObject> objects = new ArrayList<>();
    for (String key : md.names()) {
        double confidence = Double.parseDouble(md.get(key));
        objects.add(new RecognisedObject(key, "eng", key, confidence));
    }
    return objects;
}
Also used: Metadata(org.apache.tika.metadata.Metadata) ArrayList(java.util.ArrayList) RecognisedObject(org.apache.tika.parser.recognition.RecognisedObject)
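The `recognise` method treats every metadata name as a label and its value as a confidence score; `Double.parseDouble` throws `NumberFormatException` if any value is not numeric, which aborts the whole call. The sketch below is a stdlib-only variant that skips non-numeric or null values and sorts by descending confidence. The `Labelled` and `ConfidenceParsing` classes are hypothetical stand-ins for illustration, not part of Tika.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for RecognisedObject: a label plus a confidence score.
final class Labelled {
    final String label;
    final double confidence;

    Labelled(String label, double confidence) {
        this.label = label;
        this.confidence = confidence;
    }
}

final class ConfidenceParsing {
    // Converts name -> confidence strings into objects, highest confidence first.
    // Entries whose value is null or not a number are skipped instead of
    // aborting the whole conversion.
    static List<Labelled> toObjects(Map<String, String> scores) {
        List<Labelled> out = new ArrayList<>();
        for (Map.Entry<String, String> e : scores.entrySet()) {
            String value = e.getValue();
            if (value == null) {
                continue;
            }
            try {
                out.add(new Labelled(e.getKey(), Double.parseDouble(value)));
            } catch (NumberFormatException ex) {
                // Not a confidence value; ignore this entry.
            }
        }
        out.sort(Comparator.comparingDouble((Labelled o) -> o.confidence).reversed());
        return out;
    }
}
```

Sorting is a design choice of this sketch; the original loop keeps whatever order `md.names()` returns.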

Example 33 with Metadata

Use of org.apache.tika.metadata.Metadata in the Apache Tika project: class RTFEmbObjHandler, method startPict.

protected void startPict() {
    state = EMB_STATE.PICT;
    metadata = new Metadata();
}
Also used: RTFMetadata(org.apache.tika.metadata.RTFMetadata) Metadata(org.apache.tika.metadata.Metadata)
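Because `startPict` assigns a fresh Metadata instance when a new picture starts, properties gathered for one embedded picture cannot carry over into the next. The stdlib-only sketch below illustrates that reset-per-object pattern with a plain map; the class `PictCollector` and its methods `property`, `endPict`, and `finished` are hypothetical and do not reproduce RTFEmbObjHandler's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified handler: one fresh property map per embedded picture.
final class PictCollector {
    enum State { NONE, PICT }

    private State state = State.NONE;
    private Map<String, String> current = new HashMap<>();
    private final List<Map<String, String>> finished = new ArrayList<>();

    // Mirrors the idea of startPict(): switch state and start with empty properties.
    void startPict() {
        state = State.PICT;
        current = new HashMap<>();
    }

    // Records a property only while inside a picture.
    void property(String name, String value) {
        if (state == State.PICT) {
            current.put(name, value);
        }
    }

    // Hands off the finished picture's properties and returns to the idle state.
    void endPict() {
        if (state == State.PICT) {
            finished.add(current);
        }
        state = State.NONE;
    }

    List<Map<String, String>> finished() {
        return finished;
    }
}
```

Replacing the map rather than clearing it also keeps earlier, already handed-off property sets intact.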

Example 34 with Metadata

Use of org.apache.tika.metadata.Metadata in the Apache Tika project: class AutoDetectParserTest, method testNoBombDetectedForInvalidXml.

/**
 * Make sure XML parse errors don't trigger ZIP bomb detection.
 *
 * @see <a href="https://issues.apache.org/jira/browse/TIKA-1322">TIKA-1322</a>
 */
@Test
public void testNoBombDetectedForInvalidXml() throws Exception {
    // create zip with ten empty / invalid XML files, 1.xml .. 10.xml
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ZipOutputStream zos = new ZipOutputStream(baos);
    for (int i = 1; i <= 10; i++) {
        zos.putNextEntry(new ZipEntry(i + ".xml"));
        zos.closeEntry();
    }
    zos.finish();
    zos.close();
    new AutoDetectParser(tika).parse(new ByteArrayInputStream(baos.toByteArray()), new BodyContentHandler(-1), new Metadata());
}
Also used: BodyContentHandler(org.apache.tika.sax.BodyContentHandler) ByteArrayInputStream(java.io.ByteArrayInputStream) ZipOutputStream(java.util.zip.ZipOutputStream) ZipEntry(java.util.zip.ZipEntry) Metadata(org.apache.tika.metadata.Metadata) ByteArrayOutputStream(java.io.ByteArrayOutputStream) Test(org.junit.Test)
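The test builds its input archive entirely in memory: ten entries named 1.xml through 10.xml, none with any content, so each fails as XML while the archive holds no data that could expand into a bomb. The stdlib-only sketch below rebuilds the same kind of archive and reads it back to confirm the entry count and that the total uncompressed size is zero. The class `EmptyXmlZip` and its methods are hypothetical helpers; it uses try-with-resources, since `ZipOutputStream.close()` already calls `finish()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class EmptyXmlZip {
    // Builds an in-memory ZIP holding entries 1.xml .. n.xml, each with no content.
    static byte[] build(int n) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // close() at the end of the try block finishes the archive.
        try (ZipOutputStream zos = new ZipOutputStream(baos)) {
            for (int i = 1; i <= n; i++) {
                zos.putNextEntry(new ZipEntry(i + ".xml"));
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Reads the archive back and returns {entry count, total uncompressed bytes}.
    static long[] inspect(byte[] zip) {
        long entries = 0;
        long bytes = 0;
        byte[] buf = new byte[4096];
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            while (zis.getNextEntry() != null) {
                entries++;
                int read;
                while ((read = zis.read(buf)) != -1) {
                    bytes += read;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] {entries, bytes};
    }
}
```

Calling `EmptyXmlZip.inspect(EmptyXmlZip.build(10))` reports ten entries and zero uncompressed bytes.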

Example 35 with Metadata

Use of org.apache.tika.metadata.Metadata in the Apache Tika project: class TestMimeTypes, method assertType.

private void assertType(String expected, String filename) throws Exception {
    try (InputStream stream = TestMimeTypes.class.getResourceAsStream("/test-documents/" + filename)) {
        assertNotNull("Test file not found: " + filename, stream);
        Metadata metadata = new Metadata();
        metadata.set(Metadata.RESOURCE_NAME_KEY, filename);
        assertEquals(expected, repo.detect(stream, metadata).toString());
    }
}
Also used: ByteArrayInputStream(java.io.ByteArrayInputStream) InputStream(java.io.InputStream) Metadata(org.apache.tika.metadata.Metadata)
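In `assertType`, detection receives both the document bytes and a file-name hint set through `Metadata.RESOURCE_NAME_KEY`. The toy detector below shows one way those two signals can be combined: check leading "magic" bytes first, then fall back on the file extension, then a generic type. It is a hypothetical stdlib-only illustration (the class `ToyDetector`, its tiny extension table, and the precedence order are all assumptions), not Tika's detection algorithm or its type registry.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;

// Toy detector (hypothetical, not Tika's algorithm): magic bytes first,
// then a file-name extension hint, then a generic fallback.
final class ToyDetector {
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "txt", "text/plain",
            "xml", "application/xml",
            "pdf", "application/pdf");

    static String detect(InputStream stream, String filename) {
        // Peek at the leading bytes; mark/reset leaves the stream usable afterwards.
        byte[] head = new byte[PDF_MAGIC.length];
        int n = 0;
        if (stream != null && stream.markSupported()) {
            try {
                stream.mark(head.length);
                n = stream.readNBytes(head, 0, head.length);
                stream.reset();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        if (n == head.length && Arrays.equals(head, PDF_MAGIC)) {
            return "application/pdf";
        }
        // Fall back on the name hint, like the one supplied via RESOURCE_NAME_KEY.
        if (filename != null) {
            int dot = filename.lastIndexOf('.');
            if (dot >= 0) {
                String ext = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
                String type = BY_EXTENSION.get(ext);
                if (type != null) {
                    return type;
                }
            }
        }
        return "application/octet-stream";
    }
}
```

Letting content outrank the name means a mislabeled file (a PDF named report.txt) is still reported by its content.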

Aggregations

Metadata (org.apache.tika.metadata.Metadata): 651
Test (org.junit.Test): 467
InputStream (java.io.InputStream): 320
ParseContext (org.apache.tika.parser.ParseContext): 283
BodyContentHandler (org.apache.tika.sax.BodyContentHandler): 269
TikaTest (org.apache.tika.TikaTest): 257
ContentHandler (org.xml.sax.ContentHandler): 229
AutoDetectParser (org.apache.tika.parser.AutoDetectParser): 154
ByteArrayInputStream (java.io.ByteArrayInputStream): 143
Parser (org.apache.tika.parser.Parser): 136
TikaInputStream (org.apache.tika.io.TikaInputStream): 133
IOException (java.io.IOException): 66
DefaultHandler (org.xml.sax.helpers.DefaultHandler): 59
TikaException (org.apache.tika.exception.TikaException): 48
ExcelParserTest (org.apache.tika.parser.microsoft.ExcelParserTest): 36
WordParserTest (org.apache.tika.parser.microsoft.WordParserTest): 36
StringWriter (java.io.StringWriter): 33
Tika (org.apache.tika.Tika): 29
MediaType (org.apache.tika.mime.MediaType): 29
SAXException (org.xml.sax.SAXException): 29