use of org.apache.tika.metadata.Metadata in project tika by apache.
the class TestParsers method testWORDxtraction.
@Test
public void testWORDxtraction() throws Exception {
    File file = getResourceAsFile("/test-documents/testWORD.doc");
    Parser parser = tika.getParser();
    Metadata metadata = new Metadata();
    try (InputStream stream = new FileInputStream(file)) {
        parser.parse(stream, new DefaultHandler(), metadata, new ParseContext());
    }
    assertEquals("Sample Word Document", metadata.get(TikaCoreProperties.TITLE));
}
use of org.apache.tika.metadata.Metadata in project tika by apache.
the class TensorflowImageRecParser method recognise.
@Override
public List<RecognisedObject> recognise(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    Metadata md = new Metadata();
    parse(stream, handler, md, context);
    List<RecognisedObject> objects = new ArrayList<>();
    for (String key : md.names()) {
        double confidence = Double.parseDouble(md.get(key));
        objects.add(new RecognisedObject(key, "eng", key, confidence));
    }
    return objects;
}
use of org.apache.tika.metadata.Metadata in project tika by apache.
the class RTFEmbObjHandler method startPict.
protected void startPict() {
    state = EMB_STATE.PICT;
    metadata = new Metadata();
}
use of org.apache.tika.metadata.Metadata in project tika by apache.
the class AutoDetectParserTest method testNoBombDetectedForInvalidXml.
/**
 * Make sure XML parse errors don't trigger ZIP bomb detection.
 *
 * @see <a href="https://issues.apache.org/jira/browse/TIKA-1322">TIKA-1322</a>
 */
@Test
public void testNoBombDetectedForInvalidXml() throws Exception {
    // create zip with ten empty / invalid XML files, 1.xml .. 10.xml
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ZipOutputStream zos = new ZipOutputStream(baos);
    for (int i = 1; i <= 10; i++) {
        zos.putNextEntry(new ZipEntry(i + ".xml"));
        zos.closeEntry();
    }
    zos.finish();
    zos.close();
    new AutoDetectParser(tika).parse(new ByteArrayInputStream(baos.toByteArray()), new BodyContentHandler(-1), new Metadata());
}
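The zip-building step in the test above can be tried on its own with only java.util.zip. The following is a minimal sketch and not part of Tika: the class name EmptyXmlZipSketch and the helpers buildZip and countEntries are hypothetical, and try-with-resources is used here in place of the explicit finish() and close() calls.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class for illustration; not taken from the Tika code base.
class EmptyXmlZipSketch {

    // Builds an in-memory zip holding `count` zero-length entries named 1.xml .. <count>.xml.
    static byte[] buildZip(int count) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Closing the ZipOutputStream finishes the archive before the bytes are read back.
        try (ZipOutputStream zos = new ZipOutputStream(baos)) {
            for (int i = 1; i <= count; i++) {
                zos.putNextEntry(new ZipEntry(i + ".xml"));
                zos.closeEntry();
            }
        }
        return baos.toByteArray();
    }

    // Counts the entries in a zip archive held in memory.
    static int countEntries(byte[] zip) throws IOException {
        int n = 0;
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            while (zis.getNextEntry() != null) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildZip(10);
        System.out.println(countEntries(zip) + " entries, " + zip.length + " bytes");
    }
}
```

The byte array returned by buildZip could be wrapped in a ByteArrayInputStream and handed to a parser, as the test does with AutoDetectParser.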
use of org.apache.tika.metadata.Metadata in project tika by apache.
the class TestMimeTypes method assertType.
private void assertType(String expected, String filename) throws Exception {
    try (InputStream stream = TestMimeTypes.class.getResourceAsStream("/test-documents/" + filename)) {
        assertNotNull("Test file not found: " + filename, stream);
        Metadata metadata = new Metadata();
        metadata.set(Metadata.RESOURCE_NAME_KEY, filename);
        assertEquals(expected, repo.detect(stream, metadata).toString());
    }
}