Example 1 with RecognisedObject

use of org.apache.tika.parser.recognition.RecognisedObject in project tika by apache.

the class TensorflowImageRecParser method recognise.

@Override
public List<RecognisedObject> recognise(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
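    // Parse into a scratch Metadata; the caller's metadata argument is not read here.
    Metadata md = new Metadata();
    parse(stream, handler, md, context);
    List<RecognisedObject> objects = new ArrayList<>();
    // Each metadata name is treated as a label and its value as a confidence score.
    // Double.parseDouble throws NumberFormatException if a value is not numeric.
    for (String key : md.names()) {
        double confidence = Double.parseDouble(md.get(key));
        objects.add(new RecognisedObject(key, "eng", key, confidence));
    }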
    return objects;
}
Also used : Metadata(org.apache.tika.metadata.Metadata) ArrayList(java.util.ArrayList) RecognisedObject(org.apache.tika.parser.recognition.RecognisedObject)
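The loop in Example 1 assumes every metadata value parses as a double. The following is a minimal sketch of a more tolerant version, not Tika's own code: it uses a plain `Map<String, String>` in place of `Metadata` and a hypothetical `Scored` holder in place of `RecognisedObject`, so that it runs without Tika on the classpath. Entries whose value is missing or non-numeric are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: ConfidenceParsing and Scored are hypothetical names,
// standing in for the parser and for Tika's RecognisedObject.
final class ConfidenceParsing {

    /** Minimal label/confidence holder. */
    static final class Scored {
        final String label;
        final double confidence;

        Scored(String label, double confidence) {
            this.label = label;
            this.confidence = confidence;
        }
    }

    /** Converts label -> score entries, skipping values that are null or not numeric. */
    static List<Scored> fromMetadata(Map<String, String> md) {
        List<Scored> out = new ArrayList<>();
        for (Map.Entry<String, String> e : md.entrySet()) {
            String raw = e.getValue();
            if (raw == null) {
                continue; // no value recorded for this key
            }
            try {
                out.add(new Scored(e.getKey(), Double.parseDouble(raw.trim())));
            } catch (NumberFormatException ex) {
                // Value is not a confidence score (e.g. a content type); skip it.
            }
        }
        return out;
    }
}
```

Whether skipping is the right policy depends on what `parse` writes into the metadata; failing fast, as the original does, may be preferable if every entry is expected to be a score.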

Example 2 with RecognisedObject

use of org.apache.tika.parser.recognition.RecognisedObject in project tika by apache.

the class TensorflowVideoRecParserTest method recognise.

@Test
public void recognise() throws Exception {
    TensorflowRESTVideoRecogniser recogniser = new TensorflowRESTVideoRecogniser();
    recogniser.initialize(new HashMap<String, Param>());
    try (InputStream stream = getClass().getClassLoader().getResourceAsStream("test-documents/testVideoMp4.mp4")) {
        List<RecognisedObject> objects = recogniser.recognise(stream, new DefaultHandler(), new Metadata(), new ParseContext());
        Assert.assertTrue(objects.size() > 0);
        Set<String> objectLabels = new HashSet<>();
        for (RecognisedObject object : objects) {
            objectLabels.add(object.getLabel());
        }
        Assert.assertTrue(objectLabels.size() > 0);
    }
}
Also used : InputStream(java.io.InputStream) Param(org.apache.tika.config.Param) Metadata(org.apache.tika.metadata.Metadata) ParseContext(org.apache.tika.parser.ParseContext) RecognisedObject(org.apache.tika.parser.recognition.RecognisedObject) DefaultHandler(org.xml.sax.helpers.DefaultHandler) HashSet(java.util.HashSet) Test(org.junit.Test)

Example 3 with RecognisedObject

use of org.apache.tika.parser.recognition.RecognisedObject in project tika by apache.

the class TensorflowImageRecParserTest method recognise.

@Test
public void recognise() throws Exception {
    TensorflowImageRecParser recogniser = new TensorflowImageRecParser();
    recogniser.initialize(new HashMap<String, Param>());
    try (InputStream stream = getClass().getClassLoader().getResourceAsStream("test-documents/testJPEG.jpg")) {
        List<RecognisedObject> objects = recogniser.recognise(stream, new DefaultHandler(), new Metadata(), new ParseContext());
        // assertEquals reports the expected and actual values on failure.
        Assert.assertEquals(5, objects.size());
        Set<String> objectLabels = new HashSet<>();
        for (RecognisedObject object : objects) {
            objectLabels.add(object.getLabel());
        }
        System.out.println(objectLabels);
        String[] expected = { "Egyptian cat", "tabby, tabby cat" };
        for (String label : expected) {
            Assert.assertTrue(label + " is expected", objectLabels.contains(label));
        }
    }
}
Also used : InputStream(java.io.InputStream) Metadata(org.apache.tika.metadata.Metadata) RecognisedObject(org.apache.tika.parser.recognition.RecognisedObject) DefaultHandler(org.xml.sax.helpers.DefaultHandler) Param(org.apache.tika.config.Param) ParseContext(org.apache.tika.parser.ParseContext) HashSet(java.util.HashSet) Test(org.junit.Test)

Example 4 with RecognisedObject

use of org.apache.tika.parser.recognition.RecognisedObject in project tika by apache.

the class DL4JInceptionV3Net method recognise.

@Override
public List<RecognisedObject> recognise(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    INDArray image = preProcessImage(imageLoader.asMatrix(stream));
    INDArray scores = graph.outputSingle(image);
    List<RecognisedObject> result = new ArrayList<>();
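    for (int i = 0; i < scores.length(); i++) {
        // Read each score once; keep only classes strictly above the threshold.
        double score = scores.getDouble(i);
        if (score > minConfidence) {
            String label = labelMap.get(i);
            // The class index doubles as the object id.
            String id = String.valueOf(i);
            result.add(new RecognisedObject(label, labelLang, id, score));
            LOG.debug("Found Object {}", label);
        }
    }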
    return result;
}
Also used : INDArray(org.nd4j.linalg.api.ndarray.INDArray) ArrayList(java.util.ArrayList) RecognisedObject(org.apache.tika.parser.recognition.RecognisedObject)
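The thresholding step in Example 4 can be tried in isolation. The sketch below is an assumption-laden stand-in: it replaces the ND4J `INDArray` with a plain `double[]`, and `ScoreThreshold` and `above` are hypothetical names. It keeps the index and score of every class whose score is strictly greater than the minimum confidence, in index order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a plain-array version of the score filter.
final class ScoreThreshold {

    /** Returns index -> score for every score strictly above minConfidence. */
    static Map<Integer, Double> above(double[] scores, double minConfidence) {
        Map<Integer, Double> kept = new LinkedHashMap<>(); // preserves index order
        for (int i = 0; i < scores.length; i++) {
            double score = scores[i];
            if (score > minConfidence) {
                kept.put(i, score);
            }
        }
        return kept;
    }
}
```

Because the comparison is strict, a score exactly equal to the threshold is dropped, which matches the `>` used in the example.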

Example 5 with RecognisedObject

use of org.apache.tika.parser.recognition.RecognisedObject in project tika by apache.

the class TensorflowRESTRecogniser method recognise.

@Override
public List<RecognisedObject> recognise(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {
    List<RecognisedObject> recObjs = new ArrayList<>();
    try {
        // Note: DefaultHttpClient is deprecated since HttpClient 4.3 and is never closed here.
        DefaultHttpClient client = new DefaultHttpClient();
        HttpPost request = new HttpPost(getApiUri(metadata));
        try (ByteArrayOutputStream byteStream = new ByteArrayOutputStream()) {
            // TODO: stream the request body instead; buffering the whole input may cause an OOM.
            // InputStreamEntity did not work here:
            // request.setEntity(new InputStreamEntity(stream, -1));
            IOUtils.copy(stream, byteStream);
            request.setEntity(new ByteArrayEntity(byteStream.toByteArray()));
        }
        HttpResponse response = client.execute(request);
        try (InputStream reply = response.getEntity().getContent()) {
            String replyMessage = IOUtils.toString(reply);
            if (response.getStatusLine().getStatusCode() == 200) {
                JSONObject jReply = new JSONObject(replyMessage);
                JSONArray jClasses = jReply.getJSONArray("classnames");
                JSONArray jConfidence = jReply.getJSONArray("confidence");
                if (jClasses.length() != jConfidence.length()) {
                    LOG.warn("Classes of size {} is not equal to confidence of size {}", jClasses.length(), jConfidence.length());
                }
                assert jClasses.length() == jConfidence.length();
                for (int i = 0; i < jClasses.length(); i++) {
                    RecognisedObject recObj = new RecognisedObject(jClasses.getString(i), LABEL_LANG, jClasses.getString(i), jConfidence.getDouble(i));
                    recObjs.add(recObj);
                }
            } else {
                LOG.warn("Status = {}", response.getStatusLine());
                LOG.warn("Response = {}", replyMessage);
            }
        }
    } catch (Exception e) {
        LOG.warn(e.getMessage(), e);
    }
    LOG.debug("Num Objects found {}", recObjs.size());
    return recObjs;
}
Also used : HttpPost(org.apache.http.client.methods.HttpPost) InputStream(java.io.InputStream) ArrayList(java.util.ArrayList) JSONArray(org.json.JSONArray) HttpResponse(org.apache.http.HttpResponse) RecognisedObject(org.apache.tika.parser.recognition.RecognisedObject) ByteArrayOutputStream(java.io.ByteArrayOutputStream) DefaultHttpClient(org.apache.http.impl.client.DefaultHttpClient) TikaException(org.apache.tika.exception.TikaException) TikaConfigException(org.apache.tika.exception.TikaConfigException) IOException(java.io.IOException) SAXException(org.xml.sax.SAXException) ByteArrayEntity(org.apache.http.entity.ByteArrayEntity) JSONObject(org.json.JSONObject)
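Example 5 logs a warning when the `classnames` and `confidence` arrays differ in length, then indexes both by the length of the first. As a hedged alternative, and not what the Tika code does, the two sequences could be paired only up to the shorter length. The sketch uses plain lists instead of `JSONArray`; `ParallelArrays` and `pair` are hypothetical names.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: pairs two parallel lists without reading past the shorter one.
final class ParallelArrays {

    /** Pairs classNames[i] with confidences[i] for every index present in both lists. */
    static List<Map.Entry<String, Double>> pair(List<String> classNames, List<Double> confidences) {
        int n = Math.min(classNames.size(), confidences.size());
        List<Map.Entry<String, Double>> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(new SimpleImmutableEntry<>(classNames.get(i), confidences.get(i)));
        }
        return out;
    }
}
```

Truncating silently discards the unmatched tail, so a warning like the one in the example is still worth logging when the sizes differ.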

Aggregations

RecognisedObject (org.apache.tika.parser.recognition.RecognisedObject) 5
InputStream (java.io.InputStream) 3
ArrayList (java.util.ArrayList) 3
Metadata (org.apache.tika.metadata.Metadata) 3
HashSet (java.util.HashSet) 2
Param (org.apache.tika.config.Param) 2
ParseContext (org.apache.tika.parser.ParseContext) 2
Test (org.junit.Test) 2
DefaultHandler (org.xml.sax.helpers.DefaultHandler) 2
ByteArrayOutputStream (java.io.ByteArrayOutputStream) 1
IOException (java.io.IOException) 1
HttpResponse (org.apache.http.HttpResponse) 1
HttpPost (org.apache.http.client.methods.HttpPost) 1
ByteArrayEntity (org.apache.http.entity.ByteArrayEntity) 1
DefaultHttpClient (org.apache.http.impl.client.DefaultHttpClient) 1
TikaConfigException (org.apache.tika.exception.TikaConfigException) 1
TikaException (org.apache.tika.exception.TikaException) 1
JSONArray (org.json.JSONArray) 1
JSONObject (org.json.JSONObject) 1
INDArray (org.nd4j.linalg.api.ndarray.INDArray) 1