
Example 1 with Textual

Use of org.opencastproject.metadata.mpeg7.Textual in the project opencast by opencast.

From the class DictionaryServiceImpl, method cleanUpText.

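/**
 * Filter the text according to the rules defined by the dictionary
 * implementation used. This implementation uses a regular expression to find
 * matching terms and joins them with single spaces.
 *
 * @param text
 *          the text to filter
 * @return the filtered text, or <code>null</code> if no term matched
 */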
@Override
public Textual cleanUpText(String text) {
    logger.debug("Text input: “{}”", text);
    LinkedList<String> words = new LinkedList<String>();
    Matcher matcher = compilesPattern.matcher(text);
    while (matcher.find()) {
        words.add(matcher.group());
    }
    String result = org.apache.commons.lang3.StringUtils.join(words, " ");
    logger.debug("Resulting text: “{}”", result);
    if ("".equals(result)) {
        return null;
    }
    return new TextualImpl(result);
}
Also used: Matcher (java.util.regex.Matcher), TextualImpl (org.opencastproject.metadata.mpeg7.TextualImpl), LinkedList (java.util.LinkedList)
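
The find-and-join filtering in cleanUpText can be tried in isolation. The following is a minimal sketch, not the Opencast implementation: the class name CleanUpTextSketch and the pattern (runs of two or more letters) are assumptions made for illustration, since the example does not show how compilesPattern is built. It returns a plain String instead of a Textual and uses String.join from the JDK in place of StringUtils.join.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CleanUpTextSketch {

    // Hypothetical pattern: runs of two or more letters. The real
    // compilesPattern is created elsewhere and is not shown in the example.
    private static final Pattern WORD_PATTERN = Pattern.compile("\\p{L}{2,}");

    /** Returns the matched terms joined by single spaces, or null if nothing matched. */
    public static String cleanUpText(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD_PATTERN.matcher(text);
        // Collect every non-overlapping match, in order of appearance.
        while (matcher.find()) {
            words.add(matcher.group());
        }
        String result = String.join(" ", words);
        // Mirror the original contract: an empty result is reported as null.
        return result.isEmpty() ? null : result;
    }

    public static void main(String[] args) {
        System.out.println(cleanUpText("Slide 12: Intro -- to ## Java")); // Slide Intro to Java
        System.out.println(cleanUpText("1 2 3 ?"));                       // null
    }
}
```

Because the method returns null rather than an empty value, callers have to check the result before using it, as the second example on this page does.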

Example 2 with Textual

Use of org.opencastproject.metadata.mpeg7.Textual in the project opencast by opencast.

From the class TextAnalyzerServiceImpl, method analyze.

/**
 * Returns the video text element for the given image.
 *
 * @param imageFile
 *          the image
 * @param id
 *          the video text id
 * @return the video text found on the image
 * @throws TextAnalyzerException
 *           if reading the image or extracting text from it fails
 */
protected VideoText[] analyze(File imageFile, String id) throws TextAnalyzerException {
    /* Call the text extractor implementation to extract the text from the
     * provided image file */
    List<VideoText> videoTexts = new ArrayList<VideoText>();
    TextFrame textFrame = null;
    try {
        textFrame = textExtractor.extract(imageFile);
    } catch (IOException e) {
        logger.warn("Error reading image file {}: {}", imageFile, e.getMessage());
        throw new TextAnalyzerException(e);
    } catch (TextExtractorException e) {
        logger.warn("Error extracting text from {}: {}", imageFile, e.getMessage());
        throw new TextAnalyzerException(e);
    }
    /* Get detected text as raw string */
    int i = 1;
    for (TextLine line : textFrame.getLines()) {
        if (line.getText() != null) {
            VideoText videoText = new VideoTextImpl(id + "-" + i++);
            videoText.setBoundary(line.getBoundaries());
            Textual text = dictionaryService.cleanUpText(line.getText());
            if (text != null) {
                videoText.setText(text);
                videoTexts.add(videoText);
            }
        }
    }
    return videoTexts.toArray(new VideoText[videoTexts.size()]);
}
Also used: TextAnalyzerException (org.opencastproject.textanalyzer.api.TextAnalyzerException), TextLine (org.opencastproject.textextractor.api.TextLine), Textual (org.opencastproject.metadata.mpeg7.Textual), TextExtractorException (org.opencastproject.textextractor.api.TextExtractorException), VideoTextImpl (org.opencastproject.metadata.mpeg7.VideoTextImpl), ArrayList (java.util.ArrayList), TextFrame (org.opencastproject.textextractor.api.TextFrame), IOException (java.io.IOException), VideoText (org.opencastproject.metadata.mpeg7.VideoText)
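
One detail of the loop in analyze is easy to miss: the counter i advances for every line with non-null text, before the dictionary cleanup runs, so the ids of the returned elements can have gaps when the cleanup rejects a line. The sketch below reproduces only that control flow with plain strings. The names AnalyzeLoopSketch, collect and lettersOrNull are hypothetical stand-ins, not part of Opencast, and the cleaner is a placeholder for dictionaryService.cleanUpText.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class AnalyzeLoopSketch {

    /** Hypothetical cleaner: keeps lines that contain at least one letter, else returns null. */
    public static String lettersOrNull(String line) {
        return line.matches(".*\\p{L}.*") ? line.trim() : null;
    }

    /** Builds "id-n=text" entries using the same skip rules as the loop in analyze. */
    public static List<String> collect(String id, List<String> lines,
                                       Function<String, String> cleaner) {
        List<String> result = new ArrayList<>();
        int i = 1;
        for (String line : lines) {
            if (line == null) {
                continue; // lines without text do not consume an id
            }
            String entryId = id + "-" + i++; // counter advances before the cleanup
            String cleaned = cleaner.apply(line);
            if (cleaned != null) {
                result.add(entryId + "=" + cleaned);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Intro", null, "###", "Summary");
        // "###" consumes id frame-2 but is dropped by the cleaner.
        System.out.println(collect("frame", lines, AnalyzeLoopSketch::lettersOrNull));
        // [frame-1=Intro, frame-3=Summary]
    }
}
```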

Aggregations

IOException (java.io.IOException): 1
ArrayList (java.util.ArrayList): 1
LinkedList (java.util.LinkedList): 1
Matcher (java.util.regex.Matcher): 1
Textual (org.opencastproject.metadata.mpeg7.Textual): 1
TextualImpl (org.opencastproject.metadata.mpeg7.TextualImpl): 1
VideoText (org.opencastproject.metadata.mpeg7.VideoText): 1
VideoTextImpl (org.opencastproject.metadata.mpeg7.VideoTextImpl): 1
TextAnalyzerException (org.opencastproject.textanalyzer.api.TextAnalyzerException): 1
TextExtractorException (org.opencastproject.textextractor.api.TextExtractorException): 1
TextFrame (org.opencastproject.textextractor.api.TextFrame): 1
TextLine (org.opencastproject.textextractor.api.TextLine): 1