Example 11 with Paragraph

use of com.joliciel.jochre.graphics.Paragraph in project jochre by urieli.

the class ImageController method save.

void save() {
    try {
        Comboitem selectedItem = cmbStatus.getSelectedItem();
        ImageStatus imageStatus = ImageStatus.forId((Integer) selectedItem.getValue());
        currentImage.setImageStatus(imageStatus);
        if (currentUser.getRole().equals(UserRole.ADMIN)) {
            User owner = (User) cmbOwner.getSelectedItem().getValue();
            currentImage.setOwner(owner);
        }
        GraphicsDao graphicsDao = GraphicsDao.getInstance(jochreSession);
        graphicsDao.saveJochreImage(currentImage);
        for (Paragraph paragraph : currentImage.getParagraphs()) {
            LOG.trace("Paragraph " + paragraph.getIndex() + ", " + paragraph.getRows().size() + " rows");
            for (RowOfShapes row : paragraph.getRows()) {
                List<List<String>> letterGroups = this.getLetterGroups(row);
                LOG.trace("Row " + row.getIndex() + ", " + row.getGroups().size() + " groups, " + letterGroups.size() + " letter groups");
                Iterator<List<String>> iLetterGroups = letterGroups.iterator();
                for (GroupOfShapes group : row.getGroups()) {
                    LOG.trace("Group " + group.getIndex() + " text : " + group.getWord());
                    boolean hasChange = false;
                    List<String> letters = null;
                    if (iLetterGroups.hasNext())
                        letters = iLetterGroups.next();
                    else
                        letters = new ArrayList<String>();
                    LOG.trace("Found " + letters.size() + " letters in text");
                    Iterator<String> iLetters = letters.iterator();
                    for (Shape shape : group.getShapes()) {
                        String currentLetter = shape.getLetter();
                        if (currentLetter == null)
                            currentLetter = "";
                        String newLetter = "";
                        if (iLetters.hasNext())
                            newLetter = iLetters.next();
                        if (newLetter.startsWith("[") && newLetter.endsWith("]")) {
                            newLetter = newLetter.substring(1, newLetter.length() - 1);
                        }
                        LOG.trace("currentLetter:  " + currentLetter + ", newLetter: " + newLetter);
                        if (!currentLetter.equals(newLetter)) {
                            LOG.trace("newLetter: " + newLetter);
                            shape.setLetter(newLetter);
                            shape.save();
                            hasChange = true;
                        }
                    }
                    if (hasChange)
                        LOG.trace("Group text after : " + group.getWord());
                } // next group
            } // next row
        } // next paragraph
        Messagebox.show(Labels.getLabel("button.saveComplete"));
    } catch (Exception e) {
        LOG.error("Failure in save", e);
        throw new RuntimeException(e);
    }
}
Also used : User(com.joliciel.jochre.security.User) Shape(com.joliciel.jochre.graphics.Shape) ImageStatus(com.joliciel.jochre.graphics.ImageStatus) ArrayList(java.util.ArrayList) RowOfShapes(com.joliciel.jochre.graphics.RowOfShapes) Paragraph(com.joliciel.jochre.graphics.Paragraph) GraphicsDao(com.joliciel.jochre.graphics.GraphicsDao) GroupOfShapes(com.joliciel.jochre.graphics.GroupOfShapes) Comboitem(org.zkoss.zul.Comboitem) List(java.util.List)

Example 12 with Paragraph

use of com.joliciel.jochre.graphics.Paragraph in project jochre by urieli.

the class UnknownWordListWriter method onImageComplete.

@Override
public void onImageComplete(JochreImage image) {
    try {
        for (Paragraph paragraph : image.getParagraphs()) {
            if (!paragraph.isJunk()) {
                for (RowOfShapes row : paragraph.getRows()) {
                    for (GroupOfShapes group : row.getGroups()) {
                        if (group.getBestLetterSequence() != null) {
                            for (LetterSequence subsequence : group.getBestLetterSequence().getSubsequences()) {
                                for (CountedOutcome<String> wordFrequency : subsequence.getWordFrequencies()) {
                                    if (wordFrequency.getCount() == 0) {
                                        writer.write(wordFrequency.getOutcome() + "\n");
                                        writer.flush();
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    } catch (IOException e) {
        LOG.error("Failed to write to UnknownWordListWriter", e);
        throw new RuntimeException(e);
    }
}
Also used : LetterSequence(com.joliciel.jochre.letterGuesser.LetterSequence) GroupOfShapes(com.joliciel.jochre.graphics.GroupOfShapes) RowOfShapes(com.joliciel.jochre.graphics.RowOfShapes) IOException(java.io.IOException) Paragraph(com.joliciel.jochre.graphics.Paragraph)

Example 13 with Paragraph

use of com.joliciel.jochre.graphics.Paragraph in project jochre by urieli.

the class CorpusLexiconBuilder method buildLexicon.

/**
 * Build a lexicon from the training corpus.
 */
public TextFileLexicon buildLexicon() {
    TextFileLexicon lexicon = new TextFileLexicon();
    JochreCorpusImageReader imageReader = new JochreCorpusImageReader(jochreSession);
    imageReader.setSelectionCriteria(criteria);
    String wordText = "";
    while (imageReader.hasNext()) {
        JochreImage image = imageReader.next();
        for (Paragraph paragraph : image.getParagraphs()) {
            // Rows ending in dashes can only be held over within the same
            // paragraph, to avoid strange things like a page number getting
            // added to the word if the dash is on the last row of the page.
            String holdoverWord = null;
            for (RowOfShapes row : paragraph.getRows()) {
                for (GroupOfShapes group : row.getGroups()) {
                    if (group.isBrokenWord())
                        continue;
                    wordText = "";
                    for (Shape shape : group.getShapes()) {
                        if (shape.getLetter() != null)
                            wordText += shape.getLetter();
                    }
                    if (wordText.length() == 0) {
                        lexicon.incrementEntry("");
                        continue;
                    }
                    List<String> words = jochreSession.getLinguistics().splitText(wordText);
                    int i = 0;
                    for (String word : words) {
                        if (i == 0) {
                            // first word
                            if (holdoverWord != null && holdoverWord.length() > 0) {
                                word = holdoverWord + word;
                                holdoverWord = null;
                            }
                        }
                        if (i == words.size() - 1) {
                            // last word
                            if (group.getIndex() == row.getGroups().size() - 1 && word.endsWith("-")) {
                                // a dash at the end of a line
                                if (group.isHardHyphen())
                                    holdoverWord = word;
                                else
                                    holdoverWord = word.substring(0, word.length() - 1);
                                word = "";
                            }
                        }
                        lexicon.incrementEntry(word);
                        i++;
                    }
                }
            }
        }
    }
    return lexicon;
}
Also used : JochreCorpusImageReader(com.joliciel.jochre.graphics.JochreCorpusImageReader) JochreImage(com.joliciel.jochre.graphics.JochreImage) Shape(com.joliciel.jochre.graphics.Shape) GroupOfShapes(com.joliciel.jochre.graphics.GroupOfShapes) RowOfShapes(com.joliciel.jochre.graphics.RowOfShapes) Paragraph(com.joliciel.jochre.graphics.Paragraph)
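
The dash-holdover step in buildLexicon can be looked at in isolation. The following is a minimal sketch, not the project's code: the class HoldoverSketch and the method countWords are hypothetical, a plain Map stands in for TextFileLexicon, the rows are assumed to belong to a single paragraph and to be split into words already, and every trailing dash is treated as a soft hyphen (the isHardHyphen case is left out). Unlike the original, it does not count an empty entry for the held-over fragment. It assumes Java 9 or later for List.of.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HoldoverSketch {
    /**
     * Counts words across the rows of one paragraph. A last word ending in a
     * dash is held over and prefixed to the first word of the next row.
     */
    static Map<String, Integer> countWords(List<List<String>> rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String holdover = null;
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                String word = row.get(i);
                if (i == 0 && holdover != null) {
                    // first word of the row: complete the held-over fragment
                    word = holdover + word;
                    holdover = null;
                }
                if (i == row.size() - 1 && word.endsWith("-")) {
                    // dash at the end of the row: drop it and wait for the next row
                    holdover = word.substring(0, word.length() - 1);
                    continue;
                }
                counts.merge(word, 1, Integer::sum);
            }
        }
        // a fragment still held over here is dropped, since it cannot be
        // completed within the paragraph
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(List.of(
                List.of("the", "seg-"),
                List.of("menter", "runs")));
        System.out.println(counts); // prints {the=1, segmenter=1, runs=1}
    }
}
```

Resetting the holdover per paragraph, as the original comment explains, keeps a fragment at the bottom of a page from being joined to unrelated text such as a page number.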

Example 14 with Paragraph

use of com.joliciel.jochre.graphics.Paragraph in project jochre by urieli.

the class SegmentationTest method testAlsacien1.

/**
 * Segmentation errors reported for Alsacien.
 *
 * @throws Exception
 */
@Test
public void testAlsacien1(@Mocked final JochrePage jochrePage, @Mocked final JochreDocument jochreDoc) throws Exception {
    Map<String, Object> configMap = new HashMap<>();
    configMap.put("jochre.locale", "de");
    Config config = ConfigFactory.parseMap(configMap).withFallback(ConfigFactory.load());
    JochreSession jochreSession = new JochreSession(config);
    new Expectations() {

        {
            jochrePage.getDocument();
            result = jochreDoc;
            minTimes = 0;
            jochreDoc.isLeftToRight();
            result = true;
            minTimes = 0;
        }
    };
    String imageName = "Alsacien1.jpg";
    LOG.debug(imageName);
    InputStream imageFileStream = getClass().getResourceAsStream("/com/joliciel/jochre/segmentation/" + imageName);
    assertNotNull(imageFileStream);
    BufferedImage image = ImageIO.read(imageFileStream);
    SourceImage sourceImage = new SourceImage(jochrePage, "", image, jochreSession);
    Segmenter segmenter = new Segmenter(sourceImage, jochreSession);
    segmenter.segment();
    List<Rectangle> textPars = new ArrayList<>();
    Rectangle textPar1 = new Rectangle(715, 517, 462, 115);
    // TODO: for now it's splitting this paragraph by row, since it's assuming
    // paragraphs cannot be both outdented and indented on the same page
    // Rectangle textPar2 = new Rectangle(50, 666, 1798, 1039);
    Rectangle textPar3 = new Rectangle(55, 1837, 1777, 335);
    Rectangle textPar4 = new Rectangle(50, 2211, 1765, 154);
    Rectangle textPar5 = new Rectangle(44, 2404, 1782, 511);
    Rectangle textPar6 = new Rectangle(50, 2948, 1776, 154);
    Rectangle textPar7 = new Rectangle(50, 3135, 1770, 77);
    // title paragraph
    textPars.add(textPar1);
    // textPars.add(textPar2);
    textPars.add(textPar3);
    textPars.add(textPar4);
    textPars.add(textPar5);
    textPars.add(textPar6);
    textPars.add(textPar7);
    int i = 0;
    int j = 0;
    List<Paragraph> textParagraphs = new ArrayList<>();
    for (Paragraph par : sourceImage.getParagraphs()) {
        Rectangle real = new Rectangle(par.getLeft(), par.getTop(), par.getRight() - par.getLeft(), par.getBottom() - par.getTop());
        Rectangle expected = textPars.get(i);
        Rectangle intersection = expected.intersection(real);
        double realArea = real.width * real.height;
        double expectedArea = expected.width * expected.height;
        double intersectionArea = intersection.width * intersection.height;
        double realRatio = intersectionArea / realArea;
        double expectedRatio = intersectionArea / expectedArea;
        LOG.debug("Paragraph " + j + ": " + par.toString());
        LOG.debug("realRatio: " + realRatio);
        LOG.debug("expectedRatio: " + expectedRatio);
        if (realRatio > 0.8 && expectedRatio > 0.8) {
            LOG.debug("Found");
            textParagraphs.add(par);
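            i++;
            // stop once every expected paragraph has been matched, so that
            // textPars.get(i) cannot run past the end of the list
            if (i >= textPars.size())
                break;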
        }
        j++;
    }
    assertEquals(textPars.size(), textParagraphs.size());
    int[] rowCounts = new int[] { 1, 4, 2, 6, 2, 1 };
    int[] wordCountsFirstRow = new int[] { 2, 0, 0, 0, 0, 0 };
    for (i = 0; i < textParagraphs.size(); i++) {
        assertEquals("row count " + i, rowCounts[i], textParagraphs.get(i).getRows().size());
        RowOfShapes row = textParagraphs.get(i).getRows().get(0);
        if (wordCountsFirstRow[i] > 0)
            assertEquals("word count " + i, wordCountsFirstRow[i], row.getGroups().size());
    }
}
Also used : Expectations(mockit.Expectations) SourceImage(com.joliciel.jochre.graphics.SourceImage) HashMap(java.util.HashMap) Config(com.typesafe.config.Config) InputStream(java.io.InputStream) Rectangle(java.awt.Rectangle) ArrayList(java.util.ArrayList) RowOfShapes(com.joliciel.jochre.graphics.RowOfShapes) Segmenter(com.joliciel.jochre.graphics.Segmenter) BufferedImage(java.awt.image.BufferedImage) Paragraph(com.joliciel.jochre.graphics.Paragraph) JochreSession(com.joliciel.jochre.JochreSession) Test(org.junit.Test)
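
The paragraph-matching check in this test (intersection area over each rectangle's area, both above 0.8) can be tried without any jochre classes. The sketch below is an illustration under assumptions: the class OverlapSketch, the method matches, and the "close" rectangle are hypothetical. It adds one guard the test does not have: java.awt.Rectangle.intersection can return negative width and height for disjoint rectangles, and multiplying two negative values would yield a positive "area", so an empty intersection is treated as no overlap.

```java
import java.awt.Rectangle;

public class OverlapSketch {
    /** Returns true when the intersection covers more than threshold of both rectangles. */
    static boolean matches(Rectangle real, Rectangle expected, double threshold) {
        Rectangle intersection = expected.intersection(real);
        // isEmpty() is true when width or height is zero or negative,
        // which is how Rectangle reports a missing overlap
        if (intersection.isEmpty())
            return false;
        // cast before multiplying so the product is computed in double
        double intersectionArea = (double) intersection.width * intersection.height;
        double realRatio = intersectionArea / ((double) real.width * real.height);
        double expectedRatio = intersectionArea / ((double) expected.width * expected.height);
        return realRatio > threshold && expectedRatio > threshold;
    }

    public static void main(String[] args) {
        Rectangle expected = new Rectangle(715, 517, 462, 115);
        // hypothetical segmenter output, a few pixels off the expected box
        Rectangle close = new Rectangle(720, 520, 460, 112);
        // a different paragraph lower on the page
        Rectangle far = new Rectangle(50, 2211, 1765, 154);
        System.out.println(matches(close, expected, 0.8)); // prints true
        System.out.println(matches(far, expected, 0.8));   // prints false
    }
}
```

Requiring both ratios to exceed the threshold rejects a detected paragraph that merely contains the expected box, as well as one that is merely contained in it.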

Example 15 with Paragraph

use of com.joliciel.jochre.graphics.Paragraph in project jochre by urieli.

the class SegmentationTest method testAlsacienPlay3.

/**
 * Segmentation errors reported for Alsacien play - challenging because of the
 * unusual indentation.
 *
 * @throws Exception
 */
@Test
public void testAlsacienPlay3(@Mocked final JochrePage jochrePage, @Mocked final JochreDocument jochreDoc) throws Exception {
    Map<String, Object> configMap = new HashMap<>();
    configMap.put("jochre.locale", "de");
    Config config = ConfigFactory.parseMap(configMap).withFallback(ConfigFactory.load());
    JochreSession jochreSession = new JochreSession(config);
    new Expectations() {

        {
            jochrePage.getDocument();
            result = jochreDoc;
            minTimes = 0;
            jochreDoc.isLeftToRight();
            result = true;
            minTimes = 0;
        }
    };
    String imageName = "AlsacienPlay3.jpg";
    LOG.debug(imageName);
    InputStream imageFileStream = getClass().getResourceAsStream("/com/joliciel/jochre/segmentation/" + imageName);
    assertNotNull(imageFileStream);
    BufferedImage image = ImageIO.read(imageFileStream);
    SourceImage sourceImage = new SourceImage(jochrePage, "", image, jochreSession);
    Segmenter segmenter = new Segmenter(sourceImage, jochreSession);
    segmenter.segment();
    List<Rectangle> textPars = new ArrayList<>();
    Rectangle textPar1 = new Rectangle(712, 532, 556, 52);
    Rectangle textPar2 = new Rectangle(324, 600, 1324, 128);
    Rectangle textPar3 = new Rectangle(680, 730, 592, 50);
    Rectangle textPar4 = new Rectangle(404, 808, 684, 48);
    // title paragraph
    textPars.add(textPar1);
    textPars.add(textPar2);
    textPars.add(textPar3);
    textPars.add(textPar4);
    int i = 0;
    int j = 0;
    List<Paragraph> textParagraphs = new ArrayList<>();
    for (Paragraph par : sourceImage.getParagraphs()) {
        Rectangle real = new Rectangle(par.getLeft(), par.getTop(), par.getRight() - par.getLeft(), par.getBottom() - par.getTop());
        Rectangle expected = textPars.get(i);
        Rectangle intersection = expected.intersection(real);
        double realArea = real.width * real.height;
        double expectedArea = expected.width * expected.height;
        double intersectionArea = intersection.width * intersection.height;
        double realRatio = intersectionArea / realArea;
        double expectedRatio = intersectionArea / expectedArea;
        LOG.debug("Paragraph " + j + ": " + par.toString());
        LOG.debug("realRatio: " + realRatio);
        LOG.debug("expectedRatio: " + expectedRatio);
        if (realRatio > 0.8 && expectedRatio > 0.8) {
            LOG.debug("Found");
            textParagraphs.add(par);
            i++;
            if (i >= textPars.size())
                break;
        }
        j++;
    }
    assertEquals(textPars.size(), textParagraphs.size());
    int[] rowCounts = new int[] { 1, 2, 1, 1 };
    // TODO: words in "spaced" rows (which use spacing for emphasis instead of
    // bold or italics) get split; should try to detect multiple
    // single-letter words
    int[] wordCountsFirstRow = new int[] { 0, 10, 0, 5 };
    for (i = 0; i < textParagraphs.size(); i++) {
        assertEquals("row count " + i, rowCounts[i], textParagraphs.get(i).getRows().size());
        RowOfShapes row = textParagraphs.get(i).getRows().get(0);
        if (wordCountsFirstRow[i] > 0)
            assertEquals("word count " + i, wordCountsFirstRow[i], row.getGroups().size());
    }
}
Also used : Expectations(mockit.Expectations) SourceImage(com.joliciel.jochre.graphics.SourceImage) HashMap(java.util.HashMap) Config(com.typesafe.config.Config) InputStream(java.io.InputStream) Rectangle(java.awt.Rectangle) ArrayList(java.util.ArrayList) RowOfShapes(com.joliciel.jochre.graphics.RowOfShapes) Segmenter(com.joliciel.jochre.graphics.Segmenter) BufferedImage(java.awt.image.BufferedImage) Paragraph(com.joliciel.jochre.graphics.Paragraph) JochreSession(com.joliciel.jochre.JochreSession) Test(org.junit.Test)

Aggregations

Paragraph (com.joliciel.jochre.graphics.Paragraph) 17
RowOfShapes (com.joliciel.jochre.graphics.RowOfShapes) 17
ArrayList (java.util.ArrayList) 12
GroupOfShapes (com.joliciel.jochre.graphics.GroupOfShapes) 10
Test (org.junit.Test) 10
Shape (com.joliciel.jochre.graphics.Shape) 9
JochreSession (com.joliciel.jochre.JochreSession) 8
Config (com.typesafe.config.Config) 8
Segmenter (com.joliciel.jochre.graphics.Segmenter) 7
SourceImage (com.joliciel.jochre.graphics.SourceImage) 7
BufferedImage (java.awt.image.BufferedImage) 7
InputStream (java.io.InputStream) 7
JochreImage (com.joliciel.jochre.graphics.JochreImage) 6
Rectangle (java.awt.Rectangle) 6
HashMap (java.util.HashMap) 6
JochreDocument (com.joliciel.jochre.doc.JochreDocument) 4
JochrePage (com.joliciel.jochre.doc.JochrePage) 4
StringWriter (java.io.StringWriter) 3
Expectations (mockit.Expectations) 3
LetterSequence (com.joliciel.jochre.letterGuesser.LetterSequence) 2