
Example 1 with IncompatibleDocumentException

Use of de.tudarmstadt.ukp.clarin.webanno.webapp.remoteapi.aero.exception.IncompatibleDocumentException in project webanno by webanno.

From the class AeroRemoteApiController, method createCompatibleCas.

private CAS createCompatibleCas(long aProjectId, long aDocumentId, MultipartFile aFile, Optional<String> aFormatId) throws RemoteApiException, ClassNotFoundException, IOException, UIMAException {
    Project project = getProject(aProjectId);
    SourceDocument document = getDocument(project, aDocumentId);
    // Check if the format is supported
    String format = aFormatId.orElse(FORMAT_DEFAULT);
    if (!importExportService.getReadableFormatById(format).isPresent()) {
        throw new UnsupportedFormatException("Format [%s] not supported. Acceptable formats are %s.", format, importExportService.getReadableFormats().stream().map(FormatSupport::getId).sorted().collect(Collectors.toList()));
    }
    // Convert the uploaded annotation document into a CAS
    File tmpFile = null;
    CAS annotationCas;
    try {
        tmpFile = File.createTempFile("upload", ".bin");
        aFile.transferTo(tmpFile);
        annotationCas = importExportService.importCasFromFile(tmpFile, project, format);
    } finally {
        if (tmpFile != null) {
            FileUtils.forceDelete(tmpFile);
        }
    }
    // Check if the uploaded file is compatible with the source document. They are compatible
    // if the text is the same and if all the token and sentence annotations have the same
    // offsets.
    CAS initialCas = documentService.createOrReadInitialCas(document);
    String initialText = initialCas.getDocumentText();
    String annotationText = annotationCas.getDocumentText();
    // If any of the texts contains trailing line breaks, we ignore them. We assume at the
    // moment that nobody will have created annotations over those trailing line breaks.
    initialText = StringUtils.chomp(initialText);
    annotationText = StringUtils.chomp(annotationText);
    if (ObjectUtils.notEqual(initialText, annotationText)) {
        int diffIndex = StringUtils.indexOfDifference(initialText, annotationText);
        String expected = initialText.substring(diffIndex, Math.min(initialText.length(), diffIndex + 20));
        String actual = annotationText.substring(diffIndex, Math.min(annotationText.length(), diffIndex + 20));
        throw new IncompatibleDocumentException("Text of annotation document does not match text of source document at offset [%d]. Expected [%s] but found [%s].", diffIndex, expected, actual);
    }
    // Just in case we really had to chomp off a trailing line break from the annotation CAS,
    // make sure we copy over the proper text from the initial CAS
    // NOT AT HOME THIS YOU SHOULD TRY
    // SETTING THE SOFA STRING FORCEFULLY FOLLOWING THE DARK SIDE IS!
    forceOverwriteSofa(annotationCas, initialCas.getDocumentText());
    Collection<AnnotationFS> annotationSentences = selectSentences(annotationCas);
    Collection<AnnotationFS> initialSentences = selectSentences(initialCas);
    if (annotationSentences.size() != initialSentences.size()) {
        throw new IncompatibleDocumentException("Expected [%d] sentences, but annotation document contains [%d] sentences.", initialSentences.size(), annotationSentences.size());
    }
    assertCompatibleOffsets(initialSentences, annotationSentences);
    Collection<AnnotationFS> annotationTokens = selectTokens(annotationCas);
    Collection<AnnotationFS> initialTokens = selectTokens(initialCas);
    if (annotationTokens.size() != initialTokens.size()) {
        throw new IncompatibleDocumentException("Expected [%d] tokens, but annotation document contains [%d] tokens.", initialTokens.size(), annotationTokens.size());
    }
    assertCompatibleOffsets(initialTokens, annotationTokens);
    return annotationCas;
}
Also used : RProject(de.tudarmstadt.ukp.clarin.webanno.webapp.remoteapi.aero.model.RProject) Project(de.tudarmstadt.ukp.clarin.webanno.model.Project) AnnotationFS(org.apache.uima.cas.text.AnnotationFS) UnsupportedFormatException(de.tudarmstadt.ukp.clarin.webanno.webapp.remoteapi.aero.exception.UnsupportedFormatException) CAS(org.apache.uima.cas.CAS) SourceDocument(de.tudarmstadt.ukp.clarin.webanno.model.SourceDocument) File(java.io.File) ZipFile(java.util.zip.ZipFile) MultipartFile(org.springframework.web.multipart.MultipartFile) IncompatibleDocumentException(de.tudarmstadt.ukp.clarin.webanno.webapp.remoteapi.aero.exception.IncompatibleDocumentException)
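
The text comparison in the middle of createCompatibleCas can be tried out in isolation. The following is a minimal sketch, not the WebAnno implementation: the class TextCompatibilitySketch and the method describeMismatch are hypothetical names, and the chomp and indexOfDifference helpers are simplified standard-library stand-ins for the Apache Commons Lang StringUtils methods used above, assumed here so that the example runs without extra dependencies. It returns the mismatch message instead of throwing IncompatibleDocumentException.

```java
import java.util.Objects;

public class TextCompatibilitySketch {

    // Simplified stand-in for StringUtils.chomp: strips one trailing line break.
    static String chomp(String text) {
        if (text == null) {
            return null;
        }
        if (text.endsWith("\r\n")) {
            return text.substring(0, text.length() - 2);
        }
        if (text.endsWith("\n") || text.endsWith("\r")) {
            return text.substring(0, text.length() - 1);
        }
        return text;
    }

    // Simplified stand-in for StringUtils.indexOfDifference for non-null inputs:
    // index of the first differing character, or -1 if the texts are equal.
    static int indexOfDifference(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        for (int i = 0; i < limit; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return a.length() == b.length() ? -1 : limit;
    }

    // Hypothetical helper: returns null if the texts are compatible, otherwise a
    // message showing up to 20 characters of each text from the first difference.
    static String describeMismatch(String initialText, String annotationText) {
        String expectedText = chomp(initialText);
        String actualText = chomp(annotationText);
        if (Objects.equals(expectedText, actualText)) {
            return null;
        }
        if (expectedText == null || actualText == null) {
            return "One of the documents has no text.";
        }
        int diffIndex = indexOfDifference(expectedText, actualText);
        String expected = expectedText.substring(diffIndex,
                Math.min(expectedText.length(), diffIndex + 20));
        String actual = actualText.substring(diffIndex,
                Math.min(actualText.length(), diffIndex + 20));
        return String.format("Text of annotation document does not match text of source "
                + "document at offset [%d]. Expected [%s] but found [%s].",
                diffIndex, expected, actual);
    }

    public static void main(String[] args) {
        // Differs only by a trailing line break, so the texts count as compatible.
        System.out.println(describeMismatch("Hello world.\n", "Hello world."));
        // Differs at offset 6 ('w' versus 'W').
        System.out.println(describeMismatch("Hello world.", "Hello World."));
    }
}
```

Returning a message rather than throwing keeps the sketch easy to exercise; in the method above the same information is passed to the IncompatibleDocumentException constructor.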

Aggregations

Project (de.tudarmstadt.ukp.clarin.webanno.model.Project) 1
SourceDocument (de.tudarmstadt.ukp.clarin.webanno.model.SourceDocument) 1
IncompatibleDocumentException (de.tudarmstadt.ukp.clarin.webanno.webapp.remoteapi.aero.exception.IncompatibleDocumentException) 1
UnsupportedFormatException (de.tudarmstadt.ukp.clarin.webanno.webapp.remoteapi.aero.exception.UnsupportedFormatException) 1
RProject (de.tudarmstadt.ukp.clarin.webanno.webapp.remoteapi.aero.model.RProject) 1
File (java.io.File) 1
ZipFile (java.util.zip.ZipFile) 1
CAS (org.apache.uima.cas.CAS) 1
AnnotationFS (org.apache.uima.cas.text.AnnotationFS) 1
MultipartFile (org.springframework.web.multipart.MultipartFile) 1