
Example 1 with TaxonomyRemoteServiceInvalidBehaviorException

Use of org.ambraproject.rhino.service.taxonomy.TaxonomyRemoteServiceInvalidBehaviorException in project rhino by PLOS.

From class TaxonomyClassificationServiceImpl, method getRawTerms.

/**
 * {@inheritDoc}
 */
@Override
public List<String> getRawTerms(Document articleXml, Article article, boolean isTextRequired) {
    RuntimeConfiguration.TaxonomyConfiguration configuration = getTaxonomyConfiguration();
    String toCategorize = getCategorizationContent(articleXml);
    ArticleIngestion latest = articleCrudService.readLatestRevision(article).getIngestion();
    String header = String.format(MESSAGE_HEADER,
        new SimpleDateFormat("yyyy-MM-dd").format(latest.getPublicationDate()),
        latest.getJournal().getTitle(), latest.getArticleType(), article.getDoi());
    String aiMessage = String.format(MESSAGE_BEGIN, configuration.getThesaurus())
        + StringEscapeUtils.escapeXml10(String.format(MESSAGE_DOC_ELEMENT, header, toCategorize))
        + MESSAGE_END;
    HttpPost post = new HttpPost(configuration.getServer().toString());
    post.setEntity(new StringEntity(aiMessage, APPLICATION_XML_UTF_8));
    DocumentBuilder documentBuilder = newDocumentBuilder();
    Document response;
    try (CloseableHttpResponse httpResponse = httpClient.execute(post);
        InputStream stream = httpResponse.getEntity().getContent()) {
        response = documentBuilder.parse(stream);
    } catch (IOException e) {
        throw new TaxonomyRemoteServiceNotAvailableException(e);
    } catch (SAXException e) {
        throw new TaxonomyRemoteServiceInvalidBehaviorException("Invalid XML returned from " + configuration.getServer(), e);
    }
    //parse result
    NodeList vectorElements = response.getElementsByTagName("VectorElement");
    List<String> results = new ArrayList<>(vectorElements.getLength());
    // Add the text that is sent to taxonomy server if isTextRequired is true
    if (isTextRequired) {
        toCategorize = StringEscapeUtils.unescapeXml(toCategorize);
        results.add(toCategorize);
    }
    //The first and last elements of the vector response are just MAITERMS
    for (int i = 1; i < vectorElements.getLength() - 1; i++) {
        results.add(vectorElements.item(i).getTextContent());
    }
    if ((isTextRequired && results.size() == 1) || results.isEmpty()) {
        log.error("Taxonomy server returned 0 terms. " + article.getDoi());
    }
    return results;
}
Also used: ArticleIngestion (org.ambraproject.rhino.model.ArticleIngestion), HttpPost (org.apache.http.client.methods.HttpPost), InputStream (java.io.InputStream), NodeList (org.w3c.dom.NodeList), ArrayList (java.util.ArrayList), IOException (java.io.IOException), Document (org.w3c.dom.Document), RuntimeConfiguration (org.ambraproject.rhino.config.RuntimeConfiguration), SAXException (org.xml.sax.SAXException), StringEntity (org.apache.http.entity.StringEntity), TaxonomyRemoteServiceInvalidBehaviorException (org.ambraproject.rhino.service.taxonomy.TaxonomyRemoteServiceInvalidBehaviorException), DocumentBuilder (javax.xml.parsers.DocumentBuilder), AmbraService.newDocumentBuilder (org.ambraproject.rhino.service.impl.AmbraService.newDocumentBuilder), CloseableHttpResponse (org.apache.http.client.methods.CloseableHttpResponse), SimpleDateFormat (java.text.SimpleDateFormat), TaxonomyRemoteServiceNotAvailableException (org.ambraproject.rhino.service.taxonomy.TaxonomyRemoteServiceNotAvailableException)
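The response-handling half of getRawTerms (parse the XML, drop the first and last VectorElement, optionally prepend the submitted text, flag an empty result) can be exercised in isolation. The following is a minimal standalone sketch using only the JDK's DOM parser; it is not rhino code. The class name VectorResponseSketch, the sample XML envelope, and the use of IllegalArgumentException in place of TaxonomyRemoteServiceInvalidBehaviorException are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical standalone sketch of the parsing loop in getRawTerms.
public class VectorResponseSketch {

  // Returns the text of every VectorElement except the first and last.
  // If isTextRequired is true, the submitted text is placed first in the list.
  public static List<String> extractTerms(String responseXml, String submittedText,
      boolean isTextRequired) {
    Document response;
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      response = builder.parse(
          new ByteArrayInputStream(responseXml.getBytes(StandardCharsets.UTF_8)));
    } catch (ParserConfigurationException | SAXException | IOException e) {
      // Stand-in for TaxonomyRemoteServiceInvalidBehaviorException (assumption).
      throw new IllegalArgumentException("Invalid XML response", e);
    }
    NodeList vectorElements = response.getElementsByTagName("VectorElement");
    List<String> results = new ArrayList<>(vectorElements.getLength());
    if (isTextRequired) {
      results.add(submittedText);
    }
    // Skip index 0 and the last index; per the source comment they are just MAITERMS.
    for (int i = 1; i < vectorElements.getLength() - 1; i++) {
      results.add(vectorElements.item(i).getTextContent());
    }
    return results;
  }

  // Same condition the service logs as "returned 0 terms".
  public static boolean noTermsReturned(List<String> results, boolean isTextRequired) {
    return (isTextRequired && results.size() == 1) || results.isEmpty();
  }

  public static void main(String[] args) {
    // Hypothetical response shape; the real server envelope is not shown in the source.
    String xml = "<Response>"
        + "<VectorElement>MAITERMS</VectorElement>"
        + "<VectorElement>term-a</VectorElement>"
        + "<VectorElement>term-b</VectorElement>"
        + "<VectorElement>MAITERMS</VectorElement>"
        + "</Response>";
    List<String> terms = extractTerms(xml, "article text", false);
    System.out.println(terms); // prints [term-a, term-b]
    System.out.println(noTermsReturned(terms, false)); // prints false
  }
}
```

Because the loop bound is `getLength() - 1`, a response with fewer than three VectorElements yields no terms rather than an index error, which is what lets the empty-result check work.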

Example 2 with TaxonomyRemoteServiceInvalidBehaviorException

Use of org.ambraproject.rhino.service.taxonomy.TaxonomyRemoteServiceInvalidBehaviorException in project rhino by PLOS.

From class TaxonomyClassificationServiceImpl, method parseVectorElement.

/**
 * Parses a single line of the XML response from the taxonomy server.
 *
 * @param vectorElement The text body of a line of the response
 * @return the term and weight of the term
 */
@VisibleForTesting
static WeightedTerm parseVectorElement(String vectorElement) {
    Matcher match = TERM_PATTERN.matcher(vectorElement);
    if (match.find()) {
        String text = match.group(1);
        int value = Integer.parseInt(match.group(2));
        return new WeightedTerm(text, value);
    } else {
        //Bad term
        throw new TaxonomyRemoteServiceInvalidBehaviorException("Invalid syntax: " + vectorElement);
    }
}
Also used: WeightedTerm (org.ambraproject.rhino.service.taxonomy.WeightedTerm), TaxonomyRemoteServiceInvalidBehaviorException (org.ambraproject.rhino.service.taxonomy.TaxonomyRemoteServiceInvalidBehaviorException), Matcher (java.util.regex.Matcher), VisibleForTesting (com.google.common.annotations.VisibleForTesting)
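The snippet above does not show TERM_PATTERN, so the exact line format it accepts is unknown here. The sketch below assumes a line shaped like `<TERM>/Biology/Genetics|(5) gene*(5)</TERM>`, where group 1 is the category path and group 2 the integer weight; both the pattern and that format are assumptions, as are the simplified WeightedTerm stand-in and the use of IllegalArgumentException in place of TaxonomyRemoteServiceInvalidBehaviorException. It shows the same match-or-throw structure as parseVectorElement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone sketch of the match-or-throw logic in parseVectorElement.
public class VectorElementParserSketch {

  // Simplified stand-in for rhino's WeightedTerm: a term and its integer weight.
  public static final class WeightedTerm {
    public final String term;
    public final int weight;

    public WeightedTerm(String term, int weight) {
      this.term = term;
      this.weight = weight;
    }

    @Override
    public String toString() {
      return term + " (" + weight + ")";
    }
  }

  // ASSUMPTION: rhino's real TERM_PATTERN is not shown in the source. This one expects
  // "<TERM>path|(weight) extra</TERM>"; group 1 captures the path, group 2 the weight.
  static final Pattern TERM_PATTERN =
      Pattern.compile("<TERM>\\s*(.*?)\\s*\\|\\s*\\((\\d+)\\)\\s*(.*?)\\s*</TERM>");

  public static WeightedTerm parseVectorElement(String vectorElement) {
    Matcher match = TERM_PATTERN.matcher(vectorElement);
    if (match.find()) {
      String text = match.group(1);
      int value = Integer.parseInt(match.group(2));
      return new WeightedTerm(text, value);
    }
    // Bad term; stand-in for TaxonomyRemoteServiceInvalidBehaviorException (assumption).
    throw new IllegalArgumentException("Invalid syntax: " + vectorElement);
  }

  public static void main(String[] args) {
    System.out.println(parseVectorElement("<TERM>/Biology/Genetics|(5) gene*(5)</TERM>"));
    // prints /Biology/Genetics (5)
  }
}
```

Throwing on a non-matching line, rather than skipping it, surfaces a misbehaving remote service immediately instead of silently returning fewer terms.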
