Example 31 with AnalysedText

use of org.apache.stanbol.enhancer.nlp.model.AnalysedText in project stanbol by apache.

the class NlpEngineHelper method initAnalysedText.

/**
     * Retrieves or, if not present, creates the {@link AnalysedText} content
     * part for the passed {@link ContentItem}. If no {@link Blob} with the
     * mime type '<code>text/plain</code>' is present, this method
     * throws an {@link IllegalStateException} (it internally calls
     * {@link #getPlainText(EnhancementEngine, ContentItem, boolean)} with
     * <code>true</code> as the third parameter). Users of this method should call
     * {@link #getPlainText(EnhancementEngine, ContentItem, boolean)} with
     * <code>false</code> as the third parameter in their
     * {@link EnhancementEngine#canEnhance(ContentItem)} implementation.<p>
     * <i>NOTE:</i> This method is intended for Engines that want to create an
     * empty {@link AnalysedText} content part. Engines that assume that this
     * content part is already present (e.g. if they consume already existing
     * annotations) should use the
     * {@link #getAnalysedText(EnhancementEngine, ContentItem, boolean)}
     * method instead.
     * @param engine the EnhancementEngine calling this method (used for logging)
     * @param analysedTextFactory the {@link AnalysedTextFactory} used to create
     * the {@link AnalysedText} instance (if not present).
     * @param ci the {@link ContentItem}
     * @return the AnalysedText
     * @throws EngineException on any exception while accessing the 
     * '<code>text/plain</code>' Blob
     * @throws IllegalStateException if no '<code>text/plain</code>' Blob is
     * present as content part of the passed {@link ContentItem} or the passed
     * {@link AnalysedTextFactory} is <code>null</code>. <i>NOTE</i> that an
     * {@link IllegalStateException} is only thrown if the {@link AnalysedText}
     * ContentPart is not yet present in the passed {@link ContentItem}
     */
public static AnalysedText initAnalysedText(EnhancementEngine engine, AnalysedTextFactory analysedTextFactory, ContentItem ci) throws EngineException {
    AnalysedText at = AnalysedTextUtils.getAnalysedText(ci);
    if (at == null) {
        if (analysedTextFactory == null) {
            throw new IllegalStateException("Unable to initialise AnalysedText ContentPart because the passed AnalysedTextFactory is NULL");
        }
        Entry<IRI, Blob> textBlob = getPlainText(engine, ci, true);
        // the content part is missing, so create it while holding the write lock
        ci.getLock().writeLock().lock();
        try {
            // try again to retrieve it (a concurrent thread may have created
            // the content part in the meantime)
            at = AnalysedTextUtils.getAnalysedText(ci);
            if (at == null) {
                log.debug(" ... create new AnalysedText instance for Engine {}", engine.getName());
                at = analysedTextFactory.createAnalysedText(ci, textBlob.getValue());
            }
        } catch (IOException e) {
            throw new EngineException("Unable to create AnalysedText instance for Blob " + textBlob.getKey() + " of ContentItem " + ci.getUri() + "!", e);
        } finally {
            ci.getLock().writeLock().unlock();
        }
    } else {
        log.debug(" ... use existing AnalysedText instance for Engine {}", engine.getName());
    }
    return at;
}
Also used : AnalysedText(org.apache.stanbol.enhancer.nlp.model.AnalysedText) IRI(org.apache.clerezza.commons.rdf.IRI) Blob(org.apache.stanbol.enhancer.servicesapi.Blob) EngineException(org.apache.stanbol.enhancer.servicesapi.EngineException) IOException(java.io.IOException)
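The method above follows a check, lock, re-check pattern: it looks for the content part first, takes the write lock only when the part is missing, and checks again under the lock in case another thread created it in the meantime. The sketch below isolates that pattern without any Stanbol dependency. `LazyPartHolder`, `get` and `init` are hypothetical names introduced for illustration, and guarding the first lookup with a read lock is an assumption of this sketch, not something the snippet above shows.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical stand-in for a ContentItem that holds one lazily created part. */
class LazyPartHolder<T> {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private T part; // guarded by lock

    /** Returns the current part, or null if none has been created yet. */
    T get() {
        lock.readLock().lock();
        try {
            return part;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Returns the existing part, or creates it at most once via the factory. */
    T init(Supplier<T> factory) {
        T existing = get(); // first check, read lock only
        if (existing != null) {
            return existing;
        }
        // The factory is only required when the part is actually missing.
        if (factory == null) {
            throw new IllegalStateException("factory must not be null when the part is missing");
        }
        lock.writeLock().lock();
        try {
            if (part == null) { // second check: another thread may have created it
                part = factory.get();
            }
            return part;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

As in `initAnalysedText`, the null check on the factory only fires when nothing has been created yet, so a caller that passes `null` after the part exists simply gets the existing instance back.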

Example 32 with AnalysedText

use of org.apache.stanbol.enhancer.nlp.model.AnalysedText in project stanbol by apache.

the class CorefFeatureSupportTest method testSerializationAndParse.

@Test
public void testSerializationAndParse() throws IOException {
    String serialized = getSerializedString();
    Assert.assertTrue(serialized.contains(jsonCorefCheckObama));
    Assert.assertTrue(serialized.contains(jsonCorefCheckHe));
    AnalysedText parsedAt = getParsedAnalysedText(serialized);
    assertAnalysedTextEquality(parsedAt);
}
Also used : AnalysedText(org.apache.stanbol.enhancer.nlp.model.AnalysedText) Test(org.junit.Test)

Aggregations

AnalysedText (org.apache.stanbol.enhancer.nlp.model.AnalysedText): 32
EngineException (org.apache.stanbol.enhancer.servicesapi.EngineException): 15
Token (org.apache.stanbol.enhancer.nlp.model.Token): 13
NlpEngineHelper.getAnalysedText (org.apache.stanbol.enhancer.nlp.utils.NlpEngineHelper.getAnalysedText): 13
IOException (java.io.IOException): 9
IRI (org.apache.clerezza.commons.rdf.IRI): 9
TripleImpl (org.apache.clerezza.commons.rdf.impl.utils.TripleImpl): 8
Sentence (org.apache.stanbol.enhancer.nlp.model.Sentence): 8
PosTag (org.apache.stanbol.enhancer.nlp.pos.PosTag): 8
Test (org.junit.Test): 7
Graph (org.apache.clerezza.commons.rdf.Graph): 6
NlpEngineHelper.initAnalysedText (org.apache.stanbol.enhancer.nlp.utils.NlpEngineHelper.initAnalysedText): 6
Language (org.apache.clerezza.commons.rdf.Language): 5
PlainLiteralImpl (org.apache.clerezza.commons.rdf.impl.utils.PlainLiteralImpl): 5
Section (org.apache.stanbol.enhancer.nlp.model.Section): 5
Span (org.apache.stanbol.enhancer.nlp.model.Span): 5
NerTag (org.apache.stanbol.enhancer.nlp.ner.NerTag): 5
ArrayList (java.util.ArrayList): 4
Chunk (org.apache.stanbol.enhancer.nlp.model.Chunk): 4
Value (org.apache.stanbol.enhancer.nlp.model.annotation.Value): 4