Example 11 with Entity

Use of org.apache.stanbol.enhancer.engines.entitylinking.Entity in project stanbol by apache.

The class ReferencedSiteSearcher, method lookup.

@Override
public Collection<? extends Entity> lookup(IRI field, Set<IRI> includeFields, List<String> search, String[] languages, Integer limit, Integer offset) throws IllegalStateException {
    //build the query and then return the result
    Site site = getSearchService();
    if (site == null) {
        throw new IllegalStateException("ReferencedSite " + siteId + " is currently not available");
    }
    queryStats.begin();
    FieldQuery query = EntitySearcherUtils.createFieldQuery(site.getQueryFactory(), field, includeFields, search, languages);
    if (limit != null && limit > 0) {
        query.setLimit(limit);
    } else if (this.limit != null) {
        query.setLimit(this.limit);
    }
    if (offset != null && offset.intValue() > 0) {
        query.setOffset(offset.intValue());
    }
    QueryResultList<Representation> results;
    try {
        results = site.find(query);
    } catch (SiteException e) {
        throw new IllegalStateException("Exception while searching for " + search + '@' + Arrays.toString(languages) + " in the ReferencedSite " + site.getId(), e);
    }
    queryStats.complete();
    if (!results.isEmpty()) {
        Set<String> languagesSet = new HashSet<String>(Arrays.asList(languages));
        Collection<Entity> entities = new ArrayList<Entity>(results.size());
        for (Representation result : results) {
            resultStats.begin();
            entities.add(new EntityhubEntity(result, null, languagesSet));
            resultStats.complete();
        }
        return entities;
    } else {
        return Collections.emptyList();
    }
}
Also used : Site(org.apache.stanbol.entityhub.servicesapi.site.Site) FieldQuery(org.apache.stanbol.entityhub.servicesapi.query.FieldQuery) Entity(org.apache.stanbol.enhancer.engines.entitylinking.Entity) ArrayList(java.util.ArrayList) Representation(org.apache.stanbol.entityhub.servicesapi.model.Representation) SiteException(org.apache.stanbol.entityhub.servicesapi.site.SiteException) HashSet(java.util.HashSet)
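The precedence between the per-call limit, the configured limit and the offset in the lookup method above can be isolated in a small stand-alone sketch. The class and method names here (PagingSketch, effectiveLimit, effectiveOffset) are hypothetical and not part of Stanbol; a null return value stands for "do not call the setter on the query".

```java
/** Hypothetical sketch of the limit/offset precedence; not part of Stanbol. */
final class PagingSketch {

    /** A positive per-call limit wins; otherwise the configured limit (possibly null) is used. */
    static Integer effectiveLimit(Integer requested, Integer configured) {
        if (requested != null && requested > 0) {
            return requested;
        }
        return configured; // null means: leave the query limit untouched
    }

    /** Only a positive offset is applied; anything else leaves the query offset untouched. */
    static Integer effectiveOffset(Integer requested) {
        if (requested != null && requested > 0) {
            return requested;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(effectiveLimit(5, 20));    // prints 5
        System.out.println(effectiveLimit(null, 20)); // prints 20
        System.out.println(effectiveLimit(0, null));  // prints null
        System.out.println(effectiveOffset(-1));      // prints null
    }
}
```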

Example 12 with Entity

Use of org.apache.stanbol.enhancer.engines.entitylinking.Entity in project stanbol by apache.

The class TestSearcherImpl, method addEntity.

public void addEntity(Entity rep) {
    entities.put(rep.getUri(), rep);
    Iterator<Literal> labels = rep.getText(nameField);
    while (labels.hasNext()) {
        Literal label = labels.next();
        for (String token : tokenizer.tokenize(label.getLexicalForm(), null)) {
            Collection<Entity> values = data.get(token);
            if (values == null) {
                values = new ArrayList<Entity>();
                //NOTE: the list is stored under the full label, while the lookup above uses the token
                data.put(label.getLexicalForm(), values);
            }
            values.add(rep);
        }
    }
}
Also used : Entity(org.apache.stanbol.enhancer.engines.entitylinking.Entity) Literal(org.apache.clerezza.commons.rdf.Literal)
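For comparison, here is a minimal sketch of a token-keyed variant of such an index, in which the lookup and the insert use the same key. It is an illustration under stated assumptions, not the Stanbol implementation: plain strings stand in for Entity and Literal, whitespace splitting stands in for the tokenizer, and the class name TokenIndexSketch is hypothetical. It uses List.of, so it needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a token-keyed index; strings stand in for Entity instances. */
final class TokenIndexSketch {

    private final Map<String, Collection<String>> data = new HashMap<>();

    /** Registers the URI under every whitespace-separated token of the label. */
    void add(String uri, String label) {
        for (String token : label.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty label yields a single empty token
            }
            // create the list on first use and add the URI under the same key
            data.computeIfAbsent(token, k -> new ArrayList<>()).add(uri);
        }
    }

    /** Returns the URIs indexed under the token, or an empty collection. */
    Collection<String> get(String token) {
        return data.getOrDefault(token, List.of());
    }

    public static void main(String[] args) {
        TokenIndexSketch index = new TokenIndexSketch();
        index.add("urn:ex:new-york", "New York");
        index.add("urn:ex:york", "York");
        System.out.println(index.get("York")); // prints [urn:ex:new-york, urn:ex:york]
        System.out.println(index.get("New"));  // prints [urn:ex:new-york]
    }
}
```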

Example 13 with Entity

Use of org.apache.stanbol.enhancer.engines.entitylinking.Entity in project stanbol by apache.

The class InMemoryEntityIndex, method join.

/**
 * Searches for Entities that contain all of the passed query tokens.
 * @param queryTokens the query tokens. MUST NOT be NULL, empty or contain
 * any NULL or empty string as element
 * @return matching entities or an empty Set if none.
 */
private Set<Entity> join(String... queryTokens) {
    //TODO: how to create a generic typed array
    @SuppressWarnings("unchecked") Collection<Entity>[] tokenResults = new Collection[queryTokens.length];
    for (int i = 0; i < queryTokens.length; i++) {
        Collection<Entity> tokenResult = index.get(queryTokens[i].toLowerCase(Locale.ROOT));
        if (tokenResult == null || tokenResult.isEmpty()) {
            return Collections.emptySet();
        }
        tokenResults[i] = tokenResult;
    }
    if (tokenResults.length == 1) {
        return new HashSet<Entity>(tokenResults[0]);
    }
    //else we need to join the single results
    //we want to join the shortest results first, so sort before seeding the join
    Arrays.sort(tokenResults, COLLECTION_SIZE_COMPARATOR);
    Set<Entity> join = new HashSet<Entity>(tokenResults[0]);
    for (int i = 1; i < tokenResults.length && !join.isEmpty(); i++) {
        Set<Entity> old = join;
        //new set to add all elements
        join = new HashSet<Entity>();
        for (Iterator<Entity> it = tokenResults[i].iterator(); it.hasNext() && !old.isEmpty(); ) {
            Entity e = it.next();
            if (old.remove(e)) {
                join.add(e);
            }
        }
    }
    return join;
}
Also used : Entity(org.apache.stanbol.enhancer.engines.entitylinking.Entity) Collection(java.util.Collection) HashSet(java.util.HashSet)
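The same "all tokens must match" join can be sketched in a self-contained form. This is an illustration rather than the Stanbol code: strings stand in for Entity, the index is passed in as a parameter, Set.retainAll replaces the manual intersection loop, and the class name TokenJoinSketch is hypothetical. The per-token results are sorted by size before the join is seeded, so the smallest result bounds the work. It uses Map.of and List.of, so it needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-alone sketch; strings stand in for Entity instances. */
final class TokenJoinSketch {

    /** Returns the values that are indexed under every one of the query tokens. */
    static Set<String> join(Map<String, ? extends Collection<String>> index, String... queryTokens) {
        List<Collection<String>> tokenResults = new ArrayList<>(queryTokens.length);
        for (String token : queryTokens) {
            Collection<String> tokenResult = index.get(token.toLowerCase(Locale.ROOT));
            if (tokenResult == null || tokenResult.isEmpty()) {
                return Collections.emptySet(); // one token without matches empties the join
            }
            tokenResults.add(tokenResult);
        }
        if (tokenResults.isEmpty()) {
            return Collections.emptySet(); // no tokens given
        }
        // sort ascending by size, then seed the join with the smallest result
        tokenResults.sort(Comparator.comparingInt(Collection::size));
        Set<String> join = new HashSet<>(tokenResults.get(0));
        for (int i = 1; i < tokenResults.size() && !join.isEmpty(); i++) {
            join.retainAll(tokenResults.get(i)); // keep only values present in both
        }
        return join;
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = Map.of(
                "new", List.of("urn:ex:new-york", "urn:ex:new-delhi", "urn:ex:newark"),
                "york", List.of("urn:ex:new-york", "urn:ex:york"));
        System.out.println(join(index, "New", "York"));  // prints [urn:ex:new-york]
        System.out.println(join(index, "New", "Paris")); // prints []
    }
}
```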

Aggregations

Entity (org.apache.stanbol.enhancer.engines.entitylinking.Entity): 13
ArrayList (java.util.ArrayList): 6
HashSet (java.util.HashSet): 6
Literal (org.apache.clerezza.commons.rdf.Literal): 5
Graph (org.apache.clerezza.commons.rdf.Graph): 4
IRI (org.apache.clerezza.commons.rdf.IRI): 4
TripleImpl (org.apache.clerezza.commons.rdf.impl.utils.TripleImpl): 4
Language (org.apache.clerezza.commons.rdf.Language): 3
Triple (org.apache.clerezza.commons.rdf.Triple): 3
PlainLiteralImpl (org.apache.clerezza.commons.rdf.impl.utils.PlainLiteralImpl): 3
LinkedEntity (org.apache.stanbol.enhancer.engines.entitylinking.impl.LinkedEntity): 3
Collection (java.util.Collection): 2
HashMap (java.util.HashMap): 2
RDFTerm (org.apache.clerezza.commons.rdf.RDFTerm): 2
Occurrence (org.apache.stanbol.enhancer.engines.entitylinking.impl.LinkedEntity.Occurrence): 2
Suggestion (org.apache.stanbol.enhancer.engines.entitylinking.impl.Suggestion): 2
NlpEngineHelper.getLanguage (org.apache.stanbol.enhancer.nlp.utils.NlpEngineHelper.getLanguage): 2
Representation (org.apache.stanbol.entityhub.servicesapi.model.Representation): 2
FieldQuery (org.apache.stanbol.entityhub.servicesapi.query.FieldQuery): 2
Entry (java.util.Map.Entry): 1