Example 1 with IMatchable

Use of eu.etaxonomy.cdm.strategy.match.IMatchable in project cdmlib by cybertaxonomy.

Class ImportDeduplicationHelper, method getMatchingEntity.

private <S extends IMatchable> Optional<S> getMatchingEntity(S entityOrig, DedupInfo<S> dedupInfo, boolean parsed) {
    S entity = CdmBase.deproxy(entityOrig);
    // choose matcher depending on the type of matching required. If matching of a parsed entity is required
    // try to use the parsed matcher (if it exists)
    IMatchStrategy matcher = parsed && dedupInfo.parsedMatcher != null ? dedupInfo.parsedMatcher : dedupInfo.defaultMatcher;
    Predicate<S> matchFilter = persistedEntity -> {
        try {
            return matcher.invoke((IMatchable) entity, (IMatchable) persistedEntity).isSuccessful();
        } catch (MatchException e) {
            throw new RuntimeException(e);
        }
    };
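    // TODO casting
    // first look among the entities registered under the same title, keeping only those the matcher accepts
    Optional<S> result = Optional.ofNullable(getEntityByTitle(((IdentifiableEntity<?>) entity).getTitleCache(), dedupInfo))
            .orElse(new HashSet<>())
            .stream()
            .filter(matchFilter)
            .findAny();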
    if (result.isPresent() || dedupInfo.status == Status.USE_MAP || repository == null) {
        return result;
    } else {
        try {
            return (Optional) repository.getCommonService().findMatching((IMatchable) entity, matcher).stream().findFirst();
        } catch (MatchException e) {
            throw new RuntimeException(e);
        }
    }
}
Also used : Arrays(java.util.Arrays) Institution(eu.etaxonomy.cdm.model.agent.Institution) DefaultMatchStrategy(eu.etaxonomy.cdm.strategy.match.DefaultMatchStrategy) IMatchStrategyEqual(eu.etaxonomy.cdm.strategy.match.IMatchStrategyEqual) Team(eu.etaxonomy.cdm.model.agent.Team) Reference(eu.etaxonomy.cdm.model.reference.Reference) HashMap(java.util.HashMap) ICdmBase(eu.etaxonomy.cdm.model.common.ICdmBase) HashSet(java.util.HashSet) Logger(org.apache.log4j.Logger) MatchStrategyFactory(eu.etaxonomy.cdm.strategy.match.MatchStrategyFactory) RightsType(eu.etaxonomy.cdm.model.media.RightsType) INonViralName(eu.etaxonomy.cdm.model.name.INonViralName) HybridRelationship(eu.etaxonomy.cdm.model.name.HybridRelationship) Map(java.util.Map) ImportStateBase(eu.etaxonomy.cdm.io.common.ImportStateBase) CdmBase(eu.etaxonomy.cdm.model.common.CdmBase) Predicate(java.util.function.Predicate) Rights(eu.etaxonomy.cdm.model.media.Rights) CdmUtils(eu.etaxonomy.cdm.common.CdmUtils) Set(java.util.Set) UUID(java.util.UUID) Collection(eu.etaxonomy.cdm.model.occurrence.Collection) ImportResult(eu.etaxonomy.cdm.io.common.ImportResult) TeamOrPersonBase(eu.etaxonomy.cdm.model.agent.TeamOrPersonBase) MatchException(eu.etaxonomy.cdm.strategy.match.MatchException) ICdmRepository(eu.etaxonomy.cdm.api.application.ICdmRepository) IService(eu.etaxonomy.cdm.api.service.IService) List(java.util.List) AgentBase(eu.etaxonomy.cdm.model.agent.AgentBase) IMatchable(eu.etaxonomy.cdm.strategy.match.IMatchable) IMatchStrategy(eu.etaxonomy.cdm.strategy.match.IMatchStrategy) Optional(java.util.Optional) IdentifiableEntity(eu.etaxonomy.cdm.model.common.IdentifiableEntity) TaxonName(eu.etaxonomy.cdm.model.name.TaxonName) MatchMode(eu.etaxonomy.cdm.strategy.match.MatchMode) Person(eu.etaxonomy.cdm.model.agent.Person)
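The lookup order in getMatchingEntity (candidates filed under the same title first, a broader search only if nothing matched) can be tried out without cdmlib. The following is a minimal sketch of that pattern only, not the cdmlib implementation: the class TitleKeyedLookupSketch, the method findMatch, and the fallback parameter are hypothetical names, a plain Map stands in for the helper's internal title index, and a BiPredicate stands in for IMatchStrategy.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.BiPredicate;
import java.util.function.Function;

public class TitleKeyedLookupSketch {

    /**
     * Looks for a match among the candidates stored under the given title.
     * If none is accepted by the matcher and a fallback is supplied,
     * the fallback (standing in for a repository query) is consulted.
     */
    public static <S> Optional<S> findMatch(S entity, String title,
            Map<String, Set<S>> byTitle,
            BiPredicate<S, S> matcher,
            Function<S, Optional<S>> fallback) {
        Optional<S> local = byTitle.getOrDefault(title, Collections.emptySet())
                .stream()
                .filter(candidate -> matcher.test(entity, candidate))
                .findAny();
        if (local.isPresent() || fallback == null) {
            return local;
        }
        return fallback.apply(entity);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> byTitle = new HashMap<>();
        byTitle.put("Smith", new HashSet<>(Arrays.asList("Smith")));

        // found in the title index, so the fallback is never called
        System.out.println(findMatch("Smith", "Smith", byTitle, String::equals, e -> Optional.empty()));
        // not in the index and no fallback available: empty result
        System.out.println(findMatch("Jones", "Jones", byTitle, String::equals, null));
    }
}
```

Keeping the fallback optional mirrors the `repository == null` check in the snippet above: when no repository is available, the result of the local lookup is returned as-is.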

Example 2 with IMatchable

Use of eu.etaxonomy.cdm.strategy.match.IMatchable in project cdmlib by cybertaxonomy.

Class IdentifiableServiceBase, method deduplicate.

@Override
@Transactional(readOnly = false)
public int deduplicate(Class<? extends T> clazz, IMatchStrategyEqual matchStrategy, IMergeStrategy mergeStrategy) {
    DeduplicateState dedupState = new DeduplicateState();
    if (clazz == null) {
        logger.warn("Deduplication clazz must not be null!");
        return 0;
    }
    if (!(IMatchable.class.isAssignableFrom(clazz) && IMergable.class.isAssignableFrom(clazz))) {
        logger.warn("Deduplication implemented only for classes implementing IMatchable and IMergable. No deduplication performed!");
        return 0;
    }
    Class matchableClass = clazz;
    if (matchStrategy == null) {
        matchStrategy = DefaultMatchStrategy.NewInstance(matchableClass);
    }
    List<T> nextGroup = new ArrayList<>();
    int result = 0;
    // double countTotal = count(clazz);
    // 
    // Number countPagesN = Math.ceil(countTotal/dedupState.pageSize.doubleValue()) ;
    // int countPages = countPagesN.intValue();
    // 
    List<OrderHint> orderHints = Arrays.asList(new OrderHint("titleCache", SortOrder.ASCENDING));
    while (!dedupState.isCompleted) {
        // fetch the next batch of pages, ordered by titleCache
        List<? extends T> objectList = getPages(clazz, dedupState, orderHints);
        // after each page check if any changes took place
        int nUnEqualPages = handleAllPages(objectList, dedupState, nextGroup, matchStrategy, mergeStrategy);
        nUnEqualPages = nUnEqualPages + dedupState.pageSize * dedupState.startPage;
        // refresh start page counter
        int finishedPages = nUnEqualPages / dedupState.pageSize;
        dedupState.startPage = finishedPages;
    }
    result += handleLastGroup(nextGroup, matchStrategy, mergeStrategy);
    return result;
}
Also used : OrderHint(eu.etaxonomy.cdm.persistence.query.OrderHint) IMatchable(eu.etaxonomy.cdm.strategy.match.IMatchable) IMergable(eu.etaxonomy.cdm.strategy.merge.IMergable) ArrayList(java.util.ArrayList) Transactional(org.springframework.transaction.annotation.Transactional)

Example 3 with IMatchable

Use of eu.etaxonomy.cdm.strategy.match.IMatchable in project cdmlib by cybertaxonomy.

Class IdentifiableServiceBase, method handleLastGroup.

private int handleLastGroup(List<T> group, IMatchStrategyEqual matchStrategy, IMergeStrategy mergeStrategy) {
    int result = 0;
    int size = group.size();
    // set to collect all objects, that have been merged already
    Set<Integer> exclude = new HashSet<>();
    for (int i = 0; i < size - 1; i++) {
        if (exclude.contains(i)) {
            continue;
        }
        for (int j = i + 1; j < size; j++) {
            if (exclude.contains(j)) {
                continue;
            }
            T firstObject = group.get(i);
            T secondObject = group.get(j);
            try {
                if (matchStrategy.invoke((IMatchable) firstObject, (IMatchable) secondObject).isSuccessful()) {
                    commonService.merge((IMergable) firstObject, (IMergable) secondObject, mergeStrategy);
                    exclude.add(j);
                    result++;
                }
            } catch (MatchException e) {
                // pass the exception to the logger instead of printing the stack trace to stderr
                logger.warn("MatchException when trying to match " + firstObject.getTitleCache(), e);
            } catch (MergeException e) {
                logger.warn("MergeException when trying to merge " + firstObject.getTitleCache(), e);
            }
        }
    }
    return result;
}
Also used : IMatchable(eu.etaxonomy.cdm.strategy.match.IMatchable) MergeException(eu.etaxonomy.cdm.strategy.merge.MergeException) MatchException(eu.etaxonomy.cdm.strategy.match.MatchException) OrderHint(eu.etaxonomy.cdm.persistence.query.OrderHint) HashSet(java.util.HashSet)
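The pairwise comparison with an exclude set used in handleLastGroup can be isolated from cdmlib. The following is a minimal sketch under stated assumptions: PairwiseDedupSketch and dedupe are hypothetical names, a BiPredicate replaces IMatchStrategyEqual, and "merging" is reduced to dropping the later element of each matching pair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

public class PairwiseDedupSketch {

    /**
     * Compares every element with each later element. When a pair matches,
     * the later index is recorded in the exclude set, so it is neither
     * compared again nor returned. Returns the surviving elements in order.
     */
    public static <T> List<T> dedupe(List<T> group, BiPredicate<T, T> matches) {
        Set<Integer> exclude = new HashSet<>();
        int size = group.size();
        for (int i = 0; i < size - 1; i++) {
            if (exclude.contains(i)) {
                continue; // already absorbed by an earlier element
            }
            for (int j = i + 1; j < size; j++) {
                if (exclude.contains(j)) {
                    continue;
                }
                if (matches.test(group.get(i), group.get(j))) {
                    exclude.add(j);
                }
            }
        }
        List<T> survivors = new ArrayList<>();
        for (int k = 0; k < size; k++) {
            if (!exclude.contains(k)) {
                survivors.add(group.get(k));
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("Linnaeus, C.", "linnaeus, c.", "Darwin, C.", "LINNAEUS, C.");
        // prints [Linnaeus, C., Darwin, C.]
        System.out.println(dedupe(titles, String::equalsIgnoreCase));
    }
}
```

The loop is quadratic in the group size, which is acceptable as long as each group stays small, as it does when only candidates that already share a title are compared.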

Aggregations

IMatchable (eu.etaxonomy.cdm.strategy.match.IMatchable): 3
OrderHint (eu.etaxonomy.cdm.persistence.query.OrderHint): 2
MatchException (eu.etaxonomy.cdm.strategy.match.MatchException): 2
HashSet (java.util.HashSet): 2
ICdmRepository (eu.etaxonomy.cdm.api.application.ICdmRepository): 1
IService (eu.etaxonomy.cdm.api.service.IService): 1
CdmUtils (eu.etaxonomy.cdm.common.CdmUtils): 1
ImportResult (eu.etaxonomy.cdm.io.common.ImportResult): 1
ImportStateBase (eu.etaxonomy.cdm.io.common.ImportStateBase): 1
AgentBase (eu.etaxonomy.cdm.model.agent.AgentBase): 1
Institution (eu.etaxonomy.cdm.model.agent.Institution): 1
Person (eu.etaxonomy.cdm.model.agent.Person): 1
Team (eu.etaxonomy.cdm.model.agent.Team): 1
TeamOrPersonBase (eu.etaxonomy.cdm.model.agent.TeamOrPersonBase): 1
CdmBase (eu.etaxonomy.cdm.model.common.CdmBase): 1
ICdmBase (eu.etaxonomy.cdm.model.common.ICdmBase): 1
IdentifiableEntity (eu.etaxonomy.cdm.model.common.IdentifiableEntity): 1
Rights (eu.etaxonomy.cdm.model.media.Rights): 1
RightsType (eu.etaxonomy.cdm.model.media.RightsType): 1
HybridRelationship (eu.etaxonomy.cdm.model.name.HybridRelationship): 1