Example 1 with EsAccessException

Use of org.codelibs.fess.crawler.exception.EsAccessException in project fess-crawler by codelibs.

In the class AbstractCrawlerService, method doInsertAll:

protected <T> BulkResponse doInsertAll(final List<T> list, final OpType opType) {
    try {
        return getClient().get(c -> {
            final BulkRequestBuilder bulkRequest = c.prepareBulk();
            for (final T target : list) {
                final String id = getId(getSessionId(target), getUrl(target));
                final XContentBuilder source = getXContentBuilder(target);
                bulkRequest.add(c.prepareIndex(index, type, id).setSource(source).setOpType(opType));
                setId(target, id);
            }
            return bulkRequest.setRefreshPolicy(RefreshPolicy.IMMEDIATE).execute();
        });
    } catch (final Exception e) {
        throw new EsAccessException("Failed to insert " + list, e);
    }
}
Also used: EsAccessException(org.codelibs.fess.crawler.exception.EsAccessException), BulkRequestBuilder(org.elasticsearch.action.bulk.BulkRequestBuilder), XContentBuilder(org.elasticsearch.common.xcontent.XContentBuilder), IndexNotFoundException(org.elasticsearch.index.IndexNotFoundException), ParseException(java.text.ParseException), IOException(java.io.IOException)
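
The pattern in this example (catch any failure from the client call and rethrow it as an unchecked exception that keeps the original cause) can be sketched without an Elasticsearch dependency. This is a minimal sketch only: `StoreAccessException`, `insertAll` and `WrapSketch` are hypothetical stand-ins introduced for illustration, not the fess-crawler classes.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;

public class WrapSketch {
    // Hypothetical stand-in for EsAccessException: unchecked, keeps the cause.
    static class StoreAccessException extends RuntimeException {
        StoreAccessException(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    // Runs the action and converts any failure into StoreAccessException,
    // so callers do not have to declare or handle checked exceptions.
    static <T, R> R insertAll(final List<T> list, final Callable<R> action) {
        try {
            return action.call();
        } catch (final Exception e) {
            throw new StoreAccessException("Failed to insert " + list, e);
        }
    }

    public static void main(final String[] args) {
        final List<String> docs = List.of("a", "b");
        // Simulated backend failure (an assumption for the demo).
        final Callable<String> failing = () -> {
            throw new IOException("connection refused");
        };
        try {
            insertAll(docs, failing);
        } catch (final StoreAccessException e) {
            System.out.println(e.getMessage() + " / cause: " + e.getCause().getMessage());
        }
    }
}
```

Keeping the original exception as the cause means the stack trace of the underlying failure is still available to whoever logs the wrapper.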

Example 2 with EsAccessException

Use of org.codelibs.fess.crawler.exception.EsAccessException in project fess-crawler by codelibs.

In the class AbstractCrawlerService, method insert:

protected IndexResponse insert(final Object target, final OpType opType) {
    final String id = getId(getSessionId(target), getUrl(target));
    final XContentBuilder source = getXContentBuilder(target);
    try {
        final IndexResponse response = getClient().get(c -> c.prepareIndex(index, type, id).setSource(source).setOpType(opType).setRefreshPolicy(RefreshPolicy.IMMEDIATE).execute());
        setId(target, id);
        return response;
    } catch (final Exception e) {
        throw new EsAccessException("Failed to insert " + id, e);
    }
}
Also used: CreateIndexResponse(org.elasticsearch.action.admin.indices.create.CreateIndexResponse), IndexResponse(org.elasticsearch.action.index.IndexResponse), EsAccessException(org.codelibs.fess.crawler.exception.EsAccessException), XContentBuilder(org.elasticsearch.common.xcontent.XContentBuilder), IndexNotFoundException(org.elasticsearch.index.IndexNotFoundException), ParseException(java.text.ParseException), IOException(java.io.IOException)

Example 3 with EsAccessException

Use of org.codelibs.fess.crawler.exception.EsAccessException in project fess-crawler by codelibs.

In the class EsUrlQueueService, method updateSessionId:

@Override
public void updateSessionId(final String oldSessionId, final String newSessionId) {
    SearchResponse response = null;
    // The first pass opens a scroll search; later passes fetch the next batch by scroll ID.
    while (true) {
        if (response == null) {
            response = getClient().get(c -> c.prepareSearch(index).setTypes(type).setScroll(new TimeValue(scrollTimeout)).setQuery(QueryBuilders.boolQuery().filter(QueryBuilders.termQuery(SESSION_ID, oldSessionId))).setSize(scrollSize).execute());
        } else {
            final String scrollId = response.getScrollId();
            response = getClient().get(c -> c.prepareSearchScroll(scrollId).setScroll(new TimeValue(scrollTimeout)).execute());
        }
        final SearchHits searchHits = response.getHits();
        if (searchHits.getHits().length == 0) {
            break;
        }
        final BulkResponse bulkResponse = getClient().get(c -> {
            final BulkRequestBuilder builder = c.prepareBulk();
            for (final SearchHit searchHit : searchHits) {
                final UpdateRequestBuilder updateRequest = c.prepareUpdate(index, type, searchHit.getId()).setDoc(SESSION_ID, newSessionId);
                builder.add(updateRequest);
            }
            return builder.execute();
        });
        if (bulkResponse.hasFailures()) {
            throw new EsAccessException(bulkResponse.buildFailureMessage());
        }
    }
}
Also used: SortBuilders(org.elasticsearch.search.sort.SortBuilders), SearchHits(org.elasticsearch.search.SearchHits), LoggerFactory(org.slf4j.LoggerFactory), UpdateRequestBuilder(org.elasticsearch.action.update.UpdateRequestBuilder), QueryBuilders(org.elasticsearch.index.query.QueryBuilders), ArrayList(java.util.ArrayList), PreDestroy(javax.annotation.PreDestroy), OpType(org.elasticsearch.action.DocWriteRequest.OpType), EsUrlQueue(org.codelibs.fess.crawler.entity.EsUrlQueue), Map(java.util.Map), TimeValue(org.elasticsearch.common.unit.TimeValue), SearchResponse(org.elasticsearch.action.search.SearchResponse), RefreshPolicy(org.elasticsearch.action.support.WriteRequest.RefreshPolicy), SearchHit(org.elasticsearch.search.SearchHit), Logger(org.slf4j.Logger), EsAccessException(org.codelibs.fess.crawler.exception.EsAccessException), ConcurrentHashMap(java.util.concurrent.ConcurrentHashMap), Resource(javax.annotation.Resource), StringUtil(org.codelibs.core.lang.StringUtil), BulkResponse(org.elasticsearch.action.bulk.BulkResponse), Collectors(java.util.stream.Collectors), Constants(org.codelibs.fess.crawler.Constants), UrlQueueService(org.codelibs.fess.crawler.service.UrlQueueService), List(java.util.List), PostConstruct(javax.annotation.PostConstruct), SortOrder(org.elasticsearch.search.sort.SortOrder), AccessResult(org.codelibs.fess.crawler.entity.AccessResult), Queue(java.util.Queue), UrlQueue(org.codelibs.fess.crawler.entity.UrlQueue), ConcurrentLinkedQueue(java.util.concurrent.ConcurrentLinkedQueue), BulkRequestBuilder(org.elasticsearch.action.bulk.BulkRequestBuilder)
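
The control flow of updateSessionId (fetch a batch, stop on an empty batch, apply one bulk operation per batch, fail on any reported error) can be sketched in plain Java. This is a simplified illustration under stated assumptions: `ScrollSketch`, `processAll`, `fetchPage` and `bulkOp` are hypothetical names, a page number stands in for the scroll ID, and `IllegalStateException` stands in for EsAccessException.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.IntFunction;

public class ScrollSketch {
    // Fetches batches until an empty one is returned and applies bulkOp to each.
    // bulkOp returns a failure message, or null when every item succeeded.
    static <T> int processAll(final IntFunction<List<T>> fetchPage, final Function<List<T>, String> bulkOp) {
        int processed = 0;
        for (int page = 0;; page++) {
            final List<T> hits = fetchPage.apply(page);
            if (hits.isEmpty()) {
                break; // same stop condition as searchHits.getHits().length == 0
            }
            final String failure = bulkOp.apply(hits);
            if (failure != null) {
                // Stand-in for: throw new EsAccessException(bulkResponse.buildFailureMessage())
                throw new IllegalStateException(failure);
            }
            processed += hits.size();
        }
        return processed;
    }

    public static void main(final String[] args) {
        // In-memory data instead of an index (an assumption for the demo).
        final List<String> ids = List.of("d1", "d2", "d3", "d4", "d5");
        final int pageSize = 2;
        final IntFunction<List<String>> fetchPage = page -> {
            final int from = Math.min(page * pageSize, ids.size());
            final int to = Math.min(from + pageSize, ids.size());
            return ids.subList(from, to);
        };
        final Function<List<String>, String> bulkOp = batch -> null; // always succeeds
        System.out.println("Processed " + processAll(fetchPage, bulkOp) + " documents");
    }
}
```

Because a failure aborts the loop after earlier batches were already applied, callers of this kind of loop should expect partial updates when the exception is thrown.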

Example 4 with EsAccessException

Use of org.codelibs.fess.crawler.exception.EsAccessException in project fess-crawler by codelibs.

In the class EsUrlQueueService, method poll:

@Override
public EsUrlQueue poll(final String sessionId) {
    final QueueHolder queueHolder = getQueueHolder(sessionId);
    final Queue<EsUrlQueue> waitingQueue = queueHolder.waitingQueue;
    final Queue<EsUrlQueue> crawlingQueue = queueHolder.crawlingQueue;
    EsUrlQueue urlQueue = waitingQueue.poll();
    if (urlQueue != null) {
        if (crawlingQueue.size() > maxCrawlingQueueSize) {
            crawlingQueue.poll();
        }
        crawlingQueue.add(urlQueue);
        return urlQueue;
    }
    synchronized (queueHolder) {
        urlQueue = waitingQueue.poll();
        if (urlQueue == null) {
            final List<EsUrlQueue> urlQueueList = getList(EsUrlQueue.class, sessionId, null, 0, pollingFetchSize, SortBuilders.fieldSort(CREATE_TIME).order(SortOrder.ASC));
            if (urlQueueList.isEmpty()) {
                return null;
            }
            if (logger.isDebugEnabled()) {
                logger.debug("Queued URL: {}", urlQueueList);
            }
            waitingQueue.addAll(urlQueueList);
            if (!urlQueueList.isEmpty()) {
                try {
                    // Delete the fetched entries from Elasticsearch so they are not polled again.
                    final BulkResponse response = getClient().get(c -> {
                        final BulkRequestBuilder bulkBuilder = c.prepareBulk();
                        for (final EsUrlQueue uq : urlQueueList) {
                            bulkBuilder.add(c.prepareDelete(index, type, uq.getId()));
                        }
                        return bulkBuilder.setRefreshPolicy(RefreshPolicy.IMMEDIATE).execute();
                    });
                    if (response.hasFailures()) {
                        logger.warn(response.buildFailureMessage());
                    }
                } catch (final Exception e) {
                    throw new EsAccessException("Failed to delete " + urlQueueList, e);
                }
            }
            urlQueue = waitingQueue.poll();
            if (urlQueue == null) {
                return null;
            }
        }
    }
    if (crawlingQueue.size() > maxCrawlingQueueSize) {
        crawlingQueue.poll();
    }
    crawlingQueue.add(urlQueue);
    return urlQueue;
}
Also used: EsAccessException(org.codelibs.fess.crawler.exception.EsAccessException), BulkResponse(org.elasticsearch.action.bulk.BulkResponse), EsUrlQueue(org.codelibs.fess.crawler.entity.EsUrlQueue), BulkRequestBuilder(org.elasticsearch.action.bulk.BulkRequestBuilder)
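
The queueing logic of poll can be read as a two-level queue: a lock-free fast path over an in-memory waiting queue, a synchronized refill from the backing store when it runs dry, and a bounded queue of entries currently being crawled. The following is a rough sketch of that idea only. `PollSketch` and `backingStore` are hypothetical, and an in-memory deque replaces the Elasticsearch index and its bulk delete.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class PollSketch {
    private final Queue<String> waitingQueue = new ConcurrentLinkedQueue<>();
    private final Queue<String> crawlingQueue = new ConcurrentLinkedQueue<>();
    // Hypothetical stand-in for the index; only touched while holding the lock.
    private final Deque<String> backingStore;
    private final int fetchSize;
    private final int maxCrawlingQueueSize;

    PollSketch(final List<String> stored, final int fetchSize, final int maxCrawlingQueueSize) {
        this.backingStore = new ArrayDeque<>(stored);
        this.fetchSize = fetchSize;
        this.maxCrawlingQueueSize = maxCrawlingQueueSize;
    }

    String poll() {
        String url = waitingQueue.poll(); // fast path, no lock
        if (url == null) {
            synchronized (this) {
                url = waitingQueue.poll(); // re-check: another thread may have refilled
                if (url == null) {
                    // Move up to fetchSize entries out of the store (fetch + delete in one step).
                    for (int i = 0; i < fetchSize && !backingStore.isEmpty(); i++) {
                        waitingQueue.add(backingStore.poll());
                    }
                    url = waitingQueue.poll();
                    if (url == null) {
                        return null; // nothing left anywhere
                    }
                }
            }
        }
        if (crawlingQueue.size() > maxCrawlingQueueSize) {
            crawlingQueue.poll(); // drop the oldest entry to keep the queue bounded
        }
        crawlingQueue.add(url);
        return url;
    }

    public static void main(final String[] args) {
        final PollSketch queue = new PollSketch(List.of("a", "b", "c"), 2, 10);
        for (String url = queue.poll(); url != null; url = queue.poll()) {
            System.out.println(url);
        }
    }
}
```

The second poll inside the synchronized block avoids a redundant refill when two threads find the waiting queue empty at the same time.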

Example 5 with EsAccessException

Use of org.codelibs.fess.crawler.exception.EsAccessException in project fess-crawler by codelibs.

In the class AbstractCrawlerService, method delete:

public void delete(final Consumer<SearchRequestBuilder> callback) {
    SearchResponse response = null;
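    // The first pass opens a scroll search shaped by the callback; later passes fetch the next batch by scroll ID.
    while (true) {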
        if (response == null) {
            response = getClient().get(c -> {
                final SearchRequestBuilder builder = c.prepareSearch(index).setTypes(type).setScroll(new TimeValue(scrollTimeout)).setSize(scrollSize);
                callback.accept(builder);
                return builder.execute();
            });
        } else {
            final String scrollId = response.getScrollId();
            response = getClient().get(c -> c.prepareSearchScroll(scrollId).setScroll(new TimeValue(scrollTimeout)).execute());
        }
        final SearchHits searchHits = response.getHits();
        if (searchHits.getHits().length == 0) {
            break;
        }
        final BulkResponse bulkResponse = getClient().get(c -> {
            final BulkRequestBuilder bulkBuilder = c.prepareBulk();
            for (final SearchHit searchHit : searchHits) {
                bulkBuilder.add(c.prepareDelete(index, type, searchHit.getId()));
            }
            return bulkBuilder.execute();
        });
        if (bulkResponse.hasFailures()) {
            throw new EsAccessException(bulkResponse.buildFailureMessage());
        }
    }
    refresh();
}
Also used: EsClient(org.codelibs.fess.crawler.client.EsClient), GetResponse(org.elasticsearch.action.get.GetResponse), SearchHits(org.elasticsearch.search.SearchHits), Date(java.util.Date), MappingMetaData(org.elasticsearch.cluster.metadata.MappingMetaData), LoggerFactory(org.slf4j.LoggerFactory), XContentBuilder(org.elasticsearch.common.xcontent.XContentBuilder), QueryBuilders(org.elasticsearch.index.query.QueryBuilders), Converter(org.codelibs.core.beans.Converter), IndexNotFoundException(org.elasticsearch.index.IndexNotFoundException), Map(java.util.Map), SearchResponse(org.elasticsearch.action.search.SearchResponse), XContentFactory.jsonBuilder(org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder), DeleteResponse(org.elasticsearch.action.delete.DeleteResponse), ParseException(java.text.ParseException), RefreshPolicy(org.elasticsearch.action.support.WriteRequest.RefreshPolicy), SearchHit(org.elasticsearch.search.SearchHit), EsAccessException(org.codelibs.fess.crawler.exception.EsAccessException), TimeZone(java.util.TimeZone), BulkItemResponse(org.elasticsearch.action.bulk.BulkItemResponse), Timestamp(java.sql.Timestamp), Resource(javax.annotation.Resource), BulkResponse(org.elasticsearch.action.bulk.BulkResponse), CreateIndexResponse(org.elasticsearch.action.admin.indices.create.CreateIndexResponse), Base64(java.util.Base64), List(java.util.List), EsAccessResultData(org.codelibs.fess.crawler.entity.EsAccessResultData), FileUtil(org.codelibs.core.io.FileUtil), MessageDigestUtil(org.codelibs.core.security.MessageDigestUtil), BoolQueryBuilder(org.elasticsearch.index.query.BoolQueryBuilder), HashFunction(com.google.common.hash.HashFunction), BulkRequestBuilder(org.elasticsearch.action.bulk.BulkRequestBuilder), PropertyDesc(org.codelibs.core.beans.PropertyDesc), BeanUtil(org.codelibs.core.beans.util.BeanUtil), ImmutableOpenMap(org.elasticsearch.common.collect.ImmutableOpenMap), XContentType(org.elasticsearch.common.xcontent.XContentType), GetMappingsResponse(org.elasticsearch.action.admin.indices.mapping.get.GetMappingsResponse), SimpleDateFormat(java.text.SimpleDateFormat), PutMappingResponse(org.elasticsearch.action.admin.indices.mapping.put.PutMappingResponse), Hashing(com.google.common.hash.Hashing), ArrayList(java.util.ArrayList), OpType(org.elasticsearch.action.DocWriteRequest.OpType), Charset(java.nio.charset.Charset), EsAccessResult(org.codelibs.fess.crawler.entity.EsAccessResult), TimeValue(org.elasticsearch.common.unit.TimeValue), IndexResponse(org.elasticsearch.action.index.IndexResponse), Result(org.elasticsearch.action.DocWriteResponse.Result), SortBuilder(org.elasticsearch.search.sort.SortBuilder), QueryBuilder(org.elasticsearch.index.query.QueryBuilder), Logger(org.slf4j.Logger), StringUtil(org.codelibs.core.lang.StringUtil), IOException(java.io.IOException), BeanDescFactory(org.codelibs.core.beans.factory.BeanDescFactory), Consumer(java.util.function.Consumer), IndicesExistsResponse(org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsResponse), BeanDesc(org.codelibs.core.beans.BeanDesc), SearchRequestBuilder(org.elasticsearch.action.search.SearchRequestBuilder), RefreshResponse(org.elasticsearch.action.admin.indices.refresh.RefreshResponse), EsResultList(org.codelibs.fess.crawler.util.EsResultList)

Aggregations

EsAccessException (org.codelibs.fess.crawler.exception.EsAccessException): 9
BulkRequestBuilder (org.elasticsearch.action.bulk.BulkRequestBuilder): 5
BulkResponse (org.elasticsearch.action.bulk.BulkResponse): 5
SearchResponse (org.elasticsearch.action.search.SearchResponse): 5
IndexNotFoundException (org.elasticsearch.index.IndexNotFoundException): 5
SearchHit (org.elasticsearch.search.SearchHit): 5
IOException (java.io.IOException): 4
ParseException (java.text.ParseException): 4
ArrayList (java.util.ArrayList): 4
Map (java.util.Map): 4
SearchRequestBuilder (org.elasticsearch.action.search.SearchRequestBuilder): 4
SearchHits (org.elasticsearch.search.SearchHits): 4
List (java.util.List): 3
StringUtil (org.codelibs.core.lang.StringUtil): 3
EsAccessResult (org.codelibs.fess.crawler.entity.EsAccessResult): 3
EsResultList (org.codelibs.fess.crawler.util.EsResultList): 3
IndexResponse (org.elasticsearch.action.index.IndexResponse): 3
TimeValue (org.elasticsearch.common.unit.TimeValue): 3
Logger (org.slf4j.Logger): 3
LoggerFactory (org.slf4j.LoggerFactory): 3