Use of org.opensearch.search.builder.SearchSourceBuilder in project anomaly-detection by opensearch-project.
Class ADDataMigrator, method migrateDetectorInternalStateToRealtimeTask.
/**
* Migrate detector internal state to realtime task.
*/
public void migrateDetectorInternalStateToRealtimeTask() {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder()
        .query(new MatchAllQueryBuilder())
        .size(MAX_DETECTOR_UPPER_LIMIT);
    SearchRequest searchRequest = new SearchRequest(ANOMALY_DETECTOR_JOB_INDEX).source(searchSourceBuilder);
    client.search(searchRequest, ActionListener.wrap(r -> {
        if (r == null || r.getHits().getTotalHits() == null || r.getHits().getTotalHits().value == 0) {
            logger.info("No anomaly detector job found, no need to migrate");
            return;
        }
        ConcurrentLinkedQueue<AnomalyDetectorJob> detectorJobs = new ConcurrentLinkedQueue<>();
        Iterator<SearchHit> iterator = r.getHits().iterator();
        while (iterator.hasNext()) {
            SearchHit searchHit = iterator.next();
            try (XContentParser parser = createXContentParserFromRegistry(xContentRegistry, searchHit.getSourceRef())) {
                ensureExpectedToken(XContentParser.Token.START_OBJECT, parser.nextToken(), parser);
                AnomalyDetectorJob job = AnomalyDetectorJob.parse(parser);
                detectorJobs.add(job);
            } catch (IOException e) {
                logger.error("Failed to parse AD job " + searchHit.getId(), e);
            }
        }
        logger.info("Total AD jobs to backfill realtime task: {}", detectorJobs.size());
        backfillRealtimeTask(detectorJobs, true);
    }, e -> {
        if (ExceptionUtil.getErrorMessage(e).contains("all shards failed")) {
            // This error can happen when the AD job index is not yet ready for querying because some
            // nodes have not joined the cluster. The realtime task will be recreated when the AD job starts.
            logger.warn("No available shards of AD job index, reset dataMigrated as false");
            this.dataMigrated.set(false);
        } else if (!(e instanceof IndexNotFoundException)) {
            logger.error("Failed to migrate AD data", e);
        }
    }));
}
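The method above shows the simplest SearchSourceBuilder pattern on this page: a match_all query capped by a size limit, executed asynchronously through the client. A minimal standalone sketch of that pattern, with a placeholder index name and limit (neither is from the project; the ActionListener import path also varies across OpenSearch versions):

import org.opensearch.action.ActionListener; // moved to org.opensearch.core.action in newer versions
import org.opensearch.action.search.SearchRequest;
import org.opensearch.client.Client;
import org.opensearch.index.query.QueryBuilders;
import org.opensearch.search.builder.SearchSourceBuilder;

public class MatchAllExample {
    // Hypothetical helper: fetch up to `limit` documents from `index` with a match_all query.
    static void searchAll(Client client, String index, int limit) {
        SearchSourceBuilder source = new SearchSourceBuilder()
            .query(QueryBuilders.matchAllQuery())
            .size(limit);
        SearchRequest request = new SearchRequest(index).source(source);
        client.search(request, ActionListener.wrap(
            response -> System.out.println("Total hits: " + response.getHits().getTotalHits()),
            exception -> exception.printStackTrace()
        ));
    }
}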
Use of org.opensearch.search.builder.SearchSourceBuilder in project anomaly-detection by opensearch-project.
Class SearchFeatureDao, method getFeaturesForPeriodByBatch.
public void getFeaturesForPeriodByBatch(AnomalyDetector detector, Entity entity, long startTime, long endTime,
        ActionListener<Map<Long, Optional<double[]>>> listener) throws IOException {
    SearchSourceBuilder searchSourceBuilder = batchFeatureQuery(detector, entity, startTime, endTime, xContent);
    logger.debug("Batch query for detector {}: {}", detector.getDetectorId(), searchSourceBuilder);
    SearchRequest searchRequest = new SearchRequest(detector.getIndices().toArray(new String[0])).source(searchSourceBuilder);
    client.search(searchRequest, ActionListener.wrap(response -> {
        listener.onResponse(parseBucketAggregationResponse(response, detector.getEnabledFeatureIds()));
    }, listener::onFailure));
}
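batchFeatureQuery itself is not shown on this page. As a rough illustration only, a batch feature query built with SearchSourceBuilder could pair a time-range filter with a date_histogram aggregation and one sub-aggregation per feature; the field names, interval, and aggregation shape below are assumptions, not the actual helper:

import org.opensearch.index.query.QueryBuilders;
import org.opensearch.search.aggregations.AggregationBuilders;
import org.opensearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.opensearch.search.builder.SearchSourceBuilder;

public class BatchFeatureQuerySketch {
    // Illustrative only: the real batchFeatureQuery may shape its buckets differently.
    static SearchSourceBuilder buildBatchQuery(long startTime, long endTime) {
        return new SearchSourceBuilder()
            .query(QueryBuilders.boolQuery()
                .filter(QueryBuilders.rangeQuery("@timestamp")      // assumed time field
                    .gte(startTime)
                    .lt(endTime)
                    .format("epoch_millis")))
            .aggregation(AggregationBuilders.dateHistogram("feature_buckets")
                .field("@timestamp")
                .fixedInterval(DateHistogramInterval.minutes(1))    // assumed detector interval
                .subAggregation(AggregationBuilders.max("feature_1").field("cpu"))) // assumed feature
            .size(0); // only aggregation results are needed, no hits
    }
}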
Use of org.opensearch.search.builder.SearchSourceBuilder in project anomaly-detection by opensearch-project.
Class SearchFeatureDao, method getHighestCountEntities.
/**
 * Get the entities with the highest doc counts, in descending order, within the specified time range.
 * @param detector detector config
 * @param startTime start time of the time range
 * @param endTime end time of the time range
 * @param maxEntitiesSize maximum number of top entities to return
 * @param minimumDocCount minimum doc count required for a top entity
 * @param pageSize page size used when querying a multi-category HC detector's top entities
 * @param listener listener used to return the entities
 */
public void getHighestCountEntities(AnomalyDetector detector, long startTime, long endTime, int maxEntitiesSize,
        int minimumDocCount, int pageSize, ActionListener<List<Entity>> listener) {
    if (!detector.isMultientityDetector()) {
        // Not a high-cardinality detector: there are no entities to list.
        listener.onResponse(null);
        return;
    }
    RangeQueryBuilder rangeQuery = new RangeQueryBuilder(detector.getTimeField())
        .from(startTime).to(endTime).format("epoch_millis")
        .includeLower(true).includeUpper(false);
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery().filter(rangeQuery).filter(detector.getFilterQuery());
    AggregationBuilder bucketAggs = null;
    if (detector.getCategoryField().size() == 1) {
        bucketAggs = AggregationBuilders.terms(AGG_NAME_TOP).size(maxEntitiesSize).field(detector.getCategoryField().get(0));
    } else {
        /*
         * We don't have an efficient solution for terms aggregation on multiple fields.
         * Terms aggregation does not support collecting terms from multiple fields in the
         * same document, so we have to work around the limitation with a script that
         * retrieves terms from multiple fields. The workaround disables the global
         * ordinals optimization and causes a marked slowdown, because scripting is
         * memory-intensive and has to iterate through all of the documents at least once
         * to create runtime fields.
         *
         * We evaluated composite and terms aggregation using a generated data set with one
         * million entities, each with two documents. Composite aggregation finishes in
         * around 40 seconds. Terms aggregation performs differently on different clusters:
         * on a 3 data node cluster with a 5 primary shard index, it does not finish within
         * 2 hours; on a 15 data node cluster with a 15 primary shard index, it needs 217
         * seconds; on a 30 data node cluster with a 30 primary shard index, it needs 47
         * seconds.
         *
         * Here we work around the problem using composite aggregation. Composite aggregation
         * cannot return top entities without collecting all aggregated results; paginated
         * results come back in the natural order of the composite keys. This is fine for the
         * Preview API, which needs the top entities to make sure there is enough data for
         * training and for showing results. We paginate entities, filter out entities that do
         * not have enough docs (e.g., 256 docs), and stop paginating as soon as we have
         * collected the desired number of entities (e.g., 5 entities).
         *
         * Example composite query:
         * {
         *   "size": 0,
         *   "query": {
         *     "bool": {
         *       "filter": [{
         *         "range": {
         *           "@timestamp": {
         *             "from": 1626118340000,
         *             "to": 1626294912000,
         *             "include_lower": true,
         *             "include_upper": false,
         *             "format": "epoch_millis",
         *             "boost": 1.0
         *           }
         *         }
         *       }, {
         *         "match_all": {
         *           "boost": 1.0
         *         }
         *       }],
         *       "adjust_pure_negative": true,
         *       "boost": 1.0
         *     }
         *   },
         *   "track_total_hits": -1,
         *   "aggregations": {
         *     "top_agg": {
         *       "composite": {
         *         "size": 1,
         *         "sources": [{
         *           "service": {
         *             "terms": {
         *               "field": "service",
         *               "missing_bucket": false,
         *               "order": "asc"
         *             }
         *           }
         *         }, {
         *           "host": {
         *             "terms": {
         *               "field": "host",
         *               "missing_bucket": false,
         *               "order": "asc"
         *             }
         *           }
         *         }]
         *       },
         *       "aggregations": {
         *         "bucketSort": {
         *           "bucket_sort": {
         *             "sort": [{
         *               "_count": {
         *                 "order": "desc"
         *               }
         *             }],
         *             "from": 0,
         *             "size": 5,
         *             "gap_policy": "SKIP"
         *           }
         *         }
         *       }
         *     }
         *   }
         * }
         */
        bucketAggs = AggregationBuilders
            .composite(AGG_NAME_TOP,
                detector.getCategoryField().stream().map(f -> new TermsValuesSourceBuilder(f).field(f)).collect(Collectors.toList()))
            .size(pageSize)
            .subAggregation(PipelineAggregatorBuilders
                .bucketSort("bucketSort", Arrays.asList(new FieldSortBuilder("_count").order(SortOrder.DESC)))
                .size(maxEntitiesSize));
    }
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder()
        .query(boolQueryBuilder).aggregation(bucketAggs).trackTotalHits(false).size(0);
    SearchRequest searchRequest = new SearchRequest().indices(detector.getIndices().toArray(new String[0])).source(searchSourceBuilder);
    // TODO: tune timeout for historical analysis based on performance test result
    client.search(searchRequest, new TopEntitiesListener(listener, detector, searchSourceBuilder,
        clock.millis() + previewTimeoutInMilliseconds, maxEntitiesSize, minimumDocCount));
}
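For context, a caller of this method might look like the following. The parameter values and the listener body are illustrative, and searchFeatureDao, detector, startTime, endTime, and logger are assumed to be in scope; note that the method answers null through the listener for single-entity detectors:

// Illustrative caller: fetch the top 5 entities that appear at least 256 times in the range.
searchFeatureDao.getHighestCountEntities(
    detector,
    startTime,
    endTime,
    5,      // maxEntitiesSize
    256,    // minimumDocCount
    1000,   // pageSize for the composite-aggregation pagination
    ActionListener.wrap(
        entities -> logger.info("Top entities found: {}", entities == null ? 0 : entities.size()),
        exception -> logger.error("Failed to get top entities", exception)
    )
);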
Use of org.opensearch.search.builder.SearchSourceBuilder in project anomaly-detection by opensearch-project.
Class ParseUtilsTests, method testAddUserRoleFilterWithNormalUserBackendRole.
public void testAddUserRoleFilterWithNormalUserBackendRole() {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    String backendRole1 = randomAlphaOfLength(5);
    String backendRole2 = randomAlphaOfLength(5);
    addUserBackendRolesFilter(
        new User(randomAlphaOfLength(5), ImmutableList.of(backendRole1, backendRole2),
            ImmutableList.of(randomAlphaOfLength(5)), ImmutableList.of(randomAlphaOfLength(5))),
        searchSourceBuilder);
    assertEquals(
        "{\"query\":{\"bool\":{\"must\":[{\"nested\":{\"query\":{\"terms\":{\"user.backend_roles.keyword\":"
            + "[\"" + backendRole1 + "\",\"" + backendRole2 + "\"],"
            + "\"boost\":1.0}},\"path\":\"user\",\"ignore_unmapped\":false,\"score_mode\":\"none\",\"boost\":1.0}}],"
            + "\"adjust_pure_negative\":true,\"boost\":1.0}}}",
        searchSourceBuilder.toString());
}
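The asserted JSON pins down what addUserBackendRolesFilter must produce: a bool must clause wrapping a nested terms query on user.backend_roles.keyword with score_mode none. Below is a sketch inferred from that output alone; the actual helper in ParseUtils may differ in detail, for example in how it merges with a pre-existing query on the builder:

import org.apache.lucene.search.join.ScoreMode;
import org.opensearch.commons.authuser.User;
import org.opensearch.index.query.BoolQueryBuilder;
import org.opensearch.index.query.QueryBuilders;
import org.opensearch.search.builder.SearchSourceBuilder;

public class UserFilterSketch {
    // Sketch inferred from the test's asserted JSON; not the verbatim ParseUtils implementation.
    static SearchSourceBuilder addUserBackendRolesFilter(User user, SearchSourceBuilder searchSourceBuilder) {
        if (user == null) {
            return searchSourceBuilder; // matches the null-user test that follows: the builder stays "{}"
        }
        BoolQueryBuilder bool = QueryBuilders.boolQuery()
            .must(QueryBuilders.nestedQuery(
                "user",
                QueryBuilders.termsQuery("user.backend_roles.keyword", user.getBackendRoles()),
                ScoreMode.None)); // serializes as "score_mode":"none"
        return searchSourceBuilder.query(bool);
    }
}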
Use of org.opensearch.search.builder.SearchSourceBuilder in project anomaly-detection by opensearch-project.
Class ParseUtilsTests, method testAddUserRoleFilterWithNullUser.
public void testAddUserRoleFilterWithNullUser() {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    addUserBackendRolesFilter(null, searchSourceBuilder);
    assertEquals("{}", searchSourceBuilder.toString());
}