use of org.opensearch.ad.ml.EntityModel in project anomaly-detection by opensearch-project.
the class CheckpointWriteWorkerTests method testEmptyDetectorId.
@SuppressWarnings("unchecked")
public void testEmptyDetectorId() {
    ModelState<EntityModel> state = mock(ModelState.class);
    when(state.getLastCheckpointTime()).thenReturn(Instant.now());
    EntityModel model = mock(EntityModel.class);
    when(state.getModel()).thenReturn(model);
    when(state.getDetectorId()).thenReturn(null);
    when(state.getModelId()).thenReturn("a");

    worker.write(state, true, RequestPriority.MEDIUM);

    verify(checkpoint, never()).batchWrite(any(), any());
}
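The test above exercises a null detector id. A minimal companion sketch, assuming the same Mockito fixtures (worker, checkpoint) and that the worker also skips blank detector ids, could cover the empty-string case; the method name and that expectation are assumptions, not from the project.
@SuppressWarnings("unchecked")
public void testBlankDetectorId() {
    // assumed companion test: a blank (empty string) detector id should also be skipped
    ModelState<EntityModel> state = mock(ModelState.class);
    when(state.getLastCheckpointTime()).thenReturn(Instant.now());
    when(state.getModel()).thenReturn(mock(EntityModel.class));
    when(state.getDetectorId()).thenReturn("");
    when(state.getModelId()).thenReturn("a");

    worker.write(state, true, RequestPriority.MEDIUM);

    // nothing should reach the checkpoint index for an invalid detector id
    verify(checkpoint, never()).batchWrite(any(), any());
}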
use of org.opensearch.ad.ml.EntityModel in project anomaly-detection by opensearch-project.
the class PriorityCache method getAllModelProfile.
@Override
public List<ModelProfile> getAllModelProfile(String detectorId) {
    CacheBuffer cacheBuffer = activeEnities.get(detectorId);
    List<ModelProfile> res = new ArrayList<>();
    if (cacheBuffer != null) {
        long size = cacheBuffer.getMemoryConsumptionPerEntity();
        cacheBuffer.getAllModels().forEach(entry -> {
            EntityModel model = entry.getModel();
            Entity entity = null;
            if (model != null && model.getEntity().isPresent()) {
                entity = model.getEntity().get();
            }
            res.add(new ModelProfile(entry.getModelId(), entity, size));
        });
    }
    return res;
}
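A caller of getAllModelProfile might look like the sketch below; the detector id and the ModelProfile accessor names (getModelId, getModelSizeInBytes) are assumptions used for illustration and may differ from the project's API.
// illustrative only: log the per-entity model footprint of one detector
List<ModelProfile> profiles = cache.getAllModelProfile("example-detector-id");
for (ModelProfile profile : profiles) {
    // each profile carries the model id, the (optional) entity, and the estimated size in bytes
    LOG.info("model {} occupies about {} bytes", profile.getModelId(), profile.getModelSizeInBytes());
}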
use of org.opensearch.ad.ml.EntityModel in project anomaly-detection by opensearch-project.
the class CacheBuffer method remove.
/**
 * Remove everything associated with the key and make a checkpoint.
 *
 * @param keyToRemove The key to remove
 * @return the ModelState associated with the key, or null if there is no
 *         ModelState for the key
 */
public ModelState<EntityModel> remove(String keyToRemove) {
    priorityTracker.removePriority(keyToRemove);
    // if the shared cache is empty, we are using reserved memory
    boolean reserved = sharedCacheEmpty();
    ModelState<EntityModel> valueRemoved = items.remove(keyToRemove);
    if (valueRemoved != null) {
        if (!reserved) {
            // release shared memory
            memoryTracker.releaseMemory(memoryConsumptionPerEntity, false, Origin.HC_DETECTOR);
        }
        EntityModel modelRemoved = valueRemoved.getModel();
        if (modelRemoved != null) {
            // A "null model" holds only samples and no trained trcf. For such a model we save a
            // checkpoint regardless of the last checkpoint time: if we don't save, we throw away
            // the new samples and might never be able to initialize the model.
            boolean isNullModel = !modelRemoved.getTrcf().isPresent();
            checkpointWriteQueue.write(valueRemoved, isNullModel, RequestPriority.MEDIUM);
            modelRemoved.clear();
        }
    }
    return valueRemoved;
}
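The checkpoint rule in remove can be illustrated with a small Mockito-style sketch; the scaffolding (direct access to items, the mock setup) is assumed for illustration and is not part of the project.
// sketch: removing an entry whose model has no trained trcf ("null model")
// should still queue a checkpoint so its buffered samples are not lost
EntityModel sampleOnlyModel = mock(EntityModel.class);
when(sampleOnlyModel.getTrcf()).thenReturn(Optional.empty());

ModelState<EntityModel> state = mock(ModelState.class);
when(state.getModel()).thenReturn(sampleOnlyModel);
items.put("entity-model-1", state);

cacheBuffer.remove("entity-model-1");

// isNullModel is true, so the write is forced regardless of the last checkpoint time
verify(checkpointWriteQueue).write(state, true, RequestPriority.MEDIUM);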
use of org.opensearch.ad.ml.EntityModel in project anomaly-detection by opensearch-project.
the class CacheBuffer method maintenance.
/**
* Remove expired state and save checkpoints of existing states
* @return removed states
*/
public List<ModelState<EntityModel>> maintenance() {
    List<ModelState<EntityModel>> modelsToSave = new ArrayList<>();
    List<ModelState<EntityModel>> removedStates = new ArrayList<>();
    items.entrySet().stream().forEach(entry -> {
        String entityModelId = entry.getKey();
        try {
            ModelState<EntityModel> modelState = entry.getValue();
            Instant now = clock.instant();
            if (modelState.getLastUsedTime().plus(modelTtl).isBefore(now)) {
                // race conditions can happen between the put and one of the following operations:
                // remove: not a problem as all of the data structures are concurrent.
                //   Two threads removing the same entry is not a problem.
                // clear: not a problem as we are releasing memory in MemoryTracker.
                //   The removed one loses references and soon GC will collect it.
                //   We have memory tracking correction to fix incorrect memory usage records.
                // put: not a problem as we are unlikely to maintain an entry that's not
                //   already in the cache.
                // The remove method saves a checkpoint as well.
                removedStates.add(remove(entityModelId));
            } else if (random.nextInt(6) == 0) {
                // A checkpoint is relatively big compared to other queued requests, so we
                // save checkpoints with 1/6 probability: statistically, each model is saved
                // about once every 6 hours.
                //
                // Background:
                // We save a checkpoint when
                //
                // (a) removing the model from cache,
                // (b) cold starting,
                // (c) there is no complete model, only a few samples. If we don't save new
                //     samples, we will never accumulate enough samples for a trained model,
                // (d) periodically, in case of exceptions.
                //
                // This branch is doing (d). Previously we did it every hour for all in-cache
                // models. Considering we are moving to 1M entities, that would put the cluster
                // under heavy load every hour. That's why we do it randomly
                // (each model is checkpointed about every 6 hours in expectation).
                //
                // We use randomness because maintaining state about which model has been saved
                // and which hasn't is not cheap. Also, the models in the cache can change
                // dynamically, so we would have to maintain that state in the removal logic too.
                // Random selection is a lazy way to deal with this: it is stateless and
                // statistically sound.
                //
                // The downside is that if a checkpoint happens to miss the expected 6-hour
                // window, the model on disk is stale (i.e., we don't recover from the freshest
                // model after a disaster).
                //
                // All in all, randomness is mostly a trade-off for performance and easy maintenance.
                modelsToSave.add(modelState);
            }
        } catch (Exception e) {
            LOG.warn("Failed to finish maintenance for model id " + entityModelId, e);
        }
    });
    checkpointWriteQueue.writeAll(modelsToSave, detectorId, false, RequestPriority.MEDIUM);
    return removedStates;
}
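The 1/6 probability in maintenance translates into an expected checkpoint interval of about 6 maintenance runs. Below is a self-contained sketch of that arithmetic, assuming maintenance runs once per hour as the comments describe; the class is illustrative only.
import java.util.Random;

public class CheckpointIntervalSketch {
    public static void main(String[] args) {
        // same predicate as maintenance(): save with probability 1/6 per hourly run
        Random random = new Random();
        int hours = 600_000;
        int saves = 0;
        for (int hour = 0; hour < hours; hour++) {
            if (random.nextInt(6) == 0) {
                saves++;
            }
        }
        // prints roughly 6.0: each model is checkpointed about once every 6 hours
        System.out.println("average hours between saves: " + ((double) hours / saves));
    }
}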