Use of org.apache.tika.language.detect.LanguageConfidence in project tika by apache.
The detectAll method of the class OptimaizeLangDetector.
@Override
public List<LanguageResult> detectAll() {
    // TODO throw exception if models haven't been loaded, or auto-load all?
    List<LanguageResult> result = new ArrayList<>();
    // Ask the underlying detector for per-language probabilities of the buffered text.
    List<DetectedLanguage> rawResults = detector.getProbabilities(writer.toString());
    for (DetectedLanguage rawResult : rawResults) {
        // TODO figure out right level for confidence brackets.
        // Probabilities strictly above 0.9 map to HIGH; everything else maps to MEDIUM.
        LanguageConfidence confidence = rawResult.getProbability() > 0.9
                ? LanguageConfidence.HIGH
                : LanguageConfidence.MEDIUM;
        result.add(new LanguageResult(makeLanguageName(rawResult.getLocale()),
                confidence, (float) rawResult.getProbability()));
    }
    // Never return an empty list: fall back to the NULL result when nothing was detected.
    if (result.isEmpty()) {
        result.add(LanguageResult.NULL);
    }
    return result;
}
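The bracketing step in the method above can be tried in isolation. The following is only a minimal sketch: `ConfidenceBrackets`, its nested `Confidence` enum, and `HIGH_THRESHOLD` are hypothetical stand-ins invented for illustration, not Tika classes. The only value taken from the snippet is the 0.9 cut-off with a strict `>` comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the bracketing logic; not part of Tika.
final class ConfidenceBrackets {

    // Assumption: a two-level enum mirroring only the values used in the snippet.
    enum Confidence { HIGH, MEDIUM }

    // Cut-off taken from the snippet; the comparison is strict.
    static final double HIGH_THRESHOLD = 0.9;

    // Maps one raw probability to a confidence bracket.
    static Confidence bracket(double probability) {
        return probability > HIGH_THRESHOLD ? Confidence.HIGH : Confidence.MEDIUM;
    }

    // Maps a batch of probabilities, preserving their order.
    static List<Confidence> bracketAll(double[] probabilities) {
        List<Confidence> out = new ArrayList<>();
        for (double p : probabilities) {
            out.add(bracket(p));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bracket(0.95)); // prints "HIGH"
        System.out.println(bracket(0.9));  // prints "MEDIUM" (0.9 is not strictly greater)
    }
}
```

Because the comparison is strict, a probability of exactly 0.9 lands in the MEDIUM bracket, which is worth keeping in mind when tuning the threshold mentioned in the TODO.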