Example 1 with LanguageConfidence

Use of org.apache.tika.language.detect.LanguageConfidence in the project tika by apache.

Method detectAll of the class OptimaizeLangDetector.

@Override
public List<LanguageResult> detectAll() {
    // TODO throw exception if models haven't been loaded, or auto-load all?
    List<LanguageResult> result = new ArrayList<>();
    // Ask the Optimaize detector for candidate languages for the text buffered in writer.
    List<DetectedLanguage> rawResults = detector.getProbabilities(writer.toString());
    for (DetectedLanguage rawResult : rawResults) {
        // TODO figure out right level for confidence brackets.
        // Probabilities strictly above 0.9 are reported as HIGH, everything else as MEDIUM.
        LanguageConfidence confidence = rawResult.getProbability() > 0.9 ? LanguageConfidence.HIGH : LanguageConfidence.MEDIUM;
        result.add(new LanguageResult(makeLanguageName(rawResult.getLocale()), confidence, (float) rawResult.getProbability()));
    }
    // Return the NULL placeholder rather than an empty list when nothing was detected.
    if (result.isEmpty()) {
        result.add(LanguageResult.NULL);
    }
    return result;
}
Also used: LanguageResult (org.apache.tika.language.detect.LanguageResult), LanguageConfidence (org.apache.tika.language.detect.LanguageConfidence), ArrayList (java.util.ArrayList), DetectedLanguage (com.optimaize.langdetect.DetectedLanguage)
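
The confidence bracketing in detectAll can be tried in isolation. The following is a minimal sketch, not Tika code: the class ConfidenceBrackets, its nested Confidence enum (a local stand-in for LanguageConfidence, limited to the two values used above), and the constant HIGH_THRESHOLD are all hypothetical names introduced for illustration. Only the 0.9 cutoff and the strict comparison are taken from the method above.

```java
import java.util.ArrayList;
import java.util.List;

class ConfidenceBrackets {
    // Hypothetical stand-in for Tika's LanguageConfidence (only the values detectAll uses).
    enum Confidence { HIGH, MEDIUM }

    // Same cutoff as in detectAll; the name is an assumption made for this sketch.
    static final double HIGH_THRESHOLD = 0.9;

    // Maps a raw probability to a bracket using the same strict comparison as detectAll.
    static Confidence bracket(double probability) {
        return probability > HIGH_THRESHOLD ? Confidence.HIGH : Confidence.MEDIUM;
    }

    // Applies the mapping to a batch of probabilities, preserving their order.
    static List<Confidence> bracketAll(double[] probabilities) {
        List<Confidence> result = new ArrayList<>();
        for (double p : probabilities) {
            result.add(bracket(p));
        }
        return result;
    }

    public static void main(String[] args) {
        double[] probabilities = {0.97, 0.9, 0.42};
        for (double p : probabilities) {
            System.out.println(p + " -> " + bracket(p));
        }
    }
}
```

Because the comparison is strict, a probability of exactly 0.9 falls into the MEDIUM bracket, and no probability is ever mapped below MEDIUM by this logic.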

Aggregations

DetectedLanguage (com.optimaize.langdetect.DetectedLanguage): 1
ArrayList (java.util.ArrayList): 1
LanguageConfidence (org.apache.tika.language.detect.LanguageConfidence): 1
LanguageResult (org.apache.tika.language.detect.LanguageResult): 1