Example 1 with LanguageResult

Use of org.apache.tika.language.detect.LanguageResult in the Apache Tika project.

From the class Language, method languageDetection:

public static void languageDetection() throws IOException {
    LanguageDetector detector = new OptimaizeLangDetector().loadModels();
    LanguageResult result = detector.detect("Alla människor är födda fria och lika i värde och rättigheter.");
    System.out.println(result.getLanguage());
}
Also used: LanguageDetector (org.apache.tika.language.detect.LanguageDetector), LanguageResult (org.apache.tika.language.detect.LanguageResult), OptimaizeLangDetector (org.apache.tika.langdetect.OptimaizeLangDetector)

Example 2 with LanguageResult

Use of org.apache.tika.language.detect.LanguageResult in the Apache Tika project.

From the class Language, method languageDetectionWithHandler:

public static void languageDetectionWithHandler() throws Exception {
    LanguageHandler handler = new LanguageHandler();
    new AutoDetectParser().parse(System.in, handler, new Metadata(), new ParseContext());
    LanguageResult result = handler.getLanguage();
    System.out.println(result.getLanguage());
}
Also used: LanguageHandler (org.apache.tika.language.detect.LanguageHandler), LanguageResult (org.apache.tika.language.detect.LanguageResult), Metadata (org.apache.tika.metadata.Metadata), ParseContext (org.apache.tika.parser.ParseContext), AutoDetectParser (org.apache.tika.parser.AutoDetectParser)

Example 3 with LanguageResult

Use of org.apache.tika.language.detect.LanguageResult in the Apache Tika project.

From the class LanguageDetectorExample, method detectLanguage:

public String detectLanguage(String text) throws IOException {
    LanguageDetector detector = new OptimaizeLangDetector().loadModels();
    LanguageResult result = detector.detect(text);
    return result.getLanguage();
}
Also used: LanguageDetector (org.apache.tika.language.detect.LanguageDetector), LanguageResult (org.apache.tika.language.detect.LanguageResult), OptimaizeLangDetector (org.apache.tika.langdetect.OptimaizeLangDetector)

Example 4 with LanguageResult

Use of org.apache.tika.language.detect.LanguageResult in the Apache Tika project.

From the class Lingo24LangDetector, method detectAll:

@Override
public List<LanguageResult> detectAll() {
    List<LanguageResult> result = new ArrayList<>();
    String language = detect(writer.toString());
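    if (language != null) {
        // A language was detected; report it with a fixed MEDIUM confidence.
        result.add(new LanguageResult(language, LanguageConfidence.MEDIUM, 1));
    } else {
        // Nothing was detected: language is null here, so the result carries a null language.
        result.add(new LanguageResult(language, LanguageConfidence.NONE, 0));
    }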
    return result;
}
Also used: LanguageResult (org.apache.tika.language.detect.LanguageResult), ArrayList (java.util.ArrayList)
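
Because the else branch in Example 4 can put a result with a null language into the list, a caller may want to skip such entries. The following is a minimal, self-contained sketch of that check. The `Result` class is a hypothetical stand-in for LanguageResult that models only the language and score fields, and `firstLanguage` is an illustrative helper, not part of Tika.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class FirstDetectedLanguage {
    // Hypothetical stand-in for LanguageResult; not the Tika class.
    static final class Result {
        final String language; // may be null, as in the else branch of Example 4
        final float rawScore;

        Result(String language, float rawScore) {
            this.language = language;
            this.rawScore = rawScore;
        }
    }

    // Returns the first non-null, non-empty language in the list, if any.
    static Optional<String> firstLanguage(List<Result> results) {
        for (Result r : results) {
            if (r.language != null && !r.language.isEmpty()) {
                return Optional.of(r.language);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Result> detected = Arrays.asList(new Result("sv", 1f));
        List<Result> nothing = Arrays.asList(new Result(null, 0f));
        System.out.println(firstLanguage(detected).orElse("unknown")); // sv
        System.out.println(firstLanguage(nothing).orElse("unknown"));  // unknown
    }
}
```

Returning an Optional keeps the null check in one place instead of repeating it at every call site.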

Example 5 with LanguageResult

Use of org.apache.tika.language.detect.LanguageResult in the Apache Tika project.

From the class OptimaizeLangDetector, method detectAll:

@Override
public List<LanguageResult> detectAll() {
    // TODO throw exception if models haven't been loaded, or auto-load all?
    List<LanguageResult> result = new ArrayList<>();
    List<DetectedLanguage> rawResults = detector.getProbabilities(writer.toString());
    for (DetectedLanguage rawResult : rawResults) {
        // TODO figure out right level for confidence brackets.
        LanguageConfidence confidence = rawResult.getProbability() > 0.9
                ? LanguageConfidence.HIGH
                : LanguageConfidence.MEDIUM;
        result.add(new LanguageResult(
                makeLanguageName(rawResult.getLocale()),
                confidence,
                (float) rawResult.getProbability()));
    }
    if (result.isEmpty()) {
        result.add(LanguageResult.NULL);
    }
    return result;
}
Also used: LanguageResult (org.apache.tika.language.detect.LanguageResult), LanguageConfidence (org.apache.tika.language.detect.LanguageConfidence), ArrayList (java.util.ArrayList), DetectedLanguage (com.optimaize.langdetect.DetectedLanguage)
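
The confidence bracketing in Example 5 can be looked at in isolation. The sketch below is self-contained and uses a stand-in enum `Confidence` instead of Tika's LanguageConfidence. The class and method names are hypothetical, and the 0.9 cut-off is the one from the snippet, which its own TODO marks as provisional.

```java
public class ConfidenceBrackets {
    // Hypothetical stand-in for LanguageConfidence; only the two values used in Example 5.
    enum Confidence { HIGH, MEDIUM }

    // Cut-off copied from Example 5; the original TODO notes it may change.
    static final double HIGH_THRESHOLD = 0.9;

    // Maps a raw probability to a bracket using a strict greater-than comparison.
    static Confidence bracket(double probability) {
        return probability > HIGH_THRESHOLD ? Confidence.HIGH : Confidence.MEDIUM;
    }

    public static void main(String[] args) {
        System.out.println(bracket(0.95)); // HIGH
        System.out.println(bracket(0.90)); // MEDIUM, because the comparison is strict
        System.out.println(bracket(0.40)); // MEDIUM
    }
}
```

Note that with this rule every probability at or below the cut-off, however small, falls into the MEDIUM bracket.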

Aggregations

LanguageResult (org.apache.tika.language.detect.LanguageResult): 20
LanguageDetector (org.apache.tika.language.detect.LanguageDetector): 10
OptimaizeLangDetector (org.apache.tika.langdetect.OptimaizeLangDetector): 7
LanguageWriter (org.apache.tika.language.detect.LanguageWriter): 7
Test (org.junit.Test): 6
Consumes (javax.ws.rs.Consumes): 3
POST (javax.ws.rs.POST): 3
PUT (javax.ws.rs.PUT): 3
Path (javax.ws.rs.Path): 3
Produces (javax.ws.rs.Produces): 3
ArrayList (java.util.ArrayList): 2
LanguageHandler (org.apache.tika.language.detect.LanguageHandler): 2
AutoDetectParser (org.apache.tika.parser.AutoDetectParser): 2
ParseContext (org.apache.tika.parser.ParseContext): 2
ContentHandler (org.xml.sax.ContentHandler): 2
DetectedLanguage (com.optimaize.langdetect.DetectedLanguage): 1
File (java.io.File): 1
IOException (java.io.IOException): 1
InputStream (java.io.InputStream): 1
Detector (org.apache.tika.detect.Detector): 1