Example 6 with LanguageWriter

Use of org.apache.tika.language.detect.LanguageWriter in project tika by apache.

From the class TextLangDetectorTest, method test.

@Test
public void test() throws Exception {
    // Skip the test when the detector's external dependency is unavailable.
    assumeTrue(TextLangDetector.canRun());
    LanguageDetector detector = new TextLangDetector();
    LanguageWriter writer = new LanguageWriter(detector);
    // Each usable line holds an expected language code, a tab, and a sample text.
    List<String> lines = IOUtils.readLines(TextLangDetectorTest.class.getResourceAsStream("text-test.tsv"));
    for (String line : lines) {
        String[] data = line.split("\t");
        // Ignore blank or malformed lines.
        if (data.length != 2) {
            continue;
        }
        // Clear text left over from the previous sample before feeding the next one.
        writer.reset();
        writer.append(data[1]);
        LanguageResult result = detector.detect();
        assertNotNull(result);
        assertEquals(data[0], result.getLanguage());
    }
    writer.close();
}
Also used: LanguageDetector (org.apache.tika.language.detect.LanguageDetector), LanguageResult (org.apache.tika.language.detect.LanguageResult), LanguageWriter (org.apache.tika.language.detect.LanguageWriter), Test (org.junit.Test)
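
The test above reads its samples from text-test.tsv and keeps only lines that split into exactly two tab-separated fields. The following is a minimal, self-contained sketch of that filtering step on its own, without Tika. The class name TsvSamples, the Sample type, and the sample lines are hypothetical and chosen only for illustration; the real contents of text-test.tsv are not shown on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TsvSamples {
    /** One labeled sample: the expected language code plus the text to classify. */
    public static final class Sample {
        public final String language;
        public final String text;

        Sample(String language, String text) {
            this.language = language;
            this.text = text;
        }
    }

    /** Keeps only lines with exactly two tab-separated fields, mirroring the test loop. */
    public static List<Sample> parse(List<String> lines) {
        List<Sample> samples = new ArrayList<>();
        for (String line : lines) {
            String[] data = line.split("\t");
            if (data.length != 2) {
                continue; // blank or malformed line, skipped as in the test
            }
            samples.add(new Sample(data[0], data[1]));
        }
        return samples;
    }

    public static void main(String[] args) {
        // Hypothetical input: two valid samples, one blank line, one line without a tab.
        List<String> lines = Arrays.asList(
                "en\tHello world", "", "no tab here", "fr\tBonjour le monde");
        for (Sample s : parse(lines)) {
            System.out.println(s.language + " -> " + s.text);
        }
    }
}
```

Separating the parsing from the detection in this way makes it possible to check the test data format without a detector being available.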

Example 7 with LanguageWriter

Use of org.apache.tika.language.detect.LanguageWriter in project tika by apache.

From the class Language, method languageDetectionWithWriter.

public static void languageDetectionWithWriter() throws IOException {
    // TODO support version of LanguageWriter that doesn't need a detector.
    LanguageDetector detector = new OptimaizeLangDetector().loadModels();
    LanguageWriter writer = new LanguageWriter(detector);
    // Feed a Hungarian sentence to the detector in several chunks.
    writer.append("Minden emberi lény");
    writer.append(" szabadon születik és");
    writer.append(" egyenlő méltósága és");
    writer.append(" joga van.");
    // Ask for the best guess over all text appended so far and print its language code.
    LanguageResult result = writer.getLanguage();
    System.out.println(result.getLanguage());
    writer.close();
}
Also used: LanguageDetector (org.apache.tika.language.detect.LanguageDetector), LanguageResult (org.apache.tika.language.detect.LanguageResult), OptimaizeLangDetector (org.apache.tika.langdetect.OptimaizeLangDetector), LanguageWriter (org.apache.tika.language.detect.LanguageWriter)
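
Example 7 passes its text in several chunks through the standard java.io.Writer append method. The sketch below illustrates, with plain JDK classes only, how a Writer subclass can forward each chunk to a consumer while inheriting append from Writer. ForwardingWriterSketch and ChunkWriter are hypothetical names; this is not Tika's LanguageWriter implementation, only an illustration of the general pattern under that assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.Consumer;

public class ForwardingWriterSketch {
    /** Hypothetical Writer that hands every written chunk to a consumer. */
    static final class ChunkWriter extends Writer {
        private final Consumer<CharSequence> sink;

        ChunkWriter(Consumer<CharSequence> sink) {
            this.sink = sink;
        }

        @Override
        public void write(char[] cbuf, int off, int len) {
            // Writer.append(CharSequence) ends up here, so each chunk reaches the sink.
            sink.accept(new String(cbuf, off, len));
        }

        @Override
        public void flush() {
            // Nothing is buffered in this sketch.
        }

        @Override
        public void close() {
            // No resources to release in this sketch.
        }
    }

    /** Appends the chunks through the Writer API and returns what the sink accumulated. */
    public static String collect(String... chunks) {
        StringBuilder seen = new StringBuilder();
        try (Writer writer = new ChunkWriter(chunk -> seen.append(chunk))) {
            for (String chunk : chunks) {
                writer.append(chunk);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(collect("Minden emberi lény", " szabadon születik és"));
    }
}
```

In this sketch the sink is a StringBuilder; in a detection setting the consumer would instead be whatever component accumulates text for analysis.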

Aggregations

LanguageDetector (org.apache.tika.language.detect.LanguageDetector): 7
LanguageResult (org.apache.tika.language.detect.LanguageResult): 7
LanguageWriter (org.apache.tika.language.detect.LanguageWriter): 7
Test (org.junit.Test): 6
OptimaizeLangDetector (org.apache.tika.langdetect.OptimaizeLangDetector): 1