Example 1 with OptimaizeLangDetector

Use of org.apache.tika.langdetect.OptimaizeLangDetector in project tika by apache.

In the class Language, method languageDetection.

public static void languageDetection() throws IOException {
    LanguageDetector detector = new OptimaizeLangDetector().loadModels();
    LanguageResult result = detector.detect("Alla människor är födda fria och lika i värde och rättigheter.");
    System.out.println(result.getLanguage());
}
Also used: LanguageDetector (org.apache.tika.language.detect.LanguageDetector), LanguageResult (org.apache.tika.language.detect.LanguageResult), OptimaizeLangDetector (org.apache.tika.langdetect.OptimaizeLangDetector)

Example 2 with OptimaizeLangDetector

Use of org.apache.tika.langdetect.OptimaizeLangDetector in project tika by apache.

In the class LanguageDetectorExample, method detectLanguage.

public String detectLanguage(String text) throws IOException {
    LanguageDetector detector = new OptimaizeLangDetector().loadModels();
    LanguageResult result = detector.detect(text);
    return result.getLanguage();
}
Also used: LanguageDetector (org.apache.tika.language.detect.LanguageDetector), LanguageResult (org.apache.tika.language.detect.LanguageResult), OptimaizeLangDetector (org.apache.tika.langdetect.OptimaizeLangDetector)
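
Examples 1 and 2 call loadModels() every time the method runs. If model loading is expensive, one option is to build the detector once per thread and reuse it. The sketch below shows only that caching pattern with the standard library. PerThreadCache is a hypothetical helper, not a Tika class, and the per-thread scope assumes a detector instance keeps internal state and should not be shared between threads. In real use the factory would be something like () -> new OptimaizeLangDetector().loadModels(), wrapped to handle the IOException that loadModels() declares.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Caches one expensively built object per thread (hypothetical helper, not part of Tika). */
final class PerThreadCache<T> {
    private final ThreadLocal<T> holder;

    PerThreadCache(Supplier<T> factory) {
        // withInitial runs the factory lazily, once per thread, on the first get().
        this.holder = ThreadLocal.withInitial(factory);
    }

    /** Returns this thread's instance, creating it on first use. */
    T get() {
        return holder.get();
    }
}

final class PerThreadCacheDemo {
    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        // Stand-in factory: counts how often the "models" would be loaded.
        PerThreadCache<String> cache =
                new PerThreadCache<>(() -> "models#" + loads.incrementAndGet());
        cache.get();
        cache.get();
        System.out.println("factory calls: " + loads.get()); // prints "factory calls: 1"
    }
}
```

With this in place, a method such as detectLanguage(String text) would call cache.get().detect(text) instead of constructing a new detector on every call.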

Example 3 with OptimaizeLangDetector

Use of org.apache.tika.langdetect.OptimaizeLangDetector in project tika by apache.

In the class Language, method languageDetectionWithWriter.

public static void languageDetectionWithWriter() throws IOException {
    // TODO support version of LanguageWriter that doesn't need a detector.
    LanguageDetector detector = new OptimaizeLangDetector().loadModels();
    LanguageWriter writer = new LanguageWriter(detector);
    writer.append("Minden emberi lény");
    writer.append(" szabadon születik és");
    writer.append(" egyenlő méltósága és");
    writer.append(" joga van.");
    LanguageResult result = writer.getLanguage();
    System.out.println(result.getLanguage());
    writer.close();
}
Also used: LanguageDetector (org.apache.tika.language.detect.LanguageDetector), LanguageResult (org.apache.tika.language.detect.LanguageResult), OptimaizeLangDetector (org.apache.tika.langdetect.OptimaizeLangDetector), LanguageWriter (org.apache.tika.language.detect.LanguageWriter)

Example 4 with OptimaizeLangDetector

Use of org.apache.tika.langdetect.OptimaizeLangDetector in project tika by apache.

In the class MyFirstTika, method parseUsingComponents.

public static String parseUsingComponents(String filename, TikaConfig tikaConfig, Metadata metadata) throws Exception {
    MimeTypes mimeRegistry = tikaConfig.getMimeRepository();
    System.out.println("Examining: [" + filename + "]");
    metadata.set(Metadata.RESOURCE_NAME_KEY, filename);
    System.out.println("The MIME type (based on filename) is: [" + mimeRegistry.detect(null, metadata) + "]");
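    // Close the stream used for magic-based detection before reopening the file.
    try (InputStream magicStream = TikaInputStream.get(new File(filename))) {
        System.out.println("The MIME type (based on MAGIC) is: [" + mimeRegistry.detect(magicStream, metadata) + "]");
    }
    InputStream stream = TikaInputStream.get(new File(filename));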
    Detector detector = tikaConfig.getDetector();
    System.out.println("The MIME type (based on the Detector interface) is: [" + detector.detect(stream, metadata) + "]");
    LanguageDetector langDetector = new OptimaizeLangDetector().loadModels();
    LanguageResult lang = langDetector.detect(FileUtils.readFileToString(new File(filename), UTF_8));
    System.out.println("The language of this content is: [" + lang.getLanguage() + "]");
    // Get a non-detecting parser that handles all the types it can
    Parser parser = tikaConfig.getParser();
    // Tell it what we think the content is
    MediaType type = detector.detect(stream, metadata);
    metadata.set(Metadata.CONTENT_TYPE, type.toString());
    // Have the file parsed to get the content and metadata
    ContentHandler handler = new BodyContentHandler();
    try {
        parser.parse(stream, handler, metadata, new ParseContext());
    } finally {
        // Release the file handle once parsing is done.
        stream.close();
    }
    return handler.toString();
}
Also used: LanguageDetector (org.apache.tika.language.detect.LanguageDetector), BodyContentHandler (org.apache.tika.sax.BodyContentHandler), Detector (org.apache.tika.detect.Detector), OptimaizeLangDetector (org.apache.tika.langdetect.OptimaizeLangDetector), LanguageResult (org.apache.tika.language.detect.LanguageResult), TikaInputStream (org.apache.tika.io.TikaInputStream), InputStream (java.io.InputStream), ParseContext (org.apache.tika.parser.ParseContext), MediaType (org.apache.tika.mime.MediaType), MimeTypes (org.apache.tika.mime.MimeTypes), File (java.io.File), ContentHandler (org.xml.sax.ContentHandler), Parser (org.apache.tika.parser.Parser), AutoDetectParser (org.apache.tika.parser.AutoDetectParser)
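
Example 4 runs magic-based detection on a stream and later reads the same file again. Magic detection generally inspects only the leading bytes, and a stream that supports mark/reset can be rewound afterwards. The sketch below illustrates that general technique with the standard library only. MagicPeek and its methods are hypothetical names, and the single "%PDF" check is a toy rule, not how Tika's detectors are implemented.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class MagicPeek {
    /** Reads up to n leading bytes, then rewinds the stream to where it started. */
    static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(n);
        try {
            byte[] buf = new byte[n];
            int total = 0;
            while (total < n) {
                int r = in.read(buf, total, n - total);
                if (r < 0) {
                    break; // end of stream before n bytes were available
                }
                total += r;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.reset(); // the caller sees the stream at its original position
        }
    }

    /** Toy check for the "%PDF" prefix; real detectors apply a much larger rule set. */
    static boolean looksLikePdf(InputStream in) throws IOException {
        byte[] head = peek(in, 4);
        return new String(head, StandardCharsets.US_ASCII).equals("%PDF");
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.7 sample".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(looksLikePdf(in));  // prints "true"
        System.out.println((char) in.read());  // prints "%": the stream was rewound
    }
}
```

Because the stream is back at its start after the peek, the same stream can be handed to a parser without reopening the file.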

Example 5 with OptimaizeLangDetector

Use of org.apache.tika.langdetect.OptimaizeLangDetector in project tika by apache.

In the class TranslateResource, method autoTranslate.

@PUT
@POST
@Path("/all/{translator}/{dest}")
@Consumes("*/*")
@Produces("text/plain")
public String autoTranslate(final InputStream is, @PathParam("translator") String translator, @PathParam("dest") String dLang) throws TikaException, IOException {
    final String content = IOUtils.toString(is, UTF_8);
    LanguageResult language = new OptimaizeLangDetector().loadModels().detect(content);
    if (language.isUnknown()) {
        throw new TikaException("Unable to detect language to use for translation of text");
    }
    String sLang = language.getLanguage();
    LOG.info("LanguageIdentifier: detected source lang: [{}]", sLang);
    return doTranslate(content, translator, sLang, dLang);
}
Also used: LanguageResult (org.apache.tika.language.detect.LanguageResult), TikaException (org.apache.tika.exception.TikaException), OptimaizeLangDetector (org.apache.tika.langdetect.OptimaizeLangDetector), Path (javax.ws.rs.Path), POST (javax.ws.rs.POST), Consumes (javax.ws.rs.Consumes), Produces (javax.ws.rs.Produces), PUT (javax.ws.rs.PUT)

Aggregations

OptimaizeLangDetector (org.apache.tika.langdetect.OptimaizeLangDetector) 7
LanguageResult (org.apache.tika.language.detect.LanguageResult) 7
LanguageDetector (org.apache.tika.language.detect.LanguageDetector) 4
Consumes (javax.ws.rs.Consumes) 3
POST (javax.ws.rs.POST) 3
PUT (javax.ws.rs.PUT) 3
Path (javax.ws.rs.Path) 3
Produces (javax.ws.rs.Produces) 3
File (java.io.File) 1
InputStream (java.io.InputStream) 1
Detector (org.apache.tika.detect.Detector) 1
TikaException (org.apache.tika.exception.TikaException) 1
TikaInputStream (org.apache.tika.io.TikaInputStream) 1
LanguageWriter (org.apache.tika.language.detect.LanguageWriter) 1
MediaType (org.apache.tika.mime.MediaType) 1
MimeTypes (org.apache.tika.mime.MimeTypes) 1
AutoDetectParser (org.apache.tika.parser.AutoDetectParser) 1
ParseContext (org.apache.tika.parser.ParseContext) 1
Parser (org.apache.tika.parser.Parser) 1
BodyContentHandler (org.apache.tika.sax.BodyContentHandler) 1