
Example 1 with SentenceTokenizer

Use of org.languagetool.tokenizers.SentenceTokenizer in the project languagetool by languagetool-org.

From the class CommandLineTools, method checkText.

/**
 * Check the given text and print results to System.out.
 *
 * @param contents a text to check (may be more than one sentence)
 * @param lt initialized LanguageTool
 * @param isXmlFormat whether to print the result in XML format
 * @param isJsonFormat whether to print the result in JSON format
 * @param contextSize error text context size: -1 for default
 * @param lineOffset line number offset to be added to line numbers in matches
 * @param prevMatches number of previously matched rules
 * @param apiMode mode of xml/json printout for simple xml/json output
 * @param listUnknownWords whether to collect unknown words (only used for XML output in NORMAL_API mode)
 * @param unknownWords unknown words to include in the XML output; replaced by lt.getUnknownWords() when listUnknownWords applies
 * @return the number of rule matches for the input text
 * @throws IOException if the UTF-8 output stream cannot be created
 */
public static int checkText(String contents, JLanguageTool lt, boolean isXmlFormat, boolean isJsonFormat, int contextSize, int lineOffset, int prevMatches, StringTools.ApiPrintMode apiMode, boolean listUnknownWords, List<String> unknownWords) throws IOException {
    if (contextSize == -1) {
        contextSize = DEFAULT_CONTEXT_SIZE;
    }
    long startTime = System.currentTimeMillis();
    List<RuleMatch> ruleMatches = lt.check(contents);
    // adjust line numbers
    for (RuleMatch r : ruleMatches) {
        r.setLine(r.getLine() + lineOffset);
        r.setEndLine(r.getEndLine() + lineOffset);
    }
    if (isXmlFormat) {
        if (listUnknownWords && apiMode == StringTools.ApiPrintMode.NORMAL_API) {
            unknownWords = lt.getUnknownWords();
        }
        RuleMatchAsXmlSerializer serializer = new RuleMatchAsXmlSerializer();
        String xml = serializer.ruleMatchesToXml(ruleMatches, contents, contextSize, apiMode, lt.getLanguage(), unknownWords);
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.print(xml);
    } else if (isJsonFormat) {
        RuleMatchesAsJsonSerializer serializer = new RuleMatchesAsJsonSerializer();
        String json = serializer.ruleMatchesToJson(ruleMatches, contents, contextSize, lt.getLanguage());
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.print(json);
    } else {
        printMatches(ruleMatches, prevMatches, contents, contextSize);
    }
    // display timing stats only in normal (non-buffered) API mode, and never for JSON output
    if (apiMode == StringTools.ApiPrintMode.NORMAL_API && !isJsonFormat) {
        SentenceTokenizer sentenceTokenizer = lt.getLanguage().getSentenceTokenizer();
        int sentenceCount = sentenceTokenizer.tokenize(contents).size();
        displayTimeStats(startTime, sentenceCount, isXmlFormat);
    }
    return ruleMatches.size();
}
Also used: RuleMatchesAsJsonSerializer (org.languagetool.tools.RuleMatchesAsJsonSerializer), PrintStream (java.io.PrintStream), RuleMatch (org.languagetool.rules.RuleMatch), SentenceTokenizer (org.languagetool.tokenizers.SentenceTokenizer), RuleMatchAsXmlSerializer (org.languagetool.tools.RuleMatchAsXmlSerializer)
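
The statistics step at the end of checkText only needs the number of sentences in the input. As a rough, dependency-free sketch of that counting step, the example below uses the JDK's java.text.BreakIterator as a stand-in for LanguageTool's SentenceTokenizer. It is not the library's tokenizer and may split some text differently. The class name SentenceCountSketch and the method splitSentences are hypothetical names chosen for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of LanguageTool.
class SentenceCountSketch {

    // Splits text into sentences with the JDK BreakIterator
    // (an assumption: a stand-in for SentenceTokenizer.tokenize).
    static List<String> splitSentences(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(locale);
        iterator.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = iterator.first();
        // Walk from boundary to boundary until the iterator is exhausted.
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            String sentence = text.substring(start, end);
            // Skip segments that contain only whitespace.
            if (!sentence.trim().isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String contents = "This is a test. Here is another sentence.";
        long startTime = System.currentTimeMillis();
        int sentenceCount = splitSentences(contents, Locale.ENGLISH).size();
        long elapsedMillis = System.currentTimeMillis() - startTime;
        // Mirrors the idea of displayTimeStats: report the count and the elapsed time.
        System.out.println("Sentences: " + sentenceCount + ", time: " + elapsedMillis + "ms");
    }
}
```

With LanguageTool on the classpath, the corresponding call is the one shown in checkText: lt.getLanguage().getSentenceTokenizer().tokenize(contents).size().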
