
Example 1 with Sentence

Use of com.hankcs.hanlp.corpus.document.sentence.Sentence in project HanLP by hankcs.

Source: class Document, method toString.

@Override
public String toString() {
    StringBuilder sb = new StringBuilder();
    for (Sentence sentence : sentenceList) {
        sb.append(sentence);
        sb.append(' ');
    }
    if (sb.length() > 0)
        sb.deleteCharAt(sb.length() - 1);
    return sb.toString();
}
Also used : Sentence(com.hankcs.hanlp.corpus.document.sentence.Sentence)
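The loop above appends a space after every sentence and then deletes the last one, so sentences end up separated by single spaces with no trailing space. The following minimal sketch gets the same result with Collectors.joining, which only places the delimiter between elements. It is not HanLP code, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

public class JoinSketch {
    // Same output as the StringBuilder loop: each item rendered with
    // String.valueOf (as StringBuilder.append(Object) does), separated by one
    // space. joining() never emits a trailing delimiter, so no trim is needed.
    static String joinWithSpaces(List<?> items) {
        return items.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(joinWithSpaces(List.of("我/r 爱/v 你/r", "你/r 好/a")));
        // prints: 我/r 爱/v 你/r 你/r 好/a
        System.out.println("[" + joinWithSpaces(List.of()) + "]");
        // prints: []
    }
}
```

An empty list yields an empty string, matching the `sb.length() > 0` guard in the original.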

Example 2 with Sentence

Use of com.hankcs.hanlp.corpus.document.sentence.Sentence in project HanLP by hankcs.

Source: class Document, method create.

public static Document create(String param) {
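    // Split on the sentence terminators of the tagged corpus format: "。/w",
    // "！/w ", "？/w " (full-width punctuation), a newline, or end of input.
    Pattern pattern = Pattern.compile(".+?((。/w)|(！/w )|(？/w )|\\n|$)");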
    Matcher matcher = pattern.matcher(param);
    List<Sentence> sentenceList = new LinkedList<Sentence>();
    while (matcher.find()) {
        String single = matcher.group();
        Sentence sentence = Sentence.create(single);
        if (sentence == null) {
            // logger: static logger defined outside this snippet
            logger.warning("Failed to build a sentence from " + single);
            return null;
        }
        sentenceList.add(sentence);
    }
    return new Document(sentenceList);
}
Also used : Pattern(java.util.regex.Pattern) Matcher(java.util.regex.Matcher) Sentence(com.hankcs.hanlp.corpus.document.sentence.Sentence) LinkedList(java.util.LinkedList)
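The following minimal sketch shows how the pattern in create cuts a word/tag string into sentence-sized pieces. It is not HanLP code, and the class name and sample input are hypothetical. It uses full-width ！ and ？ as in Chinese corpus text. With an ASCII `?`, the group `(?/w )` would not compile, because Java reads `(?` as the start of a special group construct. `.+?` is lazy, so each match stops at the first terminator it reaches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceSplitSketch {
    // At least one character, lazily, up to the first "。/w", "！/w ",
    // "？/w ", newline, or end of input.
    static final Pattern SENTENCE =
            Pattern.compile(".+?((。/w)|(！/w )|(？/w )|\\n|$)");

    // Collect every match, in order, exactly as the while loop in create does.
    static List<String> split(String tagged) {
        List<String> pieces = new ArrayList<>();
        Matcher matcher = SENTENCE.matcher(tagged);
        while (matcher.find()) {
            pieces.add(matcher.group());
        }
        return pieces;
    }

    public static void main(String[] args) {
        for (String piece : split("我/r 爱/v 你/r 。/w 你/r 好/a ！/w 再见/v")) {
            System.out.println("[" + piece + "]");
        }
        // prints:
        // [我/r 爱/v 你/r 。/w]
        // [ 你/r 好/a ！/w ]
        // [再见/v]
    }
}
```

Because "。/w" is matched without its following space, the next piece starts with that space. Each piece is handed to Sentence.create unchanged.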

Example 3 with Sentence

Use of com.hankcs.hanlp.corpus.document.sentence.Sentence in project HanLP by hankcs.

Source: class Document, method getSimpleSentenceList().

/**
 * Gets a simple sentence list, in which compound words are split into simple words.
 * @return one list of simple words per sentence
 */
public List<List<Word>> getSimpleSentenceList() {
    List<List<Word>> simpleList = new LinkedList<List<Word>>();
    for (Sentence sentence : sentenceList) {
        List<Word> wordList = new LinkedList<Word>();
        for (IWord word : sentence.wordList) {
            if (word instanceof CompoundWord) {
                for (Word inner : ((CompoundWord) word).innerList) {
                    wordList.add(inner);
                }
            } else {
                wordList.add((Word) word);
            }
        }
        simpleList.add(wordList);
    }
    return simpleList;
}
Also used : CompoundWord(com.hankcs.hanlp.corpus.document.sentence.word.CompoundWord) Word(com.hankcs.hanlp.corpus.document.sentence.word.Word) IWord(com.hankcs.hanlp.corpus.document.sentence.word.IWord) List(java.util.List) LinkedList(java.util.LinkedList) Sentence(com.hankcs.hanlp.corpus.document.sentence.Sentence)
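The nested loops above flatten each sentence. Simple words are kept, and every compound word is replaced by its inner words in order. The following minimal sketch expresses the same flattening with streams and flatMap. Token, Simple and Compound are hypothetical stand-ins for HanLP's IWord, Word and CompoundWord, not the real classes. The sketch assumes Java 16+ (records, pattern matching for instanceof).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlattenSketch {
    // Hypothetical stand-ins for IWord, Word and CompoundWord.
    interface Token { String label(); }
    record Simple(String value, String label) implements Token {}
    record Compound(List<Simple> inner, String label) implements Token {}

    // For every sentence, replace each compound token by its inner simple
    // words and keep simple tokens as they are; word order is preserved.
    static List<List<Simple>> flatten(List<List<Token>> sentences) {
        return sentences.stream()
                .map(sentence -> sentence.stream()
                        .flatMap(token -> token instanceof Compound c
                                ? c.inner().stream()
                                : Stream.of((Simple) token))
                        .collect(Collectors.toList()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Compound org = new Compound(
                List.of(new Simple("北京", "ns"), new Simple("大学", "n")), "nt");
        List<Token> sentence = List.of(new Simple("我", "r"), org);

        List<String> rendered = new ArrayList<>();
        for (Simple w : flatten(List.of(sentence)).get(0)) {
            rendered.add(w.value() + "/" + w.label());
        }
        System.out.println(String.join(" ", rendered)); // 我/r 北京/ns 大学/n
    }
}
```

Like the original, the sketch relies on every non-compound token being a simple word. An unexpected token type fails with a ClassCastException at the cast.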

Example 4 with Sentence

Use of com.hankcs.hanlp.corpus.document.sentence.Sentence in project HanLP by hankcs.

Source: class Document, method getSimpleSentenceList(boolean).

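/**
 * Gets a simple sentence list.
 * @param spilt if true, compound words are split into simple words;
 *              otherwise each compound word is merged into a single word
 * @return one list of simple words per sentence
 */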
public List<List<Word>> getSimpleSentenceList(boolean spilt) {
    List<List<Word>> simpleList = new LinkedList<List<Word>>();
    for (Sentence sentence : sentenceList) {
        List<Word> wordList = new LinkedList<Word>();
        for (IWord word : sentence.wordList) {
            if (word instanceof CompoundWord) {
                if (spilt) {
                    for (Word inner : ((CompoundWord) word).innerList) {
                        wordList.add(inner);
                    }
                } else {
                    wordList.add(((CompoundWord) word).toWord());
                }
            } else {
                wordList.add((Word) word);
            }
        }
        simpleList.add(wordList);
    }
    return simpleList;
}
Also used : CompoundWord(com.hankcs.hanlp.corpus.document.sentence.word.CompoundWord) Word(com.hankcs.hanlp.corpus.document.sentence.word.Word) IWord(com.hankcs.hanlp.corpus.document.sentence.word.IWord) List(java.util.List) LinkedList(java.util.LinkedList) Sentence(com.hankcs.hanlp.corpus.document.sentence.Sentence)

Example 5 with Sentence

Use of com.hankcs.hanlp.corpus.document.sentence.Sentence in project HanLP by hankcs.

Source: class Document, method getSimpleSentenceList(Set<String>).

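/**
 * Gets a simple sentence list, in which a compound word is split into simple
 * words if its label is in the given set; otherwise it is merged into a single word.
 * @param labelSet labels of the compound words that should be split
 * @return one list of simple words per sentence
 */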
public List<List<Word>> getSimpleSentenceList(Set<String> labelSet) {
    List<List<Word>> simpleList = new LinkedList<List<Word>>();
    for (Sentence sentence : sentenceList) {
        List<Word> wordList = new LinkedList<Word>();
        for (IWord word : sentence.wordList) {
            if (word instanceof CompoundWord) {
                if (labelSet.contains(word.getLabel())) {
                    for (Word inner : ((CompoundWord) word).innerList) {
                        wordList.add(inner);
                    }
                } else {
                    wordList.add(((CompoundWord) word).toWord());
                }
            } else {
                wordList.add((Word) word);
            }
        }
        simpleList.add(wordList);
    }
    return simpleList;
}
Also used : CompoundWord(com.hankcs.hanlp.corpus.document.sentence.word.CompoundWord) Word(com.hankcs.hanlp.corpus.document.sentence.word.Word) IWord(com.hankcs.hanlp.corpus.document.sentence.word.IWord) List(java.util.List) LinkedList(java.util.LinkedList) Sentence(com.hankcs.hanlp.corpus.document.sentence.Sentence)
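The three getSimpleSentenceList variants differ only in the rule that decides whether a compound word is split or merged. The following minimal sketch expresses that rule as a Predicate so all three cases fall out of one method. Token, Simple and Compound are hypothetical stand-ins for HanLP's IWord, Word and CompoundWord. The merge rule in toSimple (concatenate the inner values, keep the compound's label) is an assumption about what CompoundWord.toWord() does, not something shown on this page. The sketch assumes Java 16+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class SplitPolicySketch {
    // Hypothetical stand-ins for IWord, Word and CompoundWord.
    interface Token { String label(); }
    record Simple(String value, String label) implements Token {}
    record Compound(List<Simple> inner, String label) implements Token {
        // Assumed merge rule: concatenate inner values, keep the compound label.
        Simple toSimple() {
            StringBuilder sb = new StringBuilder();
            for (Simple s : inner) sb.append(s.value());
            return new Simple(sb.toString(), label);
        }
    }

    // Split a compound when shouldSplit accepts it, otherwise merge it.
    static List<Simple> simplify(List<Token> sentence, Predicate<Compound> shouldSplit) {
        List<Simple> words = new ArrayList<>();
        for (Token token : sentence) {
            if (token instanceof Compound c) {
                if (shouldSplit.test(c)) {
                    words.addAll(c.inner());
                } else {
                    words.add(c.toSimple());
                }
            } else {
                words.add((Simple) token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        Compound org = new Compound(
                List.of(new Simple("北京", "ns"), new Simple("大学", "n")), "nt");
        List<Token> sentence = List.of(new Simple("我", "r"), org);

        // like getSimpleSentenceList(): always split
        System.out.println(simplify(sentence, c -> true).size());          // 3
        // like getSimpleSentenceList(false): never split
        System.out.println(simplify(sentence, c -> false).size());         // 2
        System.out.println(simplify(sentence, c -> false).get(1).value()); // 北京大学
        // like getSimpleSentenceList(Set<String>): split only listed labels
        Set<String> labels = Set.of("nt");
        System.out.println(simplify(sentence, c -> labels.contains(c.label())).size()); // 3
    }
}
```

Passing `c -> spilt` reproduces the boolean overload, and `c -> labelSet.contains(c.label())` reproduces the Set overload.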

Aggregations

Sentence (com.hankcs.hanlp.corpus.document.sentence.Sentence): 6
LinkedList (java.util.LinkedList): 5
IWord (com.hankcs.hanlp.corpus.document.sentence.word.IWord): 4
List (java.util.List): 4
CompoundWord (com.hankcs.hanlp.corpus.document.sentence.word.CompoundWord): 3
Word (com.hankcs.hanlp.corpus.document.sentence.word.Word): 3
Matcher (java.util.regex.Matcher): 1
Pattern (java.util.regex.Pattern): 1