Example 16 with IWord

Use of com.hankcs.hanlp.corpus.document.sentence.word.IWord in project HanLP by hankcs.

In class AdjustCorpus, method testPlay:

public void testPlay() throws Exception {
    final TFDictionary tfDictionary = new TFDictionary();
    CorpusLoader.walk("D:\\JavaProjects\\CorpusToolBox\\data\\2014", new CorpusLoader.Handler() {

        @Override
        public void handle(Document document) {
            for (List<IWord> wordList : document.getComplexSentenceList()) {
                for (IWord word : wordList) {
                    // Constant-first equals avoids a NullPointerException if getLabel() returns null.
                    if (word instanceof CompoundWord && "ns".equals(word.getLabel())) {
                        tfDictionary.add(word.toString());
                    }
                }
            }
        }
    });
    tfDictionary.saveTxtTo("data/test/complex_ns.txt");
}
Also used: TFDictionary (com.hankcs.hanlp.corpus.dictionary.TFDictionary), CorpusLoader (com.hankcs.hanlp.corpus.document.CorpusLoader), List (java.util.List), Document (com.hankcs.hanlp.corpus.document.Document), CompoundWord (com.hankcs.hanlp.corpus.document.sentence.word.CompoundWord), IWord (com.hankcs.hanlp.corpus.document.sentence.word.IWord)

Example 17 with IWord

Use of com.hankcs.hanlp.corpus.document.sentence.word.IWord in project HanLP by hankcs.

In class TestAdjustCoreDictionary, method testSimplifyNZ:

public void testSimplifyNZ() throws Exception {
    final DictionaryMaker nzDictionary = new DictionaryMaker();
    CorpusLoader.walk("D:\\Doc\\语料库\\2014", new CorpusLoader.Handler() {

        @Override
        public void handle(Document document) {
            for (List<IWord> sentence : document.getComplexSentenceList()) {
                for (IWord word : sentence) {
                    if (word instanceof CompoundWord && "nz".equals(word.getLabel())) {
                        nzDictionary.add(word);
                    }
                }
            }
        }
    });
    nzDictionary.saveTxtTo("data/test/nz.txt");
}
Also used: CorpusLoader (com.hankcs.hanlp.corpus.document.CorpusLoader), DictionaryMaker (com.hankcs.hanlp.corpus.dictionary.DictionaryMaker), List (java.util.List), Document (com.hankcs.hanlp.corpus.document.Document), CompoundWord (com.hankcs.hanlp.corpus.document.sentence.word.CompoundWord), IWord (com.hankcs.hanlp.corpus.document.sentence.word.IWord)
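
Examples 16 and 17 share one pattern: walk every sentence, keep only the compound words that carry a given label, and collect them. The sketch below reproduces that loop without the HanLP dependency so it can be run on its own. Every type in it (SketchWord, SimpleSketchWord, CompoundSketchWord) and both helper methods are hypothetical stand-ins invented for illustration, not HanLP classes, and the counting map is only an assumed substitute for what TFDictionary or DictionaryMaker would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LabelFilterSketch {

    /** Hypothetical stand-in for IWord: a surface value plus a label. */
    interface SketchWord {
        String getLabel();
        String getValue();
    }

    /** Hypothetical stand-in for a single-token word. */
    static final class SimpleSketchWord implements SketchWord {
        private final String value;
        private final String label;

        SimpleSketchWord(String value, String label) {
            this.value = value;
            this.label = label;
        }

        @Override public String getLabel() { return label; }
        @Override public String getValue() { return value; }
    }

    /** Hypothetical stand-in for CompoundWord: inner words under one label. */
    static final class CompoundSketchWord implements SketchWord {
        private final List<SketchWord> inner;
        private final String label;

        CompoundSketchWord(String label, SketchWord... inner) {
            this.label = label;
            this.inner = Arrays.asList(inner);
        }

        @Override public String getLabel() { return label; }

        @Override public String getValue() {
            // Concatenate the inner words to form the compound's surface text.
            StringBuilder sb = new StringBuilder();
            for (SketchWord w : inner) {
                sb.append(w.getValue());
            }
            return sb.toString();
        }
    }

    /** Counts compound words whose label equals the given (non-null) label. */
    static Map<String, Integer> countCompoundWords(List<List<SketchWord>> sentences, String label) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<SketchWord> sentence : sentences) {
            for (SketchWord word : sentence) {
                // equals is called on the non-null target, so a null word label is safe.
                if (word instanceof CompoundSketchWord && label.equals(word.getLabel())) {
                    counts.merge(word.getValue(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Two tiny made-up sentences used only to exercise the loop. */
    static List<List<SketchWord>> sampleSentences() {
        List<List<SketchWord>> sentences = new ArrayList<>();
        sentences.add(Arrays.<SketchWord>asList(
                new SimpleSketchWord("visit", "v"),
                new CompoundSketchWord("ns",
                        new SimpleSketchWord("New", "ns"), new SimpleSketchWord("York", "ns"))));
        sentences.add(Arrays.<SketchWord>asList(
                new CompoundSketchWord("ns",
                        new SimpleSketchWord("New", "ns"), new SimpleSketchWord("York", "ns")),
                new CompoundSketchWord("nz",
                        new SimpleSketchWord("Nobel", "nz"), new SimpleSketchWord("Prize", "n")),
                new SimpleSketchWord("unlabelled", null)));
        return sentences;
    }

    public static void main(String[] args) {
        // prints {NewYork=2}
        System.out.println(countCompoundWords(sampleSentences(), "ns"));
    }
}
```

Passing "nz" instead of "ns" would select the other compound word in the sample data, which mirrors the only substantive difference between the two examples above.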

Aggregations

IWord (com.hankcs.hanlp.corpus.document.sentence.word.IWord): 17
LinkedList (java.util.LinkedList): 11
Word (com.hankcs.hanlp.corpus.document.sentence.word.Word): 8
List (java.util.List): 8
CompoundWord (com.hankcs.hanlp.corpus.document.sentence.word.CompoundWord): 7
CorpusLoader (com.hankcs.hanlp.corpus.document.CorpusLoader): 4
Document (com.hankcs.hanlp.corpus.document.Document): 4
Sentence (com.hankcs.hanlp.corpus.document.sentence.Sentence): 4
DictionaryMaker (com.hankcs.hanlp.corpus.dictionary.DictionaryMaker): 3
TFDictionary (com.hankcs.hanlp.corpus.dictionary.TFDictionary): 1
Matcher (java.util.regex.Matcher): 1
Pattern (java.util.regex.Pattern): 1