Example 1 with CRFSegment

use of com.hankcs.hanlp.seg.CRF.CRFSegment in project HanLP by hankcs.

the class TestSegment method testCRFSegment.

public void testCRFSegment() throws Exception {
    HanLP.Config.enableDebug();
    // HanLP.Config.ShowTermNature = false;
    Segment segment = new CRFSegment();
    System.out.println(segment.seg("有句谚语叫做一个萝卜一个坑儿"));
}
Also used : Segment(com.hankcs.hanlp.seg.Segment) DoubleArrayTrieSegment(com.hankcs.hanlp.seg.Other.DoubleArrayTrieSegment) CRFSegment(com.hankcs.hanlp.seg.CRF.CRFSegment) DijkstraSegment(com.hankcs.hanlp.seg.Dijkstra.DijkstraSegment) ViterbiSegment(com.hankcs.hanlp.seg.Viterbi.ViterbiSegment)

Example 2 with CRFSegment

use of com.hankcs.hanlp.seg.CRF.CRFSegment in project HanLP by hankcs.

the class TestSegment method testIssue199.

public void testIssue199() throws Exception {
    Segment segment = new CRFSegment();
    // Disable the custom dictionary
    segment.enableCustomDictionary(false);
    segment.enablePartOfSpeechTagging(true);
    List<Term> termList = segment.seg("更多采购");
    System.out.println(termList);
    for (Term term : termList) {
        if (term.nature == null) {
            System.out.println("New word detected: " + term.word);
        }
    }
}
Also used : Term(com.hankcs.hanlp.seg.common.Term) ResultTerm(com.hankcs.hanlp.seg.common.ResultTerm) Segment(com.hankcs.hanlp.seg.Segment) DoubleArrayTrieSegment(com.hankcs.hanlp.seg.Other.DoubleArrayTrieSegment) CRFSegment(com.hankcs.hanlp.seg.CRF.CRFSegment) DijkstraSegment(com.hankcs.hanlp.seg.Dijkstra.DijkstraSegment) ViterbiSegment(com.hankcs.hanlp.seg.Viterbi.ViterbiSegment)

Example 3 with CRFSegment

use of com.hankcs.hanlp.seg.CRF.CRFSegment in project HanLP by hankcs.

the class DemoMultithreadingSegment method main.

public static void main(String[] args) {
    // The CRF segmenter is accurate but slow; parallelization can speed it up somewhat
    Segment segment = new CRFSegment();
    String text = "举办纪念活动铭记二战历史,不忘战争带给人类的深重灾难,是为了防止悲剧重演,确保和平永驻;" + "铭记二战历史,更是为了提醒国际社会,需要共同捍卫二战胜利成果和国际公平正义," + "必须警惕和抵制在历史认知和维护战后国际秩序问题上的倒行逆施。";
    HanLP.Config.ShowTermNature = false;
    System.out.println(segment.seg(text));
    int pressure = 10000;
    StringBuilder sbBigText = new StringBuilder(text.length() * pressure);
    for (int i = 0; i < pressure; i++) {
        sbBigText.append(text);
    }
    text = sbBigText.toString();
    System.gc();
    long start;
    double costTime;
    // Measure the single-threaded speed
    segment.enableMultithreading(false);
    start = System.currentTimeMillis();
    segment.seg(text);
    costTime = (System.currentTimeMillis() - start) / (double) 1000;
    System.out.printf("Single-threaded segmentation speed: %.2f characters per second\n", text.length() / costTime);
    System.gc();
    // Or: segment.enableMultithreading(4);
    segment.enableMultithreading(true);
    start = System.currentTimeMillis();
    segment.seg(text);
    costTime = (System.currentTimeMillis() - start) / (double) 1000;
    System.out.printf("Multi-threaded segmentation speed: %.2f characters per second\n", text.length() / costTime);
    System.gc();
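    // Note:
    // The built-in parallelization can enable multithreaded segmentation for large texts of more than 10,000 characters.
    // Separately, every Segment in HanLP is itself thread-safe.
    // You can run 10 threads that share one CRFSegment object to segment arbitrary text with no synchronization, and each thread gets correct results.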
}
Also used : Segment(com.hankcs.hanlp.seg.Segment) CRFSegment(com.hankcs.hanlp.seg.CRF.CRFSegment)
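
The note in Example 3 says that one Segment object can be shared by many threads without synchronization. The sketch below shows only that usage pattern and does not depend on HanLP. `Segmenter` is a hypothetical stand-in interface (in real code the shared object would be the `CRFSegment`, called through its `seg` method), and the whitespace splitter in `main` is just a placeholder so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedSegmenterDemo {
    /** Hypothetical stand-in for a thread-safe segmenter; not a HanLP type. */
    public interface Segmenter {
        List<String> seg(String text);
    }

    /** Segments every text on a fixed thread pool, with all tasks using the same segmenter instance. */
    public static List<List<String>> segmentConcurrently(Segmenter shared, List<String> texts, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (final String text : texts) {
                // No locking around the call: this relies on the segmenter being thread-safe.
                futures.add(pool.submit(() -> shared.seg(text)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get()); // results come back in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder segmenter that splits on whitespace.
        Segmenter whitespace = text -> Arrays.asList(text.trim().split("\\s+"));
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            texts.add("text number " + i);
        }
        System.out.println(segmentConcurrently(whitespace, texts, 10));
    }
}
```

This is a different kind of parallelism from `enableMultithreading`, which the example uses on a single large text. Here each thread handles its own, independent text.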

Example 4 with CRFSegment

use of com.hankcs.hanlp.seg.CRF.CRFSegment in project HanLP by hankcs.

the class DemoCRFSegment method main.

public static void main(String[] args) {
    // Turn off part-of-speech display
    HanLP.Config.ShowTermNature = false;
    Segment segment = new CRFSegment().enableCustomDictionary(false);
    String[] sentenceArray = new String[] {
        "HanLP是由一系列模型与算法组成的Java工具包,目标是普及自然语言处理在生产环境中的应用。",
        // Traditional Chinese is handled without trouble
        "鐵桿部隊憤怒情緒集結 馬英九腹背受敵",
        "馬英九回應連勝文“丐幫說”:稱黨內同志談話應謹慎",
        // Some ability to recognize technical terms
        "高锰酸钾,强氧化剂,紫红色晶体,可溶于水,遇乙醇即被还原。常用作消毒剂、水净化剂、氧化剂、漂白剂、毒气吸收剂、二氧化碳精制剂等。",
        // Non-news text
        "《夜晚的骰子》通过描述浅草的舞女在暗夜中扔骰子的情景,寄托了作者对庶民生活区的情感",
        // Weibo posts
        "这个像是真的[委屈]前面那个打扮太江户了,一点不上品...@hankcs",
        "鼎泰丰的小笼一点味道也没有...每样都淡淡的...淡淡的,哪有食堂2A的好次",
        "克里斯蒂娜·克罗尔说:不,我不是虎妈。我全家都热爱音乐,我也鼓励他们这么做。",
        "今日APPS:Sago Mini Toolbox培养孩子动手能力",
        "财政部副部长王保安调任国家统计局党组书记",
        "2.34米男子娶1.53米女粉丝 称夫妻生活没问题",
        "你看过穆赫兰道吗",
        "国办发布网络提速降费十四条指导意见 鼓励流量不清零",
        "乐视超级手机能否承载贾布斯的生态梦"
    };
    for (String sentence : sentenceArray) {
        List<Term> termList = segment.seg(sentence);
        System.out.println(termList);
    }
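    /**
     * Memory cookbook:
     * HanLP has an internal smart memory pool. A given CRF model (identified by its model file path)
     * is not reloaded as long as it has not been released or memory is sufficient.
     */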
    for (int i = 0; i < 5; ++i) {
        segment = new CRFSegment();
    }
}
Also used : Term(com.hankcs.hanlp.seg.common.Term) Segment(com.hankcs.hanlp.seg.Segment) CRFSegment(com.hankcs.hanlp.seg.CRF.CRFSegment)
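
The memory note in Example 4 describes a pool that keys loaded models by file path and skips reloading while the model is still held. HanLP's actual pool is not shown on this page, so the following is only a sketch of the general idea under stated assumptions: `ModelPoolSketch`, its `loader` function, and the path string are all hypothetical, and soft references are one possible way to let entries be dropped when memory runs low.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical path-keyed model cache; not HanLP's implementation. */
public class ModelPoolSketch<M> {
    // Soft references allow the garbage collector to reclaim cached models under memory pressure.
    private final Map<String, SoftReference<M>> pool = new HashMap<>();
    private final Function<String, M> loader;

    public ModelPoolSketch(Function<String, M> loader) {
        this.loader = loader;
    }

    /** Returns the cached model for this path, loading it only if it is absent or was reclaimed. */
    public synchronized M get(String path) {
        SoftReference<M> ref = pool.get(path);
        M model = (ref == null) ? null : ref.get();
        if (model == null) {
            model = loader.apply(path);
            pool.put(path, new SoftReference<>(model));
        }
        return model;
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        // The loader here fakes a model with a string; a real loader would read the file.
        ModelPoolSketch<String> pool = new ModelPoolSketch<>(path -> {
            loads.incrementAndGet();
            return "model loaded from " + path;
        });
        for (int i = 0; i < 5; i++) {
            pool.get("hypothetical/path/to/crf.model");
        }
        // Without memory pressure, the loader is expected to run once for the five lookups.
        System.out.println("loader invocations: " + loads.get());
    }
}
```

With a pool like this, constructing the segmenter repeatedly, as the loop at the end of Example 4 does, would pay the model loading cost only on the first construction.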

Example 5 with CRFSegment

use of com.hankcs.hanlp.seg.CRF.CRFSegment in project HanLP by hankcs.

the class TestCRF method testEnglishAndNumber.

public void testEnglishAndNumber() throws Exception {
    String text = "2.34米";
    // System.out.println(CRFSegment.atomSegment(text.toCharArray()));
    HanLP.Config.enableDebug();
    CRFSegment segment = new CRFSegment();
    System.out.println(segment.seg(text));
}
Also used : CRFSegment(com.hankcs.hanlp.seg.CRF.CRFSegment)

Aggregations

CRFSegment (com.hankcs.hanlp.seg.CRF.CRFSegment): 6
Segment (com.hankcs.hanlp.seg.Segment): 4
DijkstraSegment (com.hankcs.hanlp.seg.Dijkstra.DijkstraSegment): 2
DoubleArrayTrieSegment (com.hankcs.hanlp.seg.Other.DoubleArrayTrieSegment): 2
ViterbiSegment (com.hankcs.hanlp.seg.Viterbi.ViterbiSegment): 2
Term (com.hankcs.hanlp.seg.common.Term): 2
ResultTerm (com.hankcs.hanlp.seg.common.ResultTerm): 1