Example 31 with Segment

Use of com.hankcs.hanlp.seg.Segment in project HanLP by hankcs.

Class DemoTranslatedNameRecognition, method main.

public static void main(String[] args) {
    String[] testCase = new String[] {
        "一桶冰水当头倒下,微软的比尔盖茨、Facebook的扎克伯格跟桑德博格、亚马逊的贝索斯、苹果的库克全都不惜湿身入镜,这些硅谷的科技人,飞蛾扑火似地牺牲演出,其实全为了慈善。",
        "世界上最长的姓名是简森·乔伊·亚历山大·比基·卡利斯勒·达夫·埃利奥特·福克斯·伊维鲁莫·马尔尼·梅尔斯·帕特森·汤普森·华莱士·普雷斯顿。"
    };
    // Enable recognition of transliterated (foreign) person names
    Segment segment = HanLP.newSegment().enableTranslatedNameRecognize(true);
    for (String sentence : testCase) {
        List<Term> termList = segment.seg(sentence);
        System.out.println(termList);
    }
}
Also used : Term(com.hankcs.hanlp.seg.common.Term) Segment(com.hankcs.hanlp.seg.Segment)

Example 32 with Segment

Use of com.hankcs.hanlp.seg.Segment in project HanLP by hankcs.

Class DemoCRFSegment, method main.

public static void main(String[] args) {
    // Turn off part-of-speech tags in the printed output
    HanLP.Config.ShowTermNature = false;
    Segment segment = new CRFSegment().enableCustomDictionary(false);
    String[] sentenceArray = new String[] {
        "HanLP是由一系列模型与算法组成的Java工具包,目标是普及自然语言处理在生产环境中的应用。",
        // Traditional Chinese is handled without trouble
        "鐵桿部隊憤怒情緒集結 馬英九腹背受敵",
        "馬英九回應連勝文“丐幫說”:稱黨內同志談話應謹慎",
        // Some ability to recognize technical terms
        "高锰酸钾,强氧化剂,紫红色晶体,可溶于水,遇乙醇即被还原。常用作消毒剂、水净化剂、氧化剂、漂白剂、毒气吸收剂、二氧化碳精制剂等。",
        // Non-news text
        "《夜晚的骰子》通过描述浅草的舞女在暗夜中扔骰子的情景,寄托了作者对庶民生活区的情感",
        // Weibo (microblog) posts
        "这个像是真的[委屈]前面那个打扮太江户了,一点不上品...@hankcs",
        "鼎泰丰的小笼一点味道也没有...每样都淡淡的...淡淡的,哪有食堂2A的好次",
        "克里斯蒂娜·克罗尔说:不,我不是虎妈。我全家都热爱音乐,我也鼓励他们这么做。",
        "今日APPS:Sago Mini Toolbox培养孩子动手能力",
        "财政部副部长王保安调任国家统计局党组书记",
        "2.34米男子娶1.53米女粉丝 称夫妻生活没问题",
        "你看过穆赫兰道吗",
        "国办发布网络提速降费十四条指导意见 鼓励流量不清零",
        "乐视超级手机能否承载贾布斯的生态梦"
    };
    for (String sentence : sentenceArray) {
        List<Term> termList = segment.seg(sentence);
        System.out.println(termList);
    }
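    /*
     * Memory cookbook:
     * HanLP keeps a smart internal memory pool. A given CRF model (identified by its model file path)
     * is not reloaded as long as it has not been released or memory is sufficient.
     */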
    for (int i = 0; i < 5; ++i) {
        segment = new CRFSegment();
    }
}
Also used : Term(com.hankcs.hanlp.seg.common.Term) Segment(com.hankcs.hanlp.seg.Segment) CRFSegment(com.hankcs.hanlp.seg.CRF.CRFSegment)
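
The memory note in the example above says that a CRF model, identified by its file path, is not reloaded while it stays in the pool. The following is a minimal sketch of that general idea, a cache keyed by path, and not HanLP's actual implementation. The names ModelCacheSketch, Model, get, release and loads, as well as the path "model/crf.bin", are hypothetical and exist only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical path-keyed model cache; an illustration, not HanLP code. */
public class ModelCacheSketch {

    /** Stand-in for a model that is expensive to load (hypothetical type). */
    public static final class Model {
        public final String path;

        Model(String path) {
            this.path = path;
        }
    }

    // The model file path acts as the id, as the source comment describes.
    private final Map<String, Model> pool = new ConcurrentHashMap<>();
    private final AtomicInteger loadCount = new AtomicInteger();

    /** Returns the pooled model for this path, loading it only on the first request. */
    public Model get(String path) {
        return pool.computeIfAbsent(path, p -> {
            loadCount.incrementAndGet(); // counts real loads, for demonstration only
            return new Model(p);
        });
    }

    /** Drops a model from the pool so that the next get() loads it again. */
    public void release(String path) {
        pool.remove(path);
    }

    /** Number of times a model was actually loaded. */
    public int loads() {
        return loadCount.get();
    }

    public static void main(String[] args) {
        ModelCacheSketch cache = new ModelCacheSketch();
        for (int i = 0; i < 5; ++i) {
            cache.get("model/crf.bin"); // hypothetical path, requested five times
        }
        System.out.println("loads = " + cache.loads()); // prints "loads = 1"
    }
}
```

This sketch only covers the "not released" half of the note. A pool that also reacts to memory pressure would need something more, for example soft references, which is left out here.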

Example 33 with Segment

Use of com.hankcs.hanlp.seg.Segment in project HanLP by hankcs.

Class DemoJapaneseNameRecognition, method main.

public static void main(String[] args) {
    String[] testCase = new String[] { "北川景子参演了林诣彬导演的《速度与激情3》", "林志玲亮相网友:确定不是波多野结衣?", "龟山千广和近藤公园在龟山公园里喝酒赏花" };
    // Enable recognition of Japanese person names
    Segment segment = HanLP.newSegment().enableJapaneseNameRecognize(true);
    for (String sentence : testCase) {
        List<Term> termList = segment.seg(sentence);
        System.out.println(termList);
    }
}
Also used : Term(com.hankcs.hanlp.seg.common.Term) Segment(com.hankcs.hanlp.seg.Segment)

Aggregations

Segment (com.hankcs.hanlp.seg.Segment) 33
CRFSegment (com.hankcs.hanlp.seg.CRF.CRFSegment) 20
DijkstraSegment (com.hankcs.hanlp.seg.Dijkstra.DijkstraSegment) 20
ViterbiSegment (com.hankcs.hanlp.seg.Viterbi.ViterbiSegment) 19
DoubleArrayTrieSegment (com.hankcs.hanlp.seg.Other.DoubleArrayTrieSegment) 18
Term (com.hankcs.hanlp.seg.common.Term) 12
NShortSegment (com.hankcs.hanlp.seg.NShort.NShortSegment) 4
ResultTerm (com.hankcs.hanlp.seg.common.ResultTerm) 4
HMMSegment (com.hankcs.hanlp.seg.HMM.HMMSegment) 2
DictionaryMaker (com.hankcs.hanlp.corpus.dictionary.DictionaryMaker) 1
Item (com.hankcs.hanlp.corpus.dictionary.item.Item) 1
CharacterBasedGenerativeModelSegment (com.hankcs.hanlp.seg.CharacterBasedGenerativeModelSegment) 1