Example 1 with MMSeg

Use of com.chenlb.mmseg4j.MMSeg in project java-basic by tzuyichao.

Class TestMMSeg4J, method main.

public static void main(String[] args) throws IOException {
    // Load the default dictionary.
    Dictionary dictionary = Dictionary.getInstance();
    MMSeg mmSeg = new MMSeg(new StringReader("上一堂課之後跑18km與2500rpm的挑戰"), new ComplexSeg(dictionary));
    Word word;
    // The loop ends when next() returns null, i.e. the input is exhausted.
    while ((word = mmSeg.next()) != null) {
        System.out.println(word.getString());
    }
}
Also used : Dictionary(com.chenlb.mmseg4j.Dictionary) Word(com.chenlb.mmseg4j.Word) MMSeg(com.chenlb.mmseg4j.MMSeg) ComplexSeg(com.chenlb.mmseg4j.ComplexSeg) StringReader(java.io.StringReader)

Example 2 with MMSeg

Use of com.chenlb.mmseg4j.MMSeg in project jstarcraft-nlp by HongZhaoHua.

Class MmsegSegmentFactory, method getNlpTokenizer.

@Override
protected NlpTokenizer<? extends NlpToken> getNlpTokenizer(Map<String, String> configurations) {
    MMSeg segment = build(configurations);
    MmsegTokenizer tokenizer = new MmsegTokenizer(segment);
    return tokenizer;
}
Also used : MMSeg(com.chenlb.mmseg4j.MMSeg) MmsegTokenizer(com.jstarcraft.nlp.tokenization.mmseg.MmsegTokenizer)

Example 3 with MMSeg

Use of com.chenlb.mmseg4j.MMSeg in project jstarcraft-nlp by HongZhaoHua.

Class MmsegSegmentFactory, method build.

@Override
public MMSeg build(Map<String, String> configurations) {
    Dictionary dictionary;
    String dictionaryPath = get(configurations, "dictionaryPath");
    if (StringUtility.isBlank(dictionaryPath)) {
        dictionary = Dictionary.getInstance();
    } else {
        File file = new File(dictionaryPath);
        dictionary = Dictionary.getInstance(file);
    }
    String configuration = get(configurations, "mode", "MaxWord");
    Seg seg = null;
    switch(configuration) {
        case "Complex":
            seg = new ComplexSeg(dictionary);
            break;
        case "Simple":
            seg = new SimpleSeg(dictionary);
            break;
        case "MaxWord":
            seg = new MaxWordSeg(dictionary);
            break;
        default:
            throw new IllegalArgumentException("Unsupported mode: " + configuration);
    }
    MMSeg mmSeg = new MMSeg(new StringReader(""), seg);
    return mmSeg;
}
Also used : Dictionary(com.chenlb.mmseg4j.Dictionary) SimpleSeg(com.chenlb.mmseg4j.SimpleSeg) MaxWordSeg(com.chenlb.mmseg4j.MaxWordSeg) ComplexSeg(com.chenlb.mmseg4j.ComplexSeg) Seg(com.chenlb.mmseg4j.Seg) MMSeg(com.chenlb.mmseg4j.MMSeg) StringReader(java.io.StringReader) File(java.io.File)
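
The "Simple" mode selected above takes its name from the simple maximum-matching rule of the MMSEG algorithm. As a rough, self-contained illustration of that idea only, the sketch below greedily takes the longest dictionary word at each position. It is not mmseg4j's implementation: the class ForwardMaxMatchSketch, the method segment, the maxWordLength parameter, and the four-word dictionary are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of greedy forward maximum matching; not part of mmseg4j.
final class ForwardMaxMatchSketch {

    // At each position, take the longest dictionary word that starts there.
    // If no word matches, emit a single character and move on.
    static List<String> segment(String text, Set<String> dictionary, int maxWordLength) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxWordLength);
            // Shrink the candidate until it is a known word or one character long.
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed toy dictionary; a real one would be loaded from a file.
        Set<String> dictionary = Set.of("研究", "研究生", "生命", "起源");
        System.out.println(segment("研究生命起源", dictionary, 3));
    }
}
```

With this toy dictionary the greedy rule picks 研究生 first and leaves 命 as a single character, which is the kind of ambiguity that more elaborate rule sets such as the "Complex" mode are meant to reduce.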

Example 4 with MMSeg

Use of com.chenlb.mmseg4j.MMSeg in project jstarcraft-nlp by HongZhaoHua.

Class MmsegTokenizerTestCase, method getTokenizer.

@Override
protected NlpTokenizer<? extends NlpToken> getTokenizer() {
    Dictionary dictionary = Dictionary.getInstance();
    ComplexSeg complex = new ComplexSeg(dictionary);
    MMSeg mmSeg = new MMSeg(new StringReader(""), complex);
    return new MmsegTokenizer(mmSeg);
}
Also used : Dictionary(com.chenlb.mmseg4j.Dictionary) ComplexSeg(com.chenlb.mmseg4j.ComplexSeg) MMSeg(com.chenlb.mmseg4j.MMSeg) StringReader(java.io.StringReader) MmsegTokenizer(com.jstarcraft.nlp.tokenization.mmseg.MmsegTokenizer)

Example 5 with MMSeg

Use of com.chenlb.mmseg4j.MMSeg in project incubator-hugegraph by apache.

Class MMSeg4JAnalyzer, method segment.

@Override
public Set<String> segment(String text) {
    Set<String> result = InsertionOrderUtil.newSet();
    MMSeg mmSeg = new MMSeg(new StringReader(text), this.seg);
    try {
        Word word = null;
        while ((word = mmSeg.next()) != null) {
            result.add(word.getString());
        }
    } catch (Exception e) {
        throw new HugeException("MMSeg4j segment text '%s' failed", e, text);
    }
    return result;
}
Also used : Word(com.chenlb.mmseg4j.Word) MMSeg(com.chenlb.mmseg4j.MMSeg) StringReader(java.io.StringReader) HugeException(com.baidu.hugegraph.HugeException) ConfigException(com.baidu.hugegraph.config.ConfigException)
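
The segment method above returns a Set, so a word that occurs several times in the text appears only once in the result. Assuming InsertionOrderUtil.newSet() behaves like java.util.LinkedHashSet (an assumption, since its source is not shown here), the effect can be sketched with the standard library alone. The class OrderedTokenSetSketch and the method collect are hypothetical names for this illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; stands in for the set-collecting loop in Example 5.
final class OrderedTokenSetSketch {

    // Collect tokens into a set that drops duplicates but keeps first-seen order.
    static Set<String> collect(List<String> tokens) {
        Set<String> result = new LinkedHashSet<>();
        for (String token : tokens) {
            result.add(token);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("graph", "database", "graph", "index");
        System.out.println(collect(tokens)); // prints [graph, database, index]
    }
}
```

A caller that needs token frequencies or positions would have to collect into a List instead, since the set discards repeats.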

Aggregations

MMSeg (com.chenlb.mmseg4j.MMSeg) 5
StringReader (java.io.StringReader) 4
ComplexSeg (com.chenlb.mmseg4j.ComplexSeg) 3
Dictionary (com.chenlb.mmseg4j.Dictionary) 3
Word (com.chenlb.mmseg4j.Word) 2
MmsegTokenizer (com.jstarcraft.nlp.tokenization.mmseg.MmsegTokenizer) 2
HugeException (com.baidu.hugegraph.HugeException) 1
ConfigException (com.baidu.hugegraph.config.ConfigException) 1
MaxWordSeg (com.chenlb.mmseg4j.MaxWordSeg) 1
Seg (com.chenlb.mmseg4j.Seg) 1
SimpleSeg (com.chenlb.mmseg4j.SimpleSeg) 1
File (java.io.File) 1