Example 1 with Emit

Use of com.hankcs.hanlp.algoritm.ahocorasick.trie.Emit in project HanLP by hankcs.

From the class AhoCorasickDoubleArrayTrieTest, method testTwoAC.

public void testTwoAC() throws Exception {
    // Load the dictionary: the first whitespace-separated token of each line is the keyword.
    TreeMap<String, String> map = new TreeMap<String, String>();
    IOUtil.LineIterator iterator = new IOUtil.LineIterator("data/dictionary/CoreNatureDictionary.mini.txt");
    while (iterator.hasNext()) {
        String line = iterator.next().split("\\s")[0];
        map.put(line, line);
    }
    // Build both implementations from the same keyword set.
    Trie trie = new Trie();
    trie.addAllKeyword(map.keySet());
    AhoCorasickDoubleArrayTrie<String> act = new AhoCorasickDoubleArrayTrie<String>();
    act.build(map);
    // Parse every keyword with both implementations and compare the matches.
    for (String key : map.keySet()) {
        Collection<Emit> emits = trie.parseText(key);
        Set<String> otherSet = new HashSet<String>();
        for (Emit emit : emits) {
            otherSet.add(emit.getKeyword() + emit.getEnd());
        }
        List<AhoCorasickDoubleArrayTrie<String>.Hit<String>> entries = act.parseText(key);
        Set<String> mySet = new HashSet<String>();
        for (AhoCorasickDoubleArrayTrie<String>.Hit<String> entry : entries) {
            // entry.end is decremented so that it lines up with emit.getEnd().
            mySet.add(entry.value + (entry.end - 1));
        }
        assertEquals(otherSet, mySet);
    }
}
Also used: Emit (com.hankcs.hanlp.algoritm.ahocorasick.trie.Emit), Trie (com.hankcs.hanlp.algoritm.ahocorasick.trie.Trie), AhoCorasickDoubleArrayTrie (com.hankcs.hanlp.collection.AhoCorasick.AhoCorasickDoubleArrayTrie), DoubleArrayTrie (com.hankcs.hanlp.collection.trie.DoubleArrayTrie), IOUtil (com.hankcs.hanlp.corpus.io.IOUtil)
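
Example 1 compares emit.getEnd() with entry.end - 1, which suggests that one end index is inclusive and the other exclusive. The following is a minimal sketch of that bookkeeping and does not use HanLP: the class EndIndexSketch and the method inclusiveKeys are hypothetical names, and a brute-force String.indexOf scan stands in for the real matchers. It builds the same "keyword + inclusive end" keys that the test collects into its sets.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class EndIndexSketch {
    // Hypothetical helper (not part of HanLP): brute-force scan that records
    // every occurrence of every keyword as keyword + inclusive end index.
    public static Set<String> inclusiveKeys(List<String> keywords, String text) {
        Set<String> keys = new TreeSet<String>();
        for (String keyword : keywords) {
            if (keyword.isEmpty()) {
                continue; // an empty keyword would match at every position
            }
            int from = text.indexOf(keyword);
            while (from >= 0) {
                int exclusiveEnd = from + keyword.length(); // one past the last matched char
                keys.add(keyword + (exclusiveEnd - 1));     // convert to an inclusive end
                from = text.indexOf(keyword, from + 1);     // allow overlapping occurrences
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // prints [he3, hers5, she3]
        System.out.println(inclusiveKeys(Arrays.asList("hers", "his", "she", "he"), "ushers"));
    }
}
```

A sorted set is used here only to make the printed order stable; the test itself relies on set equality, so ordering does not matter there.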

Example 2 with Emit

Use of com.hankcs.hanlp.algoritm.ahocorasick.trie.Emit in project HanLP by hankcs.

From the class AhoCorasickDoubleArrayTrieTest, method testAC.

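public void testAC() throws Exception {
    // Register four keywords that share prefixes and suffixes.
    Trie trie = new Trie();
    trie.addKeyword("hers");
    trie.addKeyword("his");
    trie.addKeyword("she");
    trie.addKeyword("he");
    // "ushers" contains "she", "he" and "hers" as overlapping substrings; "his" does not occur.
    Collection<Emit> emits = trie.parseText("ushers");
    System.out.println(emits);
}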
Also used: Emit (com.hankcs.hanlp.algoritm.ahocorasick.trie.Emit), Trie (com.hankcs.hanlp.algoritm.ahocorasick.trie.Trie), AhoCorasickDoubleArrayTrie (com.hankcs.hanlp.collection.AhoCorasick.AhoCorasickDoubleArrayTrie), DoubleArrayTrie (com.hankcs.hanlp.collection.trie.DoubleArrayTrie)
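
To make the overlapping matches in Example 2 easier to follow, here is a small self-contained sketch of an Aho-Corasick automaton with failure links. It is not HanLP's Trie implementation: the class MiniAhoCorasick, its methods, and the "start:end=keyword" output format (with an inclusive end) are assumptions chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Illustrative sketch only; names and output format are hypothetical.
public class MiniAhoCorasick {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<Character, Node>();
        final List<String> outputs = new ArrayList<String>();
        Node fail;
    }

    private final Node root = new Node();

    // Insert a keyword into the trie, one character per edge.
    public void addKeyword(String keyword) {
        if (keyword.isEmpty()) {
            return;
        }
        Node node = root;
        for (char c : keyword.toCharArray()) {
            Node child = node.next.get(c);
            if (child == null) {
                child = new Node();
                node.next.put(c, child);
            }
            node = child;
        }
        node.outputs.add(keyword);
    }

    // Call once after all keywords are added: a breadth-first pass sets each
    // node's failure link and copies the outputs reachable through it.
    public void build() {
        Queue<Node> queue = new ArrayDeque<Node>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            for (Map.Entry<Character, Node> e : node.next.entrySet()) {
                Node child = e.getValue();
                Node f = node.fail;
                while (f != root && !f.next.containsKey(e.getKey())) {
                    f = f.fail;
                }
                Node target = f.next.get(e.getKey());
                child.fail = (target != null) ? target : root;
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    // Single left-to-right scan; build() must have been called first.
    // Each hit is reported as "start:end=keyword" with an inclusive end.
    public List<String> parseText(String text) {
        List<String> hits = new ArrayList<String>();
        Node node = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (node != root && !node.next.containsKey(c)) {
                node = node.fail; // fall back to the longest proper suffix in the trie
            }
            Node next = node.next.get(c);
            node = (next != null) ? next : root;
            for (String keyword : node.outputs) {
                hits.add((i - keyword.length() + 1) + ":" + i + "=" + keyword);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        MiniAhoCorasick ac = new MiniAhoCorasick();
        ac.addKeyword("hers");
        ac.addKeyword("his");
        ac.addKeyword("she");
        ac.addKeyword("he");
        ac.build();
        // prints [1:3=she, 2:3=he, 2:5=hers]
        System.out.println(ac.parseText("ushers"));
    }
}
```

The failure link from the "she" node to the "he" node is what lets the scan report "he" and then continue into "hers" without re-reading any input character.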

Aggregations

Emit (com.hankcs.hanlp.algoritm.ahocorasick.trie.Emit): 2
Trie (com.hankcs.hanlp.algoritm.ahocorasick.trie.Trie): 2
AhoCorasickDoubleArrayTrie (com.hankcs.hanlp.collection.AhoCorasick.AhoCorasickDoubleArrayTrie): 2
DoubleArrayTrie (com.hankcs.hanlp.collection.trie.DoubleArrayTrie): 2
IOUtil (com.hankcs.hanlp.corpus.io.IOUtil): 1