Example 11 with CoNLLWord

Use of com.hankcs.hanlp.corpus.dependency.CoNll.CoNLLWord in project HanLP by hankcs.

Source: class MaxEntDependencyModelMaker, method makeModel.

public static boolean makeModel(String corpusLoadPath, String modelSavePath) throws IOException {
    LinkedList<CoNLLSentence> sentenceList = CoNLLLoader.loadSentenceList(corpusLoadPath);
    // try-with-resources closes the writer even if an exception is thrown mid-write
    try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(IOUtil.newOutputStream(modelSavePath)))) {
        int id = 1;
        for (CoNLLSentence sentence : sentenceList) {
            System.out.printf("%d / %d...", id++, sentenceList.size());
            String[][] edgeArray = sentence.getEdgeArray();
            CoNLLWord[] word = sentence.getWordArrayWithRoot();
            for (int i = 0; i < word.length; ++i) {
                for (int j = 0; j < word.length; ++j) {
                    if (i == j)
                        continue;
                    // Each ordered pair is one edge instance, from i to j. The edge may or may not exist;
                    // a missing edge is labeled null and still counts as an instance.
                    List<String> contextList = new LinkedList<String>();
                    // First, generate the atomic features of i and j
                    contextList.addAll(generateSingleWordContext(word, i, "i"));
                    contextList.addAll(generateSingleWordContext(word, j, "j"));
                    // Then, generate the features of the (i, j) pair
                    contextList.addAll(generateUniContext(word, i, j));
                    // Serialize the features, space-separated
                    for (String f : contextList) {
                        bw.write(f);
                        bw.write(' ');
                    }
                    // The event (outcome) name is the dependency relation; "null" when there is no edge
                    bw.write(String.valueOf(edgeArray[i][j]));
                    bw.newLine();
                }
            }
            System.out.println("done.");
        }
    }
    return true;
}
Also used: CoNLLWord (com.hankcs.hanlp.corpus.dependency.CoNll.CoNLLWord), CoNLLSentence (com.hankcs.hanlp.corpus.dependency.CoNll.CoNLLSentence)
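The pair enumeration in Example 11 can be illustrated without HanLP. The following is a minimal, self-contained sketch, not HanLP's code: words are plain strings with index 0 as the virtual root, the three feature strings are hypothetical stand-ins for generateSingleWordContext and generateUniContext, and we assume that pair (i, j) carries a real label exactly when j is the head of i (heads[i] == j), with "null" otherwise.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of "every ordered pair (i, j), i != j, is one training event".
 * Index 0 is the virtual root; heads[k] is the head index of word k
 * (heads[0] is unused). Feature templates here are hypothetical.
 */
public class PairwiseEventSketch {

    /** Returns one line per ordered pair: space-separated features, then the outcome label. */
    public static List<String> makeEvents(String[] words, int[] heads, String[] deprels) {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < words.length; ++i) {
            for (int j = 0; j < words.length; ++j) {
                if (i == j) continue; // a word is never paired with itself
                List<String> features = new ArrayList<>();
                features.add("i=" + words[i]);                   // hypothetical atomic feature of i
                features.add("j=" + words[j]);                   // hypothetical atomic feature of j
                features.add("ij=" + words[i] + "_" + words[j]); // hypothetical pair feature
                // Assumption: the pair is labeled with i's relation only if j is i's head.
                String label = (i != 0 && heads[i] == j) ? deprels[i] : "null";
                events.add(String.join(" ", features) + " " + label);
            }
        }
        return events;
    }

    public static void main(String[] args) {
        String[] words = {"<ROOT>", "I", "run"};
        int[] heads = {-1, 2, 0};               // I <- run, run <- ROOT
        String[] deprels = {"", "SBJ", "HED"};
        makeEvents(words, heads, deprels).forEach(System.out::println);
    }
}
```

For a sentence of n words plus the root this produces (n + 1) × n events, of which only n carry a real relation, so the "null" outcome dominates the training data; in the three-token run above, 4 of the 6 events are "null".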

Example 12 with CoNLLWord

Use of com.hankcs.hanlp.corpus.dependency.CoNll.CoNLLWord in project HanLP by hankcs.

Source: class WordNatureWeightModelMaker, method makeModel.

public static boolean makeModel(String corpusLoadPath, String modelSavePath) {
    Set<String> posSet = new TreeSet<String>();
    DictionaryMaker dictionaryMaker = new DictionaryMaker();
    for (CoNLLSentence sentence : CoNLLLoader.loadSentenceList(corpusLoadPath)) {
        for (CoNLLWord word : sentence.word) {
            // Record each arc (dependent -> head) four ways: word/word, word/tag, tag/word, tag/tag
            addPair(word.NAME, word.HEAD.NAME, word.DEPREL, dictionaryMaker);
            addPair(word.NAME, wrapTag(word.HEAD.POSTAG), word.DEPREL, dictionaryMaker);
            addPair(wrapTag(word.POSTAG), word.HEAD.NAME, word.DEPREL, dictionaryMaker);
            addPair(wrapTag(word.POSTAG), wrapTag(word.HEAD.POSTAG), word.DEPREL, dictionaryMaker);
            posSet.add(word.POSTAG);
        }
    }
    // Dump the sorted set of dependent POS tags as Java switch-case labels
    StringBuilder sb = new StringBuilder();
    for (String pos : posSet) {
        sb.append("case \"").append(pos).append("\":\n");
    }
    IOUtil.saveTxt("data/model/dependency/pos-thu.txt", sb.toString());
    return dictionaryMaker.saveTxtTo(modelSavePath);
}
Also used: TreeSet (java.util.TreeSet), CoNLLWord (com.hankcs.hanlp.corpus.dependency.CoNll.CoNLLWord), CoNLLSentence (com.hankcs.hanlp.corpus.dependency.CoNll.CoNLLSentence), DictionaryMaker (com.hankcs.hanlp.corpus.dictionary.DictionaryMaker)
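Example 12 records every arc four times, backing off the dependent and the head independently from word to POS tag, and then dumps the sorted tag set as switch labels. The sketch below mimics both steps without HanLP: a nested TreeMap stands in for DictionaryMaker, and the "from@to" key, the "<TAG>" wrapping, and the helper names addArc, count, and caseLabels are hypothetical, not HanLP's confirmed formats or API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Sketch of word/tag back-off pair counting. The nested map replaces
 * HanLP's DictionaryMaker; key and tag formats are assumptions.
 */
public class BackoffPairSketch {

    // Hypothetical key "from@to" -> (relation label -> count)
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();
    // Sorted set of dependent POS tags, like posSet in makeModel
    private final Set<String> posSet = new TreeSet<>();

    // Hypothetical stand-in for wrapTag: marks a string as a tag rather than a word
    static String wrapTag(String tag) {
        return "<" + tag + ">";
    }

    // Hypothetical stand-in for addPair: increments the count of label for (from, to)
    private void addPair(String from, String to, String label) {
        counts.computeIfAbsent(from + "@" + to, k -> new TreeMap<>())
              .merge(label, 1, Integer::sum);
    }

    /** Records one arc: dependent (word, tag) attached to head (headWord, headTag). */
    public void addArc(String word, String tag, String headWord, String headTag, String deprel) {
        addPair(word, headWord, deprel);
        addPair(word, wrapTag(headTag), deprel);
        addPair(wrapTag(tag), headWord, deprel);
        addPair(wrapTag(tag), wrapTag(headTag), deprel);
        posSet.add(tag); // only the dependent's tag, as in the original loop
    }

    /** Returns how often label was seen for the (from, to) pair, 0 if never. */
    public int count(String from, String to, String label) {
        return counts.getOrDefault(from + "@" + to, Map.of()).getOrDefault(label, 0);
    }

    /** Emits one switch label per distinct tag, in sorted order. */
    public String caseLabels() {
        StringBuilder sb = new StringBuilder();
        for (String pos : posSet) {
            sb.append("case \"").append(pos).append("\":\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BackoffPairSketch s = new BackoffPairSketch();
        s.addArc("I", "r", "run", "v", "SBJ");
        s.addArc("you", "r", "run", "v", "SBJ");
        s.addArc("run", "v", "ROOT", "ROOT", "HED"); // hypothetical root token and tag
        System.out.println(s.count("<r>", "run", "SBJ")); // prints 2
        System.out.print(s.caseLabels());
    }
}
```

The tag-level keys are shared across words, so a word pair never seen in training can still be scored from its tags: here the arcs I -> run and you -> run are distinct at the word level but both increment the <r>@run entry.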

Aggregations

CoNLLWord (com.hankcs.hanlp.corpus.dependency.CoNll.CoNLLWord): 12
CoNLLSentence (com.hankcs.hanlp.corpus.dependency.CoNll.CoNLLSentence): 10
Term (com.hankcs.hanlp.seg.common.Term): 4
DictionaryMaker (com.hankcs.hanlp.corpus.dictionary.DictionaryMaker): 2
Evaluator (com.hankcs.hanlp.corpus.dependency.CoNll.Evaluator): 1
Item (com.hankcs.hanlp.corpus.dictionary.item.Item): 1
Edge (com.hankcs.hanlp.dependency.common.Edge): 1
Node (com.hankcs.hanlp.dependency.common.Node): 1
State (com.hankcs.hanlp.dependency.common.State): 1
Table (com.hankcs.hanlp.model.crf.Table): 1
BufferedWriter (java.io.BufferedWriter): 1
FileOutputStream (java.io.FileOutputStream): 1
OutputStreamWriter (java.io.OutputStreamWriter): 1
ArrayList (java.util.ArrayList): 1
LinkedList (java.util.LinkedList): 1
PriorityQueue (java.util.PriorityQueue): 1
TreeSet (java.util.TreeSet): 1