
Example 1 with ESIndex

use of edu.neu.ccs.pyramid.elasticsearch.ESIndex in project pyramid by cheng-li.

the class App1 method keywordsFilter.

/**
 * Filters n-grams by the unigram keywords listed in the external keywords file.
 * Unigram candidates are always kept; only longer n-grams are filtered.
 */
private static Set<Ngram> keywordsFilter(Config config, ESIndex index, Set<Ngram> ngrams) throws IOException {
    // Read the keyword list, one keyword per line, from the configured file.
    String externalKeywordsFile = config.getString("train.feature.filterNgrams.keyWordsFile");
    List<String> lines = FileUtils.readLines(new File(externalKeywordsFile));
    String analyzer = config.getString("train.feature.analyzer");
    // Normalize each keyword with the same analyzer that the index uses.
    Set<String> keywords = new HashSet<>();
    for (String line : lines) {
        keywords.add(index.analyze(line, analyzer).getNgram());
    }
    // Keep every unigram; keep a longer n-gram only if it contains a keyword.
    return ngrams.parallelStream()
            .filter(ngram -> ngram.getN() == 1 || containsKeyWords(ngram, keywords))
            .collect(Collectors.toSet());
}
Also used : java.util.logging(java.util.logging) java.util(java.util) BoundedBlockPriorityQueue(edu.neu.ccs.pyramid.util.BoundedBlockPriorityQueue) Multiset(com.google.common.collect.Multiset) NgramEnumerator(edu.neu.ccs.pyramid.feature_extraction.NgramEnumerator) edu.neu.ccs.pyramid.feature(edu.neu.ccs.pyramid.feature) Pair(edu.neu.ccs.pyramid.util.Pair) Config(edu.neu.ccs.pyramid.configuration.Config) FeatureLoader(edu.neu.ccs.pyramid.elasticsearch.FeatureLoader) Terms(org.elasticsearch.search.aggregations.bucket.terms.Terms) BufferedWriter(java.io.BufferedWriter) ObjectMapper(com.fasterxml.jackson.databind.ObjectMapper) FileWriter(java.io.FileWriter) FileUtils(org.apache.commons.io.FileUtils) IOException(java.io.IOException) Collectors(java.util.stream.Collectors) File(java.io.File) MultiLabelIndex(edu.neu.ccs.pyramid.elasticsearch.MultiLabelIndex) ConcurrentHashMultiset(com.google.common.collect.ConcurrentHashMultiset) ESIndex(edu.neu.ccs.pyramid.elasticsearch.ESIndex) NgramTemplate(edu.neu.ccs.pyramid.feature_extraction.NgramTemplate) Serialization(edu.neu.ccs.pyramid.util.Serialization) Paths(java.nio.file.Paths) edu.neu.ccs.pyramid.dataset(edu.neu.ccs.pyramid.dataset) StumpSelector(edu.neu.ccs.pyramid.feature_extraction.StumpSelector) Pattern(java.util.regex.Pattern)
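
The helper containsKeyWords is not shown in this excerpt, so the filtering rule can only be illustrated under assumptions. The sketch below models an n-gram as a plain list of tokens and assumes that "contains keywords" means at least one token is in the keyword set. The class KeywordFilterSketch and its methods are hypothetical stand-ins and are not part of pyramid.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in, not part of pyramid: an n-gram is modeled as a token list.
public class KeywordFilterSketch {

    // Assumed semantics of containsKeyWords: at least one token is a keyword.
    static boolean containsKeyword(List<String> ngram, Set<String> keywords) {
        return ngram.stream().anyMatch(keywords::contains);
    }

    // Unigrams always pass; longer n-grams must contain a keyword.
    static Set<List<String>> filter(Set<List<String>> ngrams, Set<String> keywords) {
        return ngrams.stream()
                .filter(ngram -> ngram.size() == 1 || containsKeyword(ngram, keywords))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Illustrative data only.
        Set<String> keywords = new HashSet<>(Arrays.asList("fever"));
        Set<List<String>> ngrams = new HashSet<>();
        ngrams.add(Arrays.asList("cough"));          // unigram: kept
        ngrams.add(Arrays.asList("high", "fever"));  // contains a keyword: kept
        ngrams.add(Arrays.asList("chest", "pain"));  // no keyword: dropped
        System.out.println(filter(ngrams, keywords));
    }
}
```

The sketch skips the analyzer step that the original method applies to each keyword through index.analyze; with a real index, keywords and n-gram tokens would need the same normalization for the match to be meaningful.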

Example 2 with ESIndex

use of edu.neu.ccs.pyramid.elasticsearch.ESIndex in project pyramid by cheng-li.

the class IndexChecker method main.

public static void main(String[] args) throws Exception {
    if (args.length != 1) {
        throw new IllegalArgumentException("Please specify a properties file.");
    }
    Config config = new Config(args[0]);
    System.out.println(config);
    ESIndex index = loadIndex(config);
    try {
        List<String> fields = config.getStrings("fieldsToCheck");
        for (String field : fields) {
            check(index, field);
        }
        for (String field : fields) {
            checkEmpty(index, field);
        }
    } finally {
        // Close the index even if one of the checks throws.
        index.close();
    }
}
Also used : Config(edu.neu.ccs.pyramid.configuration.Config) ESIndex(edu.neu.ccs.pyramid.elasticsearch.ESIndex)

Example 3 with ESIndex

use of edu.neu.ccs.pyramid.elasticsearch.ESIndex in project pyramid by cheng-li.

the class IndexChecker method loadIndex.

static ESIndex loadIndex(Config config) throws Exception {
    ESIndex.Builder builder = new ESIndex.Builder()
            .setIndexName(config.getString("index.indexName"))
            .setClusterName(config.getString("index.clusterName"))
            .setClientType(config.getString("index.clientType"))
            .setDocumentType(config.getString("index.documentType"));
    if (config.getString("index.clientType").equals("transport")) {
        String[] hosts = config.getString("index.hosts").split(Pattern.quote(","));
        String[] ports = config.getString("index.ports").split(Pattern.quote(","));
        builder.addHostsAndPorts(hosts, ports);
    }
    ESIndex index = builder.build();
    System.out.println("index loaded");
    System.out.println("there are " + index.getNumDocs() + " documents in the index.");
    return index;
}
Also used : ESIndex(edu.neu.ccs.pyramid.elasticsearch.ESIndex)
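
For the transport client, loadIndex reads index.hosts and index.ports as comma-separated lists and passes them on as parallel arrays. A minimal sketch of that parsing step, using java.util.Properties in place of pyramid's Config, might look like the following. The class HostPortSketch, the whitespace trimming, and the length check are additions for illustration and are not taken from the original code; the host names and ports are made-up values.

```java
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical helper, not part of pyramid: parses parallel host and port lists.
public class HostPortSketch {

    // Splits a comma-separated property value on the literal comma and trims each part.
    static String[] splitList(Properties props, String key) {
        String[] parts = props.getProperty(key).split(Pattern.quote(","));
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    // Pairs each host with the port at the same position; fails fast on a mismatch.
    static String[] pair(String[] hosts, String[] ports) {
        if (hosts.length != ports.length) {
            throw new IllegalArgumentException("index.hosts and index.ports must have the same length");
        }
        String[] addresses = new String[hosts.length];
        for (int i = 0; i < hosts.length; i++) {
            addresses[i] = hosts[i] + ":" + Integer.parseInt(ports[i]);
        }
        return addresses;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("index.hosts", "node1, node2"); // illustrative values
        props.setProperty("index.ports", "9300,9300");
        for (String address : pair(splitList(props, "index.hosts"), splitList(props, "index.ports"))) {
            System.out.println(address);
        }
    }
}
```

Checking the two lengths before building the client surfaces a misconfigured properties file early, instead of leaving the mismatch to whatever addHostsAndPorts does with unequal arrays.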

Aggregations

ESIndex (edu.neu.ccs.pyramid.elasticsearch.ESIndex): 3
Config (edu.neu.ccs.pyramid.configuration.Config): 2
ObjectMapper (com.fasterxml.jackson.databind.ObjectMapper): 1
ConcurrentHashMultiset (com.google.common.collect.ConcurrentHashMultiset): 1
Multiset (com.google.common.collect.Multiset): 1
edu.neu.ccs.pyramid.dataset (edu.neu.ccs.pyramid.dataset): 1
FeatureLoader (edu.neu.ccs.pyramid.elasticsearch.FeatureLoader): 1
MultiLabelIndex (edu.neu.ccs.pyramid.elasticsearch.MultiLabelIndex): 1
edu.neu.ccs.pyramid.feature (edu.neu.ccs.pyramid.feature): 1
NgramEnumerator (edu.neu.ccs.pyramid.feature_extraction.NgramEnumerator): 1
NgramTemplate (edu.neu.ccs.pyramid.feature_extraction.NgramTemplate): 1
StumpSelector (edu.neu.ccs.pyramid.feature_extraction.StumpSelector): 1
BoundedBlockPriorityQueue (edu.neu.ccs.pyramid.util.BoundedBlockPriorityQueue): 1
Pair (edu.neu.ccs.pyramid.util.Pair): 1
Serialization (edu.neu.ccs.pyramid.util.Serialization): 1
BufferedWriter (java.io.BufferedWriter): 1
File (java.io.File): 1
FileWriter (java.io.FileWriter): 1
IOException (java.io.IOException): 1
Paths (java.nio.file.Paths): 1