Example 1 with ESDriver

use of org.apache.sdap.mudrod.driver.ESDriver in project incubator-sdap-mudrod by apache.

From class CrawlerDetection, method checkByRateInParallel:

void checkByRateInParallel() throws InterruptedException, IOException {
    JavaRDD<String> userRDD = getUserRDD(this.httpType);
    LOG.info("Original User count: {}", userRDD.count());
    // One ESDriver per partition, reused for every element in that partition.
    int userCount = userRDD.mapPartitions((FlatMapFunction<Iterator<String>, Integer>) iterator -> {
        ESDriver tmpES = new ESDriver(props);
        tmpES.createBulkProcessor();
        List<Integer> realUserNums = new ArrayList<>();
        try {
            while (iterator.hasNext()) {
                String s = iterator.next();
                realUserNums.add(checkByRate(tmpES, s));
            }
        } finally {
            // Release the bulk processor and client even if checkByRate throws.
            tmpES.destroyBulkProcessor();
            tmpES.close();
        }
        return realUserNums.iterator();
    }).reduce((Function2<Integer, Integer, Integer>) (a, b) -> a + b);
    LOG.info("User count: {}", userCount);
}
Also used : java.util(java.util) Function2(org.apache.spark.api.java.function.Function2) AggregationBuilder(org.elasticsearch.search.aggregations.AggregationBuilder) MudrodConstants(org.apache.sdap.mudrod.main.MudrodConstants) DiscoveryStepAbstract(org.apache.sdap.mudrod.discoveryengine.DiscoveryStepAbstract) Histogram(org.elasticsearch.search.aggregations.bucket.histogram.Histogram) LoggerFactory(org.slf4j.LoggerFactory) QueryBuilders(org.elasticsearch.index.query.QueryBuilders) IndexRequest(org.elasticsearch.action.index.IndexRequest) Matcher(java.util.regex.Matcher) Seconds(org.joda.time.Seconds) TimeValue(org.elasticsearch.common.unit.TimeValue) SearchResponse(org.elasticsearch.action.search.SearchResponse) JavaRDD(org.apache.spark.api.java.JavaRDD) FlatMapFunction(org.apache.spark.api.java.function.FlatMapFunction) DateHistogramInterval(org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval) SearchHit(org.elasticsearch.search.SearchHit) ISODateTimeFormat(org.joda.time.format.ISODateTimeFormat) Logger(org.slf4j.Logger) DateTimeFormatter(org.joda.time.format.DateTimeFormatter) Terms(org.elasticsearch.search.aggregations.bucket.terms.Terms) DateTime(org.joda.time.DateTime) AggregationBuilders(org.elasticsearch.search.aggregations.AggregationBuilders) IOException(java.io.IOException) ESDriver(org.apache.sdap.mudrod.driver.ESDriver) SparkDriver(org.apache.sdap.mudrod.driver.SparkDriver) Pattern(java.util.regex.Pattern) BoolQueryBuilder(org.elasticsearch.index.query.BoolQueryBuilder) Order(org.elasticsearch.search.aggregations.bucket.histogram.Histogram.Order)
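
The per-partition pattern in this example (open one client, process every element, then release the client) can be sketched without Spark or Elasticsearch. This is a minimal sketch under assumptions: FakeClient, scoreUser, processPartition, and countAcrossPartitions are hypothetical stand-ins and not part of Mudrod, and try-with-resources is used so that the client is closed even when processing throws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PartitionSketch {

    // Hypothetical stand-in for a per-partition client such as ESDriver.
    static class FakeClient implements AutoCloseable {
        static int opened = 0;
        static int closed = 0;

        FakeClient() {
            opened++;
        }

        // Hypothetical per-record check: returns 1 for a non-empty user, else 0.
        int scoreUser(String user) {
            return user.isEmpty() ? 0 : 1;
        }

        @Override
        public void close() {
            closed++;
        }
    }

    // Processes one partition with a single client, closing it in all cases.
    static Iterator<Integer> processPartition(Iterator<String> partition) {
        List<Integer> results = new ArrayList<>();
        try (FakeClient client = new FakeClient()) {
            while (partition.hasNext()) {
                results.add(client.scoreUser(partition.next()));
            }
        }
        return results.iterator();
    }

    // Sums the per-record results of every partition, like the reduce step.
    static int countAcrossPartitions(List<List<String>> partitions) {
        int total = 0;
        for (List<String> partition : partitions) {
            Iterator<Integer> it = processPartition(partition.iterator());
            while (it.hasNext()) {
                total += it.next();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
            Arrays.asList("alice", "bob"),
            Arrays.asList("", "carol"));
        System.out.println(countAcrossPartitions(partitions));
    }
}
```

In this sketch each partition opens exactly one client, so the number of clients scales with the number of partitions rather than with the number of records.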

Example 2 with ESDriver

use of org.apache.sdap.mudrod.driver.ESDriver in project incubator-sdap-mudrod by apache.

From class SessionGenerator, method combineShortSessionsInParallel:

public void combineShortSessionsInParallel(int timeThres) throws InterruptedException, IOException {
    JavaRDD<String> userRDD = getUserRDD(this.cleanupType);
    userRDD.foreachPartition(new VoidFunction<Iterator<String>>() {

        private static final long serialVersionUID = 1L;

        @Override
        public void call(Iterator<String> arg0) throws Exception {
            // One ESDriver per partition, reused for every element in that partition.
            ESDriver tmpES = new ESDriver(props);
            tmpES.createBulkProcessor();
            try {
                while (arg0.hasNext()) {
                    String s = arg0.next();
                    combineShortSessions(tmpES, s, timeThres);
                }
            } finally {
                // Release the bulk processor and client even if a call throws.
                tmpES.destroyBulkProcessor();
                tmpES.close();
            }
        }
    });
}
Also used : ESDriver(org.apache.sdap.mudrod.driver.ESDriver) ElasticsearchException(org.elasticsearch.ElasticsearchException) IOException(java.io.IOException)

Example 3 with ESDriver

use of org.apache.sdap.mudrod.driver.ESDriver in project incubator-sdap-mudrod by apache.

From class SessionGenerator, method genSessionByRefererInParallel:

public void genSessionByRefererInParallel(int timeThres) throws InterruptedException, IOException {
    JavaRDD<String> userRDD = getUserRDD(this.cleanupType);
    int sessionCount = userRDD.mapPartitions(new FlatMapFunction<Iterator<String>, Integer>() {

        private static final long serialVersionUID = 1L;

        @Override
        public Iterator<Integer> call(Iterator<String> arg0) throws Exception {
            // One ESDriver per partition, reused for every element in that partition.
            ESDriver tmpES = new ESDriver(props);
            tmpES.createBulkProcessor();
            List<Integer> sessionNums = new ArrayList<>();
            try {
                while (arg0.hasNext()) {
                    String s = arg0.next();
                    sessionNums.add(genSessionByReferer(tmpES, s, timeThres));
                }
            } finally {
                // Release the bulk processor and client even if a call throws.
                tmpES.destroyBulkProcessor();
                tmpES.close();
            }
            return sessionNums.iterator();
        }
    }).reduce(new Function2<Integer, Integer, Integer>() {

        private static final long serialVersionUID = 1L;

        @Override
        public Integer call(Integer a, Integer b) {
            // Sum the per-element session counts.
            return a + b;
        }
    });
    LOG.info("Initial Session count: {}", sessionCount);
}
Also used : ESDriver(org.apache.sdap.mudrod.driver.ESDriver) Function2(org.apache.spark.api.java.function.Function2) ElasticsearchException(org.elasticsearch.ElasticsearchException) IOException(java.io.IOException)
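
Examples 1 and 3 express the same summing reduce step in two forms, a lambda and an anonymous class. The following is a minimal sketch of that equivalence, assuming java.util.function.BinaryOperator as a stand-in for Spark's Function2; the class ReduceForms and the method total are hypothetical names. The sketch folds from an explicit zero, so an empty list yields 0.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class ReduceForms {

    // Anonymous-class form, mirroring the style used in Example 3.
    static final BinaryOperator<Integer> ANONYMOUS = new BinaryOperator<Integer>() {
        @Override
        public Integer apply(Integer a, Integer b) {
            return a + b;
        }
    };

    // Lambda form, mirroring the style used in Example 1.
    static final BinaryOperator<Integer> LAMBDA = (a, b) -> a + b;

    // Folds a list of counts with the given operator, starting from zero.
    static int total(List<Integer> counts, BinaryOperator<Integer> op) {
        return counts.stream().reduce(0, op);
    }

    public static void main(String[] args) {
        List<Integer> counts = Arrays.asList(2, 0, 5);
        System.out.println(total(counts, ANONYMOUS));
        System.out.println(total(counts, LAMBDA));
    }
}
```

Both operators produce the same total for the same input; the choice between them is one of style, not behavior.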

Example 4 with ESDriver

use of org.apache.sdap.mudrod.driver.ESDriver in project incubator-sdap-mudrod by apache.

From class SessionExtractor, method extractRankingTrainDataInParallel:

protected JavaRDD<RankingTrainData> extractRankingTrainDataInParallel(Properties props, SparkDriver spark, ESDriver es) {
    List<String> logIndexList = es.getIndexListWithPrefix(props.getProperty(MudrodConstants.LOG_INDEX));
    LOG.info(logIndexList.toString());
    // Collect the session records of every log index on the driver side.
    List<String> sessionIdList = new ArrayList<>();
    for (String logIndex : logIndexList) {
        List<String> tmpsessionList = this.getSessions(props, es, logIndex);
        sessionIdList.addAll(tmpsessionList);
    }
    // Distribute the session records over 16 partitions.
    JavaRDD<String> sessionRDD = spark.sc.parallelize(sessionIdList, 16);
    JavaRDD<RankingTrainData> clickStreamRDD = sessionRDD.mapPartitions(new FlatMapFunction<Iterator<String>, RankingTrainData>() {

        private static final long serialVersionUID = 1L;

        @Override
        public Iterator<RankingTrainData> call(Iterator<String> arg0) throws Exception {
            // One ESDriver per partition, reused for every element in that partition.
            ESDriver tmpES = new ESDriver(props);
            tmpES.createBulkProcessor();
            List<RankingTrainData> clickstreams = new ArrayList<>();
            try {
                Session session = new Session(props, tmpES);
                while (arg0.hasNext()) {
                    String s = arg0.next();
                    // Each record is comma-separated with at least three fields.
                    String[] sArr = s.split(",");
                    List<RankingTrainData> clicks = session.getRankingTrainData(sArr[1], sArr[2], sArr[0]);
                    clickstreams.addAll(clicks);
                }
            } finally {
                // Release the bulk processor and client even if a call throws.
                tmpES.destroyBulkProcessor();
                tmpES.close();
            }
            return clickstreams.iterator();
        }
    });
    LOG.info("Clickstream number: {}", clickStreamRDD.count());
    return clickStreamRDD;
}
Also used : ESDriver(org.apache.sdap.mudrod.driver.ESDriver) ArrayList(java.util.ArrayList) Iterator(java.util.Iterator) List(java.util.List)
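
The loop above splits each record on commas and indexes three fields directly, so a record with fewer than three fields would raise an ArrayIndexOutOfBoundsException. A minimal sketch of a guarded split follows. RecordFields and parse are hypothetical names, and the meaning of the three fields is not stated in the excerpt, so they are left unnamed; the sketch is stricter than the excerpt in that it requires exactly the expected field count.

```java
import java.util.Optional;

public class RecordFields {

    // Splits a comma-separated record and returns its fields only when
    // exactly the expected number is present; otherwise returns empty.
    static Optional<String[]> parse(String record, int expectedFields) {
        if (record == null) {
            return Optional.empty();
        }
        // The -1 limit keeps trailing empty fields, so "a,b," has three fields.
        String[] fields = record.split(",", -1);
        if (fields.length != expectedFields) {
            return Optional.empty();
        }
        return Optional.of(fields);
    }

    public static void main(String[] args) {
        System.out.println(parse("s1,index,type", 3).isPresent());
        System.out.println(parse("s1,index", 3).isPresent());
    }
}
```

A caller could skip or log records for which parse returns empty instead of letting one malformed record fail the whole partition.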

Example 5 with ESDriver

use of org.apache.sdap.mudrod.driver.ESDriver in project incubator-sdap-mudrod by apache.

From class MudrodContextListener, method contextInitialized:

/**
 * @see ServletContextListener#contextInitialized(ServletContextEvent)
 */
@Override
public void contextInitialized(ServletContextEvent arg0) {
    me = new MudrodEngine();
    Properties props = me.loadConfig();
    me.setESDriver(new ESDriver(props));
    me.setSparkDriver(new SparkDriver(props));
    ServletContext ctx = arg0.getServletContext();
    Searcher searcher = new Searcher(props, me.getESDriver(), null);
    Ranker ranker = new Ranker(props, me.getESDriver(), me.getSparkDriver());
    ctx.setAttribute("MudrodInstance", me);
    ctx.setAttribute("MudrodSearcher", searcher);
    ctx.setAttribute("MudrodRanker", ranker);
}
Also used : MudrodEngine(org.apache.sdap.mudrod.main.MudrodEngine) ESDriver(org.apache.sdap.mudrod.driver.ESDriver) SparkDriver(org.apache.sdap.mudrod.driver.SparkDriver) Searcher(org.apache.sdap.mudrod.ssearch.Searcher) ServletContext(javax.servlet.ServletContext) Properties(java.util.Properties) Ranker(org.apache.sdap.mudrod.ssearch.Ranker)

Aggregations

ESDriver (org.apache.sdap.mudrod.driver.ESDriver) 9
IOException (java.io.IOException) 5
SparkDriver (org.apache.sdap.mudrod.driver.SparkDriver) 3
Function2 (org.apache.spark.api.java.function.Function2) 3
ArrayList (java.util.ArrayList) 2
Iterator (java.util.Iterator) 2
List (java.util.List) 2
MudrodEngine (org.apache.sdap.mudrod.main.MudrodEngine) 2
ElasticsearchException (org.elasticsearch.ElasticsearchException) 2
JsonObject (com.google.gson.JsonObject) 1
java.util (java.util) 1
Properties (java.util.Properties) 1
ExecutionException (java.util.concurrent.ExecutionException) 1
Matcher (java.util.regex.Matcher) 1
Pattern (java.util.regex.Pattern) 1
ServletContext (javax.servlet.ServletContext) 1
CommandLine (org.apache.commons.cli.CommandLine) 1
CommandLineParser (org.apache.commons.cli.CommandLineParser) 1
GnuParser (org.apache.commons.cli.GnuParser) 1
HelpFormatter (org.apache.commons.cli.HelpFormatter) 1