Example 11 with PratilipiFilter

Use of com.pratilipi.common.util.PratilipiFilter in project pratilipi by Pratilipi.

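From class PratilipiIdfApi, method get. The endpoint fetches the pratilipi IDs matching a default PratilipiFilter, counts how often each whitespace-separated keyword occurs across them, sorts the keywords by descending frequency, and saves the result as a timestamped CSV blob under pratilipi/.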

@Get
public GenericResponse get(GenericRequest request) throws UnexpectedServerException {
    Date idfGenerationDate = new Date();
    DataAccessor dataAccessor = DataAccessorFactory.getDataAccessor();
    PratilipiFilter pratilipiFilter = new PratilipiFilter();
    String cursor = null;
    DataListCursorTuple<Long> pratilipiIdListCursorTuple = dataAccessor.getPratilipiIdList(pratilipiFilter, cursor, null, null);
    List<Long> pratilipiIdList = pratilipiIdListCursorTuple.getDataList();
    // Populate Keyword-Frequency map.
    final HashMap<String, Integer> keywordFrequencyMap = new HashMap<>();
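    for (Long pratilipiId : pratilipiIdList) {
        String keywordString = PratilipiDataUtil.getPratilipiKeywords(pratilipiId);
        // Check for null before splitting; split() itself never returns null.
        if (keywordString == null)
            continue;
        for (String keyword : keywordString.trim().split("\\s+")) {
            // Skip the single empty token that split() yields for an empty string.
            if (keyword.isEmpty())
                continue;
            if (keywordFrequencyMap.containsKey(keyword))
                keywordFrequencyMap.put(keyword, keywordFrequencyMap.get(keyword) + 1);
            else
                keywordFrequencyMap.put(keyword, 1);
        }
    }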
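    // Sort Keyword-Frequency map in descending order of frequency. Ties are
    // broken alphabetically so compare() returns 0 only for equal keys, as
    // TreeMap requires (otherwise get() and containsKey() never find a key).
    Comparator<String> comparator = new Comparator<String>() {

        @Override
        public int compare(String a, String b) {
            int byFrequency = Integer.compare(keywordFrequencyMap.get(b), keywordFrequencyMap.get(a));
            return byFrequency != 0 ? byFrequency : a.compareTo(b);
        }
    };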
    TreeMap<String, Integer> sortedKeywordFrequencyMap = new TreeMap<>(comparator);
    sortedKeywordFrequencyMap.putAll(keywordFrequencyMap);
    // Transform sorted map to CSV: one "keyword,frequency" line per entry
    StringBuilder csv = new StringBuilder();
    for (Map.Entry<String, Integer> entry : sortedKeywordFrequencyMap.entrySet()) {
        csv.append(entry.getKey()).append(',');
        csv.append(entry.getValue()).append('\n');
    }
    // Persist csv string in BlobStore
    BlobAccessor blobAccessor = DataAccessorFactory.getBlobAccessor();
    BlobEntry blobEntry = blobAccessor.newBlob("pratilipi/" + new SimpleDateFormat("yyyy-MM-dd-HH:mm").format(idfGenerationDate) + "-idf.csv", null, "text/plain");
    blobEntry.setData(csv.toString().getBytes(StandardCharsets.UTF_8));
    blobAccessor.createOrUpdateBlob(blobEntry);
    // logger is a field of the enclosing class (not shown in this snippet)
    logger.log(Level.INFO, "Generated IDF with " + keywordFrequencyMap.size() + " keywords.");
    return new GenericResponse();
}
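Because get() depends on Pratilipi's data and blob accessors, its counting, sorting, and CSV steps are hard to try in isolation. What follows is a minimal standalone sketch of just those three steps, assuming each document's keywords arrive as one whitespace-separated string (a stand-in for PratilipiDataUtil.getPratilipiKeywords). The class KeywordFrequencyCsv and its methods are hypothetical names, not part of the pratilipi project, and the sketch uses Java 8 features (lambdas, Map.merge).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical standalone sketch of the count -> sort -> CSV steps of get().
// Each input string stands in for one getPratilipiKeywords(id) result.
final class KeywordFrequencyCsv {

    private KeywordFrequencyCsv() {}

    // Counts every whitespace-separated keyword across all input strings.
    static Map<String, Integer> countKeywords(List<String> keywordStrings) {
        Map<String, Integer> frequency = new HashMap<>();
        for (String keywords : keywordStrings) {
            if (keywords == null) {
                continue; // no keywords for this document
            }
            for (String keyword : keywords.trim().split("\\s+")) {
                if (keyword.isEmpty()) {
                    continue; // "".split("\\s+") yields one empty token
                }
                frequency.merge(keyword, 1, Integer::sum);
            }
        }
        return frequency;
    }

    // Orders keywords by descending frequency; ties fall back to alphabetical
    // order, so compare() returns 0 only for equal keys.
    static TreeMap<String, Integer> sortByFrequency(Map<String, Integer> frequency) {
        Comparator<String> byFrequencyDesc = (a, b) -> {
            int cmp = Integer.compare(frequency.get(b), frequency.get(a));
            return cmp != 0 ? cmp : a.compareTo(b);
        };
        TreeMap<String, Integer> sorted = new TreeMap<>(byFrequencyDesc);
        sorted.putAll(frequency);
        return sorted;
    }

    // Emits one "keyword,count" line per entry, in the map's iteration order.
    static String toCsv(Map<String, Integer> sorted) {
        StringBuilder csv = new StringBuilder();
        for (Map.Entry<String, Integer> entry : sorted.entrySet()) {
            csv.append(entry.getKey()).append(',').append(entry.getValue()).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("love story", "story  of love", null, "  horror story");
        // Prints story,3 / love,2 / horror,1 / of,1 on separate lines.
        System.out.print(toCsv(sortByFrequency(countKeywords(docs))));
    }
}
```

Tie-breaking on the keyword keeps the comparator consistent with equals. A comparator that never returns 0, such as one of the form `freq(a) >= freq(b) ? -1 : 1`, still produces a descending iteration order, but `TreeMap.get` and `containsKey` then return null/false for every key, because lookups stop only when compare() returns 0.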
Also used: HashMap (java.util.HashMap), GenericResponse (com.pratilipi.api.shared.GenericResponse), DataAccessor (com.pratilipi.data.DataAccessor), BlobEntry (com.pratilipi.data.type.BlobEntry), TreeMap (java.util.TreeMap), Date (java.util.Date), Comparator (java.util.Comparator), PratilipiFilter (com.pratilipi.common.util.PratilipiFilter), BlobAccessor (com.pratilipi.data.BlobAccessor), Map (java.util.Map), SimpleDateFormat (java.text.SimpleDateFormat), Get (com.pratilipi.api.annotation.Get)
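The get method in PratilipiIdfApi names its output an IDF file, but it stores raw keyword occurrence counts: a keyword repeated inside one pratilipi is counted each time, and no logarithm is applied. For comparison, the conventional inverse document frequency counts each term at most once per document and uses idf(t) = ln(N / df(t)), where N is the number of documents and df(t) the number of documents containing t. This is a hedged sketch of that textbook definition under the same one-string-per-document assumption; InverseDocumentFrequency is a hypothetical class, and whether consumers of the project's CSV apply a logarithm later is not visible in the source.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: conventional inverse document frequency,
// idf(t) = ln(N / df(t)), with N documents and df(t) = documents containing t.
final class InverseDocumentFrequency {

    private InverseDocumentFrequency() {}

    // Counts, for each keyword, how many documents contain it at least once.
    static Map<String, Integer> documentFrequency(List<String> keywordStrings) {
        Map<String, Integer> df = new HashMap<>();
        for (String keywords : keywordStrings) {
            if (keywords == null) {
                continue; // a document without keywords adds to no df(t)
            }
            Set<String> distinct = new HashSet<>(); // dedupe within one document
            for (String keyword : keywords.trim().split("\\s+")) {
                if (!keyword.isEmpty()) {
                    distinct.add(keyword);
                }
            }
            for (String keyword : distinct) {
                df.merge(keyword, 1, Integer::sum);
            }
        }
        return df;
    }

    // Natural-log IDF; N counts every document, including ones without keywords.
    static Map<String, Double> idf(List<String> keywordStrings) {
        int n = keywordStrings.size();
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> entry : documentFrequency(keywordStrings).entrySet()) {
            idf.put(entry.getKey(), Math.log((double) n / entry.getValue()));
        }
        return idf;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("love story", "love love", "horror");
        // "love" occurs three times but in only two documents, so df(love) is 2.
        System.out.println("df(love) = " + documentFrequency(docs).get("love"));
        System.out.println("idf(love) = " + idf(docs).get("love"));
    }
}
```

Counting N over all documents, including those with no keywords, is one choice among several; smoothed variants such as ln((1 + N) / (1 + df(t))) + 1 avoid division by zero for unseen terms.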

Aggregations

PratilipiFilter (com.pratilipi.common.util.PratilipiFilter): 11
DataAccessor (com.pratilipi.data.DataAccessor): 7
Get (com.pratilipi.api.annotation.Get): 6
GenericResponse (com.pratilipi.api.shared.GenericResponse): 4
PratilipiData (com.pratilipi.data.client.PratilipiData): 4
Pratilipi (com.pratilipi.data.type.Pratilipi): 4
HashMap (java.util.HashMap): 4
Gson (com.google.gson.Gson): 2
JsonObject (com.google.gson.JsonObject): 2
PratilipiListV2Api (com.pratilipi.api.impl.pratilipi.PratilipiListV2Api): 2
InvalidArgumentException (com.pratilipi.common.exception.InvalidArgumentException): 2
DocAccessor (com.pratilipi.data.DocAccessor): 2
Author (com.pratilipi.data.type.Author): 2
Event (com.pratilipi.data.type.Event): 2
PratilipiContentDoc (com.pratilipi.data.type.PratilipiContentDoc): 2
ArrayList (java.util.ArrayList): 2
Date (java.util.Date): 2
LinkedList (java.util.LinkedList): 2
Language (com.pratilipi.common.type.Language): 1
PratilipiState (com.pratilipi.common.type.PratilipiState): 1