
Example 6 with ScoredDocuments

Use of io.anserini.rerank.ScoredDocuments in project Anserini by castorini.

From the class PyseriniEntryPoint, method search.

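/**
 * Runs each topic against the index, passes the hits through the reranker cascade,
 * and returns the resulting scores keyed by document ID.
 *
 * @param topics         queries, keyed by topic ID
 * @param similarity     scoring model to set on the searcher
 * @param numHits        maximum number of hits to retrieve per topic
 * @param cascade        rerankers applied to the initial hits
 * @param useQueryParser if true, parse with QueryParser; otherwise build a bag-of-words query
 * @param keepstopwords  if true, analyze with an empty stopword set
 * @return document ID to score; a document retrieved for several topics keeps only its last score
 * @throws IOException    if reading the index fails
 * @throws ParseException if the query parser cannot parse a topic
 */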
public Map<String, Float> search(SortedMap<Integer, String> topics, Similarity similarity, int numHits, RerankerCascade cascade, boolean useQueryParser, boolean keepstopwords) throws IOException, ParseException {
    Map<String, Float> scoredDocs = new LinkedHashMap<>();
    IndexSearcher searcher = new IndexSearcher(reader);
    searcher.setSimilarity(similarity);
    EnglishAnalyzer ea = keepstopwords ? new EnglishAnalyzer(CharArraySet.EMPTY_SET) : new EnglishAnalyzer();
    QueryParser queryParser = new QueryParser(FIELD_BODY, ea);
    queryParser.setDefaultOperator(QueryParser.Operator.OR);
    for (Map.Entry<Integer, String> entry : topics.entrySet()) {
        int qID = entry.getKey();
        String queryString = entry.getValue();
        Query query = useQueryParser ? queryParser.parse(queryString) : AnalyzerUtils.buildBagOfWordsQuery(FIELD_BODY, ea, queryString);
        TopDocs rs = searcher.search(query, numHits);
        ScoreDoc[] hits = rs.scoreDocs;
        List<String> queryTokens = AnalyzerUtils.tokenize(ea, queryString);
        RerankerContext context = new RerankerContext(searcher, query, String.valueOf(qID), queryString, queryTokens, FIELD_BODY, null);
        ScoredDocuments docs = cascade.run(ScoredDocuments.fromTopDocs(rs, searcher), context);
        for (int i = 0; i < docs.documents.length; i++) {
            String docid = docs.documents[i].getField(FIELD_ID).stringValue();
            float score = docs.scores[i];
            scoredDocs.put(docid, score);
        }
    }
    return scoredDocs;
}
Also used: IndexSearcher (org.apache.lucene.search.IndexSearcher), Query (org.apache.lucene.search.Query), ScoredDocuments (io.anserini.rerank.ScoredDocuments), EnglishAnalyzer (org.apache.lucene.analysis.en.EnglishAnalyzer), ScoreDoc (org.apache.lucene.search.ScoreDoc), TopDocs (org.apache.lucene.search.TopDocs), QueryParser (org.apache.lucene.queryparser.classic.QueryParser), RerankerContext (io.anserini.rerank.RerankerContext)
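
In the example above, scoredDocs.put(docid, score) runs inside the loop over topics, so a document retrieved for more than one topic keeps only the score from the last topic that returned it. The following is a minimal sketch of one way to keep results separate per topic, using only the Java standard library. The class PerQueryScores and its methods are hypothetical names introduced for illustration; they are not part of Anserini.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Anserini): stores scores grouped by query ID,
// preserving insertion order within each query.
final class PerQueryScores {
    private final Map<Integer, Map<String, Float>> byQuery = new LinkedHashMap<>();

    // Records the score of one document for one query.
    void add(int qid, String docid, float score) {
        byQuery.computeIfAbsent(qid, k -> new LinkedHashMap<>()).put(docid, score);
    }

    // Returns the scores recorded for a query, or an empty map if there are none.
    Map<String, Float> forQuery(int qid) {
        return byQuery.getOrDefault(qid, Collections.emptyMap());
    }

    public static void main(String[] args) {
        // Flat map, as in the example: the second put replaces the first value.
        Map<String, Float> flat = new LinkedHashMap<>();
        flat.put("doc1", 3.2f); // score under topic 1
        flat.put("doc1", 1.7f); // score under topic 2 overwrites it
        System.out.println("flat: " + flat);

        // Nested map: each topic keeps its own score for doc1.
        PerQueryScores scores = new PerQueryScores();
        scores.add(1, "doc1", 3.2f);
        scores.add(2, "doc1", 1.7f);
        System.out.println("topic 1: " + scores.forQuery(1));
        System.out.println("topic 2: " + scores.forQuery(2));
    }
}
```

Whether the flat map is a problem depends on the caller: with a single topic per call, both layouts hold the same information.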

Example 7 with ScoredDocuments

Use of io.anserini.rerank.ScoredDocuments in project Anserini by castorini.

From the class RetrieveSentences, method search.

public Map<String, Float> search(SortedMap<Integer, String> topics, int numHits) throws IOException, ParseException {
    IndexSearcher searcher = new IndexSearcher(reader);
    // BM25 scoring model with k1 = 0.9 and b = 0.4
    Similarity similarity = new BM25Similarity(0.9f, 0.4f);
    searcher.setSimilarity(similarity);
    EnglishAnalyzer ea = new EnglishAnalyzer();
    QueryParser queryParser = new QueryParser(FIELD_BODY, ea);
    queryParser.setDefaultOperator(QueryParser.Operator.OR);
    Map<String, Float> scoredDocs = new LinkedHashMap<>();
    for (Map.Entry<Integer, String> entry : topics.entrySet()) {
        int qID = entry.getKey();
        String queryString = entry.getValue();
        Query query = AnalyzerUtils.buildBagOfWordsQuery(FIELD_BODY, ea, queryString);
        TopDocs rs = searcher.search(query, numHits);
        ScoreDoc[] hits = rs.scoreDocs;
        ScoredDocuments docs = ScoredDocuments.fromTopDocs(rs, searcher);
        for (int i = 0; i < docs.documents.length; i++) {
            scoredDocs.put(docs.documents[i].getField(FIELD_ID).stringValue(), docs.scores[i]);
        }
    }
    return scoredDocs;
}
Also used: IndexSearcher (org.apache.lucene.search.IndexSearcher), Similarity (org.apache.lucene.search.similarities.Similarity), BM25Similarity (org.apache.lucene.search.similarities.BM25Similarity), Query (org.apache.lucene.search.Query), ScoredDocuments (io.anserini.rerank.ScoredDocuments), EnglishAnalyzer (org.apache.lucene.analysis.en.EnglishAnalyzer), ScoreDoc (org.apache.lucene.search.ScoreDoc), TopDocs (org.apache.lucene.search.TopDocs), QueryParser (org.apache.lucene.queryparser.classic.QueryParser)
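
The two arguments passed to BM25Similarity in the example above are k1 (term-frequency saturation) and b (document-length normalization). To show what they control, here is a sketch of the textbook BM25 weight for a single term, written against the standard library only. The class Bm25Sketch and its method names are hypothetical. Lucene's own implementation differs in details (it stores document lengths in a lossy encoded form, and newer versions omit the constant k1 + 1 factor), so this will not reproduce Lucene's scores exactly.

```java
// Hypothetical illustration of the textbook BM25 term weight; not Lucene's code.
final class Bm25Sketch {

    // Inverse document frequency, using the non-negative log(1 + ...) variant.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // BM25 weight of one term in one document.
    // tf: term frequency in the document; docLen / avgDocLen: length normalization input.
    static double termScore(double tf, double docLen, double avgDocLen,
                            long docCount, long docFreq, double k1, double b) {
        // b = 0 disables length normalization; b = 1 applies it fully.
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        // k1 controls how quickly repeated occurrences stop adding to the score.
        return idf(docCount, docFreq) * tf * (k1 + 1.0) / (tf + norm);
    }

    public static void main(String[] args) {
        double k1 = 0.9, b = 0.4; // same parameter values as in the example
        // Made-up collection statistics, for illustration only.
        long docCount = 1000, docFreq = 10;
        double shortDoc = termScore(2, 50, 100, docCount, docFreq, k1, b);
        double longDoc = termScore(2, 200, 100, docCount, docFreq, k1, b);
        System.out.println("short document: " + shortDoc);
        System.out.println("long document:  " + longDoc);
    }
}
```

With the same term frequency, the shorter document receives the higher weight, and a smaller b narrows that gap.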

Aggregations

ScoredDocuments (io.anserini.rerank.ScoredDocuments): 7
Query (org.apache.lucene.search.Query): 5
TopDocs (org.apache.lucene.search.TopDocs): 5
QueryParser (org.apache.lucene.queryparser.classic.QueryParser): 4
IndexSearcher (org.apache.lucene.search.IndexSearcher): 4
RerankerContext (io.anserini.rerank.RerankerContext): 3
EnglishAnalyzer (org.apache.lucene.analysis.en.EnglishAnalyzer): 3
ScoreDoc (org.apache.lucene.search.ScoreDoc): 3
BM25Similarity (org.apache.lucene.search.similarities.BM25Similarity): 3
Similarity (org.apache.lucene.search.similarities.Similarity): 2
TweetsLtrDataGenerator (io.anserini.ltr.TweetsLtrDataGenerator): 1
FeatureExtractors (io.anserini.ltr.feature.FeatureExtractors): 1
RankLibReranker (io.anserini.rerank.RankLibReranker): 1
RerankerCascade (io.anserini.rerank.RerankerCascade): 1
Rm3Reranker (io.anserini.rerank.rm3.Rm3Reranker): 1
RemoveRetweetsTemporalTiebreakReranker (io.anserini.rerank.twitter.RemoveRetweetsTemporalTiebreakReranker): 1
Qrels (io.anserini.util.Qrels): 1
File (java.io.File): 1
FileOutputStream (java.io.FileOutputStream): 1
PrintStream (java.io.PrintStream): 1