Example 1 with NewsBackgroundLinkingReranker

Use of io.anserini.rerank.lib.NewsBackgroundLinkingReranker in the Anserini project by castorini.

From the class SearchCollection, method searchBackgroundLinking:

public <K> ScoredDocuments searchBackgroundLinking(IndexSearcher searcher, K qid, String docid, RerankerCascade cascade) throws IOException {
    // Extract a list of analyzed terms from the document to compose a query.
    List<String> terms = BackgroundLinkingTopicReader.extractTerms(reader, docid, args.backgroundlinking_k, analyzer);
    // Since the terms are already analyzed, we just join them together and use the StandardQueryParser.
    Query docQuery;
    try {
        docQuery = new StandardQueryParser().parse(StringUtils.join(terms, " "), IndexArgs.CONTENTS);
    } catch (QueryNodeException e) {
        // Pass the caught exception along as the cause so the parse failure is not lost.
        throw new RuntimeException("Unable to create a Lucene query comprised of terms extracted from query document!", e);
    }
    // Per track guidelines, no opinion pieces or editorials: filter out articles of these types.
    Query filter = new TermInSetQuery(WashingtonPostGenerator.WashingtonPostField.KICKER.name, new BytesRef("Opinions"), new BytesRef("Letters to the Editor"), new BytesRef("The Post's View"));
    BooleanQuery.Builder builder = new BooleanQuery.Builder();
    builder.add(filter, BooleanClause.Occur.MUST_NOT);
    builder.add(docQuery, BooleanClause.Occur.MUST);
    Query query = builder.build();
    // Search using constructed query.
    TopDocs rs;
    if (args.arbitraryScoreTieBreak) {
        rs = searcher.search(query, (isRerank && args.rf_qrels == null) ? args.rerankcutoff : args.hits);
    } else {
        rs = searcher.search(query, (isRerank && args.rf_qrels == null) ? args.rerankcutoff : args.hits, BREAK_SCORE_TIES_BY_DOCID, true);
    }
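    // Join the terms with ", " as the separator (collection first, separator second, as in the join call above).
    RerankerContext context = new RerankerContext<>(searcher, qid, query, docid, StringUtils.join(terms, ", "), terms, null, args);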
    // Run the existing cascade.
    ScoredDocuments docs = cascade.run(ScoredDocuments.fromTopDocs(rs, searcher), context);
    // Perform post-processing (e.g., date filter, deduplication, etc.) as a final step.
    return new NewsBackgroundLinkingReranker().rerank(docs, context);
}
Also used: QueryNodeException (org.apache.lucene.queryparser.flexible.core.QueryNodeException), NewsBackgroundLinkingReranker (io.anserini.rerank.lib.NewsBackgroundLinkingReranker), BooleanQuery (org.apache.lucene.search.BooleanQuery), Query (org.apache.lucene.search.Query), TermInSetQuery (org.apache.lucene.search.TermInSetQuery), ScoredDocuments (io.anserini.rerank.ScoredDocuments), TopDocs (org.apache.lucene.search.TopDocs), StandardQueryParser (org.apache.lucene.queryparser.flexible.standard.StandardQueryParser), BytesRef (org.apache.lucene.util.BytesRef), RerankerContext (io.anserini.rerank.RerankerContext)
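
The retrieval logic in the method above (require a match, exclude certain kicker values, then rank with a deterministic tie-break) can be illustrated without Lucene. The following is a minimal plain-Java sketch, not Anserini code: the names BackgroundLinkingSketch, Hit, and search are hypothetical, and the tie-break order (score descending, then docid ascending) is an assumption, since the definition of BREAK_SCORE_TIES_BY_DOCID is not shown in the snippet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these types come from Anserini or Lucene.
class BackgroundLinkingSketch {

    // Stand-in for a search hit: document id, kicker field, retrieval score.
    static final class Hit {
        final String docid;
        final String kicker; // may be null when an article has no kicker
        final float score;

        Hit(String docid, String kicker, float score) {
            this.docid = docid;
            this.kicker = kicker;
            this.score = score;
        }
    }

    // Kicker values that the MUST_NOT clause in the method above filters out.
    static final Set<String> EXCLUDED_KICKERS = new HashSet<>(
        Arrays.asList("Opinions", "Letters to the Editor", "The Post's View"));

    // Assumed ordering: higher score first, ties broken by docid ascending.
    static final Comparator<Hit> SCORE_THEN_DOCID = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : a.docid.compareTo(b.docid);
    };

    // Drops hits with an excluded kicker, ranks the rest, and keeps at most k.
    static List<Hit> search(List<Hit> candidates, int k) {
        List<Hit> kept = new ArrayList<>();
        for (Hit h : candidates) {
            if (!EXCLUDED_KICKERS.contains(h.kicker)) {
                kept.add(h);
            }
        }
        kept.sort(SCORE_THEN_DOCID);
        return new ArrayList<>(kept.subList(0, Math.min(k, kept.size())));
    }
}
```

In this sketch the exclusion happens before the cutoff, mirroring how the MUST_NOT clause is part of the query itself, so excluded articles never consume one of the k result slots.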
