
Example 1 with BM25Similarity

Use of org.apache.lucene.search.similarities.BM25Similarity in the lucene-solr project by apache.

Class BM25SimilarityFactory, method getSimilarity.

@Override
public Similarity getSimilarity() {
    BM25Similarity sim = new BM25Similarity(k1, b);
    sim.setDiscountOverlaps(discountOverlaps);
    return sim;
}
Also used : BM25Similarity(org.apache.lucene.search.similarities.BM25Similarity)
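
The k1 and b arguments passed to the constructor above are BM25's two tuning parameters. As a rough illustration of what they control, here is a minimal sketch of the textbook BM25 term score in plain Java. It is not Lucene's implementation (which, among other differences, stores document lengths in a lossy encoded form); the class Bm25Sketch and its method names are hypothetical. The defaults 1.2 and 0.75 are those used by the no-argument BM25Similarity constructor.

```java
// Sketch only: textbook BM25 term score, NOT Lucene's actual code.
// Bm25Sketch and its methods are hypothetical names used for illustration.
final class Bm25Sketch {
    // Defaults used by BM25Similarity's no-argument constructor.
    static final double DEFAULT_K1 = 1.2;
    static final double DEFAULT_B = 0.75;

    /** Inverse document frequency: ln(1 + (N - n + 0.5) / (n + 0.5)). */
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /**
     * Score of one term in one document.
     * k1 controls how quickly repeated occurrences saturate;
     * b controls how strongly document length is normalized (0 = not at all).
     */
    static double score(double freq, double docLen, double avgDocLen,
                        long docFreq, long docCount, double k1, double b) {
        double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
        double tf = freq * (k1 + 1.0) / (freq + k1 * lengthNorm);
        return idf(docFreq, docCount) * tf;
    }

    public static void main(String[] args) {
        // Same term statistics, two document lengths: the shorter document scores higher when b > 0.
        double shortDoc = score(2, 10, 100, 5, 1000, DEFAULT_K1, DEFAULT_B);
        double longDoc = score(2, 1000, 100, 5, 1000, DEFAULT_K1, DEFAULT_B);
        System.out.println("short doc: " + shortDoc + ", long doc: " + longDoc);
    }
}
```

Setting b to 0 removes length normalization entirely, and setting k1 to 0 reduces the score to the idf alone, which is a quick way to sanity-check either parameter.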

Example 2 with BM25Similarity

Use of org.apache.lucene.search.similarities.BM25Similarity in the lucene-solr project by apache.

Class SchemaSimilarityFactory, method getSimilarity.

@Override
public Similarity getSimilarity() {
    if (null == core) {
        throw new IllegalStateException("SchemaSimilarityFactory can not be used until SolrCoreAware.inform has been called");
    }
    if (null == similarity) {
        // Need to instantiate lazily, can't do this in inform(SolrCore) because of chicken/egg
        // circular initialization hell with core.getLatestSchema() to lookup defaultSimFromFieldType
        Similarity defaultSim = null;
        if (null == defaultSimFromFieldType) {
            // nothing configured, choose a sensible implicit default...
            defaultSim = this.core.getSolrConfig().luceneMatchVersion.onOrAfter(Version.LUCENE_6_0_0) ? new BM25Similarity() : new ClassicSimilarity();
        } else {
            FieldType defSimFT = core.getLatestSchema().getFieldTypeByName(defaultSimFromFieldType);
            if (null == defSimFT) {
                throw new SolrException(ErrorCode.SERVER_ERROR, "SchemaSimilarityFactory configured with " + INIT_OPT + "='" + defaultSimFromFieldType + "' but that <fieldType> does not exist");
            }
            defaultSim = defSimFT.getSimilarity();
            if (null == defaultSim) {
                throw new SolrException(ErrorCode.SERVER_ERROR, "SchemaSimilarityFactory configured with " + INIT_OPT + "='" + defaultSimFromFieldType + "' but that <fieldType> does not define a <similarity>");
            }
        }
        similarity = new SchemaSimilarity(defaultSim);
    }
    return similarity;
}
Also used : ClassicSimilarity(org.apache.lucene.search.similarities.ClassicSimilarity) Similarity(org.apache.lucene.search.similarities.Similarity) BM25Similarity(org.apache.lucene.search.similarities.BM25Similarity) SolrException(org.apache.solr.common.SolrException) FieldType(org.apache.solr.schema.FieldType)

Example 3 with BM25Similarity

Use of org.apache.lucene.search.similarities.BM25Similarity in the lucene-solr project by apache.

Class CommonTermsQueryTest, method testExtend.

@Test
public void testExtend() throws IOException {
    Directory dir = newDirectory();
    MockAnalyzer analyzer = new MockAnalyzer(random());
    RandomIndexWriter w = new RandomIndexWriter(random(), dir, analyzer);
    String[] docs = new String[] { "this is the end of the world right", "is this it or maybe not", "this is the end of the universe as we know it", "there is the famous restaurant at the end of the universe" };
    for (int i = 0; i < docs.length; i++) {
        Document doc = new Document();
        doc.add(newStringField("id", "" + i, Field.Store.YES));
        doc.add(newTextField("field", docs[i], Field.Store.NO));
        w.addDocument(doc);
    }
    IndexReader r = w.getReader();
    IndexSearcher s = newSearcher(r);
    // don't use a randomized similarity, e.g. stopwords for DFI can get scored as 0,
    // so boosting them is kind of crazy
    s.setSimilarity(new BM25Similarity());
    {
        CommonTermsQuery query = new CommonTermsQuery(Occur.SHOULD, Occur.SHOULD, random().nextBoolean() ? 2.0f : 0.5f);
        query.add(new Term("field", "is"));
        query.add(new Term("field", "this"));
        query.add(new Term("field", "end"));
        query.add(new Term("field", "world"));
        query.add(new Term("field", "universe"));
        query.add(new Term("field", "right"));
        TopDocs search = s.search(query, 10);
        assertEquals(3, search.totalHits);
        assertEquals("0", r.document(search.scoreDocs[0].doc).get("id"));
        assertEquals("2", r.document(search.scoreDocs[1].doc).get("id"));
        assertEquals("3", r.document(search.scoreDocs[2].doc).get("id"));
    }
    {
        // this one boosts the termQuery("field" "universe") by 10x
        CommonTermsQuery query = new ExtendedCommonTermsQuery(Occur.SHOULD, Occur.SHOULD, random().nextBoolean() ? 2.0f : 0.5f);
        query.add(new Term("field", "is"));
        query.add(new Term("field", "this"));
        query.add(new Term("field", "end"));
        query.add(new Term("field", "world"));
        query.add(new Term("field", "universe"));
        query.add(new Term("field", "right"));
        TopDocs search = s.search(query, 10);
        assertEquals(3, search.totalHits);
        assertEquals("2", r.document(search.scoreDocs[0].doc).get("id"));
        assertEquals("3", r.document(search.scoreDocs[1].doc).get("id"));
        assertEquals("0", r.document(search.scoreDocs[2].doc).get("id"));
    }
    IOUtils.close(r, w, dir, analyzer);
}
Also used : IndexSearcher(org.apache.lucene.search.IndexSearcher) Term(org.apache.lucene.index.Term) Document(org.apache.lucene.document.Document) TopDocs(org.apache.lucene.search.TopDocs) MockAnalyzer(org.apache.lucene.analysis.MockAnalyzer) IndexReader(org.apache.lucene.index.IndexReader) BM25Similarity(org.apache.lucene.search.similarities.BM25Similarity) RandomIndexWriter(org.apache.lucene.index.RandomIndexWriter) Directory(org.apache.lucene.store.Directory) Test(org.junit.Test)
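
The comment in the test above explains that a randomized similarity is avoided because some similarities can score stopwords as 0. As a hedged illustration of why BM25 is a safe choice here, the following sketch computes the BM25 idf, ln(1 + (N - n + 0.5) / (n + 0.5)), for terms from the same four documents; it stays positive even for "is", which occurs in every document. The class IdfSketch and the whitespace tokenization are assumptions made for this example, and no Lucene classes are used.

```java
import java.util.Arrays;

// Sketch only: document frequencies and BM25 idf for the four test documents.
// IdfSketch is a hypothetical helper, and splitting on spaces stands in for the analyzer.
final class IdfSketch {
    static final String[] DOCS = {
        "this is the end of the world right",
        "is this it or maybe not",
        "this is the end of the universe as we know it",
        "there is the famous restaurant at the end of the universe"
    };

    /** Number of documents that contain the term at least once. */
    static long docFreq(String[] docs, String term) {
        return Arrays.stream(docs)
                .filter(d -> Arrays.asList(d.split(" ")).contains(term))
                .count();
    }

    /** BM25 idf: ln(1 + (N - n + 0.5) / (n + 0.5)); positive for every n <= N. */
    static double bm25Idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        for (String term : new String[] { "is", "this", "end", "universe", "world", "right" }) {
            long n = docFreq(DOCS, term);
            System.out.println(term + ": docFreq=" + n + ", idf=" + bm25Idf(n, DOCS.length));
        }
    }
}
```

Rarer terms such as "world" receive a larger idf than "universe", which in turn outranks the ubiquitous "is", but none of them drops to zero.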

Example 4 with BM25Similarity

Use of org.apache.lucene.search.similarities.BM25Similarity in the lucene-solr project by apache.

Class TestCustomScoreExplanations, method testSubExplanations.

public void testSubExplanations() throws IOException {
    Query query = new FunctionQuery(new ConstValueSource(5));
    IndexSearcher searcher = newSearcher(BaseExplanationTestCase.searcher.getIndexReader());
    searcher.setSimilarity(new BM25Similarity());
    Explanation expl = searcher.explain(query, 0);
    assertEquals(2, expl.getDetails().length);
    // function
    assertEquals(5f, expl.getDetails()[0].getValue(), 0f);
    // boost
    assertEquals("boost", expl.getDetails()[1].getDescription());
    assertEquals(1f, expl.getDetails()[1].getValue(), 0f);
    query = new BoostQuery(query, 2);
    expl = searcher.explain(query, 0);
    assertEquals(2, expl.getDetails().length);
    // function
    assertEquals(5f, expl.getDetails()[0].getValue(), 0f);
    // boost
    assertEquals("boost", expl.getDetails()[1].getDescription());
    assertEquals(2f, expl.getDetails()[1].getValue(), 0f);
    // in order to have a queryNorm != 1
    searcher.setSimilarity(new ClassicSimilarity());
    expl = searcher.explain(query, 0);
    assertEquals(2, expl.getDetails().length);
    // function
    assertEquals(5f, expl.getDetails()[0].getValue(), 0f);
    // boost
    assertEquals("boost", expl.getDetails()[1].getDescription());
    assertEquals(2f, expl.getDetails()[1].getValue(), 0f);
}
Also used : IndexSearcher(org.apache.lucene.search.IndexSearcher) ClassicSimilarity(org.apache.lucene.search.similarities.ClassicSimilarity) FunctionQuery(org.apache.lucene.queries.function.FunctionQuery) Query(org.apache.lucene.search.Query) MatchAllDocsQuery(org.apache.lucene.search.MatchAllDocsQuery) TermQuery(org.apache.lucene.search.TermQuery) BooleanQuery(org.apache.lucene.search.BooleanQuery) BoostQuery(org.apache.lucene.search.BoostQuery) Explanation(org.apache.lucene.search.Explanation) BM25Similarity(org.apache.lucene.search.similarities.BM25Similarity) ConstValueSource(org.apache.lucene.queries.function.valuesource.ConstValueSource)

Example 5 with BM25Similarity

Use of org.apache.lucene.search.similarities.BM25Similarity in the lucene-solr project by apache.

Class TestFunctionScoreExplanations, method testSubExplanations.

public void testSubExplanations() throws IOException {
    Query query = new FunctionScoreQuery(new MatchAllDocsQuery(), DoubleValuesSource.constant(5));
    IndexSearcher searcher = newSearcher(BaseExplanationTestCase.searcher.getIndexReader());
    searcher.setSimilarity(new BM25Similarity());
    Explanation expl = searcher.explain(query, 0);
    assertEquals("constant(5.0)", expl.getDescription());
    assertEquals(0, expl.getDetails().length);
    query = new BoostQuery(query, 2);
    expl = searcher.explain(query, 0);
    assertEquals(2, expl.getDetails().length);
    // function
    assertEquals(5f, expl.getDetails()[1].getValue(), 0f);
    // boost
    assertEquals("boost", expl.getDetails()[0].getDescription());
    assertEquals(2f, expl.getDetails()[0].getValue(), 0f);
    // in order to have a queryNorm != 1
    searcher.setSimilarity(new ClassicSimilarity());
    expl = searcher.explain(query, 0);
    assertEquals(2, expl.getDetails().length);
    // function
    assertEquals(5f, expl.getDetails()[1].getValue(), 0f);
    // boost
    assertEquals("boost", expl.getDetails()[0].getDescription());
    assertEquals(2f, expl.getDetails()[0].getValue(), 0f);
}
Also used : IndexSearcher(org.apache.lucene.search.IndexSearcher) ClassicSimilarity(org.apache.lucene.search.similarities.ClassicSimilarity) Query(org.apache.lucene.search.Query) MatchAllDocsQuery(org.apache.lucene.search.MatchAllDocsQuery) TermQuery(org.apache.lucene.search.TermQuery) BooleanQuery(org.apache.lucene.search.BooleanQuery) BoostQuery(org.apache.lucene.search.BoostQuery) Explanation(org.apache.lucene.search.Explanation) BM25Similarity(org.apache.lucene.search.similarities.BM25Similarity)

Aggregations

BM25Similarity (org.apache.lucene.search.similarities.BM25Similarity) 29
Directory (org.apache.lucene.store.Directory) 12
IndexSearcher (org.apache.lucene.search.IndexSearcher) 11
IndexReader (org.apache.lucene.index.IndexReader) 10
Similarity (org.apache.lucene.search.similarities.Similarity) 9
FSDirectory (org.apache.lucene.store.FSDirectory) 9
Query (org.apache.lucene.search.Query) 8
TopDocs (org.apache.lucene.search.TopDocs) 8
TermQuery (org.apache.lucene.search.TermQuery) 7
ClassicSimilarity (org.apache.lucene.search.similarities.ClassicSimilarity) 7
Test (org.junit.Test) 7
Term (org.apache.lucene.index.Term) 6
RerankerCascade (io.anserini.rerank.RerankerCascade) 5
BooleanQuery (org.apache.lucene.search.BooleanQuery) 5
MockAnalyzer (org.apache.lucene.analysis.MockAnalyzer) 4
FeatureExtractors (io.anserini.ltr.feature.FeatureExtractors) 3
IdentityReranker (io.anserini.rerank.IdentityReranker) 3
ScoredDocuments (io.anserini.rerank.ScoredDocuments) 3
Qrels (io.anserini.util.Qrels) 3
PrintStream (java.io.PrintStream) 3