Example 1 with ClassicSimilarity

use of org.apache.lucene.search.similarities.ClassicSimilarity in project elasticsearch by elastic.

the class SimilarityServiceTests method testOverrideDefaultSimilarity.

// Tests #16594
public void testOverrideDefaultSimilarity() {
    Settings settings = Settings.builder().put("index.similarity.default.type", "classic").build();
    IndexSettings indexSettings = IndexSettingsModule.newIndexSettings("test", settings);
    SimilarityService service = new SimilarityService(indexSettings, Collections.emptyMap());
    assertTrue(service.getDefaultSimilarity() instanceof ClassicSimilarity);
}
Also used : ClassicSimilarity(org.apache.lucene.search.similarities.ClassicSimilarity) IndexSettings(org.elasticsearch.index.IndexSettings) Settings(org.elasticsearch.common.settings.Settings)

Example 2 with ClassicSimilarity

use of org.apache.lucene.search.similarities.ClassicSimilarity in project lucene-solr by apache.

the class TestPhraseQuery method testSlopScoring.

public void testSlopScoring() throws IOException {
    Directory directory = newDirectory();
    RandomIndexWriter writer = new RandomIndexWriter(random(), directory, newIndexWriterConfig(new MockAnalyzer(random())).setMergePolicy(newLogMergePolicy()).setSimilarity(new BM25Similarity()));
    Document doc = new Document();
    doc.add(newTextField("field", "foo firstname lastname foo", Field.Store.YES));
    writer.addDocument(doc);
    Document doc2 = new Document();
    doc2.add(newTextField("field", "foo firstname zzz lastname foo", Field.Store.YES));
    writer.addDocument(doc2);
    Document doc3 = new Document();
    doc3.add(newTextField("field", "foo firstname zzz yyy lastname foo", Field.Store.YES));
    writer.addDocument(doc3);
    IndexReader reader = writer.getReader();
    writer.close();
    IndexSearcher searcher = newSearcher(reader);
    searcher.setSimilarity(new ClassicSimilarity());
    PhraseQuery query = new PhraseQuery(Integer.MAX_VALUE, "field", "firstname", "lastname");
    ScoreDoc[] hits = searcher.search(query, 1000).scoreDocs;
    assertEquals(3, hits.length);
    // Make sure that those matches where the terms appear closer to
    // each other get a higher score:
    assertEquals(1.0, hits[0].score, 0.01);
    assertEquals(0, hits[0].doc);
    assertEquals(0.63, hits[1].score, 0.01);
    assertEquals(1, hits[1].doc);
    assertEquals(0.47, hits[2].score, 0.01);
    assertEquals(2, hits[2].doc);
    QueryUtils.check(random(), query, searcher);
    reader.close();
    directory.close();
}
Also used : ClassicSimilarity(org.apache.lucene.search.similarities.ClassicSimilarity) MockAnalyzer(org.apache.lucene.analysis.MockAnalyzer) BM25Similarity(org.apache.lucene.search.similarities.BM25Similarity) IndexReader(org.apache.lucene.index.IndexReader) Document(org.apache.lucene.document.Document) RandomIndexWriter(org.apache.lucene.index.RandomIndexWriter) Directory(org.apache.lucene.store.Directory)
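
The comment in testSlopScoring says that matches whose terms sit closer together score higher. A minimal standalone sketch of the ClassicSimilarity factors behind that ordering follows. It does not use Lucene: the class name ClassicScoringSketch is hypothetical, and the formulas are written from the documented ClassicSimilarity definitions (sloppyFreq = 1 / (distance + 1), tf = sqrt(freq), idf = log((docCount + 1) / (docFreq + 1)) + 1). The exact scores asserted in the test also depend on length normalization, which this sketch does not model.

```java
// Standalone illustration; no Lucene dependency. Names are hypothetical.
final class ClassicScoringSketch {

    // A sloppy phrase match contributes less the further apart its terms are.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Term-frequency factor: square root of the (possibly fractional) frequency.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Inverse document frequency: log((docCount + 1) / (docFreq + 1)) + 1.
    static float idf(long docFreq, long docCount) {
        return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // Distances 0, 1 and 2 mirror the three test documents, which have
        // zero, one and two extra terms between "firstname" and "lastname".
        for (int distance = 0; distance <= 2; distance++) {
            System.out.println("distance=" + distance
                    + " tf(sloppyFreq)=" + tf(sloppyFreq(distance)));
        }
    }
}
```

The tf factor decreases strictly as the distance grows, which is consistent with the document order 0, 1, 2 that the test asserts.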

Example 3 with ClassicSimilarity

use of org.apache.lucene.search.similarities.ClassicSimilarity in project lucene-solr by apache.

the class TestQueryRescorer method testBasic.

public void testBasic() throws Exception {
    Directory dir = newDirectory();
    RandomIndexWriter w = new RandomIndexWriter(random(), dir, newIndexWriterConfig());
    Document doc = new Document();
    doc.add(newStringField("id", "0", Field.Store.YES));
    doc.add(newTextField("field", "wizard the the the the the oz", Field.Store.NO));
    w.addDocument(doc);
    doc = new Document();
    doc.add(newStringField("id", "1", Field.Store.YES));
    // 1 extra token, but wizard and oz are close;
    doc.add(newTextField("field", "wizard oz the the the the the the", Field.Store.NO));
    w.addDocument(doc);
    IndexReader r = w.getReader();
    w.close();
    // Do ordinary BooleanQuery:
    BooleanQuery.Builder bq = new BooleanQuery.Builder();
    bq.add(new TermQuery(new Term("field", "wizard")), Occur.SHOULD);
    bq.add(new TermQuery(new Term("field", "oz")), Occur.SHOULD);
    IndexSearcher searcher = getSearcher(r);
    searcher.setSimilarity(new ClassicSimilarity());
    TopDocs hits = searcher.search(bq.build(), 10);
    assertEquals(2, hits.totalHits);
    assertEquals("0", searcher.doc(hits.scoreDocs[0].doc).get("id"));
    assertEquals("1", searcher.doc(hits.scoreDocs[1].doc).get("id"));
    // Now, resort using PhraseQuery:
    PhraseQuery pq = new PhraseQuery(5, "field", "wizard", "oz");
    TopDocs hits2 = QueryRescorer.rescore(searcher, hits, pq, 2.0, 10);
    // Resorting changed the order:
    assertEquals(2, hits2.totalHits);
    assertEquals("1", searcher.doc(hits2.scoreDocs[0].doc).get("id"));
    assertEquals("0", searcher.doc(hits2.scoreDocs[1].doc).get("id"));
    // Resort using SpanNearQuery:
    SpanTermQuery t1 = new SpanTermQuery(new Term("field", "wizard"));
    SpanTermQuery t2 = new SpanTermQuery(new Term("field", "oz"));
    SpanNearQuery snq = new SpanNearQuery(new SpanQuery[] { t1, t2 }, 0, true);
    TopDocs hits3 = QueryRescorer.rescore(searcher, hits, snq, 2.0, 10);
    // Resorting changed the order:
    assertEquals(2, hits3.totalHits);
    assertEquals("1", searcher.doc(hits3.scoreDocs[0].doc).get("id"));
    assertEquals("0", searcher.doc(hits3.scoreDocs[1].doc).get("id"));
    r.close();
    dir.close();
}
Also used : SpanTermQuery(org.apache.lucene.search.spans.SpanTermQuery) ClassicSimilarity(org.apache.lucene.search.similarities.ClassicSimilarity) Term(org.apache.lucene.index.Term) Document(org.apache.lucene.document.Document) IndexReader(org.apache.lucene.index.IndexReader) SpanNearQuery(org.apache.lucene.search.spans.SpanNearQuery) RandomIndexWriter(org.apache.lucene.index.RandomIndexWriter) Directory(org.apache.lucene.store.Directory)
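
The calls to QueryRescorer.rescore(searcher, hits, query, 2.0, 10) in testBasic re-rank first-pass hits by adding the weighted second-pass score to the first-pass score whenever the second-pass query matches. A rough standalone sketch of that combination step is below. It does not use Lucene: RescoreSketch and Hit are hypothetical names, and the scores in the usage note are made-up illustration values, not output from the test.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Standalone illustration; no Lucene dependency. Names are hypothetical.
final class RescoreSketch {

    // A first-pass hit plus its second-pass score
    // (null when the second-pass query does not match the document).
    static final class Hit {
        final String id;
        final float firstPassScore;
        final Float secondPassScore;

        Hit(String id, float firstPassScore, Float secondPassScore) {
            this.id = id;
            this.firstPassScore = firstPassScore;
            this.secondPassScore = secondPassScore;
        }
    }

    // Combined score: first pass, plus weight * second pass if it matched.
    static float combine(Hit hit, float weight) {
        float score = hit.firstPassScore;
        if (hit.secondPassScore != null) {
            score += weight * hit.secondPassScore;
        }
        return score;
    }

    // Returns the hit ids ordered by descending combined score.
    static List<String> rescore(List<Hit> hits, float weight) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> combine(h, weight)).reversed());
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

With a hit "0" scoring 0.5 then 0.1 and a hit "1" scoring 0.4 then 0.3, a weight of 2.0 gives combined scores of 0.7 and 1.0, so the order flips from 0, 1 to 1, 0, the same kind of reordering the test asserts.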

Example 4 with ClassicSimilarity

use of org.apache.lucene.search.similarities.ClassicSimilarity in project lucene-solr by apache.

the class TestElevationComparator method testSorting.

//@Test
public void testSorting() throws Throwable {
    Directory directory = newDirectory();
    IndexWriter writer = new IndexWriter(directory, newIndexWriterConfig(new MockAnalyzer(random())).setMaxBufferedDocs(2).setMergePolicy(newLogMergePolicy(1000)).setSimilarity(new ClassicSimilarity()));
    writer.addDocument(adoc(new String[] { "id", "a", "title", "ipod", "str_s", "a" }));
    writer.addDocument(adoc(new String[] { "id", "b", "title", "ipod ipod", "str_s", "b" }));
    writer.addDocument(adoc(new String[] { "id", "c", "title", "ipod ipod ipod", "str_s", "c" }));
    writer.addDocument(adoc(new String[] { "id", "x", "title", "boosted", "str_s", "x" }));
    writer.addDocument(adoc(new String[] { "id", "y", "title", "boosted boosted", "str_s", "y" }));
    writer.addDocument(adoc(new String[] { "id", "z", "title", "boosted boosted boosted", "str_s", "z" }));
    IndexReader r = DirectoryReader.open(writer);
    writer.close();
    IndexSearcher searcher = newSearcher(r);
    searcher.setSimilarity(new BM25Similarity());
    runTest(searcher, true);
    runTest(searcher, false);
    r.close();
    directory.close();
}
Also used : ClassicSimilarity(org.apache.lucene.search.similarities.ClassicSimilarity) MockAnalyzer(org.apache.lucene.analysis.MockAnalyzer) IndexWriter(org.apache.lucene.index.IndexWriter) IndexReader(org.apache.lucene.index.IndexReader) BM25Similarity(org.apache.lucene.search.similarities.BM25Similarity) Directory(org.apache.lucene.store.Directory)

Example 5 with ClassicSimilarity

use of org.apache.lucene.search.similarities.ClassicSimilarity in project lucene-solr by apache.

the class TestFuzzyQuery method testSingleQueryExactMatchScoresHighest.

public void testSingleQueryExactMatchScoresHighest() throws Exception {
    //See issue LUCENE-329 - IDF shouldn't wreck similarity ranking 
    Directory directory = newDirectory();
    RandomIndexWriter writer = new RandomIndexWriter(random(), directory);
    addDoc("smith", writer);
    addDoc("smith", writer);
    addDoc("smith", writer);
    addDoc("smith", writer);
    addDoc("smith", writer);
    addDoc("smith", writer);
    addDoc("smythe", writer);
    addDoc("smdssasd", writer);
    IndexReader reader = writer.getReader();
    IndexSearcher searcher = newSearcher(reader);
    //avoid randomisation of similarity algo by test framework
    searcher.setSimilarity(new ClassicSimilarity());
    writer.close();
    String[] searchTerms = { "smith", "smythe", "smdssasd" };
    for (String searchTerm : searchTerms) {
        FuzzyQuery query = new FuzzyQuery(new Term("field", searchTerm), 2, 1);
        ScoreDoc[] hits = searcher.search(query, 1000).scoreDocs;
        // Check that there is at least one hit before reading hits[0].
        assertTrue(hits.length > 0);
        Document bestDoc = searcher.doc(hits[0].doc);
        String topMatch = bestDoc.get("field");
        assertEquals(searchTerm, topMatch);
        if (hits.length > 1) {
            Document worstDoc = searcher.doc(hits[hits.length - 1].doc);
            String worstMatch = worstDoc.get("field");
            assertNotSame(searchTerm, worstMatch);
        }
    }
    reader.close();
    directory.close();
}
Also used : ClassicSimilarity(org.apache.lucene.search.similarities.ClassicSimilarity) IndexReader(org.apache.lucene.index.IndexReader) Term(org.apache.lucene.index.Term) Document(org.apache.lucene.document.Document) RandomIndexWriter(org.apache.lucene.index.RandomIndexWriter) Directory(org.apache.lucene.store.Directory)
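
The FuzzyQuery in this test is built with a maximum of 2 edits and a required common prefix of length 1. As a rough illustration of what those two parameters mean, here is a standalone sketch using plain Levenshtein distance. It does not use Lucene: EditDistanceSketch and withinFuzzyBounds are hypothetical names, and this is not how FuzzyQuery is implemented (Lucene uses automata and, by default, counts a transposition as a single edit, which this sketch does not).

```java
// Standalone illustration; no Lucene dependency. Names are hypothetical.
final class EditDistanceSketch {

    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True when both strings share the first prefixLength characters and
    // are at most maxEdits edits apart.
    static boolean withinFuzzyBounds(String query, String term, int maxEdits, int prefixLength) {
        if (query.length() < prefixLength || term.length() < prefixLength) {
            return false;
        }
        if (!query.regionMatches(0, term, 0, prefixLength)) {
            return false;
        }
        return levenshtein(query, term) <= maxEdits;
    }
}
```

Under this sketch, "smith" and "smythe" are 2 edits apart and share the prefix "s", so they fall within the bounds, while "smith" and "smdssasd" are more than 2 edits apart.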

Aggregations

ClassicSimilarity (org.apache.lucene.search.similarities.ClassicSimilarity): 43
RandomIndexWriter (org.apache.lucene.index.RandomIndexWriter): 14
Document (org.apache.lucene.document.Document): 13
Term (org.apache.lucene.index.Term): 12
Directory (org.apache.lucene.store.Directory): 10
IndexReader (org.apache.lucene.index.IndexReader): 9
Similarity (org.apache.lucene.search.similarities.Similarity): 9
TermQuery (org.apache.lucene.search.TermQuery): 7
BM25Similarity (org.apache.lucene.search.similarities.BM25Similarity): 7
MockAnalyzer (org.apache.lucene.analysis.MockAnalyzer): 6
ConstValueSource (org.apache.lucene.queries.function.valuesource.ConstValueSource): 5
DocFreqValueSource (org.apache.lucene.queries.function.valuesource.DocFreqValueSource): 4
DoubleConstValueSource (org.apache.lucene.queries.function.valuesource.DoubleConstValueSource): 4
IDFValueSource (org.apache.lucene.queries.function.valuesource.IDFValueSource): 4
JoinDocFreqValueSource (org.apache.lucene.queries.function.valuesource.JoinDocFreqValueSource): 4
LiteralValueSource (org.apache.lucene.queries.function.valuesource.LiteralValueSource): 4
MaxDocValueSource (org.apache.lucene.queries.function.valuesource.MaxDocValueSource): 4
IndexSearcher (org.apache.lucene.search.IndexSearcher): 4
Query (org.apache.lucene.search.Query): 4
IndexWriterConfig (org.apache.lucene.index.IndexWriterConfig): 3