Example 1 with TermStatistics

Use of org.apache.lucene.search.TermStatistics in project elasticsearch by elastic.

From class AggregatedDfs, method writeTo.

@Override
public void writeTo(final StreamOutput out) throws IOException {
    out.writeVInt(termStatistics.size());
    for (ObjectObjectCursor<Term, TermStatistics> c : termStatistics()) {
        Term term = c.key;
        out.writeString(term.field());
        out.writeBytesRef(term.bytes());
        TermStatistics stats = c.value;
        out.writeBytesRef(stats.term());
        out.writeVLong(stats.docFreq());
        // totalTermFreq may be -1 ("not available"); shift it so the VLong is non-negative
        out.writeVLong(DfsSearchResult.addOne(stats.totalTermFreq()));
    }
    DfsSearchResult.writeFieldStats(out, fieldStatistics);
    out.writeVLong(maxDoc);
}
Also used : Term(org.apache.lucene.index.Term) TermStatistics(org.apache.lucene.search.TermStatistics)
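
writeTo stores totalTermFreq as DfsSearchResult.addOne(totalTermFreq) instead of the raw value. A plausible reason, going by the -1 convention described in the comment in Example 4, is that this statistic can be -1 when it is not available, while a VLong encoding only handles non-negative numbers. The sketch below illustrates that assumption in isolation. OptionalVLong and its methods are hypothetical names, and the codec is a generic 7-bits-per-byte VLong, not code copied from Elasticsearch's StreamOutput.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical sketch (not the Elasticsearch classes): a 7-bit VLong codec plus the
 * +1/-1 shift that lets an optional statistic (-1 = not available) use it.
 */
final class OptionalVLong {
    private OptionalVLong() {}

    // Shift so the "not available" marker -1 becomes 0 before encoding.
    static long addOne(long value) {
        if (value < -1) {
            throw new IllegalArgumentException("value must be >= -1: " + value);
        }
        return value + 1;
    }

    // Undo the shift after decoding; 0 maps back to -1.
    static long subOne(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be >= 0: " + value);
        }
        return value - 1;
    }

    // 7 payload bits per byte; the high bit marks "more bytes follow".
    static void writeVLong(ByteArrayOutputStream out, long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values cannot be VLong-encoded: " + value);
        }
        while ((value & ~0x7FL) != 0L) {
            out.write((int) ((value & 0x7FL) | 0x80L));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // At most 9 bytes are needed for a non-negative 63-bit value.
    static long readVLong(ByteArrayInputStream in) {
        long result = 0;
        for (int shift = 0; shift < 63; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new IllegalStateException("unexpected end of input");
            }
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalStateException("VLong is too long");
    }

    // Write with addOne, read back with subOne, as writeTo/readFrom do for totalTermFreq.
    static long roundTrip(long optionalStat) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVLong(out, addOne(optionalStat));
        return subOne(readVLong(new ByteArrayInputStream(out.toByteArray())));
    }
}
```

With this sketch, roundTrip(-1) returns -1 and puts a single 0x00 byte on the wire, whereas calling writeVLong(out, -1) directly throws.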

Example 2 with TermStatistics

Use of org.apache.lucene.search.TermStatistics in project elasticsearch by elastic.

From class DfsSearchResult, method readTermStats.

public static TermStatistics[] readTermStats(StreamInput in, Term[] terms) throws IOException {
    int termsStatsSize = in.readVInt();
    final TermStatistics[] termStatistics;
    if (termsStatsSize == 0) {
        termStatistics = EMPTY_TERM_STATS;
    } else {
        termStatistics = new TermStatistics[termsStatsSize];
        assert terms.length == termsStatsSize;
        for (int i = 0; i < termStatistics.length; i++) {
            BytesRef term = terms[i].bytes();
            final long docFreq = in.readVLong();
            assert docFreq >= 0;
            // undo the +1 shift applied by addOne() when the statistic was written
            final long totalTermFreq = subOne(in.readVLong());
            termStatistics[i] = new TermStatistics(term, docFreq, totalTermFreq);
        }
    }
    return termStatistics;
}
Also used : TermStatistics(org.apache.lucene.search.TermStatistics) BytesRef(org.apache.lucene.util.BytesRef)
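
readTermStats does not read any term bytes from the stream: it takes them from the terms array it is given and pairs entry i with terms[i], so the writer and the reader must agree on the term order. The sketch below shows that positional pairing on its own. It is hypothetical: TermStat stands in for TermStatistics, fixed-width longs replace VLongs to keep it short, and where the original only asserts that the lengths match, the sketch throws.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch: per-term stats sent by position, without repeating the terms. */
final class PositionalTermStats {
    private PositionalTermStats() {}

    /** Simplified stand-in for TermStatistics; totalTermFreq == -1 means "not available". */
    static final class TermStat {
        final String term;
        final long docFreq;
        final long totalTermFreq;

        TermStat(String term, long docFreq, long totalTermFreq) {
            this.term = term;
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    /** Shared empty result, playing the role of EMPTY_TERM_STATS. */
    static final TermStat[] EMPTY = new TermStat[0];

    // Only the count and the numbers go on the wire; the term text is omitted.
    static byte[] write(TermStat[] stats) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(stats.length);
            for (TermStat s : stats) {
                out.writeLong(s.docFreq);
                out.writeLong(s.totalTermFreq); // fixed-width, so -1 needs no shift here
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Entry i is re-attached to terms[i], so both sides must use the same term order.
    static TermStat[] read(byte[] data, String[] terms) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int size = in.readInt();
            if (size == 0) {
                return EMPTY;
            }
            if (terms.length != size) {
                throw new IllegalStateException("expected " + terms.length + " entries, got " + size);
            }
            TermStat[] result = new TermStat[size];
            for (int i = 0; i < size; i++) {
                long docFreq = in.readLong();
                long totalTermFreq = in.readLong();
                result[i] = new TermStat(terms[i], docFreq, totalTermFreq);
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Sending only the numbers keeps the message small, at the cost of coupling the stats to an array the receiver already holds; an empty result reuses one shared array instead of allocating.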

Example 3 with TermStatistics

Use of org.apache.lucene.search.TermStatistics in project elasticsearch by elastic.

From class AggregatedDfs, method readFrom.

@Override
public void readFrom(StreamInput in) throws IOException {
    int size = in.readVInt();
    termStatistics = HppcMaps.newMap(size);
    for (int i = 0; i < size; i++) {
        Term term = new Term(in.readString(), in.readBytesRef());
        // arguments are evaluated left to right, matching the order writeTo() used
        TermStatistics stats = new TermStatistics(in.readBytesRef(), in.readVLong(),
                DfsSearchResult.subOne(in.readVLong()));
        termStatistics.put(term, stats);
    }
    fieldStatistics = DfsSearchResult.readFieldStats(in);
    maxDoc = in.readVLong();
}
Also used : Term(org.apache.lucene.index.Term) TermStatistics(org.apache.lucene.search.TermStatistics)
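
readFrom only works because it consumes exactly what writeTo in Example 1 produced, in the same order: the entry count, then per entry the field name, term bytes, stats term bytes, docFreq and shifted totalTermFreq, then the field statistics and maxDoc. The simplified round trip below demonstrates that symmetry with java.io.DataOutput and DataInput. MiniAggregatedDfs and TermKey are made-up names, field statistics are left out, and readFrom is a static factory here rather than an instance method.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical, simplified AggregatedDfs-like holder (not the Elasticsearch class). */
final class MiniAggregatedDfs {
    /** Stand-in for Lucene's Term: field name plus term text. */
    static final class TermKey {
        final String field;
        final String text;

        TermKey(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TermKey)) {
                return false;
            }
            TermKey other = (TermKey) o;
            return field.equals(other.field) && text.equals(other.text);
        }

        @Override
        public int hashCode() {
            return Objects.hash(field, text);
        }
    }

    /** Value layout: {docFreq, totalTermFreq}. */
    final Map<TermKey, long[]> termStats = new LinkedHashMap<>();
    long maxDoc;

    void writeTo(DataOutput out) throws IOException {
        out.writeInt(termStats.size());
        for (Map.Entry<TermKey, long[]> e : termStats.entrySet()) {
            out.writeUTF(e.getKey().field);
            out.writeUTF(e.getKey().text);
            out.writeLong(e.getValue()[0]);
            out.writeLong(e.getValue()[1]);
        }
        out.writeLong(maxDoc);
    }

    // Must consume fields in exactly the order writeTo produced them.
    static MiniAggregatedDfs readFrom(DataInput in) throws IOException {
        MiniAggregatedDfs dfs = new MiniAggregatedDfs();
        int size = in.readInt();
        for (int i = 0; i < size; i++) {
            // Java evaluates arguments left to right: field first, then text.
            TermKey key = new TermKey(in.readUTF(), in.readUTF());
            dfs.termStats.put(key, new long[] {in.readLong(), in.readLong()});
        }
        dfs.maxDoc = in.readLong();
        return dfs;
    }

    static MiniAggregatedDfs roundTrip(MiniAggregatedDfs dfs) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            dfs.writeTo(out);
            out.flush();
            return readFrom(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Like the original readFrom, the sketch relies on Java's guaranteed left-to-right evaluation of constructor arguments and array initializers; reordering any read in one method without the matching write in the other silently corrupts every later value.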

Example 4 with TermStatistics

Use of org.apache.lucene.search.TermStatistics in project elasticsearch by elastic.

From class SearchPhaseController, method aggregateDfs.

public AggregatedDfs aggregateDfs(AtomicArray<DfsSearchResult> results) {
    ObjectObjectHashMap<Term, TermStatistics> termStatistics = HppcMaps.newNoNullKeysMap();
    ObjectObjectHashMap<String, CollectionStatistics> fieldStatistics = HppcMaps.newNoNullKeysMap();
    long aggMaxDoc = 0;
    for (AtomicArray.Entry<DfsSearchResult> lEntry : results.asList()) {
        final Term[] terms = lEntry.value.terms();
        final TermStatistics[] stats = lEntry.value.termStatistics();
        assert terms.length == stats.length;
        for (int i = 0; i < terms.length; i++) {
            assert terms[i] != null;
            TermStatistics existing = termStatistics.get(terms[i]);
            if (existing != null) {
                assert terms[i].bytes().equals(existing.term());
                // totalTermFreq is optional: -1 means "not present". If either side is -1,
                // optionalSum() keeps the merged value at -1.
                termStatistics.put(terms[i], new TermStatistics(existing.term(),
                        existing.docFreq() + stats[i].docFreq(),
                        optionalSum(existing.totalTermFreq(), stats[i].totalTermFreq())));
            } else {
                termStatistics.put(terms[i], stats[i]);
            }
        }
        assert !lEntry.value.fieldStatistics().containsKey(null);
        final Object[] keys = lEntry.value.fieldStatistics().keys;
        final Object[] values = lEntry.value.fieldStatistics().values;
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] != null) {
                String key = (String) keys[i];
                CollectionStatistics value = (CollectionStatistics) values[i];
                assert key != null;
                CollectionStatistics existing = fieldStatistics.get(key);
                if (existing != null) {
                    CollectionStatistics merged = new CollectionStatistics(key,
                            existing.maxDoc() + value.maxDoc(),
                            optionalSum(existing.docCount(), value.docCount()),
                            optionalSum(existing.sumTotalTermFreq(), value.sumTotalTermFreq()),
                            optionalSum(existing.sumDocFreq(), value.sumDocFreq()));
                    fieldStatistics.put(key, merged);
                } else {
                    fieldStatistics.put(key, value);
                }
            }
        }
        aggMaxDoc += lEntry.value.maxDoc();
    }
    return new AggregatedDfs(termStatistics, fieldStatistics, aggMaxDoc);
}
Also used : AtomicArray(org.elasticsearch.common.util.concurrent.AtomicArray) DfsSearchResult(org.elasticsearch.search.dfs.DfsSearchResult) Term(org.apache.lucene.index.Term) TermStatistics(org.apache.lucene.search.TermStatistics) CollectionStatistics(org.apache.lucene.search.CollectionStatistics) AggregatedDfs(org.elasticsearch.search.dfs.AggregatedDfs)
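
The body of optionalSum is not part of this listing. Going by the comment in the code (-1 means "not present" and must win), a minimal version is Math.min(left, right) == -1 ? -1 : left + right. The sketch below assumes that definition and replays both merges, per term and per field, with plain HashMaps instead of HPPC maps. DfsMerge, TermStat, FieldStat and the "field:term" string keys are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the merge in aggregateDfs, using plain maps instead of HPPC. */
final class DfsMerge {
    private DfsMerge() {}

    /** Per-term numbers; totalTermFreq == -1 means "not available". */
    static final class TermStat {
        final long docFreq;
        final long totalTermFreq;

        TermStat(long docFreq, long totalTermFreq) {
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    /** Per-field numbers; docCount and the two sums are treated as optional, as in aggregateDfs. */
    static final class FieldStat {
        final long maxDoc;
        final long docCount;
        final long sumTotalTermFreq;
        final long sumDocFreq;

        FieldStat(long maxDoc, long docCount, long sumTotalTermFreq, long sumDocFreq) {
            this.maxDoc = maxDoc;
            this.docCount = docCount;
            this.sumTotalTermFreq = sumTotalTermFreq;
            this.sumDocFreq = sumDocFreq;
        }
    }

    // Assumed definition: -1 on either side wins, so one missing shard value
    // makes the merged statistic unavailable.
    static long optionalSum(long left, long right) {
        return Math.min(left, right) == -1 ? -1 : left + right;
    }

    static Map<String, TermStat> mergeTermStats(List<Map<String, TermStat>> shards) {
        Map<String, TermStat> merged = new HashMap<>();
        for (Map<String, TermStat> shard : shards) {
            for (Map.Entry<String, TermStat> e : shard.entrySet()) {
                // docFreq is always present and simply added; totalTermFreq is optional.
                merged.merge(e.getKey(), e.getValue(), (a, b) ->
                        new TermStat(a.docFreq + b.docFreq, optionalSum(a.totalTermFreq, b.totalTermFreq)));
            }
        }
        return merged;
    }

    static Map<String, FieldStat> mergeFieldStats(List<Map<String, FieldStat>> shards) {
        Map<String, FieldStat> merged = new HashMap<>();
        for (Map<String, FieldStat> shard : shards) {
            for (Map.Entry<String, FieldStat> e : shard.entrySet()) {
                // maxDoc is added directly; the other three go through optionalSum.
                merged.merge(e.getKey(), e.getValue(), (a, b) -> new FieldStat(
                        a.maxDoc + b.maxDoc,
                        optionalSum(a.docCount, b.docCount),
                        optionalSum(a.sumTotalTermFreq, b.sumTotalTermFreq),
                        optionalSum(a.sumDocFreq, b.sumDocFreq)));
            }
        }
        return merged;
    }
}
```

For example, merging docFreq 2 / totalTermFreq 5 from one shard with 3 / -1 from another yields 5 / -1: the document frequency stays exact, but the total term frequency is reported as unavailable.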

Example 5 with TermStatistics

Use of org.apache.lucene.search.TermStatistics in project lucene-solr by apache.

From class TestMemoryIndex, method testSimilarities.

@Test
public void testSimilarities() throws IOException {
    MemoryIndex mi = new MemoryIndex();
    mi.addField("f1", "a long text field that contains many many terms", analyzer);
    IndexSearcher searcher = mi.createSearcher();
    LeafReader reader = (LeafReader) searcher.getIndexReader();
    NumericDocValues norms = reader.getNormValues("f1");
    assertEquals(0, norms.nextDoc());
    float n1 = norms.longValue();
    // Norms are re-computed when we change the Similarity
    mi.setSimilarity(new Similarity() {

        @Override
        public long computeNorm(FieldInvertState state) {
            return 74;
        }

        @Override
        public SimWeight computeWeight(float boost, CollectionStatistics collectionStats, TermStatistics... termStats) {
            throw new UnsupportedOperationException();
        }

        @Override
        public SimScorer simScorer(SimWeight weight, LeafReaderContext context) throws IOException {
            throw new UnsupportedOperationException();
        }
    });
    norms = reader.getNormValues("f1");
    assertEquals(0, norms.nextDoc());
    float n2 = norms.longValue();
    assertTrue(n1 != n2);
    TestUtil.checkReader(reader);
}
Also used : IndexSearcher(org.apache.lucene.search.IndexSearcher) SortedNumericDocValues(org.apache.lucene.index.SortedNumericDocValues) NumericDocValues(org.apache.lucene.index.NumericDocValues) LeafReader(org.apache.lucene.index.LeafReader) ClassicSimilarity(org.apache.lucene.search.similarities.ClassicSimilarity) BM25Similarity(org.apache.lucene.search.similarities.BM25Similarity) Similarity(org.apache.lucene.search.similarities.Similarity) IOException(java.io.IOException) TermStatistics(org.apache.lucene.search.TermStatistics) CollectionStatistics(org.apache.lucene.search.CollectionStatistics) FieldInvertState(org.apache.lucene.index.FieldInvertState) LeafReaderContext(org.apache.lucene.index.LeafReaderContext) Test(org.junit.Test)
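
The test only reads norms, so the computeWeight and simScorer stubs are never called. In a scoring Similarity, computeWeight is where the TermStatistics... argument is normally consumed, for example to turn docFreq into an idf factor. The sketch below computes the BM25 idf, ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), the form used by Lucene's BM25Similarity, from plain numbers so it runs without Lucene. Bm25Idf is a hypothetical helper, and summing the per-term values is just one simple way to handle the varargs case.

```java
/** Hypothetical helper: BM25-style idf from the numbers TermStatistics/CollectionStatistics carry. */
final class Bm25Idf {
    private Bm25Idf() {}

    // idf = ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)); always positive.
    static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Several terms (the TermStatistics... varargs case): add up the per-term idf values.
    static double idfSum(long docCount, long... docFreqs) {
        double sum = 0;
        for (long df : docFreqs) {
            sum += idf(df, docCount);
        }
        return sum;
    }
}
```

A rare term (docFreq 1 out of 100 documents) gets a much larger idf than a common one (docFreq 50), and the "1 +" inside the logarithm keeps the value positive even when a term appears in every document.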

Aggregations

TermStatistics (org.apache.lucene.search.TermStatistics): 15
Term (org.apache.lucene.index.Term): 7
CollectionStatistics (org.apache.lucene.search.CollectionStatistics): 6
ArrayList (java.util.ArrayList): 4
Explanation (org.apache.lucene.search.Explanation): 4
BytesRef (org.apache.lucene.util.BytesRef): 4
IOException (java.io.IOException): 3
TermContext (org.apache.lucene.index.TermContext): 3
IndexReaderContext (org.apache.lucene.index.IndexReaderContext): 2
LeafReaderContext (org.apache.lucene.index.LeafReaderContext): 2
PostingsEnum (org.apache.lucene.index.PostingsEnum): 2
Terms (org.apache.lucene.index.Terms): 2
TermsEnum (org.apache.lucene.index.TermsEnum): 2
IndexSearcher (org.apache.lucene.search.IndexSearcher): 2
Similarity (org.apache.lucene.search.similarities.Similarity): 2
SimScorer (org.apache.lucene.search.similarities.Similarity.SimScorer): 2
SimWeight (org.apache.lucene.search.similarities.Similarity.SimWeight): 2
ObjectHashSet (com.carrotsearch.hppc.ObjectHashSet): 1
UncheckedIOException (java.io.UncheckedIOException): 1
HashMap (java.util.HashMap): 1