Example 11 with TermIterator

Use of org.apache.lucene.index.PrefixCodedTerms.TermIterator in project lucene-solr by apache.

From class PointInSetQuery, the method toString:

@Override
public final String toString(String field) {
    final StringBuilder sb = new StringBuilder();
    if (this.field.equals(field) == false) {
        sb.append(this.field);
        sb.append(':');
    }
    sb.append("{");
    TermIterator iterator = sortedPackedPoints.iterator();
    byte[] pointBytes = new byte[numDims * bytesPerDim];
    boolean first = true;
    for (BytesRef point = iterator.next(); point != null; point = iterator.next()) {
        if (first == false) {
            sb.append(" ");
        }
        first = false;
        // copy the packed point out of the iterator's shared buffer before decoding it
        System.arraycopy(point.bytes, point.offset, pointBytes, 0, pointBytes.length);
        sb.append(toString(pointBytes));
    }
    sb.append("}");
    return sb.toString();
}
Also used : TermIterator(org.apache.lucene.index.PrefixCodedTerms.TermIterator) BytesRef(org.apache.lucene.util.BytesRef)
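
A minimal usage sketch, assuming a hypothetical int field "count": IntPoint.newSetQuery builds a one-dimensional PointInSetQuery whose protected toString(byte[]) decodes each packed point, so printing the query exercises exactly the loop above.

import org.apache.lucene.document.IntPoint;
import org.apache.lucene.search.Query;

public class PointInSetToStringDemo {
    public static void main(String[] args) {
        Query q = IntPoint.newSetQuery("count", 17, 42, 97);
        // Query.toString() delegates to toString(""); since "count" differs from
        // the default field, the field name is prepended: count:{17 42 97}
        System.out.println(q);
    }
}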

Example 12 with TermIterator

Use of org.apache.lucene.index.PrefixCodedTerms.TermIterator in project lucene-solr by apache.

From class PointInSetQuery, the method createWeight:

@Override
public final Weight createWeight(IndexSearcher searcher, boolean needsScores, float boost) throws IOException {
    return new ConstantScoreWeight(this, boost) {

        @Override
        public Scorer scorer(LeafReaderContext context) throws IOException {
            LeafReader reader = context.reader();
            PointValues values = reader.getPointValues(field);
            if (values == null) {
                // No docs in this segment/field indexed any points
                return null;
            }
            if (values.getNumDimensions() != numDims) {
                throw new IllegalArgumentException("field=\"" + field + "\" was indexed with numDims=" + values.getNumDimensions() + " but this query has numDims=" + numDims);
            }
            if (values.getBytesPerDimension() != bytesPerDim) {
                throw new IllegalArgumentException("field=\"" + field + "\" was indexed with bytesPerDim=" + values.getBytesPerDimension() + " but this query has bytesPerDim=" + bytesPerDim);
            }
            DocIdSetBuilder result = new DocIdSetBuilder(reader.maxDoc(), values, field);
            if (numDims == 1) {
                // We optimize this common case, effectively doing a merge sort of the indexed values vs the queried set:
                values.intersect(new MergePointVisitor(sortedPackedPoints, result));
            } else {
                // NOTE: this is naive implementation, where for each point we re-walk the KD tree to intersect.  We could instead do a similar
                // optimization as the 1D case, but I think it'd mean building a query-time KD tree so we could efficiently intersect against the
                // index, which is probably tricky!
                SinglePointVisitor visitor = new SinglePointVisitor(result);
                TermIterator iterator = sortedPackedPoints.iterator();
                for (BytesRef point = iterator.next(); point != null; point = iterator.next()) {
                    visitor.setPoint(point);
                    values.intersect(visitor);
                }
            }
            return new ConstantScoreScorer(this, score(), result.build().iterator());
        }
    };
}
Also used : PointValues(org.apache.lucene.index.PointValues) LeafReader(org.apache.lucene.index.LeafReader) TermIterator(org.apache.lucene.index.PrefixCodedTerms.TermIterator) LeafReaderContext(org.apache.lucene.index.LeafReaderContext) DocIdSetBuilder(org.apache.lucene.util.DocIdSetBuilder) BytesRef(org.apache.lucene.util.BytesRef)
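
For context, a short end-to-end sketch (the field name and values are invented for illustration): with numDims == 1 the Weight above takes the MergePointVisitor branch, intersecting the BKD tree once against the sorted point set instead of re-walking it per point.

import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class PointInSetSearchDemo {
    public static void main(String[] args) throws Exception {
        Directory dir = new RAMDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            for (int i = 0; i < 100; i++) {
                Document doc = new Document();
                doc.add(new IntPoint("count", i)); // one 1D point per document
                writer.addDocument(doc);
            }
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // createWeight above is reached through IndexSearcher.count/search
            System.out.println(searcher.count(IntPoint.newSetQuery("count", 17, 42, 97))); // 3
        }
    }
}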

Example 13 with TermIterator

Use of org.apache.lucene.index.PrefixCodedTerms.TermIterator in project lucene-solr by apache.

From class TermInSetQuery, the method toString:

@Override
public String toString(String defaultField) {
    StringBuilder builder = new StringBuilder();
    boolean first = true;
    TermIterator iterator = termData.iterator();
    for (BytesRef term = iterator.next(); term != null; term = iterator.next()) {
        if (!first) {
            builder.append(' ');
        }
        first = false;
        builder.append(new Term(iterator.field(), term).toString());
    }
    return builder.toString();
}
Also used : TermIterator(org.apache.lucene.index.PrefixCodedTerms.TermIterator) Term(org.apache.lucene.index.Term) BytesRef(org.apache.lucene.util.BytesRef)
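
A quick usage sketch (the field and terms are hypothetical): the terms are prefix-coded in sorted order, so the TermIterator, and therefore toString, emits them in byte order rather than insertion order.

import org.apache.lucene.search.TermInSetQuery;
import org.apache.lucene.util.BytesRef;

public class TermInSetToStringDemo {
    public static void main(String[] args) {
        TermInSetQuery q = new TermInSetQuery("color",
                new BytesRef("red"), new BytesRef("blue"), new BytesRef("green"));
        // Term.toString() renders each entry as field:text
        System.out.println(q); // color:blue color:green color:red
    }
}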

Example 14 with TermIterator

Use of org.apache.lucene.index.PrefixCodedTerms.TermIterator in project lucene-solr by apache.

From class TermInSetQuery, the method createWeight:

@Override
public Weight createWeight(IndexSearcher searcher, boolean needsScores, float boost) throws IOException {
    return new ConstantScoreWeight(this, boost) {

        @Override
        public void extractTerms(Set<Term> terms) {
        // no-op
        // This query is for abuse cases when the number of terms is too high to
        // run efficiently as a BooleanQuery. So likewise we hide its terms in
        // order to protect highlighters
        }

        /**
         * On the given leaf context, try to either rewrite to a disjunction if
         * there are few matching terms, or build a bitset containing matching docs.
         */
        private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
            final LeafReader reader = context.reader();
            final Fields fields = reader.fields();
            Terms terms = fields.terms(field);
            if (terms == null) {
                return null;
            }
            TermsEnum termsEnum = terms.iterator();
            PostingsEnum docs = null;
            TermIterator iterator = termData.iterator();
            // We will first try to collect up to 'threshold' terms into 'matchingTerms';
            // if there are too many terms, we will fall back to building the 'builder'
            final int threshold = Math.min(BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD, BooleanQuery.getMaxClauseCount());
            assert termData.size() > threshold : "Query should have been rewritten";
            List<TermAndState> matchingTerms = new ArrayList<>(threshold);
            DocIdSetBuilder builder = null;
            for (BytesRef term = iterator.next(); term != null; term = iterator.next()) {
                assert field.equals(iterator.field());
                if (termsEnum.seekExact(term)) {
                    if (matchingTerms == null) {
                        docs = termsEnum.postings(docs, PostingsEnum.NONE);
                        builder.add(docs);
                    } else if (matchingTerms.size() < threshold) {
                        matchingTerms.add(new TermAndState(field, termsEnum));
                    } else {
                        assert matchingTerms.size() == threshold;
                        builder = new DocIdSetBuilder(reader.maxDoc(), terms);
                        docs = termsEnum.postings(docs, PostingsEnum.NONE);
                        builder.add(docs);
                        for (TermAndState t : matchingTerms) {
                            t.termsEnum.seekExact(t.term, t.state);
                            docs = t.termsEnum.postings(docs, PostingsEnum.NONE);
                            builder.add(docs);
                        }
                        matchingTerms = null;
                    }
                }
            }
            if (matchingTerms != null) {
                assert builder == null;
                BooleanQuery.Builder bq = new BooleanQuery.Builder();
                for (TermAndState t : matchingTerms) {
                    final TermContext termContext = new TermContext(searcher.getTopReaderContext());
                    termContext.register(t.state, context.ord, t.docFreq, t.totalTermFreq);
                    bq.add(new TermQuery(new Term(t.field, t.term), termContext), Occur.SHOULD);
                }
                Query q = new ConstantScoreQuery(bq.build());
                final Weight weight = searcher.rewrite(q).createWeight(searcher, needsScores, score());
                return new WeightOrDocIdSet(weight);
            } else {
                assert builder != null;
                return new WeightOrDocIdSet(builder.build());
            }
        }

        private Scorer scorer(DocIdSet set) throws IOException {
            if (set == null) {
                return null;
            }
            final DocIdSetIterator disi = set.iterator();
            if (disi == null) {
                return null;
            }
            return new ConstantScoreScorer(this, score(), disi);
        }

        @Override
        public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
            final WeightOrDocIdSet weightOrBitSet = rewrite(context);
            if (weightOrBitSet == null) {
                return null;
            } else if (weightOrBitSet.weight != null) {
                return weightOrBitSet.weight.bulkScorer(context);
            } else {
                final Scorer scorer = scorer(weightOrBitSet.set);
                if (scorer == null) {
                    return null;
                }
                return new DefaultBulkScorer(scorer);
            }
        }

        @Override
        public Scorer scorer(LeafReaderContext context) throws IOException {
            final WeightOrDocIdSet weightOrBitSet = rewrite(context);
            if (weightOrBitSet == null) {
                return null;
            } else if (weightOrBitSet.weight != null) {
                return weightOrBitSet.weight.scorer(context);
            } else {
                return scorer(weightOrBitSet.set);
            }
        }
    };
}
Also used : SortedSet(java.util.SortedSet) Set(java.util.Set) DocIdSetBuilder(org.apache.lucene.util.DocIdSetBuilder) BytesRefBuilder(org.apache.lucene.util.BytesRefBuilder) ArrayList(java.util.ArrayList) TermContext(org.apache.lucene.index.TermContext) TermsEnum(org.apache.lucene.index.TermsEnum) LeafReaderContext(org.apache.lucene.index.LeafReaderContext) PostingsEnum(org.apache.lucene.index.PostingsEnum) BytesRef(org.apache.lucene.util.BytesRef) LeafReader(org.apache.lucene.index.LeafReader) PrefixCodedTerms(org.apache.lucene.index.PrefixCodedTerms) Terms(org.apache.lucene.index.Terms) TermIterator(org.apache.lucene.index.PrefixCodedTerms.TermIterator) Term(org.apache.lucene.index.Term) Fields(org.apache.lucene.index.Fields)
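
A small search sketch (the index contents are invented). Note the assert above: for a set this small, TermInSetQuery.rewrite() converts the query into a constant-scoring BooleanQuery before createWeight is ever reached, so the per-segment rewrite in this Weight only runs for larger sets, where a segment matching just a few of the terms still gets the cheap disjunction path.

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermInSetQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;

public class TermInSetSearchDemo {
    public static void main(String[] args) throws Exception {
        Directory dir = new RAMDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            for (String color : new String[] { "red", "green", "blue", "cyan" }) {
                Document doc = new Document();
                doc.add(new StringField("color", color, Store.NO));
                writer.addDocument(doc);
            }
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TermInSetQuery q = new TermInSetQuery("color",
                    new BytesRef("red"), new BytesRef("blue"));
            System.out.println(searcher.count(q)); // 2
        }
    }
}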

Aggregations

TermIterator (org.apache.lucene.index.PrefixCodedTerms.TermIterator): 14
BytesRef (org.apache.lucene.util.BytesRef): 8
Term (org.apache.lucene.index.Term): 4
BytesRefBuilder (org.apache.lucene.util.BytesRefBuilder): 4
LeafReaderContext (org.apache.lucene.index.LeafReaderContext): 3
DocIdSetBuilder (org.apache.lucene.util.DocIdSetBuilder): 3
HashSet (java.util.HashSet): 2
AtomicInteger (java.util.concurrent.atomic.AtomicInteger): 2
DeleteSlice (org.apache.lucene.index.DocumentsWriterDeleteQueue.DeleteSlice): 2
LeafReader (org.apache.lucene.index.LeafReader): 2
IOException (java.io.IOException): 1
ArrayList (java.util.ArrayList): 1
Set (java.util.Set): 1
SortedSet (java.util.SortedSet): 1
TreeSet (java.util.TreeSet): 1
CountDownLatch (java.util.concurrent.CountDownLatch): 1
Fields (org.apache.lucene.index.Fields): 1
PointValues (org.apache.lucene.index.PointValues): 1
PostingsEnum (org.apache.lucene.index.PostingsEnum): 1
PrefixCodedTerms (org.apache.lucene.index.PrefixCodedTerms): 1