Example 1 with BKDReader

Use of org.apache.lucene.util.bkd.BKDReader in the lucene-solr project by Apache.

Class LatLonPoint, method nearest.

/**
   * Finds the {@code n} nearest indexed points to the provided point, according to Haversine distance.
   * <p>
   * This is functionally equivalent to running {@link MatchAllDocsQuery} with a {@link LatLonDocValuesField#newDistanceSort},
   * but is far more efficient since it takes advantage of properties of the indexed BKD tree.  Currently this
   * only works with {@link Lucene60PointsFormat} (used by the default codec).  Multi-valued fields are
   * currently not de-duplicated, so if a document had multiple instances of the specified field that
   * make it into the top n, that document will appear more than once.
   * <p>
   * Documents are ordered by ascending distance from the location. The value returned in {@link FieldDoc} for
   * the hits contains a Double instance with the distance in meters.
   * 
   * @param searcher IndexSearcher to find nearest points from.
   * @param field field name. must not be null.
   * @param latitude latitude at the center: must be within standard +/-90 coordinate bounds.
   * @param longitude longitude at the center: must be within standard +/-180 coordinate bounds.
   * @param n the number of nearest neighbors to retrieve.
   * @return TopFieldDocs containing documents ordered by distance, where the field value for each {@link FieldDoc} is the distance in meters
   * @throws IllegalArgumentException if the underlying PointValues is not a {@code Lucene60PointsReader} (this is a current limitation), or
   *         if {@code field} or {@code searcher} is null, or if {@code latitude}, {@code longitude} or {@code n} are out-of-bounds
   * @throws IOException if an IOException occurs while finding the points.
   */
// TODO: what about multi-valued documents? what happens?
public static TopFieldDocs nearest(IndexSearcher searcher, String field, double latitude, double longitude, int n) throws IOException {
    GeoUtils.checkLatitude(latitude);
    GeoUtils.checkLongitude(longitude);
    if (n < 1) {
        throw new IllegalArgumentException("n must be at least 1; got " + n);
    }
    if (field == null) {
        throw new IllegalArgumentException("field must not be null");
    }
    if (searcher == null) {
        throw new IllegalArgumentException("searcher must not be null");
    }
    List<BKDReader> readers = new ArrayList<>();
    List<Integer> docBases = new ArrayList<>();
    List<Bits> liveDocs = new ArrayList<>();
    int totalHits = 0;
    for (LeafReaderContext leaf : searcher.getIndexReader().leaves()) {
        PointValues points = leaf.reader().getPointValues(field);
        if (points != null) {
            if (points instanceof BKDReader == false) {
                throw new IllegalArgumentException("can only run on Lucene60PointsReader points implementation, but got " + points);
            }
            totalHits += points.getDocCount();
            BKDReader reader = (BKDReader) points;
            if (reader != null) {
                readers.add(reader);
                docBases.add(leaf.docBase);
                liveDocs.add(leaf.reader().getLiveDocs());
            }
        }
    }
    NearestNeighbor.NearestHit[] hits = NearestNeighbor.nearest(latitude, longitude, readers, liveDocs, docBases, n);
    // Convert to TopFieldDocs:
    ScoreDoc[] scoreDocs = new ScoreDoc[hits.length];
    for (int i = 0; i < hits.length; i++) {
        NearestNeighbor.NearestHit hit = hits[i];
        scoreDocs[i] = new FieldDoc(hit.docID, 0.0f, new Object[] { Double.valueOf(hit.distanceMeters) });
    }
    return new TopFieldDocs(totalHits, scoreDocs, null, 0.0f);
}
Also used: FieldDoc (org.apache.lucene.search.FieldDoc), ArrayList (java.util.ArrayList), TopFieldDocs (org.apache.lucene.search.TopFieldDocs), ScoreDoc (org.apache.lucene.search.ScoreDoc), BKDReader (org.apache.lucene.util.bkd.BKDReader), PointValues (org.apache.lucene.index.PointValues), Bits (org.apache.lucene.util.Bits), LeafReaderContext (org.apache.lucene.index.LeafReaderContext)
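
The Javadoc above orders hits by Haversine distance in meters. As a rough illustration of that measure only, here is a minimal stand-alone sketch. The class name HaversineSketch, the method haversineMeters and the Earth radius constant are assumptions made for this example; this is not the distance code used by LatLonPoint or NearestNeighbor.

```java
final class HaversineSketch {
    // Assumed mean Earth radius in meters (hypothetical constant for this sketch).
    static final double EARTH_RADIUS_METERS = 6_371_008.7714;

    // Great-circle distance between two lat/lon points given in degrees.
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double sinHalfDLat = Math.sin(Math.toRadians(lat2 - lat1) / 2);
        double sinHalfDLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinHalfDLat * sinHalfDLat
                + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLon * sinHalfDLon;
        // Clamp to 1.0 so rounding error cannot push asin outside its domain.
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    public static void main(String[] args) {
        // Approximate distance between central Paris and central London.
        System.out.println(haversineMeters(48.8566, 2.3522, 51.5074, -0.1278));
    }
}
```

The value returned here plays the same role as the Double stored in each FieldDoc: a smaller number means a closer hit.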

Example 2 with BKDReader

Use of org.apache.lucene.util.bkd.BKDReader in the lucene-solr project by Apache.

Class NearestNeighbor, method nearest.

// TODO: can we somehow share more with, or simply directly use, the LatLonPointDistanceComparator?  It's really doing the same thing as
// our hitQueue...
public static NearestHit[] nearest(double pointLat, double pointLon, List<BKDReader> readers, List<Bits> liveDocs, List<Integer> docBases, final int n) throws IOException {
    //System.out.println("NEAREST: readers=" + readers + " liveDocs=" + liveDocs + " pointLat=" + pointLat + " pointLon=" + pointLon);
    // Holds closest collected points seen so far:
    // TODO: if we used lucene's PQ we could just updateTop instead of poll/offer:
    final PriorityQueue<NearestHit> hitQueue = new PriorityQueue<>(n, new Comparator<NearestHit>() {

        @Override
        public int compare(NearestHit a, NearestHit b) {
            // sort by opposite distanceMeters natural order
            int cmp = Double.compare(a.distanceMeters, b.distanceMeters);
            if (cmp != 0) {
                return -cmp;
            }
            // tie-break by higher docID:
            return b.docID - a.docID;
        }
    });
    // Holds all cells, sorted by closest to the point:
    PriorityQueue<Cell> cellQueue = new PriorityQueue<>();
    NearestVisitor visitor = new NearestVisitor(hitQueue, n, pointLat, pointLon);
    List<BKDReader.IntersectState> states = new ArrayList<>();
    // Add root cell for each reader into the queue:
    int bytesPerDim = -1;
    for (int i = 0; i < readers.size(); i++) {
        BKDReader reader = readers.get(i);
        if (bytesPerDim == -1) {
            bytesPerDim = reader.getBytesPerDimension();
        } else if (bytesPerDim != reader.getBytesPerDimension()) {
            throw new IllegalStateException("bytesPerDim changed from " + bytesPerDim + " to " + reader.getBytesPerDimension() + " across readers");
        }
        byte[] minPackedValue = reader.getMinPackedValue();
        byte[] maxPackedValue = reader.getMaxPackedValue();
        IntersectState state = reader.getIntersectState(visitor);
        states.add(state);
        cellQueue.offer(new Cell(state.index, i, reader.getMinPackedValue(), reader.getMaxPackedValue(), approxBestDistance(minPackedValue, maxPackedValue, pointLat, pointLon)));
    }
    while (cellQueue.size() > 0) {
        Cell cell = cellQueue.poll();
        //System.out.println("  visit " + cell);
        // TODO: if we replace approxBestDistance with actualBestDistance, we can put an opto here to break once this "best" cell is fully outside of the hitQueue bottom's radius:
        BKDReader reader = readers.get(cell.readerIndex);
        if (cell.index.isLeafNode()) {
            //System.out.println("    leaf");
            // Leaf block: visit all points and possibly collect them:
            visitor.curDocBase = docBases.get(cell.readerIndex);
            visitor.curLiveDocs = liveDocs.get(cell.readerIndex);
            reader.visitLeafBlockValues(cell.index, states.get(cell.readerIndex));
            //System.out.println("    now " + hitQueue.size() + " hits");
        } else {
            //System.out.println("    non-leaf");
            // Non-leaf block: split into two cells and put them back into the queue:
            double cellMinLat = decodeLatitude(cell.minPacked, 0);
            double cellMinLon = decodeLongitude(cell.minPacked, Integer.BYTES);
            double cellMaxLat = decodeLatitude(cell.maxPacked, 0);
            double cellMaxLon = decodeLongitude(cell.maxPacked, Integer.BYTES);
            if (cellMaxLat < visitor.minLat || visitor.maxLat < cellMinLat || ((cellMaxLon < visitor.minLon || visitor.maxLon < cellMinLon) && cellMaxLon < visitor.minLon2)) {
                // this cell is outside our search bbox; don't bother exploring any more
                continue;
            }
            BytesRef splitValue = BytesRef.deepCopyOf(cell.index.getSplitDimValue());
            int splitDim = cell.index.getSplitDim();
            // we must clone the index so that we can recurse left and right "concurrently":
            IndexTree newIndex = cell.index.clone();
            byte[] splitPackedValue = cell.maxPacked.clone();
            System.arraycopy(splitValue.bytes, splitValue.offset, splitPackedValue, splitDim * bytesPerDim, bytesPerDim);
            cell.index.pushLeft();
            cellQueue.offer(new Cell(cell.index, cell.readerIndex, cell.minPacked, splitPackedValue, approxBestDistance(cell.minPacked, splitPackedValue, pointLat, pointLon)));
            splitPackedValue = cell.minPacked.clone();
            System.arraycopy(splitValue.bytes, splitValue.offset, splitPackedValue, splitDim * bytesPerDim, bytesPerDim);
            newIndex.pushRight();
            cellQueue.offer(new Cell(newIndex, cell.readerIndex, splitPackedValue, cell.maxPacked, approxBestDistance(splitPackedValue, cell.maxPacked, pointLat, pointLon)));
        }
    }
    NearestHit[] hits = new NearestHit[hitQueue.size()];
    int downTo = hitQueue.size() - 1;
    while (hitQueue.size() != 0) {
        hits[downTo] = hitQueue.poll();
        downTo--;
    }
    return hits;
}
Also used: IndexTree (org.apache.lucene.util.bkd.BKDReader.IndexTree), IntersectState (org.apache.lucene.util.bkd.BKDReader.IntersectState), ArrayList (java.util.ArrayList), PriorityQueue (java.util.PriorityQueue), BKDReader (org.apache.lucene.util.bkd.BKDReader), BytesRef (org.apache.lucene.util.BytesRef)
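
The hitQueue in the method above keeps its worst hit at the head (reverse distance order, ties broken by higher docID), so the head can be evicted when a closer point arrives, and the final loop drains the queue back to front to get ascending distance. The sketch below shows that pattern in isolation using only java.util. The names TopNNearestSketch, Hit and WORST_FIRST are hypothetical stand-ins, and the replace-if-better rule is an assumption about how a collector would use such a queue, not the NearestVisitor code.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

final class TopNNearestSketch {
    // Hypothetical stand-in for NearestHit: a doc ID plus its distance.
    static final class Hit {
        final int docID;
        final double distanceMeters;

        Hit(int docID, double distanceMeters) {
            this.docID = docID;
            this.distanceMeters = distanceMeters;
        }
    }

    // Farthest hit sorts first; on equal distance the higher docID sorts first.
    static final Comparator<Hit> WORST_FIRST = (a, b) -> {
        int cmp = Double.compare(a.distanceMeters, b.distanceMeters);
        if (cmp != 0) {
            return -cmp;
        }
        return Integer.compare(b.docID, a.docID);
    };

    // Returns up to n hits ordered by ascending distance (ties: ascending docID).
    static Hit[] nearest(int[] docIDs, double[] distances, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1; got " + n);
        }
        PriorityQueue<Hit> queue = new PriorityQueue<>(n, WORST_FIRST);
        for (int i = 0; i < docIDs.length; i++) {
            Hit candidate = new Hit(docIDs[i], distances[i]);
            if (queue.size() < n) {
                queue.offer(candidate);
            } else if (WORST_FIRST.compare(candidate, queue.peek()) > 0) {
                // Candidate beats the current worst hit: swap it in.
                queue.poll();
                queue.offer(candidate);
            }
        }
        // Drain worst-first while filling the array from the back.
        Hit[] hits = new Hit[queue.size()];
        for (int downTo = hits.length - 1; downTo >= 0; downTo--) {
            hits[downTo] = queue.poll();
        }
        return hits;
    }

    public static void main(String[] args) {
        Hit[] hits = nearest(new int[] {5, 1, 9, 3, 7},
                             new double[] {30.0, 10.0, 10.0, 50.0, 20.0}, 3);
        for (Hit hit : hits) {
            System.out.println(hit.docID + " " + hit.distanceMeters);
        }
    }
}
```

Keeping the worst hit at the head means each incoming point needs only one comparison against the head before any heap update, and the queue never holds more than n entries.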

Example 3 with BKDReader

Use of org.apache.lucene.util.bkd.BKDReader in the lucene-solr project by Apache.

Class Lucene60PointsWriter, method merge.

@Override
public void merge(MergeState mergeState) throws IOException {
    /**
     * If indexSort is activated and some of the leaves are not sorted the next test will catch that and the non-optimized merge will run.
     * If the readers are all sorted then it's safe to perform a bulk merge of the points.
     **/
    for (PointsReader reader : mergeState.pointsReaders) {
        if (reader instanceof Lucene60PointsReader == false) {
            // We can only bulk merge when all to-be-merged segments use our format:
            super.merge(mergeState);
            return;
        }
    }
    for (PointsReader reader : mergeState.pointsReaders) {
        if (reader != null) {
            reader.checkIntegrity();
        }
    }
    for (FieldInfo fieldInfo : mergeState.mergeFieldInfos) {
        if (fieldInfo.getPointDimensionCount() != 0) {
            if (fieldInfo.getPointDimensionCount() == 1) {
                boolean singleValuePerDoc = true;
                // Worst case total maximum size (if none of the points are deleted):
                long totMaxSize = 0;
                for (int i = 0; i < mergeState.pointsReaders.length; i++) {
                    PointsReader reader = mergeState.pointsReaders[i];
                    if (reader != null) {
                        FieldInfos readerFieldInfos = mergeState.fieldInfos[i];
                        FieldInfo readerFieldInfo = readerFieldInfos.fieldInfo(fieldInfo.name);
                        if (readerFieldInfo != null && readerFieldInfo.getPointDimensionCount() > 0) {
                            PointValues values = reader.getValues(fieldInfo.name);
                            if (values != null) {
                                totMaxSize += values.size();
                                singleValuePerDoc &= values.size() == values.getDocCount();
                            }
                        }
                    }
                }
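                // Optimize the 1D case to use BKDWriter.merge, which does a single merge sort of the
                // already sorted incoming segments, instead of trying to sort all points again as if
                // we were simply reindexing them: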
                try (BKDWriter writer = new BKDWriter(writeState.segmentInfo.maxDoc(), writeState.directory, writeState.segmentInfo.name, fieldInfo.getPointDimensionCount(), fieldInfo.getPointNumBytes(), maxPointsInLeafNode, maxMBSortInHeap, totMaxSize, singleValuePerDoc)) {
                    List<BKDReader> bkdReaders = new ArrayList<>();
                    List<MergeState.DocMap> docMaps = new ArrayList<>();
                    for (int i = 0; i < mergeState.pointsReaders.length; i++) {
                        PointsReader reader = mergeState.pointsReaders[i];
                        if (reader != null) {
                            // we confirmed this up above
                            assert reader instanceof Lucene60PointsReader;
                            Lucene60PointsReader reader60 = (Lucene60PointsReader) reader;
                            // NOTE: we cannot just use the merged fieldInfo.number (instead of resolving to this
                            // reader's FieldInfo as we do below) because field numbers can easily be different
                            // when addIndexes(Directory...) copies over segments from another index:
                            FieldInfos readerFieldInfos = mergeState.fieldInfos[i];
                            FieldInfo readerFieldInfo = readerFieldInfos.fieldInfo(fieldInfo.name);
                            if (readerFieldInfo != null && readerFieldInfo.getPointDimensionCount() > 0) {
                                BKDReader bkdReader = reader60.readers.get(readerFieldInfo.number);
                                if (bkdReader != null) {
                                    bkdReaders.add(bkdReader);
                                    docMaps.add(mergeState.docMaps[i]);
                                }
                            }
                        }
                    }
                    long fp = writer.merge(dataOut, docMaps, bkdReaders);
                    if (fp != -1) {
                        indexFPs.put(fieldInfo.name, fp);
                    }
                }
            } else {
                mergeOneField(mergeState, fieldInfo);
            }
        }
    }
    finish();
}
Also used: ArrayList (java.util.ArrayList), FieldInfos (org.apache.lucene.index.FieldInfos), BKDReader (org.apache.lucene.util.bkd.BKDReader), MutablePointValues (org.apache.lucene.codecs.MutablePointValues), PointValues (org.apache.lucene.index.PointValues), PointsReader (org.apache.lucene.codecs.PointsReader), BKDWriter (org.apache.lucene.util.bkd.BKDWriter), FieldInfo (org.apache.lucene.index.FieldInfo)
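
The one-dimensional branch above hands every segment's BKDReader to a single writer.merge call rather than re-adding points one by one. Assuming each segment's 1D values are already in sorted order, a bulk merge of that kind can be pictured as a k-way merge driven by a priority queue. The sketch below shows only that general idea on plain long arrays. SortedMergeSketch and Cursor are hypothetical names, and it leaves out everything BKDWriter actually has to handle, such as doc ID remapping through docMaps, deleted documents and leaf block encoding.

```java
import java.util.List;
import java.util.PriorityQueue;

final class SortedMergeSketch {
    // Hypothetical cursor over one already sorted input.
    static final class Cursor {
        final long[] values;
        int pos;

        Cursor(long[] values) {
            this.values = values;
        }
    }

    // Merges inputs that are each sorted ascending into one sorted array.
    static long[] merge(List<long[]> sortedInputs) {
        // Order cursors by the value each one currently points at.
        PriorityQueue<Cursor> queue = new PriorityQueue<>(
                (a, b) -> Long.compare(a.values[a.pos], b.values[b.pos]));
        int total = 0;
        for (long[] input : sortedInputs) {
            total += input.length;
            if (input.length > 0) {
                queue.add(new Cursor(input));
            }
        }
        long[] merged = new long[total];
        int out = 0;
        while (!queue.isEmpty()) {
            // Take the smallest current value, then re-queue its cursor if it has more.
            Cursor top = queue.poll();
            merged[out++] = top.values[top.pos++];
            if (top.pos < top.values.length) {
                queue.add(top);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        long[] merged = merge(List.of(new long[] {1, 4, 9}, new long[] {2, 3, 10}, new long[] {5}));
        System.out.println(java.util.Arrays.toString(merged));
    }
}
```

With k inputs and N values in total this does O(N log k) comparisons, compared with O(N log N) for sorting all values again from scratch.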

Aggregations

ArrayList (java.util.ArrayList): 3
BKDReader (org.apache.lucene.util.bkd.BKDReader): 3
PointValues (org.apache.lucene.index.PointValues): 2
PriorityQueue (java.util.PriorityQueue): 1
MutablePointValues (org.apache.lucene.codecs.MutablePointValues): 1
PointsReader (org.apache.lucene.codecs.PointsReader): 1
FieldInfo (org.apache.lucene.index.FieldInfo): 1
FieldInfos (org.apache.lucene.index.FieldInfos): 1
LeafReaderContext (org.apache.lucene.index.LeafReaderContext): 1
FieldDoc (org.apache.lucene.search.FieldDoc): 1
ScoreDoc (org.apache.lucene.search.ScoreDoc): 1
TopFieldDocs (org.apache.lucene.search.TopFieldDocs): 1
Bits (org.apache.lucene.util.Bits): 1
BytesRef (org.apache.lucene.util.BytesRef): 1
IndexTree (org.apache.lucene.util.bkd.BKDReader.IndexTree): 1
IntersectState (org.apache.lucene.util.bkd.BKDReader.IntersectState): 1
BKDWriter (org.apache.lucene.util.bkd.BKDWriter): 1