Example 1 with ByteBuffersDirectory

Use of org.apache.lucene.store.ByteBuffersDirectory in project neo4j by neo4j.

From the class LuceneSchemaIndexPopulatorTest, method before.

@BeforeEach
void before() throws IOException {
    directory = new ByteBuffersDirectory();
    DirectoryFactory directoryFactory = new DirectoryFactory.Single(new DirectoryFactory.UncloseableDirectory(directory));
    provider = new LuceneIndexProvider(fs, directoryFactory, directoriesByProvider(testDir.directory("folder")), new Monitors(), Config.defaults(), writable());
    propertyAccessor = mock(NodePropertyAccessor.class);
    IndexSamplingConfig samplingConfig = new IndexSamplingConfig(Config.defaults());
    index = IndexPrototype.forSchema(forLabel(42, propertyKeyId), provider.getProviderDescriptor()).withName("index").materialise(0);
    indexPopulator = provider.getPopulator(index, samplingConfig, heapBufferFactory(1024), INSTANCE, SIMPLE_TOKEN_LOOKUP);
    indexPopulator.create();
}
Also used : IndexSamplingConfig(org.neo4j.kernel.impl.api.index.IndexSamplingConfig) ByteBuffersDirectory(org.apache.lucene.store.ByteBuffersDirectory) DirectoryFactory(org.neo4j.kernel.api.impl.index.storage.DirectoryFactory) Monitors(org.neo4j.monitoring.Monitors) NodePropertyAccessor(org.neo4j.storageengine.api.NodePropertyAccessor) BeforeEach(org.junit.jupiter.api.BeforeEach)

Example 2 with ByteBuffersDirectory

Use of org.apache.lucene.store.ByteBuffersDirectory in project crate by crate.

From the class LuceneBatchIteratorBenchmark, method createLuceneBatchIterator.

@Setup
public void createLuceneBatchIterator() throws Exception {
    IndexWriter iw = new IndexWriter(new ByteBuffersDirectory(), new IndexWriterConfig(new StandardAnalyzer()));
    String columnName = "x";
    for (int i = 0; i < 10_000_000; i++) {
        Document doc = new Document();
        doc.add(new NumericDocValuesField(columnName, i));
        iw.addDocument(doc);
    }
    iw.commit();
    iw.forceMerge(1, true);
    indexSearcher = new IndexSearcher(DirectoryReader.open(iw));
    IntegerColumnReference columnReference = new IntegerColumnReference(columnName);
    columnRefs = Collections.singletonList(columnReference);
    collectorContext = new CollectorContext();
}
Also used : IndexSearcher(org.apache.lucene.search.IndexSearcher) NumericDocValuesField(org.apache.lucene.document.NumericDocValuesField) IndexWriter(org.apache.lucene.index.IndexWriter) ByteBuffersDirectory(org.apache.lucene.store.ByteBuffersDirectory) StandardAnalyzer(org.apache.lucene.analysis.standard.StandardAnalyzer) IntegerColumnReference(io.crate.expression.reference.doc.lucene.IntegerColumnReference) CollectorContext(io.crate.expression.reference.doc.lucene.CollectorContext) Document(org.apache.lucene.document.Document) IndexWriterConfig(org.apache.lucene.index.IndexWriterConfig) Setup(org.openjdk.jmh.annotations.Setup)

Example 3 with ByteBuffersDirectory

Use of org.apache.lucene.store.ByteBuffersDirectory in project crate by crate.

From the class LuceneOrderedDocCollectorTest, method testSearchWithScores.

@Test
public void testSearchWithScores() throws Exception {
    IndexWriter w = new IndexWriter(new ByteBuffersDirectory(), new IndexWriterConfig(new KeywordAnalyzer()));
    FieldType fieldType = KeywordFieldMapper.Defaults.FIELD_TYPE;
    for (int i = 0; i < 3; i++) {
        addDoc(w, "x", fieldType, "Arthur");
    }
    // not "Arthur" to lower score
    addDoc(w, "x", fieldType, "Arthurr");
    w.commit();
    IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(w, true, true));
    List<LuceneCollectorExpression<?>> columnReferences = Collections.singletonList(new ScoreCollectorExpression());
    Query query = new ConstantScoreQuery(new TermQuery(new Term("x", new BytesRef("Arthur"))));
    LuceneOrderedDocCollector collector = collector(searcher, columnReferences, query, null, true);
    KeyIterable<ShardId, Row> result = collector.collect();
    assertThat(StreamSupport.stream(result.spliterator(), false).count(), is(2L));
    Iterator<Row> values = result.iterator();
    assertThat(values.next().get(0), Matchers.is(1.0F));
    assertThat(values.next().get(0), Matchers.is(1.0F));
}
Also used : IndexSearcher(org.apache.lucene.search.IndexSearcher) KeywordAnalyzer(org.apache.lucene.analysis.core.KeywordAnalyzer) TermQuery(org.apache.lucene.search.TermQuery) Query(org.apache.lucene.search.Query) FuzzyQuery(org.apache.lucene.search.FuzzyQuery) MatchAllDocsQuery(org.apache.lucene.search.MatchAllDocsQuery) ConstantScoreQuery(org.apache.lucene.search.ConstantScoreQuery) Term(org.apache.lucene.index.Term) FieldType(org.apache.lucene.document.FieldType) ShardId(org.elasticsearch.index.shard.ShardId) IndexWriter(org.apache.lucene.index.IndexWriter) ByteBuffersDirectory(org.apache.lucene.store.ByteBuffersDirectory) Row(io.crate.data.Row) ScoreCollectorExpression(io.crate.expression.reference.doc.lucene.ScoreCollectorExpression) LuceneCollectorExpression(io.crate.expression.reference.doc.lucene.LuceneCollectorExpression) BytesRef(org.apache.lucene.util.BytesRef) IndexWriterConfig(org.apache.lucene.index.IndexWriterConfig) RandomizedTest(com.carrotsearch.randomizedtesting.RandomizedTest) Test(org.junit.Test)

Example 4 with ByteBuffersDirectory

Use of org.apache.lucene.store.ByteBuffersDirectory in project crate by crate.

From the class LuceneOrderedDocCollectorTest, method testSearchMoreAppliesMinScoreFilter.

@Test
public void testSearchMoreAppliesMinScoreFilter() throws Exception {
    IndexWriter w = new IndexWriter(new ByteBuffersDirectory(), new IndexWriterConfig(new KeywordAnalyzer()));
    var keywordFieldType = new KeywordFieldMapper.KeywordFieldType("x");
    var fieldType = KeywordFieldMapper.Defaults.FIELD_TYPE;
    for (int i = 0; i < 3; i++) {
        addDoc(w, "x", fieldType, "Arthur");
    }
    // not "Arthur" to lower score
    addDoc(w, "x", fieldType, "Arthurr");
    w.commit();
    IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(w, true, true));
    List<LuceneCollectorExpression<?>> columnReferences = Collections.singletonList(new ScoreCollectorExpression());
    Query query = new FuzzyQuery(new Term("x", "Arthur"), Fuzziness.AUTO.asDistance("Arthur"), 2, 3, true);
    LuceneOrderedDocCollector collector;
    // without minScore filter we get 2 and 2 docs - this is not necessary for the test but is here
    // to make sure the "FuzzyQuery" matches the right documents
    collector = collector(searcher, columnReferences, query, null, true);
    assertThat(StreamSupport.stream(collector.collect().spliterator(), false).count(), is(2L));
    assertThat(StreamSupport.stream(collector.collect().spliterator(), false).count(), is(2L));
    collector = collector(searcher, columnReferences, query, 0.15f, true);
    int count = 0;
    // initialSearch -> 2 rows
    for (Row row : collector.collect()) {
        assertThat((float) row.get(0), Matchers.greaterThanOrEqualTo(0.15f));
        count++;
    }
    assertThat(count, is(2));
    count = 0;
    // searchMore -> 1 row is below minScore
    for (Row row : collector.collect()) {
        assertThat((float) row.get(0), Matchers.greaterThanOrEqualTo(0.15f));
        count++;
    }
    assertThat(count, is(1));
}
Also used : IndexSearcher(org.apache.lucene.search.IndexSearcher) KeywordAnalyzer(org.apache.lucene.analysis.core.KeywordAnalyzer) Query(org.apache.lucene.search.Query) FuzzyQuery(org.apache.lucene.search.FuzzyQuery) MatchAllDocsQuery(org.apache.lucene.search.MatchAllDocsQuery) ConstantScoreQuery(org.apache.lucene.search.ConstantScoreQuery) TermQuery(org.apache.lucene.search.TermQuery) Term(org.apache.lucene.index.Term) IndexWriter(org.apache.lucene.index.IndexWriter) ByteBuffersDirectory(org.apache.lucene.store.ByteBuffersDirectory) Row(io.crate.data.Row) ScoreCollectorExpression(io.crate.expression.reference.doc.lucene.ScoreCollectorExpression) LuceneCollectorExpression(io.crate.expression.reference.doc.lucene.LuceneCollectorExpression) IndexWriterConfig(org.apache.lucene.index.IndexWriterConfig) RandomizedTest(com.carrotsearch.randomizedtesting.RandomizedTest) Test(org.junit.Test)
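
The comments in Example 4 describe a batched collector: each collect() call returns at most two rows, and rows scoring below minScore are dropped. The following is a minimal standalone sketch of that behaviour, not crate's implementation. The class MinScoreBatchSketch, the nested BatchCollector, the batch size of 2 and the score values are all assumptions chosen to mirror the counts asserted in the test.

```java
import java.util.ArrayList;
import java.util.List;

public class MinScoreBatchSketch {

    /** Hypothetical collector: hands out score-ordered hits in batches, skipping hits below minScore. */
    static final class BatchCollector {
        private final float[] scores;  // assumed scores, sorted in descending order
        private final Float minScore;  // null means "no minScore filter"
        private final int batchSize;
        private int position = 0;      // index of the next hit to look at

        BatchCollector(float[] scores, Float minScore, int batchSize) {
            this.scores = scores;
            this.minScore = minScore;
            this.batchSize = batchSize;
        }

        /** Returns the next batch; later calls continue where the previous one stopped. */
        List<Float> collect() {
            List<Float> batch = new ArrayList<>();
            while (position < scores.length && batch.size() < batchSize) {
                float score = scores[position++];
                if (minScore == null || score >= minScore) {
                    batch.add(score);
                }
            }
            return batch;
        }
    }

    public static void main(String[] args) {
        // Three exact matches and one weaker fuzzy match (made-up values).
        float[] scores = {1.0f, 1.0f, 1.0f, 0.1f};

        BatchCollector unfiltered = new BatchCollector(scores, null, 2);
        System.out.println(unfiltered.collect().size()); // 2
        System.out.println(unfiltered.collect().size()); // 2

        BatchCollector filtered = new BatchCollector(scores, 0.15f, 2);
        System.out.println(filtered.collect().size());   // 2
        System.out.println(filtered.collect().size());   // 1, the 0.1 hit is below minScore
    }
}
```

With these assumed scores the unfiltered collector yields 2 and 2 rows, and the filtered one yields 2 and then 1, matching the assertions in the test above.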

Example 5 with ByteBuffersDirectory

Use of org.apache.lucene.store.ByteBuffersDirectory in project crate by crate.

From the class LuceneOrderedDocCollectorTest, method testSearchNoScores.

@Test
public void testSearchNoScores() throws Exception {
    IndexWriter w = new IndexWriter(new ByteBuffersDirectory(), new IndexWriterConfig(new KeywordAnalyzer()));
    String name = "x";
    var keywordFieldType = new KeywordFieldMapper.KeywordFieldType(name);
    var fieldType = KeywordFieldMapper.Defaults.FIELD_TYPE;
    for (int i = 0; i < 3; i++) {
        addDoc(w, name, fieldType, "Arthur");
    }
    // not "Arthur" to lower score
    addDoc(w, name, fieldType, "Arthurr");
    w.commit();
    IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(w, true, true));
    List<LuceneCollectorExpression<?>> columnReferences = Collections.singletonList(new ScoreCollectorExpression());
    Query query = new TermQuery(new Term(name, new BytesRef("Arthur")));
    LuceneOrderedDocCollector collector = collector(searcher, columnReferences, query, null, false);
    KeyIterable<ShardId, Row> result = collector.collect();
    assertThat(StreamSupport.stream(result.spliterator(), false).count(), is(2L));
    Iterator<Row> values = result.iterator();
    assertThat(values.next().get(0), Matchers.is(Float.NaN));
    assertThat(values.next().get(0), Matchers.is(Float.NaN));
}
Also used : IndexSearcher(org.apache.lucene.search.IndexSearcher) KeywordAnalyzer(org.apache.lucene.analysis.core.KeywordAnalyzer) TermQuery(org.apache.lucene.search.TermQuery) Query(org.apache.lucene.search.Query) FuzzyQuery(org.apache.lucene.search.FuzzyQuery) MatchAllDocsQuery(org.apache.lucene.search.MatchAllDocsQuery) ConstantScoreQuery(org.apache.lucene.search.ConstantScoreQuery) Term(org.apache.lucene.index.Term) ShardId(org.elasticsearch.index.shard.ShardId) IndexWriter(org.apache.lucene.index.IndexWriter) ByteBuffersDirectory(org.apache.lucene.store.ByteBuffersDirectory) Row(io.crate.data.Row) ScoreCollectorExpression(io.crate.expression.reference.doc.lucene.ScoreCollectorExpression) LuceneCollectorExpression(io.crate.expression.reference.doc.lucene.LuceneCollectorExpression) BytesRef(org.apache.lucene.util.BytesRef) IndexWriterConfig(org.apache.lucene.index.IndexWriterConfig) RandomizedTest(com.carrotsearch.randomizedtesting.RandomizedTest) Test(org.junit.Test)
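
Example 5 asserts Matchers.is(Float.NaN). This can succeed because Hamcrest's is(value) matcher compares with equals, and a boxed Float treats NaN as equal to itself, whereas the primitive == comparison does not. A minimal standalone sketch of the difference follows; the class NanEqualityDemo and its helper methods are illustrative names, not part of crate or Hamcrest.

```java
public class NanEqualityDemo {

    /** Primitive comparison: IEEE 754 semantics, NaN is never == to anything, including itself. */
    static boolean primitiveEquals(float a, float b) {
        return a == b;
    }

    /** Object comparison via equals, the kind of check an equality matcher performs on boxed values. */
    static boolean boxedEquals(Object a, Object b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(primitiveEquals(Float.NaN, Float.NaN)); // false
        System.out.println(boxedEquals(Float.NaN, Float.NaN));     // true
    }
}
```

This is why the test can check for "no score" with an equality matcher instead of calling Float.isNaN on each value.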

Aggregations

ByteBuffersDirectory (org.apache.lucene.store.ByteBuffersDirectory) 13
IndexWriter (org.apache.lucene.index.IndexWriter) 12
IndexWriterConfig (org.apache.lucene.index.IndexWriterConfig) 12
IndexSearcher (org.apache.lucene.search.IndexSearcher) 12
Document (org.apache.lucene.document.Document) 9
StandardAnalyzer (org.apache.lucene.analysis.standard.StandardAnalyzer) 8
NumericDocValuesField (org.apache.lucene.document.NumericDocValuesField) 6
BytesRef (org.apache.lucene.util.BytesRef) 5
Test (org.junit.Test) 5
SortedSetDocValuesField (org.apache.lucene.document.SortedSetDocValuesField) 4
Before (org.junit.Before) 4
RandomizedTest (com.carrotsearch.randomizedtesting.RandomizedTest) 3
Row (io.crate.data.Row) 3
LuceneCollectorExpression (io.crate.expression.reference.doc.lucene.LuceneCollectorExpression) 3
ScoreCollectorExpression (io.crate.expression.reference.doc.lucene.ScoreCollectorExpression) 3
KeywordAnalyzer (org.apache.lucene.analysis.core.KeywordAnalyzer) 3
Term (org.apache.lucene.index.Term) 3
ConstantScoreQuery (org.apache.lucene.search.ConstantScoreQuery) 3
FuzzyQuery (org.apache.lucene.search.FuzzyQuery) 3
MatchAllDocsQuery (org.apache.lucene.search.MatchAllDocsQuery) 3