Examples with RandomIndexWriter - org.apache.lucene.index.RandomIndexWriter

Example 1 with RandomIndexWriter

Use of org.apache.lucene.index.RandomIndexWriter in project elasticsearch by elastic.

The class RecoverySourceHandlerTests, method testHandleExceptinoOnSendSendFiles.

public void testHandleExceptinoOnSendSendFiles() throws Throwable {
    Settings settings = Settings.builder().put("indices.recovery.concurrent_streams", 1).put("indices.recovery.concurrent_small_file_streams", 1).build();
    final RecoverySettings recoverySettings = new RecoverySettings(settings, service);
    final StartRecoveryRequest request = new StartRecoveryRequest(shardId, new DiscoveryNode("b", buildNewFakeTransportAddress(), emptyMap(), emptySet(), Version.CURRENT), new DiscoveryNode("b", buildNewFakeTransportAddress(), emptyMap(), emptySet(), Version.CURRENT), null, randomBoolean(), randomNonNegativeLong(), randomBoolean() ? SequenceNumbersService.UNASSIGNED_SEQ_NO : 0L);
    Path tempDir = createTempDir();
    Store store = newStore(tempDir, false);
    AtomicBoolean failedEngine = new AtomicBoolean(false);
    RecoverySourceHandler handler = new RecoverySourceHandler(null, null, request, () -> 0L, e -> () -> {
    }, recoverySettings.getChunkSize().bytesAsInt(), Settings.EMPTY) {

        @Override
        protected void failEngine(IOException cause) {
            assertFalse(failedEngine.get());
            failedEngine.set(true);
        }
    };
    Directory dir = store.directory();
    RandomIndexWriter writer = new RandomIndexWriter(random(), dir, newIndexWriterConfig());
    int numDocs = randomIntBetween(10, 100);
    for (int i = 0; i < numDocs; i++) {
        Document document = new Document();
        document.add(new StringField("id", Integer.toString(i), Field.Store.YES));
        document.add(newField("field", randomUnicodeOfCodepointLengthBetween(1, 10), TextField.TYPE_STORED));
        writer.addDocument(document);
    }
    writer.commit();
    writer.close();
    Store.MetadataSnapshot metadata = store.getMetadata(null);
    List<StoreFileMetaData> metas = new ArrayList<>();
    for (StoreFileMetaData md : metadata) {
        metas.add(md);
    }
    final boolean throwCorruptedIndexException = randomBoolean();
    Store targetStore = newStore(createTempDir(), false);
    try {
        handler.sendFiles(store, metas.toArray(new StoreFileMetaData[0]), (md) -> {
            if (throwCorruptedIndexException) {
                throw new RuntimeException(new CorruptIndexException("foo", "bar"));
            } else {
                throw new RuntimeException("boom");
            }
        });
        fail("exception index");
    } catch (RuntimeException ex) {
        assertNull(ExceptionsHelper.unwrapCorruption(ex));
        // JUnit's assertEquals takes (expected, actual)
        if (throwCorruptedIndexException) {
            assertEquals("[File corruption occurred on recovery but checksums are ok]", ex.getMessage());
        } else {
            assertEquals("boom", ex.getMessage());
        }
    } catch (CorruptIndexException ex) {
        fail("not expected here");
    }
    assertFalse(failedEngine.get());
    IOUtils.close(store, targetStore);
}

Also used : Path(java.nio.file.Path) DiscoveryNode(org.elasticsearch.cluster.node.DiscoveryNode) ArrayList(java.util.ArrayList) Store(org.elasticsearch.index.store.Store) CorruptIndexException(org.apache.lucene.index.CorruptIndexException) IOException(java.io.IOException) Document(org.apache.lucene.document.Document) ParsedDocument(org.elasticsearch.index.mapper.ParsedDocument) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) StoreFileMetaData(org.elasticsearch.index.store.StoreFileMetaData) StringField(org.apache.lucene.document.StringField) Settings(org.elasticsearch.common.settings.Settings) IndexSettings(org.elasticsearch.index.IndexSettings) ClusterSettings(org.elasticsearch.common.settings.ClusterSettings) RandomIndexWriter(org.apache.lucene.index.RandomIndexWriter) Directory(org.apache.lucene.store.Directory)
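
The test above wraps a CorruptIndexException inside a RuntimeException and then checks what ExceptionsHelper.unwrapCorruption returns. As a rough illustration of the general idea of searching a cause chain for a particular exception type, here is a minimal plain-Java sketch. The class CauseChainSketch and the helper findCause are hypothetical names introduced for this example, java.io.IOException stands in for CorruptIndexException, and the depth limit is an assumption; this is not the Elasticsearch implementation.

```java
import java.io.IOException;

public class CauseChainSketch {

    /**
     * Hypothetical helper: returns the first throwable in the cause chain
     * (starting with t itself) that is an instance of the given type,
     * or null if none is found within a small depth limit.
     */
    public static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
        int depth = 0;
        // The depth limit guards against very long or cyclic cause chains.
        for (Throwable cur = t; cur != null && depth < 10; cur = cur.getCause(), depth++) {
            if (type.isInstance(cur)) {
                return type.cast(cur);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // IOException stands in for CorruptIndexException (an IOException subclass).
        RuntimeException wrapped = new RuntimeException(new IOException("corrupt"));
        System.out.println(findCause(wrapped, IOException.class).getMessage());

        // A plain RuntimeException has no IOException in its chain.
        System.out.println(findCause(new RuntimeException("boom"), IOException.class));
    }
}
```

With a wrapped exception the helper returns the inner cause; with a bare RuntimeException("boom") it returns null, which mirrors the two branches the test exercises.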

Example 2 with RandomIndexWriter

Use of org.apache.lucene.index.RandomIndexWriter in project elasticsearch by elastic.

The class CollapsingTopDocsCollectorTests, method assertSearchCollapse.

private <T extends Comparable> void assertSearchCollapse(CollapsingDocValuesProducer<T> dvProducers, boolean numeric, boolean multivalued) throws IOException {
    final int numDocs = randomIntBetween(1000, 2000);
    int maxGroup = randomIntBetween(2, 500);
    final Directory dir = newDirectory();
    final RandomIndexWriter w = new RandomIndexWriter(random(), dir);
    Set<T> values = new HashSet<>();
    int totalHits = 0;
    for (int i = 0; i < numDocs; i++) {
        final T value = dvProducers.randomGroup(maxGroup);
        values.add(value);
        Document doc = new Document();
        dvProducers.add(doc, value, multivalued);
        doc.add(new NumericDocValuesField("sort1", randomIntBetween(0, 10)));
        doc.add(new NumericDocValuesField("sort2", randomLong()));
        w.addDocument(doc);
        totalHits++;
    }
    List<T> valueList = new ArrayList<>(values);
    Collections.sort(valueList);
    final IndexReader reader = w.getReader();
    final IndexSearcher searcher = newSearcher(reader);
    final SortField collapseField = dvProducers.sortField(multivalued);
    final SortField sort1 = new SortField("sort1", SortField.Type.INT);
    final SortField sort2 = new SortField("sort2", SortField.Type.LONG);
    Sort sort = new Sort(sort1, sort2, collapseField);
    int expectedNumGroups = values.size();
    final CollapsingTopDocsCollector collapsingCollector;
    if (numeric) {
        collapsingCollector = CollapsingTopDocsCollector.createNumeric(collapseField.getField(), sort, expectedNumGroups, false);
    } else {
        collapsingCollector = CollapsingTopDocsCollector.createKeyword(collapseField.getField(), sort, expectedNumGroups, false);
    }
    TopFieldCollector topFieldCollector = TopFieldCollector.create(sort, totalHits, true, false, false);
    searcher.search(new MatchAllDocsQuery(), collapsingCollector);
    searcher.search(new MatchAllDocsQuery(), topFieldCollector);
    CollapseTopFieldDocs collapseTopFieldDocs = collapsingCollector.getTopDocs();
    TopFieldDocs topDocs = topFieldCollector.topDocs();
    assertEquals(collapseField.getField(), collapseTopFieldDocs.field);
    assertEquals(expectedNumGroups, collapseTopFieldDocs.scoreDocs.length);
    assertEquals(totalHits, collapseTopFieldDocs.totalHits);
    assertEquals(totalHits, topDocs.scoreDocs.length);
    assertEquals(totalHits, topDocs.totalHits);
    Set<Object> seen = new HashSet<>();
    // collapse field is the last sort
    int collapseIndex = sort.getSort().length - 1;
    int topDocsIndex = 0;
    for (int i = 0; i < expectedNumGroups; i++) {
        FieldDoc fieldDoc = null;
        for (; topDocsIndex < totalHits; topDocsIndex++) {
            fieldDoc = (FieldDoc) topDocs.scoreDocs[topDocsIndex];
            if (seen.contains(fieldDoc.fields[collapseIndex]) == false) {
                break;
            }
        }
        FieldDoc collapseFieldDoc = (FieldDoc) collapseTopFieldDocs.scoreDocs[i];
        assertNotNull(fieldDoc);
        assertEquals(collapseFieldDoc.doc, fieldDoc.doc);
        assertArrayEquals(collapseFieldDoc.fields, fieldDoc.fields);
        seen.add(fieldDoc.fields[fieldDoc.fields.length - 1]);
    }
    for (; topDocsIndex < totalHits; topDocsIndex++) {
        FieldDoc fieldDoc = (FieldDoc) topDocs.scoreDocs[topDocsIndex];
        assertTrue(seen.contains(fieldDoc.fields[collapseIndex]));
    }
    // check merge
    final IndexReaderContext ctx = searcher.getTopReaderContext();
    final SegmentSearcher[] subSearchers;
    final int[] docStarts;
    if (ctx instanceof LeafReaderContext) {
        subSearchers = new SegmentSearcher[1];
        docStarts = new int[1];
        subSearchers[0] = new SegmentSearcher((LeafReaderContext) ctx, ctx);
        docStarts[0] = 0;
    } else {
        final CompositeReaderContext compCTX = (CompositeReaderContext) ctx;
        final int size = compCTX.leaves().size();
        subSearchers = new SegmentSearcher[size];
        docStarts = new int[size];
        int docBase = 0;
        for (int searcherIDX = 0; searcherIDX < subSearchers.length; searcherIDX++) {
            final LeafReaderContext leaf = compCTX.leaves().get(searcherIDX);
            subSearchers[searcherIDX] = new SegmentSearcher(leaf, compCTX);
            docStarts[searcherIDX] = docBase;
            docBase += leaf.reader().maxDoc();
        }
    }
    final CollapseTopFieldDocs[] shardHits = new CollapseTopFieldDocs[subSearchers.length];
    final Weight weight = searcher.createNormalizedWeight(new MatchAllDocsQuery(), false);
    for (int shardIDX = 0; shardIDX < subSearchers.length; shardIDX++) {
        final SegmentSearcher subSearcher = subSearchers[shardIDX];
        final CollapsingTopDocsCollector c;
        if (numeric) {
            c = CollapsingTopDocsCollector.createNumeric(collapseField.getField(), sort, expectedNumGroups, false);
        } else {
            c = CollapsingTopDocsCollector.createKeyword(collapseField.getField(), sort, expectedNumGroups, false);
        }
        subSearcher.search(weight, c);
        shardHits[shardIDX] = c.getTopDocs();
    }
    CollapseTopFieldDocs mergedFieldDocs = CollapseTopFieldDocs.merge(sort, 0, expectedNumGroups, shardHits);
    assertTopDocsEquals(mergedFieldDocs, collapseTopFieldDocs);
    w.close();
    reader.close();
    dir.close();
}

Also used : IndexSearcher(org.apache.lucene.search.IndexSearcher) FieldDoc(org.apache.lucene.search.FieldDoc) ArrayList(java.util.ArrayList) CollapseTopFieldDocs(org.apache.lucene.search.grouping.CollapseTopFieldDocs) TopFieldDocs(org.apache.lucene.search.TopFieldDocs) SortField(org.apache.lucene.search.SortField) SortedSetSortField(org.apache.lucene.search.SortedSetSortField) SortedNumericSortField(org.apache.lucene.search.SortedNumericSortField) Document(org.apache.lucene.document.Document) SortedNumericDocValuesField(org.apache.lucene.document.SortedNumericDocValuesField) NumericDocValuesField(org.apache.lucene.document.NumericDocValuesField) Sort(org.apache.lucene.search.Sort) LeafReaderContext(org.apache.lucene.index.LeafReaderContext) TopFieldCollector(org.apache.lucene.search.TopFieldCollector) Directory(org.apache.lucene.store.Directory) HashSet(java.util.HashSet) MatchAllDocsQuery(org.apache.lucene.search.MatchAllDocsQuery) IndexReaderContext(org.apache.lucene.index.IndexReaderContext) Weight(org.apache.lucene.search.Weight) CompositeReaderContext(org.apache.lucene.index.CompositeReaderContext) IndexReader(org.apache.lucene.index.IndexReader) CollapsingTopDocsCollector(org.apache.lucene.search.grouping.CollapsingTopDocsCollector) RandomIndexWriter(org.apache.lucene.index.RandomIndexWriter)
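
The verification loop in the example above walks the fully sorted hits and expects each collapsed hit to be the first hit of its group in that order. A minimal plain-Java sketch of that expectation follows. It does not use Lucene; the class CollapseSketch, the nested Hit type, and the String group keys are hypothetical stand-ins chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CollapseSketch {

    /** Hypothetical stand-in for a sorted hit: a doc ID plus its collapse (group) key. */
    public static final class Hit {
        public final int doc;
        public final String group;

        public Hit(int doc, String group) {
            this.doc = doc;
            this.group = group;
        }
    }

    /** Keeps the first hit of each group, preserving the incoming sort order. */
    public static List<Hit> collapse(List<Hit> sortedHits) {
        Set<String> seen = new HashSet<>();
        List<Hit> collapsed = new ArrayList<>();
        for (Hit hit : sortedHits) {
            // Set.add returns false when this group has already been seen.
            if (seen.add(hit.group)) {
                collapsed.add(hit);
            }
        }
        return collapsed;
    }

    public static void main(String[] args) {
        List<Hit> sorted = Arrays.asList(
            new Hit(7, "a"), new Hit(3, "b"), new Hit(9, "a"), new Hit(1, "c"), new Hit(4, "b"));
        // Prints "7 a", "3 b", "1 c": one hit per group, in sort order.
        for (Hit hit : collapse(sorted)) {
            System.out.println(hit.doc + " " + hit.group);
        }
    }
}
```

The number of collapsed hits equals the number of distinct groups, which corresponds to the expectedNumGroups assertion in the test.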

Example 3 with RandomIndexWriter

Use of org.apache.lucene.index.RandomIndexWriter in project elasticsearch by elastic.

The class CustomUnifiedHighlighterTests, method assertHighlightOneDoc.

private void assertHighlightOneDoc(String fieldName, String[] inputs, Analyzer analyzer, Query query, Locale locale, BreakIterator breakIterator, int noMatchSize, String[] expectedPassages) throws Exception {
    Directory dir = newDirectory();
    IndexWriterConfig iwc = newIndexWriterConfig(analyzer);
    iwc.setMergePolicy(newTieredMergePolicy(random()));
    RandomIndexWriter iw = new RandomIndexWriter(random(), dir, iwc);
    FieldType ft = new FieldType(TextField.TYPE_STORED);
    ft.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
    ft.freeze();
    Document doc = new Document();
    for (String input : inputs) {
        Field field = new Field(fieldName, "", ft);
        field.setStringValue(input);
        doc.add(field);
    }
    iw.addDocument(doc);
    DirectoryReader reader = iw.getReader();
    IndexSearcher searcher = newSearcher(reader);
    iw.close();
    TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 1, Sort.INDEXORDER);
    assertThat(topDocs.totalHits, equalTo(1));
    String rawValue = Strings.arrayToDelimitedString(inputs, String.valueOf(MULTIVAL_SEP_CHAR));
    CustomUnifiedHighlighter highlighter = new CustomUnifiedHighlighter(searcher, analyzer, new CustomPassageFormatter("<b>", "</b>", new DefaultEncoder()), locale, breakIterator, rawValue, noMatchSize);
    highlighter.setFieldMatcher((name) -> "text".equals(name));
    final Snippet[] snippets = highlighter.highlightField("text", query, topDocs.scoreDocs[0].doc, expectedPassages.length);
    // JUnit's assertEquals takes (expected, actual)
    assertEquals(expectedPassages.length, snippets.length);
    for (int i = 0; i < snippets.length; i++) {
        assertEquals(expectedPassages[i], snippets[i].getText());
    }
    reader.close();
    dir.close();
}

Also used : IndexSearcher(org.apache.lucene.search.IndexSearcher) DirectoryReader(org.apache.lucene.index.DirectoryReader) Snippet(org.apache.lucene.search.highlight.Snippet) Document(org.apache.lucene.document.Document) MatchAllDocsQuery(org.apache.lucene.search.MatchAllDocsQuery) FieldType(org.apache.lucene.document.FieldType) TopDocs(org.apache.lucene.search.TopDocs) Field(org.apache.lucene.document.Field) TextField(org.apache.lucene.document.TextField) DefaultEncoder(org.apache.lucene.search.highlight.DefaultEncoder) RandomIndexWriter(org.apache.lucene.index.RandomIndexWriter) Directory(org.apache.lucene.store.Directory) IndexWriterConfig(org.apache.lucene.index.IndexWriterConfig)

Example 4 with RandomIndexWriter

Use of org.apache.lucene.index.RandomIndexWriter in project elasticsearch by elastic.

The class MinDocQueryTests, method testRandom.

public void testRandom() throws IOException {
    final int numDocs = randomIntBetween(10, 200);
    final Document doc = new Document();
    final Directory dir = newDirectory();
    final RandomIndexWriter w = new RandomIndexWriter(random(), dir);
    for (int i = 0; i < numDocs; ++i) {
        w.addDocument(doc);
    }
    final IndexReader reader = w.getReader();
    final IndexSearcher searcher = newSearcher(reader);
    for (int i = 0; i <= numDocs; ++i) {
        assertEquals(numDocs - i, searcher.count(new MinDocQuery(i)));
    }
    w.close();
    reader.close();
    dir.close();
}

Also used : IndexSearcher(org.apache.lucene.search.IndexSearcher) IndexReader(org.apache.lucene.index.IndexReader) Document(org.apache.lucene.document.Document) RandomIndexWriter(org.apache.lucene.index.RandomIndexWriter) Directory(org.apache.lucene.store.Directory)
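
The assertion in the example above expects numDocs - i matches for MinDocQuery(i). That holds under the assumption that the query matches every document whose doc ID is at least the given minimum and that no documents were deleted. The arithmetic can be sketched in plain Java without Lucene; MinDocCountSketch and countAtLeast are hypothetical names used only for this illustration.

```java
import java.util.stream.IntStream;

public class MinDocCountSketch {

    /** Counts the doc IDs in [0, numDocs) that are greater than or equal to minDoc. */
    public static long countAtLeast(int numDocs, int minDoc) {
        return IntStream.range(0, numDocs).filter(doc -> doc >= minDoc).count();
    }

    public static void main(String[] args) {
        int numDocs = 10;
        for (int minDoc = 0; minDoc <= numDocs; minDoc++) {
            long count = countAtLeast(numDocs, minDoc);
            // Same relationship the test asserts: numDocs - minDoc matches.
            if (count != numDocs - minDoc) {
                throw new AssertionError("minDoc=" + minDoc + " count=" + count);
            }
        }
        System.out.println("ok");
    }
}
```

Note that the loop bound is inclusive, as in the test: a minimum equal to numDocs matches nothing.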

Example 5 with RandomIndexWriter

Use of org.apache.lucene.index.RandomIndexWriter in project elasticsearch by elastic.

The class CustomPostingsHighlighterTests, method testCustomPostingsHighlighter.

public void testCustomPostingsHighlighter() throws Exception {
    Directory dir = newDirectory();
    IndexWriterConfig iwc = newIndexWriterConfig(new MockAnalyzer(random()));
    iwc.setMergePolicy(newLogMergePolicy());
    RandomIndexWriter iw = new RandomIndexWriter(random(), dir, iwc);
    FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
    offsetsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
    //good position but only one match
    final String firstValue = "This is a test. Just a test1 highlighting from postings highlighter.";
    Field body = new Field("body", "", offsetsType);
    Document doc = new Document();
    doc.add(body);
    body.setStringValue(firstValue);
    //two matches, not the best snippet due to its length though
    final String secondValue = "This is the second highlighting value to perform highlighting on a longer text that gets scored lower.";
    Field body2 = new Field("body", "", offsetsType);
    doc.add(body2);
    body2.setStringValue(secondValue);
    //two matches and short, will be scored highest
    final String thirdValue = "This is highlighting the third short highlighting value.";
    Field body3 = new Field("body", "", offsetsType);
    doc.add(body3);
    body3.setStringValue(thirdValue);
    //one match, same as first but at the end, will be scored lower due to its position
    final String fourthValue = "Just a test4 highlighting from postings highlighter.";
    Field body4 = new Field("body", "", offsetsType);
    doc.add(body4);
    body4.setStringValue(fourthValue);
    iw.addDocument(doc);
    IndexReader ir = iw.getReader();
    iw.close();
    String firstHlValue = "Just a test1 <b>highlighting</b> from postings highlighter.";
    String secondHlValue = "This is the second <b>highlighting</b> value to perform <b>highlighting</b> on a longer text that gets scored lower.";
    String thirdHlValue = "This is <b>highlighting</b> the third short <b>highlighting</b> value.";
    String fourthHlValue = "Just a test4 <b>highlighting</b> from postings highlighter.";
    IndexSearcher searcher = newSearcher(ir);
    Query query = new TermQuery(new Term("body", "highlighting"));
    TopDocs topDocs = searcher.search(query, 10, Sort.INDEXORDER);
    assertThat(topDocs.totalHits, equalTo(1));
    int docId = topDocs.scoreDocs[0].doc;
    String fieldValue = firstValue + HighlightUtils.PARAGRAPH_SEPARATOR + secondValue + HighlightUtils.PARAGRAPH_SEPARATOR + thirdValue + HighlightUtils.PARAGRAPH_SEPARATOR + fourthValue;
    CustomPostingsHighlighter highlighter = new CustomPostingsHighlighter(null, new CustomPassageFormatter("<b>", "</b>", new DefaultEncoder()), fieldValue, false);
    Snippet[] snippets = highlighter.highlightField("body", query, searcher, docId, 5);
    assertThat(snippets.length, equalTo(4));
    assertThat(snippets[0].getText(), equalTo(firstHlValue));
    assertThat(snippets[1].getText(), equalTo(secondHlValue));
    assertThat(snippets[2].getText(), equalTo(thirdHlValue));
    assertThat(snippets[3].getText(), equalTo(fourthHlValue));
    ir.close();
    dir.close();
}

Also used : IndexSearcher(org.apache.lucene.search.IndexSearcher) TermQuery(org.apache.lucene.search.TermQuery) Query(org.apache.lucene.search.Query) Term(org.apache.lucene.index.Term) Snippet(org.apache.lucene.search.highlight.Snippet) Document(org.apache.lucene.document.Document) FieldType(org.apache.lucene.document.FieldType) TopDocs(org.apache.lucene.search.TopDocs) Field(org.apache.lucene.document.Field) TextField(org.apache.lucene.document.TextField) MockAnalyzer(org.apache.lucene.analysis.MockAnalyzer) DefaultEncoder(org.apache.lucene.search.highlight.DefaultEncoder) IndexReader(org.apache.lucene.index.IndexReader) RandomIndexWriter(org.apache.lucene.index.RandomIndexWriter) Directory(org.apache.lucene.store.Directory) IndexWriterConfig(org.apache.lucene.index.IndexWriterConfig)

Aggregations

RandomIndexWriter (org.apache.lucene.index.RandomIndexWriter) 779
Document (org.apache.lucene.document.Document) 679
Directory (org.apache.lucene.store.Directory) 588
IndexReader (org.apache.lucene.index.IndexReader) 510
Term (org.apache.lucene.index.Term) 325
IndexSearcher (org.apache.lucene.search.IndexSearcher) 294
MockAnalyzer (org.apache.lucene.analysis.MockAnalyzer) 220
BytesRef (org.apache.lucene.util.BytesRef) 142
Field (org.apache.lucene.document.Field) 141
MatchAllDocsQuery (org.apache.lucene.search.MatchAllDocsQuery) 136
TopDocs (org.apache.lucene.search.TopDocs) 134
TermQuery (org.apache.lucene.search.TermQuery) 121
DirectoryReader (org.apache.lucene.index.DirectoryReader) 120
IndexWriterConfig (org.apache.lucene.index.IndexWriterConfig) 110
ArrayList (java.util.ArrayList) 95
StringField (org.apache.lucene.document.StringField) 93
Analyzer (org.apache.lucene.analysis.Analyzer) 88
BooleanQuery (org.apache.lucene.search.BooleanQuery) 88
NumericDocValuesField (org.apache.lucene.document.NumericDocValuesField) 77
Test (org.junit.Test) 75