Example 1 with SoftDeletesRetentionMergePolicy

Use of org.apache.lucene.index.SoftDeletesRetentionMergePolicy in project crate by crate.

From the class InternalEngine, method getIndexWriterConfig:

private IndexWriterConfig getIndexWriterConfig() {
    final IndexWriterConfig iwc = new IndexWriterConfig(engineConfig.getAnalyzer());
    // we by default don't commit on close
    iwc.setCommitOnClose(false);
    iwc.setOpenMode(IndexWriterConfig.OpenMode.APPEND);
    iwc.setIndexDeletionPolicy(combinedDeletionPolicy);
    // with tests.verbose, lucene sets this up: plumb to align with filesystem stream
    boolean verbose = false;
    try {
        verbose = Boolean.parseBoolean(System.getProperty("tests.verbose"));
    } catch (Exception ignore) {
    // ignored
    }
    iwc.setInfoStream(verbose ? InfoStream.getDefault() : new LoggerInfoStream(logger));
    iwc.setMergeScheduler(mergeScheduler);
    // Give us the opportunity to upgrade old segments while performing
    // background merges
    MergePolicy mergePolicy = config().getMergePolicy();
    // always configure soft-deletes field so an engine with soft-deletes disabled can open a Lucene index with soft-deletes.
    iwc.setSoftDeletesField(Lucene.SOFT_DELETES_FIELD);
    if (softDeleteEnabled) {
        mergePolicy = new RecoverySourcePruneMergePolicy(
            SourceFieldMapper.RECOVERY_SOURCE_NAME,
            softDeletesPolicy::getRetentionQuery,
            new SoftDeletesRetentionMergePolicy(
                Lucene.SOFT_DELETES_FIELD,
                softDeletesPolicy::getRetentionQuery,
                new PrunePostingsMergePolicy(mergePolicy, IdFieldMapper.NAME)));
    }
    boolean shuffleForcedMerge = Booleans.parseBoolean(System.getProperty("es.shuffle_forced_merge", Boolean.TRUE.toString()));
    if (shuffleForcedMerge) {
        // We wrap the merge policy for all indices even though it is mostly useful for time-based indices
        // but there should be no overhead for other type of indices so it's simpler than adding a setting
        // to enable it.
        mergePolicy = new ShuffleForcedMergePolicy(mergePolicy);
    }
    iwc.setMergePolicy(new ElasticsearchMergePolicy(mergePolicy));
    iwc.setRAMBufferSizeMB(engineConfig.getIndexingBufferSize().getMbFrac());
    iwc.setCodec(engineConfig.getCodec());
    // always use compound on flush - reduces # of file-handles on refresh
    iwc.setUseCompoundFile(true);
    return iwc;
}
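When soft deletes are enabled, the method above nests four merge policies inside a single constructor call, and the wrapping order is easy to misread. A minimal sketch of that layering, using a hypothetical Policy interface (plain Java, not Lucene's MergePolicy API; only the wrapping order mirrors the real code):

```java
// A toy model of the decorator chain built in getIndexWriterConfig() above.
// The interface and class names here are hypothetical illustrations, NOT
// Lucene's MergePolicy API; only the wrapping order mirrors the real code.
public class MergePolicyLayering {

    interface Policy {
        String describe();
    }

    /** Stands in for the configured base merge policy. */
    record Base() implements Policy {
        public String describe() {
            return "base";
        }
    }

    /** Stands in for one decorating merge policy wrapping another. */
    record Wrapper(String name, Policy inner) implements Policy {
        public String describe() {
            return name + "(" + inner.describe() + ")";
        }
    }

    public static void main(String[] args) {
        // Soft-delete branch: innermost policy is constructed first,
        // as in the real method.
        Policy p = new Wrapper("RecoverySourcePrune",
                   new Wrapper("SoftDeletesRetention",
                   new Wrapper("PrunePostings", new Base())));
        // Then the optional shuffle wrapper and the final Elasticsearch wrapper.
        p = new Wrapper("ShuffleForced", p);
        p = new Wrapper("Elasticsearch", p);
        System.out.println(p.describe());
    }
}
```

Reading from the outside in: ElasticsearchMergePolicy wraps the optional ShuffleForcedMergePolicy, which wraps the soft-delete chain, with PrunePostingsMergePolicy sitting closest to the configured base policy.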

Example 2 with SoftDeletesRetentionMergePolicy

Use of org.apache.lucene.index.SoftDeletesRetentionMergePolicy in project crate by crate.

From the class InternalEngineTests, method testNoOps:

/*
     * Verifies that a no-op does not generate a new sequence number, that no-ops can advance the local
     * checkpoint, and that no-ops are correctly added to the translog.
     */
@Test
public void testNoOps() throws IOException {
    engine.close();
    InternalEngine noOpEngine = null;
    final int maxSeqNo = randomIntBetween(0, 128);
    final int localCheckpoint = randomIntBetween(0, maxSeqNo);
    try {
        final BiFunction<Long, Long, LocalCheckpointTracker> supplier = (ms, lcp) -> new LocalCheckpointTracker(maxSeqNo, localCheckpoint);
        EngineConfig noopEngineConfig = copy(
            engine.config(),
            new SoftDeletesRetentionMergePolicy(Lucene.SOFT_DELETES_FIELD, MatchAllDocsQuery::new, engine.config().getMergePolicy()));
        noOpEngine = new InternalEngine(noopEngineConfig, supplier) {

            @Override
            protected long doGenerateSeqNoForOperation(Operation operation) {
                throw new UnsupportedOperationException();
            }
        };
        noOpEngine.recoverFromTranslog(translogHandler, Long.MAX_VALUE);
        final int gapsFilled = noOpEngine.fillSeqNoGaps(primaryTerm.get());
        final String reason = "filling gaps";
        noOpEngine.noOp(new Engine.NoOp(maxSeqNo + 1, primaryTerm.get(), LOCAL_TRANSLOG_RECOVERY, System.nanoTime(), reason));
        assertThat(noOpEngine.getProcessedLocalCheckpoint(), equalTo((long) (maxSeqNo + 1)));
        assertThat(noOpEngine.getTranslog().stats().getUncommittedOperations(), equalTo(gapsFilled));
        noOpEngine.noOp(new Engine.NoOp(maxSeqNo + 2, primaryTerm.get(), randomFrom(PRIMARY, REPLICA, PEER_RECOVERY), System.nanoTime(), reason));
        assertThat(noOpEngine.getProcessedLocalCheckpoint(), equalTo((long) (maxSeqNo + 2)));
        assertThat(noOpEngine.getTranslog().stats().getUncommittedOperations(), equalTo(gapsFilled + 1));
        // skip to the op that we added to the translog
        Translog.Operation op;
        Translog.Operation last = null;
        try (Translog.Snapshot snapshot = noOpEngine.getTranslog().newSnapshot()) {
            while ((op = snapshot.next()) != null) {
                last = op;
            }
        }
        assertNotNull(last);
        assertThat(last, instanceOf(Translog.NoOp.class));
        final Translog.NoOp noOp = (Translog.NoOp) last;
        assertThat(noOp.seqNo(), equalTo((long) (maxSeqNo + 2)));
        assertThat(noOp.primaryTerm(), equalTo(primaryTerm.get()));
        assertThat(noOp.reason(), equalTo(reason));
        if (engine.engineConfig.getIndexSettings().isSoftDeleteEnabled()) {
            MapperService mapperService = createMapperService("test");
            List<Translog.Operation> operationsFromLucene = readAllOperationsInLucene(noOpEngine, mapperService);
            // expect the (maxSeqNo - localCheckpoint) gap-filling no-ops plus the two manual no-ops
            assertThat(operationsFromLucene, hasSize(maxSeqNo + 2 - localCheckpoint));
            for (int i = 0; i < operationsFromLucene.size(); i++) {
                assertThat(operationsFromLucene.get(i), equalTo(new Translog.NoOp(localCheckpoint + 1 + i, primaryTerm.get(), "filling gaps")));
            }
            assertConsistentHistoryBetweenTranslogAndLuceneIndex(noOpEngine, mapperService);
        }
    } finally {
        IOUtils.close(noOpEngine);
    }
}

Example 3 with SoftDeletesRetentionMergePolicy

Use of org.apache.lucene.index.SoftDeletesRetentionMergePolicy in project crate by crate.

From the class InternalEngineTests, method assertOperationHistoryInLucene:

private void assertOperationHistoryInLucene(List<Engine.Operation> operations) throws IOException {
    final MergePolicy keepSoftDeleteDocsMP = new SoftDeletesRetentionMergePolicy(Lucene.SOFT_DELETES_FIELD, MatchAllDocsQuery::new, engine.config().getMergePolicy());
    Settings.Builder settings = Settings.builder().put(defaultSettings.getSettings()).put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true).put(IndexSettings.INDEX_SOFT_DELETES_RETENTION_OPERATIONS_SETTING.getKey(), randomLongBetween(0, 10));
    final IndexMetadata indexMetadata = IndexMetadata.builder(defaultSettings.getIndexMetadata()).settings(settings).build();
    final IndexSettings indexSettings = IndexSettingsModule.newIndexSettings(indexMetadata);
    Set<Long> expectedSeqNos = new HashSet<>();
    try (Store store = createStore();
        Engine engine = createEngine(config(indexSettings, store, createTempDir(), keepSoftDeleteDocsMP, null))) {
        for (Engine.Operation op : operations) {
            if (op instanceof Engine.Index) {
                Engine.IndexResult indexResult = engine.index((Engine.Index) op);
                assertThat(indexResult.getFailure(), nullValue());
                expectedSeqNos.add(indexResult.getSeqNo());
            } else {
                Engine.DeleteResult deleteResult = engine.delete((Engine.Delete) op);
                assertThat(deleteResult.getFailure(), nullValue());
                expectedSeqNos.add(deleteResult.getSeqNo());
            }
            if (rarely()) {
                engine.refresh("test");
            }
            if (rarely()) {
                engine.flush();
            }
            if (rarely()) {
                engine.forceMerge(true, 1, false, false, false, UUIDs.randomBase64UUID());
            }
        }
        MapperService mapperService = createMapperService("test");
        List<Translog.Operation> actualOps = readAllOperationsInLucene(engine, mapperService);
        assertThat(actualOps.stream().map(o -> o.seqNo()).collect(Collectors.toList()), containsInAnyOrder(expectedSeqNos.toArray()));
        assertConsistentHistoryBetweenTranslogAndLuceneIndex(engine, mapperService);
    }
}

Example 4 with SoftDeletesRetentionMergePolicy

Use of org.apache.lucene.index.SoftDeletesRetentionMergePolicy in project crate by crate.

From the class InternalEngineTests, method testLookupVersionWithPrunedAwayIds:

/*
     * We are testing an edge case here: a fully deleted segment that is retained but has all of its IDs pruned away.
     */
@Test
public void testLookupVersionWithPrunedAwayIds() throws IOException {
    try (Directory dir = newDirectory()) {
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Lucene.STANDARD_ANALYZER);
        indexWriterConfig.setSoftDeletesField(Lucene.SOFT_DELETES_FIELD);
        try (IndexWriter writer = new IndexWriter(dir, indexWriterConfig.setMergePolicy(
                new SoftDeletesRetentionMergePolicy(
                    Lucene.SOFT_DELETES_FIELD,
                    MatchAllDocsQuery::new,
                    new PrunePostingsMergePolicy(indexWriterConfig.getMergePolicy(), "_id"))))) {
            org.apache.lucene.document.Document doc = new org.apache.lucene.document.Document();
            doc.add(new Field(IdFieldMapper.NAME, "1", IdFieldMapper.Defaults.FIELD_TYPE));
            doc.add(new NumericDocValuesField(VersionFieldMapper.NAME, -1));
            doc.add(new NumericDocValuesField(SeqNoFieldMapper.NAME, 1));
            doc.add(new NumericDocValuesField(SeqNoFieldMapper.PRIMARY_TERM_NAME, 1));
            writer.addDocument(doc);
            writer.flush();
            writer.softUpdateDocument(new Term(IdFieldMapper.NAME, "1"), doc, new NumericDocValuesField(Lucene.SOFT_DELETES_FIELD, 1));
            writer.updateNumericDocValue(new Term(IdFieldMapper.NAME, "1"), Lucene.SOFT_DELETES_FIELD, 1);
            writer.forceMerge(1);
            try (DirectoryReader reader = DirectoryReader.open(writer)) {
                assertEquals(1, reader.leaves().size());
                assertNull(VersionsAndSeqNoResolver.loadDocIdAndVersion(reader, new Term(IdFieldMapper.NAME, "1"), false));
            }
        }
    }
}

Example 5 with SoftDeletesRetentionMergePolicy

Use of org.apache.lucene.index.SoftDeletesRetentionMergePolicy in project crate by crate.

From the class PrunePostingsMergePolicyTests, method testPrune:

@Test
public void testPrune() throws IOException {
    try (Directory dir = newDirectory()) {
        IndexWriterConfig iwc = newIndexWriterConfig();
        iwc.setSoftDeletesField("_soft_deletes");
        MergePolicy mp = new SoftDeletesRetentionMergePolicy("_soft_deletes", MatchAllDocsQuery::new, new PrunePostingsMergePolicy(newLogMergePolicy(), "id"));
        iwc.setMergePolicy(new ShuffleForcedMergePolicy(mp));
        boolean sorted = randomBoolean();
        if (sorted) {
            iwc.setIndexSort(new Sort(new SortField("sort", SortField.Type.INT)));
        }
        int numUniqueDocs = randomIntBetween(1, 100);
        int numDocs = randomIntBetween(numUniqueDocs, numUniqueDocs * 5);
        try (IndexWriter writer = new IndexWriter(dir, iwc)) {
            for (int i = 0; i < numDocs; i++) {
                if (rarely()) {
                    writer.flush();
                }
                if (rarely()) {
                    writer.forceMerge(1, false);
                }
                int id = i % numUniqueDocs;
                Document doc = new Document();
                doc.add(new StringField("id", "" + id, Field.Store.NO));
                doc.add(newTextField("text", "the quick brown fox", Field.Store.YES));
                doc.add(new NumericDocValuesField("sort", i));
                writer.softUpdateDocument(new Term("id", "" + id), doc, new NumericDocValuesField("_soft_deletes", 1));
                if (i == 0) {
                    // make sure we have at least 2 segments to ensure we do an actual merge to kick out all postings for
                    // soft deletes
                    writer.flush();
                }
            }
            writer.forceMerge(1);
            try (DirectoryReader reader = DirectoryReader.open(writer)) {
                LeafReader leafReader = reader.leaves().get(0).reader();
                assertEquals(numDocs, leafReader.maxDoc());
                Terms id = leafReader.terms("id");
                TermsEnum iterator = id.iterator();
                for (int i = 0; i < numUniqueDocs; i++) {
                    assertTrue(iterator.seekExact(new BytesRef("" + i)));
                    assertEquals(1, iterator.docFreq());
                }
                iterator = leafReader.terms("text").iterator();
                assertTrue(iterator.seekExact(new BytesRef("quick")));
                assertEquals(leafReader.maxDoc(), iterator.docFreq());
                int numValues = 0;
                NumericDocValues sort = leafReader.getNumericDocValues("sort");
                while (sort.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
                    if (sorted) {
                        assertEquals(sort.docID(), sort.longValue());
                    } else {
                        assertTrue(sort.longValue() >= 0);
                        assertTrue(sort.longValue() < numDocs);
                    }
                    numValues++;
                }
                assertEquals(numValues, numDocs);
            }
            {
                // prune away a single ID
                Document doc = new Document();
                doc.add(new StringField("id", "test", Field.Store.NO));
                writer.deleteDocuments(new Term("id", "test"));
                writer.flush();
                writer.forceMerge(1);
                // delete it
                writer.updateNumericDocValue(new Term("id", "test"), "_soft_deletes", 1);
                writer.flush();
                writer.forceMerge(1);
                try (DirectoryReader reader = DirectoryReader.open(writer)) {
                    LeafReader leafReader = reader.leaves().get(0).reader();
                    assertEquals(numDocs, leafReader.maxDoc());
                    Terms id = leafReader.terms("id");
                    TermsEnum iterator = id.iterator();
                    assertEquals(numUniqueDocs, id.size());
                    for (int i = 0; i < numUniqueDocs; i++) {
                        assertTrue(iterator.seekExact(new BytesRef("" + i)));
                        assertEquals(1, iterator.docFreq());
                    }
                    assertFalse(iterator.seekExact(new BytesRef("test")));
                    iterator = leafReader.terms("text").iterator();
                    assertTrue(iterator.seekExact(new BytesRef("quick")));
                    assertEquals(leafReader.maxDoc(), iterator.docFreq());
                }
            }
            {
                // drop all ids
                // first add a doc such that we can force merge
                Document doc = new Document();
                doc.add(new StringField("id", "" + 0, Field.Store.NO));
                doc.add(newTextField("text", "the quick brown fox", Field.Store.YES));
                doc.add(new NumericDocValuesField("sort", 0));
                writer.softUpdateDocument(new Term("id", "" + 0), doc, new NumericDocValuesField("_soft_deletes", 1));
                for (int i = 0; i < numUniqueDocs; i++) {
                    writer.updateNumericDocValue(new Term("id", "" + i), "_soft_deletes", 1);
                }
                writer.flush();
                writer.forceMerge(1);
                try (DirectoryReader reader = DirectoryReader.open(writer)) {
                    LeafReader leafReader = reader.leaves().get(0).reader();
                    assertEquals(numDocs + 1, leafReader.maxDoc());
                    assertEquals(0, leafReader.numDocs());
                    assertNull(leafReader.terms("id"));
                    TermsEnum iterator = leafReader.terms("text").iterator();
                    assertTrue(iterator.seekExact(new BytesRef("quick")));
                    assertEquals(leafReader.maxDoc(), iterator.docFreq());
                }
            }
        }
    }
}

Aggregations

SoftDeletesRetentionMergePolicy (org.apache.lucene.index.SoftDeletesRetentionMergePolicy): 5 uses
IndexWriterConfig (org.apache.lucene.index.IndexWriterConfig): 4 uses
MergePolicy (org.apache.lucene.index.MergePolicy): 4 uses
MatchAllDocsQuery (org.apache.lucene.search.MatchAllDocsQuery): 4 uses
NumericDocValuesField (org.apache.lucene.document.NumericDocValuesField): 3 uses
DirectoryReader (org.apache.lucene.index.DirectoryReader): 3 uses
IndexWriter (org.apache.lucene.index.IndexWriter): 3 uses
LiveIndexWriterConfig (org.apache.lucene.index.LiveIndexWriterConfig): 3 uses
Term (org.apache.lucene.index.Term): 3 uses
Directory (org.apache.lucene.store.Directory): 3 uses
Test (org.junit.Test): 3 uses
IOException (java.io.IOException): 2 uses
HashSet (java.util.HashSet): 2 uses
AtomicLong (java.util.concurrent.atomic.AtomicLong): 2 uses
Field (org.apache.lucene.document.Field): 2 uses
StoredField (org.apache.lucene.document.StoredField): 2 uses
TextField (org.apache.lucene.document.TextField): 2 uses
IndexableField (org.apache.lucene.index.IndexableField): 2 uses
LeafReader (org.apache.lucene.index.LeafReader): 2 uses
LogByteSizeMergePolicy (org.apache.lucene.index.LogByteSizeMergePolicy): 2 uses