Examples with DocValuesFormat - org.apache.lucene.codecs.DocValuesFormat

Example 11 with DocValuesFormat

use of org.apache.lucene.codecs.DocValuesFormat in project lucene-solr by apache.

the class TestPerFieldDocValuesFormat method testTwoFieldsTwoFormats.

// just a simple trivial test
// TODO: we should come up with a test that somehow checks that segment suffix
// is respected by all codec apis (not just docvalues and postings)
public void testTwoFieldsTwoFormats() throws IOException {
    Analyzer analyzer = new MockAnalyzer(random());
    Directory directory = newDirectory();
    // we don't use RandomIndexWriter because it might add more docvalues than we expect !!!!1
    IndexWriterConfig iwc = newIndexWriterConfig(analyzer);
    final DocValuesFormat fast = TestUtil.getDefaultDocValuesFormat();
    final DocValuesFormat slow = DocValuesFormat.forName("Memory");
    iwc.setCodec(new AssertingCodec() {

        @Override
        public DocValuesFormat getDocValuesFormatForField(String field) {
            if ("dv1".equals(field)) {
                return fast;
            } else {
                return slow;
            }
        }
    });
    IndexWriter iwriter = new IndexWriter(directory, iwc);
    Document doc = new Document();
    String longTerm = "longtermlongtermlongtermlongtermlongtermlongtermlongtermlongtermlongtermlongtermlongtermlongtermlongtermlongtermlongtermlongtermlongtermlongterm";
    String text = "This is the text to be indexed. " + longTerm;
    doc.add(newTextField("fieldname", text, Field.Store.YES));
    doc.add(new NumericDocValuesField("dv1", 5));
    doc.add(new BinaryDocValuesField("dv2", new BytesRef("hello world")));
    iwriter.addDocument(doc);
    iwriter.close();
    // Now search the index:
    // read-only=true
    IndexReader ireader = DirectoryReader.open(directory);
    IndexSearcher isearcher = newSearcher(ireader);
    assertEquals(1, isearcher.search(new TermQuery(new Term("fieldname", longTerm)), 1).totalHits);
    Query query = new TermQuery(new Term("fieldname", "text"));
    TopDocs hits = isearcher.search(query, 1);
    assertEquals(1, hits.totalHits);
    // Iterate through the results:
    for (int i = 0; i < hits.scoreDocs.length; i++) {
        int hitDocID = hits.scoreDocs[i].doc;
        Document hitDoc = isearcher.doc(hitDocID);
        assertEquals(text, hitDoc.get("fieldname"));
        assert ireader.leaves().size() == 1;
        NumericDocValues dv = ireader.leaves().get(0).reader().getNumericDocValues("dv1");
        assertEquals(hitDocID, dv.advance(hitDocID));
        assertEquals(5, dv.longValue());
        BinaryDocValues dv2 = ireader.leaves().get(0).reader().getBinaryDocValues("dv2");
        assertEquals(hitDocID, dv2.advance(hitDocID));
        final BytesRef term = dv2.binaryValue();
        assertEquals(new BytesRef("hello world"), term);
    }
    ireader.close();
    directory.close();
}

Also used : AssertingCodec(org.apache.lucene.codecs.asserting.AssertingCodec) IndexSearcher(org.apache.lucene.search.IndexSearcher) TermQuery(org.apache.lucene.search.TermQuery) NumericDocValues(org.apache.lucene.index.NumericDocValues) Query(org.apache.lucene.search.Query) TermQuery(org.apache.lucene.search.TermQuery) Term(org.apache.lucene.index.Term) Analyzer(org.apache.lucene.analysis.Analyzer) MockAnalyzer(org.apache.lucene.analysis.MockAnalyzer) Document(org.apache.lucene.document.Document) BinaryDocValuesField(org.apache.lucene.document.BinaryDocValuesField) BinaryDocValues(org.apache.lucene.index.BinaryDocValues) DocValuesFormat(org.apache.lucene.codecs.DocValuesFormat) TopDocs(org.apache.lucene.search.TopDocs) MockAnalyzer(org.apache.lucene.analysis.MockAnalyzer) NumericDocValuesField(org.apache.lucene.document.NumericDocValuesField) IndexWriter(org.apache.lucene.index.IndexWriter) IndexReader(org.apache.lucene.index.IndexReader) BytesRef(org.apache.lucene.util.BytesRef) Directory(org.apache.lucene.store.Directory) IndexWriterConfig(org.apache.lucene.index.IndexWriterConfig)

Example 12 with DocValuesFormat

use of org.apache.lucene.codecs.DocValuesFormat in project lucene-solr by apache.

the class TestNumericDocValuesUpdates method testDifferentDVFormatPerField.

@Test
public void testDifferentDVFormatPerField() throws Exception {
    // test relies on separate instances of the "same thing"
    assert TestUtil.getDefaultDocValuesFormat() != TestUtil.getDefaultDocValuesFormat();
    Directory dir = newDirectory();
    IndexWriterConfig conf = newIndexWriterConfig(new MockAnalyzer(random()));
    conf.setCodec(new AssertingCodec() {

        @Override
        public DocValuesFormat getDocValuesFormatForField(String field) {
            return TestUtil.getDefaultDocValuesFormat();
        }
    });
    IndexWriter writer = new IndexWriter(dir, conf);
    Document doc = new Document();
    doc.add(new StringField("key", "doc", Store.NO));
    doc.add(new NumericDocValuesField("ndv", 5));
    doc.add(new SortedDocValuesField("sorted", new BytesRef("value")));
    // flushed document
    writer.addDocument(doc);
    writer.commit();
    // in-memory document
    writer.addDocument(doc);
    writer.updateNumericDocValue(new Term("key", "doc"), "ndv", 17L);
    writer.close();
    final DirectoryReader reader = DirectoryReader.open(dir);
    NumericDocValues ndv = MultiDocValues.getNumericValues(reader, "ndv");
    SortedDocValues sdv = MultiDocValues.getSortedValues(reader, "sorted");
    for (int i = 0; i < reader.maxDoc(); i++) {
        assertEquals(i, ndv.nextDoc());
        assertEquals(17, ndv.longValue());
        assertEquals(i, sdv.nextDoc());
        final BytesRef term = sdv.binaryValue();
        assertEquals(new BytesRef("value"), term);
    }
    reader.close();
    dir.close();
}

Also used : AssertingCodec(org.apache.lucene.codecs.asserting.AssertingCodec) Document(org.apache.lucene.document.Document) DocValuesFormat(org.apache.lucene.codecs.DocValuesFormat) AssertingDocValuesFormat(org.apache.lucene.codecs.asserting.AssertingDocValuesFormat) MockAnalyzer(org.apache.lucene.analysis.MockAnalyzer) NumericDocValuesField(org.apache.lucene.document.NumericDocValuesField) StringField(org.apache.lucene.document.StringField) SortedDocValuesField(org.apache.lucene.document.SortedDocValuesField) BytesRef(org.apache.lucene.util.BytesRef) Directory(org.apache.lucene.store.Directory) NRTCachingDirectory(org.apache.lucene.store.NRTCachingDirectory) Test(org.junit.Test)

Example 13 with DocValuesFormat

use of org.apache.lucene.codecs.DocValuesFormat in project lucene-solr by apache.

the class TestRuleSetupAndRestoreClassEnv method before.

@Override
protected void before() throws Exception {
    // we do this in beforeClass, because some tests currently disable it
    if (System.getProperty("solr.directoryFactory") == null) {
        System.setProperty("solr.directoryFactory", "org.apache.solr.core.MockDirectoryFactory");
    }
    // if verbose: print some debugging stuff about which codecs are loaded.
    if (VERBOSE) {
        System.out.println("Loaded codecs: " + Codec.availableCodecs());
        System.out.println("Loaded postingsFormats: " + PostingsFormat.availablePostingsFormats());
    }
    savedInfoStream = InfoStream.getDefault();
    final Random random = RandomizedContext.current().getRandom();
    final boolean v = random.nextBoolean();
    if (INFOSTREAM) {
        InfoStream.setDefault(new ThreadNameFixingPrintStreamInfoStream(System.out));
    } else if (v) {
        InfoStream.setDefault(new NullInfoStream());
    }
    Class<?> targetClass = RandomizedContext.current().getTargetClass();
    avoidCodecs = new HashSet<>();
    if (targetClass.isAnnotationPresent(SuppressCodecs.class)) {
        SuppressCodecs a = targetClass.getAnnotation(SuppressCodecs.class);
        avoidCodecs.addAll(Arrays.asList(a.value()));
    }
    savedCodec = Codec.getDefault();
    int randomVal = random.nextInt(11);
    if ("default".equals(TEST_CODEC)) {
        // just use the default, don't randomize
        codec = savedCodec;
    } else if (("random".equals(TEST_POSTINGSFORMAT) == false) || ("random".equals(TEST_DOCVALUESFORMAT) == false)) {
        // the user wired postings or DV: this is messy
        // refactor into RandomCodec....
        final PostingsFormat format;
        if ("random".equals(TEST_POSTINGSFORMAT)) {
            format = new AssertingPostingsFormat();
        } else if ("MockRandom".equals(TEST_POSTINGSFORMAT)) {
            format = new MockRandomPostingsFormat(new Random(random.nextLong()));
        } else {
            format = PostingsFormat.forName(TEST_POSTINGSFORMAT);
        }
        final DocValuesFormat dvFormat;
        if ("random".equals(TEST_DOCVALUESFORMAT)) {
            dvFormat = new AssertingDocValuesFormat();
        } else {
            dvFormat = DocValuesFormat.forName(TEST_DOCVALUESFORMAT);
        }
        codec = new AssertingCodec() {

            @Override
            public PostingsFormat getPostingsFormatForField(String field) {
                return format;
            }

            @Override
            public DocValuesFormat getDocValuesFormatForField(String field) {
                return dvFormat;
            }

            @Override
            public String toString() {
                return super.toString() + ": " + format.toString() + ", " + dvFormat.toString();
            }
        };
    } else if ("SimpleText".equals(TEST_CODEC) || ("random".equals(TEST_CODEC) && randomVal == 9 && LuceneTestCase.rarely(random) && !shouldAvoidCodec("SimpleText"))) {
        codec = new SimpleTextCodec();
    } else if ("CheapBastard".equals(TEST_CODEC) || ("random".equals(TEST_CODEC) && randomVal == 8 && !shouldAvoidCodec("CheapBastard") && !shouldAvoidCodec("Lucene41"))) {
        // we also avoid this codec if Lucene41 is avoided, since thats the postings format it uses.
        codec = new CheapBastardCodec();
    } else if ("Asserting".equals(TEST_CODEC) || ("random".equals(TEST_CODEC) && randomVal == 7 && !shouldAvoidCodec("Asserting"))) {
        codec = new AssertingCodec();
    } else if ("Compressing".equals(TEST_CODEC) || ("random".equals(TEST_CODEC) && randomVal == 6 && !shouldAvoidCodec("Compressing"))) {
        codec = CompressingCodec.randomInstance(random);
    } else if ("Lucene70".equals(TEST_CODEC) || ("random".equals(TEST_CODEC) && randomVal == 5 && !shouldAvoidCodec("Lucene70"))) {
        codec = new Lucene70Codec(RandomPicks.randomFrom(random, Lucene50StoredFieldsFormat.Mode.values()));
    } else if (!"random".equals(TEST_CODEC)) {
        codec = Codec.forName(TEST_CODEC);
    } else if ("random".equals(TEST_POSTINGSFORMAT)) {
        codec = new RandomCodec(random, avoidCodecs);
    } else {
        assert false;
    }
    Codec.setDefault(codec);
    // Initialize locale/ timezone.
    String testLocale = System.getProperty("tests.locale", "random");
    String testTimeZone = System.getProperty("tests.timezone", "random");
    // Always pick a random one for consistency (whether tests.locale was specified or not).
    savedLocale = Locale.getDefault();
    Locale randomLocale = randomLocale(random);
    locale = testLocale.equals("random") ? randomLocale : localeForLanguageTag(testLocale);
    Locale.setDefault(locale);
    savedTimeZone = TimeZone.getDefault();
    TimeZone randomTimeZone = randomTimeZone(random());
    timeZone = testTimeZone.equals("random") ? randomTimeZone : TimeZone.getTimeZone(testTimeZone);
    TimeZone.setDefault(timeZone);
    similarity = new RandomSimilarity(random());
    // Check codec restrictions once at class level.
    try {
        checkCodecRestrictions(codec);
    } catch (AssumptionViolatedException e) {
        System.err.println("NOTE: " + e.getMessage() + " Suppressed codecs: " + Arrays.toString(avoidCodecs.toArray()));
        throw e;
    }
    // We have "stickiness" so that sometimes all we do is vary the RAM buffer size, other times just the doc count to flush by, else both.
    // This way the assertMemory in DocumentsWriterFlushControl sometimes runs (when we always flush by RAM).
    LiveIWCFlushMode flushMode;
    switch(random().nextInt(3)) {
        case 0:
            flushMode = LiveIWCFlushMode.BY_RAM;
            break;
        case 1:
            flushMode = LiveIWCFlushMode.BY_DOCS;
            break;
        case 2:
            flushMode = LiveIWCFlushMode.EITHER;
            break;
        default:
            throw new AssertionError();
    }
    LuceneTestCase.setLiveIWCFlushMode(flushMode);
    initialized = true;
}

Also used : Locale(java.util.Locale) LuceneTestCase.randomLocale(org.apache.lucene.util.LuceneTestCase.randomLocale) AssumptionViolatedException(org.junit.internal.AssumptionViolatedException) CheapBastardCodec(org.apache.lucene.codecs.cheapbastard.CheapBastardCodec) DocValuesFormat(org.apache.lucene.codecs.DocValuesFormat) AssertingDocValuesFormat(org.apache.lucene.codecs.asserting.AssertingDocValuesFormat) AssertingPostingsFormat(org.apache.lucene.codecs.asserting.AssertingPostingsFormat) Random(java.util.Random) AssertingDocValuesFormat(org.apache.lucene.codecs.asserting.AssertingDocValuesFormat) RandomCodec(org.apache.lucene.index.RandomCodec) AssertingCodec(org.apache.lucene.codecs.asserting.AssertingCodec) LiveIWCFlushMode(org.apache.lucene.util.LuceneTestCase.LiveIWCFlushMode) RandomSimilarity(org.apache.lucene.search.similarities.RandomSimilarity) SimpleTextCodec(org.apache.lucene.codecs.simpletext.SimpleTextCodec) SuppressCodecs(org.apache.lucene.util.LuceneTestCase.SuppressCodecs) MockRandomPostingsFormat(org.apache.lucene.codecs.mockrandom.MockRandomPostingsFormat) LuceneTestCase.randomTimeZone(org.apache.lucene.util.LuceneTestCase.randomTimeZone) TimeZone(java.util.TimeZone) AssertingPostingsFormat(org.apache.lucene.codecs.asserting.AssertingPostingsFormat) MockRandomPostingsFormat(org.apache.lucene.codecs.mockrandom.MockRandomPostingsFormat) PostingsFormat(org.apache.lucene.codecs.PostingsFormat) Lucene70Codec(org.apache.lucene.codecs.lucene70.Lucene70Codec)

Example 14 with DocValuesFormat

use of org.apache.lucene.codecs.DocValuesFormat in project lucene-solr by apache.

the class TestBinaryDocValuesUpdates method testDifferentDVFormatPerField.

public void testDifferentDVFormatPerField() throws Exception {
    // test relies on separate instances of "same thing"
    assert TestUtil.getDefaultDocValuesFormat() != TestUtil.getDefaultDocValuesFormat();
    Directory dir = newDirectory();
    IndexWriterConfig conf = newIndexWriterConfig(new MockAnalyzer(random()));
    conf.setCodec(new AssertingCodec() {

        @Override
        public DocValuesFormat getDocValuesFormatForField(String field) {
            return TestUtil.getDefaultDocValuesFormat();
        }
    });
    IndexWriter writer = new IndexWriter(dir, conf);
    Document doc = new Document();
    doc.add(new StringField("key", "doc", Store.NO));
    doc.add(new BinaryDocValuesField("bdv", toBytes(5L)));
    doc.add(new SortedDocValuesField("sorted", new BytesRef("value")));
    // flushed document
    writer.addDocument(doc);
    writer.commit();
    // in-memory document
    writer.addDocument(doc);
    writer.updateBinaryDocValue(new Term("key", "doc"), "bdv", toBytes(17L));
    writer.close();
    final DirectoryReader reader = DirectoryReader.open(dir);
    BinaryDocValues bdv = MultiDocValues.getBinaryValues(reader, "bdv");
    SortedDocValues sdv = MultiDocValues.getSortedValues(reader, "sorted");
    for (int i = 0; i < reader.maxDoc(); i++) {
        assertEquals(i, bdv.nextDoc());
        assertEquals(17, getValue(bdv));
        assertEquals(i, sdv.nextDoc());
        BytesRef term = sdv.binaryValue();
        assertEquals(new BytesRef("value"), term);
    }
    reader.close();
    dir.close();
}

Also used : AssertingCodec(org.apache.lucene.codecs.asserting.AssertingCodec) Document(org.apache.lucene.document.Document) BinaryDocValuesField(org.apache.lucene.document.BinaryDocValuesField) DocValuesFormat(org.apache.lucene.codecs.DocValuesFormat) AssertingDocValuesFormat(org.apache.lucene.codecs.asserting.AssertingDocValuesFormat) MockAnalyzer(org.apache.lucene.analysis.MockAnalyzer) StringField(org.apache.lucene.document.StringField) SortedDocValuesField(org.apache.lucene.document.SortedDocValuesField) BytesRef(org.apache.lucene.util.BytesRef) Directory(org.apache.lucene.store.Directory) NRTCachingDirectory(org.apache.lucene.store.NRTCachingDirectory)

Example 15 with DocValuesFormat

use of org.apache.lucene.codecs.DocValuesFormat in project lucene-solr by apache.

the class DefaultIndexingChain method writeDocValues.

/** Writes all buffered doc values (called from {@link #flush}). */
private void writeDocValues(SegmentWriteState state, Sorter.DocMap sortMap) throws IOException {
    int maxDoc = state.segmentInfo.maxDoc();
    DocValuesConsumer dvConsumer = null;
    boolean success = false;
    try {
        for (int i = 0; i < fieldHash.length; i++) {
            PerField perField = fieldHash[i];
            while (perField != null) {
                if (perField.docValuesWriter != null) {
                    if (perField.fieldInfo.getDocValuesType() == DocValuesType.NONE) {
                        // BUG
                        throw new AssertionError("segment=" + state.segmentInfo + ": field=\"" + perField.fieldInfo.name + "\" has no docValues but wrote them");
                    }
                    if (dvConsumer == null) {
                        // lazy init
                        DocValuesFormat fmt = state.segmentInfo.getCodec().docValuesFormat();
                        dvConsumer = fmt.fieldsConsumer(state);
                    }
                    if (finishedDocValues.contains(perField.fieldInfo.name) == false) {
                        perField.docValuesWriter.finish(maxDoc);
                    }
                    perField.docValuesWriter.flush(state, sortMap, dvConsumer);
                    perField.docValuesWriter = null;
                } else if (perField.fieldInfo.getDocValuesType() != DocValuesType.NONE) {
                    // BUG
                    throw new AssertionError("segment=" + state.segmentInfo + ": field=\"" + perField.fieldInfo.name + "\" has docValues but did not write them");
                }
                perField = perField.next;
            }
        }
        // TODO: catch missing DV fields here?  else we have
        // null/"" depending on how docs landed in segments?
        // but we can't detect all cases, and we should leave
        // this behavior undefined. dv is not "schemaless": it's column-stride.
        success = true;
    } finally {
        if (success) {
            IOUtils.close(dvConsumer);
        } else {
            IOUtils.closeWhileHandlingException(dvConsumer);
        }
    }
    if (state.fieldInfos.hasDocValues() == false) {
        if (dvConsumer != null) {
            // BUG
            throw new AssertionError("segment=" + state.segmentInfo + ": fieldInfos has no docValues but wrote them");
        }
    } else if (dvConsumer == null) {
        // BUG
        throw new AssertionError("segment=" + state.segmentInfo + ": fieldInfos has docValues but did not wrote them");
    }
}

Also used : DocValuesConsumer(org.apache.lucene.codecs.DocValuesConsumer) DocValuesFormat(org.apache.lucene.codecs.DocValuesFormat)

Aggregations

DocValuesFormat (org.apache.lucene.codecs.DocValuesFormat)15 AssertingCodec (org.apache.lucene.codecs.asserting.AssertingCodec)9 Directory (org.apache.lucene.store.Directory)9 Document (org.apache.lucene.document.Document)8 MockAnalyzer (org.apache.lucene.analysis.MockAnalyzer)7 AssertingDocValuesFormat (org.apache.lucene.codecs.asserting.AssertingDocValuesFormat)7 BinaryDocValuesField (org.apache.lucene.document.BinaryDocValuesField)6 NumericDocValuesField (org.apache.lucene.document.NumericDocValuesField)6 StringField (org.apache.lucene.document.StringField)6 BytesRef (org.apache.lucene.util.BytesRef)6 PostingsFormat (org.apache.lucene.codecs.PostingsFormat)4 SortedDocValuesField (org.apache.lucene.document.SortedDocValuesField)4 IndexWriterConfig (org.apache.lucene.index.IndexWriterConfig)4 NRTCachingDirectory (org.apache.lucene.store.NRTCachingDirectory)4 Term (org.apache.lucene.index.Term)3 ArrayList (java.util.ArrayList)2 Lucene70Codec (org.apache.lucene.codecs.lucene70.Lucene70Codec)2 DirectDocValuesFormat (org.apache.lucene.codecs.memory.DirectDocValuesFormat)2 MemoryDocValuesFormat (org.apache.lucene.codecs.memory.MemoryDocValuesFormat)2 Field (org.apache.lucene.document.Field)2