Examples with StandardAnalyzer - org.apache.lucene.analysis.standard.StandardAnalyzer

Example 96 with StandardAnalyzer

use of org.apache.lucene.analysis.standard.StandardAnalyzer in project zm-mailbox by Zimbra.

the class AbstractIndexStoreTest method booleanQuery.

@Test
public void booleanQuery() throws Exception {
    ZimbraLog.test.debug("--->TEST booleanQuery");
    Mailbox mbox = MailboxManager.getInstance().getMailboxByAccountId(MockProvisioning.DEFAULT_ACCOUNT_ID);
    Contact contact = createContact(mbox, "First", "Last", "f.last@zimbra.com", "Software Development Engineer");
    createContact(mbox, "Given", "Surname", "GiV.SurN@zimbra.com");
    // Make sure all indexing has been done
    mbox.index.indexDeferredItems();
    IndexStore index = mbox.index.getIndexStore();
    ZimbraIndexSearcher searcher = index.openSearcher();
    // This seems to be the supported way of enabling leading wildcard queries
    QueryParser queryParser = new QueryParser(LuceneIndex.VERSION, LuceneFields.L_CONTACT_DATA, new StandardAnalyzer(LuceneIndex.VERSION));
    queryParser.setAllowLeadingWildcard(true);
    Query wquery = queryParser.parse("*irst");
    Query tquery = new TermQuery(new Term(LuceneFields.L_CONTACT_DATA, "absent"));
    Query tquery2 = new TermQuery(new Term(LuceneFields.L_CONTACT_DATA, "Last"));
    BooleanQuery bquery = new BooleanQuery();
    bquery.add(wquery, Occur.MUST);
    bquery.add(tquery, Occur.MUST_NOT);
    bquery.add(tquery2, Occur.SHOULD);
    ZimbraTopDocs result = searcher.search(bquery, 100);
    Assert.assertNotNull("searcher.search result object", result);
    ZimbraLog.test.debug("Result for search [hits=%d]:%s", result.getTotalHits(), result.toString());
    Assert.assertEquals("Number of hits", 1, result.getTotalHits());
    String expected1Id = String.valueOf(contact.getId());
    String match1Id = searcher.doc(result.getScoreDoc(0).getDocumentID()).get(LuceneFields.L_MAILBOX_BLOB_ID);
    Assert.assertEquals("Mailbox Blob ID of match", expected1Id, match1Id);
}

Also used : TermQuery(org.apache.lucene.search.TermQuery) BooleanQuery(org.apache.lucene.search.BooleanQuery) QueryParser(org.apache.lucene.queryParser.QueryParser) Mailbox(com.zimbra.cs.mailbox.Mailbox) Query(org.apache.lucene.search.Query) PhraseQuery(org.apache.lucene.search.PhraseQuery) MultiPhraseQuery(org.apache.lucene.search.MultiPhraseQuery) PrefixQuery(org.apache.lucene.search.PrefixQuery) TermQuery(org.apache.lucene.search.TermQuery) BooleanQuery(org.apache.lucene.search.BooleanQuery) TermRangeQuery(org.apache.lucene.search.TermRangeQuery) StandardAnalyzer(org.apache.lucene.analysis.standard.StandardAnalyzer) Term(org.apache.lucene.index.Term) Contact(com.zimbra.cs.mailbox.Contact) ParsedContact(com.zimbra.cs.mime.ParsedContact) Test(org.junit.Test)

Example 97 with StandardAnalyzer

use of org.apache.lucene.analysis.standard.StandardAnalyzer in project zm-mailbox by Zimbra.

the class AbstractIndexStoreTest method leadingWildcardQuery.

@Test
public void leadingWildcardQuery() throws Exception {
    ZimbraLog.test.debug("--->TEST leadingWildcardQuery");
    Mailbox mbox = MailboxManager.getInstance().getMailboxByAccountId(MockProvisioning.DEFAULT_ACCOUNT_ID);
    Contact contact = createContact(mbox, "First", "Last", "f.last@zimbra.com", "Leading Wildcard");
    createContact(mbox, "Grand", "Piano", "grand@vmware.com");
    // Make sure all indexing has been done
    mbox.index.indexDeferredItems();
    IndexStore index = mbox.index.getIndexStore();
    ZimbraIndexSearcher searcher = index.openSearcher();
    // This seems to be the supported way of enabling leading wildcard queries for Lucene
    QueryParser queryParser = new QueryParser(LuceneIndex.VERSION, LuceneFields.L_CONTACT_DATA, new StandardAnalyzer(LuceneIndex.VERSION));
    queryParser.setAllowLeadingWildcard(true);
    Query query = queryParser.parse("*irst");
    ZimbraTopDocs result = searcher.search(query, 100);
    Assert.assertNotNull("searcher.search result object - searching for *irst", result);
    ZimbraLog.test.debug("Result for search for '*irst'\n" + result.toString());
    Assert.assertEquals("Number of hits searching for *irst", 1, result.getTotalHits());
    String expected1Id = String.valueOf(contact.getId());
    String match1Id = searcher.doc(result.getScoreDoc(0).getDocumentID()).get(LuceneFields.L_MAILBOX_BLOB_ID);
    Assert.assertEquals("Mailbox Blob ID of match", expected1Id, match1Id);
}

Also used : QueryParser(org.apache.lucene.queryParser.QueryParser) Mailbox(com.zimbra.cs.mailbox.Mailbox) Query(org.apache.lucene.search.Query) PhraseQuery(org.apache.lucene.search.PhraseQuery) MultiPhraseQuery(org.apache.lucene.search.MultiPhraseQuery) PrefixQuery(org.apache.lucene.search.PrefixQuery) TermQuery(org.apache.lucene.search.TermQuery) BooleanQuery(org.apache.lucene.search.BooleanQuery) TermRangeQuery(org.apache.lucene.search.TermRangeQuery) StandardAnalyzer(org.apache.lucene.analysis.standard.StandardAnalyzer) Contact(com.zimbra.cs.mailbox.Contact) ParsedContact(com.zimbra.cs.mime.ParsedContact) Test(org.junit.Test)

Example 98 with StandardAnalyzer

use of org.apache.lucene.analysis.standard.StandardAnalyzer in project zm-mailbox by Zimbra.

the class RemoteMailQueue method clearIndexInternal.

void clearIndexInternal() throws IOException {
    IndexWriter writer = null;
    try {
        if (ZimbraLog.rmgmt.isDebugEnabled()) {
            ZimbraLog.rmgmt.debug("clearing index (" + mIndexPath + ") for " + this);
        }
        writer = new IndexWriter(LuceneDirectory.open(mIndexPath), new StandardAnalyzer(LuceneIndex.VERSION), true, IndexWriter.MaxFieldLength.LIMITED);
        mNumMessages.set(0);
    } finally {
        if (writer != null) {
            writer.close();
        }
    }
}

Also used : IndexWriter(org.apache.lucene.index.IndexWriter) StandardAnalyzer(org.apache.lucene.analysis.standard.StandardAnalyzer)

Example 99 with StandardAnalyzer

use of org.apache.lucene.analysis.standard.StandardAnalyzer in project zm-mailbox by Zimbra.

the class RemoteMailQueue method reopenIndexWriter.

void reopenIndexWriter() throws IOException {
    if (ZimbraLog.rmgmt.isDebugEnabled()) {
        ZimbraLog.rmgmt.debug("reopening indexwriter " + this);
    }
    mIndexWriter.close();
    mIndexWriter = new IndexWriter(LuceneDirectory.open(mIndexPath), new StandardAnalyzer(LuceneIndex.VERSION), false, IndexWriter.MaxFieldLength.LIMITED);
}

Also used : IndexWriter(org.apache.lucene.index.IndexWriter) StandardAnalyzer(org.apache.lucene.analysis.standard.StandardAnalyzer)

Example 100 with StandardAnalyzer

use of org.apache.lucene.analysis.standard.StandardAnalyzer in project geode by apache.

the class LuceneQueriesIntegrationTest method shouldNotTokenizeWordsWithKeywordAnalyzer.

@Test()
public void shouldNotTokenizeWordsWithKeywordAnalyzer() throws Exception {
    Map<String, Analyzer> fields = new HashMap<String, Analyzer>();
    fields.put("field1", new StandardAnalyzer());
    fields.put("field2", new KeywordAnalyzer());
    luceneService.createIndexFactory().setFields(fields).create(INDEX_NAME, REGION_NAME);
    Region region = cache.createRegionFactory(RegionShortcut.PARTITION).create(REGION_NAME);
    final LuceneIndex index = luceneService.getIndex(INDEX_NAME, REGION_NAME);
    // Put two values with some of the same tokens
    String value1 = "one three";
    String value2 = "one two three";
    String value3 = "one@three";
    region.put("A", new TestObject(value1, value1));
    region.put("B", new TestObject(value2, value2));
    region.put("C", new TestObject(value3, value3));
    // The value will be tokenized into following documents using the analyzers:
    // <field1:one three> <field2:one three>
    // <field1:one two three> <field2:one two three>
    // <field1:one@three> <field2:one@three>
    luceneService.waitUntilFlushed(INDEX_NAME, REGION_NAME, 60000, TimeUnit.MILLISECONDS);
    // standard analyzer with double quote
    // this query string will be parsed as "one three"
    // but standard analyzer will parse value "one@three" to be "one three"
    // query will be--fields1:"one three"
    // so C will be hit by query
    verifyQuery("field1:\"one three\"", DEFAULT_FIELD, "A", "C");
    // standard analyzer will not tokenize by '_'
    // this query string will be parsed as "one_three"
    // query will be--field1:one_three
    verifyQuery("field1:one_three", DEFAULT_FIELD);
    // standard analyzer will tokenize by '@'
    // this query string will be parsed as "one" "three"
    // query will be--field1:one field1:three
    verifyQuery("field1:one@three", DEFAULT_FIELD, "A", "B", "C");
    HashMap expectedResults = new HashMap();
    expectedResults.put("A", new TestObject(value1, value1));
    expectedResults.put("B", new TestObject(value2, value2));
    expectedResults.put("C", new TestObject(value3, value3));
    verifyQuery("field1:one@three", DEFAULT_FIELD, expectedResults);
    // keyword analyzer, this query will only match the entry that exactly matches
    // this query string will be parsed as "one three"
    // but keyword analyzer will parse one@three to be "one three"
    // query will be--field2:one three
    verifyQuery("field2:\"one three\"", DEFAULT_FIELD, "A");
    // keyword analyzer without double quote. It should be the same as
    // with double quote
    // query will be--field2:one@three
    verifyQuery("field2:one@three", DEFAULT_FIELD, "C");
}

Also used : KeywordAnalyzer(org.apache.lucene.analysis.core.KeywordAnalyzer) HashMap(java.util.HashMap) StandardAnalyzer(org.apache.lucene.analysis.standard.StandardAnalyzer) Region(org.apache.geode.cache.Region) TestObject(org.apache.geode.cache.lucene.test.TestObject) KeywordAnalyzer(org.apache.lucene.analysis.core.KeywordAnalyzer) Analyzer(org.apache.lucene.analysis.Analyzer) StandardAnalyzer(org.apache.lucene.analysis.standard.StandardAnalyzer) Test(org.junit.Test) IntegrationTest(org.apache.geode.test.junit.categories.IntegrationTest)

Aggregations

StandardAnalyzer (org.apache.lucene.analysis.standard.StandardAnalyzer)112 Analyzer (org.apache.lucene.analysis.Analyzer)37 IndexWriter (org.apache.lucene.index.IndexWriter)36 Document (org.apache.lucene.document.Document)29 IndexWriterConfig (org.apache.lucene.index.IndexWriterConfig)29 IndexSearcher (org.apache.lucene.search.IndexSearcher)24 Term (org.apache.lucene.index.Term)22 RAMDirectory (org.apache.lucene.store.RAMDirectory)21 Test (org.junit.Test)21 Query (org.apache.lucene.search.Query)20 BooleanQuery (org.apache.lucene.search.BooleanQuery)19 TermQuery (org.apache.lucene.search.TermQuery)19 IOException (java.io.IOException)16 Before (org.junit.Before)15 IndexReader (org.apache.lucene.index.IndexReader)14 HashMap (java.util.HashMap)13 Field (org.apache.lucene.document.Field)13 ArrayList (java.util.ArrayList)12 QueryParser (org.apache.lucene.queryparser.classic.QueryParser)12 Directory (org.apache.lucene.store.Directory)12