Examples with StandardAnalyzer - org.apache.lucene.analysis.standard.StandardAnalyzer

Example 1 with StandardAnalyzer

Use of org.apache.lucene.analysis.standard.StandardAnalyzer in project zeppelin by apache.

From the class LuceneSearch, method query.

/* (non-Javadoc)
   * @see org.apache.zeppelin.search.Search#query(java.lang.String)
   */
@Override
public List<Map<String, String>> query(String queryStr) {
    if (null == ramDirectory) {
        throw new IllegalStateException("Something went wrong on instance creation time, index dir is null");
    }
    List<Map<String, String>> result = Collections.emptyList();
    // try-with-resources closes both the reader and the analyzer; no explicit close() is needed.
    try (IndexReader indexReader = DirectoryReader.open(ramDirectory);
         Analyzer analyzer = new StandardAnalyzer()) {
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        // Parse the user query against both the text and the title field; StandardAnalyzer tokenizes and lowercases the terms.
        MultiFieldQueryParser parser = new MultiFieldQueryParser(new String[] { SEARCH_FIELD_TEXT, SEARCH_FIELD_TITLE }, analyzer);
        Query query = parser.parse(queryStr);
        LOG.debug("Searching for: {}", query.toString(SEARCH_FIELD_TEXT));
        SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter();
        Highlighter highlighter = new Highlighter(htmlFormatter, new QueryScorer(query));
        result = doSearch(indexSearcher, query, analyzer, highlighter);
    } catch (IOException e) {
        LOG.error("Failed to open index dir {}, make sure indexing finished OK", ramDirectory, e);
    } catch (ParseException e) {
        LOG.error("Failed to parse query {}", queryStr, e);
    }
    return result;
}

Also used: IndexSearcher(org.apache.lucene.search.IndexSearcher) MultiFieldQueryParser(org.apache.lucene.queryparser.classic.MultiFieldQueryParser) Query(org.apache.lucene.search.Query) WildcardQuery(org.apache.lucene.search.WildcardQuery) QueryScorer(org.apache.lucene.search.highlight.QueryScorer) IOException(java.io.IOException) Analyzer(org.apache.lucene.analysis.Analyzer) StandardAnalyzer(org.apache.lucene.analysis.standard.StandardAnalyzer) IndexReader(org.apache.lucene.index.IndexReader) ParseException(org.apache.lucene.queryparser.classic.ParseException) SimpleHTMLFormatter(org.apache.lucene.search.highlight.SimpleHTMLFormatter) Map(java.util.Map) ImmutableMap(com.google.common.collect.ImmutableMap) Highlighter(org.apache.lucene.search.highlight.Highlighter)
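
The essential step in Example 1 is that MultiFieldQueryParser runs the raw user input through StandardAnalyzer before building the query, so the query terms are lowercased and repeated for each field. The following is a minimal standalone sketch of just that step, assuming lucene-core and lucene-queryparser on the classpath; the class MultiFieldParseDemo and the field names "text" and "title" are hypothetical stand-ins for Zeppelin's own SEARCH_FIELD_TEXT and SEARCH_FIELD_TITLE constants.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.search.Query;

final class MultiFieldParseDemo {
    // Hypothetical field names standing in for LuceneSearch's constants.
    static final String[] FIELDS = {"text", "title"};

    // Parses user input against every field in FIELDS, analyzing each term with StandardAnalyzer.
    static Query parse(String userInput) {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            MultiFieldQueryParser parser = new MultiFieldQueryParser(FIELDS, analyzer);
            return parser.parse(userInput);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable query: " + userInput, e);
        }
    }

    public static void main(String[] args) {
        // Each input term appears once per field, in lowercased form.
        System.out.println(parse("Zeppelin Notebook"));
    }
}
```

Closing the analyzer inside parse() is safe here because the returned Query no longer references it; Example 1 keeps its analyzer open longer because doSearch also passes it to the highlighter.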

Example 2 with StandardAnalyzer

Use of org.apache.lucene.analysis.standard.StandardAnalyzer in project crate by crate.

From the class OrderedLuceneBatchIteratorBenchmark, method createLuceneBatchIterator.

@Setup
public void createLuceneBatchIterator() throws Exception {
    IndexWriter iw = new IndexWriter(new RAMDirectory(), new IndexWriterConfig(new StandardAnalyzer()));
    dummyShardId = new ShardId("dummy", 1);
    columnName = "x";
    for (int i = 0; i < 10_000_000; i++) {
        Document doc = new Document();
        doc.add(new NumericDocValuesField(columnName, i));
        iw.addDocument(doc);
    }
    iw.commit();
    iw.forceMerge(1, true);
    indexSearcher = new IndexSearcher(DirectoryReader.open(iw, true));
    collectorContext = new CollectorContext(mock(IndexFieldDataService.class), new CollectorFieldsVisitor(0));
    fieldTypeLookup = column -> {
        IntegerFieldMapper.IntegerFieldType integerFieldType = new IntegerFieldMapper.IntegerFieldType();
        integerFieldType.setNames(new MappedFieldType.Names(column));
        return integerFieldType;
    };
    reference = new Reference(new ReferenceIdent(new TableIdent(null, "dummyTable"), columnName), RowGranularity.DOC, DataTypes.INTEGER);
    orderBy = new OrderBy(Collections.singletonList(reference), reverseFlags, nullsFirst);
}

Also used: OrderBy(io.crate.analyze.OrderBy) Reference(io.crate.metadata.Reference) TableIdent(io.crate.metadata.TableIdent) Document(org.apache.lucene.document.Document) IntegerFieldMapper(org.elasticsearch.index.mapper.core.IntegerFieldMapper) RAMDirectory(org.apache.lucene.store.RAMDirectory) ReferenceIdent(io.crate.metadata.ReferenceIdent) ShardId(org.elasticsearch.index.shard.ShardId) NumericDocValuesField(org.apache.lucene.document.NumericDocValuesField) IndexWriter(org.apache.lucene.index.IndexWriter) StandardAnalyzer(org.apache.lucene.analysis.standard.StandardAnalyzer) MappedFieldType(org.elasticsearch.index.mapper.MappedFieldType) CollectorContext(io.crate.operation.reference.doc.lucene.CollectorContext) IndexWriterConfig(org.apache.lucene.index.IndexWriterConfig)

Example 3 with StandardAnalyzer

Use of org.apache.lucene.analysis.standard.StandardAnalyzer in project crate by crate.

From the class DocLevelExpressionsTest, method prepare.

@Before
public void prepare() throws Exception {
    Settings settings = Settings.builder().put("index.fielddata.cache", "none").build();
    IndexService indexService = createIndex("test", settings);
    ifd = indexService.fieldData();
    writer = new IndexWriter(new RAMDirectory(), new IndexWriterConfig(new StandardAnalyzer()).setMergePolicy(new LogByteSizeMergePolicy()));
    insertValues(writer);
    DirectoryReader directoryReader = DirectoryReader.open(writer, true);
    readerContext = directoryReader.leaves().get(0);
    ctx = new CollectorContext(ifd, null);
}

Also used: IndexService(org.elasticsearch.index.IndexService) StandardAnalyzer(org.apache.lucene.analysis.standard.StandardAnalyzer) CollectorContext(io.crate.operation.reference.doc.lucene.CollectorContext) Settings(org.elasticsearch.common.settings.Settings) RAMDirectory(org.apache.lucene.store.RAMDirectory) Before(org.junit.Before)

Example 4 with StandardAnalyzer

Use of org.apache.lucene.analysis.standard.StandardAnalyzer in project crate by crate.

From the class LuceneOrderedDocCollectorTest, method createLuceneIndex.

private Directory createLuceneIndex() throws IOException {
    Path tmpDir = newTempDir();
    Directory index = FSDirectory.open(tmpDir);
    // try-with-resources closes the writer first, then the analyzer, even if indexing fails.
    try (StandardAnalyzer analyzer = new StandardAnalyzer();
         IndexWriter w = new IndexWriter(index, new IndexWriterConfig(analyzer))) {
        // A primitive long avoids needless boxing of the loop counter.
        for (long i = 0; i < 4; i++) {
            if (i < 2) {
                // The first two documents get the values 1 and 2.
                addDocToLucene(w, i + 1);
            } else {
                // The remaining documents have no value.
                addDocToLucene(w, null);
            }
            w.commit();
        }
    }
    return index;
}

Also used: Path(java.nio.file.Path) IndexWriter(org.apache.lucene.index.IndexWriter) StandardAnalyzer(org.apache.lucene.analysis.standard.StandardAnalyzer) Directory(org.apache.lucene.store.Directory) FSDirectory(org.apache.lucene.store.FSDirectory) IndexWriterConfig(org.apache.lucene.index.IndexWriterConfig)
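
Examples 2 to 4 all pass a StandardAnalyzer to IndexWriterConfig, but the analyzer only affects tokenized fields: a NumericDocValuesField (Example 2) or a StringField is indexed without analysis. The sketch below illustrates that difference under stated assumptions: Lucene 8 or later (it uses ByteBuffersDirectory, the in-memory directory that replaces the deprecated RAMDirectory), and a hypothetical class IndexTimeAnalysisDemo with made-up field names "body" and "id". Because TermQuery is not analyzed, it only matches the exact indexed form of a term.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

final class IndexTimeAnalysisDemo {
    // Indexes one sample document, then counts documents containing the exact term field:termText.
    static int countMatches(String field, String termText) {
        try (Directory dir = new ByteBuffersDirectory();
             Analyzer analyzer = new StandardAnalyzer()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                // TextField is tokenized, so "Zeppelin Notebook" is indexed as "zeppelin" and "notebook".
                doc.add(new TextField("body", "Zeppelin Notebook", Field.Store.NO));
                // StringField bypasses the analyzer and is indexed verbatim as a single term.
                doc.add(new StringField("id", "Note-42", Field.Store.YES));
                writer.addDocument(doc);
            } // closing the writer commits the document by default
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // TermQuery is not analyzed: it must match the indexed term exactly.
                return searcher.count(new TermQuery(new Term(field, termText)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("body:zeppelin -> " + countMatches("body", "zeppelin"));
        System.out.println("body:Zeppelin -> " + countMatches("body", "Zeppelin"));
        System.out.println("id:Note-42    -> " + countMatches("id", "Note-42"));
    }
}
```

Searching the tokenized field with an uppercase term finds nothing, which is why code such as Example 1 runs user input through the same analyzer (via a query parser) instead of building TermQuery objects directly.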

Example 5 with StandardAnalyzer

Use of org.apache.lucene.analysis.standard.StandardAnalyzer in project elasticsearch by elastic.

From the class PlainHighlighterTests, method checkGeoQueryHighlighting.

public void checkGeoQueryHighlighting(Query geoQuery) throws IOException, InvalidTokenOffsetsException {
    // Analyze the "text" field with StandardAnalyzer (typed map instead of a raw Map).
    Map<String, Analyzer> analysers = new HashMap<>();
    analysers.put("text", new StandardAnalyzer());
    FieldNameAnalyzer fieldNameAnalyzer = new FieldNameAnalyzer(analysers);
    String text = "Arbitrary text field which should not cause a failure";
    // Combine the geo query with a plain term query; the geo clause must not break highlighting of "failure".
    Query termQuery = new TermQuery(new Term("text", "failure"));
    Query boolQuery = new BooleanQuery.Builder()
        .add(new BooleanClause(geoQuery, BooleanClause.Occur.SHOULD))
        .add(new BooleanClause(termQuery, BooleanClause.Occur.SHOULD))
        .build();
    org.apache.lucene.search.highlight.Highlighter highlighter = new org.apache.lucene.search.highlight.Highlighter(new CustomQueryScorer(boolQuery));
    String fragment = highlighter.getBestFragment(fieldNameAnalyzer.tokenStream("text", text), text);
    assertThat(fragment, equalTo("Arbitrary text field which should not cause a <B>failure</B>"));
    // The rewritten query must produce the same highlighted fragment.
    Query rewritten = boolQuery.rewrite(null);
    highlighter = new org.apache.lucene.search.highlight.Highlighter(new CustomQueryScorer(rewritten));
    fragment = highlighter.getBestFragment(fieldNameAnalyzer.tokenStream("text", text), text);
    assertThat(fragment, equalTo("Arbitrary text field which should not cause a <B>failure</B>"));
}

Also used: TermQuery(org.apache.lucene.search.TermQuery) BooleanQuery(org.apache.lucene.search.BooleanQuery) Query(org.apache.lucene.search.Query) PhraseQuery(org.apache.lucene.search.PhraseQuery) GeoPointDistanceQuery(org.apache.lucene.spatial.geopoint.search.GeoPointDistanceQuery) GeoPointInBBoxQuery(org.apache.lucene.spatial.geopoint.search.GeoPointInBBoxQuery) GeoPointInPolygonQuery(org.apache.lucene.spatial.geopoint.search.GeoPointInPolygonQuery) HashMap(java.util.HashMap) FieldNameAnalyzer(org.elasticsearch.index.analysis.FieldNameAnalyzer) CustomQueryScorer(org.elasticsearch.search.fetch.subphase.highlight.CustomQueryScorer) Term(org.apache.lucene.index.Term) BooleanClause(org.apache.lucene.search.BooleanClause) StandardAnalyzer(org.apache.lucene.analysis.standard.StandardAnalyzer) Map(java.util.Map)
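
Example 5 feeds fieldNameAnalyzer.tokenStream("text", ...) straight into the highlighter, so the highlight only works because StandardAnalyzer emits the token "failure" that the TermQuery looks for. To see which tokens StandardAnalyzer actually produces, a TokenStream can be consumed directly with the standard reset / incrementToken / end / close sequence. This is a minimal sketch assuming lucene-core on the classpath; TokenStreamDemo and the sample text are hypothetical. Note that since Lucene 8.0 StandardAnalyzer's default stop-word set is empty, whereas earlier versions removed English stop words; the sample input contains no stop words, so the output does not depend on the version.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

final class TokenStreamDemo {
    // Returns the terms the analyzer emits for the given field and text, in order.
    static List<String> tokens(Analyzer analyzer, String field, String text) {
        List<String> out = new ArrayList<>();
        try (TokenStream stream = analyzer.tokenStream(field, text)) {
            // The attribute instance is updated in place on every incrementToken() call.
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset(); // mandatory before the first incrementToken()
            while (stream.incrementToken()) {
                out.add(term.toString());
            }
            stream.end(); // finalizes end-of-stream state such as the final offset
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            // The hyphen is a word boundary and every token is lowercased.
            System.out.println(tokens(analyzer, "text", "Zeppelin-Notebook Search"));
        }
    }
}
```

Skipping reset() or end() violates the TokenStream contract, and Lucene's own analyzers typically fail with an IllegalStateException when the sequence is broken, so the full sequence is worth keeping even in small utilities like this one.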

Aggregations

StandardAnalyzer (org.apache.lucene.analysis.standard.StandardAnalyzer): 112
Analyzer (org.apache.lucene.analysis.Analyzer): 37
IndexWriter (org.apache.lucene.index.IndexWriter): 36
Document (org.apache.lucene.document.Document): 29
IndexWriterConfig (org.apache.lucene.index.IndexWriterConfig): 29
IndexSearcher (org.apache.lucene.search.IndexSearcher): 24
Term (org.apache.lucene.index.Term): 22
RAMDirectory (org.apache.lucene.store.RAMDirectory): 21
Test (org.junit.Test): 21
Query (org.apache.lucene.search.Query): 20
BooleanQuery (org.apache.lucene.search.BooleanQuery): 19
TermQuery (org.apache.lucene.search.TermQuery): 19
IOException (java.io.IOException): 16
Before (org.junit.Before): 15
IndexReader (org.apache.lucene.index.IndexReader): 14
HashMap (java.util.HashMap): 13
Field (org.apache.lucene.document.Field): 13
ArrayList (java.util.ArrayList): 12
QueryParser (org.apache.lucene.queryparser.classic.QueryParser): 12
Directory (org.apache.lucene.store.Directory): 12