Examples with WhitespaceAnalyzer - org.apache.lucene.analysis.core.WhitespaceAnalyzer

Example 21 with WhitespaceAnalyzer

use of org.apache.lucene.analysis.core.WhitespaceAnalyzer in project orientdb by orientechnologies.

the class LuceneNativeFacet method index.

/**
 * Build the example index.
 */
private void index() throws IOException {
    IndexWriter indexWriter = new IndexWriter(indexDir, new IndexWriterConfig(new WhitespaceAnalyzer()).setOpenMode(OpenMode.CREATE));
    // Writes facet ords to a separate directory from the main index
    DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);
    Document doc = new Document();
    doc.add(new FacetField("Author", "Bob"));
    doc.add(new FacetField("Publish Date", "2010", "10", "15"));
    indexWriter.addDocument(config.build(taxoWriter, doc));
    doc = new Document();
    doc.add(new FacetField("Author", "Lisa"));
    doc.add(new FacetField("Publish Date", "2010", "10", "20"));
    indexWriter.addDocument(config.build(taxoWriter, doc));
    doc = new Document();
    doc.add(new FacetField("Author", "Lisa"));
    doc.add(new FacetField("Publish Date", "2012", "1", "1"));
    indexWriter.addDocument(config.build(taxoWriter, doc));
    doc = new Document();
    doc.add(new FacetField("Author", "Susan"));
    doc.add(new FacetField("Publish Date", "2012", "1", "7"));
    indexWriter.addDocument(config.build(taxoWriter, doc));
    doc = new Document();
    doc.add(new FacetField("Author", "Frank"));
    doc.add(new FacetField("Publish Date", "1999", "5", "5"));
    indexWriter.addDocument(config.build(taxoWriter, doc));
    indexWriter.close();
    taxoWriter.close();
}

Also used : WhitespaceAnalyzer(org.apache.lucene.analysis.core.WhitespaceAnalyzer) DirectoryTaxonomyWriter(org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter) IndexWriter(org.apache.lucene.index.IndexWriter) FacetField(org.apache.lucene.facet.FacetField) Document(org.apache.lucene.document.Document) IndexWriterConfig(org.apache.lucene.index.IndexWriterConfig)

Example 22 with WhitespaceAnalyzer

use of org.apache.lucene.analysis.core.WhitespaceAnalyzer in project camel by apache.

the class LuceneIndexAndQueryProducerTest method createRegistry.

@Override
protected JndiRegistry createRegistry() throws Exception {
    JndiRegistry registry = new JndiRegistry(createJndiContext());
    registry.bind("std", new File("target/stdindexDir"));
    registry.bind("load_dir", new File("src/test/resources/sources"));
    registry.bind("stdAnalyzer", new StandardAnalyzer());
    registry.bind("simple", new File("target/simpleindexDir"));
    registry.bind("simpleAnalyzer", new SimpleAnalyzer());
    registry.bind("whitespace", new File("target/whitespaceindexDir"));
    registry.bind("whitespaceAnalyzer", new WhitespaceAnalyzer());
    return registry;
}

Also used : JndiRegistry(org.apache.camel.impl.JndiRegistry) WhitespaceAnalyzer(org.apache.lucene.analysis.core.WhitespaceAnalyzer) SimpleAnalyzer(org.apache.lucene.analysis.core.SimpleAnalyzer) StandardAnalyzer(org.apache.lucene.analysis.standard.StandardAnalyzer) File(java.io.File)
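
The registry above binds three analyzers side by side. As a rough illustration of why the choice matters, the sketch below contrasts whitespace-only splitting with letter-based splitting plus lowercasing, which is the documented behaviour of WhitespaceAnalyzer and SimpleAnalyzer respectively. It is a plain-Java approximation that does not call Lucene; the class and method names (AnalyzerContrastSketch, whitespaceTokens, simpleTokens) are made up for this example, and Lucene's exact whitespace and letter definitions may differ in edge cases.

```java
import java.util.ArrayList;
import java.util.List;

public class AnalyzerContrastSketch {

    // Stand-in for WhitespaceAnalyzer (assumption: approximates it, does not call Lucene):
    // split on whitespace only, keeping case and punctuation untouched.
    public static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String part : text.split("\\s+")) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }
        return out;
    }

    // Stand-in for SimpleAnalyzer (assumption: approximates it, does not call Lucene):
    // break at every non-letter character, then lowercase each token.
    public static List<String> simpleTokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Carl's e-mail: Carl.Smith@example.org";
        System.out.println(whitespaceTokens(text));
        System.out.println(simpleTokens(text));
    }
}
```

Because whitespace splitting keeps the original case, a query term such as "Carl*" would only line up with tokens that were indexed with the same capitalisation, whereas a lowercasing analyzer would have indexed "carl".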

Example 23 with WhitespaceAnalyzer

use of org.apache.lucene.analysis.core.WhitespaceAnalyzer in project camel by apache.

the class LuceneQueryProcessorTest method testWildcardSearcher.

@Test
public void testWildcardSearcher() throws Exception {
    final WhitespaceAnalyzer analyzer = new WhitespaceAnalyzer();
    MockEndpoint mockSearchEndpoint = getMockEndpoint("mock:searchResult");
    context.stop();
    context.addRoutes(new RouteBuilder() {

        public void configure() {
            try {
                from("direct:start").setHeader("QUERY", constant("Carl*")).process(new LuceneQueryProcessor("target/simpleindexDir", analyzer, null, 20)).to("direct:next");
            } catch (Exception e) {
                e.printStackTrace();
            }
            from("direct:next").process(new Processor() {

                public void process(Exchange exchange) throws Exception {
                    Hits hits = exchange.getIn().getBody(Hits.class);
                    printResults(hits);
                }

                private void printResults(Hits hits) {
                    LOG.debug("Number of hits: " + hits.getNumberOfHits());
                    for (int i = 0; i < hits.getNumberOfHits(); i++) {
                        LOG.debug("Hit " + i + " Index Location:" + hits.getHit().get(i).getHitLocation());
                        LOG.debug("Hit " + i + " Score:" + hits.getHit().get(i).getScore());
                        LOG.debug("Hit " + i + " Data:" + hits.getHit().get(i).getData());
                    }
                }
            }).to("mock:searchResult");
        }
    });
    context.start();
    LOG.debug("------------Beginning Wildcard + Whitespace Analyzer Searcher Test---------------");
    sendRequest();
    mockSearchEndpoint.assertIsSatisfied();
    LOG.debug("------------Completed Wildcard + Whitespace Analyzer Searcher Test---------------");
    context.stop();
}

Also used : WhitespaceAnalyzer(org.apache.lucene.analysis.core.WhitespaceAnalyzer) Exchange(org.apache.camel.Exchange) Hits(org.apache.camel.processor.lucene.support.Hits) Processor(org.apache.camel.Processor) RouteBuilder(org.apache.camel.builder.RouteBuilder) MockEndpoint(org.apache.camel.component.mock.MockEndpoint) Test(org.junit.Test)

Example 24 with WhitespaceAnalyzer

use of org.apache.lucene.analysis.core.WhitespaceAnalyzer in project lucene-solr by apache.

the class SimpleQueryConverter method convert.

@Override
public Collection<Token> convert(String origQuery) {
    Collection<Token> result = new HashSet<>();
    WhitespaceAnalyzer analyzer = new WhitespaceAnalyzer();
    try (TokenStream ts = analyzer.tokenStream("", origQuery)) {
        // TODO: support custom attributes
        CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
        TypeAttribute typeAtt = ts.addAttribute(TypeAttribute.class);
        FlagsAttribute flagsAtt = ts.addAttribute(FlagsAttribute.class);
        PayloadAttribute payloadAtt = ts.addAttribute(PayloadAttribute.class);
        PositionIncrementAttribute posIncAtt = ts.addAttribute(PositionIncrementAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            Token tok = new Token();
            tok.copyBuffer(termAtt.buffer(), 0, termAtt.length());
            tok.setOffset(offsetAtt.startOffset(), offsetAtt.endOffset());
            tok.setFlags(flagsAtt.getFlags());
            tok.setPayload(payloadAtt.getPayload());
            tok.setPositionIncrement(posIncAtt.getPositionIncrement());
            tok.setType(typeAtt.type());
            result.add(tok);
        }
        ts.end();
        return result;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

Also used : WhitespaceAnalyzer(org.apache.lucene.analysis.core.WhitespaceAnalyzer) TokenStream(org.apache.lucene.analysis.TokenStream) FlagsAttribute(org.apache.lucene.analysis.tokenattributes.FlagsAttribute) PayloadAttribute(org.apache.lucene.analysis.tokenattributes.PayloadAttribute) Token(org.apache.lucene.analysis.Token) IOException(java.io.IOException) PositionIncrementAttribute(org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute) CharTermAttribute(org.apache.lucene.analysis.tokenattributes.CharTermAttribute) TypeAttribute(org.apache.lucene.analysis.tokenattributes.TypeAttribute) OffsetAttribute(org.apache.lucene.analysis.tokenattributes.OffsetAttribute) HashSet(java.util.HashSet)
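
The convert method above copies each token's text and offsets out of the analyzer's token stream. To make concrete what those values look like for whitespace tokenization, here is a minimal plain-Java sketch that produces the same kind of (term, startOffset, endOffset) triples without Lucene. The names WhitespaceOffsetsSketch, Tok and tokenize are hypothetical, and Character.isWhitespace is assumed to be a close enough match for the analyzer's notion of whitespace.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceOffsetsSketch {

    /** Hypothetical token holder: term text plus [start, end) character offsets. */
    public static final class Tok {
        public final String term;
        public final int start;
        public final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "[" + start + "," + end + ")";
        }
    }

    // Scan the input once, emitting one token per run of non-whitespace characters.
    public static List<Tok> tokenize(String input) {
        List<Tok> result = new ArrayList<>();
        int i = 0;
        int n = input.length();
        while (i < n) {
            // Skip the whitespace run before the next token.
            while (i < n && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token itself.
            while (i < n && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i > start) {
                result.add(new Tok(input.substring(start, i), start, i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Tok t : tokenize("foo:bar^5.0 baz")) {
            System.out.println(t);
        }
    }
}
```

Unlike the HashSet used in convert, the sketch returns a List, so the tokens come back in input order. Query syntax such as the field prefix and the boost suffix stays inside the token, because only whitespace separates tokens.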

Example 25 with WhitespaceAnalyzer

use of org.apache.lucene.analysis.core.WhitespaceAnalyzer in project lucene-solr by apache.

the class SpellingQueryConverterTest method testSpecialChars.

@Test
public void testSpecialChars() {
    SpellingQueryConverter converter = new SpellingQueryConverter();
    converter.init(new NamedList());
    converter.setAnalyzer(new WhitespaceAnalyzer());
    String original = "field_with_underscore:value_with_underscore";
    Collection<Token> tokens = converter.convert(original);
    assertTrue("tokens is null and it shouldn't be", tokens != null);
    assertEquals("tokens Size: " + tokens.size() + " is not 1", 1, tokens.size());
    assertTrue("Token offsets do not match", isOffsetCorrect(original, tokens));
    original = "field_with_digits123:value_with_digits123";
    tokens = converter.convert(original);
    assertTrue("tokens is null and it shouldn't be", tokens != null);
    assertEquals("tokens Size: " + tokens.size() + " is not 1", 1, tokens.size());
    assertTrue("Token offsets do not match", isOffsetCorrect(original, tokens));
    original = "field-with-hyphens:value-with-hyphens";
    tokens = converter.convert(original);
    assertTrue("tokens is null and it shouldn't be", tokens != null);
    assertEquals("tokens Size: " + tokens.size() + " is not 1", 1, tokens.size());
    assertTrue("Token offsets do not match", isOffsetCorrect(original, tokens));
    // mix 'em up and add some to the value
    //    original = "field_with-123s:value_,.|with-hyphens";
    //    tokens = converter.convert(original);
    //    assertTrue("tokens is null and it shouldn't be", tokens != null);
    //    assertEquals("tokens Size: " + tokens.size() + " is not 1", 1, tokens.size());
    //    assertTrue("Token offsets do not match", isOffsetCorrect(original, tokens));
    original = "foo:bar^5.0";
    tokens = converter.convert(original);
    assertTrue("tokens is null and it shouldn't be", tokens != null);
    assertEquals("tokens Size: " + tokens.size() + " is not 1", 1, tokens.size());
    assertTrue("Token offsets do not match", isOffsetCorrect(original, tokens));
    String firstKeyword = "value1";
    String secondKeyword = "value2";
    original = "field-with-parenthesis:(" + firstKeyword + " " + secondKeyword + ")";
    tokens = converter.convert(original);
    assertTrue("tokens is null and it shouldn't be", tokens != null);
    assertEquals("tokens Size: " + tokens.size() + " is not 2", 2, tokens.size());
    assertTrue("Token offsets do not match", isOffsetCorrect(original, tokens));
    assertTrue("first Token is not " + firstKeyword, new ArrayList<>(tokens).get(0).toString().equals(firstKeyword));
    assertTrue("second Token is not " + secondKeyword, new ArrayList<>(tokens).get(1).toString().equals(secondKeyword));
}

Also used : WhitespaceAnalyzer(org.apache.lucene.analysis.core.WhitespaceAnalyzer) NamedList(org.apache.solr.common.util.NamedList) ArrayList(java.util.ArrayList) Token(org.apache.lucene.analysis.Token) Test(org.junit.Test)

Aggregations

WhitespaceAnalyzer (org.apache.lucene.analysis.core.WhitespaceAnalyzer) 37
IndexWriter (org.apache.lucene.index.IndexWriter) 17
IndexWriterConfig (org.apache.lucene.index.IndexWriterConfig) 17
Document (org.apache.lucene.document.Document) 16
Analyzer (org.apache.lucene.analysis.Analyzer) 9
Test (org.junit.Test) 9
NamedList (org.apache.solr.common.util.NamedList) 8
ArrayList (java.util.ArrayList) 7
Token (org.apache.lucene.analysis.Token) 7
TextField (org.apache.lucene.document.TextField) 7
IndexSearcher (org.apache.lucene.search.IndexSearcher) 6
IOException (java.io.IOException) 5
HashMap (java.util.HashMap) 5
Field (org.apache.lucene.document.Field) 5
DirectoryTaxonomyWriter (org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter) 5
DirectoryReader (org.apache.lucene.index.DirectoryReader) 5
TokenStream (org.apache.lucene.analysis.TokenStream) 4
PerFieldAnalyzerWrapper (org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper) 4
LongPoint (org.apache.lucene.document.LongPoint) 4
BooleanQuery (org.apache.lucene.search.BooleanQuery) 4