
Example 1 with AnalyzerWrapper

Use of org.apache.lucene.analysis.AnalyzerWrapper in project lucene-solr by apache.

Class FreeTextSuggester, method addShingles.

private Analyzer addShingles(final Analyzer other) {
    if (grams == 1) {
        return other;
    } else {
        // Tack on ShingleFilter to the end, to generate token ngrams:
        return new AnalyzerWrapper(other.getReuseStrategy()) {

            @Override
            protected Analyzer getWrappedAnalyzer(String fieldName) {
                return other;
            }

            @Override
            protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
                ShingleFilter shingles = new ShingleFilter(components.getTokenStream(), 2, grams);
                shingles.setTokenSeparator(Character.toString((char) separator));
                return new TokenStreamComponents(components.getTokenizer(), shingles);
            }
        };
    }
}
Also used : ShingleFilter(org.apache.lucene.analysis.shingle.ShingleFilter) AnalyzerWrapper(org.apache.lucene.analysis.AnalyzerWrapper)
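
To make the effect of the wrapped filter concrete, here is a minimal plain-Java sketch of what "token ngrams" of size 2 to grams look like. It does not use Lucene: the class ShingleSketch and its shingles helper are hypothetical names for illustration only. Lucene's ShingleFilter also emits the single tokens by default; the sketch leaves them out and only approximates the ordering by start position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShingleSketch {

    // Hypothetical helper (not a Lucene API): returns token n-grams of
    // sizes minSize..maxSize, grouped by start position and joined with
    // the given separator.
    static List<String> shingles(List<String> tokens, int minSize, int maxSize, String separator) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            // Grow the n-gram from minSize up to maxSize, stopping at the end of input.
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(separator, tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        // Corresponds to the snippet's (2, grams) arguments with grams == 3.
        System.out.println(shingles(tokens, 2, 3, "_"));
    }
}
```

With the four tokens above, a minimum size of 2 and a maximum of 3, the helper yields five n-grams, starting with "please_divide" and ending with "this_sentence".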

Example 2 with AnalyzerWrapper

Use of org.apache.lucene.analysis.AnalyzerWrapper in project lucene-solr by apache.

Class TestPerFieldAnalyzerWrapper, method testReuseWrapped.

public void testReuseWrapped() throws Exception {
    final String text = "Qwerty";
    final Analyzer specialAnalyzer = new SimpleAnalyzer();
    final Analyzer defaultAnalyzer = new WhitespaceAnalyzer();
    TokenStream ts1, ts2, ts3, ts4;
    final PerFieldAnalyzerWrapper wrapper1 = new PerFieldAnalyzerWrapper(defaultAnalyzer, Collections.<String, Analyzer>singletonMap("special", specialAnalyzer));
    // test that the PerFieldWrapper returns the same instance as original Analyzer:
    ts1 = defaultAnalyzer.tokenStream("something", text);
    ts2 = wrapper1.tokenStream("something", text);
    assertSame(ts1, ts2);
    ts1 = specialAnalyzer.tokenStream("special", text);
    ts2 = wrapper1.tokenStream("special", text);
    assertSame(ts1, ts2);
    // Wrap with another wrapper, which does *not* extend DelegatingAnalyzerWrapper:
    final AnalyzerWrapper wrapper2 = new AnalyzerWrapper(wrapper1.getReuseStrategy()) {

        @Override
        protected Analyzer getWrappedAnalyzer(String fieldName) {
            return wrapper1;
        }

        @Override
        protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
            assertNotSame(specialAnalyzer.tokenStream("special", text), components.getTokenStream());
            TokenFilter filter = new ASCIIFoldingFilter(components.getTokenStream());
            return new TokenStreamComponents(components.getTokenizer(), filter);
        }
    };
    ts3 = wrapper2.tokenStream("special", text);
    assertNotSame(ts1, ts3);
    assertTrue(ts3 instanceof ASCIIFoldingFilter);
    // check that cache did not get corrupted:
    ts2 = wrapper1.tokenStream("special", text);
    assertSame(ts1, ts2);
    // Wrap PerField with another PerField. In that case all TokenStreams returned must be the same:
    final PerFieldAnalyzerWrapper wrapper3 = new PerFieldAnalyzerWrapper(wrapper1, Collections.<String, Analyzer>singletonMap("moreSpecial", specialAnalyzer));
    ts1 = specialAnalyzer.tokenStream("special", text);
    ts2 = wrapper3.tokenStream("special", text);
    assertSame(ts1, ts2);
    ts3 = specialAnalyzer.tokenStream("moreSpecial", text);
    ts4 = wrapper3.tokenStream("moreSpecial", text);
    assertSame(ts3, ts4);
    assertSame(ts2, ts3);
    IOUtils.close(wrapper3, wrapper2, wrapper1, specialAnalyzer, defaultAnalyzer);
}
Also used : WhitespaceAnalyzer(org.apache.lucene.analysis.core.WhitespaceAnalyzer) TokenStream(org.apache.lucene.analysis.TokenStream) SimpleAnalyzer(org.apache.lucene.analysis.core.SimpleAnalyzer) AnalyzerWrapper(org.apache.lucene.analysis.AnalyzerWrapper) Analyzer(org.apache.lucene.analysis.Analyzer) TokenFilter(org.apache.lucene.analysis.TokenFilter)
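
The identity checks in the test can be pictured with a toy model. The sketch below is a simplification under stated assumptions, not Lucene code: Stream, BaseAnalyzer, DelegatingWrapper and DecoratingWrapper are hypothetical stand-ins, and the per-field map only imitates the idea of a reuse cache. A delegating wrapper hands back the wrapped analyzer's cached object, while a decorating wrapper builds and caches its own object on top of freshly created components, so the wrapped analyzer's cache stays untouched.

```java
import java.util.HashMap;
import java.util.Map;

public class ReuseSketch {

    /** Hypothetical stand-in for a token stream; input is null for a source stream. */
    static class Stream {
        final Stream input;
        Stream(Stream input) { this.input = input; }
    }

    /** Hypothetical stand-in for an analyzer that caches one stream per field. */
    static class BaseAnalyzer {
        private final Map<String, Stream> cache = new HashMap<>();

        Stream tokenStream(String field) {
            // Reuse the cached stream if present, otherwise create and cache one.
            return cache.computeIfAbsent(field, f -> create(f));
        }

        Stream create(String field) {
            return new Stream(null);
        }
    }

    /** Delegating wrapper: no cache of its own, forwards everything to the wrapped analyzer. */
    static class DelegatingWrapper extends BaseAnalyzer {
        private final BaseAnalyzer wrapped;
        DelegatingWrapper(BaseAnalyzer wrapped) { this.wrapped = wrapped; }

        @Override
        Stream tokenStream(String field) { return wrapped.tokenStream(field); }

        @Override
        Stream create(String field) { return wrapped.create(field); }
    }

    /** Decorating wrapper: caches its own stream, built on freshly created components. */
    static class DecoratingWrapper extends BaseAnalyzer {
        private final BaseAnalyzer wrapped;
        DecoratingWrapper(BaseAnalyzer wrapped) { this.wrapped = wrapped; }

        @Override
        Stream create(String field) { return new Stream(wrapped.create(field)); }
    }

    public static void main(String[] args) {
        BaseAnalyzer special = new BaseAnalyzer();
        BaseAnalyzer delegating = new DelegatingWrapper(special);
        BaseAnalyzer decorating = new DecoratingWrapper(delegating);

        Stream original = special.tokenStream("special");
        System.out.println(delegating.tokenStream("special") == original); // true
        System.out.println(decorating.tokenStream("special") == original); // false
        System.out.println(delegating.tokenStream("special") == original); // true
    }
}
```

The three printed lines mirror the assertSame, assertNotSame and "cache did not get corrupted" checks in the test above.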

Aggregations

AnalyzerWrapper (org.apache.lucene.analysis.AnalyzerWrapper): 2
Analyzer (org.apache.lucene.analysis.Analyzer): 1
TokenFilter (org.apache.lucene.analysis.TokenFilter): 1
TokenStream (org.apache.lucene.analysis.TokenStream): 1
SimpleAnalyzer (org.apache.lucene.analysis.core.SimpleAnalyzer): 1
WhitespaceAnalyzer (org.apache.lucene.analysis.core.WhitespaceAnalyzer): 1
ShingleFilter (org.apache.lucene.analysis.shingle.ShingleFilter): 1