Examples with Analyzer - org.apache.lucene.analysis.Analyzer

Example 11 with Analyzer

Use of org.apache.lucene.analysis.Analyzer in project elasticsearch by elastic.

In class PreBuiltAnalyzerTests, method testThatDefaultAndStandardAnalyzerAreTheSameInstance.

public void testThatDefaultAndStandardAnalyzerAreTheSameInstance() {
    Analyzer currentStandardAnalyzer = PreBuiltAnalyzers.STANDARD.getAnalyzer(Version.CURRENT);
    Analyzer currentDefaultAnalyzer = PreBuiltAnalyzers.DEFAULT.getAnalyzer(Version.CURRENT);
    // special case, these two are the same instance
    assertThat(currentDefaultAnalyzer, is(currentStandardAnalyzer));
}

Also used : Analyzer(org.apache.lucene.analysis.Analyzer)

Example 12 with Analyzer

Use of org.apache.lucene.analysis.Analyzer in project elasticsearch by elastic.

In class CompoundAnalysisTests, method analyze.

private List<String> analyze(Settings settings, String analyzerName, String text) throws IOException {
    IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("test", settings);
    AnalysisModule analysisModule = new AnalysisModule(new Environment(settings), singletonList(new AnalysisPlugin() {

        @Override
        public Map<String, AnalysisProvider<TokenFilterFactory>> getTokenFilters() {
            return singletonMap("myfilter", MyFilterTokenFilterFactory::new);
        }
    }));
    IndexAnalyzers indexAnalyzers = analysisModule.getAnalysisRegistry().build(idxSettings);
    Analyzer analyzer = indexAnalyzers.get(analyzerName).analyzer();
    AllEntries allEntries = new AllEntries();
    allEntries.addText("field1", text, 1.0f);
    TokenStream stream = AllTokenStream.allTokenStream("_all", text, 1.0f, analyzer);
    stream.reset();
    CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
    List<String> terms = new ArrayList<>();
    while (stream.incrementToken()) {
        String tokText = termAtt.toString();
        terms.add(tokText);
    }
    return terms;
}

Also used : AllTokenStream(org.elasticsearch.common.lucene.all.AllTokenStream) TokenStream(org.apache.lucene.analysis.TokenStream) IndexSettings(org.elasticsearch.index.IndexSettings) ArrayList(java.util.ArrayList) Analyzer(org.apache.lucene.analysis.Analyzer) AllEntries(org.elasticsearch.common.lucene.all.AllEntries) CharTermAttribute(org.apache.lucene.analysis.tokenattributes.CharTermAttribute) Environment(org.elasticsearch.env.Environment) AnalysisModule(org.elasticsearch.indices.analysis.AnalysisModule) MyFilterTokenFilterFactory(org.elasticsearch.index.analysis.filter1.MyFilterTokenFilterFactory) AnalysisProvider(org.elasticsearch.indices.analysis.AnalysisModule.AnalysisProvider) AnalysisPlugin(org.elasticsearch.plugins.AnalysisPlugin)

Example 13 with Analyzer

Use of org.apache.lucene.analysis.Analyzer in project elasticsearch by elastic.

In class SnowballAnalyzerTests, method testReusableTokenStream.

public void testReusableTokenStream() throws Exception {
    Analyzer a = new SnowballAnalyzer("English");
    assertAnalyzesTo(a, "he abhorred accents", new String[] { "he", "abhor", "accent" });
    assertAnalyzesTo(a, "she abhorred him", new String[] { "she", "abhor", "him" });
}

Also used : StandardAnalyzer(org.apache.lucene.analysis.standard.StandardAnalyzer) Analyzer(org.apache.lucene.analysis.Analyzer)

Example 14 with Analyzer

Use of org.apache.lucene.analysis.Analyzer in project elasticsearch by elastic.

In class FingerprintAnalyzerTests, method testLimit.

public void testLimit() throws Exception {
    Analyzer a = new FingerprintAnalyzer(CharArraySet.EMPTY_SET, ' ', 3);
    assertAnalyzesTo(a, "e d c b a", new String[] {});
    assertAnalyzesTo(a, "b a", new String[] { "a b" });
}

Also used : Analyzer(org.apache.lucene.analysis.Analyzer)

Example 15 with Analyzer

Use of org.apache.lucene.analysis.Analyzer in project elasticsearch by elastic.

In class FingerprintAnalyzerTests, method testAsciifolding.

public void testAsciifolding() throws Exception {
    Analyzer a = new FingerprintAnalyzer(CharArraySet.EMPTY_SET, ' ', 255);
    assertAnalyzesTo(a, "gödel escher bach", new String[] { "bach escher godel" });
    assertAnalyzesTo(a, "gödel godel escher bach", new String[] { "bach escher godel" });
}

Also used : Analyzer(org.apache.lucene.analysis.Analyzer)
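
The two FingerprintAnalyzerTests examples above assert that the input is folded to ASCII, de-duplicated, sorted, and joined into a single token, and that no token is produced when the joined result exceeds the size limit. The sketch below reproduces only those asserted outcomes using the Java standard library. It is not the FingerprintAnalyzer source: the class name FingerprintSketch, the method fingerprint, and the parameter names are hypothetical, and Unicode decomposition is used as a rough stand-in for ASCII folding.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical illustration only; this is NOT the FingerprintAnalyzer implementation.
class FingerprintSketch {

    // Returns a one-element list holding the fingerprint, or an empty list when
    // there are no tokens or the joined fingerprint is longer than maxOutputSize.
    static List<String> fingerprint(String text, char separator, int maxOutputSize) {
        // Lowercase, then decompose accented characters and drop the combining
        // marks (assumed substitute for ASCII folding: "ö" becomes "o").
        String lowered = text.toLowerCase(Locale.ROOT);
        String folded = Normalizer.normalize(lowered, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");

        // A TreeSet removes duplicate tokens and keeps the rest sorted.
        TreeSet<String> tokens = new TreeSet<>();
        for (String token : folded.split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }

        List<String> result = new ArrayList<>();
        if (tokens.isEmpty()) {
            return result;
        }
        String joined = String.join(String.valueOf(separator), tokens);
        if (joined.length() <= maxOutputSize) {
            result.add(joined);
        }
        return result;
    }

    public static void main(String[] args) {
        // "\u00f6" is "ö", written as an escape to avoid source-encoding issues.
        System.out.println(fingerprint("g\u00f6del godel escher bach", ' ', 255)); // prints [bach escher godel]
        System.out.println(fingerprint("e d c b a", ' ', 3));                      // prints []
        System.out.println(fingerprint("b a", ' ', 3));                            // prints [a b]
    }
}
```

With a limit of 3, "e d c b a" would join to the 9-character string "a b c d e" and is therefore dropped, while "b a" joins to "a b", which fits exactly.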

Aggregations

Analyzer (org.apache.lucene.analysis.Analyzer) 1020
MockAnalyzer (org.apache.lucene.analysis.MockAnalyzer) 396
Tokenizer (org.apache.lucene.analysis.Tokenizer) 265
MockTokenizer (org.apache.lucene.analysis.MockTokenizer) 228
Document (org.apache.lucene.document.Document) 207
Directory (org.apache.lucene.store.Directory) 192
KeywordTokenizer (org.apache.lucene.analysis.core.KeywordTokenizer) 176
BytesRef (org.apache.lucene.util.BytesRef) 122
Test (org.junit.Test) 119
TokenStream (org.apache.lucene.analysis.TokenStream) 107
RandomIndexWriter (org.apache.lucene.index.RandomIndexWriter) 92
Term (org.apache.lucene.index.Term) 92
IndexReader (org.apache.lucene.index.IndexReader) 67
InputArrayIterator (org.apache.lucene.search.suggest.InputArrayIterator) 65
StandardAnalyzer (org.apache.lucene.analysis.standard.StandardAnalyzer) 64
Input (org.apache.lucene.search.suggest.Input) 63
CharArraySet (org.apache.lucene.analysis.CharArraySet) 58
ArrayList (java.util.ArrayList) 57
IndexWriterConfig (org.apache.lucene.index.IndexWriterConfig) 57
TextField (org.apache.lucene.document.TextField) 55