Example 6 with SynonymFilter

Use of org.apache.lucene.analysis.synonym.SynonymFilter in the lucene-solr project by apache.

From class TestLimitTokenPositionFilter, method testMaxPosition3WithSynomyms.

public void testMaxPosition3WithSynomyms() throws IOException {
    for (final boolean consumeAll : new boolean[] { true, false }) {
        MockTokenizer tokenizer = whitespaceMockTokenizer("one two three four five");
        // if we are consuming all tokens, we can use the checks, otherwise we can't
        tokenizer.setEnableChecks(consumeAll);
        // dedup = true: identical rules added more than once are stored only once
        SynonymMap.Builder builder = new SynonymMap.Builder(true);
        // third argument includeOrig = true: the original token "one" is kept alongside each synonym
        builder.add(new CharsRef("one"), new CharsRef("first"), true);
        builder.add(new CharsRef("one"), new CharsRef("alpha"), true);
        builder.add(new CharsRef("one"), new CharsRef("beguine"), true);
        // multi-word synonyms are joined into a single CharsRef before being added
        CharsRefBuilder multiWordCharsRef = new CharsRefBuilder();
        SynonymMap.Builder.join(new String[] { "and", "indubitably", "single", "only" }, multiWordCharsRef);
        builder.add(new CharsRef("one"), multiWordCharsRef.get(), true);
        SynonymMap.Builder.join(new String[] { "dopple", "ganger" }, multiWordCharsRef);
        builder.add(new CharsRef("two"), multiWordCharsRef.get(), true);
        SynonymMap synonymMap = builder.build();
        // ignoreCase = true
        TokenStream stream = new SynonymFilter(tokenizer, synonymMap, true);
        // keep only tokens at positions 1..3
        stream = new LimitTokenPositionFilter(stream, 3, consumeAll);
        // "only", the 4th word of the multi-word synonym "and indubitably single only",
        // is not emitted because its position (4) is greater than the limit of 3.
        assertTokenStreamContents(stream,
            // expected terms
            new String[] { "one", "first", "alpha", "beguine", "and", "two", "indubitably", "dopple", "three", "single", "ganger" },
            // expected position increments
            new int[] { 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0 });
    }
}
Also used : MockTokenizer(org.apache.lucene.analysis.MockTokenizer) TokenStream(org.apache.lucene.analysis.TokenStream) CharsRefBuilder(org.apache.lucene.util.CharsRefBuilder) SynonymFilter(org.apache.lucene.analysis.synonym.SynonymFilter) CharsRef(org.apache.lucene.util.CharsRef) SynonymMap(org.apache.lucene.analysis.synonym.SynonymMap)
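
The expected arrays in the test above can be read as absolute positions: each position increment is added to a running total, and a token is kept only while that total stays at or below the limit. The following is a minimal stand-alone sketch of that arithmetic, not Lucene code. The Token class, the limitPositions and terms helpers, and the last three tokens of the sample stream ("four" and "only" assumed to share position 4, "five" at position 5) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone sketch with no Lucene dependency; all names here are hypothetical.
class PositionLimitSketch {

    /** Hypothetical token model: a term plus its position increment. */
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    /** Keeps tokens whose absolute position (running sum of increments) is at most maxPosition. */
    static List<Token> limitPositions(List<Token> tokens, int maxPosition) {
        List<Token> kept = new ArrayList<>();
        int position = 0;
        for (Token t : tokens) {
            position += t.posInc;
            if (position > maxPosition) {
                break; // positions never decrease, so no later token can qualify
            }
            kept.add(t);
        }
        return kept;
    }

    /** Extracts the term text of each token, in order. */
    static List<String> terms(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            out.add(t.term);
        }
        return out;
    }

    /** Assumed unfiltered stream for "one two three four five" with the synonyms from the test. */
    static List<Token> sampleStream() {
        return Arrays.asList(
            new Token("one", 1), new Token("first", 0), new Token("alpha", 0),
            new Token("beguine", 0), new Token("and", 0),
            new Token("two", 1), new Token("indubitably", 0), new Token("dopple", 0),
            new Token("three", 1), new Token("single", 0), new Token("ganger", 0),
            new Token("four", 1), new Token("only", 0), // assumed to sit at position 4
            new Token("five", 1));
    }

    public static void main(String[] args) {
        // Prints the terms that survive a position limit of 3.
        System.out.println(terms(limitPositions(sampleStream(), 3)));
    }
}
```

With a limit of 3, the sketch keeps the same eleven terms the test asserts, and drops "four", "only" and "five" because their running position exceeds 3.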

Example 7 with SynonymFilter

Use of org.apache.lucene.analysis.synonym.SynonymFilter in the lucene-solr project by apache.

From class TestRemoveDuplicatesTokenFilter, method testRandomStrings.

/** blast some random strings through the analyzer */
public void testRandomStrings() throws Exception {
    final int numIters = atLeast(10);
    for (int i = 0; i < numIters; i++) {
        SynonymMap.Builder b = new SynonymMap.Builder(random().nextBoolean());
        final int numEntries = atLeast(10);
        for (int j = 0; j < numEntries; j++) {
            add(b, randomNonEmptyString(), randomNonEmptyString(), random().nextBoolean());
        }
        final SynonymMap map = b.build();
        final boolean ignoreCase = random().nextBoolean();
        final Analyzer analyzer = new Analyzer() {

            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer tokenizer = new MockTokenizer(MockTokenizer.SIMPLE, true);
                TokenStream stream = new SynonymFilter(tokenizer, map, ignoreCase);
                return new TokenStreamComponents(tokenizer, new RemoveDuplicatesTokenFilter(stream));
            }
        };
        checkRandomData(random(), analyzer, 200);
        analyzer.close();
    }
}
Also used : TokenStream(org.apache.lucene.analysis.TokenStream) SynonymFilter(org.apache.lucene.analysis.synonym.SynonymFilter) Analyzer(org.apache.lucene.analysis.Analyzer) SynonymMap(org.apache.lucene.analysis.synonym.SynonymMap) MockTokenizer(org.apache.lucene.analysis.MockTokenizer) Tokenizer(org.apache.lucene.analysis.Tokenizer) KeywordTokenizer(org.apache.lucene.analysis.core.KeywordTokenizer)
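
The test above chains SynonymFilter into RemoveDuplicatesTokenFilter because randomly generated synonym rules can emit the same term more than once at a single position. The following is a minimal stand-alone sketch of that de-duplication idea, assuming a token is dropped when an earlier token at the same position carried the same term text. It is not the Lucene implementation; the Token class and the removeDuplicates and terms helpers are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-alone sketch with no Lucene dependency; all names here are hypothetical.
class RemoveDuplicatesSketch {

    /** Hypothetical token model: a term plus its position increment. */
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    /** Drops a token when the same term was already seen at the current position. */
    static List<Token> removeDuplicates(List<Token> tokens) {
        List<Token> kept = new ArrayList<>();
        Set<String> seenAtPosition = new HashSet<>();
        for (Token t : tokens) {
            if (t.posInc > 0) {
                seenAtPosition.clear(); // a positive increment starts a new position
            }
            if (seenAtPosition.add(t.term)) {
                kept.add(t); // first occurrence of this term at this position
            }
        }
        return kept;
    }

    /** Extracts the term text of each token, in order. */
    static List<String> terms(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            out.add(t.term);
        }
        return out;
    }
}
```

The same term at a different position is kept, since the set of seen terms is reset whenever the position advances.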

Example 8 with SynonymFilter

Use of org.apache.lucene.analysis.synonym.SynonymFilter in the crate project by crate.

From class SynonymTokenFilterFactory, method getChainAwareTokenFilterFactory.

@Override
public TokenFilterFactory getChainAwareTokenFilterFactory(TokenizerFactory tokenizer, List<CharFilterFactory> charFilters, List<TokenFilterFactory> previousTokenFilters, Function<String, TokenFilterFactory> allFilters) {
    final Analyzer analyzer = buildSynonymAnalyzer(tokenizer, charFilters, previousTokenFilters);
    final SynonymMap synonyms = buildSynonyms(analyzer, getRulesFromSettings(environment));
    final String name = name();
    return new TokenFilterFactory() {

        @Override
        public String name() {
            return name;
        }

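        @Override
        public TokenStream create(TokenStream tokenStream) {
            // A SynonymMap built without any rules has a null fst, and SynonymFilter rejects a null fst,
            // so in that case the stream is passed through unchanged.
            return synonyms.fst == null ? tokenStream : new SynonymFilter(tokenStream, synonyms, false);
        }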
    };
}
Also used : TokenStream(org.apache.lucene.analysis.TokenStream) SynonymFilter(org.apache.lucene.analysis.synonym.SynonymFilter) Analyzer(org.apache.lucene.analysis.Analyzer) SynonymMap(org.apache.lucene.analysis.synonym.SynonymMap)

Aggregations

SynonymFilter (org.apache.lucene.analysis.synonym.SynonymFilter) 8
SynonymMap (org.apache.lucene.analysis.synonym.SynonymMap) 6
StringReader (java.io.StringReader) 5
MockTokenizer (org.apache.lucene.analysis.MockTokenizer) 5
Tokenizer (org.apache.lucene.analysis.Tokenizer) 5
BytesRef (org.apache.lucene.util.BytesRef) 5
Analyzer (org.apache.lucene.analysis.Analyzer) 4
CharsRef (org.apache.lucene.util.CharsRef) 4
CharsRefBuilder (org.apache.lucene.util.CharsRefBuilder) 4
TokenStream (org.apache.lucene.analysis.TokenStream) 3
Test (org.junit.Test) 3
IOException (java.io.IOException) 2
HashMap (java.util.HashMap) 2
LowerCaseFilter (org.apache.lucene.analysis.LowerCaseFilter) 2
TokenFilter (org.apache.lucene.analysis.TokenFilter) 2
WhitespaceAnalyzer (org.apache.lucene.analysis.core.WhitespaceAnalyzer) 2
PerFieldAnalyzerWrapper (org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper) 2
ShingleFilter (org.apache.lucene.analysis.shingle.ShingleFilter) 2
StandardTokenizer (org.apache.lucene.analysis.standard.StandardTokenizer) 2
SolrSynonymParser (org.apache.lucene.analysis.synonym.SolrSynonymParser) 2