
Example 1 with TokenFilterFactory

Use of org.opensearch.index.analysis.TokenFilterFactory in project OpenSearch by opensearch-project.

From class ConcatenateGraphTokenFilterFactoryTests, method testOldLuceneVersionNoSeparator.

public void testOldLuceneVersionNoSeparator() throws IOException {
    OpenSearchTestCase.TestAnalysis analysis = AnalysisTestsHelper.createTestAnalysisFromSettings(
        Settings.builder()
            .put(
                IndexMetadata.SETTING_VERSION_CREATED,
                VersionUtils.randomVersionBetween(random(), LegacyESVersion.V_7_0_0, LegacyESVersion.V_7_5_2)
            )
            .put(Environment.PATH_HOME_SETTING.getKey(), createTempDir().toString())
            .put("index.analysis.filter.my_concatenate_graph.type", "concatenate_graph")
            // this will be ignored
            .put("index.analysis.filter.my_concatenate_graph.token_separator", "+")
            .put("index.analysis.filter.my_concatenate_graph.preserve_separator", "false")
            .build(),
        new CommonAnalysisPlugin()
    );
    TokenFilterFactory tokenFilter = analysis.tokenFilter.get("my_concatenate_graph");
    String source = "PowerShot Is AweSome";
    Tokenizer tokenizer = new WhitespaceTokenizer();
    tokenizer.setReader(new StringReader(source));
    // earlier Lucene version will not add separator if preserve_separator is false
    assertTokenStreamContents(tokenFilter.create(tokenizer), new String[] { "PowerShotIsAweSome" });
}
Also used : WhitespaceTokenizer(org.apache.lucene.analysis.core.WhitespaceTokenizer) Tokenizer(org.apache.lucene.analysis.Tokenizer) OpenSearchTestCase(org.opensearch.test.OpenSearchTestCase) StringReader(java.io.StringReader) TokenFilterFactory(org.opensearch.index.analysis.TokenFilterFactory)
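The behavior the test asserts can be sketched without a Lucene dependency: on the legacy code path, with preserve_separator set to false, the tokens are simply joined with no separator at all. The class and helper names below are hypothetical, for illustration only.

```java
import java.util.List;

class ConcatNoSeparatorSketch {
    // Hypothetical stand-in for what the filter produces on the legacy path
    // when preserve_separator is false: a plain no-separator join.
    static String concatenate(List<String> tokens) {
        return String.join("", tokens);
    }

    public static void main(String[] args) {
        // Same input the test feeds through the WhitespaceTokenizer.
        System.out.println(concatenate(List.of("PowerShot", "Is", "AweSome")));
    }
}
```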

Example 2 with TokenFilterFactory

Use of org.opensearch.index.analysis.TokenFilterFactory in project OpenSearch by opensearch-project.

From class ConcatenateGraphTokenFilterFactoryTests, method testOldLuceneVersionSeparator.

public void testOldLuceneVersionSeparator() throws IOException {
    OpenSearchTestCase.TestAnalysis analysis = AnalysisTestsHelper.createTestAnalysisFromSettings(
        Settings.builder()
            .put(
                IndexMetadata.SETTING_VERSION_CREATED,
                VersionUtils.randomVersionBetween(random(), LegacyESVersion.V_7_0_0, LegacyESVersion.V_7_5_2)
            )
            .put(Environment.PATH_HOME_SETTING.getKey(), createTempDir().toString())
            .put("index.analysis.filter.my_concatenate_graph.type", "concatenate_graph")
            // this will be ignored
            .put("index.analysis.filter.my_concatenate_graph.token_separator", "+")
            .build(),
        new CommonAnalysisPlugin()
    );
    TokenFilterFactory tokenFilter = analysis.tokenFilter.get("my_concatenate_graph");
    String source = "PowerShot Is AweSome";
    Tokenizer tokenizer = new WhitespaceTokenizer();
    tokenizer.setReader(new StringReader(source));
    // earlier Lucene version will only use Lucene's default separator
    assertTokenStreamContents(tokenFilter.create(tokenizer), new String[] { "PowerShot" + ConcatenateGraphFilter.DEFAULT_TOKEN_SEPARATOR + "Is" + ConcatenateGraphFilter.DEFAULT_TOKEN_SEPARATOR + "AweSome" });
}
Also used : WhitespaceTokenizer(org.apache.lucene.analysis.core.WhitespaceTokenizer) Tokenizer(org.apache.lucene.analysis.Tokenizer) OpenSearchTestCase(org.opensearch.test.OpenSearchTestCase) StringReader(java.io.StringReader) TokenFilterFactory(org.opensearch.index.analysis.TokenFilterFactory)
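Here, by contrast, the configured token_separator is ignored and Lucene's default separator is used. A minimal sketch of that join, assuming the default separator is the unit-separator character U+001F (the class name and that constant value are assumptions, not taken from the source above):

```java
import java.util.List;

class ConcatDefaultSeparatorSketch {
    // Assumption: Lucene's ConcatenateGraphFilter.DEFAULT_TOKEN_SEPARATOR
    // is modeled here as the unit-separator control character.
    static final char DEFAULT_TOKEN_SEPARATOR = '\u001F';

    // Join tokens with the default separator, mirroring the legacy behavior
    // the test asserts.
    static String concatenate(List<String> tokens) {
        return String.join(String.valueOf(DEFAULT_TOKEN_SEPARATOR), tokens);
    }
}
```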

Example 3 with TokenFilterFactory

Use of org.opensearch.index.analysis.TokenFilterFactory in project OpenSearch by opensearch-project.

From class EdgeNGramTokenFilterFactoryTests, method testPreserveOriginal.

public void testPreserveOriginal() throws IOException {
    OpenSearchTestCase.TestAnalysis analysis = AnalysisTestsHelper.createTestAnalysisFromSettings(
        Settings.builder()
            .put(Environment.PATH_HOME_SETTING.getKey(), createTempDir().toString())
            .put("index.analysis.filter.my_edge_ngram.type", "edge_ngram")
            .put("index.analysis.filter.my_edge_ngram.preserve_original", true)
            .build(),
        new CommonAnalysisPlugin()
    );
    TokenFilterFactory tokenFilter = analysis.tokenFilter.get("my_edge_ngram");
    String source = "foo";
    String[] expected = new String[] { "f", "fo", "foo" };
    Tokenizer tokenizer = new StandardTokenizer();
    tokenizer.setReader(new StringReader(source));
    assertTokenStreamContents(tokenFilter.create(tokenizer), expected);
}
Also used : OpenSearchTestCase(org.opensearch.test.OpenSearchTestCase) StandardTokenizer(org.apache.lucene.analysis.standard.StandardTokenizer) StringReader(java.io.StringReader) Tokenizer(org.apache.lucene.analysis.Tokenizer) TokenFilterFactory(org.opensearch.index.analysis.TokenFilterFactory)
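Edge n-grams are simply the prefixes of a token from min_gram up to max_gram; with preserve_original, the full token is also emitted even when it exceeds max_gram. A dependency-free sketch of that expansion, assuming the defaults min_gram=1 and max_gram=2 (the class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

class EdgeNGramSketch {
    // Produce the leading prefixes of length minGram..maxGram; with
    // preserveOriginal, also emit the full token if not already present.
    static List<String> edgeNGrams(String token, int minGram, int maxGram, boolean preserveOriginal) {
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, token.length()); n++) {
            out.add(token.substring(0, n));
        }
        if (preserveOriginal && !out.contains(token)) {
            out.add(token);
        }
        return out;
    }
}
```

Under those assumed defaults, "foo" expands to the same f, fo, foo sequence the test expects.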

Example 4 with TokenFilterFactory

Use of org.opensearch.index.analysis.TokenFilterFactory in project OpenSearch by opensearch-project.

From class KeepFilterFactoryTests, method testLoadWithoutSettings.

public void testLoadWithoutSettings() throws IOException {
    OpenSearchTestCase.TestAnalysis analysis = AnalysisTestsHelper.createTestAnalysisFromClassPath(createTempDir(), RESOURCE, new CommonAnalysisPlugin());
    TokenFilterFactory tokenFilter = analysis.tokenFilter.get("keep");
    Assert.assertNull(tokenFilter);
}
Also used : OpenSearchTestCase(org.opensearch.test.OpenSearchTestCase) TokenFilterFactory(org.opensearch.index.analysis.TokenFilterFactory)

Example 5 with TokenFilterFactory

Use of org.opensearch.index.analysis.TokenFilterFactory in project OpenSearch by opensearch-project.

From class ScriptedConditionTokenFilterFactory, method getChainAwareTokenFilterFactory.

@Override
public TokenFilterFactory getChainAwareTokenFilterFactory(TokenizerFactory tokenizer, List<CharFilterFactory> charFilters, List<TokenFilterFactory> previousTokenFilters, Function<String, TokenFilterFactory> allFilters) {
    List<TokenFilterFactory> filters = new ArrayList<>();
    List<TokenFilterFactory> existingChain = new ArrayList<>(previousTokenFilters);
    for (String filter : filterNames) {
        TokenFilterFactory tff = allFilters.apply(filter);
        if (tff == null) {
            throw new IllegalArgumentException("ScriptedConditionTokenFilter [" + name() + "] refers to undefined token filter [" + filter + "]");
        }
        tff = tff.getChainAwareTokenFilterFactory(tokenizer, charFilters, existingChain, allFilters);
        filters.add(tff);
        existingChain.add(tff);
    }
    return new TokenFilterFactory() {

        @Override
        public String name() {
            return ScriptedConditionTokenFilterFactory.this.name();
        }

        @Override
        public TokenStream create(TokenStream tokenStream) {
            Function<TokenStream, TokenStream> filter = in -> {
                for (TokenFilterFactory tff : filters) {
                    in = tff.create(in);
                }
                return in;
            };
            return new ScriptedConditionTokenFilter(tokenStream, filter, factory.newInstance());
        }
    };
}
Also used : ScriptService(org.opensearch.script.ScriptService) TokenizerFactory(org.opensearch.index.analysis.TokenizerFactory) TokenStream(org.apache.lucene.analysis.TokenStream) AbstractTokenFilterFactory(org.opensearch.index.analysis.AbstractTokenFilterFactory) Script(org.opensearch.script.Script) TokenFilterFactory(org.opensearch.index.analysis.TokenFilterFactory) Settings(org.opensearch.common.settings.Settings) IOException(java.io.IOException) Function(java.util.function.Function) ConditionalTokenFilter(org.apache.lucene.analysis.miscellaneous.ConditionalTokenFilter) ArrayList(java.util.ArrayList) ScriptType(org.opensearch.script.ScriptType) List(java.util.List) CharFilterFactory(org.opensearch.index.analysis.CharFilterFactory) IndexSettings(org.opensearch.index.IndexSettings)
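The heart of the factory above is the `in -> { for (...) in = tff.create(in); ... }` lambda: it folds the resolved filters over the incoming stream, each filter wrapping the previous one's output. The same composition shape, modeled with plain functions over strings instead of TokenStreams (the class name and sample filters are illustrative only):

```java
import java.util.List;
import java.util.function.Function;

class FilterChainSketch {
    // Fold a list of filters into one function, mirroring how the scripted
    // condition factory wraps the stream with each inner filter in turn.
    static Function<String, String> compose(List<Function<String, String>> filters) {
        return in -> {
            for (Function<String, String> f : filters) {
                in = f.apply(in); // each filter wraps the previous one's output
            }
            return in;
        };
    }
}
```

As in the factory, composition order matters: the first filter in the list is applied first, and later filters see its output.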

Aggregations

TokenFilterFactory (org.opensearch.index.analysis.TokenFilterFactory) 60
StringReader (java.io.StringReader) 40
Tokenizer (org.apache.lucene.analysis.Tokenizer) 40
OpenSearchTestCase (org.opensearch.test.OpenSearchTestCase) 37
WhitespaceTokenizer (org.apache.lucene.analysis.core.WhitespaceTokenizer) 30
Settings (org.opensearch.common.settings.Settings) 19
TokenStream (org.apache.lucene.analysis.TokenStream) 16
StandardTokenizer (org.apache.lucene.analysis.standard.StandardTokenizer) 11
IndexSettings (org.opensearch.index.IndexSettings) 8
NamedAnalyzer (org.opensearch.index.analysis.NamedAnalyzer) 8
CharFilterFactory (org.opensearch.index.analysis.CharFilterFactory) 7
TokenizerFactory (org.opensearch.index.analysis.TokenizerFactory) 7
IndexAnalyzers (org.opensearch.index.analysis.IndexAnalyzers) 6
AbstractTokenFilterFactory (org.opensearch.index.analysis.AbstractTokenFilterFactory) 5
Analyzer (org.apache.lucene.analysis.Analyzer) 4
CannedTokenStream (org.apache.lucene.analysis.CannedTokenStream) 4
Version (org.opensearch.Version) 4
IOException (java.io.IOException) 3
ArrayList (java.util.ArrayList) 3
List (java.util.List) 3