Example 1 with AbstractTokenFilterFactory

Use of org.opensearch.index.analysis.AbstractTokenFilterFactory in the OpenSearch project by opensearch-project.

From the class AnalysisModule, method setupTokenFilters.

private NamedRegistry<AnalysisProvider<TokenFilterFactory>> setupTokenFilters(List<AnalysisPlugin> plugins, HunspellService hunspellService) {
    NamedRegistry<AnalysisProvider<TokenFilterFactory>> tokenFilters = new NamedRegistry<>("token_filter");
    tokenFilters.register("stop", StopTokenFilterFactory::new);
    // Keep "standard" registered for backward compatibility (bwc) with old indices
    tokenFilters.register("standard", new AnalysisProvider<TokenFilterFactory>() {

        @Override
        public TokenFilterFactory get(IndexSettings indexSettings, Environment environment, String name, Settings settings) {
            if (indexSettings.getIndexVersionCreated().before(LegacyESVersion.V_7_0_0)) {
                deprecationLogger.deprecate("standard_deprecation", "The [standard] token filter name is deprecated and will be removed in a future version.");
            } else {
                throw new IllegalArgumentException("The [standard] token filter has been removed.");
            }
            // Pass-through factory: create() returns the incoming stream unchanged
            return new AbstractTokenFilterFactory(indexSettings, name, settings) {

                @Override
                public TokenStream create(TokenStream tokenStream) {
                    return tokenStream;
                }
            };
        }

        @Override
        public boolean requiresAnalysisSettings() {
            return false;
        }
    });
    tokenFilters.register("shingle", ShingleTokenFilterFactory::new);
    tokenFilters.register("hunspell", requiresAnalysisSettings((indexSettings, env, name, settings) -> new HunspellTokenFilterFactory(indexSettings, name, settings, hunspellService)));
    tokenFilters.extractAndRegister(plugins, AnalysisPlugin::getTokenFilters);
    return tokenFilters;
}
Also used : TokenizerFactory(org.opensearch.index.analysis.TokenizerFactory) IndexMetadata(org.opensearch.cluster.metadata.IndexMetadata) PreBuiltAnalyzerProviderFactory(org.opensearch.index.analysis.PreBuiltAnalyzerProviderFactory) StopAnalyzerProvider(org.opensearch.index.analysis.StopAnalyzerProvider) TokenFilterFactory(org.opensearch.index.analysis.TokenFilterFactory) Version(org.opensearch.Version) StopTokenFilterFactory(org.opensearch.index.analysis.StopTokenFilterFactory) DeprecationLogger(org.opensearch.common.logging.DeprecationLogger) SimpleAnalyzerProvider(org.opensearch.index.analysis.SimpleAnalyzerProvider) AnalysisRegistry(org.opensearch.index.analysis.AnalysisRegistry) LegacyESVersion(org.opensearch.LegacyESVersion) KeywordAnalyzerProvider(org.opensearch.index.analysis.KeywordAnalyzerProvider) AnalysisPlugin.requiresAnalysisSettings(org.opensearch.plugins.AnalysisPlugin.requiresAnalysisSettings) CharFilterFactory(org.opensearch.index.analysis.CharFilterFactory) Locale(java.util.Locale) Map(java.util.Map) StandardTokenizerFactory(org.opensearch.index.analysis.StandardTokenizerFactory) PreConfiguredTokenizer(org.opensearch.index.analysis.PreConfiguredTokenizer) Environment(org.opensearch.env.Environment) LowerCaseFilter(org.apache.lucene.analysis.LowerCaseFilter) TokenStream(org.apache.lucene.analysis.TokenStream) PreConfiguredTokenFilter(org.opensearch.index.analysis.PreConfiguredTokenFilter) AbstractTokenFilterFactory(org.opensearch.index.analysis.AbstractTokenFilterFactory) Settings(org.opensearch.common.settings.Settings) IOException(java.io.IOException) PreConfiguredCharFilter(org.opensearch.index.analysis.PreConfiguredCharFilter) ShingleTokenFilterFactory(org.opensearch.index.analysis.ShingleTokenFilterFactory) LowercaseNormalizerProvider(org.opensearch.index.analysis.LowercaseNormalizerProvider) AnalysisPlugin(org.opensearch.plugins.AnalysisPlugin) List(java.util.List) AnalyzerProvider(org.opensearch.index.analysis.AnalyzerProvider) 
NamedRegistry(org.opensearch.common.NamedRegistry) IndexSettings(org.opensearch.index.IndexSettings) WhitespaceAnalyzerProvider(org.opensearch.index.analysis.WhitespaceAnalyzerProvider) HunspellTokenFilterFactory(org.opensearch.index.analysis.HunspellTokenFilterFactory) Collections.unmodifiableMap(java.util.Collections.unmodifiableMap) StandardAnalyzerProvider(org.opensearch.index.analysis.StandardAnalyzerProvider)
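
The registration pattern in setupTokenFilters (a name mapped to a provider that builds a factory on demand) can be tried without the OpenSearch classpath. The following is a minimal sketch under stated assumptions: TokenFilterRegistrySketch, FilterFactory, Provider and the filter names "passthrough" and "lowercase" are hypothetical stand-ins, not OpenSearch types, and a token stream is modelled as a plain list of strings. Rejecting duplicate names is a choice made for this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class TokenFilterRegistrySketch {

    /** Hypothetical stand-in for a token filter factory; tokens are modelled as strings. */
    public interface FilterFactory {
        List<String> create(List<String> tokens);
    }

    /** Hypothetical stand-in for a provider that builds a factory from a name and settings. */
    public interface Provider {
        FilterFactory get(String name, Map<String, String> settings);
    }

    private final Map<String, Provider> providers = new HashMap<>();

    /** Registers a provider under a unique name; this sketch rejects duplicates. */
    public void register(String name, Provider provider) {
        if (providers.putIfAbsent(name, provider) != null) {
            throw new IllegalArgumentException("token_filter [" + name + "] already registered");
        }
    }

    /** Looks up the provider by name and asks it for a factory. */
    public FilterFactory build(String name, Map<String, String> settings) {
        Provider provider = providers.get(name);
        if (provider == null) {
            throw new IllegalArgumentException("unknown token_filter [" + name + "]");
        }
        return provider.get(name, settings);
    }

    public static TokenFilterRegistrySketch defaultRegistry() {
        TokenFilterRegistrySketch registry = new TokenFilterRegistrySketch();
        // Pass-through filter, analogous to the anonymous factory whose create() returns its input.
        registry.register("passthrough", (name, settings) -> tokens -> tokens);
        // A filter that changes the tokens, registered the same way.
        registry.register("lowercase", (name, settings) -> tokens ->
            tokens.stream().map(t -> t.toLowerCase(Locale.ROOT)).collect(Collectors.toList()));
        return registry;
    }

    public static void main(String[] args) {
        TokenFilterRegistrySketch registry = defaultRegistry();
        List<String> tokens = List.of("Quick", "Brown", "Fox");
        System.out.println(registry.build("passthrough", Map.of()).create(tokens));
        System.out.println(registry.build("lowercase", Map.of()).create(tokens));
    }
}
```

Keeping providers rather than ready-made factories in the map means a factory is only built when an index actually asks for that filter, with that index's settings.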

Example 2 with AbstractTokenFilterFactory

Use of org.opensearch.index.analysis.AbstractTokenFilterFactory in the OpenSearch project by opensearch-project.

From the class TransportAnalyzeActionTests, method setUp.

@Override
public void setUp() throws Exception {
    super.setUp();
    Settings settings = Settings.builder().put(Environment.PATH_HOME_SETTING.getKey(), createTempDir().toString()).build();
    Settings indexSettings = Settings.builder()
        .put(IndexMetadata.SETTING_VERSION_CREATED, Version.CURRENT)
        .put(IndexMetadata.SETTING_INDEX_UUID, UUIDs.randomBase64UUID())
        .put("index.analysis.analyzer.custom_analyzer.tokenizer", "standard")
        .put("index.analysis.analyzer.custom_analyzer.filter", "mock")
        .put("index.analysis.normalizer.my_normalizer.type", "custom")
        .put("index.analysis.char_filter.my_append.type", "append")
        .put("index.analysis.char_filter.my_append.suffix", "baz")
        .put("index.analyze.max_token_count", 100)
        .putList("index.analysis.normalizer.my_normalizer.filter", "lowercase")
        .build();
    this.indexSettings = IndexSettingsModule.newIndexSettings("index", indexSettings);
    Environment environment = TestEnvironment.newEnvironment(settings);
    AnalysisPlugin plugin = new AnalysisPlugin() {

        class MockFactory extends AbstractTokenFilterFactory {

            final CharacterRunAutomaton stopset;

            MockFactory(IndexSettings indexSettings, Environment env, String name, Settings settings) {
                super(indexSettings, name, settings);
                if (settings.hasValue("stopword")) {
                    this.stopset = new CharacterRunAutomaton(Automata.makeString(settings.get("stopword")));
                } else {
                    this.stopset = MockTokenFilter.ENGLISH_STOPSET;
                }
            }

            @Override
            public TokenStream create(TokenStream tokenStream) {
                return new MockTokenFilter(tokenStream, this.stopset);
            }
        }

        class DeprecatedTokenFilterFactory extends AbstractTokenFilterFactory implements NormalizingTokenFilterFactory {

            DeprecatedTokenFilterFactory(IndexSettings indexSettings, Environment env, String name, Settings settings) {
                super(indexSettings, name, settings);
            }

            @Override
            public TokenStream create(TokenStream tokenStream) {
                deprecationLogger.deprecate("deprecated_token_filter_create", "Using deprecated token filter [deprecated]");
                return tokenStream;
            }

            @Override
            public TokenStream normalize(TokenStream tokenStream) {
                deprecationLogger.deprecate("deprecated_token_filter_normalize", "Using deprecated token filter [deprecated]");
                return tokenStream;
            }
        }

        class AppendCharFilterFactory extends AbstractCharFilterFactory {

            final String suffix;

            AppendCharFilterFactory(IndexSettings indexSettings, Environment environment, String name, Settings settings) {
                super(indexSettings, name);
                this.suffix = settings.get("suffix", "bar");
            }

            @Override
            public Reader create(Reader reader) {
                return new AppendCharFilter(reader, suffix);
            }
        }

        @Override
        public Map<String, AnalysisProvider<CharFilterFactory>> getCharFilters() {
            return singletonMap("append", AppendCharFilterFactory::new);
        }

        @Override
        public Map<String, AnalysisProvider<TokenizerFactory>> getTokenizers() {
            return singletonMap("keyword", (indexSettings, environment, name, settings) -> TokenizerFactory.newFactory(name, () -> new MockTokenizer(MockTokenizer.KEYWORD, false)));
        }

        @Override
        public Map<String, AnalysisProvider<TokenFilterFactory>> getTokenFilters() {
            Map<String, AnalysisProvider<TokenFilterFactory>> filters = new HashMap<>();
            filters.put("mock", MockFactory::new);
            filters.put("deprecated", DeprecatedTokenFilterFactory::new);
            return filters;
        }

        @Override
        public List<PreConfiguredCharFilter> getPreConfiguredCharFilters() {
            return singletonList(PreConfiguredCharFilter.singleton("append", false, reader -> new AppendCharFilter(reader, "foo")));
        }
    };
    registry = new AnalysisModule(environment, singletonList(plugin)).getAnalysisRegistry();
    indexAnalyzers = registry.build(this.indexSettings);
    maxTokenCount = IndexSettings.MAX_TOKEN_COUNT_SETTING.getDefault(settings);
    idxMaxTokenCount = this.indexSettings.getMaxTokenCount();
}
Also used : TestEnvironment(org.opensearch.env.TestEnvironment) TokenizerFactory(org.opensearch.index.analysis.TokenizerFactory) IndexMetadata(org.opensearch.cluster.metadata.IndexMetadata) MockTokenizer(org.apache.lucene.analysis.MockTokenizer) TokenFilterFactory(org.opensearch.index.analysis.TokenFilterFactory) Version(org.opensearch.Version) HashMap(java.util.HashMap) Collections.singletonList(java.util.Collections.singletonList) AnalysisRegistry(org.opensearch.index.analysis.AnalysisRegistry) AppendCharFilter(org.opensearch.indices.analysis.AnalysisModuleTests.AppendCharFilter) CharFilterFactory(org.opensearch.index.analysis.CharFilterFactory) AnalyzeAction(org.opensearch.action.admin.indices.analyze.AnalyzeAction) AbstractCharFilterFactory(org.opensearch.index.analysis.AbstractCharFilterFactory) Map(java.util.Map) AnalysisProvider(org.opensearch.indices.analysis.AnalysisModule.AnalysisProvider) Collections.singletonMap(java.util.Collections.singletonMap) UUIDs(org.opensearch.common.UUIDs) Automata(org.apache.lucene.util.automaton.Automata) CharacterRunAutomaton(org.apache.lucene.util.automaton.CharacterRunAutomaton) Environment(org.opensearch.env.Environment) NormalizingTokenFilterFactory(org.opensearch.index.analysis.NormalizingTokenFilterFactory) TokenStream(org.apache.lucene.analysis.TokenStream) AbstractTokenFilterFactory(org.opensearch.index.analysis.AbstractTokenFilterFactory) OpenSearchTestCase(org.opensearch.test.OpenSearchTestCase) TransportAnalyzeAction(org.opensearch.action.admin.indices.analyze.TransportAnalyzeAction) Settings(org.opensearch.common.settings.Settings) IOException(java.io.IOException) Reader(java.io.Reader) Mockito.when(org.mockito.Mockito.when) IndexService(org.opensearch.index.IndexService) PreConfiguredCharFilter(org.opensearch.index.analysis.PreConfiguredCharFilter) AnalysisPlugin(org.opensearch.plugins.AnalysisPlugin) List(java.util.List) AnalysisModule(org.opensearch.indices.analysis.AnalysisModule) 
MockTokenFilter(org.apache.lucene.analysis.MockTokenFilter) IndexSettings(org.opensearch.index.IndexSettings) IndexAnalyzers(org.opensearch.index.analysis.IndexAnalyzers) IndexSettingsModule(org.opensearch.test.IndexSettingsModule) Mockito.mock(org.mockito.Mockito.mock)
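
MockFactory above picks its stop set in the constructor: a single configured "stopword" if the setting is present, a default set otherwise, and create() then wraps the stream in a filter that uses it. The following is a minimal sketch of that shape, assuming nothing from OpenSearch or Lucene: StopwordFactorySketch and DEFAULT_STOPSET are hypothetical names, settings are a plain map, tokens are strings, and the default set shown is illustrative and is not the contents of MockTokenFilter.ENGLISH_STOPSET.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class StopwordFactorySketch {

    /** Illustrative default stop set; an assumption of this sketch, not Lucene's set. */
    static final Set<String> DEFAULT_STOPSET = Set.of("a", "an", "the");

    private final Set<String> stopset;

    /** Reads the optional "stopword" setting once, at construction time. */
    public StopwordFactorySketch(Map<String, String> settings) {
        String stopword = settings.get("stopword");
        this.stopset = stopword != null ? Set.of(stopword) : DEFAULT_STOPSET;
    }

    /** Drops every token that is in the stop set and keeps the rest in order. */
    public List<String> create(List<String> tokens) {
        return tokens.stream()
            .filter(token -> !stopset.contains(token))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "quick", "fox");
        // No setting: the default stop set applies.
        System.out.println(new StopwordFactorySketch(Map.of()).create(tokens));
        // With a setting: only the configured word is removed.
        System.out.println(new StopwordFactorySketch(Map.of("stopword", "fox")).create(tokens));
    }
}
```

Resolving the setting in the constructor keeps create() cheap, which matters because a factory is built once but create() can be called for every analyzed text.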
