Example 16 with MockTokenizer

Use of org.apache.lucene.analysis.MockTokenizer in project lucene-solr by apache.

In class TestFingerprintFilter, method testDupsAndSorting.

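public void testDupsAndSorting() throws Exception {
    // Run once with MockTokenizer's consumer-workflow checks enabled and once with them disabled.
    for (final boolean consumeAll : new boolean[] { true, false }) {
        MockTokenizer tokenizer = whitespaceMockTokenizer("B A B E");
        tokenizer.setEnableChecks(consumeAll);
        TokenStream stream = new FingerprintFilter(tokenizer);
        // The duplicate "B" is dropped; the remaining tokens are sorted and emitted as one token.
        assertTokenStreamContents(stream, new String[] { "A B E" });
    }
}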
Also used : MockTokenizer(org.apache.lucene.analysis.MockTokenizer) TokenStream(org.apache.lucene.analysis.TokenStream)
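
The behaviour this test asserts (duplicates removed, tokens sorted, result joined into a single token) can be sketched in plain Java without Lucene. This is only an illustration under stated assumptions: the class FingerprintSketch and its fingerprint method are hypothetical names, not part of Lucene, and the sketch ignores FingerprintFilter's configurable separator and maximum output size.

```java
import java.util.TreeSet;

public class FingerprintSketch {

    // Hypothetical helper (not a Lucene API): splits on whitespace,
    // drops duplicate tokens, sorts them, and joins them with one space.
    static String fingerprint(String text) {
        // TreeSet keeps elements unique and in natural (lexicographic) order.
        TreeSet<String> unique = new TreeSet<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                unique.add(token);
            }
        }
        return String.join(" ", unique);
    }

    public static void main(String[] args) {
        System.out.println(fingerprint("B A B E")); // prints "A B E"
        System.out.println(fingerprint("A1"));      // prints "A1"
    }
}
```

The two inputs mirror the strings used in testDupsAndSorting and testSingleToken, so the printed values match the expected arrays passed to assertTokenStreamContents in those tests.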

Example 17 with MockTokenizer

Use of org.apache.lucene.analysis.MockTokenizer in project lucene-solr by apache.

In class TestFingerprintFilter, method testSingleToken.

public void testSingleToken() throws Exception {
    for (final boolean consumeAll : new boolean[] { true, false }) {
        MockTokenizer tokenizer = whitespaceMockTokenizer("A1");
        tokenizer.setEnableChecks(consumeAll);
        TokenStream stream = new FingerprintFilter(tokenizer);
        // A single input token passes through unchanged.
        assertTokenStreamContents(stream, new String[] { "A1" });
    }
}
Also used : MockTokenizer(org.apache.lucene.analysis.MockTokenizer) TokenStream(org.apache.lucene.analysis.TokenStream)

Example 18 with MockTokenizer

Use of org.apache.lucene.analysis.MockTokenizer in project lucene-solr by apache.

In class TestSynonymGraphFilter, method getFlattenAnalyzer.

/** Appends FlattenGraphFilter too */
private Analyzer getFlattenAnalyzer(SynonymMap.Builder b, boolean ignoreCase) throws IOException {
    final SynonymMap map = b.build();
    return new Analyzer() {

        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer tokenizer = new MockTokenizer(MockTokenizer.WHITESPACE, true);
            // Make a local variable so testRandomHuge doesn't share it across threads!
            SynonymGraphFilter synFilter = new SynonymGraphFilter(tokenizer, map, ignoreCase);
            FlattenGraphFilter flattenFilter = new FlattenGraphFilter(synFilter);
            TestSynonymGraphFilter.this.synFilter = synFilter;
            TestSynonymGraphFilter.this.flattenFilter = flattenFilter;
            return new TokenStreamComponents(tokenizer, flattenFilter);
        }
    };
}
Also used : MockTokenizer(org.apache.lucene.analysis.MockTokenizer) Analyzer(org.apache.lucene.analysis.Analyzer) MockAnalyzer(org.apache.lucene.analysis.MockAnalyzer) FlattenGraphFilter(org.apache.lucene.analysis.core.FlattenGraphFilter) Tokenizer(org.apache.lucene.analysis.Tokenizer)

Example 19 with MockTokenizer

Use of org.apache.lucene.analysis.MockTokenizer in project lucene-solr by apache.

In class TestSynonymGraphFilter, method testRandomGraphAfter.

// Adds MockGraphTokenFilter after SynFilter:
public void testRandomGraphAfter() throws Exception {
    final int numIters = atLeast(3);
    for (int i = 0; i < numIters; i++) {
        SynonymMap.Builder b = new SynonymMap.Builder(random().nextBoolean());
        final int numEntries = atLeast(10);
        for (int j = 0; j < numEntries; j++) {
            add(b, randomNonEmptyString(), randomNonEmptyString(), random().nextBoolean());
        }
        final SynonymMap map = b.build();
        final boolean ignoreCase = random().nextBoolean();
        final boolean doFlatten = random().nextBoolean();
        final Analyzer analyzer = new Analyzer() {

            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer tokenizer = new MockTokenizer(MockTokenizer.SIMPLE, true);
                TokenStream syns = new SynonymGraphFilter(tokenizer, map, ignoreCase);
                TokenStream graph = new MockGraphTokenFilter(random(), syns);
                if (doFlatten) {
                    graph = new FlattenGraphFilter(graph);
                }
                return new TokenStreamComponents(tokenizer, graph);
            }
        };
        checkRandomData(random(), analyzer, 100);
        analyzer.close();
    }
}
Also used : TokenStream(org.apache.lucene.analysis.TokenStream) CharsRefBuilder(org.apache.lucene.util.CharsRefBuilder) IntsRefBuilder(org.apache.lucene.util.IntsRefBuilder) Analyzer(org.apache.lucene.analysis.Analyzer) MockAnalyzer(org.apache.lucene.analysis.MockAnalyzer) FlattenGraphFilter(org.apache.lucene.analysis.core.FlattenGraphFilter) MockTokenizer(org.apache.lucene.analysis.MockTokenizer) MockGraphTokenFilter(org.apache.lucene.analysis.MockGraphTokenFilter) Tokenizer(org.apache.lucene.analysis.Tokenizer)

Example 20 with MockTokenizer

Use of org.apache.lucene.analysis.MockTokenizer in project lucene-solr by apache.

In class TestSynonymGraphFilter, method getAnalyzer.

private Analyzer getAnalyzer(SynonymMap.Builder b, final boolean ignoreCase) throws IOException {
    final SynonymMap map = b.build();
    return new Analyzer() {

        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer tokenizer = new MockTokenizer(MockTokenizer.WHITESPACE, false);
            // Make a local variable so testRandomHuge doesn't share it across threads!
            SynonymGraphFilter synFilter = new SynonymGraphFilter(tokenizer, map, ignoreCase);
            TestSynonymGraphFilter.this.flattenFilter = null;
            TestSynonymGraphFilter.this.synFilter = synFilter;
            return new TokenStreamComponents(tokenizer, synFilter);
        }
    };
}
Also used : MockTokenizer(org.apache.lucene.analysis.MockTokenizer) Analyzer(org.apache.lucene.analysis.Analyzer) MockAnalyzer(org.apache.lucene.analysis.MockAnalyzer) Tokenizer(org.apache.lucene.analysis.Tokenizer)

Aggregations

MockTokenizer (org.apache.lucene.analysis.MockTokenizer): 280
Tokenizer (org.apache.lucene.analysis.Tokenizer): 204
Analyzer (org.apache.lucene.analysis.Analyzer): 161
StringReader (java.io.StringReader): 118
TokenStream (org.apache.lucene.analysis.TokenStream): 116
KeywordTokenizer (org.apache.lucene.analysis.core.KeywordTokenizer): 106
Reader (java.io.Reader): 59
MockAnalyzer (org.apache.lucene.analysis.MockAnalyzer): 54
CharArraySet (org.apache.lucene.analysis.CharArraySet): 44
Directory (org.apache.lucene.store.Directory): 36
Document (org.apache.lucene.document.Document): 31
BytesRef (org.apache.lucene.util.BytesRef): 25
SetKeywordMarkerFilter (org.apache.lucene.analysis.miscellaneous.SetKeywordMarkerFilter): 21
TextField (org.apache.lucene.document.TextField): 20
CannedTokenStream (org.apache.lucene.analysis.CannedTokenStream): 18
Field (org.apache.lucene.document.Field): 17
FieldType (org.apache.lucene.document.FieldType): 14
StringField (org.apache.lucene.document.StringField): 11
Input (org.apache.lucene.search.suggest.Input): 11
InputArrayIterator (org.apache.lucene.search.suggest.InputArrayIterator): 11