Example 96 with TokenStream

Use of org.apache.lucene.analysis.TokenStream in the lucene-solr project by Apache.

Class TestRussianLightStemFilterFactory, method testStemming.

public void testStemming() throws Exception {
    Reader reader = new StringReader("журналы");
    TokenStream stream = whitespaceMockTokenizer(reader);
    stream = tokenFilterFactory("RussianLightStem").create(stream);
    assertTokenStreamContents(stream, new String[] { "журнал" });
}
Also used : TokenStream(org.apache.lucene.analysis.TokenStream) StringReader(java.io.StringReader) Reader(java.io.Reader)

Example 97 with TokenStream

Use of org.apache.lucene.analysis.TokenStream in the lucene-solr project by Apache.

Class ShingleAnalyzerWrapperTest, method testShingleAnalyzerWrapperPhraseQuery.

/*
 * This shows how to construct a phrase query containing shingles.
 */
public void testShingleAnalyzerWrapperPhraseQuery() throws Exception {
    PhraseQuery.Builder builder = new PhraseQuery.Builder();
    try (TokenStream ts = analyzer.tokenStream("content", "this sentence")) {
        // Absolute position of the current token; starts at -1 so that a
        // first increment of 1 yields position 0.
        int j = -1;
        PositionIncrementAttribute posIncrAtt = ts.addAttribute(PositionIncrementAttribute.class);
        CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        // reset() must be called before the first incrementToken().
        ts.reset();
        while (ts.incrementToken()) {
            j += posIncrAtt.getPositionIncrement();
            String termText = termAtt.toString();
            // Add each term at its absolute position in the phrase.
            builder.add(new Term("content", termText), j);
        }
        // end() signals that the stream has been fully consumed.
        ts.end();
    }
    PhraseQuery q = builder.build();
    ScoreDoc[] hits = searcher.search(q, 1000).scoreDocs;
    int[] ranks = new int[] { 0 };
    compareRanks(hits, ranks);
}
Also used : TokenStream(org.apache.lucene.analysis.TokenStream) CharTermAttribute(org.apache.lucene.analysis.tokenattributes.CharTermAttribute) PhraseQuery(org.apache.lucene.search.PhraseQuery) Term(org.apache.lucene.index.Term) PositionIncrementAttribute(org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute) ScoreDoc(org.apache.lucene.search.ScoreDoc)
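
The loop in the test above turns relative position increments into absolute phrase positions. The following is a minimal sketch of that bookkeeping in plain Java, without Lucene on the classpath. The class name PositionIncrementSketch, the method absolutePositions, and the sample increments are hypothetical and chosen for illustration only; they are not part of the lucene-solr sources.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionIncrementSketch {

    // Converts per-token position increments into absolute positions,
    // mirroring the `j += posIncrAtt.getPositionIncrement()` step.
    static List<Integer> absolutePositions(int[] increments) {
        List<Integer> positions = new ArrayList<>();
        int position = -1; // start before the first token
        for (int increment : increments) {
            position += increment;
            positions.add(position);
        }
        return positions;
    }

    public static void main(String[] args) {
        // Assumed increments: a token, a second token stacked on the same
        // position (increment 0), then a token one position further on.
        int[] increments = { 1, 0, 1 };
        System.out.println(absolutePositions(increments)); // prints [0, 0, 1]
    }
}
```

An increment of 0 keeps a token at the same position as the previous one, which is why two terms can end up sharing a position in the resulting PhraseQuery.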

Example 98 with TokenStream

Use of org.apache.lucene.analysis.TokenStream in the lucene-solr project by Apache.

Class TestDaitchMokotoffSoundexFilterFactory, method testSettingInject.

public void testSettingInject() throws Exception {
    Map<String, String> parameters = new HashMap<>();
    parameters.put("inject", "false");
    DaitchMokotoffSoundexFilterFactory factory = new DaitchMokotoffSoundexFilterFactory(parameters);
    Tokenizer inputStream = new MockTokenizer(MockTokenizer.WHITESPACE, false);
    inputStream.setReader(new StringReader("international"));
    TokenStream filteredStream = factory.create(inputStream);
    assertEquals(DaitchMokotoffSoundexFilter.class, filteredStream.getClass());
    assertTokenStreamContents(filteredStream, new String[] { "063963" });
}
Also used : MockTokenizer(org.apache.lucene.analysis.MockTokenizer) TokenStream(org.apache.lucene.analysis.TokenStream) HashMap(java.util.HashMap) StringReader(java.io.StringReader) Tokenizer(org.apache.lucene.analysis.Tokenizer)

Example 99 with TokenStream

Use of org.apache.lucene.analysis.TokenStream in the lucene-solr project by Apache.

Class DoubleMetaphoneFilterTest, method testAlternateInjectFalse.

public void testAlternateInjectFalse() throws Exception {
    TokenStream stream = whitespaceMockTokenizer("Kuczewski");
    TokenStream filter = new DoubleMetaphoneFilter(stream, 4, false);
    assertTokenStreamContents(filter, new String[] { "KSSK", "KXFS" });
}
Also used : TokenStream(org.apache.lucene.analysis.TokenStream)

Example 100 with TokenStream

Use of org.apache.lucene.analysis.TokenStream in the lucene-solr project by Apache.

Class DoubleMetaphoneFilterTest, method testNonConvertableStringsWithInject.

public void testNonConvertableStringsWithInject() throws Exception {
    TokenStream stream = whitespaceMockTokenizer("12345 #$%@#^%&");
    TokenStream filter = new DoubleMetaphoneFilter(stream, 8, true);
    assertTokenStreamContents(filter, new String[] { "12345", "#$%@#^%&" });
}
Also used : TokenStream(org.apache.lucene.analysis.TokenStream)

Aggregations

TokenStream (org.apache.lucene.analysis.TokenStream) 849
StringReader (java.io.StringReader) 337
Tokenizer (org.apache.lucene.analysis.Tokenizer) 244
Reader (java.io.Reader) 175
CharTermAttribute (org.apache.lucene.analysis.tokenattributes.CharTermAttribute) 141
MockTokenizer (org.apache.lucene.analysis.MockTokenizer) 128
Analyzer (org.apache.lucene.analysis.Analyzer) 121
CannedTokenStream (org.apache.lucene.analysis.CannedTokenStream) 94
LowerCaseFilter (org.apache.lucene.analysis.LowerCaseFilter) 88
IOException (java.io.IOException) 86
StandardFilter (org.apache.lucene.analysis.standard.StandardFilter) 73
Term (org.apache.lucene.index.Term) 66
Document (org.apache.lucene.document.Document) 64
ArrayList (java.util.ArrayList) 59
StandardTokenizer (org.apache.lucene.analysis.standard.StandardTokenizer) 59
StopFilter (org.apache.lucene.analysis.StopFilter) 58
KeywordTokenizer (org.apache.lucene.analysis.core.KeywordTokenizer) 57
SetKeywordMarkerFilter (org.apache.lucene.analysis.miscellaneous.SetKeywordMarkerFilter) 53
Test (org.junit.Test) 53
OffsetAttribute (org.apache.lucene.analysis.tokenattributes.OffsetAttribute) 47