Example 81 with StandardTokenizer

Use of org.apache.lucene.analysis.standard.StandardTokenizer in project nutch by apache.

From the class LuceneTokenizer, method generateTokenStreamFromText.

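private TokenStream generateTokenStreamFromText(String content, TokenizerType tokenizerType) {
    Tokenizer tokenizer = null;
    // Pick the tokenizer implementation; STANDARD is also the fallback for any other type.
    switch(tokenizerType) {
        case CLASSIC:
            tokenizer = new ClassicTokenizer();
            break;
        case STANDARD:
        default:
            tokenizer = new StandardTokenizer();
    }
    // Attach the input text; the caller must still call reset() before consuming tokens.
    tokenizer.setReader(new StringReader(content));
    // tokenStream is not declared in this excerpt; presumably a field of the enclosing class.
    tokenStream = tokenizer;
    return tokenStream;
}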
Also used : StandardTokenizer(org.apache.lucene.analysis.standard.StandardTokenizer) StringReader(java.io.StringReader) ClassicTokenizer(org.apache.lucene.analysis.standard.ClassicTokenizer) Tokenizer(org.apache.lucene.analysis.Tokenizer)

Example 82 with StandardTokenizer

Use of org.apache.lucene.analysis.standard.StandardTokenizer in project omegat by omegat-org.

From the class BaseTokenizer, method getStandardTokenStream.

/**
 * Minimal implementation that returns the default implementation
 * corresponding to all false parameters. Subclasses should override this to
 * handle true parameters.
 */
protected TokenStream getStandardTokenStream(String strOrig) throws IOException {
    StandardTokenizer tokenizer = new StandardTokenizer();
    tokenizer.setReader(new StringReader(strOrig));
    return tokenizer;
}
Also used : StandardTokenizer(org.apache.lucene.analysis.standard.StandardTokenizer) StringReader(java.io.StringReader)
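
Both methods above hand back a TokenStream that has only had setReader called on it, so the caller still has to drive the stream. The following is a minimal sketch of that consuming side, assuming a Lucene version where StandardTokenizer has a no-argument constructor (as in the examples above). The class name TokenStreamDemo and the helper collectTerms are hypothetical names introduced for illustration and do not come from either project.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenStreamDemo {

    // Hypothetical helper: drains a TokenStream into a list of term strings.
    static List<String> collectTerms(TokenStream stream) throws IOException {
        List<String> terms = new ArrayList<>();
        // The attribute instance is reused; its contents change on every incrementToken().
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        try (TokenStream ts = stream) {   // close() releases the underlying reader
            ts.reset();                   // required before the first incrementToken()
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();                     // finalizes end-of-stream state such as the final offset
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // Same setup as the examples above: construct, then attach the input.
        StandardTokenizer tokenizer = new StandardTokenizer();
        tokenizer.setReader(new StringReader("Hello, token stream!"));
        System.out.println(collectTerms(tokenizer)); // prints [Hello, token, stream]
    }
}
```

StandardTokenizer on its own does not lowercase or remove stop words, which is why it is so often paired with LowerCaseFilter and StopFilter in the aggregation counts for this class.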

Aggregations

StandardTokenizer (org.apache.lucene.analysis.standard.StandardTokenizer) 82
Tokenizer (org.apache.lucene.analysis.Tokenizer) 68
TokenStream (org.apache.lucene.analysis.TokenStream) 57
LowerCaseFilter (org.apache.lucene.analysis.LowerCaseFilter) 43
StopFilter (org.apache.lucene.analysis.StopFilter) 43
StandardFilter (org.apache.lucene.analysis.standard.StandardFilter) 36
SetKeywordMarkerFilter (org.apache.lucene.analysis.miscellaneous.SetKeywordMarkerFilter) 35
StringReader (java.io.StringReader) 18
SnowballFilter (org.apache.lucene.analysis.snowball.SnowballFilter) 16
Analyzer (org.apache.lucene.analysis.Analyzer) 10
LowerCaseFilter (org.apache.lucene.analysis.core.LowerCaseFilter) 10
ASCIIFoldingFilter (org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter) 7
EnglishPossessiveFilter (org.apache.lucene.analysis.en.EnglishPossessiveFilter) 6
ElisionFilter (org.apache.lucene.analysis.util.ElisionFilter) 6
DecimalDigitFilter (org.apache.lucene.analysis.core.DecimalDigitFilter) 5
StopFilter (org.apache.lucene.analysis.core.StopFilter) 5
ESTestCase (org.elasticsearch.test.ESTestCase) 5
HashMap (java.util.HashMap) 4
TokenFilter (org.apache.lucene.analysis.TokenFilter) 4
PorterStemFilter (org.apache.lucene.analysis.en.PorterStemFilter) 4