Example 1 with ExtendedWhitespaceTokenizer

Use of org.carrot2.text.analysis.ExtendedWhitespaceTokenizer in the project lucene-solr by Apache.

From the class DuplicatingTokenizerFactory, method getTokenizer.

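@Override
public ITokenizer getTokenizer(LanguageCode language) {
    // Return an anonymous ITokenizer that forwards every call to Carrot2's
    // whitespace tokenizer (the language argument is not used).
    return new ITokenizer() {

        private final ExtendedWhitespaceTokenizer delegate = new ExtendedWhitespaceTokenizer();

        @Override
        public void setTermBuffer(MutableCharArray buffer) {
            // Let the delegate fill the buffer with the current term, then
            // replace it with the term repeated twice ("foo" becomes "foofoo").
            delegate.setTermBuffer(buffer);
            buffer.reset(buffer.toString() + buffer.toString());
        }

        @Override
        public void reset(Reader input) {
            // Start tokenizing a new input stream.
            delegate.reset(input);
        }

        @Override
        public short nextToken() throws IOException {
            // Token boundaries and token types come unchanged from the delegate.
            return delegate.nextToken();
        }
    };
}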

Also used: ExtendedWhitespaceTokenizer (org.carrot2.text.analysis.ExtendedWhitespaceTokenizer), ITokenizer (org.carrot2.text.analysis.ITokenizer), MutableCharArray (org.carrot2.text.util.MutableCharArray), Reader (java.io.Reader).
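
The excerpt is a decorator: every call is forwarded to the wrapped tokenizer, and only the term text is altered. The sketch below reproduces that idea without a Carrot2 dependency. It assumes a hypothetical, simplified `SimpleTokenizer` interface whose `nextToken()` returns the token text (or null at end of input). The real `ITokenizer` instead returns a short token type and exposes the text through `setTermBuffer`. `WhitespaceTokenizer`, `DuplicatingTokenizer` and `DuplicatingTokenizerDemo` are illustrative names, not classes from Carrot2 or lucene-solr.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for Carrot2's ITokenizer (not the real API):
// nextToken() returns the next token's text, or null at end of input.
interface SimpleTokenizer {
    void reset(Reader input) throws IOException;
    String nextToken() throws IOException;
}

// Hypothetical whitespace tokenizer playing the role of ExtendedWhitespaceTokenizer.
class WhitespaceTokenizer implements SimpleTokenizer {
    private String[] tokens = new String[0];
    private int pos;

    @Override
    public void reset(Reader input) throws IOException {
        // Read the whole input, then split it on runs of whitespace.
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = input.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        String text = sb.toString().trim();
        tokens = text.isEmpty() ? new String[0] : text.split("\\s+");
        pos = 0;
    }

    @Override
    public String nextToken() {
        return pos < tokens.length ? tokens[pos++] : null;
    }
}

// Decorator: forwards every call to the delegate but doubles each term,
// mirroring buffer.reset(buffer.toString() + buffer.toString()) in the excerpt.
class DuplicatingTokenizer implements SimpleTokenizer {
    private final SimpleTokenizer delegate;

    DuplicatingTokenizer(SimpleTokenizer delegate) {
        this.delegate = delegate;
    }

    @Override
    public void reset(Reader input) throws IOException {
        delegate.reset(input);
    }

    @Override
    public String nextToken() throws IOException {
        String term = delegate.nextToken();
        return term == null ? null : term + term;
    }
}

class DuplicatingTokenizerDemo {
    // Runs the tokenizer over the given text and collects every token.
    static List<String> tokenize(SimpleTokenizer tokenizer, String text) throws IOException {
        tokenizer.reset(new StringReader(text));
        List<String> out = new ArrayList<>();
        for (String t = tokenizer.nextToken(); t != null; t = tokenizer.nextToken()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        SimpleTokenizer t = new DuplicatingTokenizer(new WhitespaceTokenizer());
        System.out.println(tokenize(t, "foo  bar\tbaz")); // prints [foofoo, barbar, bazbaz]
    }
}
```

Because the decorator only depends on the interface, any tokenizer can be wrapped. A factory can therefore hand out the altered behavior without changing the underlying tokenizer, which is what `getTokenizer` does with `ExtendedWhitespaceTokenizer`.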