Example 1 with TokenizedCharSequenceStream

Use of com.twitter.common.text.token.TokenizedCharSequenceStream in the commons project by Twitter.

From the class TokenGroupAttributeImpl, method getTokenGroupStream.

@Override
public TokenGroupStream getTokenGroupStream() {
    // Lazily process the sequence into a list of states; this happens only when getTokenGroupStream is called.
    if ((attributeClasses == null || states.isEmpty()) && seq != null) {
        TokenizedCharSequenceStream ret = new TokenizedCharSequenceStream();
        ret.reset(seq);
        //TODO(alewis) This could probably be lazier. Make a new extension of TokenGroupStream?
        ImmutableList.Builder<State> builder = ImmutableList.builder();
        while (ret.incrementToken()) {
            builder.add(ret.captureState());
        }
        setAttributeSource(ret);
        setStates(builder.build());
    }
    // Lazily initialize tokenGroupStream on first use, then reuse the same instance.
    if (tokenGroupStream == null) {
        tokenGroupStream = new TokenGroupStream(attributeClasses);
    }
    tokenGroupStream.setStates(states);
    return tokenGroupStream;
}
Also used: TokenizedCharSequenceStream (com.twitter.common.text.token.TokenizedCharSequenceStream), TokenGroupStream (com.twitter.common.text.token.TokenGroupStream), ImmutableList (com.google.common.collect.ImmutableList), State (org.apache.lucene.util.AttributeSource.State)
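
The method above defers tokenization until the stream is first requested, then keeps the captured states for later calls. The following is a minimal plain-Java sketch of that lazy-capture pattern. It does not use the Twitter commons or Lucene classes: the class LazyTokenStates, its whitespace tokenization, and the use of String in place of AttributeSource.State are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the lazy-capture pattern; not part of Twitter commons or Lucene.
public class LazyTokenStates {
    private final CharSequence seq;
    // Plays the role of the captured State list; empty until first access.
    private List<String> states = Collections.emptyList();
    private boolean processed = false;

    public LazyTokenStates(CharSequence seq) {
        this.seq = seq;
    }

    // Tokenizes the sequence on the first call only; later calls return the cached list.
    public List<String> getStates() {
        if (!processed && seq != null) {
            List<String> captured = new ArrayList<>();
            // Assumption: whitespace splitting stands in for the real tokenizer.
            for (String token : seq.toString().trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    captured.add(token);
                }
            }
            states = Collections.unmodifiableList(captured);
            processed = true;
        }
        return states;
    }

    // Reports whether the sequence has been tokenized yet.
    public boolean isProcessed() {
        return processed;
    }
}
```

As in the original method, nothing is tokenized at construction time, and a null sequence simply yields an empty list of states. Unlike the original, this sketch tracks an explicit processed flag rather than checking whether the state list is empty.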

Aggregations

ImmutableList (com.google.common.collect.ImmutableList): 1
TokenGroupStream (com.twitter.common.text.token.TokenGroupStream): 1
TokenizedCharSequenceStream (com.twitter.common.text.token.TokenizedCharSequenceStream): 1
State (org.apache.lucene.util.AttributeSource.State): 1