Use of org.deeplearning4j.text.tokenization.tokenizer.DefaultStreamTokenizer in project deeplearning4j by deeplearning4j.
The class DefaultTokenizerFactory, method create.
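@Override
public Tokenizer create(InputStream toTokenize) {
    // Wrap the input stream in a stream-based tokenizer.
    Tokenizer t = new DefaultStreamTokenizer(toTokenize);
    // Attach the pre-processor configured on this factory.
    t.setTokenPreProcessor(tokenPreProcess);
    return t;
}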
Use of org.deeplearning4j.text.tokenization.tokenizer.DefaultStreamTokenizer in project deeplearning4j by deeplearning4j.
The class Windows, method windows.
/**
 * Constructs a list of windows of size windowSize.
 * Note that padding for each window is created as well.
 * @param words the input stream of words to tokenize and construct windows from
 * @param windowSize the window size to generate
 * @return the list of windows for the tokenized input
 */
public static List<Window> windows(InputStream words, int windowSize) {
    Tokenizer tokenizer = new DefaultStreamTokenizer(words);
    List<String> list = new ArrayList<>();
    // Drain the tokenizer into a list before building the windows.
    while (tokenizer.hasMoreTokens()) {
        list.add(tokenizer.nextToken());
    }
    return windows(list, windowSize);
}
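
The method above drains a tokenizer into a list and then builds padded windows from it. The following is a minimal, JDK-only sketch of that general pattern, not the deeplearning4j implementation. The class name WindowSketch, the whitespace tokenization via java.util.Scanner, the padding markers "<s>" and "</s>", and the centered-window layout are all assumptions made for illustration; the real Windows class may tokenize and pad differently.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical stand-in for the tokenize-then-window pattern; not part of deeplearning4j.
class WindowSketch {
    // Assumed padding markers for positions that fall outside the token list.
    static final String PAD_BEGIN = "<s>";
    static final String PAD_END = "</s>";

    // Reads whitespace-separated tokens from the stream (an assumed tokenization rule).
    static List<String> tokens(InputStream in) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name())) {
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    // Builds one window per token, centered on that token and padded at the edges.
    // This sketch requires an odd windowSize so the focus word sits in the middle.
    static List<List<String>> windows(List<String> words, int windowSize) {
        if (windowSize < 1 || windowSize % 2 == 0) {
            throw new IllegalArgumentException("windowSize must be a positive odd number");
        }
        int half = windowSize / 2;
        List<List<String>> result = new ArrayList<>();
        for (int center = 0; center < words.size(); center++) {
            List<String> window = new ArrayList<>();
            for (int i = center - half; i <= center + half; i++) {
                if (i < 0) {
                    window.add(PAD_BEGIN);
                } else if (i >= words.size()) {
                    window.add(PAD_END);
                } else {
                    window.add(words.get(i));
                }
            }
            result.add(window);
        }
        return result;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
                "the quick brown fox".getBytes(StandardCharsets.UTF_8));
        for (List<String> window : windows(tokens(in), 3)) {
            System.out.println(window);
        }
    }
}
```

In this sketch every token gets exactly one window, so the number of windows equals the number of tokens, and the padding keeps all windows the same length.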