Example 1 with DefaultStreamTokenizer

Use of org.deeplearning4j.text.tokenization.tokenizer.DefaultStreamTokenizer in project deeplearning4j by deeplearning4j.

Class DefaultTokenizerFactory, method create.

@Override
public Tokenizer create(InputStream toTokenize) {
    // Wrap the stream in a stream-based tokenizer and attach the factory's pre-processor.
    Tokenizer t = new DefaultStreamTokenizer(toTokenize);
    t.setTokenPreProcessor(tokenPreProcess);
    return t;
}
Also used: DefaultStreamTokenizer (org.deeplearning4j.text.tokenization.tokenizer.DefaultStreamTokenizer), DefaultTokenizer (org.deeplearning4j.text.tokenization.tokenizer.DefaultTokenizer), Tokenizer (org.deeplearning4j.text.tokenization.tokenizer.Tokenizer)
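
The factory pattern in Example 1 (build a tokenizer from an InputStream, attach a pre-processor, return it) can be tried without the deeplearning4j dependency. The following is a minimal sketch using only the JDK. SimpleStreamTokenizer and SimpleTokenizerFactory are hypothetical stand-ins, not deeplearning4j classes, and whitespace splitting via java.util.Scanner is an assumption about tokenizer behaviour, not a description of DefaultStreamTokenizer.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.function.UnaryOperator;

final class StreamTokenizerFactorySketch {

    /** Hypothetical stand-in for a stream tokenizer: splits the stream on whitespace. */
    static final class SimpleStreamTokenizer {
        private final Scanner scanner;
        private UnaryOperator<String> preProcessor; // optional, may stay null

        SimpleStreamTokenizer(InputStream in) {
            this.scanner = new Scanner(in, StandardCharsets.UTF_8.name());
        }

        void setTokenPreProcessor(UnaryOperator<String> preProcessor) {
            this.preProcessor = preProcessor;
        }

        boolean hasMoreTokens() {
            return scanner.hasNext();
        }

        String nextToken() {
            String token = scanner.next();
            // Apply the pre-processor only if one was attached.
            return preProcessor == null ? token : preProcessor.apply(token);
        }
    }

    /** Hypothetical factory with the same create(InputStream) shape as in Example 1. */
    static final class SimpleTokenizerFactory {
        private final UnaryOperator<String> preProcessor;

        SimpleTokenizerFactory(UnaryOperator<String> preProcessor) {
            this.preProcessor = preProcessor;
        }

        SimpleStreamTokenizer create(InputStream toTokenize) {
            SimpleStreamTokenizer t = new SimpleStreamTokenizer(toTokenize);
            t.setTokenPreProcessor(preProcessor);
            return t;
        }
    }

    /** Tokenizes a string through the factory, lower-casing every token. */
    static List<String> tokenize(String text) {
        InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        SimpleStreamTokenizer t = new SimpleTokenizerFactory(String::toLowerCase).create(in);
        List<String> tokens = new ArrayList<>();
        while (t.hasMoreTokens()) {
            tokens.add(t.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello Stream  Tokenizer")); // prints [hello, stream, tokenizer]
    }
}
```

Keeping the pre-processor in the factory means every tokenizer it creates normalizes tokens the same way, which is the design choice the create method above reflects.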

Example 2 with DefaultStreamTokenizer

Use of org.deeplearning4j.text.tokenization.tokenizer.DefaultStreamTokenizer in project deeplearning4j by deeplearning4j.

Class Windows, method windows.

/**
 * Constructs a list of windows of size windowSize.
 * Note that padding for each window is created as well.
 * @param words the input stream whose contents are tokenized and turned into windows
 * @param windowSize the window size to generate
 * @return the list of windows for the tokenized input
 */
public static List<Window> windows(InputStream words, int windowSize) {
    Tokenizer tokenizer = new DefaultStreamTokenizer(words);
    List<String> list = new ArrayList<>();
    // Drain the tokenizer into a list, then delegate to the list-based overload.
    while (tokenizer.hasMoreTokens()) {
        list.add(tokenizer.nextToken());
    }
    return windows(list, windowSize);
}
Also used: DefaultStreamTokenizer (org.deeplearning4j.text.tokenization.tokenizer.DefaultStreamTokenizer), ArrayList (java.util.ArrayList), StringTokenizer (java.util.StringTokenizer), Tokenizer (org.deeplearning4j.text.tokenization.tokenizer.Tokenizer)
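
Example 2 delegates to a list-based windows overload that is not shown here. As an illustration of what "windows with padding" can mean, the following JDK-only sketch builds one window per token, centred on that token. WindowSketch, the "<s>" and "</s>" padding markers, the centred layout, and the odd windowSize requirement are all assumptions made for this sketch, not details confirmed by the snippet above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class WindowSketch {
    // Assumed padding markers; the real library may use different ones.
    static final String PAD_START = "<s>";
    static final String PAD_END = "</s>";

    /** Reads whitespace-separated tokens from the stream (stand-in for a stream tokenizer). */
    static List<String> readTokens(InputStream words) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(words, StandardCharsets.UTF_8.name())) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    /**
     * Builds one window per token, centred on that token.
     * Assumes an odd windowSize; positions outside the token list are padded.
     */
    static List<List<String>> windows(List<String> tokens, int windowSize) {
        int context = (windowSize - 1) / 2; // tokens on each side of the centre
        List<List<String>> result = new ArrayList<>();
        for (int center = 0; center < tokens.size(); center++) {
            List<String> window = new ArrayList<>();
            for (int i = center - context; i <= center + context; i++) {
                if (i < 0) {
                    window.add(PAD_START);
                } else if (i >= tokens.size()) {
                    window.add(PAD_END);
                } else {
                    window.add(tokens.get(i));
                }
            }
            result.add(window);
        }
        return result;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("the quick brown".getBytes(StandardCharsets.UTF_8));
        // prints [[<s>, the, quick], [the, quick, brown], [quick, brown, </s>]]
        System.out.println(windows(readTokens(in), 3));
    }
}
```

Padding keeps every window the same length, so tokens at the start and end of the input still get a full-size context.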

Aggregations

DefaultStreamTokenizer (org.deeplearning4j.text.tokenization.tokenizer.DefaultStreamTokenizer): 2
Tokenizer (org.deeplearning4j.text.tokenization.tokenizer.Tokenizer): 2
ArrayList (java.util.ArrayList): 1
StringTokenizer (java.util.StringTokenizer): 1
DefaultTokenizer (org.deeplearning4j.text.tokenization.tokenizer.DefaultTokenizer): 1