Example 66 with Token

use of org.apache.lucene.analysis.Token in project lucene-solr by apache.

the class TestFlattenGraphFilter method token.

private static Token token(String term, int posInc, int posLength, int startOffset, int endOffset) {
    final Token t = new Token(term, startOffset, endOffset);
    t.setPositionIncrement(posInc);
    t.setPositionLength(posLength);
    return t;
}
Also used : Token(org.apache.lucene.analysis.Token)
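
The helper above sets a position increment and a position length on each token. As a minimal sketch in plain Java (no Lucene on the classpath), the following shows how those two values can be read: the increment is the distance from the previous token's position, and the length is how many positions the token spans. The class `TokenPositions`, the type `Tok`, and the method `spans` are hypothetical names introduced for illustration only; they are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenPositions {
    /** Hypothetical stand-in for a token: term text, position increment, position length. */
    public static final class Tok {
        final String term;
        final int posInc;
        final int posLength;

        public Tok(String term, int posInc, int posLength) {
            this.term = term;
            this.posInc = posInc;
            this.posLength = posLength;
        }
    }

    /** Returns "term[from-to]" per token, where from is the running sum of increments. */
    public static List<String> spans(List<Tok> tokens) {
        List<String> out = new ArrayList<>();
        int pos = -1; // start before the first token, so an increment of 1 lands on position 0
        for (Tok t : tokens) {
            pos += t.posInc;
            out.add(t.term + "[" + pos + "-" + (pos + t.posLength) + "]");
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> graph = List.of(
            new Tok("wifi", 1, 2), // spans two positions
            new Tok("wi", 0, 1),   // increment 0: same start position as "wifi"
            new Tok("fi", 1, 1));
        System.out.println(spans(graph)); // prints [wifi[0-2], wi[0-1], fi[1-2]]
    }
}
```

A token with increment 0 stacks on the previous token's position, which is how alternative paths in a token graph are usually expressed.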

Example 67 with Token

use of org.apache.lucene.analysis.Token in project lucene-solr by apache.

the class PrefixAwareTokenFilter method incrementToken.

@Override
public final boolean incrementToken() throws IOException {
    if (!prefixExhausted) {
        Token nextToken = getNextPrefixInputToken(reusableToken);
        if (nextToken == null) {
            prefixExhausted = true;
        } else {
            previousPrefixToken.reinit(nextToken);
            // Make it a deep copy
            BytesRef p = previousPrefixToken.getPayload();
            if (p != null) {
                previousPrefixToken.setPayload(p.clone());
            }
            setCurrentToken(nextToken);
            return true;
        }
    }
    Token nextToken = getNextSuffixInputToken(reusableToken);
    if (nextToken == null) {
        return false;
    }
    nextToken = updateSuffixToken(nextToken, previousPrefixToken);
    setCurrentToken(nextToken);
    return true;
}
Also used : Token(org.apache.lucene.analysis.Token) BytesRef(org.apache.lucene.util.BytesRef)
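
The control flow of incrementToken above (drain the prefix input, remember the last prefix token, then emit adjusted suffix tokens) can be sketched in plain Java without Lucene. The names `PrefixThenSuffix`, `Tok`, and `Joined` are hypothetical. The body of updateSuffixToken is not shown in the excerpt, so this sketch assumes it shifts each suffix token's offsets by the end offset of the last prefix token; treat that as an assumption rather than a statement of the library's behaviour.

```java
import java.util.Iterator;
import java.util.List;

public class PrefixThenSuffix {
    /** Hypothetical token: term text with start and end character offsets. */
    public static final class Tok {
        final String term;
        final int start;
        final int end;

        public Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "(" + start + "," + end + ")";
        }
    }

    /** Pull-style joiner mirroring the prefixExhausted state machine. */
    public static final class Joined {
        private final Iterator<Tok> prefix;
        private final Iterator<Tok> suffix;
        private boolean prefixExhausted = false;
        private Tok previousPrefix = null;

        public Joined(List<Tok> prefix, List<Tok> suffix) {
            this.prefix = prefix.iterator();
            this.suffix = suffix.iterator();
        }

        /** Returns the next token, or null once both inputs are drained. */
        public Tok next() {
            if (!prefixExhausted) {
                if (prefix.hasNext()) {
                    previousPrefix = prefix.next(); // remember the last prefix token
                    return previousPrefix;
                }
                prefixExhausted = true;
            }
            if (!suffix.hasNext()) {
                return null;
            }
            Tok t = suffix.next();
            // Assumed update step: shift suffix offsets past the end of the prefix.
            int shift = (previousPrefix == null) ? 0 : previousPrefix.end;
            return new Tok(t.term, t.start + shift, t.end + shift);
        }
    }

    public static void main(String[] args) {
        Joined j = new Joined(List.of(new Tok("hello", 0, 5)), List.of(new Tok("world", 0, 5)));
        for (Tok t = j.next(); t != null; t = j.next()) {
            System.out.println(t); // prints hello(0,5) then world(5,10)
        }
    }
}
```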

Example 68 with Token

use of org.apache.lucene.analysis.Token in project lucene-solr by apache.

the class ShingleFilterTest method testTwoTrailingHolesTriShingle.

public void testTwoTrailingHolesTriShingle() throws IOException {
    // Analyzing "purple wizard of the", where "of" and "the" are removed as
    // stopwords, leaving two trailing holes:
    Token[] inputTokens = new Token[] { createToken("purple", 0, 6), createToken("wizard", 7, 13) };
    ShingleFilter filter = new ShingleFilter(new CannedTokenStream(2, 20, inputTokens), 2, 3);
    assertTokenStreamContents(filter, new String[] { "purple", "purple wizard", "purple wizard _", "wizard", "wizard _", "wizard _ _" }, new int[] { 0, 0, 0, 7, 7, 7 }, new int[] { 6, 13, 20, 13, 20, 20 }, new int[] { 1, 0, 0, 1, 0, 0 }, 20);
}
Also used : Token(org.apache.lucene.analysis.Token) CannedTokenStream(org.apache.lucene.analysis.CannedTokenStream)

Example 69 with Token

use of org.apache.lucene.analysis.Token in project lucene-solr by apache.

the class ShingleFilterTest method testTwoTrailingHoles.

public void testTwoTrailingHoles() throws IOException {
    // Analyzing "purple wizard of the", where "of" and "the" are removed as
    // stopwords, leaving two trailing holes:
    Token[] inputTokens = new Token[] { createToken("purple", 0, 6), createToken("wizard", 7, 13) };
    ShingleFilter filter = new ShingleFilter(new CannedTokenStream(2, 20, inputTokens), 2, 2);
    assertTokenStreamContents(filter, new String[] { "purple", "purple wizard", "wizard", "wizard _" }, new int[] { 0, 0, 7, 7 }, new int[] { 6, 13, 13, 20 }, new int[] { 1, 0, 1, 0 }, 20);
}
Also used : Token(org.apache.lucene.analysis.Token) CannedTokenStream(org.apache.lucene.analysis.CannedTokenStream)

Example 70 with Token

use of org.apache.lucene.analysis.Token in project lucene-solr by apache.

the class ShingleFilterTest method testTrailingHole2.

public void testTrailingHole2() throws IOException {
    // Analyzing "purple wizard of", where "of" is removed as a
    // stopword, leaving a trailing hole:
    Token[] inputTokens = new Token[] { createToken("purple", 0, 6), createToken("wizard", 7, 13) };
    ShingleFilter filter = new ShingleFilter(new CannedTokenStream(1, 16, inputTokens), 2, 2);
    assertTokenStreamContents(filter, new String[] { "purple", "purple wizard", "wizard", "wizard _" }, new int[] { 0, 0, 7, 7 }, new int[] { 6, 13, 13, 16 }, new int[] { 1, 0, 1, 0 }, 16);
}
Also used : Token(org.apache.lucene.analysis.Token) CannedTokenStream(org.apache.lucene.analysis.CannedTokenStream)
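
The expected terms in the three ShingleFilterTest cases above follow a simple pattern: each real token yields a unigram plus shingles of the configured sizes, and removed trailing stopwords appear as "_" fillers. The following is a minimal plain-Java sketch of that pattern only. `ShingleSketch` and `shingles` are hypothetical names, this is not how ShingleFilter is implemented, and it does not model offsets, position increments, or holes in the middle of the input. It also assumes a minimum shingle size of at least 2.

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleSketch {
    /**
     * Builds unigrams and shingles of minSize..maxSize words, padding the end of
     * the input with "_" for each trailing hole. Assumes minSize >= 2.
     */
    public static List<String> shingles(List<String> tokens, int trailingHoles,
                                        int minSize, int maxSize) {
        // Real tokens followed by one filler slot per removed trailing stopword.
        List<String> slots = new ArrayList<>(tokens);
        for (int i = 0; i < trailingHoles; i++) {
            slots.add("_");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) { // shingles start only at real tokens
            out.add(tokens.get(i));               // the unigram itself
            for (int n = minSize; n <= maxSize && i + n <= slots.size(); n++) {
                out.add(String.join(" ", slots.subList(i, i + n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("purple", "wizard");
        // Two trailing holes, shingle sizes 2..3 (as in testTwoTrailingHolesTriShingle):
        System.out.println(shingles(input, 2, 2, 3));
        // prints [purple, purple wizard, purple wizard _, wizard, wizard _, wizard _ _]
        // One trailing hole, shingle size 2 (as in testTrailingHole2):
        System.out.println(shingles(input, 1, 2, 2));
        // prints [purple, purple wizard, wizard, wizard _]
    }
}
```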

Aggregations

Token (org.apache.lucene.analysis.Token) 100
CannedTokenStream (org.apache.lucene.analysis.CannedTokenStream) 39
TokenStream (org.apache.lucene.analysis.TokenStream) 31
Directory (org.apache.lucene.store.Directory) 24
Test (org.junit.Test) 23
Document (org.apache.lucene.document.Document) 19
TextField (org.apache.lucene.document.TextField) 19
BytesRef (org.apache.lucene.util.BytesRef) 16
NamedList (org.apache.solr.common.util.NamedList) 16
StringReader (java.io.StringReader) 15
CharTermAttribute (org.apache.lucene.analysis.tokenattributes.CharTermAttribute) 15
Analyzer (org.apache.lucene.analysis.Analyzer) 14
ArrayList (java.util.ArrayList) 13
Map (java.util.Map) 13
Field (org.apache.lucene.document.Field) 13
FieldType (org.apache.lucene.document.FieldType) 11
IndexReader (org.apache.lucene.index.IndexReader) 11
MockTokenizer (org.apache.lucene.analysis.MockTokenizer) 10
Tokenizer (org.apache.lucene.analysis.Tokenizer) 9
Date (java.util.Date) 8