
Example 36 with SimpleLinguistics

Use of com.yahoo.language.simple.SimpleLinguistics in project vespa by vespa-engine.

From the class TokenizerTestCase, method testSpecialTokenConfigurationDefault.

@Test
public void testSpecialTokenConfigurationDefault() {
    // Load the special-token configuration from the test resources.
    String tokenFile = "file:src/test/java/com/yahoo/prelude/query/parser/test/specialtokens.cfg";
    SpecialTokenRegistry r = new SpecialTokenRegistry(tokenFile);
    // The registry is expected to expose two named token sets, "default" and "other".
    assertEquals("Special tokens configured", 6, r.getSpecialTokens("default").size());
    assertEquals("Special tokens configured", 4, r.getSpecialTokens("other").size());
    Tokenizer tokenizer = new Tokenizer(new SimpleLinguistics());
    tokenizer.setSpecialTokens(r.getSpecialTokens("default"));
    // The assertions below expect "with space", "c++", "...." and "b.s.d." to come out as single WORD tokens.
    List<?> tokens = tokenizer.tokenize("with space, c++ or .... know, not b.s.d.");
    assertEquals(new Token(WORD, "with space"), tokens.get(0));
    assertEquals(new Token(COMMA, ","), tokens.get(1));
    assertEquals(new Token(SPACE, " "), tokens.get(2));
    assertEquals(new Token(WORD, "c++"), tokens.get(3));
    assertEquals(new Token(SPACE, " "), tokens.get(4));
    assertEquals(new Token(WORD, "or"), tokens.get(5));
    assertEquals(new Token(SPACE, " "), tokens.get(6));
    assertEquals(new Token(WORD, "...."), tokens.get(7));
    assertEquals(new Token(SPACE, " "), tokens.get(8));
    assertEquals(new Token(WORD, "know"), tokens.get(9));
    assertEquals(new Token(COMMA, ","), tokens.get(10));
    assertEquals(new Token(SPACE, " "), tokens.get(11));
    assertEquals(new Token(WORD, "not"), tokens.get(12));
    assertEquals(new Token(SPACE, " "), tokens.get(13));
    assertEquals(new Token(WORD, "b.s.d."), tokens.get(14));
}
Also used: SimpleLinguistics (com.yahoo.language.simple.SimpleLinguistics), SpecialTokenRegistry (com.yahoo.prelude.query.parser.SpecialTokenRegistry), Token (com.yahoo.prelude.query.parser.Token), Tokenizer (com.yahoo.prelude.query.parser.Tokenizer), Test (org.junit.Test)

Example 37 with SimpleLinguistics

Use of com.yahoo.language.simple.SimpleLinguistics in project vespa by vespa-engine.

From the class TokenizerTestCase, method testExactMatchTokenizationTerminatorTerminatesQuery.

@Test
public void testExactMatchTokenizationTerminatorTerminatesQuery() {
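    // testexact1: exact matching with no explicit terminator string.
    Index index1 = new Index("testexact1");
    index1.setExact(true, null);
    // testexact2: exact matching with the terminator string "()/aa*::*&".
    Index index2 = new Index("testexact2");
    index2.setExact(true, "()/aa*::*&");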
    IndexFacts facts = new IndexFacts();
    facts.addIndex("testsd", index1);
    facts.addIndex("testsd", index2);
    Tokenizer tokenizer = new Tokenizer(new SimpleLinguistics());
    IndexFacts.Session session = facts.newSession(Collections.emptySet(), Collections.emptySet());
    List<?> tokens = tokenizer.tokenize("normal a:b (normal testexact1:/,%#%&+-+ ) testexact2:ho_/&%&/()/aa*::*&", session);
    assertEquals(new Token(WORD, "normal"), tokens.get(0));
    assertEquals(new Token(SPACE, " "), tokens.get(1));
    assertEquals(new Token(WORD, "a"), tokens.get(2));
    assertEquals(new Token(COLON, ":"), tokens.get(3));
    assertEquals(new Token(WORD, "b"), tokens.get(4));
    assertEquals(new Token(SPACE, " "), tokens.get(5));
    assertEquals(new Token(LBRACE, "("), tokens.get(6));
    assertEquals(new Token(WORD, "normal"), tokens.get(7));
    assertEquals(new Token(SPACE, " "), tokens.get(8));
    assertEquals(new Token(WORD, "testexact1"), tokens.get(9));
    assertEquals(new Token(COLON, ":"), tokens.get(10));
    assertEquals(new Token(WORD, "/,%#%&+-+"), tokens.get(11));
    assertEquals(new Token(SPACE, " "), tokens.get(12));
    assertEquals(new Token(RBRACE, ")"), tokens.get(13));
    assertEquals(new Token(SPACE, " "), tokens.get(14));
    assertEquals(new Token(WORD, "testexact2"), tokens.get(15));
    assertEquals(new Token(COLON, ":"), tokens.get(16));
    assertEquals(new Token(WORD, "ho_/&%&/"), tokens.get(17));
    assertTrue(((Token) tokens.get(17)).isSpecial());
}
Also used: SimpleLinguistics (com.yahoo.language.simple.SimpleLinguistics), IndexFacts (com.yahoo.prelude.IndexFacts), Index (com.yahoo.prelude.Index), Token (com.yahoo.prelude.query.parser.Token), Tokenizer (com.yahoo.prelude.query.parser.Tokenizer), Test (org.junit.Test)

Example 38 with SimpleLinguistics

Use of com.yahoo.language.simple.SimpleLinguistics in project vespa by vespa-engine.

From the class TokenizerTestCase, method testSpecialTokenConfigurationMissing.

@Test
public void testSpecialTokenConfigurationMissing() {
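    // This path is deliberately bogus: the test covers a missing configuration file.
    String tokenFile = "file:source/bogus/specialtokens.cfg";
    SpecialTokenRegistry r = new SpecialTokenRegistry(tokenFile);
    Tokenizer tokenizer = new Tokenizer(new SimpleLinguistics());
    tokenizer.setSpecialTokens(r.getSpecialTokens("other"));
    // With no special tokens available, "c++" is expected to split into "c", "+", "+".
    List<?> tokens = tokenizer.tokenize("c++");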
    assertEquals(new Token(WORD, "c"), tokens.get(0));
    assertEquals(new Token(PLUS, "+"), tokens.get(1));
    assertEquals(new Token(PLUS, "+"), tokens.get(2));
}
Also used: SimpleLinguistics (com.yahoo.language.simple.SimpleLinguistics), SpecialTokenRegistry (com.yahoo.prelude.query.parser.SpecialTokenRegistry), Token (com.yahoo.prelude.query.parser.Token), Tokenizer (com.yahoo.prelude.query.parser.Tokenizer), Test (org.junit.Test)

Example 39 with SimpleLinguistics

Use of com.yahoo.language.simple.SimpleLinguistics in project vespa by vespa-engine.

From the class TokenizerTestCase, method testSpecialTokenNonMatch.

@Test
public void testSpecialTokenNonMatch() {
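    Tokenizer tokenizer = new Tokenizer(new SimpleLinguistics());
    // createSpecialTokens() is a helper that is not shown in this excerpt.
    tokenizer.setSpecialTokens(createSpecialTokens());
    // "i/ooo" and "ap.net" are near misses: the first must not match "i/o", the second splits into "ap" and ".net".
    List<?> tokens = tokenizer.tokenize("c++ c+ aS/400 i/o .net i/ooo ap.net");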
    assertEquals(new Token(WORD, "c++"), tokens.get(0));
    assertEquals(new Token(SPACE, " "), tokens.get(1));
    assertEquals(new Token(WORD, "c+"), tokens.get(2));
    assertEquals(new Token(SPACE, " "), tokens.get(3));
    assertEquals(new Token(WORD, "as/400"), tokens.get(4));
    assertEquals(new Token(SPACE, " "), tokens.get(5));
    assertEquals(new Token(WORD, "i/o"), tokens.get(6));
    assertEquals(new Token(SPACE, " "), tokens.get(7));
    assertEquals(new Token(WORD, ".net"), tokens.get(8));
    assertEquals(new Token(SPACE, " "), tokens.get(9));
    assertEquals(new Token(WORD, "i"), tokens.get(10));
    assertEquals(new Token(NOISE, "<NOISE>"), tokens.get(11));
    assertEquals(new Token(WORD, "ooo"), tokens.get(12));
    assertEquals(new Token(SPACE, " "), tokens.get(13));
    assertEquals(new Token(WORD, "ap"), tokens.get(14));
    assertEquals(new Token(WORD, ".net"), tokens.get(15));
}
Also used: SimpleLinguistics (com.yahoo.language.simple.SimpleLinguistics), Token (com.yahoo.prelude.query.parser.Token), Tokenizer (com.yahoo.prelude.query.parser.Tokenizer), Test (org.junit.Test)
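
The special-token behaviour exercised by the tests above can be approximated in plain Java. This is only a rough sketch and not Vespa's Tokenizer: the class SpecialTokenSketch, its methods, and the matching rule (longest case-insensitive match that is not followed by a letter or digit) are assumptions chosen to be consistent with the expected tokens in testSpecialTokenNonMatch. It returns plain strings rather than Token objects and emits "/" where the real tokenizer reports a NOISE token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Vespa.
final class SpecialTokenSketch {

    // Returns the longest special token matching at pos (ignoring case), or null.
    // Assumption: a match only counts if it is not followed by a letter or digit.
    // Special tokens are assumed to be non-empty.
    static String matchSpecial(String input, int pos, List<String> specials) {
        String best = null;
        for (String s : specials) {
            int end = pos + s.length();
            boolean matches = input.regionMatches(true, pos, s, 0, s.length());
            boolean atBoundary = end >= input.length()
                    || !Character.isLetterOrDigit(input.charAt(end));
            if (matches && atBoundary && (best == null || s.length() > best.length())) {
                best = s;
            }
        }
        return best;
    }

    // Splits input into special tokens, runs of letters/digits, and single other characters.
    static List<String> tokenize(String input, List<String> specials) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            String special = matchSpecial(input, i, specials);
            if (special != null) {
                tokens.add(special); // emit the configured form of the token
                i += special.length();
            } else if (Character.isLetterOrDigit(input.charAt(i))) {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                tokens.add(String.valueOf(input.charAt(i)));
                i++;
            }
        }
        return tokens;
    }
}
```

Called with the hypothetical token list "c++", "c+", "as/400", "i/o", ".net", the sketch keeps "c++" and "i/o" whole but splits "i/ooo" into "i", "/", "ooo". With an empty token list, "c++" becomes "c", "+", "+", as in testSpecialTokenConfigurationMissing.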

Example 40 with SimpleLinguistics

Use of com.yahoo.language.simple.SimpleLinguistics in project vespa by vespa-engine.

From the class TokenizerTestCase, method testExactMatchTokenization.

@Test
public void testExactMatchTokenization() {
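    // testexact1: exact matching with no explicit terminator string.
    Index index1 = new Index("testexact1");
    index1.setExact(true, null);
    // testexact2: exact matching with the terminator string "()/aa*::*&".
    Index index2 = new Index("testexact2");
    index2.setExact(true, "()/aa*::*&");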
    IndexFacts facts = new IndexFacts();
    facts.addIndex("testsd", index1);
    facts.addIndex("testsd", index2);
    IndexFacts.Session session = facts.newSession(Collections.emptySet(), Collections.emptySet());
    Tokenizer tokenizer = new Tokenizer(new SimpleLinguistics());
    List<?> tokens = tokenizer.tokenize("normal a:b (normal testexact1:/,%#%&+-+ ) testexact2:ho_/&%&/()/aa*::*& b:c", "default", session);
    // tokenizer.print();
    assertEquals(new Token(WORD, "normal"), tokens.get(0));
    assertEquals(new Token(SPACE, " "), tokens.get(1));
    assertEquals(new Token(WORD, "a"), tokens.get(2));
    assertEquals(new Token(COLON, ":"), tokens.get(3));
    assertEquals(new Token(WORD, "b"), tokens.get(4));
    assertEquals(new Token(SPACE, " "), tokens.get(5));
    assertEquals(new Token(LBRACE, "("), tokens.get(6));
    assertEquals(new Token(WORD, "normal"), tokens.get(7));
    assertEquals(new Token(SPACE, " "), tokens.get(8));
    assertEquals(new Token(WORD, "testexact1"), tokens.get(9));
    assertEquals(new Token(COLON, ":"), tokens.get(10));
    assertEquals(new Token(WORD, "/,%#%&+-+"), tokens.get(11));
    assertEquals(new Token(SPACE, " "), tokens.get(12));
    assertEquals(new Token(RBRACE, ")"), tokens.get(13));
    assertEquals(new Token(SPACE, " "), tokens.get(14));
    assertEquals(new Token(WORD, "testexact2"), tokens.get(15));
    assertEquals(new Token(COLON, ":"), tokens.get(16));
    assertEquals(new Token(WORD, "ho_/&%&/"), tokens.get(17));
    assertEquals(new Token(SPACE, " "), tokens.get(18));
    assertEquals(new Token(WORD, "b"), tokens.get(19));
    assertEquals(new Token(COLON, ":"), tokens.get(20));
    assertEquals(new Token(WORD, "c"), tokens.get(21));
    assertTrue(((Token) tokens.get(11)).isSpecial());
    assertFalse(((Token) tokens.get(15)).isSpecial());
    assertTrue(((Token) tokens.get(17)).isSpecial());
}
Also used: SimpleLinguistics (com.yahoo.language.simple.SimpleLinguistics), IndexFacts (com.yahoo.prelude.IndexFacts), Index (com.yahoo.prelude.Index), Token (com.yahoo.prelude.query.parser.Token), Tokenizer (com.yahoo.prelude.query.parser.Tokenizer), Test (org.junit.Test)
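
The way the two exact-match indexes delimit their terms in the test above can be sketched without any Vespa classes. The class ExactMatchSketch and both of its methods are hypothetical, and the rule it applies is an assumption inferred from the expected tokens: with no terminator the term runs to the next whitespace, and with a terminator it runs to the first occurrence of that string (or to the end of the query if the terminator never appears).

```java
// Hypothetical illustration only; not part of Vespa.
final class ExactMatchSketch {

    // Reads an exact-match term that begins at index start of the query.
    static String readExactTerm(String query, int start, String terminator) {
        int end;
        if (terminator == null) {
            // Assumed default: the term ends at the next whitespace character.
            end = start;
            while (end < query.length() && !Character.isWhitespace(query.charAt(end))) {
                end++;
            }
        } else {
            // Assumed rule: the term ends where the terminator string begins.
            end = query.indexOf(terminator, start);
            if (end < 0) {
                end = query.length();
            }
        }
        return query.substring(start, end);
    }

    // Finds the first "indexName:" in the query and reads the exact term after it.
    // Returns null if the index name does not occur.
    static String exactTermFor(String query, String indexName, String terminator) {
        String prefix = indexName + ":";
        int at = query.indexOf(prefix);
        if (at < 0) {
            return null;
        }
        return readExactTerm(query, at + prefix.length(), terminator);
    }
}
```

Applied to the query string from the test, exactTermFor yields "/,%#%&+-+" for testexact1 with a null terminator and "ho_/&%&/" for testexact2 with the terminator "()/aa*::*&", which are the two tokens the test asserts are special.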

Aggregations

SimpleLinguistics (com.yahoo.language.simple.SimpleLinguistics): 42
Test (org.junit.Test): 37
Token (com.yahoo.prelude.query.parser.Token): 17
Tokenizer (com.yahoo.prelude.query.parser.Tokenizer): 17
Linguistics (com.yahoo.language.Linguistics): 10
Index (com.yahoo.prelude.Index): 7
IndexFacts (com.yahoo.prelude.IndexFacts): 7
StringFieldValue (com.yahoo.document.datatypes.StringFieldValue): 6
AnnotatorConfig (com.yahoo.vespa.indexinglanguage.linguistics.AnnotatorConfig): 5
SpecialTokenRegistry (com.yahoo.prelude.query.parser.SpecialTokenRegistry): 3
Query (com.yahoo.search.Query): 3
Execution (com.yahoo.search.searchchain.Execution): 3
SimpleTestAdapter (com.yahoo.vespa.indexinglanguage.SimpleTestAdapter): 3
InputExpression (com.yahoo.vespa.indexinglanguage.expressions.InputExpression): 3
Pair (com.yahoo.collections.Pair): 2
FieldValue (com.yahoo.document.datatypes.FieldValue): 2
IntegerFieldValue (com.yahoo.document.datatypes.IntegerFieldValue): 2
RendererRegistry (com.yahoo.search.rendering.RendererRegistry): 2
ArithmeticExpression (com.yahoo.vespa.indexinglanguage.expressions.ArithmeticExpression): 2
AttributeExpression (com.yahoo.vespa.indexinglanguage.expressions.AttributeExpression): 2