Example 1 with Linguistics

use of com.yahoo.language.Linguistics in project vespa by vespa-engine.

the class LinguisticsAnnotatorTestCase method newLinguistics.

private static Linguistics newLinguistics(List<? extends Token> tokens, Map<String, String> replacementTerms) {
    Linguistics linguistics = Mockito.mock(Linguistics.class);
    Mockito.when(linguistics.getTokenizer()).thenReturn(new MyTokenizer(tokens, replacementTerms));
    return linguistics;
}
Also used : Linguistics(com.yahoo.language.Linguistics) SimpleLinguistics(com.yahoo.language.simple.SimpleLinguistics)

Example 2 with Linguistics

use of com.yahoo.language.Linguistics in project vespa by vespa-engine.

the class LinguisticsAnnotatorTestCase method requireThatExistingAnnotationsAreKept.

@Test
public void requireThatExistingAnnotationsAreKept() {
    SpanTree spanTree = new SpanTree(SpanTrees.LINGUISTICS);
    spanTree.spanList().span(0, 3).annotate(new Annotation(AnnotationTypes.TERM, new StringFieldValue("baz")));
    StringFieldValue val = new StringFieldValue("foo");
    val.setSpanTree(spanTree);
    Linguistics linguistics = newLinguistics(Arrays.asList(newToken("foo", "bar", TokenType.ALPHABETIC, false)), Collections.<String, String>emptyMap());
    new LinguisticsAnnotator(linguistics, CONFIG).annotate(val);
    assertTrue(new LinguisticsAnnotator(linguistics, CONFIG).annotate(val));
    assertEquals(spanTree, val.getSpanTree(SpanTrees.LINGUISTICS));
}
Also used : StringFieldValue(com.yahoo.document.datatypes.StringFieldValue) Linguistics(com.yahoo.language.Linguistics) SimpleLinguistics(com.yahoo.language.simple.SimpleLinguistics) Annotation(com.yahoo.document.annotation.Annotation) SpanTree(com.yahoo.document.annotation.SpanTree) Test(org.junit.Test)

Example 3 with Linguistics

use of com.yahoo.language.Linguistics in project vespa by vespa-engine.

the class NGramTestCase method requireThatAccessorsWork.

@Test
public void requireThatAccessorsWork() {
    Linguistics linguistics = new SimpleLinguistics();
    NGramExpression exp = new NGramExpression(linguistics, 69);
    assertSame(linguistics, exp.getLinguistics());
    assertEquals(69, exp.getGramSize());
}
Also used : SimpleLinguistics(com.yahoo.language.simple.SimpleLinguistics) Linguistics(com.yahoo.language.Linguistics) Test(org.junit.Test)

Example 4 with Linguistics

use of com.yahoo.language.Linguistics in project vespa by vespa-engine.

the class Model method getParsingLanguage.

/**
 * Gets the language to use for parsing. If the language is explicitly set in the model, that language is returned.
 * Otherwise, if the encoding implies a language, that language is returned.
 * Otherwise, if a query tree has already been produced and any node in it specifies a language, the language of
 * the first such node encountered in a depth-first, left-to-right search is returned.
 * Otherwise, the language is guessed from the given language detection text.
 * If this does not yield an actual language, English is returned as the default.
 *
 * @return the language determined, never null
 */
// TODO: We can support multiple languages per query by changing searchers which call this
// to look up the query to use at each point from item.getLanguage
// with this as fallback for query branches where no parent item specifies language
public Language getParsingLanguage(String languageDetectionText) {
    Language language = getLanguage();
    if (language != null)
        return language;
    language = Language.fromEncoding(encoding);
    if (language != Language.UNKNOWN)
        return language;
    if (queryTree != null)
        language = languageBelow(queryTree);
    if (language != Language.UNKNOWN)
        return language;
    Linguistics linguistics = execution.context().getLinguistics();
    if (linguistics != null)
        // TODO: Set language if detected
        language = linguistics.getDetector().detect(languageDetectionText, null).getLanguage();
    if (language != Language.UNKNOWN)
        return language;
    return Language.ENGLISH;
}
Also used : Language(com.yahoo.language.Language) Linguistics(com.yahoo.language.Linguistics)
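
The fallback order in getParsingLanguage (explicit setting, then encoding, then query tree, then detection, then English) can be sketched in isolation. The following is a minimal sketch under stated assumptions, not the Vespa implementation: the ParsingLanguageSketch class, the Lang enum, and the firstKnown helper are hypothetical stand-ins, and each supplier stands for one of the sources consulted above.

```java
import java.util.List;
import java.util.function.Supplier;

public class ParsingLanguageSketch {

    // Hypothetical stand-in for com.yahoo.language.Language; not the Vespa enum.
    public enum Lang { UNKNOWN, ENGLISH, FRENCH, JAPANESE }

    // Returns the first source that yields a known language, else the fallback.
    // Sources are evaluated lazily and in order, so a costly step such as
    // language detection only runs if every earlier source came up empty.
    public static Lang firstKnown(List<Supplier<Lang>> sources, Lang fallback) {
        for (Supplier<Lang> source : sources) {
            Lang candidate = source.get();
            if (candidate != null && candidate != Lang.UNKNOWN)
                return candidate;
        }
        return fallback;
    }

    public static void main(String[] args) {
        List<Supplier<Lang>> sources = List.<Supplier<Lang>>of(
                () -> null,            // assumption: no language set explicitly
                () -> Lang.UNKNOWN,    // assumption: encoding gives no hint
                () -> Lang.FRENCH,     // assumption: a query tree node specifies French
                () -> Lang.JAPANESE);  // detection; not reached in this run
        System.out.println(firstKnown(sources, Lang.ENGLISH)); // prints FRENCH
    }
}
```

Passing suppliers rather than precomputed values keeps the early-return behavior of the original method: later sources are never consulted once one source yields a known language.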

Example 5 with Linguistics

use of com.yahoo.language.Linguistics in project vespa by vespa-engine.

the class QueryTestCase method testSimpleFunctionality.

@Test
public void testSimpleFunctionality() {
    Query q = new Query(QueryTestCase.httpEncode("/sdfsd.html?query=this is a simple query&aParameter"));
    assertEquals("this is a simple query", q.getModel().getQueryString());
    assertNotNull(q.getModel().getQueryTree());
    assertNull(q.getModel().getDefaultIndex());
    assertEquals("", q.properties().get("aParameter"));
    assertNull(q.properties().get("notSetParameter"));
    Query query = q;
    String body = "a bb. ccc??!";
    Linguistics linguistics = new SimpleLinguistics();
    AndItem and = new AndItem();
    for (Token token : linguistics.getTokenizer().tokenize(body, Language.ENGLISH, StemMode.SHORTEST, true)) {
        if (token.isIndexable())
            and.addItem(new WordItem(token.getTokenString(), "body"));
    }
    query.getModel().getQueryTree().setRoot(and);
    System.out.println(query);
}
Also used : SimpleLinguistics(com.yahoo.language.simple.SimpleLinguistics) Query(com.yahoo.search.Query) AndItem(com.yahoo.prelude.query.AndItem) Linguistics(com.yahoo.language.Linguistics) Token(com.yahoo.language.process.Token) CoreMatchers.containsString(org.hamcrest.CoreMatchers.containsString) WordItem(com.yahoo.prelude.query.WordItem) Test(org.junit.Test)
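
The loop in testSimpleFunctionality keeps only tokens for which isIndexable() is true before adding them to the AndItem. The filtering idea can be illustrated without the Vespa classes. This is a rough sketch only: the IndexableTokenSketch class, its Tok type, and the letter-or-digit rule are hypothetical simplifications and do not reproduce the behavior of SimpleLinguistics or any real Vespa tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexableTokenSketch {

    // Hypothetical stand-in for a token: its text and whether it should be indexed.
    public static final class Tok {
        public final String text;
        public final boolean indexable;

        public Tok(String text, boolean indexable) {
            this.text = text;
            this.indexable = indexable;
        }
    }

    // Splits the input into maximal runs of letters/digits (marked indexable)
    // and single other characters such as spaces and punctuation (not indexable).
    // Assumption: this rule is a simplification chosen for illustration.
    public static List<Tok> tokenize(String input) {
        List<Tok> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            if (Character.isLetterOrDigit(input.charAt(i))) {
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i)))
                    i++;
                tokens.add(new Tok(input.substring(start, i), true));
            } else {
                i++;
                tokens.add(new Tok(input.substring(start, i), false));
            }
        }
        return tokens;
    }

    // Keeps the text of indexable tokens only, preserving their order.
    public static List<String> indexableWords(String input) {
        List<String> words = new ArrayList<>();
        for (Tok token : tokenize(input)) {
            if (token.indexable)
                words.add(token.text);
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(indexableWords("a bb. ccc??!")); // prints [a, bb, ccc]
    }
}
```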

Aggregations

Linguistics (com.yahoo.language.Linguistics): 12
SimpleLinguistics (com.yahoo.language.simple.SimpleLinguistics): 11
Test (org.junit.Test): 9
Annotation (com.yahoo.document.annotation.Annotation): 2
SpanTree (com.yahoo.document.annotation.SpanTree): 2
StringFieldValue (com.yahoo.document.datatypes.StringFieldValue): 2
Query (com.yahoo.search.Query): 2
Language (com.yahoo.language.Language): 1
Token (com.yahoo.language.process.Token): 1
AndItem (com.yahoo.prelude.query.AndItem): 1
WordItem (com.yahoo.prelude.query.WordItem): 1
AnnotatorConfig (com.yahoo.vespa.indexinglanguage.linguistics.AnnotatorConfig): 1
CoreMatchers.containsString (org.hamcrest.CoreMatchers.containsString): 1