
Example 1 with Tuple

Use of io.openk9.search.api.query.parser.Tuple in project openk9 by smclab.

From the class Grammar, method applyAnnotators.

private void applyAnnotators(Map<Tuple, List<Parse>> chart, String[] tokens, int i, int j, long tenantId, Set<String> context) {
    // Narrow to the sub-span tokens[i..j); the chart is keyed by that span.
    tokens = Arrays.stream(tokens, i, j).toArray(String[]::new);
    Tuple<Integer> chartKey = Tuple.of(i, j);
    for (Annotator annotator : annotators) {
        for (CategorySemantics categorySemantics : annotator.annotate(tenantId, context, tokens)) {
            String category = categorySemantics.getCategory();
            Map<String, Object> semantics = categorySemantics.getSemantics();
            // Turn each annotation into a rule and record its parse under
            // the (i, j) span.
            Rule rule = new Rule(category, tokens, Semantic.of(chartKey, semantics));
            chart.computeIfAbsent(chartKey, (k) -> new ArrayList<>()).add(Parse.of(rule, chartKey, tokens));
        }
    }
}
Also used: IntStream(java.util.stream.IntStream), ReactorStopWatch(io.openk9.common.api.reactor.util.ReactorStopWatch), Arrays(java.util.Arrays), Logger(org.slf4j.Logger), Utils(io.openk9.search.query.internal.query.parser.util.Utils), LoggerFactory(org.slf4j.LoggerFactory), Set(java.util.Set), Mono(reactor.core.publisher.Mono), HashMap(java.util.HashMap), Annotator(io.openk9.search.api.query.parser.Annotator), Function(java.util.function.Function), Collectors(java.util.stream.Collectors), CategorySemantics(io.openk9.search.api.query.parser.CategorySemantics), ArrayList(java.util.ArrayList), HashSet(java.util.HashSet), Flux(reactor.core.publisher.Flux), List(java.util.List), Stream(java.util.stream.Stream), Itertools(io.openk9.search.query.internal.query.parser.util.Itertools), Map(java.util.Map), Schedulers(reactor.core.scheduler.Schedulers), Tuple(io.openk9.search.api.query.parser.Tuple)
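The method above populates a parse chart keyed by token spans: `Tuple.of(i, j)` identifies the sub-range `tokens[i..j)`, and all parses for the same span accumulate in one list. A minimal, self-contained sketch of that keying scheme (using a hypothetical `Span` record in place of openk9's `Tuple`, and plain strings in place of `Parse`):

```java
import java.util.*;

// Minimal sketch (not the openk9 API): a chart keyed by token spans, as in
// applyAnnotators. The key (i, j) identifies tokens[i..j); entries for the
// same span accumulate via computeIfAbsent, exactly like the original.
public class ChartSketch {
    // Hypothetical stand-in for openk9's Tuple.of(i, j).
    record Span(int i, int j) {}

    public static Map<Span, List<String>> buildChart(String[] tokens) {
        Map<Span, List<String>> chart = new HashMap<>();
        // Enumerate every contiguous sub-span, as the CKY-style driver
        // that calls applyAnnotators for each (i, j) would.
        for (int i = 0; i < tokens.length; i++) {
            for (int j = i + 1; j <= tokens.length; j++) {
                String[] slice = Arrays.copyOfRange(tokens, i, j);
                chart.computeIfAbsent(new Span(i, j), k -> new ArrayList<>())
                     .add(String.join(" ", slice));
            }
        }
        return chart;
    }
}
```

For an n-token input this produces n·(n+1)/2 span keys, which is why the chart is a map rather than a dense array: only spans that actually receive parses pay for storage.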

Example 2 with Tuple

Use of io.openk9.search.api.query.parser.Tuple in project openk9 by smclab.

From the class Grammar, method addRule.

private void addRule(Rule rule) {
    if (rule.containsOptionals()) {
        // Expand optional RHS elements into separate rules first.
        addRuleContainingOptional(rule);
    } else if (rule.isLexical()) {
        // All-terminal RHS: index the rule by its exact token tuple.
        Tuple rhs = rule.getRhsTuple();
        lexicalRules.computeIfAbsent(rhs, (k) -> new ArrayList<>()).add(rule);
    } else if (rule.isUnary()) {
        Tuple rhs = rule.getRhsTuple();
        unaryRules.computeIfAbsent(rhs, (k) -> new ArrayList<>()).add(rule);
    } else if (rule.isBinary()) {
        Tuple rhs = rule.getRhsTuple();
        binaryRules.computeIfAbsent(rhs, (k) -> new ArrayList<>()).add(rule);
    } else if (rule.isCat()) {
        // Longer all-non-terminal RHS: handled by binarization.
        addNAryRule(rule);
    } else {
        throw new RuntimeException(String.format("RHS mixes terminals and non-terminals: %s", rule));
    }
}
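`addRule` dispatches each rule into a separate index by the shape of its right-hand side: all terminals is lexical, one or two non-terminals is unary or binary, longer all-non-terminal RHSes go to `addNAryRule`, and a mix of the two kinds is rejected. A simplified sketch of that classification (the `$`-prefix convention for non-terminal categories is an assumption, not confirmed by this snippet):

```java
import java.util.*;

// Sketch of the dispatch logic in addRule (simplified; not the openk9 Rule
// API). Assumption: non-terminal categories carry a leading '$', a common
// convention in this style of semantic grammar.
public class RuleKind {
    static boolean isNonTerminal(String sym) { return sym.startsWith("$"); }

    public static String classify(String[] rhs) {
        boolean anyNt = Arrays.stream(rhs).anyMatch(RuleKind::isNonTerminal);
        boolean allNt = Arrays.stream(rhs).allMatch(RuleKind::isNonTerminal);
        if (!anyNt) return "lexical";                  // all terminals
        if (allNt && rhs.length == 1) return "unary";
        if (allNt && rhs.length == 2) return "binary";
        if (allNt) return "n-ary";                     // addNAryRule case
        // Mirrors the RuntimeException in the original addRule.
        throw new RuntimeException("RHS mixes terminals and non-terminals");
    }
}
```

Keeping the three maps separate lets the parser probe only the relevant index at each step: lexical rules during token lookup, binary rules during span combination, unary rules as a closure pass.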

Example 3 with Tuple

Use of io.openk9.search.api.query.parser.Tuple in project openk9 by smclab.

From the class Grammar, method applyLexicalRules.

private void applyLexicalRules(Map<Tuple, List<Parse>> chart, String[] tokens, int i, int j) {
    // Narrow to the sub-span tokens[i..j) and look it up in the lexicon.
    tokens = Arrays.stream(tokens, i, j).toArray(String[]::new);
    Tuple tokenKey = Utils.toTuple(tokens);
    Tuple<Integer> chartKey = Tuple.of(i, j);
    for (Rule rule : lexicalRules.getOrDefault(tokenKey, List.of())) {
        // Re-anchor the rule's semantics to this chart span before
        // recording the parse.
        Semantic semantic = Semantic.of(
            chartKey,
            sems -> rule.getSem().apply(sems)
                .stream()
                .map(maps -> SemanticType.of(chartKey, maps.getValue()))
                .collect(Collectors.collectingAndThen(Collectors.toList(), SemanticTypes::of)));
        chart.computeIfAbsent(chartKey, (k) -> new ArrayList<>())
            .add(Parse.of(Rule.of(rule.getLhs(), rule.getRhs(), semantic), chartKey, tokens));
    }
}
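The lookup here hinges on the token tuple being a value-style map key: `lexicalRules.getOrDefault(tokenKey, List.of())` yields an empty list for unknown spans, so there is no null check. A self-contained sketch of that pattern, using `List<String>` as the key in place of openk9's `Tuple` (Java lists compare element-wise, which gives the same behavior):

```java
import java.util.*;

// Sketch of the lexical-rule lookup in applyLexicalRules (simplified).
// Lexical rules are indexed by their exact token sequence; List<String>
// stands in for openk9's Tuple since List equality is element-wise too.
public class LexiconSketch {
    private final Map<List<String>, List<String>> lexicalRules = new HashMap<>();

    public void addRule(String lhs, String... rhsTokens) {
        lexicalRules.computeIfAbsent(List.of(rhsTokens), k -> new ArrayList<>())
                    .add(lhs);
    }

    // Mirrors lexicalRules.getOrDefault(tokenKey, List.of()): an unmatched
    // span simply yields no categories instead of null.
    public List<String> categoriesFor(String... tokens) {
        return lexicalRules.getOrDefault(List.of(tokens), List.of());
    }
}
```

This is why `Utils.toTuple(tokens)` exists in the original: a raw `String[]` uses identity-based `hashCode`/`equals` and would never match as a map key.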

Example 4 with Tuple

Use of io.openk9.search.api.query.parser.Tuple in project openk9 by smclab.

From the class BaseAggregatorAnnotator, method annotate_.

@Override
public List<CategorySemantics> annotate_(long tenantId, String... tokens) {
    // Fall back to the default tenant (-1L) keyword list when this tenant
    // has no keywords of its own.
    List<String> normalizedKeywords = tenantKeywordsMap.getOrDefault(tenantId, tenantKeywordsMap.get(-1L));
    if (normalizedKeywords == null) {
        return List.of();
    }
    RestHighLevelClient restHighLevelClient = restHighLevelClientProvider.get();
    String token;
    if (tokens.length == 1) {
        token = tokens[0];
    } else {
        token = String.join(" ", tokens);
    }
    BoolQueryBuilder builder = QueryBuilders.boolQuery();
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    for (String keyword : normalizedKeywords) {
        boolQueryBuilder.should(query(keyword, token));
    }
    builder.must(boolQueryBuilder);
    SearchRequest searchRequest;
    if (tenantId == -1) {
        searchRequest = new SearchRequest("*-*-data");
    } else {
        searchRequest = new SearchRequest(tenantId + "-*-data");
    }
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.size(0);
    searchSourceBuilder.query(builder);
    for (String keyword : normalizedKeywords) {
        searchSourceBuilder.aggregation(AggregationBuilders.terms(keyword).field(keyword).size(10));
    }
    searchRequest.source(searchSourceBuilder);
    if (_log.isDebugEnabled()) {
        _log.debug(builder.toString());
    }
    List<Tuple> scoreKeys = new ArrayList<>();
    try {
        SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        for (Aggregation aggregation : search.getAggregations()) {
            Terms terms = (Terms) aggregation;
            for (Terms.Bucket bucket : terms.getBuckets()) {
                String keyAsString = bucket.getKeyAsString();
                if (token.equalsIgnoreCase(keyAsString)) {
                    return List.of(_createCategorySemantics(terms.getName(), keyAsString));
                }
                // Score lazily: the Levenshtein comparison is wrapped in a
                // Supplier and only evaluated if no exact match is found.
                scoreKeys.add(Tuple.of((Supplier<Double>) () -> _levenshteinDistance(token, keyAsString), keyAsString, terms.getName()));
            }
        }
    } catch (IOException e) {
        _log.error(e.getMessage(), e);
    }
    if (scoreKeys.isEmpty()) {
        return List.of();
    }
    // Sort candidates by score (descending) and return the semantics of
    // the top-ranked key.
    scoreKeys.sort(Collections.reverseOrder(Comparator.comparingDouble(t -> ((Supplier<Double>) t.get(0)).get())));
    String key = (String) scoreKeys.get(0).get(1);
    String name = (String) scoreKeys.get(0).get(2);
    return List.of(_createCategorySemantics(name, key));
}
Also used : SearchRequest(org.elasticsearch.action.search.SearchRequest) ArrayList(java.util.ArrayList) Terms(org.elasticsearch.search.aggregations.bucket.terms.Terms) RestHighLevelClient(org.elasticsearch.client.RestHighLevelClient) IOException(java.io.IOException) SearchSourceBuilder(org.elasticsearch.search.builder.SearchSourceBuilder) SearchResponse(org.elasticsearch.action.search.SearchResponse) Aggregation(org.elasticsearch.search.aggregations.Aggregation) BoolQueryBuilder(org.elasticsearch.index.query.BoolQueryBuilder) Supplier(java.util.function.Supplier) Tuple(io.openk9.search.api.query.parser.Tuple)
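The fuzzy fallback at the end of `annotate_` scores every aggregation bucket key against the token, sorts in reverse order, and keeps the first element. A self-contained sketch of that selection step: the `ClosestKey` helper below is hypothetical, not the openk9 `_levenshteinDistance` implementation, and it assumes the score behaves like a similarity (higher is better), which is what the reverse-order sort in the original implies.

```java
import java.util.*;
import java.util.function.Supplier;

// Sketch of the fuzzy fallback in annotate_ (simplified, hypothetical
// helper): score each candidate key lazily with a Supplier, sort by
// descending score, return the best match.
public class ClosestKey {
    // Classic Levenshtein edit distance (dynamic programming).
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(
                    Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                    d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        return d[a.length()][b.length()];
    }

    // Assumption: normalize the distance into a similarity in [0, 1] so
    // that a descending sort ranks the closest key first.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    public static String best(String token, List<String> keys) {
        // Wrap each score in a Supplier, as the original does, so scores
        // are only computed when the sort demands them.
        List<Map.Entry<Supplier<Double>, String>> scored = new ArrayList<>();
        for (String key : keys)
            scored.add(Map.entry((Supplier<Double>) () -> similarity(token, key), key));
        Comparator<Map.Entry<Supplier<Double>, String>> byScore =
            Comparator.comparingDouble(e -> e.getKey().get());
        scored.sort(byScore.reversed());
        return scored.get(0).getValue();
    }
}
```

Note that `Supplier` here only defers the computation; the sort still evaluates every score at least once, so the laziness pays off mainly on the exact-match early return path in the original method.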

Aggregations

Tuple (io.openk9.search.api.query.parser.Tuple): 4 usages
ArrayList (java.util.ArrayList): 4 usages
ReactorStopWatch (io.openk9.common.api.reactor.util.ReactorStopWatch): 3 usages
Annotator (io.openk9.search.api.query.parser.Annotator): 3 usages
CategorySemantics (io.openk9.search.api.query.parser.CategorySemantics): 3 usages
Itertools (io.openk9.search.query.internal.query.parser.util.Itertools): 3 usages
Utils (io.openk9.search.query.internal.query.parser.util.Utils): 3 usages
Arrays (java.util.Arrays): 3 usages
HashMap (java.util.HashMap): 3 usages
HashSet (java.util.HashSet): 3 usages
List (java.util.List): 3 usages
Map (java.util.Map): 3 usages
Set (java.util.Set): 3 usages
Function (java.util.function.Function): 3 usages
Collectors (java.util.stream.Collectors): 3 usages
IntStream (java.util.stream.IntStream): 3 usages
Stream (java.util.stream.Stream): 3 usages
Logger (org.slf4j.Logger): 3 usages
LoggerFactory (org.slf4j.LoggerFactory): 3 usages
Flux (reactor.core.publisher.Flux): 3 usages