
Example 1 with Wildcard

Use of org.exist.xquery.modules.ngram.query.Wildcard in project exist by eXist-db.

From the class NGramSearch, method parseQuery.

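/*
 * Turns a tokenized wildcard query into a sequence of expressions:
 *   leading "^"  -> StartAnchor, trailing "$" -> EndAnchor
 *   "."          -> exactly one character (Wildcard(1, 1))
 *   ".?" / ".*" / ".+" -> 0..1, 0..n, 1..n characters
 *   "." + interval qualifier (INTERVAL_QUALIFIER_PATTERN) -> explicit bounds
 *   "[...]"      -> AlternativeStrings, one entry per enclosed character
 *   anything else -> FixedString (after unescape)
 */
private EvaluatableExpression parseQuery(final String query) throws XPathException {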
    List<String> queryTokens = tokenizeQuery(query);
    LOG.trace("Tokenized query: {}", queryTokens);
    if (queryTokens.isEmpty())
        return new EmptyExpression();
    List<WildcardedExpression> expressions = new ArrayList<>();
    if (queryTokens.get(0).equals("^")) {
        expressions.add(new StartAnchor());
        queryTokens.remove(0);
    }
    if (queryTokens.isEmpty())
        return new EmptyExpression();
    boolean endAnchorPresent = false;
    if (queryTokens.get(queryTokens.size() - 1).equals("$")) {
        endAnchorPresent = true;
        queryTokens.remove(queryTokens.size() - 1);
    }
    if (queryTokens.isEmpty())
        return new EmptyExpression();
    for (String token : queryTokens) {
        if (token.startsWith(".")) {
            Wildcard wildcard = null;
            if (token.length() == 1) {
                wildcard = new Wildcard(1, 1);
            } else {
                String qualifier = token.substring(1);
                switch(qualifier) {
                    case "?":
                        wildcard = new Wildcard(0, 1);
                        break;
                    case "*":
                        wildcard = new Wildcard(0, Integer.MAX_VALUE);
                        break;
                    case "+":
                        wildcard = new Wildcard(1, Integer.MAX_VALUE);
                        break;
                    default:
                        Pattern p = Pattern.compile(INTERVAL_QUALIFIER_PATTERN);
                        Matcher m = p.matcher(qualifier);
                        // Should not happen
                        if (!m.matches())
                            throw new XPathException(this, ErrorCodes.FTDY0020, "query string violates wildcard qualifier syntax");
                        try {
                            wildcard = new Wildcard(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)));
                        } catch (NumberFormatException nfe) {
                            throw new XPathException(this, ErrorCodes.FTDY0020, "query string violates wildcard qualifier syntax", new StringValue(query), nfe);
                        }
                        break;
                }
            }
            expressions.add(wildcard);
        } else {
            if (token.startsWith("[")) {
                Set<String> strings = new HashSet<>(token.length() - 2);
                for (int i = 1; i < token.length() - 1; i++) {
                    strings.add(Character.toString(token.charAt(i)));
                }
                expressions.add(new AlternativeStrings(this, strings));
            } else {
                expressions.add(new FixedString(this, unescape(token)));
            }
        }
    }
    if (endAnchorPresent)
        expressions.add(new EndAnchor());
    return new WildcardedExpressionSequence(expressions);
}
Also used: Pattern (java.util.regex.Pattern), Matcher (java.util.regex.Matcher), FixedString (org.exist.xquery.modules.ngram.query.FixedString), StartAnchor (org.exist.xquery.modules.ngram.query.StartAnchor), WildcardedExpression (org.exist.xquery.modules.ngram.query.WildcardedExpression), EmptyExpression (org.exist.xquery.modules.ngram.query.EmptyExpression), Wildcard (org.exist.xquery.modules.ngram.query.Wildcard), WildcardedExpressionSequence (org.exist.xquery.modules.ngram.query.WildcardedExpressionSequence), AlternativeStrings (org.exist.xquery.modules.ngram.query.AlternativeStrings), EndAnchor (org.exist.xquery.modules.ngram.query.EndAnchor)
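The qualifier handling in parseQuery can be isolated into a small standalone sketch that maps a wildcard token to its repetition bounds, mirroring the Wildcard(min, max) calls above. It is a hypothetical illustration, not eXist code: the class WildcardBounds and the method boundsOf are invented names, it returns plain int bounds instead of Wildcard objects, and it ASSUMES the interval qualifier is written as {min,max}, because the real INTERVAL_QUALIFIER_PATTERN constant is not shown in the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone sketch of the qualifier mapping in parseQuery.
// ASSUMPTION: the interval qualifier is written "{min,max}"; the real
// INTERVAL_QUALIFIER_PATTERN constant is not shown in the example.
public class WildcardBounds {
    // Assumed stand-in for INTERVAL_QUALIFIER_PATTERN, compiled once.
    private static final Pattern INTERVAL = Pattern.compile("\\{(\\d+),(\\d+)\\}");

    // Returns {min, max} repetition bounds for a token that starts with '.'.
    static int[] boundsOf(String token) {
        if (!token.startsWith(".")) {
            throw new IllegalArgumentException("not a wildcard token: " + token);
        }
        if (token.length() == 1) {
            return new int[] {1, 1}; // "." matches exactly one character
        }
        String qualifier = token.substring(1);
        switch (qualifier) {
            case "?":
                return new int[] {0, 1};
            case "*":
                return new int[] {0, Integer.MAX_VALUE};
            case "+":
                return new int[] {1, Integer.MAX_VALUE};
            default:
                Matcher m = INTERVAL.matcher(qualifier);
                if (!m.matches()) {
                    throw new IllegalArgumentException("bad wildcard qualifier: " + qualifier);
                }
                // Digit runs too large for an int can still throw
                // NumberFormatException; parseQuery catches that case.
                return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
        }
    }

    public static void main(String[] args) {
        for (String t : new String[] {".", ".?", ".*", ".+", ".{2,4}"}) {
            int[] b = boundsOf(t);
            System.out.println(t + " -> [" + b[0] + ", " + b[1] + "]");
        }
    }
}
```

Running main prints one line per token, for example `.{2,4} -> [2, 4]` and `.* -> [0, 2147483647]`. Unlike the original loop, which calls Pattern.compile for every interval token, the sketch compiles the pattern once as a static field; this is a common Java idiom since compiled Pattern objects are immutable and thread-safe.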

Example 2 with Wildcard

Use of org.exist.xquery.modules.ngram.query.Wildcard in project exist by eXist-db.

From the class NGramSearch, method processMatches.

private NodeSet processMatches(NGramIndexWorker index, DocumentSet docs, List<QName> qnames, String query, NodeSet nodeSet, int axis) throws XPathException {
    // Only wildcard-contains interprets wildcard syntax; the other
    // functions treat the query as a literal string.
    final EvaluatableExpression parsedQuery;
    if (getLocalName().equals("wildcard-contains")) {
        parsedQuery = parseQuery(query);
    } else {
        parsedQuery = new FixedString(this, query);
    }
    LOG.debug("Parsed Query: {}", parsedQuery);
    // Evaluate the parsed query against the n-gram index.
    NodeSet result = parsedQuery.eval(index, docs, qnames, nodeSet, axis, this.getExpressionId());
    // starts-with* / ends-with* functions keep only nodes matching at the start / end.
    if (getLocalName().startsWith("starts-with")) {
        result = NodeSets.getNodesMatchingAtStart(result, getExpressionId());
    } else if (getLocalName().startsWith("ends-with")) {
        result = NodeSets.getNodesMatchingAtEnd(result, getExpressionId());
    }
    // Filter out overlapping offsets from each node's own matches.
    result = NodeSets.transformNodes(result, proxy -> NodeProxies.transformOwnMatches(proxy, Match::filterOutOverlappingOffsets, getExpressionId()));
    return result;
}
Also used: NodeSet (org.exist.dom.persistent.NodeSet), EmptyNodeSet (org.exist.dom.persistent.EmptyNodeSet), Match (org.exist.dom.persistent.Match), EvaluatableExpression (org.exist.xquery.modules.ngram.query.EvaluatableExpression), java.util (java.util), QName (org.exist.dom.QName), NodeProxy (org.exist.dom.persistent.NodeProxy), org.exist.xquery.value (org.exist.xquery.value), Wildcard (org.exist.xquery.modules.ngram.query.Wildcard), EmptyExpression (org.exist.xquery.modules.ngram.query.EmptyExpression), org.exist.xquery (org.exist.xquery), NodeProxies (org.exist.xquery.modules.ngram.utils.NodeProxies), NGramIndex (org.exist.indexing.ngram.NGramIndex), Matcher (java.util.regex.Matcher), NodeSets (org.exist.xquery.modules.ngram.utils.NodeSets), ElementValue (org.exist.storage.ElementValue), Error (org.exist.xquery.util.Error), DocumentSet (org.exist.dom.persistent.DocumentSet), AlternativeStrings (org.exist.xquery.modules.ngram.query.AlternativeStrings), StartAnchor (org.exist.xquery.modules.ngram.query.StartAnchor), NGramIndexWorker (org.exist.indexing.ngram.NGramIndexWorker), Logger (org.apache.logging.log4j.Logger), FixedString (org.exist.xquery.modules.ngram.query.FixedString), EndAnchor (org.exist.xquery.modules.ngram.query.EndAnchor), WildcardedExpressionSequence (org.exist.xquery.modules.ngram.query.WildcardedExpressionSequence), Pattern (java.util.regex.Pattern), WildcardedExpression (org.exist.xquery.modules.ngram.query.WildcardedExpression), LogManager (org.apache.logging.log4j.LogManager)
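The post-processing steps in processMatches can be illustrated without the n-gram index by working on a plain string: find literal occurrences, keep only matches at the start or end when the function name begins with starts-with or ends-with, then drop overlapping offsets. This is a hypothetical sketch, not eXist's implementation: MatchFilterSketch, Offset and all of its methods are invented names, and the greedy left-to-right overlap rule is an ASSUMPTION that may differ from what Match::filterOutOverlappingOffsets actually does. The record requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, index-free illustration of processMatches' post-processing.
// Names and the overlap rule are assumptions, not eXist's API.
public class MatchFilterSketch {
    // A match as (start offset, length) within a text.
    record Offset(int start, int length) {
        int end() { return start + length; }
    }

    // All (possibly overlapping) occurrences of query in text, sorted by start.
    static List<Offset> findAll(String text, String query) {
        List<Offset> result = new ArrayList<>();
        if (query.isEmpty()) {
            return result; // avoid an endless loop on the empty string
        }
        for (int i = text.indexOf(query); i >= 0; i = text.indexOf(query, i + 1)) {
            result.add(new Offset(i, query.length()));
        }
        return result;
    }

    // Branches on the function's local name, as processMatches does.
    static List<Offset> filterByFunction(String localName, String text, List<Offset> matches) {
        List<Offset> kept = new ArrayList<>();
        for (Offset o : matches) {
            if (localName.startsWith("starts-with") && o.start() != 0) continue;
            if (localName.startsWith("ends-with") && o.end() != text.length()) continue;
            kept.add(o);
        }
        return kept;
    }

    // ASSUMED rule: keep an offset only if it starts at or after the end
    // of the previously kept one (input must be sorted by start).
    static List<Offset> dropOverlaps(List<Offset> sortedByStart) {
        List<Offset> kept = new ArrayList<>();
        int lastEnd = -1;
        for (Offset o : sortedByStart) {
            if (o.start() >= lastEnd) {
                kept.add(o);
                lastEnd = o.end();
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "aaaa";
        List<Offset> all = findAll(text, "aa"); // offsets 0, 1, 2
        System.out.println(dropOverlaps(all));
        System.out.println(filterByFunction("ends-with", text, all));
    }
}
```

For the text "aaaa" and query "aa", findAll yields offsets 0, 1 and 2; dropOverlaps keeps 0 and 2, and the ends-with filter keeps only the match starting at 2.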

Aggregations

Matcher (java.util.regex.Matcher): 2
Pattern (java.util.regex.Pattern): 2
AlternativeStrings (org.exist.xquery.modules.ngram.query.AlternativeStrings): 2
EmptyExpression (org.exist.xquery.modules.ngram.query.EmptyExpression): 2
EndAnchor (org.exist.xquery.modules.ngram.query.EndAnchor): 2
FixedString (org.exist.xquery.modules.ngram.query.FixedString): 2
StartAnchor (org.exist.xquery.modules.ngram.query.StartAnchor): 2
Wildcard (org.exist.xquery.modules.ngram.query.Wildcard): 2
WildcardedExpression (org.exist.xquery.modules.ngram.query.WildcardedExpression): 2
WildcardedExpressionSequence (org.exist.xquery.modules.ngram.query.WildcardedExpressionSequence): 2
java.util (java.util): 1
LogManager (org.apache.logging.log4j.LogManager): 1
Logger (org.apache.logging.log4j.Logger): 1
QName (org.exist.dom.QName): 1
DocumentSet (org.exist.dom.persistent.DocumentSet): 1
EmptyNodeSet (org.exist.dom.persistent.EmptyNodeSet): 1
Match (org.exist.dom.persistent.Match): 1
NodeProxy (org.exist.dom.persistent.NodeProxy): 1
NodeSet (org.exist.dom.persistent.NodeSet): 1
NGramIndex (org.exist.indexing.ngram.NGramIndex): 1