Example 1 with UTF32ToUTF8

Use of org.apache.lucene.util.automaton.UTF32ToUTF8 in the lucene-solr project by apache.

From the class FuzzySuggester, method convertAutomaton.

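@Override
protected Automaton convertAutomaton(Automaton a) {
    if (unicodeAware) {
        // Rewrite the code point (UTF-32) automaton as an equivalent automaton over UTF-8 bytes.
        Automaton utf8automaton = new UTF32ToUTF8().convert(a);
        // Determinize the converted automaton, bounded by the default state limit.
        utf8automaton = Operations.determinize(utf8automaton, DEFAULT_MAX_DETERMINIZED_STATES);
        return utf8automaton;
    } else {
        // Not unicode-aware: return the automaton unchanged.
        return a;
    }
}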
Also used: Automaton (org.apache.lucene.util.automaton.Automaton), TokenStreamToAutomaton (org.apache.lucene.analysis.TokenStreamToAutomaton), UTF32ToUTF8 (org.apache.lucene.util.automaton.UTF32ToUTF8)
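
The example above only converts the automaton when the suggester is unicode-aware. As background, here is a minimal sketch of why the code point view and the UTF-8 view of the same text differ. It uses only the JDK, not Lucene, and the names CodePointVsUtf8, codePointLength and utf8Length are made up for this illustration. A single code point can occupy one to four bytes in UTF-8, so an automaton whose transitions are labelled with code points has to be rewritten before it can be matched against UTF-8 byte sequences.

```java
import java.nio.charset.StandardCharsets;

public class CodePointVsUtf8 {

    /** Number of Unicode code points in s (the "UTF-32" view of the text). */
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Number of bytes in the UTF-8 encoding of s (the byte-level view). */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'a', e with acute accent, the euro sign, and an emoji outside the BMP
        // (written as a surrogate pair in Java source).
        String[] samples = {"a", "\u00e9", "\u20ac", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println(codePointLength(s) + " code point(s), "
                    + utf8Length(s) + " UTF-8 byte(s)");
        }
    }
}
```

Each sample is exactly one code point, yet the UTF-8 lengths differ, which is the mismatch that a conversion such as UTF32ToUTF8 has to account for at the automaton level.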

Example 2 with UTF32ToUTF8

Use of org.apache.lucene.util.automaton.UTF32ToUTF8 in the lucene-solr project by apache.

From the class FuzzyCompletionQuery, method createWeight.

@Override
public Weight createWeight(IndexSearcher searcher, boolean needsScores, float boost) throws IOException {
    CompletionTokenStream stream = (CompletionTokenStream) analyzer.tokenStream(getField(), getTerm().text());
    Set<IntsRef> refs = new HashSet<>();
    Automaton automaton = toLevenshteinAutomata(stream.toAutomaton(unicodeAware), refs);
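    if (unicodeAware) {
        // Rewrite code point transitions as UTF-8 byte transitions, then determinize within the configured limit.
        Automaton utf8automaton = new UTF32ToUTF8().convert(automaton);
        utf8automaton = Operations.determinize(utf8automaton, maxDeterminizedStates);
        automaton = utf8automaton;
    }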
    // TODO Better iterate over automaton again inside FuzzyCompletionWeight?
    return new FuzzyCompletionWeight(this, automaton, refs);
}
Also used: Automaton (org.apache.lucene.util.automaton.Automaton), IntsRef (org.apache.lucene.util.IntsRef), HashSet (java.util.HashSet), UTF32ToUTF8 (org.apache.lucene.util.automaton.UTF32ToUTF8)

Example 3 with UTF32ToUTF8

Use of org.apache.lucene.util.automaton.UTF32ToUTF8 in the elasticsearch project by elastic.

From the class XFuzzySuggester, method convertAutomaton.

@Override
protected Automaton convertAutomaton(Automaton a) {
    if (unicodeAware) {
        // FLORIAN EDIT: get converted Automaton from superclass
        Automaton utf8automaton = new UTF32ToUTF8().convert(super.convertAutomaton(a));
        // This automaton should not blow up during determinize:
        utf8automaton = Operations.determinize(utf8automaton, Integer.MAX_VALUE);
        return utf8automaton;
    } else {
        return super.convertAutomaton(a);
    }
}
Also used: Automaton (org.apache.lucene.util.automaton.Automaton), TokenStreamToAutomaton (org.apache.lucene.analysis.TokenStreamToAutomaton), UTF32ToUTF8 (org.apache.lucene.util.automaton.UTF32ToUTF8)

Aggregations

Automaton (org.apache.lucene.util.automaton.Automaton): 3
UTF32ToUTF8 (org.apache.lucene.util.automaton.UTF32ToUTF8): 3
TokenStreamToAutomaton (org.apache.lucene.analysis.TokenStreamToAutomaton): 2
HashSet (java.util.HashSet): 1
IntsRef (org.apache.lucene.util.IntsRef): 1