Examples with CharacterIterator - java.text.CharacterIterator

Example 26 with CharacterIterator

Use of java.text.CharacterIterator in project jdk8u_jdk by JetBrains.

From class DictionaryBasedBreakIterator, method following.

/**
 * Sets the current iteration position to the first boundary position after
 * the specified position.
 * @param offset The position to begin searching forward from
 * @return The position of the first boundary after "offset"
 */
@Override
public int following(int offset) {
    CharacterIterator text = getText();
    checkOffset(offset, text);
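    // If there is no cache, or "offset" lies outside the range it covers,
    // defer to the inherited routine (which may call back into this class
    // and refresh the cache).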
    if (cachedBreakPositions == null || offset < cachedBreakPositions[0] || offset >= cachedBreakPositions[cachedBreakPositions.length - 1]) {
        cachedBreakPositions = null;
        return super.following(offset);
    } else {
        // On the other hand, if "offset" is within the range covered by the
        // cache, just search the cache for the first break position after
        // "offset".
        positionInCache = 0;
        while (positionInCache < cachedBreakPositions.length && offset >= cachedBreakPositions[positionInCache]) {
            ++positionInCache;
        }
        text.setIndex(cachedBreakPositions[positionInCache]);
        return text.getIndex();
    }
}

Also used: CharacterIterator (java.text.CharacterIterator)
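The cache lookup in following() can be tried in isolation. The sketch below is a minimal, hypothetical stand-alone version: it assumes the cache is a sorted int[] of boundary offsets whose first element is the start of the cached range, and it returns -1 where the real method would fall back to super.following(). The class name CachedFollowing and the -1 convention are illustrative only.

```java
public class CachedFollowing {
    // Hypothetical stand-alone version of the cache search in following().
    // Returns the first cached boundary strictly after offset, or -1 when
    // offset lies outside [cache[0], cache[cache.length - 1]), where the
    // real iterator would defer to super.following().
    static int following(int[] cache, int offset) {
        if (cache == null || offset < cache[0] || offset >= cache[cache.length - 1]) {
            return -1;
        }
        int pos = 0;
        // Skip every boundary at or before offset; the guard above ensures
        // the loop stops before running off the end of the array.
        while (pos < cache.length && offset >= cache[pos]) {
            ++pos;
        }
        return cache[pos];
    }

    public static void main(String[] args) {
        int[] cache = {0, 5, 6, 11};              // e.g. word boundaries in "Hello world"
        System.out.println(following(cache, 2));  // 5
        System.out.println(following(cache, 5));  // 6
        System.out.println(following(cache, 11)); // -1 (outside the cache)
    }
}
```

The scan is linear, mirroring the original; for very large caches a binary search over the sorted array could replace it.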

Example 27 with CharacterIterator

Use of java.text.CharacterIterator in project jdk8u_jdk by JetBrains.

From class DictionaryBasedBreakIterator, method divideUpDictionaryRange.

/**
 * This is the function that actually implements the dictionary-based
 * algorithm.  Given the endpoints of a range of text, it uses the
 * dictionary to determine the positions of any boundaries in this
 * range.  It stores all the boundary positions it discovers in
 * cachedBreakPositions so that we only have to do this work once
 * for each time we enter the range.
 */
@SuppressWarnings("unchecked")
private void divideUpDictionaryRange(int startPos, int endPos) {
    CharacterIterator text = getText();
    // the range we're dividing may begin or end with non-dictionary characters
    // (i.e., for line breaking, we may have leading or trailing punctuation
    // that needs to be kept with the word).  Seek from the beginning of the
    // range to the first dictionary character
    text.setIndex(startPos);
    int c = getCurrent();
    int category = lookupCategory(c);
    while (category == IGNORE || !categoryFlags[category]) {
        c = getNext();
        category = lookupCategory(c);
    }
    // initialize.  We maintain two stacks: currentBreakPositions contains
    // the list of break positions that will be returned if we successfully
    // finish traversing the whole range now.  possibleBreakPositions lists
    // all other possible word ends we've passed along the way.  (Whenever
    // we reach an error [a sequence of characters that can't begin any word
    // in the dictionary], we back up, possibly delete some breaks from
    // currentBreakPositions, move a break from possibleBreakPositions
    // to currentBreakPositions, and start over from there.  This process
    // continues in this way until we either successfully make it all the way
    // across the range, or exhaust all of our combinations of break
    // positions.)
    Stack<Integer> currentBreakPositions = new Stack<>();
    Stack<Integer> possibleBreakPositions = new Stack<>();
    List<Integer> wrongBreakPositions = new ArrayList<>();
    // the dictionary is implemented as a trie, which is treated as a state
    // machine.  -1 represents the end of a legal word.  Every word in the
    // dictionary is represented by a path from the root node to -1.  A path
    // that ends in state 0 is an illegal combination of characters.
    int state = 0;
    // these two variables are used for error handling.  We keep track of the
    // farthest we've gotten through the range being divided, and the combination
    // of breaks that got us that far.  If we use up all possible break
    // combinations, the text contains an error or a word that's not in the
    // dictionary.  In this case, we "bless" the break positions that got us the
    // farthest as real break positions, and then start over from scratch with
    // the character where the error occurred.
    int farthestEndPoint = text.getIndex();
    Stack<Integer> bestBreakPositions = null;
    // initialize (we always exit the loop with a break statement)
    c = getCurrent();
    while (true) {
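        // If the current state can transition to the end-of-word state (-1),
        // the characters consumed so far form a legal word, so the current
        // position is a possible break; push it onto the
        // possible-break-positions stack.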
        if (dictionary.getNextState(state, 0) == -1) {
            possibleBreakPositions.push(text.getIndex());
        }
        // look up the new state to transition to in the dictionary
        state = dictionary.getNextStateFromCharacter(state, c);
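        // If the character we're sitting on takes us to the end-of-word state,
        // it was a non-dictionary character and we've traversed the whole
        // range successfully; drop out of the loop.
        if (state == -1) {
            currentBreakPositions.push(text.getIndex());
            break;
        } else if (state == 0 || text.getIndex() >= endPos) {
            // Otherwise, if we hit the error state or ran off the end of the
            // range without reaching the end-of-word state, we've hit an error.
            // If this is the farthest we've gotten, remember it (and the breaks
            // that got us here) in case the text really does contain an error.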
            if (text.getIndex() > farthestEndPoint) {
                farthestEndPoint = text.getIndex();
                @SuppressWarnings("unchecked") Stack<Integer> currentBreakPositionsCopy = (Stack<Integer>) currentBreakPositions.clone();
                bestBreakPositions = currentBreakPositionsCopy;
            }
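            // wrongBreakPositions records every break we have already rejected;
            // skip over those here so we never retry them (this keeps
            // repetitive checks from slowing down some extreme cases).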
            while (!possibleBreakPositions.isEmpty() && wrongBreakPositions.contains(possibleBreakPositions.peek())) {
                possibleBreakPositions.pop();
            }
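            // If every combination of break positions has been tried, the text
            // contains an error or an unknown word.  In that case, "bless" the
            // break positions that got us the farthest as real break positions
            // and resume just past the farthest point reached (or, if none were
            // recorded, break around the current character and move on).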
            if (possibleBreakPositions.isEmpty()) {
                if (bestBreakPositions != null) {
                    currentBreakPositions = bestBreakPositions;
                    if (farthestEndPoint < endPos) {
                        text.setIndex(farthestEndPoint + 1);
                    } else {
                        break;
                    }
                } else {
                    if ((currentBreakPositions.size() == 0 || currentBreakPositions.peek().intValue() != text.getIndex()) && text.getIndex() != startPos) {
                        currentBreakPositions.push(text.getIndex());
                    }
                    getNext();
                    currentBreakPositions.push(text.getIndex());
                }
            } else {
                // If we still have more break positions we can try, promote the
                // last break in possibleBreakPositions into currentBreakPositions,
                // and get rid of all entries in currentBreakPositions that come after
                // it.  Then back up to that position and start over from there (i.e.,
                // treat that position as the beginning of a new word).
                Integer temp = possibleBreakPositions.pop();
                Integer temp2 = null;
                while (!currentBreakPositions.isEmpty() && temp.intValue() < currentBreakPositions.peek().intValue()) {
                    temp2 = currentBreakPositions.pop();
                    wrongBreakPositions.add(temp2);
                }
                currentBreakPositions.push(temp);
                text.setIndex(currentBreakPositions.peek().intValue());
            }
            // re-sync "c" for the next go-round, and drop out of the loop if
            // we've made it off the end of the range
            c = getCurrent();
            if (text.getIndex() >= endPos) {
                break;
            }
        } else {
            // If we didn't hit any exceptional conditions on this last iteration,
            // just advance to the next character and loop.
            c = getNext();
        }
    }
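    // Dump the last break position in the list and replace it with the actual
    // end of the range (which may be the same character, or may be further on
    // because the range ended with non-dictionary characters we want to
    // keep with the word).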
    if (!currentBreakPositions.isEmpty()) {
        currentBreakPositions.pop();
    }
    currentBreakPositions.push(endPos);
    // create a regular array to hold the break positions and copy
    // the break positions from the stack to the array (in addition,
    // our starting position goes into this array as a break position).
    // This array becomes the cache of break positions used by next()
    // and previous(), so this is where we actually refresh the cache.
    cachedBreakPositions = new int[currentBreakPositions.size() + 1];
    cachedBreakPositions[0] = startPos;
    for (int i = 0; i < currentBreakPositions.size(); i++) {
        cachedBreakPositions[i + 1] = currentBreakPositions.elementAt(i).intValue();
    }
    positionInCache = 0;
}

Also used: CharacterIterator (java.text.CharacterIterator), ArrayList (java.util.ArrayList), Stack (java.util.Stack)
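The backtracking idea behind divideUpDictionaryRange() (roughly: extend the current word as far as the dictionary allows, and when the rest of the range cannot be segmented, back up and promote the most recent shorter-word break) can be sketched with a plain Set<String> standing in for the trie. This is a simplified, hypothetical recursive analogue, not the JDK algorithm: it has no trie state machine and no "farthest point" error recovery, so it simply returns null when no segmentation exists. The names DictionarySegmenter, segment and maxWordLen are illustrative; as in cachedBreakPositions, the starting position is included as the first break.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class DictionarySegmenter {
    // Hypothetical, simplified analogue of dictionary-based segmentation:
    // returns the break positions of text (starting with 0, as the cache
    // starts with startPos), or null if text cannot be split into words.
    static List<Integer> segment(String text, Set<String> dict, int maxWordLen) {
        List<Integer> breaks = new ArrayList<>();
        breaks.add(0);
        return search(text, 0, dict, maxWordLen, breaks) ? breaks : null;
    }

    private static boolean search(String text, int start, Set<String> dict,
                                  int maxWordLen, List<Integer> breaks) {
        if (start == text.length()) {
            return true; // made it all the way across the range
        }
        // Try the longest candidate word first, then shorter ones.
        for (int end = Math.min(text.length(), start + maxWordLen); end > start; end--) {
            if (dict.contains(text.substring(start, end))) {
                breaks.add(end);
                if (search(text, end, dict, maxWordLen, breaks)) {
                    return true;
                }
                // Dead end: discard this break and try a shorter word.
                breaks.remove(breaks.size() - 1);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("the", "them", "men");
        // "them" is tried first, leaves "en" unsegmentable, so we back up.
        System.out.println(segment("themen", dict, 4)); // [0, 3, 6]
        System.out.println(segment("xyz", dict, 4));    // null
    }
}
```

Plain recursion like this can revisit the same suffix many times on adversarial input; in the original, the wrongBreakPositions list plays a related role by never retrying a break already rejected.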

Example 28 with CharacterIterator

Use of java.text.CharacterIterator in project jdk8u_jdk by JetBrains.

From class DictionaryBasedBreakIterator, method preceding.

/**
 * Sets the current iteration position to the last boundary position
 * before the specified position.
 * @param offset The position to begin searching from
 * @return The position of the last boundary before "offset"
 */
@Override
public int preceding(int offset) {
    CharacterIterator text = getText();
    checkOffset(offset, text);
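    // If there is no cache, or "offset" lies outside the range it covers,
    // defer to the inherited routine (which may call back into this class
    // and refresh the cache).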
    if (cachedBreakPositions == null || offset <= cachedBreakPositions[0] || offset > cachedBreakPositions[cachedBreakPositions.length - 1]) {
        cachedBreakPositions = null;
        return super.preceding(offset);
    } else {
        // On the other hand, if "offset" is within the range covered by the
        // cache, all we have to do is search the cache for the last break
        // position before "offset".
        positionInCache = 0;
        while (positionInCache < cachedBreakPositions.length && offset > cachedBreakPositions[positionInCache]) {
            ++positionInCache;
        }
        --positionInCache;
        text.setIndex(cachedBreakPositions[positionInCache]);
        return text.getIndex();
    }
}

Also used: CharacterIterator (java.text.CharacterIterator)
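The cache search in preceding() is the mirror image of the one in following(): it finds the first cached boundary at or after "offset" and then steps back one. The sketch below is a hypothetical stand-alone version that assumes a sorted int[] cache and returns -1 where the real method would defer to super.preceding(); the class name CachedPreceding is illustrative only.

```java
public class CachedPreceding {
    // Hypothetical stand-alone version of the cache search in preceding().
    // Returns the last cached boundary strictly before offset, or -1 when
    // offset lies outside (cache[0], cache[cache.length - 1]], where the
    // real iterator would defer to super.preceding().
    static int preceding(int[] cache, int offset) {
        if (cache == null || offset <= cache[0] || offset > cache[cache.length - 1]) {
            return -1;
        }
        int pos = 0;
        // Find the first boundary at or after offset...
        while (pos < cache.length && offset > cache[pos]) {
            ++pos;
        }
        // ...then step back one, mirroring "--positionInCache".
        // pos >= 1 here because offset > cache[0].
        return cache[pos - 1];
    }

    public static void main(String[] args) {
        int[] cache = {0, 5, 6, 11};
        System.out.println(preceding(cache, 6));  // 5
        System.out.println(preceding(cache, 11)); // 6
        System.out.println(preceding(cache, 0));  // -1 (outside the cache)
    }
}
```

Note the mirrored guards: following() gives up when "offset" equals the last cached boundary and preceding() when it equals the first, because in each case the answer lies outside the cached range.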

Example 29 with CharacterIterator

Use of java.text.CharacterIterator in project jena by apache.

From class N3JenaWriterCommon, method checkNamePart.

protected static boolean checkNamePart(String s) {
    if (s.length() == 0)
        return true;
    CharacterIterator cIter = new StringCharacterIterator(s);
    char ch = cIter.first();
    if (!checkNameStartChar(ch))
        return false;
    return checkNameTail(cIter);
}

Also used: StringCharacterIterator (java.text.StringCharacterIterator), CharacterIterator (java.text.CharacterIterator)
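checkNamePart() reads the first character with first() and hands the iterator to checkNameTail(), which is not shown. The sketch below fills in that pattern with hypothetical helpers: the start rule (letter or '_') and tail rule (letter, digit, '_' or '-') are simplified assumptions, not Jena's actual N3/Turtle name rules. The tail loop uses the standard CharacterIterator idiom of calling next() until it returns DONE.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

public class NamePartCheck {
    // Hypothetical start-character rule: a letter or underscore.
    static boolean checkNameStartChar(char ch) {
        return Character.isLetter(ch) || ch == '_';
    }

    // Hypothetical tail rule: letters, digits, '_' or '-'.
    static boolean checkNameChar(char ch) {
        return Character.isLetterOrDigit(ch) || ch == '_' || ch == '-';
    }

    // Walks the rest of the iterator (positioned on the first char) until DONE.
    static boolean checkNameTail(CharacterIterator cIter) {
        for (char ch = cIter.next(); ch != CharacterIterator.DONE; ch = cIter.next()) {
            if (!checkNameChar(ch)) {
                return false;
            }
        }
        return true;
    }

    static boolean checkNamePart(String s) {
        if (s.length() == 0)
            return true;
        CharacterIterator cIter = new StringCharacterIterator(s);
        char ch = cIter.first();
        if (!checkNameStartChar(ch))
            return false;
        return checkNameTail(cIter);
    }

    public static void main(String[] args) {
        System.out.println(checkNamePart("foo-bar1")); // true
        System.out.println(checkNamePart("1foo"));     // false
        System.out.println(checkNamePart("a b"));      // false
    }
}
```

Because DONE is the char '\uFFFF', a literal U+FFFF inside the input would end this loop early; a loop bounded by getEndIndex() avoids that.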

Example 30 with CharacterIterator

Use of java.text.CharacterIterator in project jena by apache.

From class TurtleValidate, method checkValidNamePart.

protected static boolean checkValidNamePart(String s) {
    if (s.length() == 0)
        return true;
    CharacterIterator cIter = new StringCharacterIterator(s);
    char ch = cIter.first();
    if (!checkNameStartChar(ch))
        return false;
    return checkNameTail(cIter);
}

Also used: StringCharacterIterator (java.text.StringCharacterIterator), CharacterIterator (java.text.CharacterIterator)
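A StringCharacterIterator yields UTF-16 code units, so a character outside the Basic Multilingual Plane arrives as two surrogate halves. Turtle's grammar admits name characters in the range #x10000-#xEFFFF, and a per-char test such as Character.isLetterOrDigit(char) rejects each surrogate half. The sketch below, with hypothetical class and method names and a simplified letters-or-digits rule rather than Jena's actual rules, contrasts a per-char loop with one that joins surrogate pairs via Character.toCodePoint before testing.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

public class CodePointNameCheck {
    // Naive check: treats each UTF-16 code unit as a character.
    static boolean allLettersOrDigitsByChar(String s) {
        CharacterIterator it = new StringCharacterIterator(s);
        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
            if (!Character.isLetterOrDigit(c)) {
                return false;
            }
        }
        return true;
    }

    // Code-point-aware check: joins a high/low surrogate pair before testing.
    static boolean allLettersOrDigitsByCodePoint(String s) {
        CharacterIterator it = new StringCharacterIterator(s);
        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
            int cp = c;
            if (Character.isHighSurrogate(c)) {
                char lo = it.next();
                if (!Character.isLowSurrogate(lo)) {
                    return false; // unpaired high surrogate
                }
                cp = Character.toCodePoint(c, lo);
            }
            if (!Character.isLetterOrDigit(cp)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String s = "\uD801\uDC00a"; // U+10400 DESERET CAPITAL LETTER LONG I, then 'a'
        System.out.println(allLettersOrDigitsByChar(s));      // false
        System.out.println(allLettersOrDigitsByCodePoint(s)); // true
    }
}
```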

Aggregations

CharacterIterator (java.text.CharacterIterator): 33
StringCharacterIterator (java.text.StringCharacterIterator): 28
Nullable (org.jetbrains.annotations.Nullable): 3
ClsFormatException (com.intellij.util.cls.ClsFormatException): 2
ArrayList (java.util.ArrayList): 2
Logger (com.intellij.openapi.diagnostic.Logger): 1
Pair (com.intellij.openapi.util.Pair): 1
Pair.pair (com.intellij.openapi.util.Pair.pair): 1
StringUtil (com.intellij.openapi.util.text.StringUtil): 1
VirtualFile (com.intellij.openapi.vfs.VirtualFile): 1
LanguageLevel (com.intellij.pom.java.LanguageLevel): 1
CommonClassNames (com.intellij.psi.CommonClassNames): 1
PsiNameHelper (com.intellij.psi.PsiNameHelper): 1
ModifierFlags (com.intellij.psi.impl.cache.ModifierFlags): 1
TypeInfo (com.intellij.psi.impl.cache.TypeInfo): 1
com.intellij.psi.impl.java.stubs (com.intellij.psi.impl.java.stubs): 1
com.intellij.psi.impl.java.stubs.impl (com.intellij.psi.impl.java.stubs.impl): 1
PsiFileStub (com.intellij.psi.stubs.PsiFileStub): 1
StubElement (com.intellij.psi.stubs.StubElement): 1
ArrayUtil (com.intellij.util.ArrayUtil): 1