Search in sources :

Example 36 with ATN

use of org.antlr.v4.runtime.atn.ATN in project antlr4 by antlr.

the class LexerATNSimulator method execATN.

protected int execATN(CharStream input, DFAState ds0) {
    //System.out.println("enter exec index "+input.index()+" from "+ds0.configs);
    if (debug) {
        System.out.format(Locale.getDefault(), "start state closure=%s\n", ds0.configs);
    }
    if (ds0.isAcceptState) {
        // allow zero-length tokens
        captureSimState(prevAccept, input, ds0);
    }
    int t = input.LA(1);
    // s is current/from DFA state
    DFAState s = ds0;
    while (true) {
        // while more work
        if (debug) {
            System.out.format(Locale.getDefault(), "execATN loop starting closure: %s\n", s.configs);
        }
        // As we move src->trg, src->trg, we keep track of the previous trg to
        // avoid looking up the DFA state again, which is expensive.
        // If the previous target was already part of the DFA, we might
        // be able to avoid doing a reach operation upon t. If s!=null,
        // it means that semantic predicates didn't prevent us from
        // creating a DFA state. Once we know s!=null, we check to see if
        // the DFA state has an edge already for t. If so, we can just reuse
        // it's configuration set; there's no point in re-computing it.
        // This is kind of like doing DFA simulation within the ATN
        // simulation because DFA simulation is really just a way to avoid
        // computing reach/closure sets. Technically, once we know that
        // we have a previously added DFA state, we could jump over to
        // the DFA simulator. But, that would mean popping back and forth
        // a lot and making things more complicated algorithmically.
        // This optimization makes a lot of sense for loops within DFA.
        // A character will take us back to an existing DFA state
        // that already has lots of edges out of it. e.g., .* in comments.
        DFAState target = getExistingTargetState(s, t);
        if (target == null) {
            target = computeTargetState(input, s, t);
        }
        if (target == ERROR) {
            break;
        }
        // end of the token.
        if (t != IntStream.EOF) {
            consume(input);
        }
        if (target.isAcceptState) {
            captureSimState(prevAccept, input, target);
            if (t == IntStream.EOF) {
                break;
            }
        }
        t = input.LA(1);
        // flip; current DFA target becomes new src/from state
        s = target;
    }
    return failOrAccept(prevAccept, input, s.configs, t);
}
Also used : DFAState(org.antlr.v4.runtime.dfa.DFAState)

Example 37 with ATN

use of org.antlr.v4.runtime.atn.ATN in project antlr4 by antlr.

the class DefaultErrorStrategy method recover.

/**
	 * {@inheritDoc}
	 *
	 * <p>The default implementation resynchronizes the parser by consuming tokens
	 * until we find one in the resynchronization set--loosely the set of tokens
	 * that can follow the current rule.</p>
	 */
@Override
public void recover(Parser recognizer, RecognitionException e) {
    //						   ", states="+lastErrorStates);
    if (lastErrorIndex == recognizer.getInputStream().index() && lastErrorStates != null && lastErrorStates.contains(recognizer.getState())) {
        // uh oh, another error at same token index and previously-visited
        // state in ATN; must be a case where LT(1) is in the recovery
        // token set so nothing got consumed. Consume a single token
        // at least to prevent an infinite loop; this is a failsafe.
        //			System.err.println("seen error condition before index="+
        //							   lastErrorIndex+", states="+lastErrorStates);
        //			System.err.println("FAILSAFE consumes "+recognizer.getTokenNames()[recognizer.getInputStream().LA(1)]);
        recognizer.consume();
    }
    lastErrorIndex = recognizer.getInputStream().index();
    if (lastErrorStates == null)
        lastErrorStates = new IntervalSet();
    lastErrorStates.add(recognizer.getState());
    IntervalSet followSet = getErrorRecoverySet(recognizer);
    consumeUntil(recognizer, followSet);
}
Also used : IntervalSet(org.antlr.v4.runtime.misc.IntervalSet)

Example 38 with ATN

use of org.antlr.v4.runtime.atn.ATN in project antlr4 by antlr.

the class DefaultErrorStrategy method getErrorRecoverySet.

/*  Compute the error recovery set for the current rule.  During
	 *  rule invocation, the parser pushes the set of tokens that can
	 *  follow that rule reference on the stack; this amounts to
	 *  computing FIRST of what follows the rule reference in the
	 *  enclosing rule. See LinearApproximator.FIRST().
	 *  This local follow set only includes tokens
	 *  from within the rule; i.e., the FIRST computation done by
	 *  ANTLR stops at the end of a rule.
	 *
	 *  EXAMPLE
	 *
	 *  When you find a "no viable alt exception", the input is not
	 *  consistent with any of the alternatives for rule r.  The best
	 *  thing to do is to consume tokens until you see something that
	 *  can legally follow a call to r *or* any rule that called r.
	 *  You don't want the exact set of viable next tokens because the
	 *  input might just be missing a token--you might consume the
	 *  rest of the input looking for one of the missing tokens.
	 *
	 *  Consider grammar:
	 *
	 *  a : '[' b ']'
	 *    | '(' b ')'
	 *    ;
	 *  b : c '^' INT ;
	 *  c : ID
	 *    | INT
	 *    ;
	 *
	 *  At each rule invocation, the set of tokens that could follow
	 *  that rule is pushed on a stack.  Here are the various
	 *  context-sensitive follow sets:
	 *
	 *  FOLLOW(b1_in_a) = FIRST(']') = ']'
	 *  FOLLOW(b2_in_a) = FIRST(')') = ')'
	 *  FOLLOW(c_in_b) = FIRST('^') = '^'
	 *
	 *  Upon erroneous input "[]", the call chain is
	 *
	 *  a -> b -> c
	 *
	 *  and, hence, the follow context stack is:
	 *
	 *  depth     follow set       start of rule execution
	 *    0         <EOF>                    a (from main())
	 *    1          ']'                     b
	 *    2          '^'                     c
	 *
	 *  Notice that ')' is not included, because b would have to have
	 *  been called from a different context in rule a for ')' to be
	 *  included.
	 *
	 *  For error recovery, we cannot consider FOLLOW(c)
	 *  (context-sensitive or otherwise).  We need the combined set of
	 *  all context-sensitive FOLLOW sets--the set of all tokens that
	 *  could follow any reference in the call chain.  We need to
	 *  resync to one of those tokens.  Note that FOLLOW(c)='^' and if
	 *  we resync'd to that token, we'd consume until EOF.  We need to
	 *  sync to context-sensitive FOLLOWs for a, b, and c: {']','^'}.
	 *  In this case, for input "[]", LA(1) is ']' and in the set, so we would
	 *  not consume anything. After printing an error, rule c would
	 *  return normally.  Rule b would not find the required '^' though.
	 *  At this point, it gets a mismatched token error and throws an
	 *  exception (since LA(1) is not in the viable following token
	 *  set).  The rule exception handler tries to recover, but finds
	 *  the same recovery set and doesn't consume anything.  Rule b
	 *  exits normally returning to rule a.  Now it finds the ']' (and
	 *  with the successful match exits errorRecovery mode).
	 *
	 *  So, you can see that the parser walks up the call chain looking
	 *  for the token that was a member of the recovery set.
	 *
	 *  Errors are not generated in errorRecovery mode.
	 *
	 *  ANTLR's error recovery mechanism is based upon original ideas:
	 *
	 *  "Algorithms + Data Structures = Programs" by Niklaus Wirth
	 *
	 *  and
	 *
	 *  "A note on error recovery in recursive descent parsers":
	 *  http://portal.acm.org/citation.cfm?id=947902.947905
	 *
	 *  Later, Josef Grosch had some good ideas:
	 *
	 *  "Efficient and Comfortable Error Recovery in Recursive Descent
	 *  Parsers":
	 *  ftp://www.cocolab.com/products/cocktail/doca4.ps/ell.ps.zip
	 *
	 *  Like Grosch I implement context-sensitive FOLLOW sets that are combined
	 *  at run-time upon error to avoid overhead during parsing.
	 */
protected IntervalSet getErrorRecoverySet(Parser recognizer) {
    ATN atn = recognizer.getInterpreter().atn;
    RuleContext ctx = recognizer._ctx;
    IntervalSet recoverSet = new IntervalSet();
    while (ctx != null && ctx.invokingState >= 0) {
        // compute what follows who invoked us
        ATNState invokingState = atn.states.get(ctx.invokingState);
        RuleTransition rt = (RuleTransition) invokingState.transition(0);
        IntervalSet follow = atn.nextTokens(rt.followState);
        recoverSet.addAll(follow);
        ctx = ctx.parent;
    }
    recoverSet.remove(Token.EPSILON);
    //		System.out.println("recover set "+recoverSet.toString(recognizer.getTokenNames()));
    return recoverSet;
}
Also used : IntervalSet(org.antlr.v4.runtime.misc.IntervalSet) RuleTransition(org.antlr.v4.runtime.atn.RuleTransition) ATN(org.antlr.v4.runtime.atn.ATN) ATNState(org.antlr.v4.runtime.atn.ATNState)

Example 39 with ATN

use of org.antlr.v4.runtime.atn.ATN in project antlr4 by antlr.

the class DefaultErrorStrategy method sync.

/**
	 * The default implementation of {@link ANTLRErrorStrategy#sync} makes sure
	 * that the current lookahead symbol is consistent with what were expecting
	 * at this point in the ATN. You can call this anytime but ANTLR only
	 * generates code to check before subrules/loops and each iteration.
	 *
	 * <p>Implements Jim Idle's magic sync mechanism in closures and optional
	 * subrules. E.g.,</p>
	 *
	 * <pre>
	 * a : sync ( stuff sync )* ;
	 * sync : {consume to what can follow sync} ;
	 * </pre>
	 *
	 * At the start of a sub rule upon error, {@link #sync} performs single
	 * token deletion, if possible. If it can't do that, it bails on the current
	 * rule and uses the default error recovery, which consumes until the
	 * resynchronization set of the current rule.
	 *
	 * <p>If the sub rule is optional ({@code (...)?}, {@code (...)*}, or block
	 * with an empty alternative), then the expected set includes what follows
	 * the subrule.</p>
	 *
	 * <p>During loop iteration, it consumes until it sees a token that can start a
	 * sub rule or what follows loop. Yes, that is pretty aggressive. We opt to
	 * stay in the loop as long as possible.</p>
	 *
	 * <p><strong>ORIGINS</strong></p>
	 *
	 * <p>Previous versions of ANTLR did a poor job of their recovery within loops.
	 * A single mismatch token or missing token would force the parser to bail
	 * out of the entire rules surrounding the loop. So, for rule</p>
	 *
	 * <pre>
	 * classDef : 'class' ID '{' member* '}'
	 * </pre>
	 *
	 * input with an extra token between members would force the parser to
	 * consume until it found the next class definition rather than the next
	 * member definition of the current class.
	 *
	 * <p>This functionality cost a little bit of effort because the parser has to
	 * compare token set at the start of the loop and at each iteration. If for
	 * some reason speed is suffering for you, you can turn off this
	 * functionality by simply overriding this method as a blank { }.</p>
	 */
@Override
public void sync(Parser recognizer) throws RecognitionException {
    ATNState s = recognizer.getInterpreter().atn.states.get(recognizer.getState());
    // If already recovering, don't try to sync
    if (inErrorRecoveryMode(recognizer)) {
        return;
    }
    TokenStream tokens = recognizer.getInputStream();
    int la = tokens.LA(1);
    // try cheaper subset first; might get lucky. seems to shave a wee bit off
    IntervalSet nextTokens = recognizer.getATN().nextTokens(s);
    if (nextTokens.contains(Token.EPSILON) || nextTokens.contains(la)) {
        return;
    }
    switch(s.getStateType()) {
        case ATNState.BLOCK_START:
        case ATNState.STAR_BLOCK_START:
        case ATNState.PLUS_BLOCK_START:
        case ATNState.STAR_LOOP_ENTRY:
            // report error and recover if possible
            if (singleTokenDeletion(recognizer) != null) {
                return;
            }
            throw new InputMismatchException(recognizer);
        case ATNState.PLUS_LOOP_BACK:
        case ATNState.STAR_LOOP_BACK:
            //			System.err.println("at loop back: "+s.getClass().getSimpleName());
            reportUnwantedToken(recognizer);
            IntervalSet expecting = recognizer.getExpectedTokens();
            IntervalSet whatFollowsLoopIterationOrRule = expecting.or(getErrorRecoverySet(recognizer));
            consumeUntil(recognizer, whatFollowsLoopIterationOrRule);
            break;
        default:
            // do nothing if we can't identify the exact kind of ATN state
            break;
    }
}
Also used : IntervalSet(org.antlr.v4.runtime.misc.IntervalSet) ATNState(org.antlr.v4.runtime.atn.ATNState)

Example 40 with ATN

use of org.antlr.v4.runtime.atn.ATN in project antlr4 by antlr.

the class Parser method isExpectedToken.

/**
	 * Checks whether or not {@code symbol} can follow the current state in the
	 * ATN. The behavior of this method is equivalent to the following, but is
	 * implemented such that the complete context-sensitive follow set does not
	 * need to be explicitly constructed.
	 *
	 * <pre>
	 * return getExpectedTokens().contains(symbol);
	 * </pre>
	 *
	 * @param symbol the symbol type to check
	 * @return {@code true} if {@code symbol} can follow the current state in
	 * the ATN, otherwise {@code false}.
	 */
public boolean isExpectedToken(int symbol) {
    //   		return getInterpreter().atn.nextTokens(_ctx);
    ATN atn = getInterpreter().atn;
    ParserRuleContext ctx = _ctx;
    ATNState s = atn.states.get(getState());
    IntervalSet following = atn.nextTokens(s);
    if (following.contains(symbol)) {
        return true;
    }
    //        System.out.println("following "+s+"="+following);
    if (!following.contains(Token.EPSILON))
        return false;
    while (ctx != null && ctx.invokingState >= 0 && following.contains(Token.EPSILON)) {
        ATNState invokingState = atn.states.get(ctx.invokingState);
        RuleTransition rt = (RuleTransition) invokingState.transition(0);
        following = atn.nextTokens(rt.followState);
        if (following.contains(symbol)) {
            return true;
        }
        ctx = (ParserRuleContext) ctx.parent;
    }
    if (following.contains(Token.EPSILON) && symbol == Token.EOF) {
        return true;
    }
    return false;
}
Also used : IntervalSet(org.antlr.v4.runtime.misc.IntervalSet) RuleTransition(org.antlr.v4.runtime.atn.RuleTransition) ATN(org.antlr.v4.runtime.atn.ATN) ATNState(org.antlr.v4.runtime.atn.ATNState)

Aggregations

ATN (org.antlr.v4.runtime.atn.ATN)73 LexerGrammar (org.antlr.v4.tool.LexerGrammar)48 Test (org.junit.Test)41 ATNState (org.antlr.v4.runtime.atn.ATNState)23 Grammar (org.antlr.v4.tool.Grammar)20 IntervalSet (org.antlr.v4.runtime.misc.IntervalSet)18 ParserATNFactory (org.antlr.v4.automata.ParserATNFactory)16 ArrayList (java.util.ArrayList)13 DFA (org.antlr.v4.runtime.dfa.DFA)13 ATNDeserializer (org.antlr.v4.runtime.atn.ATNDeserializer)12 LexerATNSimulator (org.antlr.v4.runtime.atn.LexerATNSimulator)11 STGroupString (org.stringtemplate.v4.STGroupString)11 DecisionState (org.antlr.v4.runtime.atn.DecisionState)10 BaseRuntimeTest.antlrOnString (org.antlr.v4.test.runtime.BaseRuntimeTest.antlrOnString)10 Rule (org.antlr.v4.tool.Rule)10 DOTGenerator (org.antlr.v4.tool.DOTGenerator)9 LexerATNFactory (org.antlr.v4.automata.LexerATNFactory)8 IntegerList (org.antlr.v4.runtime.misc.IntegerList)6 ErrorQueue (org.antlr.v4.test.runtime.ErrorQueue)6 BitSet (java.util.BitSet)5