use of org.antlr.v4.runtime.atn.ATNState in project antlr4 by antlr.
the class Parser method isExpectedToken.
/**
* Checks whether or not {@code symbol} can follow the current state in the
* ATN. The behavior of this method is equivalent to the following, but is
* implemented such that the complete context-sensitive follow set does not
* need to be explicitly constructed.
*
* <pre>
* return getExpectedTokens().contains(symbol);
* </pre>
*
* @param symbol the symbol type to check
* @return {@code true} if {@code symbol} can follow the current state in
* the ATN, otherwise {@code false}.
*/
public boolean isExpectedToken(int symbol) {
// return getInterpreter().atn.nextTokens(_ctx);
ATN atn = getInterpreter().atn;
ParserRuleContext ctx = _ctx;
ATNState s = atn.states.get(getState());
IntervalSet following = atn.nextTokens(s);
if (following.contains(symbol)) {
return true;
}
// System.out.println("following "+s+"="+following);
if (!following.contains(Token.EPSILON))
return false;
while (ctx != null && ctx.invokingState >= 0 && following.contains(Token.EPSILON)) {
ATNState invokingState = atn.states.get(ctx.invokingState);
RuleTransition rt = (RuleTransition) invokingState.transition(0);
following = atn.nextTokens(rt.followState);
if (following.contains(symbol)) {
return true;
}
ctx = (ParserRuleContext) ctx.parent;
}
if (following.contains(Token.EPSILON) && symbol == Token.EOF) {
return true;
}
return false;
}
use of org.antlr.v4.runtime.atn.ATNState in project antlr4 by antlr.
the class DefaultErrorStrategy method getErrorRecoverySet.
/* Compute the error recovery set for the current rule. During
* rule invocation, the parser pushes the set of tokens that can
* follow that rule reference on the stack; this amounts to
* computing FIRST of what follows the rule reference in the
* enclosing rule. See LinearApproximator.FIRST().
* This local follow set only includes tokens
* from within the rule; i.e., the FIRST computation done by
* ANTLR stops at the end of a rule.
*
* EXAMPLE
*
* When you find a "no viable alt exception", the input is not
* consistent with any of the alternatives for rule r. The best
* thing to do is to consume tokens until you see something that
* can legally follow a call to r *or* any rule that called r.
* You don't want the exact set of viable next tokens because the
* input might just be missing a token--you might consume the
* rest of the input looking for one of the missing tokens.
*
* Consider grammar:
*
* a : '[' b ']'
* | '(' b ')'
* ;
* b : c '^' INT ;
* c : ID
* | INT
* ;
*
* At each rule invocation, the set of tokens that could follow
* that rule is pushed on a stack. Here are the various
* context-sensitive follow sets:
*
* FOLLOW(b1_in_a) = FIRST(']') = ']'
* FOLLOW(b2_in_a) = FIRST(')') = ')'
* FOLLOW(c_in_b) = FIRST('^') = '^'
*
* Upon erroneous input "[]", the call chain is
*
* a -> b -> c
*
* and, hence, the follow context stack is:
*
* depth follow set start of rule execution
* 0 <EOF> a (from main())
* 1 ']' b
* 2 '^' c
*
* Notice that ')' is not included, because b would have to have
* been called from a different context in rule a for ')' to be
* included.
*
* For error recovery, we cannot consider FOLLOW(c)
* (context-sensitive or otherwise). We need the combined set of
* all context-sensitive FOLLOW sets--the set of all tokens that
* could follow any reference in the call chain. We need to
* resync to one of those tokens. Note that FOLLOW(c)='^' and if
* we resync'd to that token, we'd consume until EOF. We need to
* sync to context-sensitive FOLLOWs for a, b, and c: {']','^'}.
* In this case, for input "[]", LA(1) is ']' and in the set, so we would
* not consume anything. After printing an error, rule c would
* return normally. Rule b would not find the required '^' though.
* At this point, it gets a mismatched token error and throws an
* exception (since LA(1) is not in the viable following token
* set). The rule exception handler tries to recover, but finds
* the same recovery set and doesn't consume anything. Rule b
* exits normally returning to rule a. Now it finds the ']' (and
* with the successful match exits errorRecovery mode).
*
* So, you can see that the parser walks up the call chain looking
* for the token that was a member of the recovery set.
*
* Errors are not generated in errorRecovery mode.
*
* ANTLR's error recovery mechanism is based upon original ideas:
*
* "Algorithms + Data Structures = Programs" by Niklaus Wirth
*
* and
*
* "A note on error recovery in recursive descent parsers":
* http://portal.acm.org/citation.cfm?id=947902.947905
*
* Later, Josef Grosch had some good ideas:
*
* "Efficient and Comfortable Error Recovery in Recursive Descent
* Parsers":
* ftp://www.cocolab.com/products/cocktail/doca4.ps/ell.ps.zip
*
* Like Grosch I implement context-sensitive FOLLOW sets that are combined
* at run-time upon error to avoid overhead during parsing.
*/
protected IntervalSet getErrorRecoverySet(Parser recognizer) {
ATN atn = recognizer.getInterpreter().atn;
RuleContext ctx = recognizer._ctx;
IntervalSet recoverSet = new IntervalSet();
while (ctx != null && ctx.invokingState >= 0) {
// compute what follows who invoked us
ATNState invokingState = atn.states.get(ctx.invokingState);
RuleTransition rt = (RuleTransition) invokingState.transition(0);
IntervalSet follow = atn.nextTokens(rt.followState);
recoverSet.addAll(follow);
ctx = ctx.parent;
}
recoverSet.remove(Token.EPSILON);
// System.out.println("recover set "+recoverSet.toString(recognizer.getTokenNames()));
return recoverSet;
}
use of org.antlr.v4.runtime.atn.ATNState in project antlr4 by antlr.
the class DefaultErrorStrategy method sync.
/**
* The default implementation of {@link ANTLRErrorStrategy#sync} makes sure
* that the current lookahead symbol is consistent with what were expecting
* at this point in the ATN. You can call this anytime but ANTLR only
* generates code to check before subrules/loops and each iteration.
*
* <p>Implements Jim Idle's magic sync mechanism in closures and optional
* subrules. E.g.,</p>
*
* <pre>
* a : sync ( stuff sync )* ;
* sync : {consume to what can follow sync} ;
* </pre>
*
* At the start of a sub rule upon error, {@link #sync} performs single
* token deletion, if possible. If it can't do that, it bails on the current
* rule and uses the default error recovery, which consumes until the
* resynchronization set of the current rule.
*
* <p>If the sub rule is optional ({@code (...)?}, {@code (...)*}, or block
* with an empty alternative), then the expected set includes what follows
* the subrule.</p>
*
* <p>During loop iteration, it consumes until it sees a token that can start a
* sub rule or what follows loop. Yes, that is pretty aggressive. We opt to
* stay in the loop as long as possible.</p>
*
* <p><strong>ORIGINS</strong></p>
*
* <p>Previous versions of ANTLR did a poor job of their recovery within loops.
* A single mismatch token or missing token would force the parser to bail
* out of the entire rules surrounding the loop. So, for rule</p>
*
* <pre>
* classDef : 'class' ID '{' member* '}'
* </pre>
*
* input with an extra token between members would force the parser to
* consume until it found the next class definition rather than the next
* member definition of the current class.
*
* <p>This functionality cost a little bit of effort because the parser has to
* compare token set at the start of the loop and at each iteration. If for
* some reason speed is suffering for you, you can turn off this
* functionality by simply overriding this method as a blank { }.</p>
*/
@Override
public void sync(Parser recognizer) throws RecognitionException {
ATNState s = recognizer.getInterpreter().atn.states.get(recognizer.getState());
// If already recovering, don't try to sync
if (inErrorRecoveryMode(recognizer)) {
return;
}
TokenStream tokens = recognizer.getInputStream();
int la = tokens.LA(1);
// try cheaper subset first; might get lucky. seems to shave a wee bit off
IntervalSet nextTokens = recognizer.getATN().nextTokens(s);
if (nextTokens.contains(la)) {
// We are sure the token matches
nextTokensContext = null;
nextTokensState = ATNState.INVALID_STATE_NUMBER;
return;
}
if (nextTokens.contains(Token.EPSILON)) {
if (nextTokensContext == null) {
// It's possible the next token won't match; information tracked
// by sync is restricted for performance.
nextTokensContext = recognizer.getContext();
nextTokensState = recognizer.getState();
}
return;
}
switch(s.getStateType()) {
case ATNState.BLOCK_START:
case ATNState.STAR_BLOCK_START:
case ATNState.PLUS_BLOCK_START:
case ATNState.STAR_LOOP_ENTRY:
// report error and recover if possible
if (singleTokenDeletion(recognizer) != null) {
return;
}
throw new InputMismatchException(recognizer);
case ATNState.PLUS_LOOP_BACK:
case ATNState.STAR_LOOP_BACK:
// System.err.println("at loop back: "+s.getClass().getSimpleName());
reportUnwantedToken(recognizer);
IntervalSet expecting = recognizer.getExpectedTokens();
IntervalSet whatFollowsLoopIterationOrRule = expecting.or(getErrorRecoverySet(recognizer));
consumeUntil(recognizer, whatFollowsLoopIterationOrRule);
break;
default:
// do nothing if we can't identify the exact kind of ATN state
break;
}
}
use of org.antlr.v4.runtime.atn.ATNState in project antlr4 by antlr.
the class TestATNInterpreter method checkMatchedAlt.
public void checkMatchedAlt(LexerGrammar lg, final Grammar g, String inputString, int expected) {
ATN lexatn = createATN(lg, true);
LexerATNSimulator lexInterp = new LexerATNSimulator(lexatn, new DFA[] { new DFA(lexatn.modeToStartState.get(Lexer.DEFAULT_MODE)) }, null);
IntegerList types = getTokenTypesViaATN(inputString, lexInterp);
// System.out.println(types);
g.importVocab(lg);
ParserATNFactory f = new ParserATNFactory(g);
ATN atn = f.createATN();
TokenStream input = new MockIntTokenStream(types);
// System.out.println("input="+input.types);
ParserInterpreterForTesting interp = new ParserInterpreterForTesting(g, input);
ATNState startState = atn.ruleToStartState[g.getRule("a").index];
if (startState.transition(0).target instanceof BlockStartState) {
startState = startState.transition(0).target;
}
DOTGenerator dot = new DOTGenerator(g);
// System.out.println(dot.getDOT(atn.ruleToStartState[g.getRule("a").index]));
Rule r = g.getRule("e");
// if ( r!=null ) System.out.println(dot.getDOT(atn.ruleToStartState[r.index]));
int result = interp.matchATN(input, startState);
assertEquals(expected, result);
}
use of org.antlr.v4.runtime.atn.ATNState in project antlr4 by antlr.
the class ParserATNFactory method star.
/**
* From {@code (blk)*} build {@code ( blk+ )?} with *two* decisions, one for
* entry and one for choosing alts of {@code blk}.
*
* <pre>
* |-------------|
* v |
* o--[o-blk-o]->o o
* | ^
* -----------------|
* </pre>
*
* Note that the optional bypass must jump outside the loop as
* {@code (A|B)*} is not the same thing as {@code (A|B|)+}.
*/
@Override
public Handle star(GrammarAST starAST, Handle elem) {
StarBlockStartState blkStart = (StarBlockStartState) elem.left;
BlockEndState blkEnd = (BlockEndState) elem.right;
preventEpsilonClosureBlocks.add(new Triple<Rule, ATNState, ATNState>(currentRule, blkStart, blkEnd));
StarLoopEntryState entry = newState(StarLoopEntryState.class, starAST);
entry.nonGreedy = !((QuantifierAST) starAST).isGreedy();
atn.defineDecisionState(entry);
LoopEndState end = newState(LoopEndState.class, starAST);
StarLoopbackState loop = newState(StarLoopbackState.class, starAST);
entry.loopBackState = loop;
end.loopBackState = loop;
BlockAST blkAST = (BlockAST) starAST.getChild(0);
if (((QuantifierAST) starAST).isGreedy()) {
if (expectNonGreedy(blkAST)) {
g.tool.errMgr.grammarError(ErrorType.EXPECTED_NON_GREEDY_WILDCARD_BLOCK, g.fileName, starAST.getToken(), starAST.getToken().getText());
}
// loop enter edge (alt 1)
epsilon(entry, blkStart);
// bypass loop edge (alt 2)
epsilon(entry, end);
} else {
// if not greedy, priority to exit branch; make it first
// bypass loop edge (alt 1)
epsilon(entry, end);
// loop enter edge (alt 2)
epsilon(entry, blkStart);
}
// block end hits loop back
epsilon(blkEnd, loop);
// loop back to entry/exit decision
epsilon(loop, entry);
// decision is to enter/exit; blk is its own decision
starAST.atnState = entry;
return new Handle(entry, end);
}
Aggregations