use of org.antlr.v4.runtime.misc.IntervalSet in project antlr4 by antlr.
the class DefaultErrorStrategy method getMissingSymbol.
/** Conjure up a missing token during error recovery.
*
* The recognizer attempts to recover from single missing
* symbols. But, actions might refer to that missing symbol.
* For example, x=ID {f($x);}. The action clearly assumes
* that there has been an identifier matched previously and that
* $x points at that token. If that token is missing, but
* the next token in the stream is what we want we assume that
* this token is missing and we keep going. Because we
* have to return some token to replace the missing token,
* we have to conjure one up. This method gives the user control
* over the tokens returned for missing tokens. Mostly,
* you will want to create something special for identifier
* tokens. For literals such as '{' and ',', the default
* action in the parser or tree parser works. It simply creates
* a CommonToken of the appropriate type. The text will be the token.
* If you change what tokens must be created by the lexer,
* override this method to create the appropriate tokens.
*/
protected Token getMissingSymbol(Parser recognizer) {
Token currentSymbol = recognizer.getCurrentToken();
IntervalSet expecting = getExpectedTokens(recognizer);
int expectedTokenType = Token.INVALID_TYPE;
if (!expecting.isNil()) {
// get any element
expectedTokenType = expecting.getMinElement();
}
String tokenText;
if (expectedTokenType == Token.EOF)
tokenText = "<missing EOF>";
else
tokenText = "<missing " + recognizer.getVocabulary().getDisplayName(expectedTokenType) + ">";
Token current = currentSymbol;
Token lookback = recognizer.getInputStream().LT(-1);
if (current.getType() == Token.EOF && lookback != null) {
current = lookback;
}
return recognizer.getTokenFactory().create(new Pair<TokenSource, CharStream>(current.getTokenSource(), current.getTokenSource().getInputStream()), expectedTokenType, tokenText, Token.DEFAULT_CHANNEL, -1, -1, current.getLine(), current.getCharPositionInLine());
}
use of org.antlr.v4.runtime.misc.IntervalSet in project antlr4 by antlr.
the class DefaultErrorStrategy method sync.
/**
* The default implementation of {@link ANTLRErrorStrategy#sync} makes sure
* that the current lookahead symbol is consistent with what were expecting
* at this point in the ATN. You can call this anytime but ANTLR only
* generates code to check before subrules/loops and each iteration.
*
* <p>Implements Jim Idle's magic sync mechanism in closures and optional
* subrules. E.g.,</p>
*
* <pre>
* a : sync ( stuff sync )* ;
* sync : {consume to what can follow sync} ;
* </pre>
*
* At the start of a sub rule upon error, {@link #sync} performs single
* token deletion, if possible. If it can't do that, it bails on the current
* rule and uses the default error recovery, which consumes until the
* resynchronization set of the current rule.
*
* <p>If the sub rule is optional ({@code (...)?}, {@code (...)*}, or block
* with an empty alternative), then the expected set includes what follows
* the subrule.</p>
*
* <p>During loop iteration, it consumes until it sees a token that can start a
* sub rule or what follows loop. Yes, that is pretty aggressive. We opt to
* stay in the loop as long as possible.</p>
*
* <p><strong>ORIGINS</strong></p>
*
* <p>Previous versions of ANTLR did a poor job of their recovery within loops.
* A single mismatch token or missing token would force the parser to bail
* out of the entire rules surrounding the loop. So, for rule</p>
*
* <pre>
* classDef : 'class' ID '{' member* '}'
* </pre>
*
* input with an extra token between members would force the parser to
* consume until it found the next class definition rather than the next
* member definition of the current class.
*
* <p>This functionality cost a little bit of effort because the parser has to
* compare token set at the start of the loop and at each iteration. If for
* some reason speed is suffering for you, you can turn off this
* functionality by simply overriding this method as a blank { }.</p>
*/
@Override
public void sync(Parser recognizer) throws RecognitionException {
ATNState s = recognizer.getInterpreter().atn.states.get(recognizer.getState());
// If already recovering, don't try to sync
if (inErrorRecoveryMode(recognizer)) {
return;
}
TokenStream tokens = recognizer.getInputStream();
int la = tokens.LA(1);
// try cheaper subset first; might get lucky. seems to shave a wee bit off
IntervalSet nextTokens = recognizer.getATN().nextTokens(s);
if (nextTokens.contains(Token.EPSILON) || nextTokens.contains(la)) {
return;
}
switch(s.getStateType()) {
case ATNState.BLOCK_START:
case ATNState.STAR_BLOCK_START:
case ATNState.PLUS_BLOCK_START:
case ATNState.STAR_LOOP_ENTRY:
// report error and recover if possible
if (singleTokenDeletion(recognizer) != null) {
return;
}
throw new InputMismatchException(recognizer);
case ATNState.PLUS_LOOP_BACK:
case ATNState.STAR_LOOP_BACK:
// System.err.println("at loop back: "+s.getClass().getSimpleName());
reportUnwantedToken(recognizer);
IntervalSet expecting = recognizer.getExpectedTokens();
IntervalSet whatFollowsLoopIterationOrRule = expecting.or(getErrorRecoverySet(recognizer));
consumeUntil(recognizer, whatFollowsLoopIterationOrRule);
break;
default:
// do nothing if we can't identify the exact kind of ATN state
break;
}
}
use of org.antlr.v4.runtime.misc.IntervalSet in project antlr4 by antlr.
the class Parser method isExpectedToken.
/**
* Checks whether or not {@code symbol} can follow the current state in the
* ATN. The behavior of this method is equivalent to the following, but is
* implemented such that the complete context-sensitive follow set does not
* need to be explicitly constructed.
*
* <pre>
* return getExpectedTokens().contains(symbol);
* </pre>
*
* @param symbol the symbol type to check
* @return {@code true} if {@code symbol} can follow the current state in
* the ATN, otherwise {@code false}.
*/
public boolean isExpectedToken(int symbol) {
// return getInterpreter().atn.nextTokens(_ctx);
ATN atn = getInterpreter().atn;
ParserRuleContext ctx = _ctx;
ATNState s = atn.states.get(getState());
IntervalSet following = atn.nextTokens(s);
if (following.contains(symbol)) {
return true;
}
// System.out.println("following "+s+"="+following);
if (!following.contains(Token.EPSILON))
return false;
while (ctx != null && ctx.invokingState >= 0 && following.contains(Token.EPSILON)) {
ATNState invokingState = atn.states.get(ctx.invokingState);
RuleTransition rt = (RuleTransition) invokingState.transition(0);
following = atn.nextTokens(rt.followState);
if (following.contains(symbol)) {
return true;
}
ctx = (ParserRuleContext) ctx.parent;
}
if (following.contains(Token.EPSILON) && symbol == Token.EOF) {
return true;
}
return false;
}
use of org.antlr.v4.runtime.misc.IntervalSet in project antlr4 by antlr.
the class ATN method nextTokens.
/** Compute the set of valid tokens that can occur starting in state {@code s}.
* If {@code ctx} is null, the set of tokens will not include what can follow
* the rule surrounding {@code s}. In other words, the set will be
* restricted to tokens reachable staying within {@code s}'s rule.
*/
public IntervalSet nextTokens(ATNState s, RuleContext ctx) {
LL1Analyzer anal = new LL1Analyzer(this);
IntervalSet next = anal.LOOK(s, ctx);
return next;
}
use of org.antlr.v4.runtime.misc.IntervalSet in project antlr4 by antlr.
the class ATN method getExpectedTokens.
/**
* Computes the set of input symbols which could follow ATN state number
* {@code stateNumber} in the specified full {@code context}. This method
* considers the complete parser context, but does not evaluate semantic
* predicates (i.e. all predicates encountered during the calculation are
* assumed true). If a path in the ATN exists from the starting state to the
* {@link RuleStopState} of the outermost context without matching any
* symbols, {@link Token#EOF} is added to the returned set.
*
* <p>If {@code context} is {@code null}, it is treated as {@link ParserRuleContext#EMPTY}.</p>
*
* Note that this does NOT give you the set of all tokens that could
* appear at a given token position in the input phrase. In other words,
* it does not answer:
*
* "Given a specific partial input phrase, return the set of all tokens
* that can follow the last token in the input phrase."
*
* The big difference is that with just the input, the parser could
* land right in the middle of a lookahead decision. Getting
* all *possible* tokens given a partial input stream is a separate
* computation. See https://github.com/antlr/antlr4/issues/1428
*
* For this function, we are specifying an ATN state and call stack to compute
* what token(s) can come next and specifically: outside of a lookahead decision.
* That is what you want for error reporting and recovery upon parse error.
*
* @param stateNumber the ATN state number
* @param context the full parse context
* @return The set of potentially valid input symbols which could follow the
* specified state in the specified context.
* @throws IllegalArgumentException if the ATN does not contain a state with
* number {@code stateNumber}
*/
public IntervalSet getExpectedTokens(int stateNumber, RuleContext context) {
if (stateNumber < 0 || stateNumber >= states.size()) {
throw new IllegalArgumentException("Invalid state number.");
}
RuleContext ctx = context;
ATNState s = states.get(stateNumber);
IntervalSet following = nextTokens(s);
if (!following.contains(Token.EPSILON)) {
return following;
}
IntervalSet expected = new IntervalSet();
expected.addAll(following);
expected.remove(Token.EPSILON);
while (ctx != null && ctx.invokingState >= 0 && following.contains(Token.EPSILON)) {
ATNState invokingState = states.get(ctx.invokingState);
RuleTransition rt = (RuleTransition) invokingState.transition(0);
following = nextTokens(rt.followState);
expected.addAll(following);
expected.remove(Token.EPSILON);
ctx = ctx.parent;
}
if (following.contains(Token.EPSILON)) {
expected.add(Token.EOF);
}
return expected;
}
Aggregations