Example 21 with Recognizer

use of org.antlr.v4.runtime.Recognizer in project antlr4 by tunnelvisionlabs.

the class DefaultErrorStrategy method reportMissingToken.

/**
 * This method is called to report a syntax error which requires the
 * insertion of a missing token into the input stream. At the time this
 * method is called, the missing token has not yet been inserted. When this
 * method returns, {@code recognizer} is in error recovery mode.
 *
 * <p>This method is called when {@link #singleTokenInsertion} identifies
 * single-token insertion as a viable recovery strategy for a mismatched
 * input error.</p>
 *
 * <p>The default implementation simply returns if the handler is already in
 * error recovery mode. Otherwise, it calls {@link #beginErrorCondition} to
 * enter error recovery mode, followed by calling
 * {@link Parser#notifyErrorListeners}.</p>
 *
 * @param recognizer the parser instance
 */
protected void reportMissingToken(@NotNull Parser recognizer) {
    if (inErrorRecoveryMode(recognizer)) {
        return;
    }
    beginErrorCondition(recognizer);
    Token t = recognizer.getCurrentToken();
    IntervalSet expecting = getExpectedTokens(recognizer);
    String msg = "missing " + expecting.toString(recognizer.getVocabulary()) + " at " + getTokenErrorDisplay(t);
    recognizer.notifyErrorListeners(t, msg, null);
}
Also used : IntervalSet(org.antlr.v4.runtime.misc.IntervalSet)
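
The guard described in the Javadoc above (report once, then stay silent while in error recovery mode) can be sketched without the ANTLR runtime. The class below is a hypothetical stand-in, not the real DefaultErrorStrategy: the field `errorRecoveryMode`, the `messages` list, and both method signatures are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an error strategy; this is not the ANTLR class.
final class MissingTokenReportSketch {
    private boolean errorRecoveryMode = false;
    final List<String> messages = new ArrayList<>();

    // Mirrors the guard in reportMissingToken: report once, then stay silent
    // until a successful match ends error recovery mode.
    void reportMissingToken(String expected, String found) {
        if (errorRecoveryMode) {
            return;
        }
        errorRecoveryMode = true; // plays the role of beginErrorCondition
        messages.add("missing " + expected + " at " + found);
    }

    // Plays the role of a successful match that ends the error condition.
    void reportMatch() {
        errorRecoveryMode = false;
    }
}
```

In this sketch a second call made before `reportMatch()` adds no message, which is the behaviour that keeps one syntax error from producing a cascade of reports.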

Example 22 with Recognizer

use of org.antlr.v4.runtime.Recognizer in project antlr4 by tunnelvisionlabs.

the class DefaultErrorStrategy method singleTokenDeletion.

/**
 * This method implements the single-token deletion inline error recovery
 * strategy. It is called by {@link #recoverInline} to attempt to recover
 * from mismatched input. If this method returns null, the parser and error
 * handler state will not have changed. If this method returns non-null,
 * {@code recognizer} will <em>not</em> be in error recovery mode since the
 * returned token was a successful match.
 *
 * <p>If the single-token deletion is successful, this method calls
 * {@link #reportUnwantedToken} to report the error, followed by
 * {@link Parser#consume} to actually "delete" the extraneous token. Then,
 * before returning {@link #reportMatch} is called to signal a successful
 * match.</p>
 *
 * @param recognizer the parser instance
 * @return the successfully matched {@link Token} instance if single-token
 * deletion successfully recovers from the mismatched input, otherwise
 * {@code null}
 */
@Nullable
protected Token singleTokenDeletion(@NotNull Parser recognizer) {
    int nextTokenType = recognizer.getInputStream().LA(2);
    IntervalSet expecting = getExpectedTokens(recognizer);
    if (expecting.contains(nextTokenType)) {
        reportUnwantedToken(recognizer);
        /*
        System.err.println("recoverFromMismatchedToken deleting "+
                           ((TokenStream)recognizer.getInputStream()).LT(1)+
                           " since "+((TokenStream)recognizer.getInputStream()).LT(2)+
                           " is what we want");
        */
        // simply delete extra token
        recognizer.consume();
        // we want to return the token we're actually matching
        Token matchedSymbol = recognizer.getCurrentToken();
        // we know current token is correct
        reportMatch(recognizer);
        return matchedSymbol;
    }
    return null;
}
Also used : IntervalSet(org.antlr.v4.runtime.misc.IntervalSet) Nullable(org.antlr.v4.runtime.misc.Nullable)
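
The core decision in single-token deletion is a one-token lookahead: if the token after the current one is expected, the current one is treated as extraneous. The following is a minimal sketch over plain integer token types, assuming a list-backed token stream; the class name, `NO_RECOVERY`, and the method signature are hypothetical and not part of the ANTLR API.

```java
import java.util.List;
import java.util.Set;

// Toy model over integer token types; names are hypothetical, not ANTLR API.
final class SingleTokenDeletionSketch {
    static final int NO_RECOVERY = -1;

    // Returns the index of the token to match after deleting tokens[pos],
    // or NO_RECOVERY if the token after the current one is not expected.
    static int singleTokenDeletion(List<Integer> tokens, int pos, Set<Integer> expecting) {
        int next = pos + 1; // corresponds to LA(2) in the method above
        if (next < tokens.size() && expecting.contains(tokens.get(next))) {
            return next;    // skip the extraneous token at pos
        }
        return NO_RECOVERY; // nothing changed; the caller tries another strategy
    }
}
```

Returning a sentinel instead of mutating state reflects the contract stated in the Javadoc: when deletion does not apply, parser and handler state are left untouched.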

Example 23 with Recognizer

use of org.antlr.v4.runtime.Recognizer in project antlr4 by tunnelvisionlabs.

the class DefaultErrorStrategy method getMissingSymbol.

/**
 * Conjure up a missing token during error recovery.
 *
 *  The recognizer attempts to recover from single missing
 *  symbols. But, actions might refer to that missing symbol.
 *  For example, x=ID {f($x);}. The action clearly assumes
 *  that there has been an identifier matched previously and that
 *  $x points at that token. If that token is missing, but
 *  the next token in the stream is what we want we assume that
 *  this token is missing and we keep going. Because we
 *  have to return some token to replace the missing token,
 *  we have to conjure one up. This method gives the user control
 *  over the tokens returned for missing tokens. Mostly,
 *  you will want to create something special for identifier
 *  tokens. For literals such as '{' and ',', the default
 *  action in the parser or tree parser works. It simply creates
 *  a CommonToken of the appropriate type. The text will be the token.
 *  If you change what tokens must be created by the lexer,
 *  override this method to create the appropriate tokens.
 */
@NotNull
protected Token getMissingSymbol(@NotNull Parser recognizer) {
    Token currentSymbol = recognizer.getCurrentToken();
    IntervalSet expecting = getExpectedTokens(recognizer);
    int expectedTokenType = Token.INVALID_TYPE;
    if (!expecting.isNil()) {
        // get any element
        expectedTokenType = expecting.getMinElement();
    }
    String tokenText;
    if (expectedTokenType == Token.EOF) {
        tokenText = "<missing EOF>";
    } else {
        tokenText = "<missing " + recognizer.getVocabulary().getDisplayName(expectedTokenType) + ">";
    }
    Token current = currentSymbol;
    Token lookback = recognizer.getInputStream().LT(-1);
    if (current.getType() == Token.EOF && lookback != null) {
        current = lookback;
    }
    return constructToken(recognizer.getInputStream().getTokenSource(), expectedTokenType, tokenText, current);
}
Also used : IntervalSet(org.antlr.v4.runtime.misc.IntervalSet) NotNull(org.antlr.v4.runtime.misc.NotNull)
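
To make the "conjure up a token" step concrete, here is a small self-contained sketch. It assumes a hypothetical three-field token (`Tok`) and assumes that the conjured token borrows its line from an anchor token; the real method builds its token through `constructToken`, whose details are not shown in this example.

```java
// Hypothetical minimal model; the real runtime builds a full Token instead.
final class MissingSymbolSketch {
    static final int EOF = -1;

    static final class Tok {
        final int type;
        final String text;
        final int line;

        Tok(int type, String text, int line) {
            this.type = type;
            this.text = text;
            this.line = line;
        }
    }

    // displayName is an assumed, already resolved name for expectedType.
    static Tok missingSymbol(int expectedType, String displayName, Tok current, Tok previous) {
        String text = expectedType == EOF
                ? "<missing EOF>"
                : "<missing " + displayName + ">";
        // At end of input, anchor the conjured token on the last real token.
        Tok anchor = (current.type == EOF && previous != null) ? previous : current;
        return new Tok(expectedType, text, anchor.line);
    }
}
```

Anchoring on the previous token at EOF mirrors the `LT(-1)` lookback in the method above, so the placeholder points at a position the user can actually see in the input.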

Example 24 with Recognizer

use of org.antlr.v4.runtime.Recognizer in project antlr4 by tunnelvisionlabs.

the class DefaultErrorStrategy method getErrorRecoverySet.

/*  Compute the error recovery set for the current rule.  During
	 *  rule invocation, the parser pushes the set of tokens that can
	 *  follow that rule reference on the stack; this amounts to
	 *  computing FIRST of what follows the rule reference in the
	 *  enclosing rule. See LinearApproximator.FIRST().
	 *  This local follow set only includes tokens
	 *  from within the rule; i.e., the FIRST computation done by
	 *  ANTLR stops at the end of a rule.
	 *
	 *  EXAMPLE
	 *
	 *  When you find a "no viable alt exception", the input is not
	 *  consistent with any of the alternatives for rule r.  The best
	 *  thing to do is to consume tokens until you see something that
	 *  can legally follow a call to r *or* any rule that called r.
	 *  You don't want the exact set of viable next tokens because the
	 *  input might just be missing a token--you might consume the
	 *  rest of the input looking for one of the missing tokens.
	 *
	 *  Consider grammar:
	 *
	 *  a : '[' b ']'
	 *    | '(' b ')'
	 *    ;
	 *  b : c '^' INT ;
	 *  c : ID
	 *    | INT
	 *    ;
	 *
	 *  At each rule invocation, the set of tokens that could follow
	 *  that rule is pushed on a stack.  Here are the various
	 *  context-sensitive follow sets:
	 *
	 *  FOLLOW(b1_in_a) = FIRST(']') = ']'
	 *  FOLLOW(b2_in_a) = FIRST(')') = ')'
	 *  FOLLOW(c_in_b) = FIRST('^') = '^'
	 *
	 *  Upon erroneous input "[]", the call chain is
	 *
	 *  a -> b -> c
	 *
	 *  and, hence, the follow context stack is:
	 *
	 *  depth     follow set       start of rule execution
	 *    0         <EOF>                    a (from main())
	 *    1          ']'                     b
	 *    2          '^'                     c
	 *
	 *  Notice that ')' is not included, because b would have to have
	 *  been called from a different context in rule a for ')' to be
	 *  included.
	 *
	 *  For error recovery, we cannot consider FOLLOW(c)
	 *  (context-sensitive or otherwise).  We need the combined set of
	 *  all context-sensitive FOLLOW sets--the set of all tokens that
	 *  could follow any reference in the call chain.  We need to
	 *  resync to one of those tokens.  Note that FOLLOW(c)='^' and if
	 *  we resync'd to that token, we'd consume until EOF.  We need to
	 *  sync to context-sensitive FOLLOWs for a, b, and c: {']','^'}.
	 *  In this case, for input "[]", LA(1) is ']' and in the set, so we would
	 *  not consume anything. After printing an error, rule c would
	 *  return normally.  Rule b would not find the required '^' though.
	 *  At this point, it gets a mismatched token error and throws an
	 *  exception (since LA(1) is not in the viable following token
	 *  set).  The rule exception handler tries to recover, but finds
	 *  the same recovery set and doesn't consume anything.  Rule b
	 *  exits normally returning to rule a.  Now it finds the ']' (and
	 *  with the successful match exits errorRecovery mode).
	 *
	 *  So, you can see that the parser walks up the call chain looking
	 *  for the token that was a member of the recovery set.
	 *
	 *  Errors are not generated in errorRecovery mode.
	 *
	 *  ANTLR's error recovery mechanism is based upon original ideas:
	 *
	 *  "Algorithms + Data Structures = Programs" by Niklaus Wirth
	 *
	 *  and
	 *
	 *  "A note on error recovery in recursive descent parsers":
	 *  http://portal.acm.org/citation.cfm?id=947902.947905
	 *
	 *  Later, Josef Grosch had some good ideas:
	 *
	 *  "Efficient and Comfortable Error Recovery in Recursive Descent
	 *  Parsers":
	 *  ftp://www.cocolab.com/products/cocktail/doca4.ps/ell.ps.zip
	 *
	 *  Like Grosch I implement context-sensitive FOLLOW sets that are combined
	 *  at run-time upon error to avoid overhead during parsing.
	 */
@NotNull
protected IntervalSet getErrorRecoverySet(@NotNull Parser recognizer) {
    ATN atn = recognizer.getInterpreter().atn;
    RuleContext ctx = recognizer._ctx;
    IntervalSet recoverSet = new IntervalSet();
    while (ctx != null && ctx.invokingState >= 0) {
        // compute what follows who invoked us
        ATNState invokingState = atn.states.get(ctx.invokingState);
        RuleTransition rt = (RuleTransition) invokingState.transition(0);
        IntervalSet follow = atn.nextTokens(rt.followState);
        recoverSet.addAll(follow);
        ctx = ctx.parent;
    }
    recoverSet.remove(Token.EPSILON);
    // System.out.println("recover set "+recoverSet.toString(recognizer.getTokenNames()));
    return recoverSet;
}
Also used : IntervalSet(org.antlr.v4.runtime.misc.IntervalSet) RuleTransition(org.antlr.v4.runtime.atn.RuleTransition) ATN(org.antlr.v4.runtime.atn.ATN) ATNState(org.antlr.v4.runtime.atn.ATNState) NotNull(org.antlr.v4.runtime.misc.NotNull)
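
The worked example in the comment above (input "[]", call chain a -> b -> c) can be reproduced with plain collections. This is a sketch under simplifying assumptions: follow sets are given as ready-made sets of token strings rather than computed from an ATN, and the class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of the follow-context stack; not the ANTLR implementation.
final class RecoverySetSketch {
    // Union of the local follow sets of every rule invocation in the call chain.
    static Set<String> recoverySet(List<Set<String>> followStack) {
        Set<String> recover = new LinkedHashSet<>();
        for (Set<String> follow : followStack) {
            recover.addAll(follow);
        }
        return recover;
    }

    // Consume tokens until one is in the recovery set; returns the new position.
    static int resync(List<String> tokens, int pos, Set<String> recover) {
        while (pos < tokens.size() && !recover.contains(tokens.get(pos))) {
            pos++;
        }
        return pos;
    }
}
```

With the follow sets for b and c from the example, the combined set contains both `]` and `^`, so resynchronizing at the `]` in "[]" consumes nothing. Using only FOLLOW(c), which is `^`, the same loop would run to the end of the input, which is the failure mode the comment warns about.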

Example 25 with Recognizer

use of org.antlr.v4.runtime.Recognizer in project presto by prestodb.

the class SqlParser method invokeParser.

private Node invokeParser(String name, String sql, Function<SqlBaseParser, ParserRuleContext> parseFunction, ParsingOptions parsingOptions) {
    try {
        SqlBaseLexer lexer = new SqlBaseLexer(new CaseInsensitiveStream(CharStreams.fromString(sql)));
        CommonTokenStream tokenStream = new CommonTokenStream(lexer);
        SqlBaseParser parser = new SqlBaseParser(tokenStream);
        initializer.accept(lexer, parser);
        // Override the default error strategy to not attempt inserting or deleting a token.
        // Otherwise, it messes up error reporting
        parser.setErrorHandler(new DefaultErrorStrategy() {

            @Override
            public Token recoverInline(Parser recognizer) throws RecognitionException {
                if (nextTokensContext == null) {
                    throw new InputMismatchException(recognizer);
                } else {
                    throw new InputMismatchException(recognizer, nextTokensState, nextTokensContext);
                }
            }
        });
        parser.addParseListener(new PostProcessor(Arrays.asList(parser.getRuleNames()), parsingOptions.getWarningConsumer()));
        lexer.removeErrorListeners();
        lexer.addErrorListener(LEXER_ERROR_LISTENER);
        parser.removeErrorListeners();
        if (enhancedErrorHandlerEnabled) {
            parser.addErrorListener(PARSER_ERROR_HANDLER);
        } else {
            parser.addErrorListener(LEXER_ERROR_LISTENER);
        }
        ParserRuleContext tree;
        try {
            // first, try parsing with potentially faster SLL mode
            parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
            tree = parseFunction.apply(parser);
        } catch (ParseCancellationException ex) {
            // if we fail, parse with LL mode
            // rewind input stream
            tokenStream.reset();
            parser.reset();
            parser.getInterpreter().setPredictionMode(PredictionMode.LL);
            tree = parseFunction.apply(parser);
        }
        return new AstBuilder(parsingOptions).visit(tree);
    } catch (StackOverflowError e) {
        throw new ParsingException(name + " is too large (stack overflow while parsing)");
    }
}
Also used : CommonTokenStream(org.antlr.v4.runtime.CommonTokenStream) ParserRuleContext(org.antlr.v4.runtime.ParserRuleContext) Token(org.antlr.v4.runtime.Token) CommonToken(org.antlr.v4.runtime.CommonToken) InputMismatchException(org.antlr.v4.runtime.InputMismatchException) Parser(org.antlr.v4.runtime.Parser) ParseCancellationException(org.antlr.v4.runtime.misc.ParseCancellationException) DefaultErrorStrategy(org.antlr.v4.runtime.DefaultErrorStrategy) RecognitionException(org.antlr.v4.runtime.RecognitionException)
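
The try-SLL-then-fall-back-to-LL structure in invokeParser is an instance of a general two-stage pattern: attempt the cheaper strategy, and on a specific failure rewind and retry with the more general one. The sketch below isolates that control flow without the ANTLR runtime. `Mode`, `FastPathFailed`, and `parse` are hypothetical names; `FastPathFailed` merely stands in for the cancellation exception caught in the code above.

```java
import java.util.function.Function;

// Control-flow sketch only; it does not call into any parser library.
final class TwoStageParseSketch {
    enum Mode { SLL, LL }

    // Stand-in for the exception that signals the fast path gave up.
    static final class FastPathFailed extends RuntimeException {
    }

    // attempt runs one parse in the given mode; rewind resets the input state.
    static <T> T parse(Function<Mode, T> attempt, Runnable rewind) {
        try {
            // First try the potentially faster mode.
            return attempt.apply(Mode.SLL);
        } catch (FastPathFailed ex) {
            // Rewind the input, then retry in the more general mode.
            rewind.run();
            return attempt.apply(Mode.LL);
        }
    }
}
```

Only the designated failure type triggers the retry, so any other exception from the first attempt propagates unchanged, as it would from the try block in invokeParser.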

Aggregations

IntervalSet (org.antlr.v4.runtime.misc.IntervalSet): 24
Token (org.antlr.v4.runtime.Token): 22
RecognitionException (org.antlr.v4.runtime.RecognitionException): 19
CommonTokenStream (org.antlr.v4.runtime.CommonTokenStream): 15
File (java.io.File): 11
ParserRuleContext (org.antlr.v4.runtime.ParserRuleContext): 10
BaseRuntimeTest.antlrOnString (org.antlr.v4.test.runtime.BaseRuntimeTest.antlrOnString): 10
ATNState (org.antlr.v4.runtime.atn.ATNState): 9
IOException (java.io.IOException): 8
BaseErrorListener (org.antlr.v4.runtime.BaseErrorListener): 8
Parser (org.antlr.v4.runtime.Parser): 8
BaseRuntimeTest.writeFile (org.antlr.v4.test.runtime.BaseRuntimeTest.writeFile): 8
ArrayList (java.util.ArrayList): 7
ATN (org.antlr.v4.runtime.atn.ATN): 6
Pair (com.abubusoft.kripton.common.Pair): 5
InputMismatchException (org.antlr.v4.runtime.InputMismatchException): 5
TokenStream (org.antlr.v4.runtime.TokenStream): 5
BeetlException (org.beetl.core.exception.BeetlException): 5
STGroupString (org.stringtemplate.v4.STGroupString): 5
CommonToken (org.antlr.v4.runtime.CommonToken): 4