Search in sources :

Example 1 with Alternative

use of org.antlr.v4.tool.Alternative in project antlr4 by antlr.

the class DefaultErrorStrategy method sync.

/**
	 * The default implementation of {@link ANTLRErrorStrategy#sync} makes sure
	 * that the current lookahead symbol is consistent with what were expecting
	 * at this point in the ATN. You can call this anytime but ANTLR only
	 * generates code to check before subrules/loops and each iteration.
	 *
	 * <p>Implements Jim Idle's magic sync mechanism in closures and optional
	 * subrules. E.g.,</p>
	 *
	 * <pre>
	 * a : sync ( stuff sync )* ;
	 * sync : {consume to what can follow sync} ;
	 * </pre>
	 *
	 * At the start of a sub rule upon error, {@link #sync} performs single
	 * token deletion, if possible. If it can't do that, it bails on the current
	 * rule and uses the default error recovery, which consumes until the
	 * resynchronization set of the current rule.
	 *
	 * <p>If the sub rule is optional ({@code (...)?}, {@code (...)*}, or block
	 * with an empty alternative), then the expected set includes what follows
	 * the subrule.</p>
	 *
	 * <p>During loop iteration, it consumes until it sees a token that can start a
	 * sub rule or what follows loop. Yes, that is pretty aggressive. We opt to
	 * stay in the loop as long as possible.</p>
	 *
	 * <p><strong>ORIGINS</strong></p>
	 *
	 * <p>Previous versions of ANTLR did a poor job of their recovery within loops.
	 * A single mismatch token or missing token would force the parser to bail
	 * out of the entire rules surrounding the loop. So, for rule</p>
	 *
	 * <pre>
	 * classDef : 'class' ID '{' member* '}'
	 * </pre>
	 *
	 * input with an extra token between members would force the parser to
	 * consume until it found the next class definition rather than the next
	 * member definition of the current class.
	 *
	 * <p>This functionality cost a little bit of effort because the parser has to
	 * compare token set at the start of the loop and at each iteration. If for
	 * some reason speed is suffering for you, you can turn off this
	 * functionality by simply overriding this method as a blank { }.</p>
	 */
@Override
public void sync(Parser recognizer) throws RecognitionException {
    ATNState s = recognizer.getInterpreter().atn.states.get(recognizer.getState());
    // If already recovering, don't try to sync
    if (inErrorRecoveryMode(recognizer)) {
        return;
    }
    TokenStream tokens = recognizer.getInputStream();
    int la = tokens.LA(1);
    // try cheaper subset first; might get lucky. seems to shave a wee bit off
    IntervalSet nextTokens = recognizer.getATN().nextTokens(s);
    if (nextTokens.contains(Token.EPSILON) || nextTokens.contains(la)) {
        return;
    }
    switch(s.getStateType()) {
        case ATNState.BLOCK_START:
        case ATNState.STAR_BLOCK_START:
        case ATNState.PLUS_BLOCK_START:
        case ATNState.STAR_LOOP_ENTRY:
            // report error and recover if possible
            if (singleTokenDeletion(recognizer) != null) {
                return;
            }
            throw new InputMismatchException(recognizer);
        case ATNState.PLUS_LOOP_BACK:
        case ATNState.STAR_LOOP_BACK:
            //			System.err.println("at loop back: "+s.getClass().getSimpleName());
            reportUnwantedToken(recognizer);
            IntervalSet expecting = recognizer.getExpectedTokens();
            IntervalSet whatFollowsLoopIterationOrRule = expecting.or(getErrorRecoverySet(recognizer));
            consumeUntil(recognizer, whatFollowsLoopIterationOrRule);
            break;
        default:
            // do nothing if we can't identify the exact kind of ATN state
            break;
    }
}
Also used : IntervalSet(org.antlr.v4.runtime.misc.IntervalSet) ATNState(org.antlr.v4.runtime.atn.ATNState)

Example 2 with Alternative

use of org.antlr.v4.tool.Alternative in project antlr4 by antlr.

the class ParserATNSimulator method execATN.

/** Performs ATN simulation to compute a predicted alternative based
	 *  upon the remaining input, but also updates the DFA cache to avoid
	 *  having to traverse the ATN again for the same input sequence.

	 There are some key conditions we're looking for after computing a new
	 set of ATN configs (proposed DFA state):
	       * if the set is empty, there is no viable alternative for current symbol
	       * does the state uniquely predict an alternative?
	       * does the state have a conflict that would prevent us from
	         putting it on the work list?

	 We also have some key operations to do:
	       * add an edge from previous DFA state to potentially new DFA state, D,
	         upon current symbol but only if adding to work list, which means in all
	         cases except no viable alternative (and possibly non-greedy decisions?)
	       * collecting predicates and adding semantic context to DFA accept states
	       * adding rule context to context-sensitive DFA accept states
	       * consuming an input symbol
	       * reporting a conflict
	       * reporting an ambiguity
	       * reporting a context sensitivity
	       * reporting insufficient predicates

	 cover these cases:
	    dead end
	    single alt
	    single alt + preds
	    conflict
	    conflict + preds
	 */
protected int execATN(DFA dfa, DFAState s0, TokenStream input, int startIndex, ParserRuleContext outerContext) {
    if (debug || debug_list_atn_decisions) {
        System.out.println("execATN decision " + dfa.decision + " exec LA(1)==" + getLookaheadName(input) + " line " + input.LT(1).getLine() + ":" + input.LT(1).getCharPositionInLine());
    }
    DFAState previousD = s0;
    if (debug)
        System.out.println("s0 = " + s0);
    int t = input.LA(1);
    while (true) {
        // while more work
        DFAState D = getExistingTargetState(previousD, t);
        if (D == null) {
            D = computeTargetState(dfa, previousD, t);
        }
        if (D == ERROR) {
            // if any configs in previous dipped into outer context, that
            // means that input up to t actually finished entry rule
            // at least for SLL decision. Full LL doesn't dip into outer
            // so don't need special case.
            // We will get an error no matter what so delay until after
            // decision; better error message. Also, no reachable target
            // ATN states in SLL implies LL will also get nowhere.
            // If conflict in states that dip out, choose min since we
            // will get error no matter what.
            NoViableAltException e = noViableAlt(input, outerContext, previousD.configs, startIndex);
            input.seek(startIndex);
            int alt = getSynValidOrSemInvalidAltThatFinishedDecisionEntryRule(previousD.configs, outerContext);
            if (alt != ATN.INVALID_ALT_NUMBER) {
                return alt;
            }
            throw e;
        }
        if (D.requiresFullContext && mode != PredictionMode.SLL) {
            // IF PREDS, MIGHT RESOLVE TO SINGLE ALT => SLL (or syntax error)
            BitSet conflictingAlts = D.configs.conflictingAlts;
            if (D.predicates != null) {
                if (debug)
                    System.out.println("DFA state has preds in DFA sim LL failover");
                int conflictIndex = input.index();
                if (conflictIndex != startIndex) {
                    input.seek(startIndex);
                }
                conflictingAlts = evalSemanticContext(D.predicates, outerContext, true);
                if (conflictingAlts.cardinality() == 1) {
                    if (debug)
                        System.out.println("Full LL avoided");
                    return conflictingAlts.nextSetBit(0);
                }
                if (conflictIndex != startIndex) {
                    // restore the index so reporting the fallback to full
                    // context occurs with the index at the correct spot
                    input.seek(conflictIndex);
                }
            }
            if (dfa_debug)
                System.out.println("ctx sensitive state " + outerContext + " in " + D);
            boolean fullCtx = true;
            ATNConfigSet s0_closure = computeStartState(dfa.atnStartState, outerContext, fullCtx);
            reportAttemptingFullContext(dfa, conflictingAlts, D.configs, startIndex, input.index());
            int alt = execATNWithFullContext(dfa, D, s0_closure, input, startIndex, outerContext);
            return alt;
        }
        if (D.isAcceptState) {
            if (D.predicates == null) {
                return D.prediction;
            }
            int stopIndex = input.index();
            input.seek(startIndex);
            BitSet alts = evalSemanticContext(D.predicates, outerContext, true);
            switch(alts.cardinality()) {
                case 0:
                    throw noViableAlt(input, outerContext, D.configs, startIndex);
                case 1:
                    return alts.nextSetBit(0);
                default:
                    // report ambiguity after predicate evaluation to make sure the correct
                    // set of ambig alts is reported.
                    reportAmbiguity(dfa, D, startIndex, stopIndex, false, alts, D.configs);
                    return alts.nextSetBit(0);
            }
        }
        previousD = D;
        if (t != IntStream.EOF) {
            input.consume();
            t = input.LA(1);
        }
    }
}
Also used : DFAState(org.antlr.v4.runtime.dfa.DFAState) NoViableAltException(org.antlr.v4.runtime.NoViableAltException) BitSet(java.util.BitSet)

Example 3 with Alternative

use of org.antlr.v4.tool.Alternative in project antlr4 by antlr.

the class GrammarParserInterpreter method getAllPossibleParseTrees.

/** Given an ambiguous parse information, return the list of ambiguous parse trees.
	 *  An ambiguity occurs when a specific token sequence can be recognized
	 *  in more than one way by the grammar. These ambiguities are detected only
	 *  at decision points.
	 *
	 *  The list of trees includes the actual interpretation (that for
	 *  the minimum alternative number) and all ambiguous alternatives.
	 *  The actual interpretation is always first.
	 *
	 *  This method reuses the same physical input token stream used to
	 *  detect the ambiguity by the original parser in the first place.
	 *  This method resets/seeks within but does not alter originalParser.
	 *
	 *  The trees are rooted at the node whose start..stop token indices
	 *  include the start and stop indices of this ambiguity event. That is,
	 *  the trees returned will always include the complete ambiguous subphrase
	 *  identified by the ambiguity event.  The subtrees returned will
	 *  also always contain the node associated with the overridden decision.
	 *
	 *  Be aware that this method does NOT notify error or parse listeners as
	 *  it would trigger duplicate or otherwise unwanted events.
	 *
	 *  This uses a temporary ParserATNSimulator and a ParserInterpreter
	 *  so we don't mess up any statistics, event lists, etc...
	 *  The parse tree constructed while identifying/making ambiguityInfo is
	 *  not affected by this method as it creates a new parser interp to
	 *  get the ambiguous interpretations.
	 *
	 *  Nodes in the returned ambig trees are independent of the original parse
	 *  tree (constructed while identifying/creating ambiguityInfo).
	 *
	 *  @since 4.5.1
	 *
	 *  @param g              From which grammar should we drive alternative
	 *                        numbers and alternative labels.
	 *
	 *  @param originalParser The parser used to create ambiguityInfo; it
	 *                        is not modified by this routine and can be either
	 *                        a generated or interpreted parser. It's token
	 *                        stream *is* reset/seek()'d.
	 *  @param tokens		  A stream of tokens to use with the temporary parser.
	 *                        This will often be just the token stream within the
	 *                        original parser but here it is for flexibility.
	 *
	 *  @param decision       Which decision to try different alternatives for.
	 *
	 *  @param alts           The set of alternatives to try while re-parsing.
	 *
	 *  @param startIndex	  The index of the first token of the ambiguous
	 *                        input or other input of interest.
	 *
	 *  @param stopIndex      The index of the last token of the ambiguous input.
	 *                        The start and stop indexes are used primarily to
	 *                        identify how much of the resulting parse tree
	 *                        to return.
	 *
	 *  @param startRuleIndex The start rule for the entire grammar, not
	 *                        the ambiguous decision. We re-parse the entire input
	 *                        and so we need the original start rule.
	 *
	 *  @return               The list of all possible interpretations of
	 *                        the input for the decision in ambiguityInfo.
	 *                        The actual interpretation chosen by the parser
	 *                        is always given first because this method
	 *                        retests the input in alternative order and
	 *                        ANTLR always resolves ambiguities by choosing
	 *                        the first alternative that matches the input.
	 *                        The subtree returned
	 *
	 *  @throws RecognitionException Throws upon syntax error while matching
	 *                               ambig input.
	 */
public static List<ParserRuleContext> getAllPossibleParseTrees(Grammar g, Parser originalParser, TokenStream tokens, int decision, BitSet alts, int startIndex, int stopIndex, int startRuleIndex) throws RecognitionException {
    List<ParserRuleContext> trees = new ArrayList<ParserRuleContext>();
    // Create a new parser interpreter to parse the ambiguous subphrase
    ParserInterpreter parser = deriveTempParserInterpreter(g, originalParser, tokens);
    if (stopIndex >= (tokens.size() - 1)) {
        // if we are pointing at EOF token
        // EOF is not in tree, so must be 1 less than last non-EOF token
        stopIndex = tokens.size() - 2;
    }
    // get ambig trees
    int alt = alts.nextSetBit(0);
    while (alt >= 0) {
        // re-parse entire input for all ambiguous alternatives
        // (don't have to do first as it's been parsed, but do again for simplicity
        //  using this temp parser.)
        parser.reset();
        parser.addDecisionOverride(decision, startIndex, alt);
        ParserRuleContext t = parser.parse(startRuleIndex);
        GrammarInterpreterRuleContext ambigSubTree = (GrammarInterpreterRuleContext) Trees.getRootOfSubtreeEnclosingRegion(t, startIndex, stopIndex);
        // Use higher of overridden decision tree or tree enclosing all tokens
        if (Trees.isAncestorOf(parser.getOverrideDecisionRoot(), ambigSubTree)) {
            ambigSubTree = (GrammarInterpreterRuleContext) parser.getOverrideDecisionRoot();
        }
        trees.add(ambigSubTree);
        alt = alts.nextSetBit(alt + 1);
    }
    return trees;
}
Also used : ParserRuleContext(org.antlr.v4.runtime.ParserRuleContext) ParserInterpreter(org.antlr.v4.runtime.ParserInterpreter) ArrayList(java.util.ArrayList)

Example 4 with Alternative

use of org.antlr.v4.tool.Alternative in project antlr4 by antlr.

the class GrammarParserInterpreter method getLookaheadParseTrees.

/** Return a list of parse trees, one for each alternative in a decision
	 *  given the same input.
	 *
	 *  Very similar to {@link #getAllPossibleParseTrees} except
	 *  that it re-parses the input for every alternative in a decision,
	 *  not just the ambiguous ones (there is no alts parameter here).
	 *  This method also tries to reduce the size of the parse trees
	 *  by stripping away children of the tree that are completely out of range
	 *  of startIndex..stopIndex. Also, because errors are expected, we
	 *  use a specialized error handler that more or less bails out
	 *  but that also consumes the first erroneous token at least. This
	 *  ensures that an error node will be in the parse tree for display.
	 *
	 *  NOTES:
    // we must parse the entire input now with decision overrides
	// we cannot parse a subset because it could be that a decision
	// above our decision of interest needs to read way past
	// lookaheadInfo.stopIndex. It seems like there is no escaping
	// the use of a full and complete token stream if we are
	// resetting to token index 0 and re-parsing from the start symbol.
	// It's not easy to restart parsing somewhere in the middle like a
	// continuation because our call stack does not match the
	// tree stack because of left recursive rule rewriting. grrrr!
	 *
	 * @since 4.5.1
	 */
public static List<ParserRuleContext> getLookaheadParseTrees(Grammar g, ParserInterpreter originalParser, TokenStream tokens, int startRuleIndex, int decision, int startIndex, int stopIndex) {
    List<ParserRuleContext> trees = new ArrayList<ParserRuleContext>();
    // Create a new parser interpreter to parse the ambiguous subphrase
    ParserInterpreter parser = deriveTempParserInterpreter(g, originalParser, tokens);
    DecisionState decisionState = originalParser.getATN().decisionToState.get(decision);
    for (int alt = 1; alt <= decisionState.getTransitions().length; alt++) {
        // re-parse entire input for all ambiguous alternatives
        // (don't have to do first as it's been parsed, but do again for simplicity
        //  using this temp parser.)
        GrammarParserInterpreter.BailButConsumeErrorStrategy errorHandler = new GrammarParserInterpreter.BailButConsumeErrorStrategy();
        parser.setErrorHandler(errorHandler);
        parser.reset();
        parser.addDecisionOverride(decision, startIndex, alt);
        ParserRuleContext tt = parser.parse(startRuleIndex);
        int stopTreeAt = stopIndex;
        if (errorHandler.firstErrorTokenIndex >= 0) {
            // cut off rest at first error
            stopTreeAt = errorHandler.firstErrorTokenIndex;
        }
        Interval overallRange = tt.getSourceInterval();
        if (stopTreeAt > overallRange.b) {
            // If we try to look beyond range of tree, stopTreeAt must be EOF
            // for which there is no EOF ref in grammar. That means tree
            // will not have node for stopTreeAt; limit to overallRange.b
            stopTreeAt = overallRange.b;
        }
        ParserRuleContext subtree = Trees.getRootOfSubtreeEnclosingRegion(tt, startIndex, stopTreeAt);
        // Use higher of overridden decision tree or tree enclosing all tokens
        if (Trees.isAncestorOf(parser.getOverrideDecisionRoot(), subtree)) {
            subtree = parser.getOverrideDecisionRoot();
        }
        Trees.stripChildrenOutOfRange(subtree, parser.getOverrideDecisionRoot(), startIndex, stopTreeAt);
        trees.add(subtree);
    }
    return trees;
}
Also used : ParserRuleContext(org.antlr.v4.runtime.ParserRuleContext) ParserInterpreter(org.antlr.v4.runtime.ParserInterpreter) ArrayList(java.util.ArrayList) DecisionState(org.antlr.v4.runtime.atn.DecisionState) Interval(org.antlr.v4.runtime.misc.Interval)

Example 5 with Alternative

use of org.antlr.v4.tool.Alternative in project antlr4 by antlr.

the class GrammarParserInterpreter method deriveTempParserInterpreter.

/** Derive a new parser from an old one that has knowledge of the grammar.
	 *  The Grammar object is used to correctly compute outer alternative
	 *  numbers for parse tree nodes. A parser of the same type is created
	 *  for subclasses of {@link ParserInterpreter}.
	 */
public static ParserInterpreter deriveTempParserInterpreter(Grammar g, Parser originalParser, TokenStream tokens) {
    ParserInterpreter parser;
    if (originalParser instanceof ParserInterpreter) {
        Class<? extends ParserInterpreter> c = originalParser.getClass().asSubclass(ParserInterpreter.class);
        try {
            Constructor<? extends ParserInterpreter> ctor = c.getConstructor(Grammar.class, ATN.class, TokenStream.class);
            parser = ctor.newInstance(g, originalParser.getATN(), originalParser.getTokenStream());
        } catch (Exception e) {
            throw new IllegalArgumentException("can't create parser to match incoming " + originalParser.getClass().getSimpleName(), e);
        }
    } else {
        // must've been a generated parser
        char[] serializedAtn = ATNSerializer.getSerializedAsChars(originalParser.getATN());
        ATN deserialized = new ATNDeserializer().deserialize(serializedAtn);
        parser = new ParserInterpreter(originalParser.getGrammarFileName(), originalParser.getVocabulary(), Arrays.asList(originalParser.getRuleNames()), deserialized, tokens);
    }
    parser.setInputStream(tokens);
    // Make sure that we don't get any error messages from using this temporary parser
    parser.setErrorHandler(new BailErrorStrategy());
    parser.removeErrorListeners();
    parser.removeParseListeners();
    parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION);
    return parser;
}
Also used : ATNDeserializer(org.antlr.v4.runtime.atn.ATNDeserializer) ParserInterpreter(org.antlr.v4.runtime.ParserInterpreter) BailErrorStrategy(org.antlr.v4.runtime.BailErrorStrategy) ATN(org.antlr.v4.runtime.atn.ATN) InputMismatchException(org.antlr.v4.runtime.InputMismatchException) RecognitionException(org.antlr.v4.runtime.RecognitionException)

Aggregations

ArrayList (java.util.ArrayList)4 Alternative (org.antlr.v4.tool.Alternative)4 Rule (org.antlr.v4.tool.Rule)4 GrammarAST (org.antlr.v4.tool.ast.GrammarAST)4 BitSet (java.util.BitSet)3 ParserInterpreter (org.antlr.v4.runtime.ParserInterpreter)3 ATNState (org.antlr.v4.runtime.atn.ATNState)3 IntervalSet (org.antlr.v4.runtime.misc.IntervalSet)3 HashMap (java.util.HashMap)2 LinkedHashMap (java.util.LinkedHashMap)2 List (java.util.List)2 NoViableAltException (org.antlr.v4.runtime.NoViableAltException)2 ParserRuleContext (org.antlr.v4.runtime.ParserRuleContext)2 DFAState (org.antlr.v4.runtime.dfa.DFAState)2 LeftRecursiveRule (org.antlr.v4.tool.LeftRecursiveRule)2 ActionAST (org.antlr.v4.tool.ast.ActionAST)2 Map (java.util.Map)1 CommonToken (org.antlr.runtime.CommonToken)1 Token (org.antlr.runtime.Token)1 BailErrorStrategy (org.antlr.v4.runtime.BailErrorStrategy)1