Example 21 with Parser

use of org.antlr.v4.runtime.Parser in project antlr4 by antlr.

the class GrammarParserInterpreter method getAllPossibleParseTrees.

/** Given an ambiguous parse information, return the list of ambiguous parse trees.
	 *  An ambiguity occurs when a specific token sequence can be recognized
	 *  in more than one way by the grammar. These ambiguities are detected only
	 *  at decision points.
	 *
	 *  The list of trees includes the actual interpretation (that for
	 *  the minimum alternative number) and all ambiguous alternatives.
	 *  The actual interpretation is always first.
	 *
	 *  This method reuses the same physical input token stream used to
	 *  detect the ambiguity by the original parser in the first place.
	 *  This method resets/seeks within but does not alter originalParser.
	 *
	 *  The trees are rooted at the node whose start..stop token indices
	 *  include the start and stop indices of this ambiguity event. That is,
	 *  the trees returned will always include the complete ambiguous subphrase
	 *  identified by the ambiguity event.  The subtrees returned will
	 *  also always contain the node associated with the overridden decision.
	 *
	 *  Be aware that this method does NOT notify error or parse listeners as
	 *  it would trigger duplicate or otherwise unwanted events.
	 *
	 *  This uses a temporary ParserATNSimulator and a ParserInterpreter
	 *  so we don't mess up any statistics, event lists, etc...
	 *  The parse tree constructed while identifying/making ambiguityInfo is
	 *  not affected by this method as it creates a new parser interp to
	 *  get the ambiguous interpretations.
	 *
	 *  Nodes in the returned ambig trees are independent of the original parse
	 *  tree (constructed while identifying/creating ambiguityInfo).
	 *
	 *  @since 4.5.1
	 *
	 *  @param g              From which grammar should we drive alternative
	 *                        numbers and alternative labels.
	 *
	 *  @param originalParser The parser used to create ambiguityInfo; it
	 *                        is not modified by this routine and can be either
	 *                        a generated or interpreted parser. Its token
	 *                        stream *is* reset/seek()'d.
	 *  @param tokens		  A stream of tokens to use with the temporary parser.
	 *                        This will often be just the token stream within the
	 *                        original parser but here it is for flexibility.
	 *
	 *  @param decision       Which decision to try different alternatives for.
	 *
	 *  @param alts           The set of alternatives to try while re-parsing.
	 *
	 *  @param startIndex	  The index of the first token of the ambiguous
	 *                        input or other input of interest.
	 *
	 *  @param stopIndex      The index of the last token of the ambiguous input.
	 *                        The start and stop indexes are used primarily to
	 *                        identify how much of the resulting parse tree
	 *                        to return.
	 *
	 *  @param startRuleIndex The start rule for the entire grammar, not
	 *                        the ambiguous decision. We re-parse the entire input
	 *                        and so we need the original start rule.
	 *
	 *  @return               The list of all possible interpretations of
	 *                        the input for the decision in ambiguityInfo.
	 *                        The actual interpretation chosen by the parser
	 *                        is always given first because this method
	 *                        retests the input in alternative order and
	 *                        ANTLR always resolves ambiguities by choosing
	 *                        the first alternative that matches the input.
	 *                        The subtree returned
	 *
	 *  @throws RecognitionException Throws upon syntax error while matching
	 *                               ambig input.
	 */
public static List<ParserRuleContext> getAllPossibleParseTrees(Grammar g, Parser originalParser, TokenStream tokens, int decision, BitSet alts, int startIndex, int stopIndex, int startRuleIndex) throws RecognitionException {
    List<ParserRuleContext> trees = new ArrayList<ParserRuleContext>();
    // Create a new parser interpreter to parse the ambiguous subphrase
    ParserInterpreter parser = deriveTempParserInterpreter(g, originalParser, tokens);
    if (stopIndex >= (tokens.size() - 1)) {
        // if we are pointing at EOF token
        // EOF is not in tree, so must be 1 less than last non-EOF token
        stopIndex = tokens.size() - 2;
    }
    // get ambig trees
    int alt = alts.nextSetBit(0);
    while (alt >= 0) {
        // re-parse entire input for all ambiguous alternatives
        // (don't have to do first as it's been parsed, but do again for simplicity
        //  using this temp parser.)
        parser.reset();
        parser.addDecisionOverride(decision, startIndex, alt);
        ParserRuleContext t = parser.parse(startRuleIndex);
        GrammarInterpreterRuleContext ambigSubTree = (GrammarInterpreterRuleContext) Trees.getRootOfSubtreeEnclosingRegion(t, startIndex, stopIndex);
        // Use higher of overridden decision tree or tree enclosing all tokens
        if (Trees.isAncestorOf(parser.getOverrideDecisionRoot(), ambigSubTree)) {
            ambigSubTree = (GrammarInterpreterRuleContext) parser.getOverrideDecisionRoot();
        }
        trees.add(ambigSubTree);
        alt = alts.nextSetBit(alt + 1);
    }
    return trees;
}
Also used : ParserRuleContext(org.antlr.v4.runtime.ParserRuleContext) ParserInterpreter(org.antlr.v4.runtime.ParserInterpreter) ArrayList(java.util.ArrayList)
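The loop above walks the set bits of a BitSet in ascending order, which is why the lowest-numbered alternative (the one the parser actually chose) comes first in the result. A minimal sketch of that iteration idiom and of the EOF clamp on stopIndex, using only the JDK; the class and method names here (AltIterationSketch, altsInOrder, clampStopIndex) are made up for illustration and are not part of ANTLR:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class AltIterationSketch {
    /** Collect the set bits of alts in ascending order (hypothetical helper). */
    static List<Integer> altsInOrder(BitSet alts) {
        List<Integer> result = new ArrayList<>();
        // nextSetBit returns -1 once no set bit remains at or after the index
        for (int alt = alts.nextSetBit(0); alt >= 0; alt = alts.nextSetBit(alt + 1)) {
            result.add(alt);
        }
        return result;
    }

    /** Pull a stop index that points at EOF back to the token before it. */
    static int clampStopIndex(int stopIndex, int tokenCount) {
        // Assumes the stream ends with an EOF token at index tokenCount - 1
        return stopIndex >= tokenCount - 1 ? tokenCount - 2 : stopIndex;
    }

    public static void main(String[] args) {
        BitSet alts = new BitSet();
        alts.set(1);
        alts.set(3);
        System.out.println(altsInOrder(alts));
        System.out.println(clampStopIndex(5, 6));
    }
}
```

The for-loop form is equivalent to the while loop in the method above; it only keeps the loop variable scoped to the loop.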

Example 22 with Parser

use of org.antlr.v4.runtime.Parser in project antlr4 by antlr.

the class GrammarParserInterpreter method getLookaheadParseTrees.

/** Return a list of parse trees, one for each alternative in a decision
	 *  given the same input.
	 *
	 *  Very similar to {@link #getAllPossibleParseTrees} except
	 *  that it re-parses the input for every alternative in a decision,
	 *  not just the ambiguous ones (there is no alts parameter here).
	 *  This method also tries to reduce the size of the parse trees
	 *  by stripping away children of the tree that are completely out of range
	 *  of startIndex..stopIndex. Also, because errors are expected, we
	 *  use a specialized error handler that more or less bails out
	 *  but that also consumes the first erroneous token at least. This
	 *  ensures that an error node will be in the parse tree for display.
	 *
	 *  NOTES:
	 *  We must parse the entire input now with decision overrides.
	 *  We cannot parse a subset because it could be that a decision
	 *  above our decision of interest needs to read way past
	 *  lookaheadInfo.stopIndex. It seems like there is no escaping
	 *  the use of a full and complete token stream if we are
	 *  resetting to token index 0 and re-parsing from the start symbol.
	 *  It's not easy to restart parsing somewhere in the middle like a
	 *  continuation because our call stack does not match the
	 *  tree stack because of left recursive rule rewriting. grrrr!
	 *
	 * @since 4.5.1
	 */
public static List<ParserRuleContext> getLookaheadParseTrees(Grammar g, ParserInterpreter originalParser, TokenStream tokens, int startRuleIndex, int decision, int startIndex, int stopIndex) {
    List<ParserRuleContext> trees = new ArrayList<ParserRuleContext>();
    // Create a new parser interpreter to parse the ambiguous subphrase
    ParserInterpreter parser = deriveTempParserInterpreter(g, originalParser, tokens);
    DecisionState decisionState = originalParser.getATN().decisionToState.get(decision);
    for (int alt = 1; alt <= decisionState.getTransitions().length; alt++) {
        // re-parse entire input once for every alternative of this decision,
        // not just the ambiguous ones; errors are expected for some alternatives,
        // so a fresh bail-but-consume error handler is installed each time
        GrammarParserInterpreter.BailButConsumeErrorStrategy errorHandler = new GrammarParserInterpreter.BailButConsumeErrorStrategy();
        parser.setErrorHandler(errorHandler);
        parser.reset();
        parser.addDecisionOverride(decision, startIndex, alt);
        ParserRuleContext tt = parser.parse(startRuleIndex);
        int stopTreeAt = stopIndex;
        if (errorHandler.firstErrorTokenIndex >= 0) {
            // cut off rest at first error
            stopTreeAt = errorHandler.firstErrorTokenIndex;
        }
        Interval overallRange = tt.getSourceInterval();
        if (stopTreeAt > overallRange.b) {
            // If we try to look beyond range of tree, stopTreeAt must be EOF
            // for which there is no EOF ref in grammar. That means tree
            // will not have node for stopTreeAt; limit to overallRange.b
            stopTreeAt = overallRange.b;
        }
        ParserRuleContext subtree = Trees.getRootOfSubtreeEnclosingRegion(tt, startIndex, stopTreeAt);
        // Use higher of overridden decision tree or tree enclosing all tokens
        if (Trees.isAncestorOf(parser.getOverrideDecisionRoot(), subtree)) {
            subtree = parser.getOverrideDecisionRoot();
        }
        Trees.stripChildrenOutOfRange(subtree, parser.getOverrideDecisionRoot(), startIndex, stopTreeAt);
        trees.add(subtree);
    }
    return trees;
}
Also used : ParserRuleContext(org.antlr.v4.runtime.ParserRuleContext) ParserInterpreter(org.antlr.v4.runtime.ParserInterpreter) ArrayList(java.util.ArrayList) DecisionState(org.antlr.v4.runtime.atn.DecisionState) Interval(org.antlr.v4.runtime.misc.Interval)
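Both methods above rely on finding the deepest subtree whose token range still covers startIndex..stopIndex, after limiting the stop index to what the tree actually spans. The following is a rough sketch of that idea on a hypothetical Node type; it is not the implementation of Trees.getRootOfSubtreeEnclosingRegion, and every name in it is an assumption made for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class EnclosingSubtreeSketch {
    /** Hypothetical tree node covering token indices start..stop inclusive. */
    static final class Node {
        final String name;
        final int start;
        final int stop;
        final List<Node> children = new ArrayList<>();

        Node(String name, int start, int stop) {
            this.name = name;
            this.start = start;
            this.stop = stop;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Return the deepest node whose range covers start..stop, or null if none does. */
    static Node smallestEnclosing(Node t, int start, int stop) {
        if (t.start > start || t.stop < stop) {
            return null; // this node does not cover the whole region
        }
        for (Node child : t.children) {
            Node found = smallestEnclosing(child, start, stop);
            if (found != null) {
                return found; // a child covers it, so prefer the deeper node
            }
        }
        return t; // no child covers the region on its own
    }

    /** Limit a stop index to the last token the tree spans (e.g. when it points at EOF). */
    static int clampToTree(int stopTreeAt, Node root) {
        return Math.min(stopTreeAt, root.stop);
    }

    public static void main(String[] args) {
        Node root = new Node("root", 0, 5)
                .add(new Node("a", 0, 2))
                .add(new Node("b", 3, 5).add(new Node("c", 3, 4)));
        System.out.println(smallestEnclosing(root, 3, 4).name);
        System.out.println(smallestEnclosing(root, 2, 3).name);
    }
}
```

A region that straddles two children (tokens 2..3 in the usage above) is only covered by their common parent, which is why the methods return a subtree that may be larger than the ambiguous phrase itself.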

Example 23 with Parser

use of org.antlr.v4.runtime.Parser in project antlr4 by antlr.

the class GrammarParserInterpreter method deriveTempParserInterpreter.

/** Derive a new parser from an old one that has knowledge of the grammar.
	 *  The Grammar object is used to correctly compute outer alternative
	 *  numbers for parse tree nodes. A parser of the same type is created
	 *  for subclasses of {@link ParserInterpreter}.
	 */
public static ParserInterpreter deriveTempParserInterpreter(Grammar g, Parser originalParser, TokenStream tokens) {
    ParserInterpreter parser;
    if (originalParser instanceof ParserInterpreter) {
        Class<? extends ParserInterpreter> c = originalParser.getClass().asSubclass(ParserInterpreter.class);
        try {
            Constructor<? extends ParserInterpreter> ctor = c.getConstructor(Grammar.class, ATN.class, TokenStream.class);
            parser = ctor.newInstance(g, originalParser.getATN(), originalParser.getTokenStream());
        } catch (Exception e) {
            throw new IllegalArgumentException("can't create parser to match incoming " + originalParser.getClass().getSimpleName(), e);
        }
    } else {
        // must've been a generated parser
        char[] serializedAtn = ATNSerializer.getSerializedAsChars(originalParser.getATN());
        ATN deserialized = new ATNDeserializer().deserialize(serializedAtn);
        parser = new ParserInterpreter(originalParser.getGrammarFileName(), originalParser.getVocabulary(), Arrays.asList(originalParser.getRuleNames()), deserialized, tokens);
    }
    parser.setInputStream(tokens);
    // Make sure that we don't get any error messages from using this temporary parser
    parser.setErrorHandler(new BailErrorStrategy());
    parser.removeErrorListeners();
    parser.removeParseListeners();
    parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION);
    return parser;
}
Also used : ATNDeserializer(org.antlr.v4.runtime.atn.ATNDeserializer) ParserInterpreter(org.antlr.v4.runtime.ParserInterpreter) BailErrorStrategy(org.antlr.v4.runtime.BailErrorStrategy) ATN(org.antlr.v4.runtime.atn.ATN) InputMismatchException(org.antlr.v4.runtime.InputMismatchException) RecognitionException(org.antlr.v4.runtime.RecognitionException)
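The first branch above uses reflection so that a subclass of ParserInterpreter gets a temporary parser of its own type, provided it exposes a public constructor with the expected parameter list. A stripped-down sketch of that pattern with stand-in classes; BaseInterp, CustomInterp and deriveSameType are hypothetical names, and the single String constructor argument is a simplification of the real (Grammar, ATN, TokenStream) signature:

```java
import java.lang.reflect.Constructor;

public class SameTypeCloneSketch {
    /** Hypothetical base class standing in for an interpreter. */
    public static class BaseInterp {
        public final String grammarName;

        public BaseInterp(String grammarName) {
            this.grammarName = grammarName;
        }
    }

    /** Hypothetical subclass; it must keep a public constructor of the same shape. */
    public static class CustomInterp extends BaseInterp {
        public CustomInterp(String grammarName) {
            super(grammarName);
        }
    }

    /** Create a fresh instance of the same runtime class as original. */
    static BaseInterp deriveSameType(BaseInterp original, String grammarName) {
        Class<? extends BaseInterp> c = original.getClass().asSubclass(BaseInterp.class);
        try {
            // getConstructor only finds public constructors with exactly these parameter types
            Constructor<? extends BaseInterp> ctor = c.getConstructor(String.class);
            return ctor.newInstance(grammarName);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "can't create instance to match " + c.getSimpleName(), e);
        }
    }

    public static void main(String[] args) {
        BaseInterp copy = deriveSameType(new CustomInterp("Expr"), "Expr");
        System.out.println(copy.getClass().getSimpleName());
    }
}
```

The cost of this approach is that a subclass without a matching public constructor fails only at run time, which is what the IllegalArgumentException in the method above reports.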

Example 24 with Parser

use of org.antlr.v4.runtime.Parser in project antlr4 by antlr.

the class GrammarTransformPipeline method extractImplicitLexer.

/** Build lexer grammar from combined grammar that looks like:
	 *
	 *  (COMBINED_GRAMMAR A
	 *      (tokens { X (= Y 'y'))
	 *      (OPTIONS (= x 'y'))
	 *      (@ members {foo})
	 *      (@ lexer header {package jj;})
	 *      (RULES (RULE .+)))
	 *
	 *  Move rules and actions to new tree, don't dup. Split AST apart.
	 *  We'll have this Grammar share token symbols later; don't generate
	 *  tokenVocab or tokens{} section.  Copy over named actions.
	 *
	 *  Side-effects: it removes children from GRAMMAR & RULES nodes
	 *                in combined AST.  Anything cut out is dup'd before
	 *                adding to lexer to avoid "who's ur daddy" issues
	 */
public GrammarRootAST extractImplicitLexer(Grammar combinedGrammar) {
    GrammarRootAST combinedAST = combinedGrammar.ast;
    //tool.log("grammar", "before="+combinedAST.toStringTree());
    GrammarASTAdaptor adaptor = new GrammarASTAdaptor(combinedAST.token.getInputStream());
    GrammarAST[] elements = combinedAST.getChildren().toArray(new GrammarAST[0]);
    // MAKE A GRAMMAR ROOT and ID
    String lexerName = combinedAST.getChild(0).getText() + "Lexer";
    GrammarRootAST lexerAST = new GrammarRootAST(new CommonToken(ANTLRParser.GRAMMAR, "LEXER_GRAMMAR"), combinedGrammar.ast.tokenStream);
    lexerAST.grammarType = ANTLRParser.LEXER;
    lexerAST.token.setInputStream(combinedAST.token.getInputStream());
    lexerAST.addChild((GrammarAST) adaptor.create(ANTLRParser.ID, lexerName));
    // COPY OPTIONS
    GrammarAST optionsRoot = (GrammarAST) combinedAST.getFirstChildWithType(ANTLRParser.OPTIONS);
    if (optionsRoot != null && optionsRoot.getChildCount() != 0) {
        GrammarAST lexerOptionsRoot = (GrammarAST) adaptor.dupNode(optionsRoot);
        lexerAST.addChild(lexerOptionsRoot);
        GrammarAST[] options = optionsRoot.getChildren().toArray(new GrammarAST[0]);
        for (GrammarAST o : options) {
            String optionName = o.getChild(0).getText();
            if (Grammar.lexerOptions.contains(optionName) && !Grammar.doNotCopyOptionsToLexer.contains(optionName)) {
                GrammarAST optionTree = (GrammarAST) adaptor.dupTree(o);
                lexerOptionsRoot.addChild(optionTree);
                lexerAST.setOption(optionName, (GrammarAST) optionTree.getChild(1));
            }
        }
    }
    // COPY all named actions, but only move those with lexer:: scope
    List<GrammarAST> actionsWeMoved = new ArrayList<GrammarAST>();
    for (GrammarAST e : elements) {
        if (e.getType() == ANTLRParser.AT) {
            lexerAST.addChild((Tree) adaptor.dupTree(e));
            if (e.getChild(0).getText().equals("lexer")) {
                actionsWeMoved.add(e);
            }
        }
    }
    for (GrammarAST r : actionsWeMoved) {
        combinedAST.deleteChild(r);
    }
    GrammarAST combinedRulesRoot = (GrammarAST) combinedAST.getFirstChildWithType(ANTLRParser.RULES);
    if (combinedRulesRoot == null)
        return lexerAST;
    // MOVE lexer rules
    GrammarAST lexerRulesRoot = (GrammarAST) adaptor.create(ANTLRParser.RULES, "RULES");
    lexerAST.addChild(lexerRulesRoot);
    List<GrammarAST> rulesWeMoved = new ArrayList<GrammarAST>();
    GrammarASTWithOptions[] rules;
    if (combinedRulesRoot.getChildCount() > 0) {
        rules = combinedRulesRoot.getChildren().toArray(new GrammarASTWithOptions[0]);
    } else {
        rules = new GrammarASTWithOptions[0];
    }
    for (GrammarASTWithOptions r : rules) {
        String ruleName = r.getChild(0).getText();
        if (Grammar.isTokenName(ruleName)) {
            lexerRulesRoot.addChild((Tree) adaptor.dupTree(r));
            rulesWeMoved.add(r);
        }
    }
    for (GrammarAST r : rulesWeMoved) {
        combinedRulesRoot.deleteChild(r);
    }
    // Will track 'if' from IF : 'if' ; rules to avoid defining new token for 'if'
    List<Pair<GrammarAST, GrammarAST>> litAliases = Grammar.getStringLiteralAliasesFromLexerRules(lexerAST);
    Set<String> stringLiterals = combinedGrammar.getStringLiterals();
    // add strings from combined grammar (and imported grammars) into lexer
    // put them first as they are keywords; must resolve ambigs to these rules
    //		tool.log("grammar", "strings from parser: "+stringLiterals);
    int insertIndex = 0;
    nextLit: for (String lit : stringLiterals) {
        // if lexer already has a rule for literal, continue
        if (litAliases != null) {
            for (Pair<GrammarAST, GrammarAST> pair : litAliases) {
                GrammarAST litAST = pair.b;
                if (lit.equals(litAST.getText()))
                    continue nextLit;
            }
        }
        // create for each literal: (RULE <uniquename> (BLOCK (ALT <lit>))
        String rname = combinedGrammar.getStringLiteralLexerRuleName(lit);
        // can't use wizard; need special node types
        GrammarAST litRule = new RuleAST(ANTLRParser.RULE);
        BlockAST blk = new BlockAST(ANTLRParser.BLOCK);
        AltAST alt = new AltAST(ANTLRParser.ALT);
        TerminalAST slit = new TerminalAST(new CommonToken(ANTLRParser.STRING_LITERAL, lit));
        alt.addChild(slit);
        blk.addChild(alt);
        CommonToken idToken = new CommonToken(ANTLRParser.TOKEN_REF, rname);
        litRule.addChild(new TerminalAST(idToken));
        litRule.addChild(blk);
        lexerRulesRoot.insertChild(insertIndex, litRule);
        //			lexerRulesRoot.getChildren().add(0, litRule);
        // reset indexes and set litRule parent
        lexerRulesRoot.freshenParentAndChildIndexes();
        // next literal will be added after the one just added
        insertIndex++;
    }
    // TODO: take out after stable if slow
    lexerAST.sanityCheckParentAndChildIndexes();
    combinedAST.sanityCheckParentAndChildIndexes();
    //		tool.log("grammar", combinedAST.toTokenString());
    combinedGrammar.tool.log("grammar", "after extract implicit lexer =" + combinedAST.toStringTree());
    combinedGrammar.tool.log("grammar", "lexer =" + lexerAST.toStringTree());
    if (lexerRulesRoot.getChildCount() == 0)
        return null;
    return lexerAST;
}
Also used : RuleAST(org.antlr.v4.tool.ast.RuleAST) GrammarRootAST(org.antlr.v4.tool.ast.GrammarRootAST) GrammarAST(org.antlr.v4.tool.ast.GrammarAST) ArrayList(java.util.ArrayList) BlockAST(org.antlr.v4.tool.ast.BlockAST) AltAST(org.antlr.v4.tool.ast.AltAST) TerminalAST(org.antlr.v4.tool.ast.TerminalAST) GrammarASTAdaptor(org.antlr.v4.parse.GrammarASTAdaptor) CommonToken(org.antlr.runtime.CommonToken) GrammarASTWithOptions(org.antlr.v4.tool.ast.GrammarASTWithOptions) Pair(org.antlr.v4.runtime.misc.Pair)
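The last part of the method skips literals that already have a lexer rule (via a labeled continue) and inserts the remaining ones at the front of the rule list while preserving their relative order. A simplified sketch of that control flow on plain strings; the class name, the method name and the "T_" + literal rule naming are assumptions for illustration and do not reflect how getStringLiteralLexerRuleName actually names rules:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LiteralRuleInsertSketch {
    /** Put a rule for each unaliased literal ahead of the existing rules. */
    static List<String> addLiteralRules(List<String> rules, Set<String> literals,
                                        List<String> aliased) {
        List<String> result = new ArrayList<>(rules);
        int insertIndex = 0;
        nextLit:
        for (String lit : literals) {
            // if some rule already matches this literal, skip to the next literal
            for (String alias : aliased) {
                if (lit.equals(alias)) {
                    continue nextLit;
                }
            }
            result.add(insertIndex, "T_" + lit); // hypothetical rule name
            insertIndex++; // next literal goes right after the one just added
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> literals = new LinkedHashSet<>(Arrays.asList("if", "while", "+"));
        List<String> out = addLiteralRules(
                Arrays.asList("ID", "WS"), literals, Arrays.asList("if"));
        System.out.println(out);
    }
}
```

Advancing insertIndex instead of always inserting at 0 keeps the literals in their iteration order, and placing them before the named rules mirrors the comment above about keywords needing to win ambiguities.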

Example 25 with Parser

use of org.antlr.v4.runtime.Parser in project antlr4 by antlr.

the class BaseJavaTest method execParser.

public ParseTree execParser(String startRuleName, String input, String parserName, String lexerName) throws Exception {
    Pair<Parser, Lexer> pl = getParserAndLexer(input, parserName, lexerName);
    Parser parser = pl.a;
    return execStartRule(startRuleName, parser);
}
Also used : Lexer(org.antlr.v4.runtime.Lexer) Parser(org.antlr.v4.runtime.Parser)

Aggregations

Test (org.junit.Test) 138
Grammar (org.antlr.v4.tool.Grammar) 130
LexerGrammar (org.antlr.v4.tool.LexerGrammar) 117
CommonTokenStream (org.antlr.v4.runtime.CommonTokenStream) 39
ANTLRInputStream (org.antlr.v4.runtime.ANTLRInputStream) 33
ParseTree (org.antlr.v4.runtime.tree.ParseTree) 31
ATN (org.antlr.v4.runtime.atn.ATN) 19
IntervalSet (org.antlr.v4.runtime.misc.IntervalSet) 16
BaseRuntimeTest (org.antlr.v4.test.runtime.BaseRuntimeTest) 14
ErrorQueue (org.antlr.v4.test.runtime.ErrorQueue) 14
ArrayList (java.util.ArrayList) 13
ParseCancellationException (org.antlr.v4.runtime.misc.ParseCancellationException) 13
Parser (org.antlr.v4.runtime.Parser) 10
RecognitionException (org.antlr.v4.runtime.RecognitionException) 10
DecisionInfo (org.antlr.v4.runtime.atn.DecisionInfo) 10
Lexer (org.antlr.v4.runtime.Lexer) 9
ParserRuleContext (org.antlr.v4.runtime.ParserRuleContext) 9
LexerInterpreter (org.antlr.v4.runtime.LexerInterpreter) 8
ParserInterpreter (org.antlr.v4.runtime.ParserInterpreter) 8
Token (org.antlr.v4.runtime.Token) 8