Abstract Syntax

The LR parser can be used to shift and reduce according to grammar rules.

A compiler must do more than recognize whether a sentence belongs to the language of a grammar - it must do something useful with that sentence.

The parser also need to record the semantic inluding the type / value during parsing.

Two ways: semantic actions or abstract syntax tree.

1. semantic actions

For a rule A —> B C D, the semantic action must return a value whose type is the one associated with the nonterminal A.

The value associated with the terminals / non-terminals, usually stored in a table.

Everyt time we reduce one rule, we need to update the table to record the rule.

Semantic actions also require us to do type checking while parsing.

It is possible to write an entire compiler that fits within the semantic action phrases of an ML-Yacc parser. However, such a compiler is difficult to read and maintain. And this approach constrains the compiler to analyze the program in exactly the order it is parsed.

To improve modularity, it is better to separate issues of syntax (parsing) from issues of semantics (type-checking and translation to machine code).

2. Abstract Syntax Tree

The abstract syntax tree conveys the phrase structure of the source program, with all parsing issues resolved but without any semantic interpretation.

The ML-Yacc (or recursive-descent) parser, parsing the concrete syntax, constructs the abstract syntax tree.

In ML we represent an abstract syntax as a set of mutually recursive datatypes.

Each abstract syntax grammar production is represented by a single data constructor in ML; each nonterminal is represented by an ML type. Just as nonterminals are mutually recursive in the grammar, the ML types for abstract syntax refer to each other recursively. (For the nonterminal L of expression lists, the ML data constructors are :: and nil of the built-in list datatype.)

2.1. positions

Comparing with semantic actions, which parse from begin to end. The abstract synctax tree may lose the position of source code, so we need to remember the pos of terminals / non-terminals in the parsing tree.

2.2. Abstract Syntax Tree for Tiger

The parse.sml file is the main file to include ML Lex and ML Yacc LrParser to build the abstract syntax tree.

In the tiger.lex, we defined the regular expression to match each tokens and the Token types (including the position) it will produce.

In the tiger.grm, we defined the syntax grammar and the Abstract Syntax Tree/Node it will produce based on the rule

In the absyn.sml, we define the tree structure, which will be used in tiger.grm