Programming Languages and Translators

Lecture 10: Constructing Predictive Parsers

February 25, 2008

- Review
- FIRST
- FOLLOW
- Constructing a predictive parsing table
- LL(1) grammars
- Reading

- Context-free grammars
- Top-down parsing
- Nonrecursive predictive parser
- Transformations on grammars

- FIRST(α) is the set of terminals that begin the strings
derivable from α.

If α can derive ε, then ε is also in FIRST(α). - Algorithm to compute FIRST(
*X*): - If
*X*is a terminal, then FIRST(*X*) = {*X*}. - If
*X*→ ε is a production, then add ε to FIRST(*X*). - If
*X*→*Y*_{1}*Y*_{2}...*Y*is a production for_{k}*k*≥ 1, and

for some*i*≤*k*,*Y*_{1}*Y*_{2}...*Y*-1 derives the empty string, and_{i}`a`

is in FIRST(*Y*), then add_{i}`a`

to FIRST(*X*).

If*Y*_{1}*Y*_{2}...*Y*derives the empty string, then add ε to FIRST(_{k}*X*). **Example.**Consider the grammar*G*:`S → ( S ) S | ε`

- For
*G*, FIRST(`S`

) = {`(, ε`

}.

- FOLLOW(
*A*) is the set of terminals that can appear immediately to the right of*A*in some sentential form.

If*A*can be the rightmost symbol in some sentential form, then add $ to FOLLOW(*A*). - Algorithm to compute FOLLOW(
*A*) for all nonterminals*A*of a grammar: - Place $ in FOLLOW(
*S*) where*S*is the start symbol of the grammar. - If
*A*→ α*B*β is a production, then add every terminal in FIRST(β) (except for ε if it is there) to FOLLOW(*B*). - If there is a production
*A*→ α*B*, or a production*A*→ α*B*β, where FIRST(β) contains ε,

then put everything in FOLLOW(*A*) into FOLLOW(*B*) **Example.**For*G*above, FOLLOW(`S`

) = {`)`

, $}.

- Input: Grammar
*G*. - Output: Predictive parsing table
*M*. - Method:

```
for (each production
```*A* → α in *G*) {
for (each terminal *a* in FIRST(α))
add *A* → α to *M*[*A, a*];
if (ε is in FIRST(α))
for (each symbol *b* in FOLLOW(*A*))
add *A* → α to *M*[*A, b*];
}
make each undefined entry of *M* be **error**;

`S → ( S ) S | ε`

```
Input Symbol
Nonterminal ( ) $
S S → (S)S S → ε S → ε
```

`S → S ( S ) | ε`

`S`

) = {`(, ε`

}`S`

) = {`(, )`

, $}```
Input Symbol
Nonterminal ( ) $
S S → (S)S S → ε S → ε
S → ε
```

- A grammar is LL(1) iff whenever
`A → α | β`

are two distinct productions, the following three conditions hold: - For no terminal
`a`

do both α and β derive strings beginning with`a`

. - At most one of α and β can derive the empty string.
- If β derives the empty string, then α does not derive any string
beginning with a terminal in FOLLOW(
`A`

).

Likewise, if α derives the empty string, then β does not derive any string beginning with a terminal in FOLLOW(`A`

). - We can use the algorithm above to construct a predictive parsing table with uniquely defined entries for any LL(1) grammar.
- The first "L" in LL(1) means scanning the input from left to right, the second "L" for producing a leftmost derivation, and the "1" for using one symbol of lookahead to make each parsing action decision.

- ALSU, Section 4.4

aho@cs.columbia.edu