# COMS W4115

Programming Languages and Translators

Lecture 7: Implementing a Lexical Analyzer

February 13, 2008

## Lecture Outline

- Review
- Finite automata
- Converting a regular expression to an NFA
- Converting an NFA to a DFA
- Simulating an NFA

## 1. Review

- Issues for a the lexical analyzer
- Tokens/patterns/lexemes/attributes
- Specifying a lexical analyzer with Lex
- Creating a lexical processor with Lex

## 2. Finite Automata

- A nondeterministic finite automaton (NFA) consists of
- A finite set of states
*S*.
- An input alphabet consisting of a finite set of symbols Σ.
- A transition function δ that maps
*S* × (&Sigma ∪ {ε})
to subsets of *S*. This transition function can be represented
by a transition graph in which the nodes are labeled by states
and there is a directed edge labeled *a* from node *w* to node *v*
if &delta(*w*, *a*) contains *v*.
- An initial state
*s*_{0} in *S*.
*F*, a subset of *S*, called the final (or accepting) states.

- An NFA accepts an input string
*x* if there is a path in the transition
graph from the initial state to a final state that spells out *x*.
- The language defined by an NFA is the set of strings accepted by the NFA.
- A deterministic finite automaton (DFA) is an NFA in which
- There are no ε moves, and
- For each state
*s* and input symbol *a* there is exactly one transition
out of *s* labeled *a*.

## 3. Converting a Regular Expression to an NFA

- The McNaughton-Yamada-Thompson algorithm: Algorithm 3.23, pp. 159-163.

## 4. Converting an NFA to a DFA

- The subset construction: Algorithm 3.20, ALSU, p. 153.

## 5. Simulating an NFA

- Two-stack simulation of an NFA: Algorithm 3.22, pp. 156-159.

## 6. Reading Assignment

- Read Chapter 3, all sections except 3.9.

aho@cs.columbia.edu