Khayyam Math

DFA construction: build an automaton for (a|b)* ending in 'ab'

A minimal three-state deterministic finite automaton that accepts exactly the strings ending in 'ab' — and the systematic way to design it.

startq₀q₁q₂abbaab

Try this live →

What this shows

A deterministic finite automaton, or DFA, is a tiny rule-based machine that reads an input string one symbol at a time and either accepts it (says "yes, this is in the language") or rejects it. The machine has a fixed number of states, exactly one of which is the start state, some subset of which are accept states, and a transition function that says — for every state and every input symbol — which state to move to next.

To accept exactly the strings that end in "ab" over the alphabet {a, b}, we need just three states:

q₀ : we haven't seen "ab" recently — input is junk so far. q₁ : we just read an "a" — one more symbol decides things. q₂ : we read "ab" as the last two symbols — ACCEPT.

Transitions:

q₀ --a--> q₁ q₀ --b--> q₀ q₁ --a--> q₁ q₁ --b--> q₂ q₂ --a--> q₁ q₂ --b--> q₀

Read carefully: from q₂ we go back to q₁ on 'a' (because the last symbol read is now 'a', not "ab"), and to q₀ on 'b' (the last two symbols are now "bb", not "ab"). Those are the easy-to-miss transitions; the figure pins them down.

This DFA is minimal — three states is the smallest number that can do the job, because we need to remember whether the last character was 'a' (one bit of state) plus an accept flag (another).

Where it shows up

Every regular expression compiles to a DFA. When you type a grep pattern, when a build pipeline runs a glob match, when a network firewall checks whether a packet payload matches a rule — under the hood, a DFA is reading characters one at a time and walking the transition table.

DFAs are the simplest member of the formal-languages hierarchy. Stepping up gives push-down automata (for context-free languages, like balanced brackets), then linear bounded automata (for context-sensitive), then Turing machines (for everything computable). Knowing where the boundary sits — what a DFA can and can't recognise — is central to understanding why some languages are harder to parse than others.

Frequently asked questions

Why three states and not two?

Because we need to remember the last symbol read AND whether we just saw 'ab'. Two states can only distinguish 'accept' from 'reject', and that's not enough — we also need a 'just saw an a, waiting for b' state. A formal lower-bound proof uses the Myhill-Nerode theorem to show three is optimal.

What's the difference between a DFA and an NFA?

An NFA — nondeterministic finite automaton — can have multiple transitions on the same symbol from the same state, or transitions with no symbol at all (ε-moves). Every NFA can be converted to an equivalent DFA, sometimes at the cost of exponentially more states. DFAs are easier to execute; NFAs are easier to design.

Does the order of accept-state check matter?

After reading the entire input string, look at the current state. If it's in the accept set, accept; otherwise reject. Intermediate visits to accept states during reading don't matter — only the final state.

How do real-world regex engines differ?

They usually start by parsing the regex into an NFA, then either run the NFA directly or convert to a DFA. Backreferences (\1, \2) push you out of regular languages entirely, which is why patterns using them are exponentially slower.

Related topics