Question 1

Why three states and not two?

Accepted Answer

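Because the automaton has to track how much of the required suffix 'ab' it has just read: nothing useful, an 'a', or the full 'ab'.  Two states can only distinguish 'accept' from 'reject', and that's not enough; we also need a 'just saw an a, waiting for b' state.  A formal lower-bound proof uses the Myhill-Nerode theorem: the strings ε, 'a', and 'ab' are pairwise distinguishable (appending 'b' separates 'a' from the other two, and appending nothing separates 'ab' from ε), so no DFA for this language has fewer than three states.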
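As a minimal sketch of the three-state machine described above, the table below encodes one possible DFA.  The state names q0, q1, q2 and the function name `accepts` are illustrative assumptions, not names from the original construction.

```python
# Hypothetical state names:
#   q0 = no useful progress, q1 = just read 'a', q2 = just read 'ab'
TRANSITIONS = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}
START = "q0"
ACCEPT = {"q2"}


def accepts(text: str) -> bool:
    """Return True if text is over {a, b} and ends in 'ab'."""
    state = START
    for symbol in text:
        if (state, symbol) not in TRANSITIONS:
            return False  # symbol outside the alphabet {a, b}
        state = TRANSITIONS[(state, symbol)]
    # The accept check happens once, after all input is consumed.
    return state in ACCEPT
```

Merging any two of these states would break the machine: for instance, q0 and q1 must react differently to a following 'b'.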

Question 2

What's the difference between a DFA and an NFA?

Accepted Answer

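A DFA (deterministic finite automaton) has exactly one transition for each state and input symbol.  An NFA (nondeterministic finite automaton) can have multiple transitions on the same symbol from the same state, or transitions with no symbol at all (ε-moves).  Every NFA can be converted to an equivalent DFA by the subset (powerset) construction, sometimes at the cost of exponentially more states: an n-state NFA can need up to 2^n DFA states.  DFAs are easier to execute; NFAs are easier to design.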
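The NFA-to-DFA conversion can be sketched as follows.  This is a simplified subset construction that assumes an NFA without ε-moves; the example NFA (states 0, 1, 2 for "ends in 'ab'") and all names are assumptions made for illustration.

```python
from collections import deque

# Assumed NFA for "ends in 'ab'": state 0 loops on both symbols,
# nondeterministically guesses the final 'a' (0 -> 1), then reads 'b' (1 -> 2).
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
NFA_START = 0
NFA_ACCEPT = {2}
ALPHABET = ("a", "b")


def subset_construction(nfa, start, accept, alphabet):
    """Build a DFA whose states are frozensets of NFA states (no ε-moves)."""
    start_set = frozenset({start})
    dfa = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            # Union of everything reachable from any member of the current set.
            target = frozenset(
                nxt for state in current for nxt in nfa.get((state, symbol), ())
            )
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A subset is accepting if it contains at least one NFA accept state.
    dfa_accept = {s for s in seen if s & accept}
    return dfa, start_set, dfa_accept


dfa, dfa_start, dfa_accept = subset_construction(NFA, NFA_START, NFA_ACCEPT, ALPHABET)
dfa_states = {state for (state, _symbol) in dfa}
```

For this particular NFA only three subsets are reachable, so there is no blow-up here; the exponential case arises for other languages.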

Question 3

Does the order of accept-state check matter?

Accepted Answer

There is only one check, and it happens after the entire input string has been read: look at the current state.  If it's in the accept set, accept; otherwise reject.  Intermediate visits to accept states during reading don't matter; only the final state does.
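To illustrate that only the final state counts, here is a small trace under the assumption of a three-state DFA for "ends in 'ab'" with hypothetical state names q0, q1, q2 (q2 accepting).

```python
# Assumed transition table; q2 is the only accept state.
TABLE = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}
ACCEPT_STATES = {"q2"}


def trace(text: str) -> list[str]:
    """Return every state visited, starting with the start state q0."""
    state = "q0"
    visited = [state]
    for symbol in text:
        state = TABLE[(state, symbol)]
        visited.append(state)
    return visited


path = trace("aba")                          # passes through q2 after 'ab'
passed_through_accept = any(s in ACCEPT_STATES for s in path)
accepted = path[-1] in ACCEPT_STATES         # only the last state decides
```

On the input 'aba' the run visits the accept state midway but ends elsewhere, so the string is rejected.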

Question 4

How do real-world regex engines differ?

Accepted Answer

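Automata-based engines parse the regex into an NFA, then either run the NFA directly or convert it to a DFA.  Many other engines use backtracking search instead, which lets them support extra features.  Backreferences (\1, \2) push you out of regular languages entirely, so no finite automaton can handle them; engines that support them rely on backtracking, which can take exponential time in the worst case.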
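A short example of a backreference leaving the regular languages, using Python's standard `re` module.  The pattern and the helper name `matches` are chosen here for illustration: `(a+)b\1` describes the strings a^n b a^n with n ≥ 1, which no DFA can recognize because it would have to count the a's.

```python
import re

# \1 must repeat exactly what group 1 captured, so both runs of 'a'
# are forced to have the same length.
PATTERN = re.compile(r"(a+)b\1")


def matches(text: str) -> bool:
    """Return True if the whole string has the form a^n b a^n, n >= 1."""
    return PATTERN.fullmatch(text) is not None
```

Here `matches("aabaa")` succeeds while `matches("aaba")` fails, since the second run of a's is shorter than the first.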

DFA construction: build an automaton for (a|b)* ending in 'ab'
