
Diving Deep into Regular Languages - Automata, Expressions, and Grammars

 — #AFL#CS Basics

Hello again, explorers of computation! In our previous journey, we introduced the fundamental concepts of regular languages. Today, we're diving even deeper, adding more detail and formal structure to our understanding. We'll reinforce the connections between machines, expressions, and grammars that define this foundational class of languages. Get ready for a more thorough look!

1. The Groundwork Revisited: More Detailed Basic Concepts

Let's solidify our understanding of the core building blocks.

Alphabets, Strings, and Languages

  • Alphabet (Σ): A finite, non-empty set of symbols. Examples: Σ = {a, b}, Σ = {0, 1, …, 9}.
  • String: A finite sequence of symbols from Σ. The empty string, denoted λ, is the string with zero symbols: |λ| = 0.
  • String Operations:
    • Concatenation: If w = a1a2…an and v = b1b2…bm, their concatenation wv is a1a2…an b1b2…bm. Concatenation with the empty string leaves a string unchanged: λw = wλ = w.
    • Length: |w| denotes the number of symbols in w. It follows that |uv| = |u| + |v|.
    • Reversal: w^R is the string written in reverse order. If w = a1…an, then w^R = an…a1. We have (uv)^R = v^R u^R and (w^R)^R = w.
    • Powers: w^n is w concatenated with itself n times; w^0 = λ.
    • Substring: Any string of consecutive symbols within w.
    • Prefix/Suffix: If w = vu, then v is a prefix and u is a suffix of w. λ is a prefix and a suffix of every string.
  • Set of Strings:
    • Σ*: The set of all possible finite strings over Σ, including λ. This set is always infinite if Σ is non-empty.
    • Σ+: The set Σ* excluding λ. Σ+ = Σ* - {λ}.
  • Language (L): Defined formally as any subset of Σ*.
    • Sentence: A string w is a sentence of language L if w ∈ L.
    • Finite Language: A language with a finite number of sentences.
    • Infinite Language: A language with an infinite number of sentences (most interesting languages are infinite). Example: L = {a^n b^n : n ≥ 0} = {λ, ab, aabb, aaabbb, …}.

Language Operations

Since languages are sets, standard set operations apply:

  • Union (L1 ∪ L2)
  • Intersection (L1 ∩ L2)
  • Difference (L1 - L2)
  • Complement (Σ* - L, taken relative to the alphabet Σ)

We also define operations specific to languages:

  • Reverse: L^R = {w^R : w ∈ L}.
  • Concatenation: L1L2 = {uv : u ∈ L1, v ∈ L2}.
  • Powers: L^n is L concatenated with itself n times. L^0 = {λ}, L^1 = L.
  • Kleene Star (Star Closure): L* = L^0 ∪ L^1 ∪ L^2 ∪ … (the union of all L^i, i ≥ 0).
  • Positive Closure: L+ = L^1 ∪ L^2 ∪ … (the union of all L^i, i ≥ 1). Note that L* = L+ ∪ {λ}.
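These operations are easy to experiment with when languages are finite sets of strings. Here is a minimal Python sketch (the helper names `concat`, `power`, and `star_up_to` are ours, and the star must be truncated to a finite power to stay computable):

```python
def concat(L1, L2):
    """Concatenation: every u in L1 followed by every v in L2."""
    return {u + v for u in L1 for v in L2}

def power(L, n):
    """L^n: L concatenated with itself n times; L^0 = {λ}."""
    result = {""}          # L^0 contains only the empty string
    for _ in range(n):
        result = concat(result, L)
    return result

def star_up_to(L, k):
    """Finite approximation of L*: the union L^0 ∪ ... ∪ L^k."""
    out = set()
    for i in range(k + 1):
        out |= power(L, i)
    return out

L = {"a", "ab"}
print(concat(L, L))     # {'aa', 'aab', 'aba', 'abab'}
print(power(L, 0))      # {''}
```

Note how `power(L, 0)` yields {λ}, matching the convention L^0 = {λ} above.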

Grammars

Grammars provide a finite way to describe potentially infinite languages. A grammar G = (V, T, S, P) consists of:

  • V: Finite set of variables (non-terminals, often uppercase letters).
  • T: Finite set of terminal symbols (the alphabet Σ, often lowercase letters or digits). V ∩ T = ∅.
  • S: The start variable (S ∈ V), representing the top-level concept (like 'sentence').
  • P: Finite set of productions (rewrite rules).

Productions specify how strings can be transformed. For now, we consider general productions x → y, where x ∈ (V ∪ T)+ (at least one symbol, including at least one variable) and y ∈ (V ∪ T)*.

  • Derivation Step: If w = uxv and x → y is a production in P, we say w derives z = uyv, written w ⇒ z.
  • Derivation: A sequence of derivation steps: w1 ⇒ w2 ⇒ … ⇒ wn. We write w1 ⇒* wn to indicate that wn can be derived from w1 in zero or more steps.
  • Sentential Form: Any string (containing variables and/or terminals) derivable from the start symbol S.
  • Language Generated (L(G)): The set of all strings consisting only of terminals that can be derived from S: L(G) = {w ∈ T* : S ⇒* w}.

Example: The grammar G = ({S}, {a, b}, S, {S → aSb, S → λ}) generates L(G) = {a^n b^n : n ≥ 0}. Derivation of aabb: S ⇒ aSb ⇒ aaSbb ⇒ aabb. The strings S, aSb, and aaSbb are sentential forms.
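The derivation above can be mechanized: repeatedly apply S → aSb to grow the sentential form, and apply S → λ to terminate. A small Python sketch (the function `generate` is our own illustrative helper, not part of any grammar library):

```python
def generate(max_n):
    """Derive a^n b^n for n = 0 .. max_n using S -> aSb | λ."""
    sentences = []
    form = "S"                                    # start symbol
    for n in range(max_n + 1):
        sentences.append(form.replace("S", ""))   # apply S -> λ
        form = form.replace("S", "aSb")           # apply S -> aSb
    return sentences

print(generate(3))   # ['', 'ab', 'aabb', 'aaabbb']
```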

Automata

An automaton is an abstract computational model. Key components:

  • Input: A string read sequentially.
  • Control Unit: Has a finite set of internal states, including an initial state.
  • Storage: Temporary memory (its nature distinguishes automata types).
  • Transitions: Moves between states based on current state, input symbol, and storage contents.
  • Output: Can be a simple accept/reject (accepter) or a generated string (transducer).

Automata operate in discrete time steps. A configuration describes the state, remaining input, and storage content. A move is a transition between configurations.

2. Finite Automata (FA): Computation with Bounded Memory

Finite Automata have a control unit but no temporary storage beyond their current state. They recognize the family of regular languages.

Deterministic Finite Accepters (DFA)

A DFA makes precisely one move for each input symbol.

Formal Definition: A DFA is a 5-tuple M = (Q, Σ, δ, q0, F):

  • Q: Finite set of states.
  • Σ: Finite input alphabet.
  • δ: Q × Σ → Q: The transition function. Given state q ∈ Q and input a ∈ Σ, δ(q, a) is the unique next state. δ is a total function, defined for all pairs (q, a).
  • q0 ∈ Q: The initial state.
  • F ⊆ Q: The set of final (accepting) states.

Operation:

  1. Starts in state q0.
  2. Reads the input string w = a1a2…an from left to right.
  3. On reading ai in state q, transitions to state δ(q, ai).
  4. After reading an, if the current state is in F, the string w is accepted. Otherwise, it is rejected.

Extended Transition Function (δ*): To describe the state after processing a string w, we define δ*: Q × Σ* → Q recursively:

  1. δ*(q, λ) = q (Base case: the empty string doesn't change the state).
  2. δ*(q, wa) = δ(δ*(q, w), a) (Recursive step: the state after reading wa is found by applying δ to the state after reading w and the final symbol a).

Theorem: δ*(qi, w) = qj if and only if there is a walk labeled w from state qi to state qj in the DFA's transition graph.

Language Accepted (L(M)): L(M) = {w ∈ Σ* : δ*(q0, w) ∈ F}.

Regular Language: A language L is regular if there exists a DFA M such that L = L(M).

Example DFA Construction:

  • To accept strings starting with "ab": Need states to track "start", "seen a", "seen ab" (final trap), and "error" (non-final trap).
  • To accept strings not containing "001": Need states to remember relevant suffixes seen so far (λ, 0, 00). If "001" is formed, go to a non-final trap state. All other reachable states are final.
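The first construction can be sketched as a table-driven simulator in Python. The state names (`start`, `seen_a`, `seen_ab`, `error`) mirror the sketch above; the dictionary `DELTA` plays the role of the total transition function δ:

```python
# DFA for "strings over {a, b} starting with ab".
DELTA = {
    ("start", "a"): "seen_a",    ("start", "b"): "error",
    ("seen_a", "b"): "seen_ab",  ("seen_a", "a"): "error",
    ("seen_ab", "a"): "seen_ab", ("seen_ab", "b"): "seen_ab",  # accepting trap
    ("error", "a"): "error",     ("error", "b"): "error",      # rejecting trap
}
FINAL = {"seen_ab"}

def accepts(w):
    """Compute δ*(q0, w) symbol by symbol and test membership in F."""
    state = "start"
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in FINAL

print(accepts("abba"))   # True
print(accepts("ba"))     # False
```

Because δ is total, every input symbol has exactly one move — the defining property of determinism.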

Nondeterministic Finite Accepters (NFA)

NFAs allow multiple possible moves from a state on a given input symbol, or moves on the empty string λ.

Formal Definition: An NFA is a 5-tuple M = (Q, Σ, δ, q0, F), where Q, Σ, q0, F are as in DFAs, but δ is different:

  • δ: Q × (Σ ∪ {λ}) → 2^Q. The function maps a state and an input symbol (or λ) to a set of possible next states.

Operation:

  • An NFA can be in multiple states simultaneously (conceptually).
  • If δ(q, a) contains p1, p2, …, pk, the NFA can transition to any of these states upon reading a while in state q.
  • If δ(q, λ) contains p, the NFA can transition from state q to state p without consuming any input symbol.
  • If δ(q, a) = ∅, there is no move defined (a dead configuration for that path).

Extended Transition Function (δ*): For an NFA, δ*(qi, w) is the set of all states reachable from state qi by following paths labeled w (including λ-transitions).

Acceptance: An NFA accepts w if some possible sequence of moves leaves the NFA in a state from F after processing w. Formally: L(M) = {w ∈ Σ* : δ*(q0, w) ∩ F ≠ ∅}.
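The set-of-states view of δ* translates directly into code: track the set of currently possible states, expanding it by λ-closure after every move. A minimal sketch (we encode λ as the empty-string label `""`; the example NFA, with a λ-move from state 0 to state 1, accepts a*b*):

```python
NFA = {(0, "a"): {0}, (0, ""): {1}, (1, "b"): {1}}
START, FINAL = 0, {1}

def lambda_closure(states, nfa):
    """All states reachable from `states` via λ-transitions alone."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p in nfa.get((q, ""), set()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def accepts(w, nfa=NFA, start=START, final=FINAL):
    current = lambda_closure({start}, nfa)
    for symbol in w:
        moved = set().union(*[nfa.get((q, symbol), set()) for q in current])
        current = lambda_closure(moved, nfa)
    return bool(current & final)   # accept iff δ*(q0, w) ∩ F ≠ ∅

print(accepts("aabb"))   # True
print(accepts("ba"))     # False
```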

Why Nondeterminism?

  • Modeling: Can model search/backtracking algorithms naturally.
  • Convenience: Often much simpler/smaller NFAs exist for a language compared to the minimal DFA (e.g., union of languages).
  • Theory: Easier to prove certain properties (like closure under union, concatenation, star) using NFAs.

Equivalence of DFA and NFA

Theorem: For every NFA MN, there exists a DFA MD such that L(MN) = L(MD). The family of languages accepted by NFAs is exactly the family of regular languages.

Subset Construction (nfa-to-dfa procedure):

  1. The states of MD are sets of states of MN (i.e., elements of 2^QN).
  2. The initial state of MD is the set containing q0 and all states reachable from q0 via λ-transitions (the λ-closure of {q0}).
  3. The transition δD({p1, …, pk}, a) is the set of all states of MN reachable from any pi by reading a, followed by any number of λ-transitions.
  4. A state {p1, …, pk} of MD is final if at least one pi is a final state of MN (pi ∈ FN).

This construction might lead to a DFA with up to 2^|QN| states, though often far fewer are reachable.
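The subset construction can be sketched with frozensets as DFA states. This version assumes an NFA without λ-moves (with them, each `frozenset(...)` of successors would be replaced by its λ-closure); the example NFA accepts strings over {0,1} ending in 01:

```python
from collections import deque

def nfa_to_dfa(nfa, start, finals, alphabet):
    """Subset construction, exploring only reachable subset-states."""
    start_set = frozenset({start})
    dfa, queue, seen = {}, deque([start_set]), {start_set}
    while queue:
        S = queue.popleft()
        for a in alphabet:
            # Union of all NFA moves out of any state in S on symbol a.
            T = frozenset().union(*[nfa.get((q, a), set()) for q in S])
            dfa[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    dfa_finals = {S for S in seen if S & finals}   # final iff some pi ∈ FN
    return dfa, start_set, dfa_finals

nfa = {(0, "0"): {0, 1}, (0, "1"): {0}, (1, "1"): {2}}
dfa, q0, F = nfa_to_dfa(nfa, 0, {2}, "01")

def run(w):
    S = q0
    for c in w:
        S = dfa[(S, c)]
    return S in F

print(run("1101"))   # True
print(run("10"))     # False
```

Here only 3 of the 2^3 = 8 possible subset-states are reachable, illustrating the remark above.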

Minimization of the Number of States in a DFA

Finding the unique minimal DFA for a regular language involves identifying and merging indistinguishable states.

Definitions:

  • Indistinguishable: States p, q are indistinguishable if for all w ∈ Σ*, δ*(p, w) ∈ F ⟺ δ*(q, w) ∈ F.
  • Distinguishable: States p, q are distinguishable if there exists some w ∈ Σ* such that one of δ*(p, w) and δ*(q, w) is in F and the other is not.

Algorithm (mark and reduce):

  1. Remove inaccessible states (states unreachable from q0).
  2. Initialize: Mark all pairs (p, q) where one state is final (p ∈ F) and the other is not (q ∉ F) as distinguishable.
  3. Iterate: Repeat until no new pairs are marked: for every unmarked pair (p, q) and every a ∈ Σ, let pa = δ(p, a) and qa = δ(q, a). If the pair (pa, qa) is already marked as distinguishable, mark (p, q).
  4. Merge: All pairs (p, q) that remain unmarked are indistinguishable. Group states into equivalence classes based on indistinguishability.
  5. Construct Minimal DFA: Create one state for each equivalence class. The initial state is the class containing q0. A class is final if it contains any original final state. Transitions are defined naturally: if δ(p, a) = q, with p in class Ei and q in class Ej, the minimal DFA has δ'(Ei, a) = Ej.

The resulting DFA is guaranteed to be minimal in the number of states and is unique (up to state relabeling).
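Steps 2 and 3 of mark-and-reduce can be sketched directly in Python. We assume all states are already reachable (step 1 done); the function name `indistinguishable_pairs` is ours. The example DFA over {a} accepts even-length strings but carries a redundant state, so one unmarked pair survives:

```python
from itertools import combinations

def indistinguishable_pairs(states, alphabet, delta, finals):
    # Step 2: mark every pair where exactly one state is final.
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in finals) != (p[1] in finals)}
    # Step 3: propagate marks until a fixed point is reached.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            if frozenset((p, q)) in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                if len(succ) == 2 and succ in marked:
                    marked.add(frozenset((p, q)))
                    changed = True
                    break
    # Step 4: whatever stayed unmarked is indistinguishable.
    return [set(p) for p in combinations(states, 2)
            if frozenset(p) not in marked]

delta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 1}
print(indistinguishable_pairs([0, 1, 2], "a", delta, finals={0, 2}))
# [{0, 2}]  — states 0 and 2 can be merged
```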

3. Regular Expressions: An Algebraic Language Description

Regular expressions provide a text-based way to define regular languages.

Formal Definition:

  1. Primitives: ∅, λ, and a (for each a ∈ Σ) are regular expressions.
  2. Recursive Step: If r1 and r2 are regular expressions, then (r1 + r2), (r1 · r2), and (r1*) are regular expressions.
  3. Only strings formed by rules 1 and 2 are regular expressions.

Language Denoted (L(r)):

  • L(∅) = ∅ (the empty language)
  • L(λ) = {λ} (the language containing only the empty string)
  • L(a) = {a} (the language containing only the string 'a')
  • L(r1 + r2) = L(r1) ∪ L(r2) (Union)
  • L(r1 · r2) = L(r1)L(r2) (Concatenation: {uv : u ∈ L(r1), v ∈ L(r2)})
  • L(r1*) = (L(r1))* (Kleene Star: zero or more concatenations)

Conventions:

  • Precedence: * (highest) > · (concatenation) > + (union).
  • The concatenation dot (·) is often omitted (e.g., r1r2).
  • Parentheses (…) group expressions.

Example: L((0+1)*00(0+1)*) = {w ∈ {0,1}* : w contains the substring 00}. Example: both (1*011*)*(0+λ) + 1*(0+λ) and (1+01)*(0+λ) describe the strings over {0,1} with no consecutive zeros. (Different expressions can denote the same language.)
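The second expression translates almost symbol-for-symbol into Python's `re` syntax, which writes union as `|` instead of `+` and expresses λ as an empty alternative: (1+01)*(0+λ) becomes `(1|01)*(0|)`.

```python
import re

# "No consecutive zeros" over {0,1}; \Z anchors the match at end of string.
no_00 = re.compile(r"(1|01)*(0|)\Z")

for w in ["", "0", "0101", "1001"]:
    print(w, bool(no_00.match(w)))
# '' and '0' and '0101' match; '1001' (which contains 00) does not
```

Keep in mind that practical regex engines add many operators (backreferences, lookahead) that exceed the formal regular expressions defined here.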

4. The Grand Equivalence: Expressions, NFAs, and DFAs

These three formalisms are equally powerful in describing regular languages.

Theorem: A language L is regular if and only if:

  • L is accepted by some DFA.
  • L is accepted by some NFA.
  • L is described by some regular expression.

Constructions:

  • RegEx → NFA: Build NFAs for primitive expressions (∅, λ, a). Combine them using λ-transitions to simulate union (parallel paths), concatenation (series connection), and star (feedback loop).
  • NFA → DFA: Use the subset construction.
  • DFA → RegEx: One method uses Generalized Transition Graphs (GTGs), whose edge labels are regular expressions. Start with the DFA's graph. Systematically eliminate states one by one, recalculating edge labels (which become more complex regular expressions) to represent the paths through the eliminated state. Continue until only the start state and a single final state remain (or just the start state, if it is final). The expression(s) on the remaining edge(s) define the language. Eliminating a state qk updates the remaining labels by rules like r'pq = rpq + rpk (rkk)* rkq.

5. Exploring the Properties of Regular Languages

Understanding the characteristics shared by all regular languages helps differentiate them from other language families.

Closure Properties

Regular languages are robust under many operations:

  • Union, Concatenation, Kleene Star: Proved directly from the definition of regular expressions.
  • Complementation: If L is accepted by the DFA M = (Q, Σ, δ, q0, F), then Σ* - L is accepted by M' = (Q, Σ, δ, q0, Q - F). This relies on the DFA being deterministic and having a total transition function.
  • Intersection: Can be proved constructively by building a product DFA. If M1 = (Q1, Σ, δ1, q0,1, F1) and M2 = (Q2, Σ, δ2, q0,2, F2), the intersection DFA has states Q1 × Q2, initial state (q0,1, q0,2), final states F1 × F2, and transitions δ∩((p, q), a) = (δ1(p, a), δ2(q, a)). Alternatively, use De Morgan's law, L1 ∩ L2 = Σ* - ((Σ* - L1) ∪ (Σ* - L2)), together with closure under union and complement.
  • Difference: L1 - L2 = L1 ∩ (Σ* - L2). Follows from closure under complement and intersection.
  • Reversal: Construct an NFA for L^R by reversing the edges of an NFA for L and swapping its start and final states.
  • Homomorphism: Replace each terminal in a regular expression for L with its string image under the homomorphism h to obtain a regular expression for h(L).
  • Right Quotient (L1/L2 = {x : xy ∈ L1 for some y ∈ L2}): If L1 and L2 are regular, L1/L2 is regular. Construct a DFA for L1/L2 by modifying the final states of a DFA for L1: a state q becomes final exactly when there is some y ∈ L2 such that δ*(q, y) is an original final state.
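The product construction for intersection is short enough to sketch on dict-based DFAs. Here M1 accepts strings with an even number of a's and M2 those with at least one a; a product state is final exactly when both components are:

```python
def product_dfa(d1, d2, alphabet):
    """Product transitions: a pair state steps componentwise."""
    s1 = {p for (p, _) in d1}
    s2 = {q for (q, _) in d2}
    return {((p, q), a): (d1[(p, a)], d2[(q, a)])
            for p in s1 for q in s2 for a in alphabet}

d1 = {(0, "a"): 1, (1, "a"): 0}   # M1: final state 0 (even count of a's)
d2 = {(0, "a"): 1, (1, "a"): 1}   # M2: final state 1 (at least one a)
delta = product_dfa(d1, d2, "a")

def accepts(w):
    state = (0, 0)                 # initial state (q0,1, q0,2)
    for c in w:
        state = delta[(state, c)]
    return state[0] == 0 and state[1] == 1   # final iff final in both

print(accepts("aa"))   # True: even and nonzero
print(accepts(""))     # False: zero a's fails M2
```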

The Pumping Lemma for Regular Languages (Detailed)

This is a crucial tool for proving non-regularity.

Pumping Lemma: Let L be a regular language. Then there exists a constant m ≥ 1 (the pumping length, related to the number of states in a DFA for L) such that any string w ∈ L with |w| ≥ m can be divided into three substrings, w = xyz, satisfying:

  1. |y| ≥ 1 (the pumped part is non-empty).
  2. |xy| ≤ m (the part xy occurs within the first m symbols).
  3. For all integers i ≥ 0, the string wi = x y^i z is also in L. (y^0 = λ, so xz ∈ L.)

Proof Idea: If |w| ≥ m (where m is the number of states in a DFA for L), the path traced by w in the DFA visits at least m + 1 states. By the pigeonhole principle, some state qr must be visited twice within the first m transitions. Write w = xyz, where x takes the DFA from q0 to the first occurrence of qr, y takes it from qr back to qr (the cycle), and z takes it from qr to a final state. Since the repeated state occurs within the first m transitions, |xy| ≤ m; since the cycle consumes non-empty input, |y| ≥ 1. Because y loops from qr back to qr, we can traverse this loop i times (x y^i z) and still reach the same final state via z.

Using the Pumping Lemma (The Game):

  1. Assume: L is regular (for contradiction).
  2. Lemma Gives: An adversary provides the pumping length m.
  3. You Choose: Select a specific string w ∈ L with |w| ≥ m. Your choice should strategically limit the adversary's options.
  4. Adversary Decomposes: The adversary chooses some decomposition w = xyz satisfying |y| ≥ 1 and |xy| ≤ m. You must show the lemma fails for all valid choices they could make.
  5. You Pump: Choose an integer i ≥ 0 (often i = 0 or i = 2) such that the resulting string wi = x y^i z is provably not in L.
  6. Contradiction: This violates the Pumping Lemma, so the initial assumption that L is regular must be false.
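The game can be checked by brute force for the classic example L = {a^n b^n : n ≥ 0}: pick w = a^m b^m, enumerate every decomposition the adversary is allowed (|y| ≥ 1, |xy| ≤ m, which forces y to lie inside the leading a's), and pump with i = 0. A small Python sketch:

```python
def in_L(w):
    """Membership test for L = {a^n b^n : n >= 0}."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

m = 5                       # a stand-in for the adversary's pumping length
w = "a" * m + "b" * m
for j in range(m):              # x = w[:j]
    for k in range(j + 1, m + 1):   # y = w[j:k]: |y| >= 1 and |xy| <= m
        x, y, z = w[:j], w[j:k], w[k:]
        # y consists only of a's, so pumping down unbalances the counts:
        assert not in_L(x + z), (x, y, z)

print("every allowed decomposition fails when pumped with i = 0")
```

Since every legal decomposition leads out of L, the assumption that L is regular collapses, exactly as the proof template prescribes.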

Common Mistakes:

  • Picking m yourself.
  • Choosing a w with |w| < m.
  • Assuming a specific decomposition xyz (you must counter all possible decompositions allowed by the lemma).
  • Picking an i that results in a string that is still in L.
  • Trying to use the lemma to prove a language is regular (it only proves non-regularity).

Myhill-Nerode Theorem and Regularity

The Myhill-Nerode theorem provides an alternative characterization of regular languages based on equivalence relations. For a language L over Σ, define a relation RL on Σ* such that x RL y if and only if for all z ∈ Σ*, xz ∈ L ⟺ yz ∈ L. This means x and y are indistinguishable with respect to L if appending any suffix z results in strings that are either both in L or both not in L.

Theorem (Myhill-Nerode): A language L is regular if and only if the relation RL has a finite number of equivalence classes (i.e., its index is finite).

The number of equivalence classes corresponds exactly to the number of states of the minimal DFA for L. This theorem provides another way to prove non-regularity (by exhibiting an infinite number of equivalence classes) and is the theoretical foundation for DFA minimization algorithms.

Identifying Non-Regular Languages: Summary

  1. Pumping Lemma: The primary method. Choose w strategically and show that all possible decompositions lead to a contradiction when pumped.
  2. Closure Properties: Show that L ∩ R, h(L), L/R, etc. (where R is regular and h is a homomorphism) yields a known non-regular language. If L were regular, the result would also have to be regular by closure.
  3. Myhill-Nerode: Show that L induces an infinite number of equivalence classes under the relation RL.

Decision Algorithms for Regular Languages

Key problems about regular languages are decidable, meaning algorithms exist:

  • Membership (w ∈ L?): Simulate the minimal DFA for L on input w. Time O(|w|).
  • Emptiness (L = ∅?): Check whether any final state is reachable from the start state in the DFA's graph (e.g., using BFS or DFS). Time polynomial in the number of states.
  • Finiteness/Infiniteness (|L| = ∞?): Check whether the DFA's graph contains a cycle that is reachable from the start state and from which a final state can be reached. Time polynomial in the number of states.
  • Equality (L1 = L2?): Construct the minimal DFAs M1 and M2; they accept the same language iff they are isomorphic (identical structure, perhaps different state names). Alternatively, construct a DFA for the symmetric difference LΔ = (L1 ∩ (Σ* - L2)) ∪ ((Σ* - L1) ∩ L2) using closure properties and check whether LΔ = ∅. Time polynomial in the state counts.
  • Subset (L1 ⊆ L2?): Check whether L1 ∩ (Σ* - L2) = ∅. Follows from closure and the emptiness check.
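The emptiness check is just graph reachability. A BFS sketch over a dict-based DFA (the example DFA's only final state, 2, is disconnected from the start state, so its language is empty):

```python
from collections import deque

def is_empty(delta, start, finals, alphabet):
    """L(M) is empty iff no final state is reachable from the start state."""
    queue, seen = deque([start]), {start}
    while queue:
        q = queue.popleft()
        if q in finals:
            return False            # reachable final state: L(M) ≠ ∅
        for a in alphabet:
            p = delta[(q, a)]
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return True

delta = {(0, "a"): 1, (1, "a"): 0, (2, "a"): 2}
print(is_empty(delta, 0, {2}, "a"))   # True: state 2 is unreachable
print(is_empty(delta, 0, {1}, "a"))   # False
```

The finiteness check extends this idea: look for a reachable cycle from which a final state is still reachable.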

6. Regular Grammars: Rules for Regular Languages

Regular languages can also be defined using restricted forms of grammars.

Definitions:

  • Right-Linear Grammar: All productions have the form A → aB, A → B, A → a, or A → λ (where A, B ∈ V, a ∈ T). Often generalized to A → xB or A → x with x ∈ T*. The key point: at most one variable, always at the rightmost end.
  • Left-Linear Grammar: All productions have the form A → Ba, A → B, A → a, or A → λ. Generalized: A → Bx or A → x. At most one variable, always at the leftmost end.
  • Regular Grammar: A grammar that is either entirely right-linear or entirely left-linear.

Example (right-linear): S → abS | a generates L((ab)*a). Example (left-linear): S → S1ab, S1 → S1ab | S2, S2 → a generates L(aab(ab)*).

Equivalence: Regular Languages and Regular Grammars

Theorem: A language L is regular if and only if there exists a regular grammar G such that L = L(G).

Constructions:

  • Right-Linear Grammar → NFA:
    • Create NFA states corresponding to the grammar variables V, plus a dedicated final state Fnew.
    • The NFA start state corresponds to the grammar start variable S.
    • For a production A → xB (x = a1…ak), create a path of k transitions labeled a1, …, ak from state A to state B, introducing intermediate states if k > 1. If x = λ, add a λ-transition from A to B.
    • For a production A → x (x = a1…ak), create a path labeled a1, …, ak from state A to the final state Fnew. If x = λ, add a λ-transition from A to Fnew.
  • DFA → Right-Linear Grammar:
    • Grammar variables V correspond to the DFA states Q.
    • Grammar terminals T are the DFA alphabet Σ.
    • The grammar start variable S corresponds to the DFA start state q0.
    • For each DFA transition δ(qi, a) = qk, add the production qi → a qk to the grammar's productions P.
    • For each final state qf ∈ F of the DFA, add the production qf → λ to P. (Alternatively, to avoid λ-productions, whenever δ(qi, a) = qf with qf ∈ F, add both qi → a qf and qi → a.)
  • Left-Linear Grammars: These also generate exactly the regular languages. One can show this by constructing an NFA that reads the input in reverse, simulating the left-linear derivation backwards, or by showing that L(Gleft) = (L(Gright))^R, where Gright is obtained by reversing the right-hand sides of Gleft's productions, and then using closure of regular languages under reversal.
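The DFA-to-right-linear-grammar direction is purely mechanical: one production per transition, plus a λ-production per final state. A Python sketch (the function name `dfa_to_right_linear` is ours; the example DFA accepts strings over {a,b} ending in b):

```python
def dfa_to_right_linear(delta, finals):
    """Read the productions of a right-linear grammar off a DFA table."""
    productions = []
    for (q, a), p in sorted(delta.items()):
        productions.append(f"q{q} -> {a} q{p}")   # δ(qi, a) = qk  gives  qi -> a qk
    for f in sorted(finals):
        productions.append(f"q{f} -> λ")          # each final state gets qf -> λ
    return productions

delta = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}
for rule in dfa_to_right_linear(delta, {1}):
    print(rule)
# q0 -> a q0 / q0 -> b q1 / q1 -> a q0 / q1 -> b q1 / q1 -> λ
```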

This equivalence provides the final piece connecting the machine, expression, and grammatical views of regular languages.

Final Thoughts

We've significantly deepened our understanding of regular languages, formalizing their definitions via DFAs, NFAs, regular expressions, and regular grammars. We established their equivalence and explored key properties like closure, decidability, and the crucial Pumping Lemma for proving non-regularity. These concepts are the bedrock upon which more complex language families are built.