The Ultimate Machine? Turing Machines, Computability, and Its Limits

 — #AFL #CS Basics

In our journey through formal languages, we've seen finite automata handling regular languages and pushdown automata tackling context-free languages. Yet, we know limitations exist – languages like \{a^n b^n c^n\} or \{ww\} are beyond the reach of PDAs. This pushes us to ask: what kind of machine represents the absolute limit of mechanical computation? Enter the Turing Machine.

Conceived by Alan Turing before modern computers existed, the Turing machine provides a simple yet profoundly powerful abstract model of computation. It consists of:

  1. A finite control unit, holding the machine's current state (q \in Q, where Q is finite).
  2. An infinite tape, divided into cells, each capable of holding one symbol from a finite tape alphabet (\Gamma). This tape serves as input, output, and unlimited memory.
  3. A special blank symbol (\square \in \Gamma), initially filling the infinite tape except for the input. The input alphabet \Sigma is a subset of \Gamma - \{\square\}.
  4. A read-write head positioned over one tape cell at a time, capable of reading the symbol in that cell, writing a new symbol, and moving one cell left (L) or right (R).
  5. A transition function \delta, which dictates the machine's behavior. For a standard deterministic Turing machine, \delta maps the current state and the symbol under the head to a new state, a symbol to write, and a direction to move: \delta: Q \times \Gamma \rightarrow Q \times \Gamma \times \{L, R\}. \delta can be a partial function; if \delta(q, a) is undefined for the current state q and tape symbol a, the machine halts.
  6. An initial state q_0 \in Q.
  7. A set of final states F \subseteq Q. We typically assume machines halt upon entering a final state.

An instantaneous description (ID) captures the machine's configuration: x_1 q x_2, meaning the tape contains the string x_1 x_2 (surrounded by blanks), the control unit is in state q, and the read-write head is positioned over the first symbol of x_2. A move transitions between IDs based on \delta. For example, if \delta(q_1, a) = (q_2, b, R), then c q_1 a d \vdash c b q_2 d. We use \vdash^* for a sequence of zero or more moves.
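A single move can be made concrete in a few lines of code. This is a minimal sketch, assuming the transition function is stored as a Python dict and the tape as a sparse dict of cells; the names BLANK and step are choices made here, not standard notation:

```python
# One Turing-machine move, assuming delta is a dict mapping
# (state, symbol) -> (new_state, written_symbol, direction).
BLANK = "_"  # stand-in for the blank symbol

def step(delta, tape, state, head):
    """Apply one move; return None if delta is undefined (machine halts)."""
    symbol = tape.get(head, BLANK)       # unlisted cells hold the blank
    if (state, symbol) not in delta:
        return None                      # halting configuration
    new_state, write, direction = delta[(state, symbol)]
    tape[head] = write                   # overwrite the scanned cell
    head += 1 if direction == "R" else -1
    return tape, new_state, head

# Example move: delta(q1, a) = (q2, b, R), so  c q1 a d |- c b q2 d
delta = {("q1", "a"): ("q2", "b", "R")}
tape = {0: "c", 1: "a", 2: "d"}          # ID: c q1 a d, head over the 'a'
result = step(delta, tape, "q1", 1)
```

After the move, `result` holds the new ID c b q2 d: the tape with 'b' written at cell 1, state q2, and the head one cell to the right.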

Turing Machines as Language Accepters

We can define the language accepted by a Turing machine M. An input string w \in \Sigma^+ is placed on the otherwise blank tape. The machine starts in state q_0 with its head over the leftmost symbol of w.

A string w is accepted by M if the computation halts in a final state. That is, L(M) = \{w \in \Sigma^+ : q_0 w \vdash^* x_1 q_f x_2 \text{ for some } q_f \in F, x_1, x_2 \in \Gamma^*\}.

If, for an input w, the machine halts in a non-final state, or if it never halts (enters an infinite loop, written q_0 w \vdash^* \infty), then w is not accepted.

Examples:

  • It's simple to design a TM for a regular language like L(00^*). It reads the first 0, moves right reading subsequent 0s, and accepts if it hits a blank. If it sees a 1, it halts without accepting.
  • For the non-regular language L = \{a^n b^n : n \ge 1\}, a TM can work by repeatedly matching and marking off the leftmost 'a' with the leftmost 'b', shuttling back and forth. If all 'a's match all 'b's perfectly, it accepts. This requires more complex state logic than a DFA or PDA but is achievable.
  • Similarly, L = \{a^n b^n c^n : n \ge 1\} (which is not context-free) can be accepted by a TM using a similar marking-and-matching strategy across three sections of the string.

These examples suggest TMs are more powerful than PDAs.
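The marking strategy for \{a^n b^n : n \ge 1\} can be written down as an explicit transition table and run through a small simulator. This is one possible encoding; the state names, the marker symbols x and y, and the step bound are choices made for this sketch:

```python
BLANK = "_"

# Back-and-forth marking: a's become x's, matched b's become y's.
delta = {
    ("q0", "a"): ("q1", "x", "R"),      # mark the leftmost unmarked a
    ("q1", "a"): ("q1", "a", "R"),      # run right over remaining a's
    ("q1", "y"): ("q1", "y", "R"),      # ... and over already-marked b's
    ("q1", "b"): ("q2", "y", "L"),      # mark the matching b
    ("q2", "y"): ("q2", "y", "L"),      # shuttle back left
    ("q2", "a"): ("q2", "a", "L"),
    ("q2", "x"): ("q0", "x", "R"),      # resume at the next unmarked a
    ("q0", "y"): ("q3", "y", "R"),      # no a's left: verify only y's remain
    ("q3", "y"): ("q3", "y", "R"),
    ("q3", BLANK): ("q4", BLANK, "R"),  # clean end of input: accept
}
FINAL = {"q4"}

def accepts(w, max_steps=10_000):
    tape = {i: c for i, c in enumerate(w)}
    state, head = "q0", 0
    for _ in range(max_steps):          # guard, since TMs need not halt
        sym = tape.get(head, BLANK)
        if (state, sym) not in delta:
            return state in FINAL       # halted: accept iff in a final state
        state, write, d = delta[(state, sym)]
        tape[head] = write
        head += 1 if d == "R" else -1
    raise RuntimeError("step bound exceeded")
```

Running it, `accepts("aabb")` and `accepts("ab")` hold, while strings such as "aab", "ba", or "abab" halt in a non-final state and are rejected.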

The Church-Turing Thesis: Defining Computation

Turing machines, despite their simplicity, can perform surprisingly complex tasks, including arithmetic, string manipulation, and decision-making (comparisons). By combining basic TM operations (like building blocks or subprograms), we can simulate highly intricate procedures.

This leads to a crucial hypothesis: the Church-Turing Thesis. It states:

Thesis: Any computation that can be carried out by mechanical means (i.e., by any algorithm) can be performed by some Turing machine.

This is not a theorem to be proven, but rather a foundational principle or definition. It defines what we mean by "algorithmic" or "mechanical" computation. The evidence for it is strong:

  1. All known computational models (recursive functions, Post systems, lambda calculus, modern computers) have been shown to be equivalent in power to Turing machines (they can simulate each other).
  2. No one has conceived of a computational task performable by an intuitive algorithm that cannot, in principle, be done by a Turing machine.

Accepting this thesis allows us to equate the intuitive notion of "algorithm" with the formal definition of a "Turing machine program."

Formal Definition of Algorithm: An algorithm for a function f: D \rightarrow R is a Turing machine M which, for any input d \in D, halts with f(d) on its tape. Specifically, q_0 d \vdash^* q_f f(d) for some final state q_f.

This lets us rigorously discuss the existence or non-existence of algorithms for certain problems.
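As a tiny instance of this definition, here is a TM computing the successor function on unary inputs (n written as n 1's), run by a minimal simulator. The state names and blank marker are choices made for this sketch:

```python
BLANK = "_"

# Successor on unary numbers: append one 1, then return to the left end,
# so the machine halts as q_0 d |-* q_f f(d) with f(d) = d + 1.
delta = {
    ("q0", "1"): ("q0", "1", "R"),      # scan right across the input
    ("q0", BLANK): ("q1", "1", "L"),    # append one extra 1
    ("q1", "1"): ("q1", "1", "L"),      # walk back to the left end
    ("q1", BLANK): ("qf", BLANK, "R"),  # park on the result and halt
}

def successor(n):
    """Run the machine on n in unary; return the result as a count of 1's."""
    tape = {i: "1" for i in range(n)}
    state, head = "q0", 0
    while (state, tape.get(head, BLANK)) in delta:   # undefined delta = halt
        state, write, d = delta[(state, tape.get(head, BLANK))]
        tape[head] = write
        head += 1 if d == "R" else -1
    return sum(1 for v in tape.values() if v == "1")
```

The machine always halts with n+1 1's on its tape, so by the definition above it is an algorithm for f(n) = n + 1.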

Variations on the Turing Machine Model

Does changing the TM's structure affect its computational power? Generally, no. Several variations are equivalent to the standard model:

  • Multiple Tracks: Viewing each tape cell as having multiple tracks (e.g., storing pairs or triplets) is just a way to encode a larger tape alphabet \Gamma. No change in power.
  • Stay-Option: Allowing the head to stay put (S move) instead of just L or R. An S move can be simulated by an R move followed by an L move in the standard TM. Equivalent.
  • Semi-Infinite Tape: A tape infinite only in one direction (e.g., right). A standard TM can be simulated using two tracks on the semi-infinite tape: one for the standard tape's right half, one for the left half stored in reverse. Equivalent.
  • Off-Line TM: A TM with a separate read-only input tape and a read-write work tape. A standard TM can simulate this using multiple tracks: one for the input, one to mark the input head position, one for the work tape, one for the work tape head position. Equivalent.
  • Multiple Tapes: A TM with k tapes, each with its own independent head. A standard TM can simulate a k-tape TM using 2k tracks (one track for tape contents, one track for head position for each of the k tapes). Equivalent.
  • Multidimensional Tape: A tape extending infinitely in 2 (or more) dimensions (moves U, D, L, R). A standard TM can simulate this by storing cell contents along with their coordinates (e.g., (content, (x, y))) on a single tape, using delimiters. Calculating new coordinates and finding cells takes effort, but it's possible. Equivalent.
  • Nondeterministic Turing Machine (NTM): The transition function \delta maps to a set of possible next moves: \delta: Q \times \Gamma \rightarrow 2^{Q \times \Gamma \times \{L, R\}}. An NTM accepts w if at least one possible computation path halts in a final state.

Equivalence of NTM and DTM: Surprisingly, nondeterminism does not increase the computational power of a Turing machine in terms of what languages can be accepted. A deterministic TM can simulate an NTM. One way is breadth-first simulation: the DTM keeps track of all possible configurations of the NTM after k steps on its tape. To simulate step k+1, it iterates through the configurations from step k, computes all possible next configurations for each, and writes the new list. This is incredibly slow (potentially exponential time overhead) but shows that anything accepted by an NTM is also accepted by some DTM.
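The breadth-first simulation can be sketched in miniature. Everything here is an illustrative choice, including the configuration encoding, the step bound, and the example NTM, which nondeterministically guesses which 'a' begins an "ab" substring:

```python
BLANK = "_"

def ntm_accepts(delta, final, w, max_steps=50):
    """delta maps (state, symbol) to a SET of (state, write, dir) moves."""
    level = {("q0", tuple(enumerate(w)), 0)}   # all configurations after k steps
    for _ in range(max_steps):
        if not level:
            return False                       # every branch halted rejecting
        next_level = set()
        for state, tape_items, head in level:
            tape = dict(tape_items)
            sym = tape.get(head, BLANK)
            moves = delta.get((state, sym), set())
            if not moves:                      # this branch halts...
                if state in final:
                    return True                # ...in a final state: accept
                continue
            for new_state, write, d in moves:  # expand every possible move
                t = dict(tape)
                t[head] = write
                h = head + (1 if d == "R" else -1)
                next_level.add((new_state, tuple(sorted(t.items())), h))
        level = next_level
    return False   # no accepting branch found within the step bound

# Example NTM over {a, b}: accept strings containing the substring "ab".
delta = {
    ("q0", "a"): {("q0", "a", "R"), ("q1", "a", "R")},  # keep scanning OR guess
    ("q0", "b"): {("q0", "b", "R")},
    ("q1", "b"): {("q2", "b", "R")},                    # guess confirmed
}
```

The level-by-level sets mirror the DTM's list of NTM configurations; the exponential blow-up shows up directly as the growth of `level`.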

Conclusion: The fundamental computational power (what can be computed) seems remarkably robust and independent of these specific architectural variations. However, efficiency can be greatly affected (e.g., multi-tape vs single-tape).

Languages, Grammars, and the Chomsky Hierarchy

Turing machines define classes of languages based on acceptance criteria.

  • Recursively Enumerable (RE) Languages (Type 0): A language L is RE if there exists a Turing machine M that accepts it (L = L(M)). For w \in L, M halts and accepts. For w \notin L, M might halt and reject, or it might loop forever. RE languages are precisely those generated by unrestricted grammars (productions u \rightarrow v where u \in (V \cup T)^+ and v \in (V \cup T)^*).
  • Recursive Languages: A language L is recursive if there exists a Turing machine M that decides it. This means M accepts L and always halts on any input (either accepting w \in L or rejecting w \notin L). A recursive language has a membership algorithm.

Key Relationships:

  • Every recursive language is recursively enumerable.
  • If L and its complement \overline{L} are both RE, then L (and \overline{L}) must be recursive. (We can run TMs for L and \overline{L} in parallel; one must eventually halt and accept.)
  • There exist languages that are RE but not recursive. (Proven via diagonalization, related to the Halting Problem).
  • There exist languages that are not RE. (Proven via diagonalization: the set of all possible languages 2^{\Sigma^*} is uncountable, while the set of all TMs, and hence of all RE languages, is countable.)
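The parallel-run argument above can be mimicked with Python generators, one yield per simulated step. The two recognizers below are toy stand-ins for TMs accepting L and \overline{L}: each accepts by yielding True and deliberately diverges on strings outside its language:

```python
def decide(rec_in_L, rec_in_coL, w):
    """Alternate one step of each recognizer; one of them must accept."""
    a, b = rec_in_L(w), rec_in_coL(w)      # two possibly-nonterminating runs
    while True:
        if next(a, False) is True:
            return True                    # the L-recognizer accepted: w in L
        if next(b, False) is True:
            return False                   # the complement-recognizer accepted

# Toy example: L = strings of even length, complement = odd length.
def rec_even(w):
    if len(w) % 2 == 0:
        yield True
    else:
        while True:                        # diverge outside the language
            yield None

def rec_odd(w):
    if len(w) % 2 == 1:
        yield True
    else:
        while True:
            yield None
```

Because every w lies in exactly one of L and \overline{L}, exactly one run accepts, so `decide` always halts with the correct answer: the combined procedure is a decider.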

Context-Sensitive Languages (CSL) (Type 1): These languages sit between CFLs and recursive languages. They are generated by context-sensitive grammars (productions x \rightarrow y with |x| \le |y|, or S \rightarrow \lambda if needed). Equivalently, they are accepted by Linear Bounded Automata (LBA) – nondeterministic TMs whose tape head is restricted to the portion of the tape initially containing the input (plus end markers). Every CSL is recursive. It is known that there are recursive languages that are not context-sensitive. It is an open question whether deterministic LBAs accept the same class of languages as nondeterministic LBAs.

The Chomsky Hierarchy: This organizes these major language families:

  • Type 3: Regular Languages (Accepted by FA, generated by regular grammars)
  • Type 2: Context-Free Languages (Accepted by PDA, generated by CFGs)
  • Type 1: Context-Sensitive Languages (Accepted by LBA, generated by CSGs)
  • Type 0: Recursively Enumerable Languages (Accepted by TM, generated by unrestricted grammars)

Each class is a proper subset of the next (Type 3 \subset Type 2 \subset Type 1 \subset Type 0). Furthermore, the recursive languages form a proper subset of the RE languages and properly contain the CSLs. The deterministic CFLs form a proper subset of the CFLs.

Computability, Decidability, and the Unsolvable

The Church-Turing thesis equates "computable function" and "decidable problem" with "solvable by a Turing machine."

  • Computable Function: A function f: D \rightarrow R for which a TM exists that halts with f(d) for every d \in D.
  • Decidable Problem: A problem (a set of questions with yes/no answers) for which a TM exists that halts with the correct yes/no answer for every instance of the problem.
  • Uncomputable/Undecidable: If no such TM exists.

The Halting Problem

Perhaps the most famous undecidable problem.

Problem: Given the description of an arbitrary Turing machine M and an arbitrary input string w, will M halt when started on input w?

Undecidability Proof Outline (Diagonalization):

  1. Assume a TM H exists that solves the halting problem. H takes w_M w (the encoded description of M followed by w) and halts in state q_y if M halts on w, or q_n if M loops on w.
  2. Modify H to create H': if H would halt in q_y, make H' loop infinitely. If H would halt in q_n, make H' halt.
  3. Construct H'': it takes w_M as input, copies it to make w_M w_M, then runs H' on this.
    • So, H'' applied to w_M loops infinitely if M halts on w_M.
    • H'' applied to w_M halts if M loops on w_M.
  4. Let w_{H''} be the description of H''. What happens when we run H'' on its own description w_{H''}?
    • If H'' halts on w_{H''}, then by its definition, it should loop infinitely. Contradiction.
    • If H'' loops on w_{H''}, then by its definition, it should halt. Contradiction.
  5. The contradiction implies the initial assumption (the existence of H) is false. The halting problem is undecidable.
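The diagonal construction can be mirrored in Python as an analogy (not the proof itself): the strings "halt" and "loop" stand in for the actual behaviors, so nothing really runs forever, and functions stand in for machine descriptions:

```python
def make_adversary(claimed_halts):
    """Build H'': do the opposite of whatever the claimed tester predicts."""
    def h2(program):
        if claimed_halts(program, program):   # tester says: halts on itself
            return "loop"                     # ...so do the opposite
        else:
            return "halt"
    return h2

def contradiction_found(claimed_halts):
    """Whatever the tester answers about h2 run on itself, h2 contradicts it."""
    h2 = make_adversary(claimed_halts)
    prediction = claimed_halts(h2, h2)        # "will h2 halt on input h2?"
    behavior = h2(h2)                         # what h2 actually does
    return (prediction and behavior == "loop") or \
           (not prediction and behavior == "halt")
```

Any candidate tester you plug in is refuted by its own adversary, which is exactly the shape of the argument in steps 1–5.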

Reduction and Other Undecidable Problems

Undecidability is often proven by reduction: showing that if problem B were decidable, we could use it to build an algorithm for a known undecidable problem A. Since A is undecidable, B must be too.

Many problems related to Turing machines and RE languages are undecidable, often by reduction from the halting problem:

  • State-Entry Problem: Does TM M ever enter state q on input w? (Undecidable: reduce from the halting problem by modifying M to enter a specific new state q_{halt} just before halting.)
  • Blank-Tape Halting Problem: Does TM M halt if started on a completely blank tape? (Undecidable: reduce from the halting problem instance (M, w) by constructing M_w that first writes w on the blank tape, then simulates M.)
  • Emptiness Problem for RE Languages: Is L(M) = \emptyset? (Undecidable)
  • Finiteness Problem for RE Languages: Is L(M) finite? (Undecidable)
  • Equality Problem for RE Languages: Is L(M_1) = L(M_2)? (Undecidable)
  • Any Nontrivial Property of RE Languages (Rice's Theorem): Any property of L(M) that is true for some RE languages but not all is undecidable. (e.g., "Is L(M) regular?", "Is L(M) context-free?", "Does L(M) contain \lambda?")
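The blank-tape reduction can be mimicked with closures, with Python functions standing in for TM descriptions: from (M, w) we build a zero-input program M_w whose halting behavior on "blank input" matches M's behavior on w:

```python
def build_M_w(M, w):
    """From (M, w), construct M_w: ignore the (blank) input, run M on w."""
    def M_w():          # takes no input: "starts on a blank tape"
        return M(w)     # write w on the tape, then behave exactly like M
    return M_w

# M_w halts (on blank input) exactly when M halts on w, so a decider for
# blank-tape halting would decide the general halting problem.
example = build_M_w(len, "abc")   # len stands in for an arbitrary machine
```

Since M_w halts iff M halts on w, any algorithm for blank-tape halting would yield one for the halting problem, which cannot exist.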

Undecidability also extends to context-free languages:

  • Is CFG G ambiguous? (Undecidable)
  • Is L(G_1) \cap L(G_2) = \emptyset? (Undecidable)
  • Is L(G_1) = L(G_2)? (Undecidable)
  • Is L(G) regular? (Undecidable)

Conclusion: The Power and the Limits

Turing machines provide a robust, universal model for computation, equivalent in power to any realistic computing device and various other theoretical models. They define the recursively enumerable and recursive languages, sitting atop the Chomsky hierarchy. However, the very power that allows them to model any algorithm also leads to fundamental limitations. The existence of undecidable problems, starting with the Halting Problem, demonstrates that there are well-defined questions that no algorithm, no Turing machine, no computer, can ever consistently answer. Understanding these limits is as crucial as understanding the power of computation itself.