On The Unreasonably Broad Uses of Diagonalization

Robert Cunningham

May 16, 2023 • 11 min read

Here's a surprising result. Any program that transforms source code must have an input that behaves the same before and after being transformed. In other words, for this input, called a fixed point, the input and output programs behave exactly the same way.

Sometimes fixed points are easy to find. The code console.log("hello") is an obvious fixed point of a transformation that swaps the characters 'a' and 'z':

const swapAandZ = (input) => input.replace(/[az]/g, c => c === 'a' ? 'z' : 'a')

const input: string = 'console.log("hello")'

eval(input) === eval(swapAandZ(input))
// our === implies the same return value, console outputs, error text, etc.

Some are slightly harder; can you think of a fixed point of "append the source to itself"?

It turns out that while true {...} (or the empty program) are both fixed points.

const double = (source: string) => source + source

const input = "while (true) {console.log('loop');} "

eval(input) // => loop loop loop loop loop ...
eval(double(input)) // => loop loop loop loop loop ...

But some examples are extremely non-obvious. Can you think of a fixed point of "hash the source and execute that hash"? The program would need to behave exactly the same hashed and unhashed.

const hash = (source: string) => createHash('sha256').update(source).digest('hex');

const input = "some crazy program"

// eval(input) === eval(hash(input))

But we'll prove that in general, any transformation, no matter how crazy, must have a fixed point.

// for any transformation
const transform: (source: string) => string;

// there exists an input
const input: string;

// where executing the program is the same as executing its transformation
eval(input) === eval(transform(input))

The technique we'll use to prove this remarkable truth is diagonalization. To warm up to the technique, we'll start with two simple examples: the proof that there are more reals than natural numbers, and a proof that a seemingly powerful class of functions, the primitive recursive functions, can't be complete.

We'll use that newfound expertise with diagonalization to show the above result, that all code transformation programs have a fixed point. This result is called Rogers's Theorem.

Rogers's theorem will then help us understand a terse and opaque piece of math called the Diagonal Lemma. We'll close by using the Diagonal Lemma for a pair of excellent proofs of Gödel's Incompleteness Theorem (all theories must be either incomplete or inconsistent) and Tarski's Undefinability of Truth (theories cannot define an internal notion of truth).

The Diagonalization Trilemma

Diagonalization. Let's say you have an "evaluation" function $f: X \times X \rightarrow Y$. You can think of $f(3)(5)$ as selecting the 3rd object from a long list of objects, and evaluating it on 5. [0]

Let's say you also have a $\delta: Y \rightarrow Y$ with no fixed point. You can think of $\delta$ as a "give me something different" function: it changes every $y$ into a different $y$.

Then there exists a function $g: X \rightarrow Y$ that isn't equal to $f(k)$ for any $k$. In other words, there's at least one new object $g$ that isn't equal to f(3) or f(55) or f(574), or any other object $f(x)$ in the original list of objects. (Note that $f(55)$ and $g$ are both functions representing objects.)

Proof. Take $g(x)$ as $\delta(f(x)(x))$. For any $k$, $g \neq f(k)$, since $g(k) \neq f(k)(k)$. $\blacksquare$

Spelling out the proof, if you're concerned that maybe $g = f(234)$, you can check that at $g$ and $f(234)$ differ at index 234: $g(234) \neq f(234)(234)$. That's because $g(234)$ does equal $\delta(f(234)(234))$, which is different from $f(234)(234)$ because $\delta$ is the "give me something different" function.

The most important part of this proof is that it establishes a crucial trilemma, which we'll use for the rest of this post.

Trilemma. For all sets $X$ and $Y$ with a function $f: X \times X \rightarrow Y$, and a function $\delta: Y \rightarrow Y$, at least one of the following three things must be true:

$\delta: Y \rightarrow Y$ has a fixed point.
There exists some $g: X \rightarrow Y$ which isn't found in $f(n)$ for $n \in \mathbb{N}$, but $g$ is obviously the right kind of object, so the list of objects $f(n)$ must not be exhaustive.
There exists some $g: X \rightarrow Y$ which isn't found in $f(n)$ for $n \in \mathbb{N}$, but $f(n)$ obviously lists all the possible objects, so $g$ must actually be a new and different kind of object.

Example: Primitive Recursion

The first objects we'll look at are the primitive recursive functions. Primitive recursive functions are all the functions made of combinations five functional building blocks: $z, s, p, c$ and $\rho$. Don't worry if you haven't seen them before. The upshot is that you can easily list them, numbering them sequentially:

1 => z() // zero function
2 => s() // +1 function
3 => p()
...
372 => c(s, z) // composition of 0 and +1 (1)
373 => c(s, s) // composition of +1 and +1 (+2)
...

We'll choose $f(i)(j)$ as the evaluation function, which takes the $i$th primitive recursive function and computes it on $j$. For example, evaluating the 373rd function at the value 3:

f(373, 3) => c(s, s)(3) => s(s(3)) => 3 + 1 + 1 => 5

We can easily build a function $\delta(x) = x + 1$ that has no fixed point.

So we consult our trilemma, and decide that either (1) $\delta$ is broken, (2) the list $f(i)$ isn't complete, or (3) $g$ is a new kind of object.

$\delta$ has no fixed point, and $f(n)$ obviously enumerates all the primitive recursive functions (by definition). So it must be that $g(i) = \delta(f(i)(i))$ is a new kind of object: a non-primitive-recursive (but computable) function.

This might surprise you if you've only played with primitive recursive functions for a little bit: they can compute functions like plus, times, isPrime, nthPrime, ifelse, and more. But it turns out that they aren't Turing complete, since they can't simulate the eval function.

Example: Reals and Naturals

You might recognize the name diagonalization from a proof you saw in math class. Georg Cantor famously used diagonalization to prove that there are more real numbers than natural numbers. In our trilemma framing, the argument goes like this.

Start with a list of rational numbers.

1 => 0.139042986720983745...
2 => 0.349857294587293845...
3 => 0.239841509204872018...

Our evaluation function $f(i)(j)$ takes the $i$th real number and then gets its $j$th decimal digit:

f(3)(3) => 0.23[9]841509204872018... => 9

We can build a function delta = (x) => x == 1 ? 2 : 1 which maps everything (except 1) to 1, and then sends 1 to 2. It definitely has no fixed point.

We find that we've eliminated two options from our trilemma. $\delta$ has no fixed point. Our new $g$ is definitely a real number, since we can get its decimal digit at any location: the first three are $0.211...$. So then it must be (2) to blame–our original list of real numbers couldn't have been exhaustive.

And so we see that there were more reals than we could list countably.

Example: Church-Turing & Rogers' Theorem

This time we're going to list all the JavaScript programs, again alphabetically:

1 => eval('a')
2 => eval('b')
3 => eval('c')
...
10348423 => eval('x => x + x')

Predictably, our evaluation function $f(i)(j)$ takes the $i$th program in the list and runs it on the input $j$.

f(10348423)(1) => eval('x => x + x')((eval('a')) => 'aa'

We're going to use delta = (x) => x + " + 1". Recall that $\delta$ operates on the source, so for example

delta(f(10348423)(1)) // => eval('x => x + x')((eval('a')) + 1 => 'aa' + 1 => 'aa1'

But something seems wrong here. Our argument above says that either:

$\delta$ has a fixed point. It doesn't, right?
Our list of JavaScript programs isn't exhaustive. But surely we'll eventually print out even the longest of JavaScript programs by iterating alphabetically?
Our newly manufactured JavaScript program, $g$, has no string representation. But surely every JavaScript program is representable by a string?

Do you see the issue? It turns out that our $\delta$ does have a fixed point. In particular, eval("while (true) {}") is a fixed point of $\delta$. If your program loops forever, and you add one to the result, it still loops forever.

Not only that, but all viable $\delta$s share this flaw. To build the "change that output" program $\delta$, you need to know whether your original program halts or not. And in general, you can't know whether it halts or loops, so you can't build a good $\delta$. [1]

This constitutes a somewhat shocking proof that it's impossible to build a function $\delta$ that takes any program and modifies it to produce a different result. In other words, all programs mapping functions to functions have a fixed point. This result is called Rogers's Theorem.

There's a nice immediate corollary: any language allows you either solve the halting problem or build an interpreter for itself. JavaScript allows interpreters but not halting-solvers. Primitive recursive functions allow halting-solvers (all P.R. functions halt), but can't build their own interpreters. These correspond to the (1) and (3) branches of the trilemma respectively.

New Tool: Diagonal Lemma

We've just seen that all programs that map programs to other programs have a fixed point. Next, we're going to see a very similar result, but this time applied to mathematical theorems. It's wrapped up in an opaque piece of math called the Diagonal Lemma. But once we understand the Diagonal Lemma, it opens the door to amazingly short demonstrations of Godel's Theorem and Tarski's Theorem.

Let's say you're working in some proof system, which is basically some laws ("axioms") plus some deductive rules. Call that system $T$.

Give me a statement $A(x)$. Then $T$ can prove that there exists a sentence $\phi$ where $\phi$ is only true if $A(\ulcorner \phi \urcorner)$ is true, where $\ulcorner \phi \urcorner$ is the numerical representation of the bytes in $\phi$. Mathematically, we can express the same as follows, with $T \vdash$ read "T proves:"

$$\forall A \, \exists \phi \quad T \vdash \phi \iff A(\ulcorner \phi \urcorner)$$

So let's say that we're dealing with $A(x) := x * 2 = 4$. Then there's a sentence $\phi$ where strong enough theories can prove that $\phi \iff \ulcorner \phi \urcorner * 2 = 4$. In this case, we can choose $\phi$ to be the sentence $0 = 1$. Then $\ulcorner \phi \urcorner = 3161393$, and indeed $T$ can prove that

$$0 = 1 \iff 3161393 * 2 = 4$$

since both sides are false.

In the domain of programs, we've seen that any program you can name that transform programs into other programs has a fixed point. The Diagonal Lemma says something extremely similar for math: any statement you can name that maps the numerical representation of a statement to another statement has a fixed point. And as a bonus, strong enough theories can prove that.

Proof Sketch: Diagonal Lemma

Consider a sufficiently strong theory $T$ and any arbitrary formula $A$ with one free variable. Then there exists a $\psi$ so that $T$ proves that $\psi \iff A(\ulcorner \psi \urcorner)$.

The set of formulas with one free variable are enumerable (for example, by printing all strings and then discarding any that have more than one variable symbol):

274 => "x = 1"
...
672 => "x = x"
...
1837501 => "5x = x && 4 = 5"

We're going to define a function $f(s)(s')$ which evaluates $s$ on the byte-representation of $s'$. For example, f($x$ is prime)($x$ * $x$ = 100) evaluates to "33823693116354608 is prime" (the string "x*x= 100" in UTF-8 has the decimal representation 33823693116354608). Notice that $f$ maps two formulas with a free variable into a single sentence with no free variables.

Let $\delta(\phi) = A(\ulcorner \phi \urcorner)$.

Now for the trilemma. All the formulas in our system definitely have a written representation, and all written representations with one free variable definitely count as formulas in our system.

So it must be that $\delta$ has a fixed point. And that fixed point is exactly what we're after: some $\psi$ so that $\psi$ and $A(\ulcorner \psi \urcorner)$ are logically equivalent. (Notice that we've shifted from "output equivalence" to "logical equivalence.")

So we've seen how we can prove this. We're going to omit the proof that $T$ itself can also prove this, since it's a bit too hairy for our taste. If you look into it, it basically involves proving that $T$ can construct and prove this same argument inside itself.

Proof Sketch: Gödel's Incompleteness Theorem

We've finally arrived at Gödel's Incompleteness Theorem. Here's a quick reminder of the statement.

Gödel's First Incompleteness Theorem. Any sufficiently powerful theory must be either inconsistent or incomplete.

The core idea is to construct a sentence $S$ like "this sentence is unprovable." Then if $S$ is provable, it's wrong (the surrounding theory can prove things that are contradictory; this is called being inconsistent). If $T$ can't prove $S$, then $S$ is true (the surrounding theory has truths that it can't prove; this is called being incomplete).

That's a slick argument, but are sentences like "this sentence is unprovable" even allowed? How do you express something like "this sentence" formally? The Diagonal Lemma allows us to do just that.

Let $A(\ulcorner x \urcorner) = \forall p$, the string $p$ is not a proof of $x$. In other words, $A(\ulcorner x \urcorner)$ = "$x$ is unprovable".

Then by the Diagonal Lemma, we can use $T$ to prove that there exists an $x$ with $x \iff A(\ulcorner x \urcorner)$, or $x \iff$ ($x$ is unprovable). And "$x \iff$ ($x$ is unprovable)" is exactly "this sentence is unprovable." So we can build these sentences!

Proof Sketch: Tarski's Theorem

Here's a quick reminder of what Tarski's Theorem says.

Tarski's Theorem. Any sufficiently powerful theory cannot define its internal notion of truth.

To prove it, we take $A(\ulcorner x \urcorner)$ = ($x$ is not true). The Diagonal Lemma guarantees that we can prove the existence of an $x$ such that $x \iff$ ($x$ is not true). But "$x$ is true $\iff$ $x$ is not true" is obviously contradictory.

The only thing we can point to is our original $A$, which must have been flawed. It must be that theories cannot define predicates like $x$ is true or $x$ is not true.

Conclusion

I hope you're convinced that diagonalization is an extremely powerful tool.

From our original diagonalization trilemma, we were able to immediately show that primitive recursive functions aren't complete, that there are more reals than naturals, and that program-transforming-programs must all have fixed points.

Then we used the same trilemma to construct a new tool, the Diagonal Lemma, and closed out the post by using it to construct the strange sentences required for Gödel and Tarski's famous theorems.

We've kept the math relatively light, leaning on intuition and code where necessary. If you want a more rigorous treatment of this topic, I highly recommend continuing with Yanofsky [0], upon which this post is based.

Notes

[0] Note that we'll use the curried $f(3)(5)$ and uncurried $f(3, 5)$ forms interchangeably in this post.

[1] A delightful corollary: if your language is weak enough that you can solve the halting problem, then it must not be able to build an interpreter for itself. That's what happens with P.R. functions, which all halt.

Appendix

Gödel Numbers

Gödel numbers are a way of converting formula into numbers. We denote the Gödel number of $x$ as $\ulcorner x \urcorner$. You can imagine $\ulcorner x \urcorner$ just interpreting the underlying bytes of $x$ as an integer instead of a string.

const gödel = (s: string): bigint => {
	const byteArray = new TextEncoder().encode(s);
	const hexString = Array.from(byteArray).map(byte => byte.toString(16).padStart(2, '0')).join('');
	return BigInt(`0x${hexString}`);
}

const unGödel = (n: bigint): string => {
	const hexString = n.toString(16);
	const byteArray = new Uint8Array(hexString.match(/.{1,2}/g)!.map(byte => parseInt(byte, 16)));
	return new TextDecoder().decode(byteArray);
}
  
console.log(gödel("x * 2 = 4")) // 2215926989203043065908n
console.log(unGödel(2215926989203043065908n)) // x * 2 = 4

Diagonalization Proof (no exposition)

Here's the straight math version of the general diagonalization argument.

Diagonalization. For all sets $X$ and $Y$ with a function $f: X \times X \rightarrow Y$ and a function $\delta: Y \rightarrow Y$ with no fixed point, there exists at least one function $g: X \rightarrow Y$ such that there exists no $k$ such that $f(k) = g$.

Proof. Let $g(x) = \delta(f(x, x))$. For any $k$, $g \neq f(k)$, since $g(k) \neq f(k)(k)$.

Acknowledgements

This post (and the primitive recursive post) were prepared as part of my final work for the MIT class Seminar in Logic (18.504). Thanks to Henry Cohn, Lisa Kondrich, and Laker Newhouse for their extensive suggestions, feedback, and help with both this post and the primitive recursive introduction post.

References

[0] Noson S. Yanofsky (2003). A Universal Approach to Self-Referential Paradoxes, Incompleteness and Fixed Points. Bulletin of Symbolic Logic, 9(3), 362–386.

[1] Garrabrant, S., & Eisenstat, S. (n.d.). Diagonalization fixed point exercises. Retrieved from https://www.lesswrong.com/posts/FZkLa3GRLW97fpknG/diagonalization-fixed-point-exercises

[2] Kelly, K. T. (n.d.). Computability and Learnability.

[3] Primitive recursive function. (2023). Retrieved from https://en.wikipedia.org/wiki/Primitive_recursive_function

[4] Smith, P. (2013). An Introduction to Gödel’s Theorems. Cambridge, England: Cambridge University Press.

[5] Solomon, E. (1958). Unpacking the diagonal lemma. Retrieved from https://math.stackexchange.com/questions/127852/unpacking-the-diagonal-lemma