Robert Cunningham's Blog

Aug 12, 2022 · 4 min read

Notes on the Best Lectures at MIT

This semester I took 18.404 (“Theory of Computation”). It’s widely known as being among the best classes at MIT.

18.404 vis-a-vis 1100 other classes. "Adjusted ratings" means that ratings are adjusted to give more weight to large classes with high ratings; this reflects the intuition that 30 ratings averaging 6.9 gives more signal than one rating averaging 7.0.

Prof. Sipser teaches 18.404 in the standard format, with eighty minute lectures on Tuesdays and Thursdays. Perhaps it’s because the format is standard that the rating is impressive. It’s like winning a Michelin star for boiling an egg. What makes a large lecture amazing?

If there’s one thing that distinguishes Sipser’s teaching style, it’s this: the main thrust of the lectures is clear, and he does not get bogged down by anything. Because of this, he manages to keep the interest and attention of all 200 students, instead of only the first two rows.

For example, instead of encouraging esoteric and nitpicky questions, he frequently prompts by asking whether “anyone has a question that will help others to understand” or whether “anyone who doesn’t quite understand can ask a clarifying question.” One time someone asked about whether [some quantity] should be [that quantity]+1. He said something along the lines of “in this case, I happen to know that there’s no +1, but really, what’s a +1 between friends?”

Sometimes he’ll prove a result in the $\implies$ direction but refer to the book for the $\impliedby$ direction. Sometimes he proves the one-dimensional case, which captures the intuition, and refers us to the two-dimensional generalization in the book. Sometimes he’ll cite a result and say “this isn’t too important; you can go back later and prove it to yourself if you need to, but it’s not critical.” [1]

He frequently restates important earlier results from the lecture, because he says that students sometimes space out for a few minutes. Every once in a while, he stops and says “so far we haven’t done anything crazy; if you’ve fallen off here, we can get you back on the train pretty easily. But in 10 minutes we’ll be too far gone.”

So the lectures are tightly focused on the big picture, and Sipser aggressively manages questions, proof scope, and attention to proof details to achieve this focus.

In addition to the focus on the big picture, the delivery of the main ideas is meticulously planned. It’s as if he takes all of the attention that usually goes into the detail of proofs and redirects it into the details of formulating the main ideas clearly. When I asked him about it, he explicitly mentioned the following:

He tries to formulate theorem statements without negation, which can be hard to wrap your head around.
He tries to lay out the structure of analogies on the board in a way that communicates their meaning.
He wants [the algorithm], which is the main point of the lecture, to be on the central board so it draws attention. Expecting it to take two blackboard panes, he skips over them in the first part of the lecture.

Some of this, I’m sure, comes from practice. He’s been teaching 18.404 since at least 2001. But he also mentions that teaching well is a huge time commitment; apparently he spent roughly fifteen hours preparing for each lecture over the pandemic. At eighty minutes per lecture, that’s eleven minutes of preparation for every minute spent lecturing.

So why are the lectures in 18.404 rated so well? I think it’s a combination of an unwavering focus on the big picture, and close attention to detail in how to most effectively communicate the big picture*.* This combination allows Prof. Sipser to teach effectively to most of the audience in a huge class, who then turn around and give 18.404 stellar ratings.

[1] A common failure mode for lecturers is to introduce a theorem that only 2% of the class understands and then spend three lectures proving it in detail. In short, 18.404 is the exact opposite of this.

Jun 30, 2022 · 16 min read

Hacker's Guide to Making Circuit Boards (Part 1)

Hey! The purpose of this series is to teach you how to design, manufacture, and program a circuit board. Afterwards, you’ll be well positioned to work on a simple circuit board that drives something like bluetooth headphones, a quadcopter, or an electric skateboard.

My first circuit board <3 (and the one we're making in this tutorial!)

The focus throughout the series is on teaching a small set of skills that can get you surprisingly far: the equivalent of pip install and for-loops for electrical engineering.

I’m still quite new to this, but making a circuit board is way easier than I thought before I started. I’ve written this tutorial with the goal of making it widely accessible. Notably,

You don’t need to understand advanced stuff about circuits, like inductance. It might be useful if you remember some high school physics, like Ohm’s law, but you can pick this up as you need it. The most advanced individual circuits we’ll see look a lot like this.
You don’t need to know how to solder. I don’t. You can get your boards manufactured overseas with all the components perfectly inserted and soldered by a machine.
It doesn’t cost a lot of money. I spent less than $100 total on this project. That bought me five fully assembled boards and all the equipment necessary to program them. If you grew up with Arduino and SparkFun, you might be surprised (like I was) that a MPU6050 inertial measurement unit ($25 on a breakout board from SparkFun) actually costs $2 as a bare component (and less in bulk).

Concretely, we’ll be designing a circuit board that controls a robot. It’ll read data from a gyroscope and drive four servo motors. We’ll control it with a few buttons, power it with four AA batteries, and program it in Rust.

Let’s dive in!

Creating a Schematic

Our first order of business will be creating a schematic of our circuit board. A schematic is an abstract diagram that indicates which components we want to use, and how they should be electrically connected to one another.

Inertial Measurement Unit

The first component we’ll add to our schematic will be an inertial measurement unit (IMU). IMUs contain several sensors like gyroscopes and accelerometers integrated into a single chip, and tell you how fast you’re turning, where you’re pointing, and whether you’re accelerating.

We’ll start by choosing a specific IMU. Luckily, there’s a great third party comparison of the gyroscopes on many popular IMUs. You can go through this list and choose any that strikes your preferred price-performance-functionality tradeoff; we’ll go with the MPU 6050, which is a classic. [1]

Open up your schematic editor of choice (this tutorial uses EasyEDA [2]). Create a new schematic, then go to Place > Symbol, and search for the MPU 6050. Once you place it, you’ll see an abstract representation of the MPU 6050 with all its pins. To figure out what these pins do, you’ll need to take a look at the part’s datasheet (Google is your friend). These tend to be long and verbose; the MPU6050 datasheet is more than 50 pages long. A good place to start is by looking at the pin out table. This table describes the purpose of each of the pins on the bottom of the MPU6050. Some of these descriptions are self-explanatory; pin 18 is labelled “GND” and is supposed to be connected to ground. We can start by connecting that one in our schematic editor.

We can also put the X symbol on all eight NC pins and both RESV pins, which the pinout chart labels as “Not internally connected.” and “Reserved. Do not connect.” respectively. Although the X symbol doesn’t do anything, it’s a good reminder that we don’t need to connect anything there.

We can X out all the unused pins.

Soon, though, we're going to start running into less obvious pins. For example, it's totally unclear how the **REGOUT** pin, which the datasheet labels as "Regulator filter capacitor connection," is supposed to connect.

That’s where the Typical Operating Circuit comes in handy. This is a different diagram in the datasheet where the manufacturer basically lets you know how reasonable people set up their ICs. There might be three pages of information on which capacitor values are acceptable, but the typical operating circuit should give you reasonable defaults. Here, we can see that pin 10 (REGOUT) is supposed to be connected to a 0.1uF capacitor and then to GND. Let’s add that to the schematic (recall that 0.1uF = 100nF).

You’ll notice that there are lots of 100nF capacitors in the parts library. The ones that we want are in the “Basic Part” class (these are much cheaper than the “Extended Part” versions in low volumes). The four basic 100nF capacitors differ in the 4-number code that comes afterward; those are sizes. We chose the 0603 size, which is 0.06 inches by 0.03 inches. Then we connect it to REGOUT on one side and ground on the other.

You might wonder what the purpose of this capacitor is. It’s called a bypass capacitor, which you can think of like a little backup battery kept close to the chip. When the electrical inputs to the chip are noisy, the bypass capacitors effectively absorb and supply power to smooth over those irregularities.

You can copy capacitors from the typical application circuit for VLOGIC, CPOUT, and VDD. What about the rest of the pins?

CLKIN and FSYNC say “optional, connect to ground if unused.” We’re not trying to do anything fancy, so we ground them.

SDA and SCL implement a communication protocol called I2C. We’ll use these two pins to send questions and instructions from our microcontroller to our MPU6050.

How do we send bits over these wires? We start by connecting SDA and SCL to +3.3v on the schematic. To send a logical zero, the microcontroller connects its side of the wire to ground internally, which forces the line to zero volts. To send a logical one, the microcontroller does nothing with the pin, and the +3.3v connection tugs the line high automatically.

There’s one catch. If you just connect +3.3v directly to ground inside your microcontroller, you have a short circuit and your chip will get hot and fry itself. To avoid this problem, we put a “pull-up” resistor between the power supply and the microcontroller. This sharply limits how much power an internal ground can pull from the 3.3v supply.

This all sounds complicated, but it’s not so bad once you do it once. Here’s how it’ll look when you’re done. Notice the 4.7K pull-up resistors between the power supply (V3.3) and the pins. The SDA and SCL labels will eventually connect to SDA and SCL pins on the microcontroller. The V3.3 label indicates a connection to +3.3 volts, supplied from somewhere else on the board. AUX_DA and AUX_CL are another I2C pair for “connecting external sensors,” which we don’t have. We shouldn’t leave them unconnected, since I2C lines need to be kept at 3.3v by default, so we connect them to our 3.3v power through another pair of 4.7k pull-up resistors.

AD0 controls the “I2C slave address LSB.” If you search the datasheet for AD0, you’ll find this diagram, which says that the MPU6050’s I2C address is 1101000 when AD0 is low, and 1101001 when AD0 is high. Either is fine for us; we’ll choose to set it low by connecting it to ground. INT is an interrupt pin; the gyroscope can activate it to quickly signal something like “this chip is in free-fall.” We won’t use it.

EP is on the schematic, but not on the datasheet pinout. Stack Exchange says that it refers to the big pad on the bottom of the chip, which should sometimes be soldered to ground as a heatsink. However, looking through the datasheet, we see: We therefore choose not to connect EP.

When all of that is said and done, your schematic should look something like this. Congratulations! You’ve set up your first chip.

Power (3.3v LDO)

We said before that the 3.3V label indicates a connection to +3.3 volts, generated somewhere else on the board. Let’s tackle that now.

The MPU6050 needs 3.3 volts to run, but we’re planning to run our controller on a set of four AA batteries, which together will output 4 * 1.5 volts = 6 volts.

To decrease the voltage from 6v to 3.3v, we’ll use a LDO (Low Dropout regulator). LDOs are simple, cheap, and inefficient: they take an unknown input voltage and output a constant output voltage. If we input 6 volts, our LDO should output 3.3 volts and burn off the other 2.7 volts as heat.

We’ll try to choose a popular and well-documented LDO, so go to this website and search for “3.3v LDO,” then sort down by stock. You’ll see at the top there are a bunch of 1117-series parts (notice how they all have 1117 in their part number). We’ll grab one of these 1117 chips. Digging through its datasheet, we find this wonderful typical operating circuit in Chinese. We copy that diagram into our schematic as follows.

Here’s our final LDO setup.

Battery Connector

Next, we’ll connect the input side of the LDO to our battery with a battery connector. Our generic battery pack comes with a 2.54-mm spaced connector. Using the same parts website, I eventually found a matching header. It’s pretty simple to connect in the schematic. Make sure you put the VBAT connection on the + side of your battery, and the GND connection for the - side.

Microcontroller: Selection

Next we’ll select a microcontroller, which is the star of the show.

We’ll use a chip from the STM32 series, which is perhaps the most popular modern line of microcontrollers. More specifically, we’ll choose the STM32F3 series, which is based on a Cortex-M4 core (mid-range) and has floating point support. It also has a good Rust HAL for later!

Reading the STM32F3xx documentation page, it seems like the F302 is a standard choice without bells and whistles like high resolution timers.

Finally, the STM32F302 can be configured with varying amount of flash memory, RAM, and pins. We’ll ultimately go with the STM32F302C8T6, which has 64k of flash, 16k of RAM, and comes in a 48 pin package. Here she is: Phew! That’s a lot of pins to connect.

Microcontroller: Power

Let’s start with power. Looking at the STM32F3xx datasheet section on power supply, we see the following diagram. Fortunately, everything on the right is inside the microcontroller; we only have to worry about the left side. According to the top of the diagram, VBAT should be connected directly to the battery.

According to the middle of the diagram, we should connect each of the VDD pins to 3.3v, and each of the VSS pins to ground. We’re also supposed to add four 100nF and one 4.7uF bypass capacitors.

According to the bottom of the diagram, we should also attach VDDA and VSSA to power and ground respectively, with one 10nF and one 1uF capacitor. (If we wanted particularly accurate analog readings, we could take further care to isolate these pins from other electrical noise on the board.) The schematic can be deceiving; although these look like four separate circuits, the V3.3s and GNDs all link together. This is why it doesn’t matter that the 4.7uF capacitor is only graphically attached to the bottom pins; it’s electrically connected to all of them. [3]

Microcontroller: Clocks

The STM32F302 has high speed and low speed internal clocks. They’re not particularly accurate. If we need higher timing accuracy, we can choose to add external crystal oscillators to the OSC_IN, OSC_OUT, OSC32_IN, and OSC32_OUT pins. Since we don’t need particularly high timing accuracy, we’ll save the cost and leave these as generic I/O pins.

Microcontroller: Reset

We need a way to restart the microcontroller; that’s handled by the NRST pin. According to the STM32F302 datasheet, NRST is active low. This means that the pin should usually be high (at 3.3v). When we push the reset button, it should temporarily drop to ground to restart the board. We’ll do that with the following circuit. SW1 is a generic pushbutton; when you push, it connects the 1/2 side to the 3/4 side. By default, NRST is held to 3.3v, since it’s connected to V3.3 through that 47k resistor. When SW1 is held, NRST is also connected to ground through a 10k resistor. Since the ground resistor is smaller, the voltage on NRST jumps closer to ground. This gadget is called a voltage divider. [4]

Microcontroller: Boot

When the microcontroller starts, where should it execute code from? Most often, it should execute from main memory (where our programs are written). But sometimes we might want to boot from a stable system bootloader instead.

This choice is controlled by the Boot0 pin. So usually Boot0 should usually be low, corresponding to main flash memory, but when we hold a button, we’ll set Boot0 high to boot from system memory instead. We won’t worry about the third chart entry, embedded SRAM, for now.

To achieve this, we’ll use the same gadget we used for NRST, but with V3.3 and GND reversed. If we want to boot from system memory, we’ll hold down the SW2 button during boot up.

Microcontroller: Debug

Now we need a way to program our microcontroller. We’ll do that using a debug tool that speaks the SWD (serial wire debug) protocol. The connector for that tool is a 2x5 male header with 1.27mm spacing, laid out like this: None of our microcontroller pins are labelled “SWDIO”, but we can reference this handy table to see which pin handles that function. In our case, it looks like the PA13 pin handles SWDIO. We also need to attach SWDCLK, SWO, and NRST. First we’ll put little tags on the relevant pins: Then we’ll grab a 2x5 pin header with 1.27mm spacing and attach those tags to it, along with power and ground. We now have the basics of the microcontroller fully configured. All that’s left is to attach inputs and outputs for the IMU and servos.

Microcontroller: IMU I2C

Recall that our IMU speaks to our microcontroller through the I2C protocol, using an SDA and an SCL pin. Where can we find SDA and SCL on our microcontroller?

Scrolling through the pinout table, we notice that PA10 has an alternate function as “I2C2_SDA”, and PA9 is similarly listed for “I2C2_SCL”. Since we don’t need PA9 and PA10 for anything else, we’ll use the second I2C interface, “I2C2”. (If we had needed PA9 for a different alternate function, we could have used I2C1 or I2C3 instead.) We connect PA9 and PA10 to the IMU with the same SDA and SCL tags we used back then.

Servos

Finally, for the servos. We can control servos by sending them pulses of varying lengths, using a system called pulse width modulation (PWM). We’ll discuss this more in part 3, when we program the microcontroller. For now, we need to take care to attach the servos to IO pins with the hardware needed to generate PWM signals.

Searching for “PWM” in the datasheet, we find the following. And looking at TIM2, we’re in luck! It can drive four independent PWM channels for our four servos. Checking the pinout table, we see that TIM2_CH1 corresponds to PA0, TIM2_CH2 to PA1, and so forth. We connect some labels as usual. Our servos have a 3x1 female connector with 2.54mm spaced pins. They accept power, ground, and signal on those three lines, and power in the middle. We grab four corresponding male headers from the parts library, and GND, VBAT, and each signal line. That’s all; the finished schematic should look like this. A word of congratulations: designing schematics is probably the hardest part of this project, especially for someone with programming background. And designing your first schematic is much harder than your second: when you design your second schematic, you’ll already understand LDOs, SWD, timers, pull-up resistors, etc. Now you’re over the hump!

Unfortunately, we’re not ready to send this off to a fabrication facility yet. Schematics represent electrical connections, but they don’t say anything about how the PCB should be laid out in physical space. In part two, we’ll cover routing, which is the process of flattening that electrical graph onto a series of copper traces on a two dimensional piece of fiberglass. Then we’ll submit it for manufacturing. In part three, we’ll program our new PCB in Rust!

Notes

[1] Instead of buying a MPU 6050 on a breakout board from Adafruit, of course, we’re going to buy just the little black chip directly from LCSC.

[2] You’ll need a schematic editor (a piece of EDA, or electronic design automation, software). Although it pains me, for beginners I recommend EasyEDA over alternatives like KiCad and anything from AutoCAD. Hardcore professionals seem to use either KiCad or an AutoCAD product for schematic design. Unfortunately, if you do that, “please place a 10K ohm resistor here” is a three step process:

Place a resistor symbol onto your schematic.
Specify the resistor’s footprint. The footprint answers questions like “how wide is this resistor?” and “where should the conductive pads be placed?” and “how wide are the conductive pads?”
Choose which exact 10K resistor you want (e.g. a CR0402FF2103G), after making sure your manufacturer has enough of them in stock and at a reasonable price.

The alternative is EasyEDA, which is owned by LCSC, a very large Chinese electronics manufacturer. It’s pretty mediocre software overall, but placing a resistor is a single step. You choose the exact model immediately (CR0402FF2103G) with access to LCSC’s current stock and pricing information, and it automatically knows the footprint. As you’d expect, getting the whole thing manufactured by LCSC is extremely easy.

I’m no KiCAD expert; if there’s some way to get KiCAD to integrate more easily with manufacturers, or if I’m missing something else, please let me know!

[3] However, it’s good to eventually have the bypass capacitors be physically close to the power pins, and this layout will helps me remember that when I do the routing.

[4] This gadget is sometimes called a voltage divider. We can quickly calculate the low voltage with ohm’s law (given that the currents are equal): 10 / (47 + 10) * 3.3v = 0.58v.

May 17, 2023 · 17 min read

On The Unreasonably Broad Uses of Diagonalization

Think of any program which mangles source code. This program should take source code as a string and do something with it: hash it, encrypt it, append it to itself, delete it, or anything else.

Here’s a very surprising result. Each of those mangling transformations is weak to a specific string of source code, whose runtime behavior will be completely unchanged by the mangling.

In other words, regardless of how intense your mangling is, it is guaranteed to not to be able to affect some strangely strong and robust program. We call that specific program a fixed point of your transformation.

Your transformation might be “encrypt the source code, interpret that as an integer, then print the third largest prime factor of that integer twelve times.” And yet, I tell you, you’re in luck: I have a program in my back pocket which will behave exactly the same before and after.

Let’s take a basic example: the transformation that swaps ‘a’ and ‘z’. What are its fixed points? As expected console.log("hello"), as well as every other program with neither ‘a’s nor ‘z’s, are unchanged.

const swapAandZ = (input: string) =>
  input.replace(/[az]/g, c => c === 'a' ? 'z' : 'a')

const input = 'console.log("hello")'

eval(input) === eval(swapAandZ(input))
// our === implies the same return value, console outputs, error text, etc.

Some source transformations are slightly more stubborn; can you think of a program which will be unchanged by “append the source to itself?”

It turns out that while (true) {...} (or the empty program) are both fixed points of the doubling transformation.

const double = (source: string) => source + source

const input = "while (true) {console.log('loop');} "

eval(input) // => loop loop loop loop loop ...
eval(double(input)) // => loop loop loop loop loop ...

But then there are transformations whose fixed points are extremely non-obvious. Can you think of a fixed point of “hash the source and execute that hash”? The program would need to behave exactly the same hashed and unhashed.

const hash = (source: string) => 
  createHash('sha256').update(source).digest('hex');

const input = "some crazy program"

// eval(input) === eval(hash(input))

Nevertheless, as promised, we will prove that in general, any transformation, no matter how crazy, must have a fixed point.

// for any transformation
const transform: (source: string) => string;

// there exists an input
const input: string;

// where executing the program is the same as executing its transformation
eval(input) === eval(transform(input))

The technique we’ll use to prove this remarkable truth is a diagonalization trilemma. To warm up to the technique, we’ll start with a reminder of the proof that there are more reals than natural numbers. Then we’ll prove the above result, that all code transformation programs have a fixed point. This result is called Rogers’s Theorem. Finally, we’ll prove that a seemingly powerful class of functions, the primitive recursive functions, can’t be complete.

We’ll then examine a terse and opaque piece of math called the Diagonal Lemma, which we will understand by analogy, as Rogers’s theorem applied to math instead of code. We’ll then prove it with our familiar diagonalization trilemma technique.

Finally, with the powerful Diagonal Lemma in hand, we’ll sketch a pair of excellent proofs of Gödel’s First Incompleteness Theorem (all theories must be either incomplete or inconsistent) and Tarski’s Undefinability of Truth (theories cannot define an internal notion of truth).

Warm Up: Cantor

You might recognize diagonalization from the famous proof that there are more real numbers than natural numbers. We’ll walk through it again here, with an eye towards introducing the trilemma that will guide us through Rogers and the Diagonal Lemma, into Gödel and Tarski.

The argument starts by building a complete numbered list of every real number between 0 and 1:

1 => 0.139042986720983745...
2 => 0.349857294587293845...
3 => 0.239841509204872018...

We’ll define an evaluation function $f(i)(j)$ that takes the $i$ th real number and returns its $j$ th decimal digit: [0]

f(3)(3) => 0.23[9]841509204872018... => 9

We also create a “make it different” function $\delta$ , which takes a digit and returns a different digit. Many rules works; we’ll map everything to 1, except 1 gets mapped to 2:

const delta = (x: number) => x === 1 ? 2 : 1

Now, we’ll build a new real number $g$ by going through the diagonal values ( $f(1)(1)$ , $f(2)(2)$ , etc) and applying $\delta$ to choose a different digit at each location for $g$ : $g(j) = \delta(f(j)(j))$ . Reading the 1st digit of the 1st number, the 2nd digit of the 2nd, the 3rd of the 3rd, and so on, the diagonal starts 1, 4, 9, and so $g$ starts 0.211…

d(f(1)(1)) => d(0.[1]39042986720983745...)  => d(1) => 2
d(f(2)(2)) => d(0.3[4]9857294587293845...)  => d(4) => 1
d(f(3)(3)) => d(0.23[9]841509204872018...)  => d(9) => 1

$g$ differs from the first number at digit 1 (it must differ, because $\delta$ always changes its input, and the first digit of g is $\delta$ (the first digit of f), the second number at digit 2, and so on. But we started by listing all the rational numbers between zero and one, so how could $g$ not be in this list?

One of three things must have gone wrong:

Maybe $\delta$ is broken, and doesn’t actually change every digit. If $\delta$ has a fixed point, $g$ might simply be hidden somewhere in the list.
Maybe $g$ is a new kind of object, and does not belong in the list.
Maybe the list is incomplete.

This is the diagonalization trilemma.

Well, $\delta$ is clearly not broken, and $g$ is definitely a real number, so it must that our list of real numbers is incomplete.

And since every natural number is listed on the left side of the list, but there’s at least one real number not listed on the right side of the list, there must be more reals than naturals.

Rogers’ Theorem: Fixed Points of Program Transformations

Now let’s try the exact same trick, but on programs.

This time we’re going to list all the JavaScript programs, alphabetically:

1 => eval('a')
2 => eval('b')
3 => eval('c')
...
10348423 => eval('x => x + x')

Our evaluation function $f(i)(j)$ takes the $i$ th program in the list and runs it on the input $j$ .

f(10348423)(1)
  => eval('x => x + x')(eval('a'))
  => 'aa'

We’re going to use const delta = (x: string) => x + " + 1". Recall that $\delta$ operates on the source, so for example

delta(f(10348423)(1))
  => eval('x => x + x')((eval('a')) + 1
  => 'aa' + 1
  => 'aa1'

Just like before, we build a new object $g$ (in this case a program and not a real number), with $g(j) = \delta(f(j)(j))$ . And just like before, $g$ can’t be any program on our list — it disagrees with every program at the diagonal position.

Check this by arguing to yourself: why is $g$ not the program x => x + 4, for example? Because that program appears in our list at position 10521488, and when evaluated at 10521488 returns 10521488 + 4 = 10521492. But $g(10521488)$ returns $\delta(10521492)$ , which is anything other than 10521492. So they can’t be the same program.

So $g$ isn’t on the list. What went wrong? Again, our three suspects are the $\delta$ , the $g$ , and the list. It must be one of:

$\delta$ has a fixed point. It doesn’t, right?
$g$ might be misbehaved or not writable in JavaScript. But we can describe how to compute $g$ : “look up the $n$ th program, run it on input $n$ , append 1.” That sure sounds like something you could write in JavaScript.
Our list of JavaScript programs might not be exhaustive. But surely we’ll eventually print out even the longest of JavaScript programs by iterating alphabetically?

Do you see the issue? It turns out that our $\delta$ does have a fixed point. In particular, eval("while (true) {}") is a fixed point of $\delta$ . If your program loops forever, and you add one to the result, it still loops forever.

Not only that, but all viable $\delta$ s share this flaw. To build the “change that output” program $\delta$ , you need to know whether your original program halts or not. And in general, you can’t know whether it halts or loops, so you can’t build a good $\delta$ . [1]

This constitutes a somewhat shocking proof that it’s impossible to build a function $\delta$ that takes any program and modifies it to produce a different result. In other words, all programs mapping functions to functions have a fixed point. This result is called Rogers’s Theorem.

There’s a nice immediate corollary: any language allows you either solve the halting problem or build an interpreter for itself. JavaScript allows interpreters but not halting-solvers. Primitive recursive functions, which we’ll examine next, allow halting-solvers (all P.R. functions halt), but can’t build their own interpreters. These correspond to the (1) and (3) branches of the trilemma respectively.

Example: Primitive Recursion

Next, we’ll look at primitive recursive functions. Primitive recursive functions are all the functions made of combinations five functional building blocks: $z, s, p, c$ and $\rho$ . Don’t worry if you haven’t seen them before. The upshot is that you can easily list them, numbering them sequentially:

1 => z() // zero function
2 => s() // +1 function
3 => p() // "projection" function
...
372 => c(s, z) // composition of 0 and +1 (1)
373 => c(s, s) // composition of +1 and +1 (+2)
...

We’ll choose $f(i)(j)$ as the evaluation function, which takes the $i$ th primitive recursive function and computes it on $j$ . For example, evaluating the 373rd function at the value 3:

f(373, 3) 
  => c(s, s)(3)
  => s(s(3))
  => 3 + 1 + 1
  => 5

We can easily build a function $\delta(x) = x + 1$ that has no fixed point.

So we consult our trilemma, and decide that either (1) $\delta$ is broken, (2) the list $f(i)$ isn’t complete, or (3) $g$ is a new kind of object.

$\delta$ has no fixed point, and $f(n)$ obviously enumerates all the primitive recursive functions (by definition). So it must be that $g(i) = \delta(f(i)(i))$ is a new kind of object: a non-primitive-recursive (but computable) function.

This might surprise you if you’ve only played with primitive recursive functions for a little bit: they can compute functions like plus, times, isPrime, nthPrime, ifElse, and more. But it turns out that they aren’t Turing complete, since they can’t simulate the eval function.

Diagonalization Trilemma

We’re going to prove this by asserting that one of three things must always be true. Situation: you have a list of all the objects of some type, and a “make it different” function, which takes an output value and returns a different output value in that group. We will build a new object “g”, which has the same shape but is definitely not on the list. Therefore, we live in one of three worlds:

The list is broken and does not include every object in the group. “g” is a well behaved object of the type which you somehow failed to list.
“g” is somehow of a different type from the list objects, even though it looks similar.
The “make it different” function is broken, and does not always “make it different”; for some objects it actually returns the same thing.

Trilemma Proof

We’ll make a quick stop to formalize and prove this trilemma that we’ve been using.

Diagonalization. For all sets $X$ and $Y$ with a function $f: X \times X \rightarrow Y$ and a function $\delta: Y \rightarrow Y$ with no fixed point, there exists at least one function $g: X \rightarrow Y$ such that there exists no $k$ such that $f(k) = g$ .

Proof. Let $g(x) = \delta(f(x, x))$ . For any $k$ , $g \neq f(k)$ , since $g(k) \neq f(k)(k)$ .

The important part of this proof is that it sets up our favorite trilemma:

Trilemma. For all sets $X$ and $Y$ with a function $f: X \times X \rightarrow Y$ , and a function $\delta: Y \rightarrow Y$ , at least one of the following three things must be true:

$\delta: Y \rightarrow Y$ has a fixed point.
There exists some $g: X \rightarrow Y$ which isn’t found in $f(n)$ for $n \in \mathbb{N}$ , but $g$ is obviously the right kind of object, so the list of objects $f(n)$ must not be exhaustive.
There exists some $g: X \rightarrow Y$ which isn’t found in $f(n)$ for $n \in \mathbb{N}$ , but $f(n)$ obviously lists all the possible objects, so $g$ must actually be a new and different kind of object.

New Tool: Diagonal Lemma

We’ve just seen that all programs that map programs to other programs have a fixed point. Next, we’re going to see a very similar result, but this time applied to mathematical theorems. It’s wrapped up in an opaque piece of math called the Diagonal Lemma. But once we understand the Diagonal Lemma, it opens the door to amazingly short demonstrations of Godel’s Theorem and Tarski’s Theorem.

Let’s say you’re working in some proof system, which is basically some laws (“axioms”) plus some deductive rules. Call that system $T$ .

Give me a statement $A(x)$ . Then $T$ can prove that there exists a sentence $\phi$ where $\phi$ is only true if $A(\ulcorner \phi \urcorner)$ is true, where $\ulcorner \phi \urcorner$ is the numerical representation of the bytes in $\phi$ . [2]

Mathematically, we can express the same as follows, with $T \vdash$ read “T proves:”

\forall A \, \exists \phi \quad T \vdash \phi \iff A(\ulcorner \phi \urcorner)

Suppose we’re dealing with $A(x) := x * 2 = 4$ . Then there’s a sentence $\phi$ where strong enough theories can prove that $\phi \iff \ulcorner \phi \urcorner * 2 = 4$ . In this case, we can choose $\phi$ to be the sentence $0 = 1$ . Then $\ulcorner \phi \urcorner = 3161393$ , and indeed $T$ can prove that

0 = 1 \iff 3161393 * 2 = 4

since both sides are false.

In the domain of programs, we’ve seen that any program you can name that transform programs into other programs has a fixed point. The Diagonal Lemma says something extremely similar for math: any statement you can name that maps the numerical representation of a statement to another statement has a fixed point. And as a bonus, strong enough theories can prove that.

Proof Sketch: Diagonal Lemma

Consider a sufficiently strong theory $T$ and any arbitrary formula $A$ with one free variable. Then there exists a $\psi$ so that $T$ proves that $\psi \iff A(\ulcorner \psi \urcorner)$ .

The set of formulas with one free variable are enumerable (for example, by printing all strings and then discarding any that have more than one variable symbol):

274 => "x = 1"
...
672 => "x = x"
...
1837501 => "5x = x && 4 = 5"

We’re going to define a function $f(s)(s')$ which evaluates $s$ on the byte-representation of $s'$ . For example, f( $x$ is prime)( $x$ * $x$ = 100) evaluates to “33823693116354608 is prime” (the string x*x=100 in UTF-8 has the decimal representation 33823693116354608). Notice that $f$ maps two formulas with a free variable into a single sentence with no free variables.

Let $\delta(\phi) = A(\ulcorner \phi \urcorner)$ .

Now for the trilemma. All the formulas in our system definitely have a written representation, and all written representations with one free variable definitely count as formulas in our system.

So it must be that $\delta$ has a fixed point. And that fixed point is exactly what we’re after: some $\psi$ so that $\psi$ and $A(\ulcorner \psi \urcorner)$ are logically equivalent. (Notice that we’ve shifted from “output equivalence” to “logical equivalence.”)

So we’ve seen how we can prove this. We’re going to omit the proof that $T$ itself can also prove this, since it’s a bit too hairy for our taste. If you look into it, it basically involves proving that $T$ can construct and prove this same argument inside itself.

Proof Sketch: Gödel’s Incompleteness Theorem

We’ve finally arrived at Gödel’s Incompleteness Theorem. Here’s a quick reminder of the statement.

Gödel’s First Incompleteness Theorem. Any sufficiently powerful theory must be either inconsistent or incomplete.

The core idea is to construct a sentence $S$ like “this sentence is unprovable.” Then if $S$ is provable, it’s wrong (the surrounding theory can prove things that are contradictory; this is called being inconsistent). If $T$ can’t prove $S$ , then $S$ is true (the surrounding theory has truths that it can’t prove; this is called being incomplete).

That’s a slick argument, but all of the complexity hides inside questions like, are sentences like “this sentence is unprovable” even allowed? How do you express something like “this sentence” formally? To our happy surprise, the Diagonal Lemma allows us to build precisely these sentences.

Let $A(\ulcorner x \urcorner) =$ $\forall p$ , the string $p$ is not a proof of $x$ . In other words, $A(\ulcorner x \urcorner)$ = $x$ is unprovable. Then by the Diagonal Lemma, we can use $T$ to prove that there exists an $x$ with $x \iff A(\ulcorner x \urcorner)$ , or $x \iff$ ( $x$ is unprovable). And ” $x \iff$ ( $x$ is unprovable)” is exactly “this sentence is unprovable.” So we can build these sentences!

Proof Sketch: Tarski’s Theorem

Here’s a quick reminder of what Tarski’s Theorem says.

Tarski’s Theorem. Any sufficiently powerful theory cannot define its internal notion of truth.

To prove it, we take $A(\ulcorner x \urcorner)$ = ( $x$ is not true). The Diagonal Lemma guarantees that we can prove the existence of an $x$ such that $x \iff$ ( $x$ is not true). But ” $x$ is true $\iff$ $x$ is not true” is obviously contradictory.

The only thing we can point to is our original $A$ , which must have been flawed. It must be that theories cannot define predicates like $x$ is true or $x$ is not true.

Conclusion

I hope you’re convinced that diagonalization is an extremely powerful tool.

From our original diagonalization trilemma, we were able to immediately show that primitive recursive functions aren’t complete, that there are more reals than naturals, and that program-transforming-programs must all have fixed points.

Then we used the same trilemma to construct a new tool, the Diagonal Lemma, and closed out the post by using it to construct the strange sentences required for Gödel and Tarski’s famous theorems.

We’ve kept the math relatively light, leaning on intuition and code where necessary. If you want a more rigorous treatment of this topic, I highly recommend continuing with Yanofsky [0], upon which this post is based.

Notes

[0] Note that we’ll use the curried $f(3)(5)$ and uncurried $f(3, 5)$ forms interchangeably in this post.

[1] A delightful corollary: if your language is weak enough that you can solve the halting problem, then it must not be able to build an interpreter for itself. That’s what happens with P.R. functions, which all halt.

Appendix

Gödel Numbers

Gödel numbers are a way of converting formula into numbers. We denote the Gödel number of $x$ as $\ulcorner x \urcorner$ . You can imagine $\ulcorner x \urcorner$ just interpreting the underlying bytes of $x$ as an integer instead of a string.

const gödel = (s: string): bigint => {
  const byteArray = new TextEncoder().encode(s);
  const hexString = Array.from(byteArray)
    .map((byte) => byte.toString(16).padStart(2, "0"))
    .join("");
  return BigInt(`0x${hexString}`);
};

const unGödel = (n: bigint): string => {
  const hexString = n.toString(16);
  const byteArray = new Uint8Array(
    hexString.match(/.{1,2}/g)!.map((byte) => parseInt(byte, 16)),
  );
  return new TextDecoder().decode(byteArray);
};

console.log(gödel("x * 2 = 4")); // 2215926989203043065908n
console.log(unGödel(2215926989203043065908n)); // x * 2 = 4

Diagonalization Proof

Here’s the straight math version of the general diagonalization argument.

Proof. Let $g(x) = \delta(f(x, x))$ . For any $k$ , $g \neq f(k)$ , since $g(k) \neq f(k)(k)$ .

The Diagonalization Trilemma with Exposition

Diagonalization. Let’s say you have an “evaluation” function $f: X \times X \rightarrow Y$ . You can think of $f(3)(5)$ as selecting the 3rd object from a long list of objects, and evaluating it on 5. [0]

Let’s say you also have a $\delta: Y \rightarrow Y$ with no fixed point. You can think of $\delta$ as a “give me something different” function: it changes every $y$ into a different $y$ .

Then there exists a function $g: X \rightarrow Y$ that isn’t equal to $f(k)$ for any $k$ . In other words, there’s at least one new object $g$ that isn’t equal to $f(3)$ or $f(55)$ or $f(574)$ , or any other object $f(x)$ in the original list of objects. (Note that $f(55)$ and $g$ are both functions representing objects.)

Proof. Take $g(x)$ as $\delta(f(x)(x))$ . For any $k$ , $g \neq f(k)$ , since $g(k) \neq f(k)(k)$ . $\blacksquare$

Spelling out the proof, if you’re concerned that maybe $g = f(234)$ , you can check that at $g$ and $f(234)$ differ at index 234: $g(234) \neq f(234)(234)$ . That’s because $g(234)$ does equal $\delta(f(234)(234))$ , which is different from $f(234)(234)$ because $\delta$ is the “give me something different” function.

The most important part of this proof is that it establishes a crucial trilemma, which we’ve used repeatedly above.

Acknowledgements

This post (and the primitive recursive post) were prepared as part of my final work for the MIT class Seminar in Logic (18.504). Thanks to Henry Cohn, Lisa Kondrich, and Laker Newhouse for their extensive suggestions, feedback, and help with both this post and the primitive recursive introduction post.

References

[0] Noson S. Yanofsky (2003). A Universal Approach to Self-Referential Paradoxes, Incompleteness and Fixed Points*. Bulletin of Symbolic Logic, 9(3), 362–386.*

[1] Garrabrant, S., & Eisenstat, S. (n.d.). Diagonalization fixed point exercises. Retrieved from https://www.lesswrong.com/posts/FZkLa3GRLW97fpknG/diagonalization-fixed-point-exercises

[2] Kelly, K. T. (n.d.). Computability and Learnability.

[3] Primitive recursive function. (2023). Retrieved from https://en.wikipedia.org/wiki/Primitive_recursive_function

[4] Smith, P. (2013). An Introduction to Gödel’s Theorems. Cambridge, England: Cambridge University Press.

[5] Solomon, E. (1958). Unpacking the diagonal lemma. Retrieved from https://math.stackexchange.com/questions/127852/unpacking-the-diagonal-lemma