This semester I took 18.404 ("Theory of Computation"). It's widely regarded as one of the best classes at MIT.
Prof. Sipser teaches 18.404 in the standard format, with eighty-minute lectures on Tuesdays and Thursdays. Perhaps it's because the format is standard that the rating is impressive. It's like winning a Michelin star for boiling an egg. What makes a large lecture amazing?
If there's one thing that distinguishes Sipser's teaching style, it's this: the main thrust of the lectures is clear, and he does not get bogged down by anything. Because of this, he manages to keep the interest and attention of all 200 students, instead of only the first two rows.
For example, instead of encouraging esoteric and nitpicky questions, he frequently prompts by asking whether "anyone has a question that will help others to understand" or whether "anyone who doesn't quite understand can ask a clarifying question." One time someone asked whether [some quantity] should be [that quantity]+1. He said something along the lines of "in this case, I happen to know that there's no +1, but really, what's a +1 between friends?"
Sometimes he'll prove a result in the $\implies$ direction but refer to the book for the $\impliedby$ direction. Sometimes he proves the one-dimensional case, which captures the intuition, and refers us to the two-dimensional generalization in the book. Sometimes he'll cite a result and say "this isn't too important; you can go back later and prove it to yourself if you need to, but it's not critical." 
He frequently restates important earlier results from the lecture, because he says that students sometimes space out for a few minutes. Every once in a while, he stops and says "so far we haven't done anything crazy; if you've fallen off here, we can get you back on the train pretty easily. But in 10 minutes we'll be too far gone."
So the lectures are tightly focused on the big picture, and Sipser aggressively manages questions, proof scope, and attention to proof details to achieve this focus.
In addition to the focus on the big picture, the delivery of the main ideas is meticulously planned. It's as if he takes all of the attention that usually goes into the detail of proofs and redirects it into the details of formulating the main ideas clearly. When I asked him about it, he explicitly mentioned the following:
- He tries to formulate theorem statements without negation, which can be hard to wrap your head around.
- He tries to lay out the structure of analogies on the board in a way that communicates their meaning.
- He wants [the algorithm], which is the main point of the lecture, to be on the central board so it draws attention. Expecting it to take two blackboard panes, he skips over them in the first part of the lecture.
Some of this, I'm sure, comes from practice. He's been teaching 18.404 since at least 2001. But he also mentions that teaching well is a huge time commitment; apparently he spent roughly fifteen hours preparing for each lecture during the pandemic. At eighty minutes per lecture, that's about eleven minutes of preparation for every minute spent lecturing.
So why are the lectures in 18.404 rated so well? I think it's a combination of an unwavering focus on the big picture, and close attention to detail in how to most effectively communicate the big picture. This combination allows Prof. Sipser to teach effectively to most of the audience in a huge class, who then turn around and give 18.404 stellar ratings.
A common failure mode for lecturers is to introduce a theorem that only 2% of the class understands and then spend three lectures proving it in detail. In short, 18.404 is the exact opposite of this.