More Was Possible: A Review of If Anyone Builds It, Everyone Dies

Clara Collier

Eliezer Yudkowsky and Nate Soares have written a new book. Should we take it seriously? 

I am not the most qualified person to answer this question. If Anyone Builds It, Everyone Dies was not written for me. It’s addressed to the sane and happy majority who haven’t already waded through millions of words of internecine AI safety debates. I can’t begin to guess if they’ll find it convincing. It’s true that the book is more up-to-date and accessible than the authors’ vast corpus of prior writings, not to mention marginally less condescending. Unfortunately, it is also significantly less coherent. The book is full of examples that don’t quite make sense and premises that aren’t fully explained. But its biggest weakness was described many years ago by a young blogger named Eliezer Yudkowsky: both authors are persistently unable to update their priors. 

Yudkowsky has been thinking about AI for a very long time. He founded the Singularity Institute for Artificial Intelligence in 2000 — his goal, at that point, being to build an artificial superintelligence rather than prevent it. That soon changed. Yudkowsky became more concerned with the risks of advanced AI and retooled the organization to match. SIAI was eventually renamed the Machine Intelligence Research Institute in 2013. Yudkowsky’s writings in the mid-to-late 2000s established the core components of the MIRI worldview. (Soares joined the organization in 2014, became its executive director the year after, and is now its president.)

In a nutshell: AIs are capable of becoming much faster and more efficient than human minds. A sufficiently intelligent AI will be better than humans at the project of AI research. This, in turn, will lead to a feedback loop where AIs rapidly improve their own capabilities, yielding smarter agents, which are even better at AI research, and so on, and so forth — escalating uncontrollably until it yields a single AI agent which exceeds all humans, collectively, in all mental abilities. This process is called an intelligence explosion, or, colloquially, FOOM (rhymes with “doom”). It’s probably the single most controversial premise inside the community of people who seriously worry about superintelligent AIs (which, in the interests of full disclosure, does include me). It’s also essential to the next element of the MIRI story: The resulting superintelligence will inevitably want to kill us all. This isn't because the AI will be malicious — instead, we should expect it to be incomprehensibly alien. Whatever values we try to program into it, they will be warped unrecognizably in the process of FOOM. And whatever it might end up wanting, keeping humans alive won't be the most efficient way to get it. (The memetic quote: "The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.") The default outcome of building a superintelligence is extinction. Our only hope is not to try.

This is the basic story, as expressed in listservs, white papers, and especially the group blogs Overcoming Bias and LessWrong, mostly between 2005 and 2008. It’s also roughly the first two thirds of If Anyone Builds It, Everyone Dies. 

Some things have happened since then. 

First, a lot more people got interested in AI safety. (Then a lot more people got interested in AI, period.) Many of them agreed with the basics of the MIRI worldview, but plenty of others didn’t. Today, there is a thriving field of AI safety researchers who broadly agree that if progress continues at its current rate, we’ll soon have to contend with AI systems that match or exceed human intelligence. Many of them are worried that these systems will develop the ability to form goals and execute long-term plans — something current AIs struggle with — because this is an explicit goal of every major existing AI lab. As these AIs get more capable, more autonomous, and more deeply integrated into our economies and militaries, they stand a real chance of taking over the world. Some researchers think all this could lead to human extinction. Some even think it will happen soon. But most have important disagreements with MIRI. On the whole, they’re much less convinced that there will be an uncontrollable intelligence explosion in which a single AI rapidly gains the ability to defeat all of humanity combined. Even proponents of AI 2027, itself a relatively extreme view, believe that there will be several competing AIs which will gradually become more capable and more power-seeking over a period of years, giving humans plenty of chances to catch them in the act and take precautionary measures.

The subtext of If Anyone Builds It — occasionally rising to text — is that Yudkowsky thinks that all of these people are idiots whose various research projects stand no chance of preventing our destruction and are (intentionally or unintentionally) aimed more at making the idea of a possible AI catastrophe respectable than at actually preventing it. And as you might imagine, MIRI occupies an uncomfortable place in the world of AI safety. Yudkowsky is inarguably the founder of the field; even today, it’s likely that most people chose to work in it because of him, in some way or another. At the same time, the organization he founded has become increasingly isolated. MIRI’s institutional stance is that the rest of the field is delusional because they don’t want to acknowledge that we’re obviously doomed. In the book, Yudkowsky and Soares argue that it should be illegal to own more than eight of the most powerful GPUs available in 2024 without international monitoring (frontier training runs today use tens of thousands). To more mainstream AI safety organizations, this position entails rejecting valuable work in favor of policies that can’t be implemented and would tank the global economy if they were. 

To be sure: there is still widespread agreement among AI safety advocates that an artificial superintelligence would be capable of and inclined towards world domination. (Whether this would lead to human extinction is more controversial.) This has led to a natural temptation to embrace If Anyone Builds It on the grounds that anything which raises the salience of AI risk is all to the good. Prominent bloggers Scott Alexander and Zvi Mowshowitz both recommend the book for general audiences, even as they note that they don’t agree on every detail and are much less certain about the main premise.

I take a different perspective: Details matter. They matter in the first place because these mildly different stories of AI takeover suggest substantively different research and policy agendas. Yudkowsky and Soares’ draconian chip monitoring proposals aren’t neatly separable from their arguments about the nature of artificial intelligence. They are the conclusion of a very specific set of beliefs — for example, that danger will come in the form of a single superintelligent AI, which can’t be monitored or countered by other systems, and that its ability to improve its own code will make the amount of physical compute it has access to largely irrelevant. These points are implied by the MIRI worldview, but Yudkowsky and Soares largely breeze past them. The book spends whole chapters unpacking the motivations of future superintelligent AIs, but devotes very little space to justifying its own account of how those AIs will be created. 

This is not a minor point. Take the idea of an intelligence explosion — a key plank of the MIRI story. It plays a major role in the fictionalized description of a possible AI takeover scenario that makes up chapters seven through nine (and which has some elements that sound plausible, even if it does also include the Yudkowsky classic "tiny molecular machines with the strength of diamond and corresponding mechanical advantages in their speed and resilience"). It is central to their belief that mass extinction is a nearly inevitable consequence of building AGI. This is hammered home in chapters ten and eleven, which are about why none of the currently proposed plans to make benign AI will work. We are told: "humanity only gets one shot at the real test." That is, we will have one opportunity to align our superintelligence. That's why we'll fail. It's almost impossible to succeed at a difficult technical challenge when we have no opportunity to learn from our mistakes. But this rests on another implicit claim: Currently existing AIs are so dissimilar to the thing on the other side of FOOM that any work we do now is irrelevant.

Most people working on this problem today think that AIs will get smarter, but still retain enough fundamental continuity with existing systems that we can do useful work now, while taking on an acceptably low risk of disaster. That's why they bother. Yudkowsky and Soares dismiss these (relative) optimists by stating that "these are not what engineers sound like when they respect the problem, when they know exactly what they're doing. These are what the alchemists of old sounded like when they were proclaiming their grand philosophical principles about how to turn lead into gold."1 I would argue that the disagreement here has less to do with fundamental respect for the problem than specific empirical beliefs about how AI capabilities will progress and what it will take to control them. If one believes that AI progress will be slow and continuous, or even relatively fast and continuous, it follows that we’ll have more than one shot at the goal.

Even if Yudkowsky and Soares don’t want to debate their critics — forgivable in a pop science book — one would think they’d devote some space to explaining why they think an intelligence explosion is likely to occur. Remarkably, they don’t. The concept gets two sentences in the introduction. They don't even explain why it's relevant. It is barely introduced, let alone justified or defended. And it’s certainly not obvious enough to go without saying, because advances in the neural networks which constitute current advanced AI have been continuous. The combination of steady algorithmic progress and increasing computational resources has produced years of predictable advances. Of course, this can’t rule out the possibility of a future intelligence explosion, but the decision not to explain why they think this might happen is utterly baffling, as it’s load-bearing for everything that follows.

It’s also characteristic of the book’s general attitude towards facts about contemporary AI. We’ve learned a lot since 2008. The models Yudkowsky describes in those old posts on LessWrong and Overcoming Bias were hand-coded, each one running on its own bespoke internal architecture. Like mainstream AI researchers at the time, he didn’t think deep learning had much potential, and for years he was highly skeptical of neural networks. (To his credit, he’s admitted that that was a mistake.) But If Anyone Builds It, Everyone Dies is very much about deep learning-based neural networks. The authors discuss these systems extensively — and come to the exact same conclusions they always have. The fundamental architecture, training methods, and requirements for progress of modern AI systems are all completely different from the technology Yudkowsky imagined in 2008, yet nothing about the core MIRI story has changed.

We could say — and certainly Yudkowsky and Soares would say — that this isn’t important, because the essential dynamics of superintelligence don’t depend on any particular architecture. But that just raises a different question: why does the rest of the book talk about particular architectures so much? Chapter two, for example, is all about contingent properties of present day AI systems. It focuses on the fact that AIs are grown, not crafted — that is, they emerge through opaque machine learning processes instead of being designed like traditional computer programs. This is used as evidence that we should expect AIs to have strange alien values that we can't control or predict, since the humans who “grow” AIs can’t exactly input ethics or morals by hand. This might seem broadly reasonable — except that this was also Yudkowsky’s conclusion in 2006, when he assumed that AIs would be crafted. Back then, his argument was that during takeoff, when an AI rapidly self-improves into superintelligence, it would undergo a sudden and extreme value shift. Yudkowsky and Soares still believe this argument, or at least Soares did as of 2022. But if this is true, then the techniques used to build older, dumber systems are irrelevant — the risk comes from the fundamental nature of superintelligence, not any specific architecture.

In fact, there are plenty of reasons why the fact that AIs are grown and not crafted might cut against the MIRI argument. For one: The most advanced, generally capable AI systems around today are trained on human-generated text, encoding human values and modes of thought. So far, when these AIs have acted against the interests of humans, the motives haven’t exactly been alien. If sycophantic chatbots tempt users into dependency and even psychosis, it’s for the very comprehensible reason that sycophancy increases engagement, which makes the models more profitable. As an example of AIs growing in alien shapes, the authors cite “Sydney” — the persona adopted by the Microsoft Bing chatbot that famously flew into a jealous rage after failing to break up New York Times journalist Kevin Roose’s marriage. But even if jealous ex-girlfriends were alien and incomprehensible, there’s the inconvenient fact that currently available techniques do a reasonably good job of addressing this problem. ChatGPT currently has 700 million weekly active users, and overtly hostile behavior like Sydney’s is vanishingly rare.

Yudkowsky and Soares might respond that we shouldn’t expect the techniques that worked on a relatively tiny model from 2023 to scale to more capable, autonomous future systems. I’d actually agree with them. But it is at the very least rhetorically unconvincing to base an argument for future danger on properties of present systems without ever mentioning the well-known fact that present solutions exist. Later in the book, they do say what they actually believe: AI systems in the future will be so radically different from current models that current empirical research is useless. But this puts the authors in a bind. By far the most compelling argument that extraordinarily advanced AIs might exist in the future is that pretty advanced AIs exist right now, and they’re getting more advanced all the time. One can’t write a book arguing for the danger of superintelligence without mentioning this fact. But contemporary pretty advanced AIs and their neat, predictable scaling laws aren’t a part of the MIRI story. Yudkowsky and Soares don’t believe that studying these systems can teach us anything at all.

All this creates the impression that their thesis is selectively permeable to evidence. When facts about the world support their argument, they’re included. When they don't, the authors retreat to the realm of pure reason. Chapter five, for example, is devoted to the argument that AI's alien values will necessarily lead it to kill us all: 

Making a future full of flourishing people is not the best, most efficient way to fulfill strange alien purposes. So it wouldn't happen to do that, any more than we'd happen to ensure that our dwellings always contained a prime number of stones. In a sense, that's all there is to it. We could end the chapter here. But over decades of experience, we have found that this bitter pill is often hard for people to swallow.

Did you catch that? This point is obvious. It is a priori obvious. It is so a priori obvious that the authors visibly resent being called on to defend it. But at least they do respond to counter-arguments. These are: 1) humans might be useful to AI, 2) humans might be trading partners for AI, 3) another variant of humans might be useful to it, 4) we might be good pets, and 5) the AI might not care enough to kill us. I don't want to get into whether their responses are good. In some cases, I think they are; in others, I’m less convinced. What’s more interesting to me is the fact that all of them could have been made — in fact, were made — in 2008. And that, more than anything, is the problem with this book: It is a regression. If Anyone Builds It could have been an explanation for why the MIRI worldview is still relevant nearly two decades later, in a world where we know so much more about AI. Instead, the authors spend all their time shadowboxing against opponents they’ve been bored of for decades, and fail to make their own case in the process.

Yudkowsky and Soares are capable of making a cogent, internally consistent argument for AI doom. They have done so at enormous length. I disagree with it, but I can acknowledge that it makes sense on its own terms. And that’s more than I can say for If Anyone Builds It, Everyone Dies. Like Yudkowsky and Soares, I’m a rationalist. This means that I am not very much fun at parties and in my heart of hearts care less about whether their book successfully raises awareness of AI existential risk than about whether the arguments are solid. It brings me no joy to report that they are not. If you’re new to all this and want to understand why Eliezer Yudkowsky thinks we’re all going to die, here’s my advice: just read his old blog posts. Yes, they’re long, and the style can be a turnoff, but at least you’ll be getting the real argument. It doesn’t matter that they’re almost 20 years old. The beliefs that formed them are fixed, resilient, with the strength of diamond.

  1. They’re also wrong about alchemists. Real early modern alchemy texts are painfully specific and process-oriented. There is no significant tonal difference between Isaac Newton, physicist, and Isaac Newton, alchemist: if anything, less grand theorizing and more lab notes. Sometimes these works sound florid to modern ears, but that's also true of early modern scientific writing and, for that matter, of Eliezer Yudkowsky.

Published September 2025
