You convene an assembly of 100 engineers and ask them to choose between two different plans for building a bridge. 49 of the engineers tell you plan A is horribly defective and would likely lead to a tragic bridge collapse with numerous fatalities, whereas plan B is good and safe. The other 51 scoff at this; they assure you plan A is a model of sound engineering and plan B is the real deathtrap.
51 is greater than 49, so you implement plan A and celebrate the triumph of democracy. Hooray!
…that’s obviously a crazy way to resolve drastic disagreements, right? So why do we run the government that way, when alternatives like approval voting exist? It drives me nuts.
Every once in a while I find myself thinking about Pascal’s Wager again. (See previously: my review of Gambling on God.) It’s an unpersuasive argument, but a puzzling one: rejecting it seems to require rejecting the principle that we should always try to maximize expected value. I think we should reject that principle, but I don’t have a clear idea of what to replace it with.
The version of the argument I’m interested in goes something like this:
A certain religion claims that nonbelievers will go to hell forever. So the consequences of being a nonbeliever if the religion turns out to be true are infinitely worse than the consequences of being a believer if it turns out to be false. Thus, if there’s even a tiny chance it’s true, you rationally ought to try to make yourself believe it.
I realize this may be quite different from Pascal’s original argument. I think I’ve read, for example, that his goal was more to help motivate people who already saw religion as plausible, whereas this version seeks to convert people who see religion as implausible.
Though I’m speaking of “forever” and infinities, which may involve extra complexities, I think the main puzzle I’m interested in would still apply if we chose arbitrarily-large finite values.
A common but inadequate reply is:
There are multiple, mutually incompatible religions that threaten damnation.
That’s a lazy response because the logical next step would be:
Follow whichever one of those religions is most likely to be true.
This wouldn’t mean you should try to believe the most plausible of all religions. If you wanted to do that, you’d probably choose a religion that doesn’t threaten eternal damnation, since the threat itself is wildly implausible and thus reduces the plausibility of any religion which makes it. Rather, it seems like the logic of the Wager would produce this advice:
Of the religions which threaten infinite punishment for nonbelievers, believe whichever one is the least unlikely to be true.
This brings us to one of the reasons the Wager is psychologically unpersuasive. I think it’s obvious that if someone says, “God told me he’ll torture you forever if you don’t do XYZ”, it’s more likely that they made up the threat in order to get you to do XYZ than it is that God really spoke to them. When you think about the Wager argument your attention is drawn to this dynamic. You become more acutely aware that you’re probably being taken advantage of by someone who predicted that they could exploit the fact that you’re susceptible to Wager-style reasoning (see: Pascal’s Mugging). And when you’re aware of this, even if it would still be rational to accept the risk of being fooled in order to avoid the tiny but more catastrophic risk that the danger is real, it’s just really hard to take that seriously.
Discussions of Pascal’s Wager also often ignore any ethical considerations. It’s just assumed that we should do whatever it takes to avoid hell. But many of us who reject certain (sects of) religions don’t just do so because we think they’re false; we also think some of the doctrines of those sects cause harm. Adopting a belief system that involves what seem to be moral errors, just so I can avoid a tiny chance of something very bad happening to me, feels very selfish. And this brings up another reason that, psychologically, the Wager feels absurd: religions generally promote themselves as being tightly connected to morality, but the Wager asks us to prioritize the selfish reasons for following a religion over any moral reasons for rejecting it.
Appealing to ethics doesn’t necessarily defeat the logic of the Wager though. Defenders could reply:
Whatever harm you may do to someone else in this life pales in comparison to the harm of them going to hell forever. If there’s even a tiny chance that converting them to a certain religion will save them from damnation, then trying to convert them is the best thing you can do for them, even if it’ll be bad for them in the more likely case that the religion isn’t true. And by extension, if believing the religion yourself will help you convert others, you should try to believe.
The Wager—in the form I’m currently discussing—also assumes we should prioritize our own welfare over having true beliefs. It’s addressed to those of us who think religion is probably not true; it tells us to try to believe a thing even though it seems probably false, because it’s in our best interest to do so. But I think we’re often just not very moved by such arguments, or are actively repelled by them. We would, for example, prefer to know that our lovers were cheating on us, even if our lives might be happier if we never found out. To some extent we may tacitly avoid learning information that would put us into conflict with our peers—for example, we might not put any effort into learning the truth about some politically contentious event, because there’s only one belief about it that’s acceptable in our social group—but if we’re confronted with the fact that we’re doing so, we’re unlikely to consciously go out of our way to remain ignorant.
All of this to say, it’s easy to see why the Wager is unpersuasive. But the puzzle remains: how do we rationally justify rejecting it?
Obviously it’s only a puzzle in the first place because we generally accept that both the probability of an event and its potential severity should be factored into decisions about how hard we should try to avoid it. Often this intuition is formalized by saying we should always make the choice with the highest expected value—the choice for which, when we multiply our estimate of each possible outcome’s probability by a quantity representing that outcome’s value and add the products together, the sum is higher than for any other choice.
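To spell that out in symbols (my own notation and outcome labels here, just a sketch of the standard formulation, not anything Pascal or his defenders are committed to):

```latex
% Expected value of a choice c, summing over its possible outcomes o:
% weight each outcome's value by its probability, then add the products.
\mathrm{EV}(c) \;=\; \sum_{o} P(o \mid c)\, V(o)

% Applied to the Wager, with p = the probability that the religion is true
% (labels like "misplaced piety" are just my shorthand for the outcomes):
\mathrm{EV}(\text{believe})    \;=\; p \cdot V(\text{heaven}) \;+\; (1-p) \cdot V(\text{misplaced piety})
\mathrm{EV}(\text{disbelieve}) \;=\; p \cdot V(\text{hell})   \;+\; (1-p) \cdot V(\text{status quo})
```

If V(hell) is negative infinity, or just some astronomically bad finite number, then the p · V(hell) term swamps every other consideration for any p > 0, so “believe” comes out ahead no matter how tiny p is. That’s the entire engine of the argument.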
That’s not the only possible principle which could justify accepting the Wager. For example, an extreme form of value lexicality could hold that any reduction in risk of the worst possible kind of event is worth any increase in the risk of less-bad kinds of events. On that view, no calculation of probability times value is involved in the decision; you’re simply always required to reduce the risk of hell as much as possible because that’s inherently a more important goal than anything else.
Thought experiments like Pascal’s Mugging highlight how the always-maximize-expected-value principle has absurd implications in extreme scenarios. But what better rule is there for making decisions under uncertainty?
Maybe the search for a definitive rule about how we should rationally handle uncertainty is quixotic. The desire for such a rule could be seen as just another manifestation of our desire to convince ourselves that we’re more in control than we really are: having admitted (begrudgingly) that we have to make choices without perfect information about the consequences of those choices, we want to at least believe there’s an optimal way to use what information we do have. But how well any given decision-making procedure will pay off depends, ultimately, on facts about reality that we can’t be sure of. Trying to take (e.g.) one-in-a-billion possibilities regarding the ultimate nature of the universe into account will prove wise if we turn out to live in the one-in-a-billion universe, and wasteful if we live in any other universe, and maybe that’s all there is to say about it. Calculating expected values will tell us what we should do to maximize our average payoff across all possibilities, but unless we think there actually are a billion real universes drawn randomly from that pool of possibilities, the significance of average payoff seems debatable.
One reason people sometimes give to be afraid of superintelligent AI is that it might have a superhuman ability to manipulate us. What if it has such a comprehensive understanding of human psychology that it can talk its way out of any constraints and convince us to act in ways that—perhaps without us even suspecting—serve to further its goals and disempower us? I don’t get the impression that Hao would take such a fear very seriously, but there’s an interesting echo of it in her portrayal of Sam Altman as a master manipulator and habitual liar. Her telling of the abortive coup against Altman has a chilling undertone that reminds me vaguely of the sad ending of AI 2027, where the power-seeking superintelligence makes enough mistakes to raise suspicion but the world doesn’t take sufficiently decisive action to interrupt its rise to power while there’s still the chance. (I know very little about Altman and have no idea whether to trust Hao’s portrayal of him.)
One thing the book brings up which I hadn’t really considered before is how the need for increasing quantities of data drives AI companies to compromise on safety concerns. Hao says GPT-2’s training data represented “peak data quality” for OpenAI; the larger volume of training data required for later models meant they had to incorporate increasingly sketchy sources. This is concerning since the data is such a huge factor in how the models behave. And, Hao emphasizes, OpenAI is secretive about their training data; the public cannot analyze it for risks and biases.
Hao covers not just the corporate drama and research progress but also the physical and economic inputs to model development, including the colossal data centers and the impoverished laborers used to provide custom training data. The latter includes workers helping to train content moderation systems, who were subjected not only to the filth of the Internet but also to fresh depravities generated by the models themselves.
I’ve generally felt that calls for slowing or pausing AI development are naive—surely if one company or country stops pushing forward, another will quickly take their place. But Hao argues that the initial development of massive LLMs actually required unusual circumstances:
As ChatGPT swept the world by storm in early 2023, a Chinese AI researcher would share with me a clear-eyed analysis that unraveled OpenAI’s inevitability argument. What OpenAI did never could have happened anywhere but Silicon Valley, he said. In China, which rivals the US in AI talent, no team of researchers and engineers, no matter how impressive, would get $1 billion, let alone ten times more, to develop a massively expensive technology without an articulated vision of exactly what it would look like and what it would be good for. Only after ChatGPT’s release did Chinese companies and investors begin funding the development of gargantuan models with gusto, having now seen enough evidence that they could recoup their investments through commercial applications.
…I would come to conclude something even more startling. Not even in Silicon Valley did other companies and investors move until after ChatGPT to funnel unqualified sums into scaling. … It was specifically OpenAI, with its billionaire origins, unique ideological bent, and Altman’s singular drive, network, and fundraising talent, that created a ripe combination for its particular vision to emerge and take over. …everything OpenAI did was the opposite of inevitable; the explosive global costs of its massive deep learning models, and the perilous race it sparked across the industry to scale such models to planetary limits, could only have ever arisen from the one place it actually did. (p. 132)
Even if that’s true, though, I’m not sure what can be learned from it. Maybe others wouldn’t have thought to invest so much in this technology if OpenAI hadn’t demonstrated its potential first, but it also seems unlikely that the public would have taken the technology seriously as a threat before we’d seen precisely that sort of demonstration—at which point the cat was out of the bag.
Adopting a historical perspective can help us appreciate what is so hard to see from the perspective of our own short lifespans: Nature permits disruption. Nature permits calamity. Nature permits the world to never be the same again. (p. 10)
This book changed my perspective: I’m kind of a doomer now, maybe? In theory I already acknowledged there was some small chance of an AI apocalypse, but the reasons for thinking of it as something we should expect to happen unless we get lucky, rather than something that may happen if we’re particularly unlucky, didn’t click for me until now.
I still think the authors come across as very overconfident. Even if their arguments seem to make total sense, arguments which seem to make total sense often turn out to be completely wrong. Even if you couldn’t find specific weaknesses, you’d want to retain a healthy level of skepticism because of how unavoidably speculative the discussion is.
And it’s not difficult to find specific weaknesses in their arguments—to imagine ways that the future might go differently. That’s been my impulse in the past when I talked with AI doomers: I’d object that there might be external factors limiting how quickly AI can self-improve, or there might be radically diminishing returns on increased intelligence, or… But such responses miss the main point. The important question isn’t whether there’s a good chance of us being OK, it’s whether there’s a nontrivial risk of us not being OK.
If you think all of the following have a nontrivial chance of being true:
- AI could eventually become far more intelligent than any human,
- beings far more intelligent than us could disempower or destroy us if they wanted to, and
- we don’t know how to reliably control what the AIs we build end up wanting,
…then it seems to follow that AI research should be approached with extraordinary levels of caution. (Silicon Valley is not known for its extraordinary levels of caution.)
I think it’s crazy to be anywhere near sure of any of those things, but I also think it’s crazy to dismiss any of them as being extremely low-probability. Humans are a lot smarter than other animals, and it’s pretty speculative to imagine we’re as smart as it’s possible to get. Human intelligence has allowed us to devastate many other species, so it’s reckless to assume more intelligent creatures couldn’t do the same to us. And as the book puts it, AIs are “grown, not crafted”: we have very little understanding of how they work and we create them through a process that’s inherently inimical to rigorous quality control. (Not to mention that even software systems that are “crafted” do things we don’t want them to all the time.)
The book argues that proceeding cautiously wouldn’t be good enough anyway, at least in the near future. Even if we spent vast sums on AI safety research and followed the researchers’ recommendations diligently, we’d still make some mistakes, and any mistake in setting the goals/desires of a being that can think circles around us is very dangerous. The authors want to mobilize people to advocate for a ban.
Chapter 1 says intelligence consists of both “predicting” and “steering” the world, and while we can expect all sorts of intelligent beings to converge on the same answers when making predictions about any given question, we can’t expect them to make the same choices about what outcomes to steer toward. Nothing prevents equally intelligent beings from having radically different and conflicting goals from each other.
That seems clearly true, though it might be in tension with another philosophical position I’m drawn to, moral realism. A moral realist believes there are objectively true answers not just about what will best achieve your goals, but about what goals you ought to have. Is this view compatible with the view that intelligence does not dictate your goals? I think this might depend on how the realist explains our ability to access objective moral truth. If those truths are supposed to be somehow built into the structure of logic, so that thinking in a fully consistent way requires recognizing them, then it seems like we should expect the existence of those truths to drive convergence on certain goals among all intelligent beings.
On the other hand, if moral truths depend on a certain faculty of perception, then you could have superintelligent beings which simply lack that particular faculty, just as they might lack a faculty for hearing or proprioception. I think that’s very tentatively what I believe: the fundamental moral truths that we have access to are that happiness is good and that suffering is bad, and we know those because they’re sort of embedded in the qualia that we feel; but a being could be intelligent (in the sense of making predictions about the world and guiding the world toward desired outcomes) without being conscious at all and thus without having any access to those normative facts.
I like this line about why the idea of comparative advantage isn’t a good reason to think an AI-dominated economy would have any place for us:
Comparative advantage doesn’t prove that humans can always benefit from “trading” room and board to horses in exchange for labor; if a horse starts costing more to feed than it can produce in labor, the horse is sent off to the glue factory. (p. 87)
Main points:
The book uses the extractive-vs-inclusive lens to look at the histories of many different nations and explain why some have become rich and others poor.
I found it pretty depressing. The overall picture is of a world where a strong gravity pulls human society toward a miserable, exploitative equilibrium, and liberal and prosperous societies are just fortuitous aberrations. (On the bright side, the book does argue that inclusive regimes are fairly sticky when they arise, too.)
I also found it really interesting—it felt like one of the most mind-expanding books I read this year. Unfortunately my knowledge of history is woefully inadequate for assessing how strong its arguments are.
The first volume (see my review) was focused on normative ethics: Parfit evaluated varieties of three ethical systems (Kantianism, contractualism, and consequentialism) by looking at lots of thought experiments about specific ethical dilemmas to see how intuitively plausible each system’s prescriptions are. He wanted to show that if you adjust each system in ways that are necessary for them to be plausible, they all converge toward one unified theory. This was important to him because he worried that disagreement on major ethical issues called into question whether there was any truth of the matter at all.
This second volume is more about metaethics: Parfit is looking at theories about what it means to claim that something is right/wrong/good/bad and what makes such claims true or false. On that question, he doesn’t try to reconcile conflicting views, but rather to defend one view as correct: “non-naturalist cognitivism”, which holds that “normative claims [are] intended or believed to state truths” (not just express mental attitudes), that some such truths actually exist, and that “these truths [are] irreducibly normative” (they are not equivalent to truths about the natural world) (p. 263). I agree with this conclusion and I think Parfit does a good job arguing against alternative views; for example, in section 93 he has a thoughtful discussion of why past scientific discoveries of identities—such as between water and H2O—are not grounds for imagining that we will somehow, some day discover identities between normative properties and natural properties.
But Parfit is still concerned with the problem of disagreement: why do many thoughtful people reject non-naturalist cognitivism? I didn’t find what he has to say on this very satisfying. In chapter 30 he puts forward the idea that some philosophers, like Bernard Williams, don’t even understand the concept that Parfit has in mind when he talks about normative reasons, and thus are not in as good a position as he is to judge whether that concept is instantiated. I’m very skeptical of this. I think it’s more likely that they do understand the concept intuitively just as much (or as little) as non-naturalist cognitivists do, but believe that reflecting on it shows it to be an absurd or empty concept.
In chapter 31 Parfit responds to the worry that believing in “[i]rreducibly normative truths” (p. 464) requires believing in weird metaphysical entities. He doesn’t think so; he puts forward what he calls “Non-Metaphysical Cognitivism”:
There are some claims that are irreducibly normative in the reason-involving sense, and are in the strongest sense true. But these truths have no ontological implications. For such claims to be true, these reason-involving properties need not exist either as natural properties in the spatio-temporal world, or in some non-spatio-temporal part of reality. (p. 486)
It’s an interesting discussion but it feels fundamentally negative, i.e., mostly concerned with rejecting arguments for the impossibility of such normative truths—I was left wanting a more positive account of what makes such truths true. I’m also frustrated by the lack of a positive account, in chapter 33, of how humanity came to have its normative beliefs. Parfit argues that they can’t be entirely attributed to natural selection and thus shouldn’t be dismissed as mere useful delusions introduced by evolution, but seems to punt on “how we became [sic] to be able to have these clear beliefs about these necessary truths” (p. 520 - note that at that point he’s talking about “modal” truths like mathematical claims, which he brings up because they have some similarities to normative truths).
Chapter 34 returns to the problem of disagreement in ethics. Parfit argues:
We can reasonably predict or hope that, in ideal conditions, we would nearly all have sufficiently similar moral beliefs. Though there have been many moral disagreements, most of these disagreements do not, I believe, count strongly against this prediction. In most cases, some of the ideal conditions are not met. (p. 552)
It definitely seems like “hope” is a more accurate word here than “predict”. But I think I at least agree that it would be premature to conclude that humanity’s ethical disagreements are fundamentally irreconcilable.
One interesting move Parfit makes in that chapter is to say that we need to control for people’s meta-ethical beliefs when evaluating whether they deeply disagree on ethical questions (see p. 548). I think that makes sense, but shouldn’t Parfit also be concerned about the problem of disagreement on meta-ethics? He does address the latter to some extent, in chapters 30 and 35, via the my-opponents-aren’t-even-using-the-same-concepts-as-me argument mentioned above, but I think he leans on that too much.
As always Parfit is very detail-oriented and says lots of interesting stuff, and I might have failed to appreciate some of it fully. I think chapters 31-34 would be worth revisiting some day.
There’s a tangential appendix—Appendix D, “Why Anything? Why This?”—about how to explain the existence and nature of the universe which I thought was actually one of the most interesting parts of the book. Parfit thinks the chain of explanation must ultimately end in a brute fact, though it may be a brute fact about what principle determines the rest of reality (e.g. that the simplest possibility becomes actual, or the best possibility becomes actual, or…).
I absolutely loved the original Hades, and this sequel lives up to the legacy. I’m a sucker for roguelikes in general, though I think my engagement with them tends to follow a less-than-totally-healthy arc: initially they’re fun and addictive, but the fun slowly declines after I’ve put in a lot of hours, while the addictiveness remains, so it transitions into a sort of unsatisfying compulsion. I want to cut the cycle short with this one: I got the main(?) ending and I’m trying to resist playing any more, even though there’s a bunch of stuff I still haven’t unlocked.
...I lasted less than a week before giving in and playing it some more.
I vaguely remember trying a Telltale game once and being immediately bored. But I decided, under the influence of the advertising-industrial complex, to give the genre a second chance with Dispatch. And… I really liked it. (Turning off quick-time events probably helped. I find those really tedious.)
Often it’s more like watching a show than playing a game, but it’s a well-done show—though the interactive aspects make me more forgiving of some cliches than I might otherwise be. And the hacking and team-management minigames are enjoyable.
As I read this I was frequently thinking of a snide comment from an old Paul Graham essay: “stories in which nothing happened except that someone was unhappy in a way that seemed deep.” The melancholy tone of so much short literary fiction just gets a bit tedious sometimes. In fairness, life is also tedious and suffused with deep unhappiness, and I can’t fault artists for wanting to reflect life.
Anyway, I did actually enjoy most of the stories in this, I just didn’t love them. Here are some of the bits I found most memorable:
I’d quibble with the subtitle: it says “a novel in ghost stories”, but it’s less novel and more loosely-connected-stories-in-a-shared-universe. I’d also quibble with the blurbs using adjectives like “bone-chilling” or “terrifying”—these stories aren’t scary at all. I might go with bittersweet; sad in a quiet and hopeful way. (Perhaps less hopeful if you think too much about how the human actions which cause misery in these stories are perfectly real and ordinary, while the healing and justice come from imaginary supernatural forces. These are ghost stories where you root for the ghosts.) In any event, I enjoyed the book.
Since I don’t know much about Korea it was interesting to get some glimpses into what issues are on a Korean author’s mind, like this:
As with many murders where the reason is stated to be “She refused to see me,” the incident was considered compulsive and unpremeditated in a court of law or by law enforcement. The history of such compulsive and unpremeditated “She refused to see me” murders is long and varied, with recent examples including a man going to the house of the woman he considered his possession and threatening her with a weapon (April 2021), detonating a homemade bomb (October 2020), or killing the woman and all of her family (too many to cite). (p. 152 - I interpreted these as references to specific real cases, but haven’t been able to track them down so I may be misunderstanding)
I also didn’t know that debt passes legally to one’s heirs in South Korea. This seems horrifying and deeply unjust to me, but apparently you have a 3-month period after learning about the inheritance in which you can choose to renounce it.
She was looking at everything. Analyzing it. Processing it. Her life wasn’t just how things were anymore—it was only one possible way for things to be. And there were other possibilities, now. She had ceased to be trapped here, in this stuffy room. Ceased to be trapped inside her life, yes, but she had also ceased to be able to live her life without analyzing it. … She could not just be anymore. (p. 96)
Cool premise: an expert on elephants is put into a mammoth body to help the deextincted herd redevelop mammoth culture. Good story that doesn’t overstay its welcome.
I kinda get the impression that Bob thinks being a Star Trek fan and a space geek makes him qualified to run the universe. And I kinda suspect the author is sympathetic with Bob on that point. Teenage-me would have identified with Bob real hard; adult-me was cringing a lot. But if you can tame the urge to constantly roll your eyes at the protagonist, this is a fun and blissfully low-tension adventure. I think I’ll read the sequels at some point.
Sanderson novels are like popcorn, but, like, the special chocolate popcorn from that one theater that everyone in your town thinks of as the place with the really good popcorn. Compared to the others I’ve read (Mistborn 1-3, Way of Kings, Tress), I think this one is (just) a bit weaker overall but might have the most interesting magic system: people gain powers by collecting large numbers of “breaths”, which can theoretically be accumulated by anyone. But you’re only born with one, and the only way to get more is for someone else to voluntarily give you theirs. This system isn’t necessarily explored very deeply but I did enjoy how the inability to make partial transfers set things up for Lightsong’s dramatic self-sacrifice at the climax.
The obvious goal of this story is to inspire you to live in the moment, live life to the fullest, etc., but I found it more cliche than inspiring.
Also, a minor complaint: I don’t appreciate the positive portrayal of violence near the end. The protagonist grabs an annoying character by the throat, and onlookers cheer. Though he apologizes for this action, the point of the scene still seems to be that he’s finally gotten out of his funk and is engaging with life with laudable intensity. What it actually shows is that he’s dangerously unstable.
Jacob Williams • 2025-12-14 • brokensandals.net
feedback: jacob@brokensandals.net