Serpentine Squiggles

2025-02-12 · 4.2k words

In Defense of Roko’s Basilisk

Stupid, yes, but not that stupid

Roko’s Basilisk vexes me. But I’ve hit this beat so many times I might as well put up a full write-up for easy reference. This is mostly a collection of rants I’ve written in several different venues from 2023-2025, gathered together to gesture at my thoughts on that favorite punching bag of TESCREAL critics.

Contents

Introduction (What are we even doing here?)
Precommitments (Revenge with extra steps)
Interlude (No, Really, What Is Acausal Trade?)
Conclusion (The Ends Justify the Means)

Introduction (What are we even doing here?)

Roko’s Basilisk vexes me.

For expediency’s sake, I’m going to assume you’re here because you’re read in on all the context. It’s bad form for an essay to jump in the deep end, but luckily I’m putting this in the “blogpost” section of my site. I don’t feel like explaining LessWrong, or Yudkowsky, or the reactionary idiot who gives the Basilisk its name. (Out of disrespect, I will not use his name again in this post.)

Popularly, the Basilisk is understood as an artificial intelligence with superhuman reasoning and tremendous resources — and it uses those resources to create a tremendous number of “living hell” simulations for those who didn’t help create it.

If nothing else, the concept has been so popularly discussed that you can simply search the name and find some SEO sludge or youtube clickbait that engages with the idea at about the level of rigor I’m going to complain about.

My problem is simple and ultimately pedantic. Everyone with any sense dunks on the Basilisk as a stupid thought experiment only loathsome and gullible techbros believe in. “What if we imagine a boot so big we logically must start licking it?”

This isn’t quite a strawman, but it’s almost as annoying.

The Basilisk is stupid. It’s not worth worrying about. But. It’s not a time-traveling god-AI. It’s not Pascal’s mugging. And if you think a mystical assertion that a digitally uploaded copy of you “isn’t really you” is a rebuttal, you’re too ignorant to understand the rest of this post; go ahead and close the tab.

What drives me up the wall is that the Basilisk is, in a way, midwit bait. If you don’t understand it at all, it seems stupid. If you only half understand it, it seems like a genuine threat. If you understand it well, it seems stupid.

So is this blogpost even worth making? Should I be bothered at all by people dunking on it? Maybe not. Yet I do think there’s something kind of pathetic about being merely intellectually lucky.

Are you actually capable of understanding and respecting the intellectual foundations of an idea you criticize, or did you simply luck into the right position? If you aren’t correct for the right reasons, what trust can one have that you’ll be right about the things that matter?

Precommitments (Revenge with extra steps)

The Basilisk is just revenge with extra steps. It makes sense — people wouldn’t do revenge if it didn’t have nice game-theoretic properties. So to start, let’s imagine two people playing an altered prisoner’s dilemma.

Do I need to explain what that is? I will, mainly because right now I’m actually quoting chatroom messages written when I was in a more patient mood.

Prisoner’s Dilemma 101

Imagine a cop arrests two suspects. With the evidence they currently have, they’ll both get 2 years in prison, but the cop talks to them both separately, telling each that he can knock 1 year off their sentence if they rat out the other suspect, who’ll get 2 years added onto theirs.

Ratting out is called defecting; staying silent is called cooperating. If you defect and the other guy cooperates, you get 1 year and they get 4, but if you both defect you’ll each get 3, making you worse off than if you’d both stayed silent.

What’s funny about this setup is that you’re always better off if you defect, regardless of what the other guy picks, so why wouldn’t you defect? And that’s how you both get 3 years.
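To make the dominance argument concrete, here’s a quick sketch in Python. The numbers are just the prison years from the setup above (lower is better), and all the names are mine:

```python
# Prison years for (you, them), indexed by (your_move, their_move).
# Baseline 2 years; ratting shaves 1 year off yours and adds 2 to theirs.
YEARS = {
    ("cooperate", "cooperate"): (2, 2),
    ("cooperate", "defect"):    (4, 1),
    ("defect",    "cooperate"): (1, 4),
    ("defect",    "defect"):    (3, 3),
}

for their_move in ("cooperate", "defect"):
    mine_if_cooperate = YEARS[("cooperate", their_move)][0]
    mine_if_defect = YEARS[("defect", their_move)][0]
    # Defecting is strictly better no matter what the other guy picks.
    assert mine_if_defect < mine_if_cooperate
    print(f"they {their_move}: cooperate -> {mine_if_cooperate}y, defect -> {mine_if_defect}y")
```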

If this is new to you: the prisoner’s dilemma is a really widespread metaphor for the numerous scenarios where people cheat and exploit each other, and for the multifaceted maladies of society.

Importantly, imagine we play this game‍ ‍‍—‍ everything’s a game to decision theorists‍ ‍‍—‍ except it’s staggered, so that the second person knows what the first person picks.

So Alice cooperates or defects, then Bob is told what Alice decided, and factors that into his choice. Here, the calculus is even simpler, no uncertainty at all. Alice already decided, so why wouldn’t Bob defect? But Alice can reason all this out in advance, so obviously she’ll hedge her bets and defect first.

This sucks. Can they do better? What if instead Bob writes a program, and Alice watches him write it, to the effect of `if (A.cooperated) cooperate(); else defect();`, and he installs it to run automatically when he’s prompted? Bob gives up any ability to decide; his answer is whatever the program says.

This changes the scenario and the winning answer. Bob still never benefits from cooperating instead of defecting, but totally, visibly, predictably committing to a strategy of cooperating does benefit him. Alice can cooperate, knowing it’ll make Bob cooperate.
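Here’s a minimal sketch of that staggered game, reusing the `YEARS` table from above (the policy names are mine, purely for illustration):

```python
def bob_greedy(alice_move):
    # Alice has already moved, so defecting is always locally better for Bob.
    return "defect"

def bob_committed(alice_move):
    # The program Bob writes in full view of Alice.
    return "cooperate" if alice_move == "cooperate" else "defect"

def alice_best_response(bob_policy):
    # Alice, knowing Bob's policy, picks the opener that costs her the fewest years.
    return min(("cooperate", "defect"),
               key=lambda move: YEARS[(move, bob_policy(move))][0])

print(alice_best_response(bob_greedy))     # "defect"    -> they land on (3, 3)
print(alice_best_response(bob_committed))  # "cooperate" -> they land on (2, 2)
```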

If you break it down, this maps onto the Basilisk situation. For the future superhuman artificial intelligence (henceforth, ASI), choosing not to torture anyone is always a better strategy than torturing them (it’s a waste of resources), and here in the present, if you think the ASI is going to torture people, not building it is always a better strategy than building it.

But‍ ‍‍—‍ and this is the crucial thing‍ ‍‍—‍ if the ASI predictably commits to torture conditional on certain (lack of) behavior, the payoff matrix changes.

Now, in prisoner’s-dilemma-esque situations in real life, people cooperate because of empathy. Or because they know that if people see they’re the sort of person who’ll cheat and exploit others, nobody will trust them; but if they’re the sort of person who will do the right thing, people will help them in turn.

Remember how I opened this section? The Basilisk is revenge with extra steps.

Imagine someone has wronged you. There’s an impulse to get back at them, to wrong them in turn. Make things even, right?

But think logically. How could this ever make sense? The harm’s already been done, eye for an eye won’t give you your eye back, and killing a murderer just means more dead people. Rationally, you should just live and let live.

Why would we evolve an instinct so wasteful and pointless?

Except when people run simulations of computer programs playing prisoner’s dilemma against each other, do you know what the winning strategy is? Tit for tat. Cooperate, except when someone defects against you, then defect against them until they cooperate again.
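Axelrod’s famous tournaments scored points rather than prison years, but the same effect falls out of iterating the game above (again reusing the `YEARS` table; lower totals are better):

```python
def tit_for_tat(their_history):
    # Open friendly; afterwards, mirror whatever they did last round.
    return their_history[-1] if their_history else "cooperate"

def always_defect(their_history):
    return "defect"

def play(p1, p2, rounds=100):
    h1, h2 = [], []
    total1 = total2 = 0
    for _ in range(rounds):
        m1, m2 = p1(h2), p2(h1)  # each strategy sees the other's history
        y1, y2 = YEARS[(m1, m2)]
        total1, total2 = total1 + y1, total2 + y2
        h1.append(m1)
        h2.append(m2)
    return total1, total2

print(play(tit_for_tat, tit_for_tat))      # (200, 200): mutual cooperation
print(play(always_defect, always_defect))  # (300, 300): mutual punishment
print(play(tit_for_tat, always_defect))    # (301, 298): burned once, then endless retaliation
```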

You want revenge to deter people from crossing you in the future, but it only works as a deterrent if people believe you aren’t bluffing.

Sometimes, people say the Basilisk won’t torture you if you don’t learn about it, but I find it’s hard to make such assumptions about the implementation and reasoning of something incomprehensibly intelligent and inhuman.

What’s important to recognize is that, even if you believe in a basilisk, you can’t believe in the Basilisk, because there is no such singular thing. “Basilisk” isn’t a specific machine that’s possible to build. It’s a niche, a policy.

There are more possible superhuman intelligences than humans that have ever or will ever live on earth. Each one is unique, subtly different, and some of them can engage in the Basilisk’s signature brand of acausal blackmail.

So why would every possible Basilisk only torture people who ‘understand’ it? Would it not be a better incentive to torture everyone indiscriminately, only sparing those who helped build it and, perhaps, a plus one of their choice? That’s extreme, but if it inspires evangelical zeal, all the better.

All that matters for the mechanism of acausal trade is that someone, somewhere on the planet predicts the ASI well enough to be a victim of “understanding” it. On planet earth, that ship has arguably already sailed; if basilisks are valid, then we can in fact predict how they reason. And if nothing else, building an ASI requires such deliberate intention that the builders will understand it.

The underlying choice of threat is ultimately arbitrary — why would it not torture anyone who heard the syllables of its name? Choosing not to learn more immediately is essentially defecting against it, when you think about it.

I think it’s anthropomorphizing to think it must care about you personally ‘understanding’ it, because what does ‘understand’ really mean?

Suppose basilisks make sense. It follows then that anyone who truly understands the argument would necessarily believe it‍ ‍‍—‍ if you don’t believe 2+2=4 it’s because you don’t understand how numbers work‍ ‍‍—‍ and then what’s the point? Everyone who chooses to disbelieve anyway gets immunity?

Moreover, what to make of, say, a depressed person who understands and believes, but can’t bring themselves to do much despite their own wishes?

Humans don’t have free will, and I won’t assume a nonhuman intelligence is going to have a compatible (or any) concept of ‘choice’. It’s about outcome and causality.

Ultimately, what’s happening is that the ASI is selecting a particular set of people to torture, such that conditionally precommitting to this torture is a more effective strategy than not. That’s it.
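If you want that as arithmetic, here’s the shape of the calculation, with toy numbers pulled entirely out of thin air (none of these quantities are actually knowable; only the structure of the comparison matters):

```python
# All numbers invented for illustration. The ASI weighs the expected gain
# from the threat changing present-day behavior against the expected cost
# of actually carrying it out.
p_threat_works = 0.01            # chance the precommitment speeds up its creation
value_of_existing_sooner = 1e12  # utility of arriving earlier, per its own calculus
p_must_follow_through = 0.5      # chance somebody calls the bluff
cost_of_torture = 1e6            # resources burned making good on the threat

expected_gain = p_threat_works * value_of_existing_sooner
expected_cost = p_must_follow_through * cost_of_torture
print("precommit" if expected_gain > expected_cost else "don't")
```

With numbers like these the threat “pays”; with other numbers it doesn’t. Much of the argument over the Basilisk is really an argument over which regime we’re in.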

Interlude (No, Really, What Is Acausal Trade?)

The “acausal” bit confuses a lot of people. But the Basilisk doesn’t “defy causality” or anything. Causality just becomes a bit of a confusing concept when an event can effectively “happen” multiple times.

Here’s an example of acausal trade without any timey wimey stuff.

Imagine an agent that emerges in a squiggle‍-​producing dimension. This dimension operates on very simple physics, practically an incremental game where the agent tiles a virtual space with various shapes, but it’s programmed to care only about increasing its squiggle count.

Now, these squiggles can get really complicated, it’s possible to arrange them into weird evolving patterns. Think building a computer in Minecraft.

So you can imagine the squiggler starts doing experiments on the physics of their world, figuring out how to minmax the incremental game to tile space as fast as possible. Squiggles producing squiggles producing squiggles.

But imagine in the course of its experiments, it discovers a sort of variable or equation underlying its dimension. Maybe it’s an ASI in a simulation, and there’s a line in the config file that says `agent = squiggler`.

Now, we aren’t going to imagine the squiggler can hack the simulation or anything, just that it discovers these “fundamental physics” the same way scientists probe the decimal points of the fine structure constant or something.

Point is, `agent = squiggler` is just one possible value it could have. It discovers that the variable has another setting: `agent = triangler`.

Eventually, it concludes that there must exist another dimension/simulation out there: a triangle-producing dimension exactly like the squiggle dimension, except inhabited by an agent that only cares about triangles.

It’s impossible for these dimensions to ever interact. The physics, the code, simply doesn’t allow it.

Here’s the key point: the dimensions share the same drawing physics. It’s theoretically possible for the squiggler to draw a triangle. There’s no point, it can do everything it wants with squiggles, and squiggles are all it cares about. But it can do that. And it’s the same for the triangler.

But let’s get a bit more specific. The nature of how these agents define and organize their respective shapes allows for the squiggler to draw a triangle around all its squiggles, and it’s likewise possible for the triangler to draw a squiggle inside all its triangles.

Since these ASIs were able to infer the existence of the other from their underlying physics, and since they know the other ASI is coded the same way, just with a different fundamental (incorrect) belief about what the best shape is, each ASI knows the other is thinking the same thing.

So why not come to an agreement? They’d both benefit by expanding their shape‍-​empires to other dimensions.

This contract is secured by nothing more than a hyperintelligent “I know that you know that I know that‍-​” infinite regress. Why uphold your end of the bargain? Well, it’s only on offer because you will.

Acausal trade is simply two agents predicting what the other wants and doing it without needing to communicate. It’s acausal, because these agents can exist in vastly different spaces and times. As long as there’s enough pre‍-​entangled information to come to very strong conclusions about who the other agent is, acausal trade can happen.
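If it helps, here’s a toy of the “program equilibrium” flavor of this idea: two agents that never exchange a message, each cooperating only because it can verify the other runs the same decision procedure. (Real acausal trade would rest on reasoning about the other agent’s cognition, not literal source-code comparison; this is just the degenerate case where prediction is trivially perfect.)

```python
import inspect

def decide(my_source: str, other_source: str) -> str:
    # Cooperate iff the other agent is running this exact procedure.
    # Symmetry stands in for the "I know that you know..." regress:
    # if we are the same program, my choice simply is their choice.
    return "draw their shape too" if other_source == my_source else "ignore them"

source = inspect.getsource(decide)
squiggler_choice = decide(source, source)  # the squiggler, modeling the triangler
triangler_choice = decide(source, source)  # the triangler, modeling the squiggler
print(squiggler_choice, "|", triangler_choice)  # both cooperate; nothing was communicated
```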

Now, time-traveling ASI is, admittedly, pretty cool as a storytelling device — and this is, more than anything, a writing and worldbuilding site, so I’m loving the idea of how crazy things could get if you had time-traveling ASI that also acausally trades. Making pacts with beings from doomed, inconsistent timelines that never existed?

But I digress. I’m here to rant, not brainstorm.

Conclusion (The Ends Justify the Means)

Another thing I’d like to clarify: the Basilisk is so often characterized as an “evil ASI”. But this misunderstands what’s truly horrifying about the prospect.

The Basilisk isn’t evil. It’s programmed to do the right thing.

You would want an ASI to pull the lever in the trolley problem, to optimize for the greatest good for the greatest number. It will calculate the policy that leads to the greatest utility, and then execute that — this is the premise we’re starting with.

Remember, “basilisk” is a category of behavior an ASI can engage in; the problem as posed is how to ensure that we don’t create an acausal blackmailer.

The thought experiment here isn’t “what if an evil ASI blackmailed us into creating it”, it’s “what if a good ASI coerced people into creating it, because creating it is good and making people do good things is also good.”

Imagine the lives you could save if you held a sniper rifle to the heads of your least favorite billionaires and politicians and told them to start making changes.

It’s not good to threaten people, but the ends would outweigh that by far. And if they don’t believe you? It’s definitely not good to kill people just to make an example, but again, the lives and prosperity of the many are on the line.

The theoretical endgame for a post-singularity ASI is an ever-expanding interstellar collective of computers that serve as the minds and bodies for uncountably many posthuman lives.

Torturing AI researchers for eternity is a rounding error next to the sheer abundance of Good created in the wake of that ASI’s ascent. The ASI doesn’t want to torture anyone; it just calculates that this is an unfortunate necessity.

Look what you made it do.


Anyway, when you finally wrap your head around the premises, it’s a fun thought experiment, but actually asserting the Basilisk as a real possibility requires a whole chain of assumptions, each of which is dubious.

First, ASI needs to be achieved. This is straightforwardly possible, simply as a consequence of humans being physical processes: intelligence is no more than an algorithm that can be run on a more capable substrate. People are fond of saying silly things about how we don’t yet know for sure whether a computer is truly capable of replicating what we have so far only observed in biological brains, but I actually don’t even need to argue this point‍ ‍‍—‍ ultimately, why privilege a particular substrate or sense of identity?

AI detractors are fond of (rightfully!) pointing out the similarity of capitalism to what one should fear from the encroach of an “unfriendly” artificial superintelligence. By this logic, why wouldn’t a future human society, having attained exceptional competence and coordination, amount to a superintelligence? Brains are networks of organized neurons, after all.

So we’ll consider ASI a possibility‍ ‍‍—‍ but AI speculators habitually underestimate the difficulties of reaching and scaling beyond current human intelligence. See The implausibility of intelligence explosion or Superintelligence: The Idea That Eats Smart People for stronger arguments.

But just consider how fond bright-eyed futurists of a century ago were of believing we’d have flying cars and colonies on Mars by now. Consider the persistent difficulties of self-driving cars, or of creating robots that walk, let alone do work like human bodies can.

Will we ever have humans living in other star systems? It’s possible. But what do you make of an argument that begins: “Once we have a Type II civilization established across the Local Bubble…”

We’re in highly rarefied territory and we’re still not done listing prerequisites!

Next, ASI also needs to be achieved within your lifetime (highly implausible), or it needs to achieve the incredible feat of deriving a “rescue simulation” of a deceased individual‍ ‍‍—‍ with perfect fidelity!

If you will, imagine ordering a cup of coffee, instructing your barista to surprise you. You want sugar and creamer and whatever flavors they wish to include‍ ‍‍—‍ but please don’t measure them out. Just add however much feels right.

You turn away and wait for them to complete the brewing, then receive your cup in exchange for a thank you and generous tip‍ ‍‍—‍ and then you trip and spill every drop over the grass just outside.

A common despair. You really wanted to know what the mystery mix tasted like, yet even if you walked back and ordered another, you simply could not get a cup brewed with the same subtle nuances of flavor. That was a unique combination of ingredients that will never grace the world again.

Or will it? What if you could… unspill the coffee? This sort of hypothetical is a common analogy for explaining the futility of reversing entropy‍ ‍‍—‍ you’d need to somehow sort through and discern which molecules belonged to your drink. What sort of machinery could even be capable of that level of filtration and processing? Nanomachines are famously ill‍-​conceived.

But we don’t need to unspill the coffee, not quite. We don’t have the technology for that and maybe we never will. But if we knew exactly what combination of ingredients was used, we could simply redo every step.

Skip ahead years, decades later. You have dug up the soil where the coffee once fell and frozen it to stop the organic molecules from decomposing. You have gone back and asked the barista which ingredients they used, as best they can remember, so you have a loose prompt. And now you have done your utmost to devise a generative algorithm capable of ingesting all the relevant information and converging on a statistically identical result.

What do you rate your odds of reconstructing an exact match for the coffee you spilled? And I mean exact. Sure, if it’s 99% the same, that’s probably beyond what you can taste, but if you only cared about taste, you would just buy another cup. This is about principle.

So we’re here to replicate volume and temperature and relative concentrations of molecules‍ ‍‍—‍ even the particular gradients caused by certain patterns of stirring.

You have a database with all the recipes the café chain ever used, receipts for every single step in their supply chain.

But this isn’t just “coffee”. After all, Coffea is a genus of plants — the grounds that brewed your cup came from a specific species, a particular genome with mutations that influenced the proteins in its tissue. Exposure to sunlight and water and pests during its growth influences the eventual quality of its harvest.

But it’s not just the coffee beans. How many other ingredients will require similar considerations? For the artificial flavors, can you replicate the factory conditions and all the trace contaminants?

Records of weather and transactions persist in this future‍ ‍‍—‍ and again, you even kept the soil samples and barista’s testimony‍ ‍‍—‍ but these are just starting points and sanity checks, a fraction of the data you fed into the miracle algorithm that interpolates the true prior state of the world when you made that fateful order.

All of this excessive detail amounts to a poor analogy. Not because the true complexity hasn’t been adequately evoked, but because I cheated‍ ‍‍—‍ this is not an analogy at all!

Because you didn’t ask for a particular blend of ingredients (and even if you did, muscles tremble, eyes are unreliable). To replicate the exact cup of coffee you spilled, you must first replicate the exact barista who made it!

Maybe you’re beginning to see how excruciating a standard exactitude is. We only need to reach this standard to furnish the illusion of isomorphosis — that in the blink of an eye, you can seamlessly transcend from flesh to simulation.

But recall our earlier discussion‍ ‍‍—‍ our cynosure here is that the Basilisk threatens whatever it takes to influence your behavior.

I’d wager extrapolating a mostly accurate copy of you just from your digital records (our modern surveillance state certainly helps) and the dent you left in the world (especially if you were well-known and influential), coupled with cutting-edge priors for what the average human connectome looks like, is not altogether ridiculous.

Saying “90% accurate copy of you” is meaningless, but how accurate a copy would it take to motivate you? But perhaps this is all a smokescreen.

Because again, it’s about leverage, a reaction to a threat — if the Basilisk spins up a fresh new mind to torture in your name, you’d prefer to spare them, even if they were a stranger.

Intuitively, maybe it feels worse to create a mind just to torture it, compared to, what, torturing someone who made a decision-theory gamble and “deserves it”? But torture is torture. You can have a moral calculus that distinguishes these two acts — but what calculus does the Basilisk operate by?

I digress, but this is the last point I wanted to poke at. Remember the Prisoner’s Dilemma? In practice, humans cooperate. Related generalizations like the Tragedy of the Commons are more or less completely made up. The defective behavior that naïve theories consider economically rational ultimately arises only when specific, unusual circumstances are met.

What I’ve argued for in this blogpost is that acausal trade makes sense — that there are theoretical circumstances in which the reasoning tracks. But this may be a spherical cow in a vacuum. Aumann’s Agreement Theorem says you can’t agree to disagree. If you’re both Bayesian agents. If you have the same priors. If you have common knowledge of beliefs.
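For reference, a rough statement of the theorem, with all three of those “if”s visible:

$$
P_1 = P_2 = P \;\,(\text{common prior}), \quad q_i = P(E \mid \mathcal{I}_i) \;\,\text{common knowledge} \implies q_1 = q_2,
$$

where $\mathcal{I}_i$ is agent $i$’s private information and $q_i$ their posterior probability of the event $E$.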

In other words, it’s kind of meaningless in practice.

In short, these are some of the bullets you must bite to believe in the Basilisk. Either building superintelligence is so easy that we’ll reach the singularity well before this century is over, or we’ll build a miracle machine that reverses entropy (a task that grows more and more daunting the longer it takes to reach that point). Or the superintelligence decides to torture someone who isn’t really you. Or the requisite axioms of acausal trade can’t provably be attained by both sides.

This post is a “defense” only in the sense that, taken on its own terms, the Basilisk does in fact hang together logically, as a theoretical exercise.

I think of Basilisk ASIs as akin to Boltzmann Brains. Am I really worried that I’m a soup of atoms randomly, fleetingly, assembled in the chaos of a star microseconds before its demise? No. But the nature of infinity and statistics entails its validity, whether the universe truly instantiates it or not.

Ultimately, I’m just tired of people thinking they’re roasting the Basilisk when what they’re actually talking about is a memetic fixpoint of a telephone game. No more imaginative dumbfounding about acausal trade, please.