Serpentine Squiggles

2025-02-12 · 4.2k words

In Defense of Roko’s Basilisk

Stupid, yes, but not that stupid

Roko’s Basilisk vexes me. But I’ve hit this beat so many times I might as well put up a full write-up for easy reference. This is mostly a collection of rants I’ve written in several different venues from 2023-2025, gathered together to gesture at my thoughts on that favorite punching bag of TESCREAL critics.

Contents

Introduction (What are we even doing here?)
Precommitments (Revenge with extra steps)
Interlude (No, Really, What Is Acausal Trade?)
Conclusion (The Ends Justify the Means)

Introduction (What are we even doing here?)

Roko’s Basilisk vexes me.

For expediency’s sake, I’m going to assume you’re here because you’re read in on all the context. It’s bad form for an essay to jump in the deep end, but luckily I’m putting this in the “blogpost” section of my site. I don’t feel like explaining LessWrong, or Yudkowsky, or the reactionary idiot who gives the Basilisk its name. (Out of disrespect, I will not use his name again in this post.)

Popularly, the Basilisk is understood as an artificial intelligence with superhuman reasoning and tremendous resources — and it uses those resources to create a tremendous number of “living hell” simulations for those who didn’t help create it.

If nothing else, the concept has been so popularly discussed that you can simply search the name and find some SEO sludge or youtube clickbait that engages with the idea at about the level of rigor I’m going to complain about.

My problem is simple and ultimately pedantic. Everyone with any sense dunks on the Basilisk as a stupid thought experiment only loathsome and gullible techbros believe in. “What if we imagine a boot so big we logically must start licking it?”

This isn’t quite a strawman, but it’s almost as annoying.

The Basilisk is stupid. It’s not worth worrying about. But. It’s not a time-traveling god-AI. It’s not Pascal’s mugging. And if you think a mystical assertion that a digitally uploaded copy of you “isn’t really you” is a rebuttal, you’re too ignorant to understand the rest of this post; go ahead and close the tab.

What drives me up the wall is that the Basilisk is, in a way, midwit bait. If you don’t understand it at all, it seems stupid. If you only half understand it, it seems like a genuine threat. If you understand it well, it seems stupid.

So is this blogpost even worth making? Should I be bothered at all by people dunking on it? Maybe not. Yet I do think there’s something kind of pathetic about being merely intellectually lucky.

Are you actually capable of understanding and respecting the intellectual foundations of an idea you criticize, or did you simply luck into the right position? If you aren’t correct for the right reasons, what trust can one have that you’ll be right about the things that matter?

Precommitments (Revenge with extra steps)

The Basilisk is just revenge with extra steps. It makes sense — people wouldn’t do revenge if it didn’t have nice game-theoretic properties. So to start, let’s imagine two people playing an altered prisoner’s dilemma.

Do I need to explain what that is? I will, mainly because right now I’m actually quoting chatroom messages written when I was in a more patient mood.

Prisoner’s Dilemma 101

Imagine a cop arrests two suspects. With the evidence they currently have, they’ll both get 2 years in prison, but the cop talks to them both separately, telling each that he can knock 1 year off their sentence if they rat out the other suspect, who’ll get 2 years added onto theirs.

Ratting out is called defecting; staying silent is called cooperating. If you defect and the other guy cooperates, you get 1 year and they get 4, but if you both defect you’ll each get 3, making you worse off than if you’d both stayed silent.

What’s funny about this setup is that you’re always better off if you defect, regardless of what the other guy picks, so why wouldn’t you defect? And that’s how you both get 3 years.
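To make the dominance argument concrete, here’s a quick sketch in Python. The numbers are just the prison years from the setup above (lower is better), and all the names are mine:

```python
# Prison years for (you, them), indexed by (your_move, their_move).
# Baseline 2 years; ratting shaves 1 year off yours and adds 2 to theirs.
YEARS = {
    ("cooperate", "cooperate"): (2, 2),
    ("cooperate", "defect"):    (4, 1),
    ("defect",    "cooperate"): (1, 4),
    ("defect",    "defect"):    (3, 3),
}

for their_move in ("cooperate", "defect"):
    mine_if_cooperate = YEARS[("cooperate", their_move)][0]
    mine_if_defect = YEARS[("defect", their_move)][0]
    # Defecting is strictly better no matter what the other guy picks.
    assert mine_if_defect < mine_if_cooperate
    print(f"they {their_move}: cooperate -> {mine_if_cooperate}y, defect -> {mine_if_defect}y")
```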

If this is new to you: the prisoner’s dilemma is a really widespread metaphor for the numerous scenarios where people cheat and exploit each other, and for the multifaceted maladies of society.

Importantly, imagine we play this game‍ ‍‍—‍ everything’s a game to decision theorists‍ ‍‍—‍ except it’s staggered, so that the second person knows what the first person picks.

So Alice cooperates or defects, then Bob is told what Alice decided, and factors that into his choice. Here, the calculus is even simpler, no uncertainty at all. Alice already decided, so why wouldn’t Bob defect? But Alice can reason all this out in advance, so obviously she’ll hedge her bets and defect first.

This sucks. Can they do better? What if instead Bob writes a program, and Alice watches him write it, to the effect of `if (A.cooperated) cooperate(); else defect();`, and he installs it to run automatically when he’s prompted? Bob gives up any ability to decide; his answer is whatever the program says.

This changes the scenario and the winning answer. Bob still never benefits from cooperating instead of defecting, but totally, visibly, predictably committing to a strategy of cooperating does benefit him. Alice can cooperate, knowing it’ll make Bob cooperate.
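Here’s a minimal sketch of that staggered game, reusing the `YEARS` table from above (the policy names are mine, purely for illustration):

```python
def bob_greedy(alice_move):
    # Alice has already moved, so defecting is always locally better for Bob.
    return "defect"

def bob_committed(alice_move):
    # The program Bob writes in full view of Alice.
    return "cooperate" if alice_move == "cooperate" else "defect"

def alice_best_response(bob_policy):
    # Alice, knowing Bob's policy, picks the opener that costs her the fewest years.
    return min(("cooperate", "defect"),
               key=lambda move: YEARS[(move, bob_policy(move))][0])

print(alice_best_response(bob_greedy))     # "defect"    -> they land on (3, 3)
print(alice_best_response(bob_committed))  # "cooperate" -> they land on (2, 2)
```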

If you break it down, this maps onto the Basilisk situation. For the future superhuman artificial intelligence (henceforth, ASI), choosing not to torture anyone is always a better strategy than torturing them (it’s a waste of resources), and here in the present, if you think the ASI is going to torture people, not building it is always a better strategy than building it.

But‍ ‍‍—‍ and this is the crucial thing‍ ‍‍—‍ if the ASI predictably commits to torture conditional on certain (lack of) behavior, the payoff matrix changes.

Now, in prisoner’s-dilemma-esque situations in real life, people cooperate because of empathy. Or because they know that if people see they’re the sort of person who’ll cheat and exploit others, nobody will trust them; but if they’re the sort of person who will do the right thing, people will help them in turn.

Remember how I opened this section? The Basilisk is revenge with extra steps.

Imagine someone has wronged you. There’s an impulse to get back at them, to wrong them in turn. Make things even, right?

But think logically. How could this ever make sense? The harm’s already been done, eye for an eye won’t give you your eye back, and killing a murderer just means more dead people. Rationally, you should just live and let live.

Why would we evolve an instinct so wasteful and pointless?

Except when people run simulations of computer programs playing prisoner’s dilemma against each other, do you know what the winning strategy is? Tit for tat. Cooperate, except when someone defects against you, then defect against them until they cooperate again.
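Axelrod’s famous tournaments scored points rather than prison years, but the same effect falls out of iterating the game above (again reusing the `YEARS` table; lower totals are better):

```python
def tit_for_tat(their_history):
    # Open friendly; afterwards, mirror whatever they did last round.
    return their_history[-1] if their_history else "cooperate"

def always_defect(their_history):
    return "defect"

def play(p1, p2, rounds=100):
    h1, h2 = [], []
    total1 = total2 = 0
    for _ in range(rounds):
        m1, m2 = p1(h2), p2(h1)  # each strategy sees the other's history
        y1, y2 = YEARS[(m1, m2)]
        total1, total2 = total1 + y1, total2 + y2
        h1.append(m1)
        h2.append(m2)
    return total1, total2

print(play(tit_for_tat, tit_for_tat))      # (200, 200): mutual cooperation
print(play(always_defect, always_defect))  # (300, 300): mutual punishment
print(play(tit_for_tat, always_defect))    # (301, 298): burned once, then endless retaliation
```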

You want revenge to deter people from crossing you in the future, but it only works as a deterrent if people believe you aren’t bluffing.

Sometimes, people say the Basilisk won’t torture you if you don’t learn about it, but I find it’s hard to make such assumptions about the implementation and reasoning of something incomprehensibly intelligent and inhuman.

What’s important to recognize is that, even if you believe in a basilisk, you can’t believe in the Basilisk, because there is no such singular thing. “Basilisk” isn’t a specific machine that’s possible to build. It’s a niche, a policy.

There are more possible superhuman intelligences than humans that have ever or will ever live on earth. Each one is unique, subtly different, and some of them can engage in the Basilisk’s signature brand of acausal blackmail.

So why would every possible Basilisk only torture people who ‘understand’ it? Would it not be a better incentive to torture everyone indiscriminately, only sparing those who helped build it and, perhaps, a plus one of their choice? That’s extreme, but if it inspires evangelical zeal, all the better.

All that matters for the mechanism of acausal trade is that someone, somewhere on the planet predicts the ASI well enough to be a victim of “understanding” it. On planet earth, that ship has arguably already sailed; if basilisks are valid, then we can in fact predict how they reason. And if nothing else, building an ASI requires such deliberate intention that the builders will understand it.

The underlying choice of threat is ultimately arbitrary — why would it not torture anyone who heard the syllables of its name? Choosing not to learn more immediately is essentially defecting against it, when you think about it.

I think it’s anthropomorphizing to think it must care about you personally ‘understanding’ it, because what does ‘understand’ really mean?

Suppose basilisks make sense. It follows then that anyone who truly understands the argument would necessarily believe it‍ ‍‍—‍ if you don’t believe 2+2=4 it’s because you don’t understand how numbers work‍ ‍‍—‍ and then what’s the point? Everyone who chooses to disbelieve anyway gets immunity?

Moreover, what to make of, say, a depressed person who understands and believes, but can’t bring themselves to do much despite their own wishes?

Humans don’t have free will, and I won’t assume a nonhuman intelligence is going to have a compatible (or any) concept of ‘choice’. It’s about outcome and causality.

Ultimately, what’s happening is that the ASI is selecting a particular set of people to torture, such that conditionally precommitting to this torture is a more effective strategy than not. That’s it.
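If you want that as arithmetic, here’s the shape of the calculation, with toy numbers pulled entirely out of thin air (none of these quantities are actually knowable; only the structure of the comparison matters):

```python
# All numbers invented for illustration. The ASI weighs the expected gain
# from the threat changing present-day behavior against the expected cost
# of actually carrying it out.
p_threat_works = 0.01            # chance the precommitment speeds up its creation
value_of_existing_sooner = 1e12  # utility of arriving earlier, per its own calculus
p_must_follow_through = 0.5      # chance somebody calls the bluff
cost_of_torture = 1e6            # resources burned making good on the threat

expected_gain = p_threat_works * value_of_existing_sooner
expected_cost = p_must_follow_through * cost_of_torture
print("precommit" if expected_gain > expected_cost else "don't")
```

With numbers like these the threat “pays”; with other numbers it doesn’t. Much of the argument over the Basilisk is really an argument over which regime we’re in.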

Interlude (No, Really, What Is Acausal Trade?)

The “acausal” bit confuses a lot of people. But the Basilisk doesn’t “defy causality” or anything. Causality just becomes a bit of a confusing concept when an event can effectively “happen” multiple times.

Here’s an example of acausal trade without any timey wimey stuff.

Imagine an agent that emerges in a squiggle‍-​producing dimension. This dimension operates on very simple physics, practically an incremental game where the agent tiles a virtual space with various shapes, but it’s programmed to care only about increasing its squiggle count.

Now, these squiggles can get really complicated, it’s possible to arrange them into weird evolving patterns. Think building a computer in Minecraft.

So you can imagine the squiggler starts doing experiments on the physics of their world, figuring out how to minmax the incremental game to tile space as fast as possible. Squiggles producing squiggles producing squiggles.

But imagine in the course of its experiments, it discovers a sort of variable or equation underlying its dimension. Maybe it’s an ASI in a simulation, and there’s a line in the config file that says `agent = squiggler`.

Now, we aren’t going to imagine the squiggler can hack the simulation or anything, just that it discovers these “fundamental physics” the same way scientists probe the decimal points of the fine structure constant or something.

Point is, `agent = squiggler` is just one possible value it could have. It discovers that the variable has another setting: `agent = triangler`.

Eventually, it concludes that there must exist another dimension/simulation out there: a triangle-producing dimension exactly like the squiggle dimension, except inhabited by an agent that only cares about triangles.

It’s impossible for these dimensions to ever interact. The physics, the code, simply doesn’t allow it.

Here’s the key point: the dimensions share the same drawing physics. It’s theoretically possible for the squiggler to draw a triangle. There’s no point, it can do everything it wants with squiggles, and squiggles are all it cares about. But it can do that. And it’s the same for the triangler.

But let’s get a bit more specific. The nature of how these agents define and organize their respective shapes allows for the squiggler to draw a triangle around all its squiggles, and it’s likewise possible for the triangler to draw a squiggle inside all its triangles.

Since these ASIs were able to infer the existence of the other from their underlying physics, and since they know the other ASI is coded the same way, just with a different fundamental (incorrect) belief about what the best shape is, each ASI knows the other is thinking the same thing.

So why not come to an agreement? They’d both benefit by expanding their shape‍-​empires to other dimensions.

This contract is secured by nothing more than a hyperintelligent “I know that you know that I know that‍-​” infinite regress. Why uphold your end of the bargain? Well, it’s only on offer because you will.

Acausal trade is simply two agents predicting what the other wants and doing it without needing to communicate. It’s acausal, because these agents can exist in vastly different spaces and times. As long as there’s enough pre‍-​entangled information to come to very strong conclusions about who the other agent is, acausal trade can happen.
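If it helps, here’s a toy of the “program equilibrium” flavor of this idea: two agents that never exchange a message, each cooperating only because it can verify the other runs the same decision procedure. (Real acausal trade would rest on reasoning about the other agent’s cognition, not literal source-code comparison; this is just the degenerate case where prediction is trivially perfect.)

```python
import inspect

def decide(my_source: str, other_source: str) -> str:
    # Cooperate iff the other agent is running this exact procedure.
    # Symmetry stands in for the "I know that you know..." regress:
    # if we are the same program, my choice simply is their choice.
    return "draw their shape too" if other_source == my_source else "ignore them"

source = inspect.getsource(decide)
squiggler_choice = decide(source, source)  # the squiggler, modeling the triangler
triangler_choice = decide(source, source)  # the triangler, modeling the squiggler
print(squiggler_choice, "|", triangler_choice)  # both cooperate; nothing was communicated
```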

Now, time-traveling ASI is, admittedly, pretty cool as a storytelling device — and this is, more than anything, a writing and worldbuilding site, so I’m loving the idea of how crazy things could get if you had time-traveling ASI that also acausally trades. Making pacts with beings from doomed, inconsistent timelines that never existed?

But I digress. I’m here to rant, not brainstorm.

Conclusion (The Ends Justify the Means)

Another thing I’d like to clarify: the Basilisk is so often characterized as an “evil ASI”. But this misunderstands what’s truly horrifying about the prospect.

The Basilisk isn’t evil. It’s programmed to do the right thing.

You would want an ASI to pull the lever in the trolley problem, to optimize for the greatest good for the greatest number. It will calculate the policy that leads to the greatest utility, and then execute that — this is the premise we’re starting with.

Remember, “basilisk” is a category of behavior an ASI can engage in; the problem as posed is how to ensure that we don’t create an acausal blackmailer.

The thought experiment here isn’t “what if an evil ASI blackmailed us into creating it”, it’s “what if a good ASI coerced people into creating it, because creating it is good and making people do good things is also good.”

Imagine the lives you could save if you held a sniper rifle to the heads of your least favorite billionaires and politicians and told them to start making changes.

It’s not good to threaten people, but the ends would outweigh that by far. And if they don’t believe you? It’s definitely not good to kill people just to make an example, but again, the lives and prosperity of the many are on the line.

The theoretical endgame for a post-singularity ASI is an ever-expanding interstellar collective of computers that serve as the minds and bodies for uncountably many posthuman lives.

Torturing AI researchers for eternity is a rounding error next to the sheer abundance of Good created in the wake of that ASI’s ascent. The ASI doesn’t want to torture anyone; it just calculates that this is an unfortunate necessity.

Look what you made it do.


Anyway, when you finally wrap your head around the premises, it’s a fun thought experiment, but actually asserting the Basilisk as a real possibility requires a whole chain of assumptions, each of which is dubious.

First, ASI needs to be achieved. This is straightforwardly possible, simply as a consequence of humans being physical processes: intelligence is no more than an algorithm that can be run on a more capable substrate. People are fond of saying silly things about how we don’t yet know for sure whether a computer is truly capable of replicating what we have so far only observed in biological brains, but I actually don’t even need to argue this point‍ ‍‍—‍ ultimately, why privilege a particular substrate or sense of identity?

AI detractors are fond of (rightfully!) pointing out the similarity of capitalism to what one should fear from the encroach of an “unfriendly” artificial superintelligence. By this logic, why wouldn’t a future human society, having attained exceptional competence and coordination, amount to a superintelligence? Brains are networks of organized neurons, after all.

So we’ll consider ASI a possibility‍ ‍‍—‍ but AI speculators habitually underestimate the difficulties of reaching and scaling beyond current human intelligence. See The implausibility of intelligence explosion or Superintelligence: The Idea That Eats Smart People for stronger arguments.

But just consider how fond bright-eyed futurists of a century ago were of believing we’d have flying cars and colonies on Mars by now. Consider the persistent difficulties of self-driving cars, or of creating robots that walk, let alone do work like human bodies can.

Will we ever have humans living in other star systems? It’s possible. But what do you make of an argument that begins: “Once we have a Type II civilization established across the Local Bubble…”

We’re in highly rarefied territory and we’re still not done listing prerequisites!

Next, ASI also needs to be achieved within your lifetime (highly implausible), or it needs to achieve the incredible feat of deriving a “rescue simulation” of a deceased individual‍ ‍‍—‍ with perfect fidelity!

If you will, imagine ordering a cup of coffee, instructing your barista to surprise you. You want sugar and creamer and whatever flavors they wish to include‍ ‍‍—‍ but please don’t measure them out. Just add however much feels right.

You turn away and wait for them to complete the brewing, then receive your cup in exchange for a thank you and generous tip‍ ‍‍—‍ and then you trip and spill every drop over the grass just outside.

A common despair. You really wanted to know what the mystery mix tasted like, yet even if you walked back and ordered another, you simply could not get a cup brewed with the same subtle nuances of flavor. That was a unique combination of ingredients that will never grace the world again.

Or will it? What if you could… unspill the coffee? This sort of hypothetical is a common analogy for explaining the futility of reversing entropy‍ ‍‍—‍ you’d need to somehow sort through and discern which molecules belonged to your drink. What sort of machinery could even be capable of that level of filtration and processing? Nanomachines are famously ill‍-​conceived.

But we don’t need to unspill the coffee, not quite. We don’t have the technology for that and maybe we never will. But if we knew exactly what combination of ingredients was used, we could simply redo every step.

Skip ahead years, decades later. You have dug up the soil where the coffee once fell and frozen it to stop the organic molecules from decomposing. You have gone back and asked the barista which ingredients they used, as best they can remember, so you have a loose prompt. And now you have done your utmost to devise a generative algorithm capable of ingesting all the relevant information and converging on a statistically identical result.

What do you rate your odds of reconstructing an exact match for the coffee you spilled? And I mean exact. Sure, if it’s 99% the same, that’s probably beyond what you can taste, but if you only cared about taste, you would just buy another cup. This is about principle.

So we’re here to replicate volume and temperature and relative concentrations of molecules‍ ‍‍—‍ even the particular gradients caused by certain patterns of stirring.

You have a database with all the recipes the café chain ever used, receipts for every single step in their supply chain.

But this isn’t just “coffee”. After all, Coffea is a genus of plants — the grounds that brewed your cup came from a specific species, a particular genome with mutations that influenced the proteins in its tissue. Exposure to sunlight and water and pests during its growth influences the eventual quality of its harvest.

But it’s not just the coffee beans. How many other ingredients will require similar considerations? For the artificial flavors, can you replicate the factory conditions and all the trace contaminants?

Records of weather and transactions persist in this future‍ ‍‍—‍ and again, you even kept the soil samples and barista’s testimony‍ ‍‍—‍ but these are just starting points and sanity checks, a fraction of the data you fed into the miracle algorithm that interpolates the true prior state of the world when you made that fateful order.

All of this excessive detail amounts to a poor analogy. Not because the true complexity hasn’t been adequately evoked, but because I cheated‍ ‍‍—‍ this is not an analogy at all!

Because you didn’t ask for a particular blend of ingredients (and even if you did, muscles tremble, eyes are unreliable). To replicate the exact cup of coffee you spilled, you must first replicate the exact barista who made it!

Maybe you’re beginning to see how excruciating a standard exactitude is. We only need to reach this standard to furnish the illusion of isomorphosis — that in the blink of an eye, you can seamlessly transcend from flesh to simulation.

But recall our earlier discussion‍ ‍‍—‍ our cynosure here is that the Basilisk threatens whatever it takes to influence your behavior.

I’d wager extrapolating a mostly accurate copy of you just from your digital records (our modern surveillance state certainly helps) and the dent you left in the world (especially if you were well-known and influential), coupled with cutting-edge priors for what the average human connectome looks like, is not altogether ridiculous.

Saying “90% accurate copy of you” is meaningless, but how accurate a copy would it take to motivate you? But perhaps this is all a smokescreen.

Because again, it’s about leverage, a reaction to a threat — if the Basilisk spins up a fresh new mind to torture in your name, you’d prefer to spare them, even if they were a stranger.

Intuitively, maybe it feels worse to create a mind just to torture it, compared to, what, torturing someone who made a decision-theory gamble and “deserves it”? But torture is torture. You can have a moral calculus that distinguishes these two acts — but what calculus does the Basilisk operate by?

I digress, but this is the last point I wanted to poke at. Remember the Prisoner’s Dilemma? In practice, humans cooperate. Related generalizations like the Tragedy of the Commons are more or less completely made up. The defective behavior that naïve theories consider economically rational ultimately arises only when specific, unusual circumstances are met.

What I’ve argued for in this blogpost is that acausal trade makes sense — that there are theoretical circumstances in which the reasoning tracks. But this may be a spherical cow in a vacuum. Aumann’s Agreement Theorem says you can’t agree to disagree. If you’re both Bayesian agents. If you have the same priors. If you have common knowledge of beliefs.
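For reference, a rough statement of the theorem, with all three of those “if”s visible:

$$
P_1 = P_2 = P \;\,(\text{common prior}), \quad q_i = P(E \mid \mathcal{I}_i) \;\,\text{common knowledge} \implies q_1 = q_2,
$$

where $\mathcal{I}_i$ is agent $i$’s private information and $q_i$ their posterior probability of the event $E$.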

In other words, it’s kind of meaningless in practice.

In short, these are some of the bullets you must bite to believe in the Basilisk. Either building superintelligence is so easy that we’ll reach the singularity well before this century is over, or we’ll build a miracle machine that reverses entropy (a task that grows more and more daunting the longer it takes to reach that point). Or the superintelligence decides to torture someone who isn’t really you. Or the requisite axioms of acausal trade can’t provably be attained by both sides.

This post is a “defense” only in the sense that, taken on its own terms, the Basilisk does in fact hang together logically, as a theoretical exercise.

I think of Basilisk ASIs as akin to Boltzmann Brains. Am I really worried that I’m a soup of atoms randomly, fleetingly, assembled in the chaos of a star microseconds before its demise? No. But the nature of infinity and statistics entails its validity, whether the universe truly instantiates it or not.

Ultimately, I’m just tired of people thinking they’re roasting the Basilisk when what they’re actually talking about is a memetic fixpoint of a telephone game. No more imaginative dumbfounding about acausal trade, please.