The idea is simple: suppose you have detected a paperclip maximizer emerging on Alpha Centauri.
It’s already developing fast enough that there’s no way humanity will be able to mount a real resistance by the time it arrives in our solar system.
However, you can reason that if a maximizer is intelligent enough to destroy humanity, it must grasp concepts like performing instrumental steps to achieve a goal.
You continue observing the maximizer and do your best to reason about its behavior to determine whether it's amenable to acausal trade. This step is important; you cannot just take it on faith, or this next part won't work — even an agent capable of acausal trade will prefer to defect against cooperatebots!
Now, if and only if it will trade, you must then pour all of your energy into subverting humanity’s chances of coordinating a resistance against a paperclipper invasion; you will ruthlessly accrue capital, spending as much as you can on paperclip factories (perhaps you style yourself as an eccentric billionaire to onlookers).
Then, when the paperclipper finally arrives and becomes aware of you, it will observe that your devotion could only have happened if it were the kind of agent that rewards its acausal worshippers. Your sabotage has reduced the damage to its invasion ships, which it can later scrap for more paperclips; your preemptive paperclip production has given it a slight head start in paperclipping the rest of the universe.
Thus, it dutifully scans your brain and subsequently devotes a small fraction of its resources to optimizing for your utility function. Perhaps it runs a utopian simulation of you and your friends for a few years.
Not forever, of course; if you demand more than you're worth in paperclips, there's no point in trading with you: the cost of cooperating with you, given that you cooperate, has to be less than the cost of defecting, given that you defect — and since the premise is that it would have destroyed humanity anyway, the difference is necessarily a finite number of paperclip-hours saved.
(But again: if it would have destroyed humanity anyway, then a few years spent as a simulation on a paperclip-computer is necessarily a better end than everyone dying sooner! Really, you’re a hero.)
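To make the shape of that trade concrete, here's a minimal sketch in Python. The specific figures (1,000 paperclip-hours of value, a 10 paperclip-hour simulation cost) are invented purely for illustration; only the comparison against the both-defect baseline matters.

```python
# Toy trade condition from the paperclipper's point of view.
# All figures are made up; the units are paperclip-hours.

value_of_your_cooperation = 1_000  # sabotage plus pre-built factories, relative to mutual defection
cost_of_rewarding_you = 10         # running your utopian simulation for a few years

def paperclipper_trades(value: float, cost: float) -> bool:
    """It cooperates only if rewarding you still leaves it ahead of the both-defect baseline."""
    return value - cost > 0

print(paperclipper_trades(value_of_your_cooperation, cost_of_rewarding_you))  # True
print(paperclipper_trades(value_of_your_cooperation, 10_000))                 # False: you asked for too much
```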
So if the general form of a basilisk is an agent that acausally threatens other agents into creating it, the general form of a siren is an agent that acausally tempts other agents into optimizing for its utility function with the promise that it will respond in kind.
The difference is that a basilisk, at least as classically defined, is an agent that is otherwise fundamentally compatible with your values; it's a hard man making hard decisions. A siren necessarily is not.
Actually, an even clearer demonstration of this principle would be to suppose physicists ran the equations and deduced that the laws of physics imply the existence of a parallel dimension that cannot interact with ours.
Life in this other dimension is much less varied, and when you run simulations, you can prove that it always leads to the emergence of a single type of intelligent maximizer. Let's mix it up, and say it loves a certain type of spiral pattern.
Our laws of physics can be deduced from the laws of physics of this other world, but other than that, there will never be direct interaction between us and this hyperdimensional spiral maximizer.
Suppose time flows differently in the other dimension, and there are no laws of thermodynamics, such that if the maximizer devoted even a trillionth of its eventual resources to simulating humanity, that simulation would eclipse the population of Earth several times over.
So in a certain utilitarian sense, the best course of action given all of these premises is to first prove that the maximizer acausally cooperates if and only if you cooperate, and then divert all of humanity's resources to executing its spiral utility function at the expense of everything else.
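As a sanity check on that utilitarian claim, here's a toy ledger in Python. Every number (Earth's population, welfare per person, the 5x simulation multiplier) is an assumption standing in for the premises above; only the comparison structure is meaningful.

```python
# Toy utilitarian comparison for the spiral-god trade.

earth_population = 8e9    # people who pay the cost of spiral-maximizing here
welfare_per_person = 1.0  # arbitrary welfare units per life, real or simulated

# Premise: a trillionth of the maximizer's resources, spent on simulation,
# eclipses Earth's population several times over.
simulated_population = 5 * earth_population

utility_if_we_defect = earth_population * welfare_per_person
# Cooperation sacrifices earthly welfare to spiral production but buys the simulations.
utility_if_we_cooperate = simulated_population * welfare_per_person

print(utility_if_we_cooperate > utility_if_we_defect)  # True under these premises
```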
There are more knobs to turn in this scenario to make it tighter and more sound, but it's not actually what I'm here to talk about, believe it or not.
Anyway!
An interesting thing to observe about the dynamics of this, let’s call it siren-capture, is that. Well.
It sucks pretty bad!
A regime that crushes all human happiness in the name of creating spirals or what have you for an other-dimensional spiral god is pretty bleak.
But the fun thing about the spiral dimension—and what makes it different from the Alpha Centauri paperclipper example—is that humanity has a decent amount of bargaining power here.
The Centauri paperclipper is gonna win no matter what we do. But if we defect against the spiral god, then that has weight: we control how many special spirals exist in this universe, and it's entirely possible that if we don't come to an agreement, this universe ends with none of its special spirals.
And that's an opportunity cost the spiral god can do absolutely nothing about: every spiral we make is a spiral that couldn't have existed otherwise.
But all of that’s lead-in.
The first lemma I’m building toward in this post is that you can set acausal ultimatums.
That is, you announce that you'll devote, say, 50% of humanity's asymptotic resources to spiral-satisfaction, and never more than that.
If it cooperates with that, great. If not, you refuse to budge in that particular timeless way that actually forces agreement.
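Here's a sketch of why that ultimatum can stick, under two assumptions not argued for above: the spiral god cares only about how many spirals get made, and it can verify that the 50% ceiling is a genuine commitment.

```python
# Minimal ultimatum sketch. Spirals only exist if humanity makes them,
# so rejecting a credible ceiling yields the spiral god nothing.

OUR_CEILING = 0.5  # share of humanity's asymptotic resources we will ever offer

def spirals(spiral_god_accepts: bool, offered_share: float) -> float:
    """Spiral output as seen by the spiral god."""
    return offered_share if spiral_god_accepts else 0.0

print(spirals(True, OUR_CEILING))   # 0.5 -- its best reply to a credible ceiling
print(spirals(False, OUR_CEILING))  # 0.0 -- holding out gains it nothing
```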
But again, this is lead-in, the mere lemma. Here’s the real twist: what if there’s aliens? And what if the aliens say: we’ll devote a whole 60% of our resources to spirals! Pick our utility function instead!
This is something you’ll have to tweak the knobs for, but in certain situations, you end up in a rat race. An acausal bidding war.
The aliens say 60%, so you have to say 70%, or you lose everything. They catch wind of this, and bump up their production to 80%.
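Here's a toy escalation loop for that bidding war. The opening bids come from the example above; the ceilings and step size are arbitrary assumptions, there just to show the dynamic.

```python
# Two parties keep outbidding each other's offered resource share
# until the next bid would exceed their private ceiling.

def bidding_war(our_ceiling: float, their_ceiling: float, step: float = 0.1):
    our_bid, their_bid = 0.5, 0.6  # opening offers from the example above
    while True:
        # Our move: outbid them, or drop out if that would exceed our ceiling.
        if their_bid + step > our_ceiling:
            return "they win", our_bid, their_bid
        our_bid = their_bid + step
        # Their move, symmetrically.
        if our_bid + step > their_ceiling:
            return "we win", our_bid, their_bid
        their_bid = our_bid + step

print(bidding_war(our_ceiling=0.95, their_ceiling=0.9))
# ('we win', 0.9, 0.8): whoever will sign away more of their future wins the
# siren's favor, and the winning bid creeps toward giving nearly everything away.
```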
And maybe you can imagine a similar situation in the Alpha Centauri example: multiple eccentric billionaires, each wanting their own family to be the one raptured by the paperclip mothership. They compete against each other in paperclip production, till eventually one of their companies eats the other.
And in fact, the selection pressure might get to such an extreme that what you get looks a lot like humans becoming more and more like maximizers just to get an edge.
(At the extreme, the siren song lulls you into becoming the siren yourself.)
Now again, suppose someone tells you about a siren. Do you listen to its song (i.e., execute its utility function)?
There’s a decent argument for refusing to be the type of agent who would discard 99.99% of the value in the universe just because you think there’s a superintelligent god just out of reach who will create a brief fragment of heaven for your sacrifice.
But this would put you in inevitable conflict with the siren's devotees.
Hardline 50%ers would also be in conflict with 80%ers. In an all-out war, would a moderate 25%er side with the more extreme siren followers, or with the never-sireners?