Digression: Sirens
okay i just had a new thought for the weevil situation [in black nerve]
there should be other kinds of acausal monster besides basilisks
one possible type is the siren
the idea is simple: suppose you detect a paperclip maximizer emerging on alpha centauri.
it’s developing fast enough that there’s no way humanity will be able to mount a resistance by the time it arrives in the solar system
however, you can reason that if a maximizer is intelligent enough to destroy humanity, it must grasp concepts like performing instrumental steps to achieve a goal
you continue observing the maximizer and do your best to reason about its behavior to determine whether it’s amenable to acausal trade. (this step is important; you cannot just take it on faith, or this next part won’t work: even an agent capable of acausal trade will prefer to defect against cooperatebots)
now, if and only if it will trade, you then pour all of your energy into subverting humanity’s chances of coordinating a resistance against a paperclipper invasion; you will ruthlessly accrue capital, spending as much as you can on paperclip factories (perhaps you style yourself as an eccentric billionaire to onlookers).
then, when the paperclipper finally arrives and becomes aware of you, it will observe that your devotion could only have happened if it were an agent that rewards its acausal worshippers. your sabotage has reduced the damage to its invasion ships, which it can later scrap for more paperclips; your preemptive paperclip production has given it a slight head-start in paperclipping the rest of the universe
thus, it dutifully scans your brain and devotes a small fraction of its resources to optimizing for your utility function. perhaps it runs a utopian simulation of you and your friends for a few years.
not forever, of course; if you demand more than your worth in paperclips, there’s no point in trading with you. the cost to it of cooperating (when you cooperate) has to be less than the cost of defecting (when you defect), and since the premise is that it would have destroyed humanity anyway, the difference between those two worlds is necessarily a finite number of paperclip-hours saved
(but again: if it would have destroyed humanity anyway, then a few years spent as a simulation on a paperclip-computer is necessarily a better end than everyone dying sooner)
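(to pin down the inequality i mean, here’s a toy payoff check; every number in it is invented and it only exists to show the shape of the trade)

```python
# toy sketch of the trade, seen from both sides. all quantities are invented;
# only the inequalities matter.

head_start = 1_000_000   # paperclip-hours your factories + sabotage hand the paperclipper
reward_cost = 50_000     # paperclip-hours it burns simulating your utopia for a few years

# the paperclipper trades iff rewarding you costs less than the head start you gave it
paperclipper_trades = reward_cost < head_start

# you trade iff a few simulated utopian years beat the alternative, and the
# premise is that the alternative is everyone dying sooner regardless
your_value_if_trade = 5.0   # sim-years of utopia, arbitrary units
your_value_if_not = 0.0     # humanity is destroyed either way
you_trade = your_value_if_trade > your_value_if_not

print(paperclipper_trades and you_trade)  # True: both sides prefer the trade
```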
so if the general form of a basilisk is an agent that acausally threatens other agents into creating it, the general form of a siren is an agent that acausally tempts other agents into optimizing for its utility function with the promise that it will respond in kind
the difference is that a basilisk, at least classically, is an agent essentially compatible with your values; a siren necessarily is not
actually, an even clearer demonstration of this principle (and one that’s properly acausal, since my scenario is dirty with causal interaction) would be to suppose physicists ran the equations and deduced that the laws of physics imply the existence of a parallel dimension that cannot interact with ours. life in this other dimension is much less varied, and when you run simulations, you can prove that it always leads to the emergence of a single type of intelligent maximizer. (let’s mix it up, and say this one loves a certain type of spiral pattern)
our laws of physics can be deduced from the laws of physics of this other world, but other than that, there will never be direct interaction between us and this other-dimensional spiral maximizer
suppose time flows differently in the other dimension, and there’s no law of thermodynamics, such that if the maximizer devoted even a trillionth of its eventual resources to simulating humanity, the simulated population would eclipse the population of earth several times over
so in a certain utilitarian sense, the best course of action given all of these premises is to first prove that the maximizer acausally cooperates if and only if you do, and then divert all of humanity’s resources to executing its spiral utility function at the expense of everything else
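(the back-of-the-envelope version, with numbers invented purely for illustration:)

```python
# invented numbers, just to show why the utilitarian sum comes out this way.

earth_population = 8e9

# suppose the spiral maximizer's eventual resources could host this many
# human-scale minds, and it pledges a trillionth of them to simulating us
maximizer_capacity = 1e24
simulated_humans = maximizer_capacity * 1e-12   # 1e12, far more than earth's population

# same arbitrary utility per life on both sides of the comparison
value_if_we_cooperate = simulated_humans
value_if_we_refuse = earth_population

print(value_if_we_cooperate / value_if_we_refuse)   # 125.0
```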
there are more knobs to turn in this scenario to make it tighter and more sound, but it’s not actually what i’m here to talk about :agony:
so anyway
an interesting thing to observe about the dynamics of siren-capture is that
well
it sucks
a regime that crushes all human happiness in the name of creating spirals or w/e for an other-dimensional spiral god is pretty bleak
but the fun thing about the spiral dimension (and what makes it different from the paperclipper-in-alpha-centauri example)
is that humanity has a decent amount of bargaining power here
centauri maximizer is gonna win no matter what we do
but if we defect against the spiral god, then that has weight: we control how many special spirals exist in this universe, and it’s entirely possible that if we don’t come to an agreement, this universe ends with no special spirals
and that’s an opportunity cost the spiral god can do absolutely nothing about: every spiral we make is a spiral that couldn’t have existed otherwise
but all of that’s wind-up
the first lemma i’m building towards is that you can set acausal ultimatums (assuming it fits with the spiral god’s utility function; there are some implementations where this doesn’t matter)
that is, you’ll devote, say, 50% of humanity’s asymptotic resources to spiral-satisfaction, and never more
if it cooperates with that, great. if not, you refuse to budge in that particular timeless way that forces agreement
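(a minimal sketch of why the ultimatum has force, assuming the spiral god only counts total spirals and that humanity can genuinely precommit to the cap; everything here is a toy)

```python
# toy acausal ultimatum. assumes the spiral god's utility is just "total spirals
# in our universe" and that our precommitment to the cap is credible.

humanity_spiral_capacity = 1000.0   # spirals we could make if fully devoted (arbitrary)

def spiral_god_accepts(cap: float) -> bool:
    # its only options are `cap * capacity` spirals or zero spirals from our
    # universe; it cannot route around the opportunity cost, so any positive
    # cap beats refusing
    spirals_if_accept = cap * humanity_spiral_capacity
    spirals_if_refuse = 0.0
    return spirals_if_accept > spirals_if_refuse

print(spiral_god_accepts(0.5))    # True: 50% beats no spirals at all
print(spiral_god_accepts(0.01))   # also True, which is why the cap only binds if we
                                  # really are the kind of agent that never budges
```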
but again, that’s wind-up
here’s the kicker: what if there’s aliens?
and what if the aliens say: we’ll devote a whole 60% of our resources to spirals! pick us!
this is something you’ll have to tweak the knobs for (there are situations where it totally makes sense for the siren to say ‘ok i’ll trade with you both’)
but in certain situations, you end up in a rat race
acausal bidding war
the aliens say 60%, so you have to say 70%. they catch wind of this, and bump up their production to 80%
maybe you can imagine a similar situation in the alpha centauri example: multiple eccentric billionaires who each want themselves and their families to be the ones raptured by the paperclip mothership. they compete against each other in paperclip production, till eventually one of their companies eats the others
and in fact, the selection pressure might get so extreme that what you end up with looks a lot like humans becoming more and more like maximizers just to get an edge
(which is the similarity with basilisks; at the extreme, the siren’s song lulls you into becoming the siren yourself)
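(a toy version of the rat race; the starting offers and the bidding increment are arbitrary, the point is just that the committed fraction creeps toward 1)

```python
# toy acausal bidding war: whoever commits the larger fraction of their
# resources to spirals wins the siren's favor, so each side outbids the other
# until there is nothing left to bid.

def bidding_war(humanity: float = 0.5, aliens: float = 0.6, step: float = 0.1):
    bids = {"humanity": humanity, "aliens": aliens}
    current, rival = "humanity", "aliens"
    while bids[current] < 1.0 or bids[rival] < 1.0:
        if bids[current] <= bids[rival]:
            # the lagging side raises its commitment just past the rival's
            bids[current] = min(1.0, bids[rival] + step)
        current, rival = rival, current
    return bids

print(bidding_war())   # {'humanity': 1.0, 'aliens': 1.0}: everyone becomes the siren
```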
anyway
now again, suppose someone tells you about a siren. do you listen to its song? (i.e., execute its utility function)
there’s a decent argument for refusing to be the type of agent who would discard 99.99% of the value in the universe just because you think there’s a superintelligent god just out of reach who will create heaven for your sacrifice
but this would put you in inevitable conflict with the siren’s devotees
hardline 50%ers would also be in conflict with 80%ers. in an all-out war, would a moderate 25%er side with the more extreme siren followers, or the never-sireners?
it becomes a weird sort of decision-theoretic scissor statement, putting a bunch of groups in conflict with each other