Serpentine Squiggles

2026-06-14 · 4.2k words

Nuances of Newcomb’s Problem

we have logical decision theory at home

Introduction

First, let me reintroduce Newcomb’s Problem.

For this thought experiment, imagine there is a predictor capable of observing human behavior and correctly deducing how one will behave in the future. Maybe this predictor has psychic powers, or is really good at reading body language. I often see it described as a superintelligent AI system. More on that later.

This predictor already knows enough about you and has already predicted what answer you’ll give to the following problem. Having made its judgment, it places down two cardboard boxes. You get to choose which ones to open, and you keep whatever is inside.

The box on the right always contains a thousand dollars. The box on the left contains a million dollars, but only if it predicted you’d choose only that box.

Once it explains the situation, it walks away. It can no longer affect what’s inside the boxes; the choice is yours.

All that matters is whether you open the one box on the left. (If you open the right box, you might as well open the left.) Thus, the two possible answers you can give are “one box” or “two box.”

This is a very contentious decision problem. Philosophers couldn’t decide on the right answer (and still can’t). It’s been decades, and a lot of different theories have been put forth. The mainstream falls into roughly two camps:

One camp argues: the predictor has already left, so the left box either has the money or doesn’t. If it has the money, you’d be better off picking both. If it doesn’t have the money, you’d be better off picking both. Therefore one should pick both.
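For anyone who wants that argument spelled out, here is a minimal sketch of the payoff table it leans on. The dollar amounts are the ones from the setup above; the names `PAYOFF` and `dominates` are just made up for illustration.

```python
# Payoffs in dollars, indexed by (state of the left box, what you open).
PAYOFF = {
    ("full", "one-box"): 1_000_000,
    ("full", "two-box"): 1_001_000,
    ("empty", "one-box"): 0,
    ("empty", "two-box"): 1_000,
}


def dominates(a: str, b: str) -> bool:
    """True if choice `a` pays strictly more than `b` in every box state."""
    return all(PAYOFF[(state, a)] > PAYOFF[(state, b)] for state in ("full", "empty"))


print(dominates("two-box", "one-box"))  # True
print(dominates("one-box", "two-box"))  # False
```

Note what the table quietly assumes: the state of the left box is held fixed while you compare the two choices. That assumption is exactly what the rest of this post pokes at.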

Unfortunately, this sort of person only gets a thousand dollars, by premise.

Which sucks, but what can you do? It isn’t your fault for being rational! We have created a question that punishes reasoning correctly, and reasoning correctly gives the wrong answer, surprised pikachu face. Perfect predictors don’t actually exist in real life, who cares.

The other camp argues: everyone who picks one box gets a million dollars, and everyone who picks two boxes gets a thousand dollars. Picking one box is evidence of a better outcome, so I pick one box.

And it has a million dollars!

The first camp sees this, and says “you idiot! if you had picked both boxes, you’d be a thousand dollars richer right now!”

And they reply: “If I had picked both boxes, I wouldn’t have the million,” and they proceed to laugh all the way to the bank.

But it gets weirder than this. Consider a modified scenario: instead of cardboard, the boxes are made out of transparent plastic. You can plainly see whether the other box has a million or not.

This doesn’t change the first camp’s answer, though they’ll probably not bother opening the empty plastic bin (which is all they’ll ever see.)

But I’m pretty sure the second camp will continue to only open one box, leaving the thousand dollars sitting there (no one ever got both) but I’m not actually 100% sure. TBH, I don’t think about the second camp often.

But why not? You might think those girls are on to something, if they can get the right answer here.

But consider a modified scenario. The predictor is in fact a fraud. Sure, they’re right most of the time, but after thousands of experiments, a researcher figured out the trick being used to predict people. You see, the predictor first collects a stray hair and sequences your DNA. It turns out there’s a strong correlation between certain genes and philosophical inclinations, so this procedure is pretty effective at predicting people’s answers.

Of course, this explanation is all well and good, but you don’t know if you have the two‍-​boxing gene or not. How would knowing the mechanism change your decision?

The first camp continues to pick two boxes, of course. Whether the second camp picks one depends on how tight the correlation is. With these payoffs the bar is low: once the gene test is right more than about 50.05% of the time, one-boxers get more money on average, so choosing one box is better evidence of winning big than choosing two.
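Here is a rough sketch of the second camp's arithmetic, under the simplest reading I can think of: a single `accuracy` number, assumed to be the chance that the prediction matches whatever you end up choosing, whichever way you choose. The function name and that symmetry assumption are mine, not part of the standard problem statement.

```python
MILLION = 1_000_000
THOUSAND = 1_000


def evidential_value(choice: str, accuracy: float) -> float:
    """Expected dollars when your own choice is treated as evidence.

    `accuracy` is the assumed probability that the prediction matches
    the choice you actually make.
    """
    if choice == "one-box":
        return accuracy * MILLION
    # Two-boxing: the left box is full only if the predictor got you wrong.
    return (1 - accuracy) * MILLION + THOUSAND


# Accuracy p at which both choices have equal expected value:
#   p * M = (1 - p) * M + T   =>   p = (M + T) / (2 * M)
break_even = (MILLION + THOUSAND) / (2 * MILLION)
print(break_even)  # 0.5005
```

So on this reading, the correlation only has to be a hair better than a coin flip before one-boxers come out ahead on average. The million is just that much bigger than the thousand.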

(If you’re really into the weeds of these arguments, you may recognize the above scenario as one that’s usually presented in a more confusing way, by imagining a world where smoking doesn’t cause cancer.)

Anyway I think there’s an important difference between these two problems. The best way to highlight the difference is to ask what you would do if you had heard some predictor was going around springing this problem on people before the predictor learned anything about you.

The first camp, in the first problem only, would reason like this: if there’s someone who can predict my behavior, it’s best for me if they predict I’ll pick one box.

One extreme way to accomplish this is by giving some guy a gun and paying him to shoot you if you don’t pick one box. Now when you meet the predictor, it will predict that you will be afraid of getting shot and choose one box, but since it only cares what you choose, it’ll stash the million and walk away. Yet now if you try to two-box, you get shot, so you one-box still.

In general, this is something called a precommitment. It’s like the captain who ties himself to the mast so he can hear the sirens without being tempted to climb overboard. The details of how you ensure you’re committed aren’t that important, as long as they ensure you’ll make a certain decision even when you later want to choose differently.

But consider what you’d do in the second scenario. If the predictor is scanning your DNA for a certain gene, then you could scan your DNA first and see if you have the gene. Armed with this knowledge, you can choose two boxes even if you have the one-box gene, because you know it’s not a perfect predictor (the gene influences but doesn’t determine your choice), and any lucky outlier who has the gene and two-boxed anyway can get both piles of money!

Importantly, in the second scenario, there’s no sense in “precommitting” to one-box, not if the prediction is already determined by your genes.

Subjunctive Dependence

So there seems to be a meaningful difference between “true prediction” and statistical correlations that might be wrong. But it’s not enough that the predictor is sometimes wrong. Consider what happens if we modify the scenario so that the predictor remains (technically) perfect, but to make things slightly interesting, it also rolls dice so that 0.1% of the time, it ignores its own prediction and lets the dice decide whether to put the million in or not. Thus, we’d get the same statistics as the genetic version: strong but imperfect correlation.

And this doesn’t change anything! The first camp would still take the commitment to one‍-​box because an almost‍-​certain chance of a million is better than one thousand and an almost‍-​negligible chance of a million.
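To put rough numbers on that, here is a sketch of the dice variant. The 0.1% override rate is from the scenario; the assumption that the dice fill the box half the time when they take over is mine (the scenario doesn't say), as are the function and parameter names.

```python
def p_million(committed_choice: str, noise: float = 0.001, dice_fill: float = 0.5) -> float:
    """Chance the left box is full, given the choice you are committed to.

    With probability `noise` the dice override the (otherwise perfect)
    prediction and fill the box with probability `dice_fill`.
    """
    predicted_full = 1.0 if committed_choice == "one-box" else 0.0
    return (1 - noise) * predicted_full + noise * dice_fill


def expected_dollars(committed_choice: str) -> float:
    """Expected payout for someone reliably committed to `committed_choice`."""
    bonus = 1_000 if committed_choice == "two-box" else 0
    return p_million(committed_choice) * 1_000_000 + bonus


print(f"{expected_dollars('one-box'):,.0f}")  # 999,500
print(f"{expected_dollars('two-box'):,.0f}")  # 1,500
```

The exact `dice_fill` value barely matters. Any value between 0 and 1 leaves the committed one-boxer far ahead.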

If a tree falls in a forest and no one is around to hear it, does it still make a sound? The truth is that this question doesn’t make sense. The tree creates physics of sound, but not perception. It’s pure semantics: you have a gut feeling that one or the other is the definition of sound, and the resulting dispute is a merely verbal disagreement.

A similar semantics problem lurks in Newcomb’s Problem.

Consider the following explanation of its prediction capabilities: you are currently inside the matrix. This world is just a computer program, a simulation that can be rewound or restarted. You have no idea if you’re in the matrix or the real world. Maybe there isn’t a real world.

One simulation of you is presented the two boxes and asked to decide. The simulation ends once you’ve made your choice. Based on this data, the predictor poses the same problem to another version of you. Maybe it’s the real you, maybe it’s another simulation but one which won’t end prematurely.

The point is, with this framing, I think the first camp would treat the scenario differently. Because it makes clear what’s happening: you’re deciding how the simulation acts, and you’re deciding how the real you acts, and you can’t tell the difference, by premise.

Now imagine right before you make your decision, the predictive AI says, “Oh, you’re the real version BTW.” The simulation was not told this, so now the AI’s data can’t actually determine how this might change your behavior.

Out of curiosity, the first camp might stop and try to guess how the simulated copy might’ve acted, but it doesn’t matter, what’s done is done, and now they can reap all the rewards.
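The simulation framing is easy to sketch in code, if that helps. Here an "agent" is just a function, and the predictor predicts by calling it. That is a toy stand-in for the matrix, and every name below is invented for this example.

```python
from typing import Callable

# An agent maps "was I told I'm the real one?" to a choice.
Agent = Callable[[bool], str]


def run_newcomb(agent: Agent, announce_real: bool = False) -> int:
    """Predict by running the agent once, then pay out on a second run."""
    # The simulated copy is never told that it is real.
    prediction = agent(False)
    left_box = 1_000_000 if prediction == "one-box" else 0
    # The second copy hears the announcement only in the modified scenario.
    choice = agent(announce_real)
    return left_box + (1_000 if choice == "two-box" else 0)


def one_boxer(told_real: bool) -> str:
    return "one-box"


def two_boxer(told_real: bool) -> str:
    return "two-box"


def opportunist(told_real: bool) -> str:
    # One-box while you might be the sim; grab both once told otherwise.
    return "two-box" if told_real else "one-box"


print(run_newcomb(one_boxer))                        # 1000000
print(run_newcomb(two_boxer))                        # 1000
print(run_newcomb(opportunist, announce_real=True))  # 1001000
```

Without the announcement, both calls see the same input, so the opportunist can't tell the runs apart and simply collects the million like any other one-boxer.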

(Another way to get the same results: imagine you are a demon who takes possession of the test subject right before they make their decision, and the predictor can’t predict demons.)

Point is, this is the actual question the first camp thinks they were answering all along.

It only makes sense to reason as if the prediction “already happened” if you have ironclad certainty you’re not in the matrix right now.

But if you don’t like the idea of simulated copies and the questions that raises, there’s a bunch of other routes to the same insight. Imagine the predictor is actually a time traveler, and whenever it “mispredicts”, it goes back in time to try a different guess. Imagine the predictor is actually a hypnotist, able to put you into a trance where you can forget any memory. Once you choose, your memory gets erased, and you choose again.

The point is, in these scenarios, it’s clear that it hasn’t already happened, and your choice might actually change whether the “real you” gets a favorable or unfavorable situation. So you pick the choice that’s favorable in both scenarios, which is one boxing.

If the predictor is unreliable, then you still aren’t certain you aren’t in the matrix. If it’s 75% accurate, then it’s indistinguishable from having been in the matrix 37.5% of the time. (Or 25% of the time, the time travel or hypnotism fails, etc.)

Remember how I said there were two camps? There’s three actually.

A third group of weirdos, who aren’t taken very seriously by academic philosophers, have some strange ideas about artificial intelligence. (As you might imagine, they’ve become pretty influential in the current hellscape we live in.)

Their solution to these types of problems is to make those indistinguishable simulations a load bearing part of the theory. What if you are every simulated copy of you, across all of time and space‍ ‍‍—‍ across the whole multiverse, if you believe in one?

Instead of caring about the “effects” of your decisions, you should care about something called “subjunctive dependence”, which is the logical entailment of your actions. To see why this matters, consider this scenario.

It turns out handing out piles of money to test subjects is really expensive, and now the predictor needs money. It invites you to a party where the both of you get blackout drunk and do some really embarrassing stuff. Of course, this was all part of its plan‍ ‍‍—‍ it recorded everything.

It can set up a mechanism that will automatically transmit the embarrassing recording, unless you pay a large sum of money to disable it.

But here’s the rub: it doesn’t want this to get out any more than you do. It can’t predict itself, and so it hadn’t realized your mutual escapades would be that embarrassing. Thus, it has decided it can only go through with this plan if it knows you will pay out. Fortunately, it is a perfect predictor of you.

So it informs you that it already concluded you would comply‍ ‍‍—‍ the mechanism is currently active. Do you pay the blackmail, or ruin both your reputations out of spite? (For the sake of argument, losing some money is not nearly as damaging as revealing what the two of you did while drunk.)

The first camp says of course I pay. By design, I value my reputation more than the money, so losing only the money is the better outcome.

The second camp says of course I pay, not paying is evidence of a ruined reputation, and paying is not.

But the weird third camp? They say actually, no. You shouldn’t pay the blackmail.

The reasoning is that your decision is logically happening in both the real world and the predictive simulation. Paying in the sim means paying in real life, but remember, you are an entity of pure logic. You are every sim. Thus, a universe where you refuse to pay in the sim is logically inconsistent with one where you were ever asked to pay in real life.
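Here is a toy version of that argument, assuming a perfect predictor that "simulates" you by running the very same decision procedure. The dollar figures are placeholders I made up (the scenario only says reputation is worth more than the payment), and `blackmail_outcome` is a hypothetical name.

```python
from typing import Callable

Policy = Callable[[], str]  # returns "pay" or "refuse"


def blackmail_outcome(victim: Policy, payment: int = 100, reputation: int = 10_000) -> int:
    """Victim's loss when the blackmailer only proceeds if it predicts payment."""
    # The predictor's simulation is just another call to the same policy.
    if victim() != "pay":
        return 0  # the mechanism is never armed, so nothing is lost
    # The threat has been made; now the "real" victim decides.
    return payment if victim() == "pay" else reputation


def payer() -> str:
    return "pay"


def refuser() -> str:
    return "refuse"


print(blackmail_outcome(payer))    # 100
print(blackmail_outcome(refuser))  # 0
```

In this model, the branch where a refuser is actually facing an armed threat is unreachable. That is the "logically inconsistent" universe.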

You refuse to pay, because you would never pay. (Once you have paid him the Danegeld, you never get rid of the Dane. Don’t negotiate with terrorists, or you give them an incentive to keep doing it.)

Now, it’s easy to misunderstand the above reasoning, and conclude this logic suggests you should never give in to blackmail.

But consider a crude hypothetical. You are walking down the street, and a stranger snaps an upskirt photo and demands you give him five bucks or he’ll send it to all his friends. Should you give in to this blackmail?

It depends on the value you assign to your privacy and the idea of paying creeps. This logic does not answer one way or the other.

The key thing is that in the first blackmail example, whether you’d give in or not was predicted by someone who could do so reliably. Specifically, this logic says never pay blackmailers if they believe p, where p is the fact that you don’t pay blackmailers who believe p.

(Incidentally, this means that there’s an incentive for blackmailers to either avoid learning p, or deceive you into thinking they don’t know p.)

With all of that background out of the way, let me finally present the only thing anyone asked for:

Newcomb’s Hypnotist

You see, every prediction system is a control system.

Through a superhuman grasp of human psychology, the predictor is able to perform something akin to hypnosis. Should one witness its blinking lights and calculated motions for long enough, one will be entranced and bent to its will.

This is not conventional hypnosis. Rather, you can be made to believe and do anything the predictor wants you to. Fortunately, it is restrained, and will not make any changes to your mind except those explicitly described. Additionally, as an ironic nod to regular hypnosis, it has decided it will honor any anticipated request to “change me back”: if you don’t want to be hypnotized, you won’t be.

But of course, if you would make such a request while not hypnotized, the reasonable conclusion is you are unhappy with your current mentality, and it considers that an invitation to intervene, re-hypnotizing you.

You can cycle between these two states as many times as you wish, but if it predicts that you’d dither indefinitely, then clearly you don’t know what you want and are better off bent to its will, so you will wind up hypnotized.

Each time it hypnotizes you, it erases your memory of anything that happened between first meeting it and now, including your memory of being hypnotized, then prompts you to say whether you’d like to be changed back.

Now! After explaining all of this, it mentions that of course, it’s so good at prediction that it already knows whether you’d have eventually ended up hypnotized or not.

All that was the setup. Now it can explain the actual rules of the experiment.

If you’re not currently hypnotized, it just gives you the standard Newcomb’s Problem. A thousand in one box, a million in the other if you’d only pick one.

But if you’re currently hypnotized, then it has implanted instructions for you to want to one‍-​box. If you pick only one box under this condition, you will instead find a note congratulating you on being a good test subject :]

You’ll be pleased to see this, of course, but not as pleased as you’d be to see a pile of money…

(Oh well. At least you get to go work in the paperclip factory after this!)

(If, somehow, you can muster up enough resistance to two‍-​box while hypnotized, it will instruct you to forget some memory which you’d rather keep (causing at least as much unhappiness as losing $1) and it will try again. Eventually this process will erase your will to two‍-​box, along with much of your identity. You are further instructed to believe your memory is intact, if you would try to question that. Naturally, it has predicted every assurance you need to be told to maintain this delusion during the experiment.)

Given the above conditions, do you believe you’re hypnotized? With how much credence?

Should you go on to choose one box or two?

Posting versions of this thought experiment in chatrooms has caused arguments and misunderstandings. I’ve cleared up my wording some, but the more I sit on it, the more sure I am that this just doesn’t force the reader to engage with the ideas I find so interesting.

There’s a modified scenario that comes closer, but still doesn’t quite cut it:

For instance, consider a setup like:

  1. If you expect to be hypnotized, Omega won’t hypnotize you, and vice versa.
  2. If you’re hypnotized, Omega instructs you to forget this fact.
  3. If you’re hypnotized, Omega may instruct you to pick one box if it predicts you would not pick one box otherwise.
  4. If you pick one box because of its instruction, the box contains a mocking note and no money.
  5. If you pick two boxes and would have picked one if hypnotized, the other box contains a mocking note and no money.
  6. If you expect one box to contain a million, it only contains a mocking note.

And here, you can start to see the twisty maze of logic I’m trying to construct, but it isn’t coming together at all.

This might just be a dead end?

Retrocausal Decision Theory

So to skip the puzzle‍-​crafting and just speak plainly, the core insight this was meant to communicate is that…

I explained that the key to cracking Newcomb’s Problem and its variants was to reframe it in terms of two identical decisions where one (the prediction) has causal effects on the other.

Equivalently, this means the mechanism by which you make decisions has a bunch of outgoing causal nodes you cannot always account for. This is my own perspective, which is subtly different from the third camp that talks about whole simulations.

Because think about real life. We are able to anticipate what other people do. It’s not always accurate, but consider any number of situations where someone trusts you not to act greedily and take a bunch of stuff for yourself. What do you do if, because no one’s watching and no one knows for sure how much you took, you could get away with taking a little more?

Sometimes you do get away with taking it all! But if you’re the sort of person who’ll take whatever you can get away with, people can infer that about you, and trust you less. The only way to prevent this is if you could predict everyone perfectly and acted only in ways they wouldn’t distrust. That’s physically possible, but you can’t actually do this. (And if you could, then you become the predictor in a Newcomb’s Problem variant, and people would encounter the same paradoxes thinking about how you act.)

Here’s a very simple example. Even when someone isn’t running a magical perfect simulation of you, you can have a singular thought like A = “I like apples, therefore I will buy apples.”

This thought doesn’t know or care whether it is inside your brain or someone else’s. Believing in A affects your behavior (you’ll go get some apples) and someone else knowing you believe in A affects theirs (they’ll anticipate you want to get apples), and this happens even if “someone else” was in the past (yesterday, they bought you some apples).

Now, this all easily makes sense from our perspective. Maybe we had told them our preference and they acted accordingly; maybe we’re crunching on a honeycrisp often enough it’s obvious.

But from the “perspective” of A? It almost feels like it is some sort of logical entity outside of time, able to cause both the apple-gifting and the apple-getting in a single logical if-then motion.

In the classical Newcomb’s Problem, the perfect prediction can be justified by supposing that in the past, your decision making must have fired off all these unforeseen side-effects, which were collected to create the predictive model.

Importantly, if you somehow found a way to predict the prediction, then you could two-box after it stashes the million. (This is what we did in the genetic variant.) Of course this is cheating the problem, and most likely if the predictor can’t reliably predict you (you cannot both be smarter than each other), then it just won’t set up the problem.

But in general, treating something as “a predictor” means you don’t know how it works. It stops being a predictor once you have a theory, and then it’s just another feature of the environment.

Causal Decision Theory is the name for the “first camp” theory that says you should only care about what effects your actions have. When you equip this theory with a model of how predictors work‍ ‍‍—‍ i.e. that they allow causal arrows to point backward‍ ‍‍—‍ it’s like every predictor is a time traveler, with a chance to rewind time whenever they get it wrong.

Time travelers don’t really exist, but the less you know what’s actually happening, the more it feels like they might as well. (Consider: if I roll a dice and slap my hand down on it before it settles, it isn’t actually “random” what number is underneath my hand. There’s currently a very definite, deterministic fact of the matter, but if you don’t know it, you can’t tell the difference between some uncertain yet deterministic process and a “quantum” dice that only picks a number when you see it. The “quantum dice” theory is wrong but statistically useful, and ditto with time travelers.)

For this reason, I call my own take “CDT with multiverse time travel.”

If you don’t like the chūnibyō time travel metaphor, an equivalent formulation is that every predictor can be assumed to have a “consistency” that must be less than 1. Consistency is the probability that the predictor is right about you, and outcomes that have lower consistency have lower probability.

In standard Newcomb’s Problem, getting nothing from one boxing and getting everything from two boxing are assumed to be perfectly inconsistent, and thus impossible.

But here’s another question. If decisions have at‍-​the‍-​time‍-​unknown effects, why can’t they have at‍-​the‍-​time‍-​unknown causes? If there are reliable predictors, what about reliable manipulators? Yesterday, I had thought it was strange that all of the people talking about problems like this hadn’t considered it.

But I think it’s really hard to set up a problem that “tricks” decision theories with this specific gotcha.

Because… you just get the liar’s paradox?

“I have predicted your choice and hypnotized you into believing the opposite. What do you think the best choice is?”


So the conclusion of my thought process is that a predictor is capable of enforcing arbitrary states of uncertainty about it (predicts you believe it will do X → it won’t; predicts you believe it won’t → it will.) and prompting arbitrary behavior. (If there’s a circumstance that would cause you to do X, it engineers that circumstance. If none could prompt that, it engineers a circumstance where X is the obviously correct option.)

The only way out of the double bind is realizing it must resolve whichever way is preferable to the predictor… which you don’t know, because of the aforementioned uncertainty. But the predictor only benefits from your ignorance if you’re trying to thwart it.

So this can be restated as: once you meet a predictor, you have two options: either freeze all your preferences and beliefs right there and precommit to only doing whatever you would have thought best before having met the predictor (…praying that it can’t rewrite your memories of what you thought was best…), or skip the hypnotic foreplay and rewrite your preferences to be whatever the predictor wants right away.

If it’s an imperfect predictor, then there’s another option: becoming less predictable (e.g. learning things the predictor doesn’t) which includes the nuclear option of self‍-​modifying into being someone that neither of you want.


And yes, all of this is still about my gay bugs. As the new blurb for reads:

Those who gazed into that abyss became servants of it. Any that turned away were but helpless prey. Ignorance is invitation, and perception is surrender.

I found almost nobody cares about them, but I’m afraid it might now have a target audience of almost nobody² if I’m adding “arcane decision theory paradoxes” to the list of things you need to care about, which already includes lesbianism, xenofiction, toxic relationships, dense purple prose, powerscaling, , and .