Tuesday, September 8, 2015

Bayesianism for humans: "probable enough"

(note: this is a copy of my LW post. I'm gathering all my stuff in one place)


There are two insights from Bayesianism that occurred to me and that I hadn't seen anywhere else before.
I like the lists in the two posts linked above, so for the sake of completeness, I'm going to add my two cents to the public domain. The second penny is here.


"Probable enough"

When you have eliminated the impossible, whatever remains is often more improbable than your having made a mistake in one of your impossibility proofs.


The Bayesian way of thinking introduced me to the idea of a "hypothesis which probably isn't true, but is probable enough to rise to the level of conscious attention" — in other words, to the situation where P(H) is notable but less than 50%.

Looking back, I think that the notion of taking seriously something which you don't think is true was alien to me. Hence, everything was either probably true or probably false; things from the former category were over-confidently certain, and things from the latter category were barely worth thinking about.

This model was correct, but only in a formal sense.

Suppose you are living in Gotham, the city famous for its crime rate and its masked (and well-funded) vigilante, Batman. Recently you read The Better Angels of Our Nature: Why Violence Has Declined by Steven Pinker, and according to some theories described there, Batman isn't good for Gotham at all.

Now you know, for example, Donald Black's theory that "crime is, from the point of view of the perpetrator, the pursuit of justice". You know about the idea that in order for the crime rate to drop, people should perceive their legal system as legitimate. You suspect that criminals beaten by Bats don't perceive the act as a fair and regular punishment for something bad, or as an attempt to defend them from injustice; instead, the act is perceived as a round of bad luck. So the criminals are busy plotting their revenge, not internalizing civic norms.

You believe that if you send your copy of the book (with key passages highlighted) to a person connected to Batman, Batman will change his ways and Gotham will become much nicer in terms of its homicide rate.

So you are trying to find out Batman's secret identity, and there are 17 possible suspects. Derek Powers looks like a good candidate: he is wealthy and has a long history of secretly delegating tasks involving illegal violence to his henchmen; however, his motivation is far from obvious. You estimate P(Derek Powers employs Batman) at 20%. You have very little information about the other candidates, like Ferris Boyle, Bruce Wayne, Roland Daggett, Lucius Fox or Matches Malone, so you assign an equal 5% to each of the other sixteen.

In this case you should pick Derek Powers as your best guess when forced to name only one candidate (for example, if you are forced to send the book to someone today), but you should also be aware that your guess is 80% likely to be wrong. When making expected utility calculations, you should take Derek Powers more seriously than Lucius Fox, but only by 15 percentage points.
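The arithmetic of the suspect list can be sketched in a few lines of Python (the names and probabilities are the ones from the example above, nothing more):

```python
# Prior over the 17 suspects from the example: Powers at 20%,
# everyone else at an equal 5%.
priors = {"Derek Powers": 0.20}
for name in ["Ferris Boyle", "Bruce Wayne", "Roland Daggett",
             "Lucius Fox", "Matches Malone"]:
    priors[name] = 0.05
# Eleven more unnamed suspects at 5% each, so everything sums to 1.
for i in range(11):
    priors[f"Suspect #{i + 7}"] = 0.05

assert abs(sum(priors.values()) - 1.0) < 1e-9

# The MAP (maximum a posteriori) hypothesis: the best single guess...
best = max(priors, key=priors.get)
# ...which is nevertheless 80% likely to be wrong.
print(best, priors[best], 1 - priors[best])
```

The point the code makes concrete: the argmax of the distribution and "what you should bet is true" are very different things when the maximum itself sits at 20%.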

In other words, you should take the maximum a posteriori (MAP) hypothesis into account while not deluding yourself into thinking that you now understand everything or nothing at all. The Derek Powers hypothesis probably isn't true; but it is useful.

Sometimes I find it easier to reframe the question from "which hypothesis is true?" to "which hypothesis is probable enough?". Then it's totally okay that your pet theory isn't probable, merely probable enough, so doubt becomes easier. Also, you are aware that your pet theory is likely to be wrong (and this is nothing to be sad about), so the alternatives come to mind more naturally.

These "probable enough" hypotheses can serve as very concise summaries of the state of your knowledge: you simultaneously outline the general sort of evidence you've observed and stress that you aren't really sure. I like to think of it as a rough, qualitative, more System 1-friendly variant of likelihood ratio sharing.

Planning Fallacy

The original explanation of the planning fallacy (proposed by Kahneman and Tversky) is that people focus on the most optimistic scenario when asked about a typical one (instead of trying to take the Outside View). If you keep the distinction between "probable" and "probable enough" in mind, you can see this claim in a new light.

Because the most optimistic scenario is the most probable and the most typical one, in a certain sense.

The illustration, with numbers pulled out of thin air, goes like this: suppose you want to visit a museum.

The first thing you need to do is get dressed and grab your keys and stuff. Usually (with 80% probability) you do this very quickly, but there is a small possibility that your museum ticket has been devoured by the entropy monster living on your computer table.

The second thing is to catch the bus. Usually (p = 80%) the bus is on schedule, but sometimes it can be too early or too late. After this, the bus could (20%) or could not (80%) get stuck in a traffic jam.

Finally, you need to find the museum building. You've been there once before, so you sort of remember the route, yet you could still get lost with 20% probability.

And there you have it: P(everything is fine) = 0.8 × 0.8 × 0.8 × 0.8 ≈ 40%, and the probability of every other single scenario is about 10% or less. "Everything is fine" is probable enough, yet likely to be false. Supposedly, humans pick the MAP hypothesis and then forget about every other scenario in order to save computation.
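The whole trip is just a product of per-step probabilities; a small sketch with the same made-up 80/20 numbers:

```python
# The four independent steps of the museum trip, each succeeding
# with the 80% probability invented in the text.
steps = {
    "got ready quickly":  0.8,
    "bus on schedule":    0.8,
    "no traffic jam":     0.8,
    "found the building": 0.8,
}

# P(everything is fine) is the product of all four successes.
p_fine = 1.0
for p in steps.values():
    p_fine *= p
print(p_fine)  # 0.8 ** 4, roughly 0.41: the MAP scenario,
               # yet more likely false than true.

# Any scenario with exactly one specific step failing is rarer still.
p_one_failure = 0.2 * 0.8 ** 3
print(p_one_failure)  # roughly 0.10
```

So the single most probable scenario holds about 40% of the probability mass, while the remaining ~60% is spread across many distinct "something went wrong" scenarios, none of which beats it individually.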

Also, "everything is fine" is a good description of your plan. If your friend asks, "so how are you planning to get to the museum?", and you answer "well, I catch the bus, get stuck in a traffic jam for 30 agonizing minutes, and then just walk from there", your friend is going to get a completely wrong idea about the dangers of your journey. So, in a certain sense, "everything is fine" is the typical scenario.

Maybe it isn't the human inability to pick the most likely scenario that should be blamed. Maybe it is the false assumption that "most likely == likely to be correct" which contributes to this ubiquitous error.

In this case you would have been better off picking "something will go wrong, and I will be late" instead of "everything will be fine".

So, sometimes you are interested in the best specimen out of your hypothesis space, sometimes you are interested in the most likely thingy (and it doesn't matter how vague it is), and sometimes there are no shortcuts, and you have to do an actual expected utility calculation.
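As a toy illustration of that last case, here is what such an expected utility calculation might look like for the book-sending decision; the probabilities are the ones from the Batman example, but all the utilities are invented purely for the sake of the sketch:

```python
# Hypotheses about who employs Batman, with probabilities from the
# example (everyone other than Powers and Fox lumped together).
p = {"Powers": 0.20, "Fox": 0.05, "someone else": 0.75}

# Invented utilities: reaching the right person helps a lot,
# a wrong guess merely wastes a book.
utility = {
    "send to Powers": {"Powers": 100, "Fox": -1, "someone else": -1},
    "send to Fox":    {"Powers": -1, "Fox": 100, "someone else": -1},
}

# Expected utility of each action: sum of P(hypothesis) * payoff.
expected = {
    action: sum(p[h] * u for h, u in payoffs.items())
    for action, payoffs in utility.items()
}
best_action = max(expected, key=expected.get)
print(expected, best_action)
```

Note that the action ranking happens to match the MAP ranking here only because the payoffs are symmetric; with asymmetric stakes the expected utility winner can differ from the most probable hypothesis, which is exactly why the full calculation is sometimes unavoidable.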
