Bayesianism versus Smoothing: In Which I Surrender Unconditionally to Cosma Shalizi


I think it is time for me to issue an unconditional intellectual surrender to Cosma Shalizi. Watching Nate Silver and his http://fivethirtyeight.com over the past two election cycles has convinced me that the Bayesian framework he throws around his model is a major obstacle to people’s understanding of what is going on.

What is going on is made up of three things:

  1. Polling–that is, asking people what they think of the election candidates in a structured way.

  2. Aggregation–so that you are not just using one sample of 1,000 to assess the current mood of the electorate, but are instead pooling many such polls: averaging twenty-five of them cuts the sampling standard error to something like one-fifth of a single poll’s.

  3. Smoothing–imposing structure on the time series, both that it ought to be close to “fundamentals” and that it ought not to change too quickly.
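The three ingredients can be sketched in a few lines of Python. Every number here (the “true” support level, the fundamentals anchor, the smoothing weights, twenty-five polls a day) is invented for illustration, not taken from any actual forecasting model:

```python
import math
import random

random.seed(0)

# --- 1. Polling: simulate daily polls of n = 1,000 respondents ---
true_support = 0.52          # hypothetical "true" support level
n = 1000
days = 30
polls_per_day = 25
se_single = math.sqrt(true_support * (1 - true_support) / n)   # ~0.016

def one_poll():
    return sum(random.random() < true_support for _ in range(n)) / n

# --- 2. Aggregation: averaging 25 polls cuts the standard error
#        by a factor of sqrt(25) = 5 ---
daily_avg = [sum(one_poll() for _ in range(polls_per_day)) / polls_per_day
             for _ in range(days)]

# --- 3. Smoothing: shrink each day's estimate toward "fundamentals"
#        and toward yesterday's estimate, so the series can't jump too fast ---
fundamentals = 0.50          # made-up fundamentals-based prediction
alpha, beta = 0.6, 0.1       # made-up weights: persistence and shrinkage
est = fundamentals
smoothed = []
for avg in daily_avg:
    est = alpha * est + (1 - alpha - beta) * avg + beta * fundamentals
    smoothed.append(est)

print(f"single-poll SE : {se_single:.4f}")
print(f"aggregated SE  : {se_single / math.sqrt(polls_per_day):.4f}")
print(f"final smoothed estimate: {smoothed[-1]:.3f}")
```

The smoothed series ends up between the raw poll average and the fundamentals anchor, which is exactly the structure being imposed in step 3.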

But next to nobody reading Nate Silver and company’s “nowcast”, “polls-only”, and “polls-plus” forecast probabilities as they evolve over time gets any sense of how the sausage is made.

It remains the case that the decision theorist in the subbasement dungeon of my brain whimpers that Bayesian posterior probabilities are what we ultimately want.

But, these days, when it says that, I gag and shorten its chain:

  1. I point out to it that what we really want as decision theorists are not Bayesian posterior probabilities but rather the misnamed “risk neutral probabilities” that are posterior odds times the utility of the outcome.

  2. I point out to it that if we are betting against other minds we need to know in what ways their information sets might be superior to ours and what disadvantage that puts us at: that invulnerability to a Dutch Book is a third-order consideration in a world in which others might well know of jacks of spades that will piss in your ear on command.

  3. And I point out to it that the answer to the frequentist question, “how different might our conclusions have been had we drawn a different sample?”, provides much more insight into whether our procedures are converging to something sensible than any ex-ante Bayesian proof that we knew in advance, before we started the analysis, that our procedures must converge.
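The bootstrap is the standard way to put numbers on that frequentist question. A minimal sketch on an invented sample of 1,000 respondents: resample with replacement from the data we do have, and look at how much the estimate moves around.

```python
import random
import statistics

random.seed(1)

# A made-up sample: 1,000 respondents, 52% support
sample = [1] * 520 + [0] * 480

def estimate(data):
    return statistics.mean(data)

# "How different might our conclusions have been had we drawn a
# different sample?"  Resample with replacement and check the spread.
boot = []
for _ in range(2000):
    resample = [random.choice(sample) for _ in range(len(sample))]
    boot.append(estimate(resample))

boot.sort()
lo, hi = boot[int(0.025 * len(boot))], boot[int(0.975 * len(boot))]
print(f"point estimate: {estimate(sample):.3f}")
print(f"~95% bootstrap interval: [{lo:.3f}, {hi:.3f}]")
```

The width of that interval is a direct, simulation-based answer to the question, with no prior required.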

So go visit Sam Wang: polls, aggregation, smoothing, plus not-unreasonable random-drift strike zones are more helpful than three different sets of posterior odds–given my suspicion that there is right now no action from the 538.com stuff on the truck side of the polls-plus odds…

Must-read: John Holbo: It is difficult to get a man to intuit p-values when his h-index depends upon his not intuiting them

John Holbo: It is difficult to get a man to intuit p-values when his h-index depends upon his not intuiting them: “What p-value < .05 basically comes to…

…1) If this coin is fair, odds are less than 1 in 20 that you could match or beat that 5-heads run I just got!…. Now, to go with, an informal gloss on what your average scientific paper reports/asserts. No such thing as the prestigious science journal Fluke, so when a striking regularity of coin flips presents itself… scientific papers say: 2) Probably this is a trick coin!… Now we can trade in the rather confusing question—‘how does that p-value < .05 thing relate to the substantive take-away we really care about?’— with a less confusing question. What’s the relation between 1 and 2?…

5-heads in a row is evidence your coin is trick, or not, depending on background conditions. It could be weak evidence – so weak as to be none – or actually quite strong. Let’s talk through it. We are immediately inclined to say it’s weak evidence because we assume we are talking about our world, or one like it, in which trick coins are… waaaaaaaaay more unlikely than plain old flipping 5 heads. Ergo a 5-head run is vastly more likely to have been a fluke.

But, obviously, if the world is different things change. Suppose you are running to the bank with your brimming mason jar of quarters, and you collide with Mysterioso the Mysterious, carrying his equally large, equally full jar of trick quarters…. Oh no! The coins are mixed up! What to do? Flipping each 5 times is a decent method….

To review: we’re on the street, coins everywhere, magician swearing, jars rolling. From an even mix of fair and trick coins (per above) you pick a coin (any coin!) and flip – 5-heads. What to conclude? There is a 1 in 32 likelihood that this happened just by (longshot) chance. That is, given 5-heads, there is a 1-in-32 chance that you happen to have picked a fair coin (as likely as the alternative); then (flukily) you flipped 5 heads with it. On the other hand, there is a 31 out of 32 likelihood that… you picked a trick coin….
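Holbo’s street-corner arithmetic is a direct application of Bayes’ rule. A small check, under the assumption (implicit in his gloss) that a trick coin lands heads every time:

```python
from fractions import Fraction

# Even mix of fair and trick coins; assume a trick coin always lands heads.
p_fair = Fraction(1, 2)
p_trick = Fraction(1, 2)

p_5h_given_fair = Fraction(1, 2) ** 5    # 1/32
p_5h_given_trick = Fraction(1, 1)        # trick coin: heads every time

# Bayes' rule: P(fair | 5 heads)
p_5h = p_fair * p_5h_given_fair + p_trick * p_5h_given_trick
p_fair_given_5h = p_fair * p_5h_given_fair / p_5h

print(p_fair_given_5h)        # 1/33
print(1 - p_fair_given_5h)    # 32/33
```

The exact posterior probability of a fair coin is 1/33, that is, odds of 1 to 32 against the fluke; Holbo’s round “1 in 32” is the odds form of the same answer.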

So if you want to explain to someone why their ‘likelihood that this thing happened just by chance’ intuition about p-values is wrong, flip it and tell them what they are thinking could be right, but only if they just collided with Mysterioso, as it were. So you gotta ask yourself: do you have reason to believe you just collided with Mysterioso? (Well do ya? Punk!?) OK, I promised intuitive….

Informally, a ‘collision with Mysterioso’ case can be glossed as: 1) The alternatives are each equally likely. (Fair coins roughly = trick in number, on the ground.) 2) The alternatives are each pretty likely. (If there are 20 different kinds of differently-behaved trick coins, scattered in equal numbers, flipping one 5 times can’t give you confidence as to which kind you’ve got.) 3) The alternatives are each quite different. (If trick behavior is subtle, 5 flips won’t cut it.) The world does present you, from time to time, with situations you can reasonably believe meet conditions 1-3. In any such case, misusing 1) as a reverse mirror, to say what is true if 2) will not be wildly off. But be aware this is a heuristic way to live the life of the mind. Very sketchy.

Let’s illustrate with a realistic case where 1-3 don’t hold, but people are in fact likely to reason, wrongly, as if they do. I tell you formula XYZ was administered to 5 cancer patients and they all recovered soon after. Would you say formula XYZ sounds likely to be an effective cancer treatment? Many would say yes. But now I add that formula XYZ is water and everyone immediately sees the problem. They were assuming it was independently even-odds XYZ was curative, or not. But it’s obviously not. A cure for cancer is like a trick coin. You don’t find one every day. They’re 1 in 10 million. But if you are reasoning as if you just collided with Mysterioso, you may trick yourself into thinking maybe you just cured cancer. Intuitive?
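The same arithmetic with a cure-for-cancer prior shows how little five recoveries move the needle. Only the “1 in 10 million” prior comes from the passage; the spontaneous-recovery rate below is an invented number for illustration:

```python
# Bayes' rule again, now with a realistic prior: cures are rare.
prior_cure = 1e-7            # "1 in 10 million", as in Holbo's gloss
p_recover_given_cure = 1.0   # assume a real cure always works (invented)
p_recover_spontaneous = 0.2  # assumed recovery rate without a cure (invented)

likelihood_cure = p_recover_given_cure ** 5
likelihood_null = p_recover_spontaneous ** 5     # 0.2**5 = 0.00032

posterior = (prior_cure * likelihood_cure) / (
    prior_cure * likelihood_cure + (1 - prior_cure) * likelihood_null
)
print(f"P(XYZ is a cure | 5 recoveries) ~= {posterior:.2e}")
```

Even after five straight recoveries, the posterior probability that XYZ is a cure stays around three in ten thousand: the tiny prior dominates, which is exactly the point of the water example.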

Must-read: Cosma Shalizi (2011): “When Bayesians Can’t Handle the Truth”

So I was teaching Acemoglu, Johnson, and Robinson’s “Atlantic Trade” paper last week, and pointing out that (a) eighteenth-century England is a hugely-influential observation at the very edge of the range of the independent variables in the regression, and (b) it carries a huge residual even with a large estimated coefficient on Atlantic trade interacted with representative government. The huge residual, I said, means that the computer is saying: “I really do not like this model”. The rejection of a null hypothesis on the coefficient of interest is the computer saying “even though the model with a large coefficient is very unlikely, the model with a zero coefficient is very very very unlikely”. But, I said, Acemoglu, Johnson, and Robinson do not let their computer say the first statement, but only the second.
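The “hugely influential observation carrying a huge residual” pattern can be made concrete with standard regression diagnostics. This is synthetic data, not the AJR dataset: a cloud of ordinary points plus one point far out on the x-axis with a large residual, flagged by Cook’s distance.

```python
import random

random.seed(2)

# Synthetic data: 30 ordinary points, plus one observation ("England")
# at the edge of the x-range with a big residual.
xs = [random.gauss(0, 1) for _ in range(30)] + [6.0]
ys = [2 * x + random.gauss(0, 1) for x in xs[:30]] + [2 * 6.0 + 8.0]

# Simple OLS fit: y = a + b*x
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
a = ybar - b * xbar

resid = [y - (a + b * x) for x, y in zip(xs, ys)]
s2 = sum(e ** 2 for e in resid) / (n - 2)   # residual variance

# Cook's distance for each point (p = 2 estimated parameters):
# high leverage (edge of x-range) times a big residual = influential.
def cooks_d(i):
    h = 1 / n + (xs[i] - xbar) ** 2 / sxx   # leverage
    return resid[i] ** 2 / (2 * s2) * h / (1 - h) ** 2

ds = [cooks_d(i) for i in range(n)]
print(f"Cook's D, outlier : {ds[-1]:.2f}")
print(f"Cook's D, max rest: {max(ds[:-1]):.2f}")
```

The outlier’s Cook’s distance dwarfs everyone else’s: the single edge-of-range, large-residual point is dragging the fitted coefficient around, which is the diagnostic the computer is never allowed to voice when only the null-hypothesis test gets reported.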

And so I thought of Cosma Shalizi and his:

Cosma Shalizi (2011): When Bayesians Can’t Handle the Truth: “When should a frequentist expect Bayesian updating to work?…

…There are elegant results on the consistency of Bayesian updating for well-specified models facing IID or Markovian data, but both completely correct models and fully observed states are vanishingly rare. In this talk, I give conditions for posterior convergence that hold when the prior excludes the truth, which may have complex dependencies. The key dynamical assumption is the convergence of time-averaged log likelihoods (Shannon-McMillan-Breiman property). The main statistical assumption is building into the prior a form of capacity control related to the method of sieves. With these, I derive posterior convergence and a large deviations principle for the posterior, even in infinite-dimensional hypothesis spaces, extending in some cases to the rates of convergence; and clarify the role of the prior and of model averaging as regularization devices. Paper: http://projecteuclid.org/euclid.ejs/1256822130

Must-read: Andrew Gelman and Cosma Rohilla Shalizi: “Philosophy and the practice of Bayesian statistics”

Andrew Gelman and Cosma Rohilla Shalizi (2011): Philosophy and the Practice of Bayesian Statistics: “A substantial school in the philosophy of science…

…identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science.

Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework.

Must-Read: Felix Schönbrodt: The False Discovery Rate (FDR) and the Positive Predictive Value (PPV)

Felix Schönbrodt: The False Discovery Rate (FDR) and the Positive Predictive Value (PPV): “To answer the question ‘What’s the probability that a significant p-value indicates a true effect?’…

…we are interested in a conditional probability Prob(effect is real | p-value is significant). Inspired by Colquhoun (2014)… a tree-diagram….


[For a population in which 30% of investigated effects are real and statistical power is 35%,] the false discovery rate (FDR): 35 of the (35+105) = 140 significant p-values actually come from a null effect, much more than the nominal 5%…. Together with Michael Zehetleitner I developed an interactive app that computes and visualizes these numbers…. Run the App
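Schönbrodt’s tree-diagram numbers can be reproduced directly; the only inputs are the base rate of real effects, the power, and the significance threshold quoted in the passage:

```python
# Reproducing the tree-diagram arithmetic:
# 1,000 investigated effects, 30% real, power 35%, alpha = 5%.
n_effects = 1000
p_real = 0.30
power = 0.35
alpha = 0.05

real = n_effects * p_real              # 300 real effects
null = n_effects * (1 - p_real)        # 700 null effects

true_positives = real * power          # 105 significant AND real
false_positives = null * alpha         # 35 significant despite being null

fdr = false_positives / (true_positives + false_positives)
ppv = 1 - fdr
print(f"significant results: {true_positives + false_positives:.0f}")
print(f"FDR = {fdr:.0%}, PPV = {ppv:.0%}")
```

So a quarter of the significant p-values come from null effects: five times the nominal 5% error rate, which is the whole point of the FDR/PPV distinction.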