The price and payoff of Bayesian statistics

I’ve never totally understood why people complain so much about having to specify prior distributions in order to do Bayesian inference. Even if you’re doing frequentist statistics, you have to make some assumptions about the world and about your data. If you’re using some maximum likelihood based approach, you’re counting on asymptotics to get you to multivariate normality — and so many data analysis problems just don’t have the sample size for that.

The big payoff with Bayesian statistics, it seems to me, is that you get full-on probability distributions as output, not just a mean and a standard error. But everyone focuses on specification of the prior.
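
To make that payoff concrete, here is a minimal sketch of my own (not taken from any of the sources quoted here). It assumes a hypothetical small data set of 7 successes in 10 trials and a flat Beta(1, 1) prior, in which case the posterior for the success probability is Beta(8, 4). The function names and numbers are illustrative assumptions only.

```python
import random


def beta_posterior_draws(successes, failures, prior_a=1.0, prior_b=1.0,
                         n_draws=20000, seed=0):
    """Draw from the posterior of a binomial proportion.

    With a Beta(prior_a, prior_b) prior, the posterior is
    Beta(prior_a + successes, prior_b + failures).
    """
    rng = random.Random(seed)
    a = prior_a + successes
    b = prior_b + failures
    return [rng.betavariate(a, b) for _ in range(n_draws)]


def summarize(draws, threshold=0.5):
    """Summaries read directly off the posterior draws."""
    ordered = sorted(draws)
    n = len(ordered)
    return {
        "mean": sum(ordered) / n,
        # Central 95% credible interval from the empirical quantiles.
        "interval": (ordered[int(0.025 * n)], ordered[int(0.975 * n) - 1]),
        # Posterior probability that the proportion exceeds the threshold.
        "prob_above": sum(d > threshold for d in draws) / n,
    }


# Hypothetical data: 7 successes and 3 failures, flat prior (an assumption).
draws = beta_posterior_draws(successes=7, failures=3)
summary = summarize(draws)
print(summary)
```

Because the output is a whole distribution, a question like “how probable is it that the true proportion is above one half?” gets a direct answer, even with only ten observations and no appeal to asymptotics.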

Johnson & Albert in Ordinal Data Modeling:

The additional “price” of Bayesian inference is, thus, the requirement to specify the marginal distribution of the parameter values, or the prior. The return on this investment is substantial. We are no longer obliged to rely on asymptotic arguments when performing inferences on the model parameters, but instead can base these inferences on the exact conditional distribution of the model parameters given observed data–the posterior.

That is a huge payoff. But even more important than that, Bayesian statistics is so much more believable than classical. I am almost happy that I spent 15 years ignorant of what was going on in academic statistics so I could jump on the Bayesian train now.

Here’s one of the first “pop statistics” articles I’ve seen — an attempt to clarify for the layperson what is going on with statistical practice in academic research. It’s a good article. I learned a few things and found a few interesting references.

Reporter Siegfried misses a couple of important points, though. He doesn’t note that frequentist statistics is based on hypothetical repeated sampling continued to infinity, or that confidence intervals cannot be interpreted except with reference to that long run. This is endlessly confusing to intro stats students. Most of them probably never absorb it.
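
The long-run reading of a confidence interval can be illustrated with a small simulation. This is only a sketch under assumptions of my own: normally distributed data, a known standard deviation, and made-up values for the true mean and sample size. It repeats the same experiment many times and counts how often the 95% interval contains the true mean.

```python
import random
from statistics import NormalDist, fmean


def ci_coverage(true_mean=10.0, sigma=2.0, n=20, n_experiments=5000,
                level=0.95, seed=1):
    """Fraction of repeated experiments whose interval covers the true mean."""
    rng = random.Random(seed)
    # Two-sided critical value for a z-interval (sigma assumed known).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    covered = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        center = fmean(sample)
        if center - half_width <= true_mean <= center + half_width:
            covered += 1
    return covered / n_experiments


coverage = ci_coverage()
print(f"fraction of intervals covering the true mean: {coverage:.3f}")
```

The fraction comes out close to 0.95, but that number describes the procedure across many hypothetical repetitions. It says nothing directly about the one interval computed from the one data set you actually have.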

And what about Bayesian statistics? Siegfried, like so many others, focuses on specifying the prior:

Bayesian math seems baffling at first, even to many scientists, but it basically just reflects the need to include previous knowledge when drawing conclusions from new observations. To infer the odds that a barking dog is hungry, for instance, it is not enough to know how often the dog barks when well-fed. You also need to know how often it eats — in order to calculate the prior probability of being hungry. Bayesian math combines a prior probability with observed data to produce an estimate of the likelihood of the hunger hypothesis.

This describes Bayesian stats mostly correctly (in my novice opinion) but focuses too much on the price (the need to specify the prior) rather than the payoff you get (probability distributions that are easily interpretable under conventional notions of probability).
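
The barking-dog calculation in the quoted passage is a direct application of Bayes’ theorem. Here is a minimal sketch with made-up numbers; the three probabilities are my own assumptions for illustration, not figures from Siegfried’s article.

```python
def posterior_hungry(p_hungry, p_bark_given_hungry, p_bark_given_fed):
    """P(hungry | barking) by Bayes' theorem."""
    # Total probability of barking, over both hungry and well-fed states.
    p_bark = (p_bark_given_hungry * p_hungry
              + p_bark_given_fed * (1 - p_hungry))
    return p_bark_given_hungry * p_hungry / p_bark


# Hypothetical inputs: the dog is hungry 20% of the time (the prior),
# barks 90% of the time when hungry and 30% of the time when well-fed.
p = posterior_hungry(p_hungry=0.2, p_bark_given_hungry=0.9,
                     p_bark_given_fed=0.3)
print(round(p, 3))  # 0.429
```

With these inputs, hearing a bark raises the probability of hunger from the prior of 0.2 to about 0.43. The prior is one input among three; the output is a probability you can interpret directly.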

Here, I think, Siegfried further obscures what’s going on with the enthusiasm for Bayesian ways of analyzing data:

But Bayesian methods introduce a confusion into the actual meaning of the mathematical concept of “probability” in the real world. Standard or “frequentist” statistics treat probabilities as objective realities; Bayesians treat probabilities as “degrees of belief” based in part on a personal assessment or subjective decision about what to include in the calculation. That’s a tough placebo to swallow for scientists wedded to the “objective” ideal of standard statistics. “Subjective prior beliefs are anathema to the frequentist, who relies instead on a series of ad hoc algorithms that maintain the facade of scientific objectivity,” Diamond and Kaul wrote.

No, no, no. Bayesian methods do not introduce confusion into the concept of probability. Classical statistics did that. Bayesian statistics clarifies probability — makes it into a human measure, not some pseudo-objective long-run construction.


One response to “The price and payoff of Bayesian statistics”

  1. I think frequentist statistics has advantages and disadvantages, just as Bayesian statistics does. Frequentist statistics is the most widely used because it makes difficult problems and models tractable using scalar statistics, and it yields direct inferences that, although they rely strongly on asymptotic distributions, give a result everybody agrees on. The difficulty of properly modeling the prior distribution means that discussion focuses on that step of the inference rather than on its results. Also, the frequentist definition of probability is more intuitive and gives a clear meaning to a statement such as p=0.68. A “degree of belief” is difficult to interpret and to compare with experiment. Many of the difficulties in frequentist statistics arise from the fact that its methods were developed in the early 20th century. The lack of computing power made it possible to use only simple statistics such as the mean, variance, kurtosis, etc., and to compare the observed value with a table. Bayesian statistics could not be computed properly in complicated cases, and only with the rise of tractable numerical approximations did Bayesian methods become competitive. I think that with new research into more general and computationally intensive frequentist inference methods, many of the problems of this approach can be resolved, at least in part.

    To be fair, I like some aspects of Bayesian statistics, especially the fact that probabilities are related not to inherent randomness but to ignorance of the causes of phenomena. That makes more sense to me than real randomness in nature. It also makes inference straightforward with Bayes’ theorem.

    I feel that both systems have good points and weak points. Maybe in the future a new inference method will be discovered that possesses the characteristics of both systems and surpasses them.