Must-Read: Cosma Shalizi (2010): The Bootstrap

Cosma Shalizi (2010): The Bootstrap: “That these [statistical] origin myths invoke various limits is no accident… https://web.archive.org/web/20100518171527/http://www.americanscientist.org:80/issues/pub/2010/3/the-bootstrap/2

The great results of probability theory—the laws of large numbers, the ergodic theorem, the central limit theorem and so on—describe limits in which all stochastic processes in broad classes of models display the same asymptotic behavior. The central limit theorem (CLT), for instance, says that if we average more and more independent random quantities with a common distribution, and if that common distribution is not too pathological, then the distribution of their means approaches a Gaussian. (The non-Gaussian parts of the distribution wash away under averaging, but the average of two Gaussians is another Gaussian.) Typically, as in the CLT, the limits involve taking more and more data from the source, so statisticians use the theorems to find the asymptotic, large-sample distributions of their estimates. We have been especially devoted to rewriting our estimates as averages of independent quantities, so that we can use the CLT to get Gaussian asymptotics. Refinements to such results would consider, say, the rate at which the error of the asymptotic Gaussian approximation shrinks as the sample sizes grow….
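To make the CLT claim concrete, here is a small simulation sketch; it is not from Shalizi's article. It assumes an Exponential(1) source distribution (chosen only because it is visibly skewed) and uses hypothetical helper names `sample_means` and `standardized_skewness`. As the number of averaged draws n grows, the spread of the sample means should shrink roughly like 1/sqrt(n) and their skewness, one "non-Gaussian part," should fade toward zero.

```python
import random
import statistics


def sample_means(n, replications, rng):
    """Return `replications` means, each averaging n independent Exponential(1) draws."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replications)
    ]


def standardized_skewness(xs):
    """Third standardized moment; it is 0 for a Gaussian."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])


rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for n in (1, 10, 100):
    means = sample_means(n, 5000, rng)
    results[n] = (statistics.pstdev(means), standardized_skewness(means))
    print(f"n={n:4d}  sd of means={results[n][0]:.3f}  skewness={results[n][1]:.3f}")
```

The Exponential(1) distribution has standard deviation 1, so the printed spread for n = 100 should sit near 0.1, while the skewness column drifts toward zero as n grows.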

The bootstrap approximates the sampling distribution, with three sources of approximation error… [1] using finitely many replications to stand for the full sampling distribution… brute force—just using enough replications—can also make it arbitrarily small… [2] statistical error… the sampling distribution changes with the parameters, and our initial fit is not completely accurate…[but] reduce the statistical error… [by] subtler tricks… specification error…. Here Efron had a second brilliant idea, which is to address specification error by replacing simulation from the model with resampling from the data…. Efron’s “nonparametric bootstrap” treats the original data set as a complete population and draws a new, simulated sample from it, picking each observation with equal probability (allowing repeated values) and then re-running the estimation…. This new method matters here because the Gaussian model is inaccurate….
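The resampling recipe described above can be sketched in a few lines. This is a minimal illustration and not Shalizi's own code: the data are synthetic (a made-up mixture of two Gaussians standing in for heavy-tailed returns), the statistic is the median, and the names `bootstrap_replicates` and `percentile_interval` are hypothetical. Each replication draws a new sample of the same size from the original data, with replacement and equal probability, then re-runs the estimator.

```python
import random
import statistics


def bootstrap_replicates(data, estimator, replications, rng):
    """Nonparametric bootstrap: resample the data with replacement and re-estimate."""
    n = len(data)
    reps = []
    for _ in range(replications):
        resample = rng.choices(data, k=n)  # equal probability, repeats allowed
        reps.append(estimator(resample))
    return reps


def percentile_interval(reps, level=0.95):
    """Rough percentile interval read off the sorted bootstrap replicates."""
    ordered = sorted(reps)
    alpha = (1 - level) / 2
    lo = ordered[int(alpha * len(ordered))]
    hi = ordered[int((1 - alpha) * len(ordered)) - 1]
    return lo, hi


rng = random.Random(42)
# Synthetic stand-in for returns: mostly N(0, 1), occasionally N(0, 5). Not real data.
data = [rng.gauss(0, 5) if rng.random() < 0.1 else rng.gauss(0, 1) for _ in range(500)]

reps = bootstrap_replicates(data, statistics.median, 2000, rng)
se = statistics.stdev(reps)          # bootstrap standard error of the median
lo, hi = percentile_interval(reps)   # approximate 95% interval
print(f"median={statistics.median(data):.3f}  se={se:.3f}  interval=({lo:.3f}, {hi:.3f})")
```

Raising `replications` attacks the first source of error (finitely many replications); it does nothing for the statistical error that comes from having only 500 observations in the first place.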

Although this is more accurate than the Gaussian model, it’s still a really simple problem. Conceivably, some other nice distribution fits the returns better than the Gaussian, and it might even have analytical sampling formulas. The real strength of the bootstrap is that it lets us handle complicated models, and complicated questions, in exactly the same way as this simple case…
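As a hedged illustration of "exactly the same way," the sketch below applies the same resample-and-re-estimate loop to a slightly more complicated question: the uncertainty of a regression slope when the noise is not constant. Everything here is an assumption made for illustration: the synthetic (x, y) pairs, the true slope of 2, and the helper names `slope` and `bootstrap_pairs`. The only change from the simple case is that whole (x, y) pairs are resampled together.

```python
import random
import statistics


def slope(pairs):
    """Ordinary least-squares slope of y on x."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sxx


def bootstrap_pairs(pairs, estimator, replications, rng):
    """Resample whole (x, y) pairs with replacement and re-run the estimator."""
    n = len(pairs)
    return [estimator(rng.choices(pairs, k=n)) for _ in range(replications)]


rng = random.Random(7)
# Synthetic data, not real: y = 2x + noise, with noise that grows with x.
pairs = []
for _ in range(300):
    x = rng.random()
    y = 2.0 * x + rng.gauss(0, 0.2 + 0.5 * x)
    pairs.append((x, y))

estimate = slope(pairs)
reps = bootstrap_pairs(pairs, slope, 1000, rng)
se = statistics.stdev(reps)  # bootstrap standard error of the slope
print(f"slope={estimate:.3f}  bootstrap se={se:.3f}")
```

Swapping `slope` for any other function of the resampled pairs leaves the loop untouched, which is the point of the passage.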

August 28, 2017

AUTHORS:

Brad DeLong