Michael Jordan: On Computational Thinking, Inferential Thinking and Data Science
Should-Read: We are moving from a statistical culture that focused on the taming of sampling variation to a different statistical culture that… what? Guarding against overfitting and achieving computational economy seem to be the most important goals, and they are linked. And behind everything lurks the problem of induction: in what ways are we justified in assuming that the future will be like the past, and in what ways are we not? For if we assume the future will be like the past in ways that it will not be, we are simply hosed:

Michael Jordan: On Computational Thinking, Inferential Thinking and Data Science: "The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis…
…That classical perspectives from these fields are not adequate to address emerging problems in Data Science is apparent from their sharply divergent nature at an elementary level—in computer science, the growth of the number of data points is a source of “complexity” that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of “simplicity” in that inferences are generally stronger and asymptotic results can be invoked. On a formal level, the gap is made evident by the lack of a role for computational concepts such as “runtime” in core statistical theory and the lack of a role for statistical concepts such as “risk” in core computational theory. I present several research vignettes aimed at bridging computation and statistics, including the problem of inference under privacy and communication constraints, and including a surprising cameo role for symplectic geometry…
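As a minimal sketch of the overfitting worry in the commentary above (the data-generating process, polynomial degrees, and noise level here are illustrative assumptions, not anything from the post or the talk): a high-degree polynomial can drive training error toward zero by memorizing the sampling variation in the "past," and then pays for it on fresh draws from the "future."

```python
# A sketch of overfitting, under assumed details (linear truth, Gaussian
# noise, degrees 1 and 9) chosen purely for illustration.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + rng.normal(0.0, 0.3, size=x_train.size)  # the "past"
x_test = np.linspace(0.0, 1.0, 200)
y_test = 2.0 * x_test + rng.normal(0.0, 0.3, size=x_test.size)     # the "future"

for degree in (1, 9):
    # Least-squares polynomial fit; degree 9 can interpolate all 10 points.
    p = Polynomial.fit(x_train, y_train, degree)
    train_mse = float(np.mean((p(x_train) - y_train) ** 2))
    test_mse = float(np.mean((p(x_test) - y_test) ** 2))
    # The high-degree fit models the noise itself: near-zero training error,
    # worse error on new draws -- "the future like the past" gone wrong.
    print(f"degree {degree}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```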
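And as a sketch of the elementary divergence the abstract describes (again an illustration under assumed details, not Jordan's example): for the sample mean of n Gaussian draws, statistical risk shrinks like sigma^2/n as n grows, while even the cheapest estimator costs at least one O(n) pass over the data.

```python
# A sketch of the divergence: estimation risk falls like sigma^2/n while a
# single O(n) pass over the data grows linearly in runtime. The naive loop
# and the chosen sample sizes are illustrative assumptions.
import random
import time

def naive_mean(xs):
    """One O(n) pass -- the computational price of the estimate."""
    total = 0.0
    for x in xs:          # runtime: "complexity" grows with n
        total += x
    return total / len(xs)

random.seed(0)
true_mu, sigma = 2.0, 1.0
for n in (10_000, 100_000, 1_000_000):
    xs = [random.gauss(true_mu, sigma) for _ in range(n)]
    t0 = time.perf_counter()
    mu_hat = naive_mean(xs)
    elapsed = time.perf_counter() - t0
    # Risk of the sample mean: E[(mu_hat - mu)^2] = sigma^2 / n, so the
    # statistical picture gets simpler exactly as the computation gets costlier.
    print(f"n={n:>9,}  sq. error={(mu_hat - true_mu) ** 2:.2e}  "
          f"risk sigma^2/n={sigma ** 2 / n:.2e}  runtime={elapsed:.4f}s")
```

Nothing in the loop's cost analysis knows about sigma^2/n, and nothing in the risk formula knows about the loop, which is exactly the gap between "runtime" and "risk" that the abstract points to.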