Should-Read: Drew Conway: The Data Science Venn Diagram: “The primary colors of data: hacking skills, math and stats knowledge, and substantive expertise…

…each… very valuable, but when combined with only one other are at best simply not data science, or at worst downright dangerous…. Being able to manipulate text files at the command-line, understanding vectorized operations, thinking algorithmically… the hacking skills… apply[ing] appropriate math and statistics methods… [but] data plus math and statistics only gets you machine learning, which is great if that is what you are interested in, but not if you are doing data science… [which] is about discovery and building knowledge….

The hacking skills plus substantive expertise danger zone… people who “know enough to be dangerous”… capable of extracting and structuring data… related to a field they know quite a bit about and… run a linear regression… lack[ing] any understanding of what those coefficients mean. It is from this part of the diagram that the phrase “lies, damned lies, and statistics” emanates, because either through ignorance or malice this overlap of skills gives people the ability to create what appears to be a legitimate analysis without any understanding of how they got there or what they have created.

Fortunately, it requires near willful ignorance to acquire hacking skills and substantive expertise without also learning some math and statistics along the way. As such, the danger zone is sparsely populated, however, it does not take many to produce a lot of damage…”

August 16, 2017


Brad DeLong
