Should-Read: Jasjeet S. Sekhon: Causal Inference in the Age of Big Data: “The rise of massive datasets that provide fine-grained information about human beings and their behavior offers unprecedented opportunities…

…for evaluating the effectiveness of social, behavioral, and medical treatments. With the availability of fine-grained data, researchers and policymakers are increasingly unsatisfied with estimates of average treatment effects based on experimental samples that are unrepresentative of populations of interest. Instead, they seek to target treatments to particular populations and subgroups. Because of these inferential challenges, Machine Learning (ML) is now being used for evaluating and predicting the effectiveness of interventions in a wide range of domains from technology firms to clinical medicine and election campaigns.

However, there are a number of issues that arise with the use of ML for causal inference. For example, although ML and related statistical models are good for prediction, they are not designed to estimate causal effects. Instead, they focus on predicting observed outcomes. Treatment effects, however, are never directly observed, and creating validation datasets where ground truth is known is difficult. Such validation is of particular importance because although ML algorithms have been designed to overcome prediction challenges when the data generating process is unknown, they cannot overcome bias when treatment assignment is a function of variables that are not observed…”
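
The last point, that bias from unobserved determinants of treatment assignment does not go away with more data or a better predictor, can be illustrated with a toy simulation. This is a minimal sketch of my own, not anything from Sekhon's paper: the function name `naive_effect_estimate`, the effect size, and the assignment probabilities are all made-up assumptions. Because it is a simulation, both potential outcomes are known for every unit, which is exactly the ground truth that real data never provide.

```python
import random
from statistics import mean


def naive_effect_estimate(n=20000, true_effect=2.0, confounded=True, seed=0):
    """Difference in mean observed outcomes, treated minus control.

    Hypothetical setup: `u` is a variable the analyst never sees. It shifts
    the outcome and, when `confounded` is True, also the chance of treatment.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                  # unobserved variable
        if confounded:
            p_treat = 0.8 if u > 0 else 0.2  # assignment depends on u
        else:
            p_treat = 0.5                    # randomized assignment
        is_treated = rng.random() < p_treat

        y0 = 3.0 * u + rng.gauss(0, 1)       # potential outcome if untreated
        y1 = y0 + true_effect                # potential outcome if treated

        # Only one potential outcome per unit is ever observed.
        if is_treated:
            treated.append(y1)
        else:
            control.append(y0)
    return mean(treated) - mean(control)


if __name__ == "__main__":
    print("true effect:           2.0")
    print("randomized assignment:", round(naive_effect_estimate(confounded=False), 2))
    print("confounded assignment:", round(naive_effect_estimate(confounded=True), 2))
```

Under randomized assignment the simple comparison lands close to the true effect of 2.0. Under confounded assignment it lands well above it, and raising `n` only makes the wrong answer more precise, since nothing in the observed data reveals `u`.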