A Mathematical Response to Scott Winship’s Analysis of The Great Gatsby Curve

Today’s big inequality and mobility news is the new paper by Raj Chetty, Emmanuel Saez, and others on how mobility varies over time, which we’ll be discussing shortly. First, though, we wanted to take a moment to look at the math behind an analysis built on earlier data from their Equality of Opportunity Project on how mobility varies across geography.

The “Great Gatsby Curve,” a term coined by Princeton economist Alan Krueger based on the work of Ottawa economist Miles Corak, shows that across developed countries, higher economic inequality is associated with lower mobility. This curve has sparked a great deal of debate, particularly because the United States stands apart among developed nations for its high inequality and low mobility.

In a recent foray into this debate, Manhattan Institute sociologist Scott Winship and Heritage Foundation research assistant Donald Schneider use data from the Equality of Opportunity Project to look at the relationship between inequality and mobility across U.S. “commuting zones,” which are groups of counties that form a common labor market. (The Equality of Opportunity Project assessed economic mobility by looking at the 2011-12 incomes of people born in 1980-82 and their parents’ incomes from 1996-2000.) Winship and Schneider conclude that these new data indicate that mobility is largely unrelated to economic inequality and that the core reason for low economic mobility is single motherhood.

I’ve reproduced their findings, but I don’t come to the same conclusions.  Winship and Schneider don’t weight the data properly and ignore too much of the data. Thus, I do not conclude that they provide compelling arguments to ignore economic inequality when discussing economic mobility.

Winship and Schneider use several data points to make their argument.  After showing that the share of households led by single mothers is negatively associated with mobility (very interesting but a little off topic), they reproduce the Great Gatsby curve for U.S. commuting zones.  They compare the difference between the income of families in a commuting zone’s 75th percentile and 25th percentile—also known as the interquartile range—with a measure of economic mobility.
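To make the inequality measure concrete, here is a minimal sketch of an interquartile range calculation for a single commuting zone. The function name and the income figures are made up for illustration; they are not from the Equality of Opportunity Project data, and the regressions reported later use the natural log of this quantity.

```python
import numpy as np

def interquartile_range(family_incomes):
    """Return the 75th-percentile income minus the 25th-percentile income."""
    p25, p75 = np.percentile(family_incomes, [25, 75])
    return float(p75 - p25)

# Made-up family incomes for one hypothetical commuting zone (dollars)
example_incomes = [10_000, 20_000, 30_000, 40_000, 50_000]
example_iqr = interquartile_range(example_incomes)
```

A zone where incomes are bunched together has a small interquartile range; a zone with a wide spread between its upper-middle and lower-middle families has a large one.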

Figure 1 shows the relationship Winship and Schneider plot between the interquartile income range and the gap in expected economic mobility between children from the richest and poorest families, as measured by the difference in their expected adult income.  The upward-sloping line was calculated using a simple regression and indicates that areas with larger income gaps among parents in 1996-2000 have larger income gaps among their children in 2011-12. (Note that children’s incomes are measured in 2011-12, when they are about 30, and their parents’ incomes are measured in 1996-2000, when the ‘children’ are 15-20, as explained in the second paragraph of this paper.)

Figure 1: Income Concentration vs. Difference in Relative Mobility

Then Winship and Schneider move on to look at the top 1%. Here’s where I think their analysis really does not hold up.  First let me be clear about what Winship and Schneider mean by “We-are-the-99%” inequality.  They are referring to the fraction of a region’s total income that went to someone with an income in the top 1% nationally. Of the 741 commuting zones in the dataset, 629 had data on the share of the population in the national top 1%, so 112 commuting zones were omitted from the analysis. Figure 2 below replicates their “We-are-the-99%” plot in blue.

Figure 2: Income Concentration vs. Relative Mobility

Winship and Schneider’s analysis weights each commuting zone equally for both the “We-are-the-99%” inequality measure and the interquartile range. Under their formulation, the Eagle Butte, South Dakota and Los Angeles, California commuting zones count the same.

Winship and Schneider smartly turn their focus to the largest 100 commuting zones because two outliers drove their “We-are-the-99%” findings. However, this means that they ignore more than 85% of the data.

Further, excluding all but the largest 100 commuting zones does not remove the weighting problem because the size of the top 100 zones still varies considerably.  The ratio between the populations of the largest and smallest of the top 100 zones is about 28:1, compared with 13,741:1 between the largest and smallest zones in the whole set.  The best way to avoid this problem is to use population weights in the regression instead of simply ignoring more than 85% of the data. Specifically, we used a weighted least squares regression as implemented in both R and Stata.
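For readers who want to see what population weighting does mechanically, here is a minimal sketch of weighted least squares in Python with NumPy. It is not the R or Stata code used for the results in this post; the function name, variable names, and all of the numbers are invented for illustration. The fit minimizes the sum of squared residuals with each zone's residual multiplied by its population, so large zones pull the line harder than small ones.

```python
import numpy as np

def weighted_least_squares(x, y, weights):
    """Fit y = intercept + slope * x by minimizing sum(w_i * residual_i**2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    # Scaling each row by sqrt(w_i) turns the weighted problem into an
    # ordinary least squares problem with the same solution.
    root_w = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return float(beta[0]), float(beta[1])

# Invented data: four small zones and one very large zone
log_top1_share = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mobility_gap = np.array([5.0, 4.0, 3.0, 2.0, 9.0])
population = np.array([1e3, 1e3, 1e3, 1e3, 1e6])

# Equal weights reproduce the unweighted regression
unweighted = weighted_least_squares(log_top1_share, mobility_gap, np.ones(5))
# Population weights let the large zone dominate the fit
weighted = weighted_least_squares(log_top1_share, mobility_gap, population)
```

In this toy example the two fits give noticeably different slopes, which is the same kind of divergence that shows up between the unweighted and population-weighted lines in Figure 2.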

When we use the correct weights, we find a different answer.  Population-weighted regressions find that the concentration of income in the top 1% is associated with larger gaps in mobility between children from lower- and higher-income families. The orange line in Figure 2 is the population-weighted line of best fit.

Unfortunately, Winship and Schneider spend less time looking into several other measures of economic mobility in the Equality of Opportunity Project data, including a measure of absolute mobility and a second measure of relative mobility. The absolute mobility measure is the expected economic outcome for children in families with below-median income. The second measure of relative mobility is the likelihood that someone raised in the bottom quintile makes it to the top quintile.  By focusing on one of the relative mobility measures, they miss the other, stronger relationships.

All of these regression results are in Table 1.  For the absolute mobility measure, a negative coefficient indicates a lower income for children born to low-income families.  For the first relative mobility measure, a positive coefficient indicates larger gaps in income. For the second relative mobility measure, a negative coefficient means that fewer children from low-income families will make it to the top. The bottom line is that in each of these regressions higher inequality means lower economic mobility.

Table 1: Population-Weighted Regressions for Two Inequality Measures and Three Mobility Measures

| Regression | Intercept | p-value (intercept) | Coefficient | p-value (coefficient) | Adjusted R^2 |
|---|---|---|---|---|---|
| Absolute Mobility vs. LN(IQR) | 197.24 | <0.001 | -14.16 | <0.001 | 0.2199 |
| Relative Mobility vs. LN(IQR) | -61.98 | <0.001 | 8.67 | <0.001 | 0.0598 |
| Alt. Relative Mobility vs. LN(IQR) | 123.41 | <0.001 | -10.46 | <0.001 | 0.1593 |
| Absolute Mobility vs. LN(1%) | 39.87 | <0.001 | -1.39 | <0.001 | 0.0371 |
| Relative Mobility vs. LN(1%) | 33.49 | <0.001 | 0.58 | 0.083 | 0.0032 |
| Alt. Relative Mobility vs. LN(1%) | 6.72 | <0.001 | -1.15 | <0.001 | 0.0336 |
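Because the inequality measures enter these regressions in logs, a coefficient is easiest to read as the change in the mobility measure associated with a percentage change in inequality. The sketch below works through that arithmetic for the two LN(IQR) coefficients on absolute and relative mobility. It assumes LN denotes the natural logarithm, and the helper name and the 10% scenario are mine, chosen only for illustration; the result is in whatever units the mobility measure uses.

```python
import math

def change_in_outcome(coefficient, pct_increase):
    """For y = a + b * ln(x), return the change in y when x rises by pct_increase percent."""
    return coefficient * math.log(1.0 + pct_increase / 100.0)

# Coefficients on LN(IQR) taken from Table 1; the 10% increase is hypothetical
absolute_change = change_in_outcome(-14.16, 10)  # about -1.35
relative_change = change_in_outcome(8.67, 10)    # about +0.83
```

So a zone with a 10% wider interquartile range is associated with an absolute mobility measure roughly 1.35 units lower and a relative mobility gap roughly 0.83 units larger. These are associations from the fitted lines, not causal estimates.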

Before making broad proclamations, it is important to understand the limitations in the data and the regressions.  While all but one of the coefficients are strongly statistically significant (the exception is relative mobility versus the top 1% share, with a p-value of 0.083), most of these regressions explain only a small portion of the underlying variation in mobility between regions.  On the one hand, it shouldn’t be terribly surprising that a measure of inequality focusing on the top 1% is not a strong predictor of mobility measures that focus on quartiles. But the issue may be data limitations.  Specifically, the data are only a snapshot of inequality, and it would be useful to test these inequality measures over many years to determine the implications of inequality for mobility across a person’s lifetime.  Likewise, the data contain mobility information only for children born in 1980-82, and it would be useful to test these results for more than a single cohort.
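The "portion of the variation explained" here is the adjusted R^2 reported in the tables. As a reminder of what that statistic is, the sketch below applies the standard textbook adjustment to a raw R^2. The sample size in the example is an assumption: it uses all 741 commuting zones, whereas the regressions on the top 1% share could use at most the 629 zones with data, and the raw R^2 of 0.05 is a made-up value.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Standard adjustment of R^2 for the number of predictors in the model."""
    penalty = (n_observations - 1) / (n_observations - n_predictors - 1)
    return 1.0 - (1.0 - r_squared) * penalty

# Hypothetical: raw R^2 of 0.05, one predictor, 741 observations
example_adjusted = adjusted_r_squared(0.05, 741, 1)
```

With a single predictor and several hundred observations, the adjustment is tiny, so the low values in Table 1 reflect weak explanatory power and not a penalty for model size.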

Winship and Schneider extend their weighting mistakes to their analysis of the size of the middle class and economic mobility. Winship and Schneider plotted the relationship for only the largest 100 commuting zones and claimed that it proved the size of the middle class does not matter for mobility.

As with the analysis above, using a weighted regression has an important impact on the results for the size of the middle class. The results for each of the three economic mobility measures are in Table 2.  In each case, the size of the middle class is statistically significant and explains a substantial share of the variation in mobility.

Table 2: Population-Weighted Regression for the Size of the Middle Class and Three Mobility Measures

| Regression | Intercept | p-value (intercept) | Coefficient | p-value (coefficient) | Adjusted R^2 |
|---|---|---|---|---|---|
| Absolute Mobility vs. Middle | 19.38 | <0.001 | 44.90 | <0.001 | 0.4723 |
| Relative Mobility vs. Middle | 50.12 | <0.001 | -33.55 | <0.001 | 0.1893 |
| Alt. Relative Mobility vs. Middle | -7.68 | <0.001 | 32.55 | <0.001 | 0.3293 |

While analysis of multiple measures of inequality and multiple measures of mobility suggests that economic inequality is negatively associated with economic mobility, much more analysis needs to be done to fully understand these complex relationships.  In particular, the mechanisms that drive these relationships should be understood.  I hope that this corrects some of the faulty analysis and encourages more rigorous analysis of these data.

Carter Price is a Senior Mathematician at the Washington Center for Equitable Growth.