# Moving the (Corrected) Calculations from Last Week’s Shiller Stock Market Posts into R…: Afternoon Note

Because only people who really, really, really want to make bad mistakes do things in the un-debuggable Excel (or Numbers)…

The (corrected) calculations for last weekend’s http://delong.typepad.com/sdj/2014/08/under-what-circumstances-should-you-worry-that-the-stock-market-is-too-high-the-honest-broker-for-the-week-of-august-16.html and http://delong.typepad.com/sdj/2014/08/under-what-circumstances-should-you-worry-that-the-stock-market-is-too-high-the-honest-broker-for-the-week-of-august-16.html:

Import data from http://delong.typepad.com/20140824_Shiller_Data.csv…

…define a function (shift) for constructing leads and lags, and perform elementary data manipulations: construct variables DATE (1871:1-2014:7), REAL_PRICE (1871:1-2014:7), REAL_DIVIDENDS (1871:1-2014:7), REAL_EARNINGS (1871:1-2014:7), MA10_EARNINGS (1881:1-2014:7) (the trailing 10-year moving average of real earnings that Campbell and Shiller use as their estimate of cyclically-adjusted permanent earnings), CUMULATIVE_RETURN (the cumulative return on a reinvested index portfolio since 1871 (1871:1-2014:7), LEAD10RETURN (1881:1-2004:7) (the 10-year forward realized annual rate of return), LEAD20RETURN (1881:1-1994:7) (the 20-year forward realized annual rate of return), CAPE (1881:1-2014:7) (the Campbell-Shiller cyclically-adjusted price-earnings ratio), and EXCESS_RETURNS (1871:1-2004:7) (the difference between the 10-year forward annual realized rate of return and the cyclically-adjusted earnings yield):

{r, echo=FALSE} # start the R block Shiller <- read.csv("~/Dropbox/20140824_Shiller_Data.csv") View(Shiller) # read in the data.frame and name it "Shiller"DATE = Shiller$DATE REAL_PRICE = Shiller$REAL_PRICE REAL_DIVIDENDS = Shiller$REAL_DIVIDENDS REAL_EARNINGS=Shiller$REAL_EARNINGS MA10_EARNINGS = Shiller$MA.10._OF_EARNINGS CUMULATIVE_RETURN = Shiller$CUMULATIVE_RETURN # pull out the variablesCAPE = REAL_PRICE/MA10_EARNINGS # define the Campbell-Shiller cyclically-adjusted price-earnings ratio--CAPE--as the real price divided by a 10-year trailing moving average of real incomeshift<-function(x,shift_by){ stopifnot(is.numeric(shift_by)) stopifnot(is.numeric(x)) if (length(shift_by)>1) return(sapply(shift_by,shift, x=x)) out<-NULL abs_shift_by=abs(shift_by) if (shift_by > 0 ) out<-c(tail(x,-abs_shift_by),rep(NA,abs_shift_by)) else if (shift_by < 0 ) out<-c(rep(NA,abs_shift_by), head(x,-abs_shift_by)) else out<-x out } # construct a function to create leads and lags of variablesLEAD1MORETURN = (shift(REAL_PRICE,1) + REAL_DIVIDENDS/12)/REAL_PRICE-1 LEAD1YRRETURN = (shift(CUMULATIVE_RETURN,12)/CUMULATIVE_RETURN) - 1 LEAD10RETURN = (shift(CUMULATIVE_RETURN,120)/CUMULATIVE_RETURN)^(1/10)-1 LEAD20RETURN = (shift(CUMULATIVE_RETURN,240)/CUMULATIVE_RETURN)^(1/20)-1 EXCESS_RETURN = LEAD10RETURN - 1/CAPE # define the 1-mo return, the 10-year realized forward return, the 20-year realized forward return, and the deviation of realized returns from the CAPE earnings yield

Make sure we have all the data and they look like they should...

{r, echo=FALSE} plot(DATE,REAL_PRICE, main="Real Stock Index Price", xlab="Date", ylab="Real Stock Index Price", pch=16, cex=0.5) plot(DATE,REAL_DIVIDENDS, main="Real Stock Index Dividends", xlab="Date", ylab="Real Stock Index Dividends", pch=16, cex=0.5) plot(DATE,REAL_EARNINGS, main="Real Stock Index Earnings", xlab="Date", ylab="Real Stock Index Earnings", pch=16, cex=0.5) plot(DATE,MA10_EARNINGS, main="Cyclically-Adjusted Real Earnings", xlab="Date", ylab="10-Yr MA of Trailing Real Earnings ", pch=16, cex=0.5) plot(DATE,CAPE, xlab="Date", main="Campbell-Shiller Cyclically-Adjusted Price-Earnings", ylab="Campbell-Shiller CAPE", pch=16, cex=0.5) plot(DATE,LEAD10RETURN, main="Realized Ten-Year Forward Returns", xlab="Date", ylab="10-Yr Forward Realized Annual Rate of Return", pch=16, cex=0.5) plot(DATE,LEAD20RETURN, main="Realized Twenty-Year Forward Returns", xlab="Date", ylab="20-Year Forward Realized Annual Rate of Return", pch=16, cex=0.5) plot(DATE,EXCESS_RETURN, main="Realized Ten-Year Excess Return Over CAPE Earnings Yield", xlab="Date", ylab="10-Year Excess Return Over 1/CAPE", pch=16, cex=0.5) plot(DATE,LEAD1MORETURN, main="One-Month Returns", xlab="Date", ylab="1-Month Realized Forward Return", pch=16, cex=0.5, xlim=c(1870,2015)) plot(DATE,LEAD1YRRETURN, main="One-Year Returns", xlab="Date", ylab="1-Year Realized Forward Return", pch=16, cex=0.5, xlim=c(1870,2015)) # plot everything and look at it

Yes, everything as it should be so far...

Now on to the analysis proper...

Let's start with the simplest possible forward-return regression: regressing the ten-year future realized return in the Campbell-Shiller stock index database on the Campbell-Shiller cyclically-adjusted earnings yield INVERSECAPE--the inverse of the CAPE:

{r, echo=FALSE} INVERSECAPE = 1/CAPE return_regression_2.lm = lm(formula = LEAD10RETURN ~ INVERSECAPE) summary(return_regression_2.lm)

In response to:

Call: lm(formula = LEAD10RETURN ~ INVERSECAPE)

R reports:

Residuals: Min 1Q Median 3Q Max -0.106298 -0.030839 0.002955 0.028179 0.103866 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.007659 0.002878 -2.661 0.00788 INVERSECAPE 0.995904 0.036513 27.275 < 2e-16Residual standard error: 0.04284 on 1482 degrees of freedom (240 observations deleted due to missingness) Multiple R-squared: 0.3342, Adjusted R-squared: 0.3338 F-statistic: 743.9 on 1 and 1482 DF, p-value: < 2.2e-16

The significance levels that R reports are wrong: its naive regression package assumes that each of the 1482 observed 10-year returns is independent of each of the others. They are not. Each monthly return shows up as a component in 120 10-year returns. The right t-value for the cyclically-adjusted earnings yield INVERSECAPE is not 27.3 but rather something between 4 and 5--still highly, highly significant.

More important, a third of the variance in future 10-year returns is accounted for by knowing the value of INVERSECAPE. More important, the intercept is zero and the coefficient is 1. More important, the ability of the earnings yield to forecast future 10-year returns remains highly, highly significant. More important, you get these not just by knowing what INVERSECAPE is and then performing some linear transformation on it, but by just the INVERSECAPE itself. What this equation tells us is that, since 1881, 0 + 1 x INVERSECAPE is a remarkably good linear forecast of ten-year future returns.

{r, echo=FALSE} plot(INVERSECAPE,LEAD10RETURN, main="Realized Ten-Year Forward Returns vs. CAPE Earnings Yield", xlab="CAPE Earnings Yield", ylab="10-Year Realized Foreward Returns", pch=16, cex=0.5) abline(lm(LEAD10RETURN ~ INVERSECAPE))

Note that this particular functional form for understanding how knowing CAPE should shape your forecast of future returns is not important. INVERSECAPE is convenient because it comes in the same units as returns. But regressing future long-run returns on CAPE itself does about as well. Submit:

{r, echo=FALSE} INVERSECAPE = 1/CAPE return_regression_2.lm = lm(formula = LEAD10RETURN ~ CAPE) summary(return_regression_2.lm)

And R spits out:

Call: lm(formula = LEAD10RETURN ~ CAPE) Residuals: Min 1Q Median 3Q Max -0.116777 -0.029650 0.004347 0.028478 0.093157 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.1383475 0.0029889 46.29 <2e-16 CAPE -0.0045885 0.0001727 -26.57 <2e-16Residual standard error: 0.04321 on 1482 degrees of freedom (240 observations deleted due to missingness) Multiple R-squared: 0.3226, Adjusted R-squared: 0.3221 F-statistic: 705.8 on 1 and 1482 DF, p-value: < 2.2e-16

The same 1/3 of variance accounted for. The same parameter at the median of the distribution. This formulation suggests that expected ten-year real returns turn negative at a CAPE value of above 30--that offsetting the 3.3% real earnings yield at that point is an anticipated equal decline in valuation metrics over the next ten years, but the plot reveals that this prediction relies heavily on the linearity. Submitting:

{r, echo=FALSE} plot(CAPE,LEAD10RETURN, main="Realized Ten-Year Forward Returns vs. CAPE ", xlab="CAPE", ylab="10-Year Realized Foreward Returns", pch=16, cex=0.5) abline(lm(LEAD10RETURN ~ CAPE))

produces from R:

Basically what we know about expected returns is that on the one occasion when CAPE rose above 30, the dot-com crash of 2000 was in the near future and the housing crash of 2008 came into the ten-year return window. That is not much information on which to base a long-run "sell" decision.

Let's think about not what economists call risk--variation about expected returns--but what people call risk: the chance that your money won't be there in real terms. The lowest realized 10-year returns did indeed come when the CAPE was at its highest and thus the earnings yield INVERSECAPE was at its lowest. But the second-lowest returns happened when CAPE was not high but normal. And the other periods of negative realized returns happened when CAPE was high, but not that low. Plus there is a lot of mass of the distribution with both high CAPE and very healthy positive returns. What's going on?

{r, echo=FALSE} plot(DATE,LEAD10RETURN, main="Realized Ten-Year Forward Returns", xlab="DATE", ylab="10-Year Realized Foreward Returns", pch=16, cex=0.5) plot(CAPE,LEAD10RETURN, main="Realized Ten-Year Forward Returns", xlab="DATE", ylab="10-Year Realized Foreward Returns", pch=16, cex=0.5)

There are only four historical periods during which a ten-year investment in the S&P has not at least held its real value: ten years before the post-World War I deflation and the post-WWI depression of the start of the 1920s; (barely) in the Great Depression and the WWII inflation; 10 years before the stagflation of the 1970s and the subsequent Volcker depression; and 10 years before the recent financial unpleasantness for those dates where the ten-year return window includes both the dot-com and the housing-bubble crashes.

There is little more to be squeezed out of this particular data set.

{r, echo=FALSE} return_regression_3.lm = lm(formula = LEAD10RETURN ~ INVERSECAPE + CAPE + CAPESQ) summary(return_regression_3.lm)

All the data will say is that once the CAPE earnings yield is known there is absolutely no point in adding either CAPE or CAPE^{2} in the hopes of picking up some predictive ability via curvature, while the computer does have a (weak) preference for placing predictive weight on the yield if it is added to the regression:

Call: lm(formula = LEAD10RETURN ~ INVERSECAPE + CAPE + CAPESQ) Residuals: Min 1Q Median 3Q Max -0.110743 -0.029043 0.002934 0.028354 0.099453 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 2.550e-02 2.612e-02 0.976 0.329 INVERSECAPE 7.356e-01 1.268e-01 5.801 8.07e-09 CAPE -2.194e-04 1.559e-03 -0.141 0.888 CAPESQ -3.578e-05 2.679e-05 -1.336 0.182Residual standard error: 0.0421 on 1480 degrees of freedom (240 observations deleted due to missingness) Multiple R-squared: 0.358, Adjusted R-squared: 0.3567 F-statistic: 275.1 on 3 and 1480 DF, p-value: < 2.2e-16

Sources: http://delong.typepad.com/20140824_shiller_stock_data.rmd | http://delong.typepad.com/20140824_Shiller_Data.csv