Thursday, November 8, 2018

The Problem with Bootstrapping

If you’ve ever had someone run simulations of your financial plan, the whole process looks wonderfully scientific. Some software takes your financial plan and simulates possible future returns to see how your plans work out. But what assumptions are baked into this software? Here I use pictures to show the shortcomings of a technique called bootstrapping.

With monthly bootstrapping, simulation software chooses months at random, with replacement, from the history of actual market returns and strings them together to create a possible future. A 30-year simulation needs 360 such months. The simulator repeats this process many times to create many possible futures.
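A minimal sketch of this idea in Python follows. The function name, the tiny placeholder history, and the seed are my own assumptions for illustration; this is not the code of any particular simulator, and the placeholder numbers are not real market data.

```python
import random

def bootstrap_annualized_return(monthly_returns, months=360, rng=random):
    """Compound `months` returns drawn at random, with replacement."""
    growth = 1.0
    for _ in range(months):
        growth *= 1.0 + rng.choice(monthly_returns)
    # Convert total growth over the whole period to an annualized return.
    return growth ** (12.0 / months) - 1.0

# Placeholder history for illustration only; NOT real market data.
history = [0.02, -0.01, 0.03, -0.04, 0.01, 0.00]

rng = random.Random(42)  # fixed seed so the run is repeatable
futures = [bootstrap_annualized_return(history, 360, rng) for _ in range(1000)]
```

Each entry of `futures` is one simulated 30-year annualized return. Because every month is drawn independently, any link between one month's return and the next is lost.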

Instead of monthly bootstrapping, some simulators choose annual returns at random. Other simulators collect blocks of consecutive years. All these methods have their problems. Here I show the problem with monthly bootstrapping, but the same problem applies equally to annual bootstrapping.

I started with Robert Shiller’s online return data (http://www.econ.yale.edu/~shiller/data.htm) for total monthly returns of U.S. stocks from July 1926 to September 2018 (1107 months). I then built 30-year portfolios with two methods:

1. Monthly bootstrapping
2. Rolling 30-year periods

For the bootstrapping, I simulated one billion 30-year returns. For each rolling period, I just started at a particular month in stock return history and collected 360 consecutive months of returns. To eliminate bias against months near the beginning and end of the historical data, I used “wrapping,” meaning that some 30-year periods began near the end of the historical data and wrapped around to July 1926 to complete the 30 years of returns.
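The rolling method with wrapping can be sketched as follows. This is an illustrative sketch, not my original code: the function name is an assumption, and the short `history` list is placeholder data, not real returns.

```python
def rolling_annualized_returns(monthly_returns, months=360):
    """One annualized return per starting month, wrapping past the end."""
    n = len(monthly_returns)
    results = []
    for start in range(n):
        growth = 1.0
        for offset in range(months):
            # The modulo wraps from the last month back to the first.
            growth *= 1.0 + monthly_returns[(start + offset) % n]
        results.append(growth ** (12.0 / months) - 1.0)
    return results

# Placeholder history for illustration only; NOT real market data.
history = [0.02, -0.01, 0.03, -0.04, 0.01, 0.00]
rolling = rolling_annualized_returns(history, months=4)
```

With wrapping, every month of history appears in the same number of rolling periods, and the number of periods equals the number of months in the data.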

So, while there were a billion bootstrapped returns, there were only 1107 rolling returns. This is part of the appeal of bootstrapping; you can create as many different possible futures as you like.

The following chart shows what the distributions look like. The distribution of rolling returns is very coarse because there are so few of them available in our stock market history. The bootstrap distribution also has bars, but they are so fine that we can’t see them.

The big thing to notice is that the two distributions don’t match well at all. Both curves have the same area under them, but historical returns are more bunched near the center. In fact, the rolling period results were within one percentage point of the mean return 47% of the time, but the bootstrapping results were within one percentage point of the mean only 28% of the time.
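A comparison like this can be computed with a small helper along the following lines. This is a sketch under assumptions: the function name is hypothetical, returns are annualized and expressed as fractions (0.01 is one percentage point), and each distribution is measured against its own mean. The sample values are made up to show the mechanics.

```python
from statistics import mean

def share_within_band(annualized_returns, band=0.01):
    """Fraction of returns within `band` of the distribution's own mean."""
    centre = mean(annualized_returns)
    hits = sum(1 for r in annualized_returns if abs(r - centre) <= band)
    return hits / len(annualized_returns)

# Made-up values for illustration only; NOT real results.
sample = [0.00, 0.05, 0.10]
share = share_within_band(sample)  # only the middle value is inside the band
```

Running the same helper on the rolling returns and on the bootstrapped returns gives two fractions that can be compared directly.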

Why is this a problem? Because bootstrapping results are supposed to be realistic possible futures. If bootstrapping results don’t look much like the past, what makes us think they are a realistic model of the future?

Much of the theory of finance is built on a foundation of thinking in terms of annual returns. I repeated the process above for annual returns rather than 30-year returns. The next chart compares the bootstrapping and rolling distributions.

This time, the distributions match reasonably well. Someone who looks at just this chart could be forgiven for thinking that bootstrapping matches reality. But the small difference shown in this annual chart grows to the large difference we saw in the earlier 30-year chart.

Some will defend bootstrapping and claim the difference shown in the 30-year distribution chart isn’t enough to negatively affect portfolio simulations. This isn’t true. Actual 30-year returns are much less volatile than bootstrapping says they are. Testing financial plans with bootstrapping pushes people to higher bond allocations at too young an age.

What accounts for the difference we see in the 30-year distribution chart? It turns out that stock returns from one year to the next have correlations that bootstrapping eliminates. After stocks have had a good run, there is a tendency for them to have a below-average year. Similarly, stocks tend to have a good year after a poor run. This effect is too weak to exploit with market timing, but it does build up over the course of decades.
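To see how a weak pull back toward a trend can matter over decades, here is a toy model. It is purely synthetic and is not the analysis behind the charts: the parameters `PHI`, `SIGMA`, and `MU` are arbitrary assumptions, and the model (a steady trend plus a mean-reverting deviation) is just one simple way to produce returns that tend to reverse after a good or bad run.

```python
import random
from statistics import pstdev

rng = random.Random(1)  # fixed seed so the run is repeatable
PHI, SIGMA, MU, YEARS, HORIZON = 0.9, 0.15, 0.06, 5000, 30  # assumed values

# Toy annual log returns: steady trend plus change in a mean-reverting deviation.
deviation, log_returns = 0.0, []
for _ in range(YEARS):
    new_deviation = PHI * deviation + rng.gauss(0.0, SIGMA)
    log_returns.append(MU + new_deviation - deviation)
    deviation = new_deviation

# Rolling: sums of HORIZON consecutive years, so the order is preserved.
rolling = [sum(log_returns[i:i + HORIZON]) for i in range(YEARS - HORIZON + 1)]

# Bootstrapped: sums of HORIZON years drawn at random, with replacement.
bootstrapped = [sum(rng.choices(log_returns, k=HORIZON)) for _ in range(YEARS)]

rolling_spread = pstdev(rolling)
bootstrap_spread = pstdev(bootstrapped)
```

In this toy setup, `bootstrap_spread` comes out noticeably larger than `rolling_spread`, even though both are built from exactly the same annual returns. Shuffling the years removes the tendency to reverse, so the 30-year totals spread out more.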

Does this mean we should be doing simulations using returns from rolling periods? Perhaps, but this method has its troubles too. There just aren’t enough rolling periods in history to draw statistical conclusions. The future likely won’t look exactly like any one period from the past.

But this isn’t an excuse to use bootstrapping. Actual returns show correlations over time. Bootstrapping strips away these correlations. There is a good quote William J. Bernstein attributes to Ralph Wagner about returns being like “an excitable dog on a very long leash.” For our purposes, the dog’s owner represents the collective fundamentals of U.S. businesses, and the dog represents stock prices. Fundamentals have volatility, but not as much as stock prices. Over one year, the dog can take you anywhere. The longer you own stocks, the more your returns follow the owner rather than the dog. We’re all still affected by the dog’s wanderings, but bootstrapping is like cutting the dog’s leash and letting it run wild.

None of this is any guarantee that stock market returns will take you where you want to go. There is no perfect way to peer into the future. Simulations that use bootstrapping have the appearance of scientific rigour, but their outputs have more decimal places than anyone can reasonably justify.