Writing about Donald Trump, especially nowadays, is likely to incite a heated discussion and lose you friends. Whether you’re for or against him and his politics, it’s interesting that so few of the political bigwigs foresaw that he would get this far. In particular, Nate Silver and his statistical strategists at FiveThirtyEight have recently come under fire for giving Trump a 5% chance of winning the Republican nomination and raising it to only 13% by the time the first state caucuses came around. So the question is: why did this happen?
First things first: nominations and primaries are incredibly complicated, and even the best statisticians have trouble predicting them. As with forecasting the paths of hurricanes, the general method is to calculate the most likely outcomes based on how similar situations played out in the past. Nate Silver has earned quite a reputation for accurate predictions, but even he has acknowledged that the historical data behind this particular type of statistical model is spotty. That makes the whole endeavor tricky to begin with. Contrary to what much of the media assumed, FiveThirtyEight and Silver also weren’t using any formal model for their early predictions, and the reason why is fairly interesting.
Silver himself said that one of their biggest mistakes was sticking too closely to history. Primaries are generally forecast by combining current poll data with historical patterns in how voters perceive and react to a party’s specific policies and platforms. Donald Trump, being the nontraditional candidate that he is, is often unpredictable in his platform decisions and his public statements. That, combined with the fact that his political views frequently cross traditional party lines, makes it incredibly difficult to predict how the public will perceive him. Candidates like Trump are rarely seen on the political circuit, which means that historical data is, for the most part, not terribly useful. That’s not to say Silver’s predictions would have been any more accurate with more historical data, but according to him, it is the reason they weren’t using formal models for their prediction. Another pitfall was that they weren’t relying on active, real-time data such as current polls as much as they probably should have.
Really, the problem stemmed from FiveThirtyEight and Silver producing fairly “subjective odds” that were then interpreted as legitimate statistical forecasts. According to Silver, this kind of statistical bubble invites a direct comparison to the mortgage crisis and the real estate market in general. While some things can be predicted at least somewhat reliably, there are feedback loops and collapses that are entirely unforeseeable. Unprecedented outcomes happen all the time, after all.
Let’s be clear: none of this is meant to condemn Silver or the statisticians working with him. Their jobs are incredibly difficult, and working with incomplete or questionable data, the kind Silver was working with in this case, makes the whole exercise like shooting blindfolded at best. Applying historical data to an ahistorical case, or ignoring historical data altogether, is just as awkward as trying to build a statistical model for something that could be considered pure chaos. Still, not building a model was something of a mistake, and any prediction that needs to be accurate must pay attention to current analytics as well as historical data.