It sounds like, to the extent that you’re making judgment calls, you’re using them to make the model less sure of itself.
I think when you build models like this, there are a lot of things you optimize for that probably result in some mild overfitting [when a model fits the existing data too well, it can be a sign that the model is too focused on explaining what happened in the past and not enough on predicting the future]. And so given that you kind of unconsciously make a lot of those decisions, it’s usually good to put a little bit of a finger on the scale toward having more uncertainty. I was a little surprised when we first turned it on this year, I thought that Biden would be higher. But, you know, sometimes it’s good when a model doesn’t exactly match your intuition.
It’s the old Bill James line, right? If you build something and it spits out 10 names, you want eight of them to be names that you expected. Because if it’s 10, it’s not adding anything to your knowledge base. And if it’s three, well, it’s maybe wrong.
Yeah. If you look back to 1972, there are a lot of elections since then where things move by six or seven or eight points or more between early August and Election Day. The reason I say six or seven points is that although Biden is up by more like eight points in the national average, he’s up by six or seven points in the tipping point states. So what’s the chance of a six or seven point polling error? I think what the model is saying is that if you combine the movement that could occur before Election Day and then error on Election Day itself, it’s still 28% or so. Now to be clear, if Biden holds his current numbers then it would shoot up to probably around 90%, low 90s by Election Day, which is still a long way from 99% or something.
But obviously there are some theories that say hey, because of increased polarization, because there are fewer swing voters and there aren’t many undecideds, the average has not been very volatile this year. There’s more polling. There are all these perfectly reasonable theories that explain why variance, and what we call drift, between now and Election Day would be smaller than usual. And that’s a counterweight against COVID, and the economy has never posted numbers like this, and shit’s crazy, right? There’s an unprecedented amount of news. And so we wind up basically assuming that variance will be as high as it is, on average, in the historical data set, instead of assuming that we’ve solved this problem. In 2016, you’d have wound up saying the variance is probably somewhere more in the middle, because there were so many undecided voters and third party voters in 2016, and that year the polling average was volatile. This year, there are not many undecideds, and no real third party (sorry, Kanye). The polling average has been stable, but we just can’t ignore the elephant in the room, which is COVID.
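[Editor's sketch: the answer above combines two sources of uncertainty, drift in the polls before Election Day and polling error on Election Day itself. The Python below is a minimal illustration of that arithmetic, not the actual model. It assumes both sources are independent and normally distributed, and the function name `win_probability` and the standard deviations passed to it are hypothetical values chosen for illustration. Only the roughly six-to-seven-point tipping point lead comes from the interview.]

```python
import math


def win_probability(lead, drift_sd, error_sd):
    """Chance the final margin stays above zero.

    Assumes (for illustration only) that pre-election drift and
    Election Day polling error are independent normal variables,
    so their variances add.
    """
    total_sd = math.sqrt(drift_sd ** 2 + error_sd ** 2)
    if total_sd == 0:
        # No uncertainty at all: the leader wins, a tie is a coin flip.
        return 1.0 if lead > 0 else (0.5 if lead == 0 else 0.0)
    z = lead / total_sd
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Lead in the tipping point states, per the interview: six or seven points.
tipping_point_lead = 6.5

# Hypothetical standard deviations, in points (not taken from the model).
august = win_probability(tipping_point_lead, drift_sd=7.0, error_sd=4.0)
election_eve = win_probability(tipping_point_lead, drift_sd=0.0, error_sd=4.0)

print(f"August, with drift still to come: {august:.0%}")
print(f"Election eve, same lead, no drift left: {election_eve:.0%}")
```

[With the same lead, removing the drift term raises the win probability, which is the pattern described above: the number climbs if the lead holds until Election Day, but polling error alone keeps it well short of certainty. A real forecast would likely use fatter-tailed distributions than the normal assumed here.]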
We’re in a world now where COVID has happened and shit is unquestionably weird. You always have to account for the possibility of unusual circumstances but does the fact that we’re already way off the normal path impact how you go about building the model this time around?
I think when you wind up with an edge case, then you want to kind of think through, am I doing things in ways that generalize well, that’d also handle this case well? Like, if you’re working on an NBA metric and it has Giannis way lower than you would think, you might examine something and say, Okay, well, why is he good?
Right. Is he good in a way that breaks the model, and that’s what makes him good, or is he good in a way that the model should account for and thus we should adjust the model?
Right. And ideally, like, oh, you adjusted it. Now it also handles Kevin Durant better, another player that for some reason we’re underrating. And so you’re looking and saying, Okay, can I come up with generalizations? Certainly, it seems that if the economy is kind of going haywire, where you have millions unemployed and millions become reemployed and GDP falls by 36% annualized or whatever, then economic uncertainty is high. And you can measure that. But all we’re really saying is, if you go back and look at the history of elections for which we have rigorous polling, an eight point lead in the popular vote in August translates into Biden winning the popular vote 80-something percent of the time, 82% or something. However, because Trump has an Electoral College advantage, it’s 72%, which in some ways is still fairly high.
This interview has been edited and condensed.