Looking back at our 2021 forecasts

February 10, 2022
At Deck, we help progressive campaigns and organizations reach the right voters.

To do that, we’ve developed models that predict who a voter will support, how elastic that support might be, how a voter might cast their ballot, and more. Then, to make those predictions actionable, we’ve built software that guides users through the process of building great lists for persuading voters, mobilizing supporters, and raising campaign funds across a range of outreach tactics.

In 2021, we focused mostly on the elections in Virginia and New Jersey. In another post, my teammate CG shared the impact our targeting tools had in those races. (In short, measured against 2019 results in the same districts, campaigns that used Deck did 2.2 points better than those that didn’t.)

Here, I want to focus on a different use case for our data: forecasting election outcomes.

Overall, the mean absolute error of our 2021 forecasts was just 2 points. For comparison, the mean absolute error of polls gathered by FiveThirtyEight across all races in 2020 at the congressional level and up was 6 points. In this post, I’ll unpack our accuracy a bit more, discuss what methodological choices may have helped us, share how we think our forecasts could be helpful in future races, and preview some improvements we’re working on for 2022.

How our forecasts performed

In Virginia House of Delegates races, our forecasts were off by an average of 2.2 points. In the Virginia gubernatorial race, our estimated 50% vote share for Terry McAuliffe was off by 1 point. In New Jersey State Senate races, our forecasts were off by an average of 2.8 points. In New Jersey General Assembly races, we were off by an average of 2.2 points. And in the New Jersey gubernatorial race, our estimated 54% vote share for Phil Murphy was off by 3 points.

You can download a CSV showing our forecasts and the corresponding actual results here.

We were pretty delighted by these results! Exactly 95% of results fell within our model’s 95% margin of error and the correlation between actual and forecasted results was extremely strong — even in cases where the outcome was highly lopsided, a situation our forecasts have struggled with in previous years.
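For readers who want to check these numbers against the CSV, here is a minimal sketch of the three measures mentioned above: mean absolute error, the share of results that land inside the 95% margin of error, and the correlation between forecasted and actual results. The vote shares and margins below are made up for illustration and are not taken from our forecasts, and the function names are our own placeholders.

```python
from math import sqrt

def mean_absolute_error(forecast, actual):
    """Average of |forecast - actual| across races, in points."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(forecast)

def interval_coverage(forecast, actual, margin):
    """Share of races where the actual result lies within forecast +/- margin."""
    hits = sum(abs(f - a) <= m for f, a, m in zip(forecast, actual, margin))
    return hits / len(forecast)

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical Democratic vote shares (percent) for four races.
forecast = [50.0, 62.0, 41.0, 55.0]
actual = [48.0, 63.0, 44.0, 55.0]
margin = [4.0, 4.0, 2.0, 4.0]  # hypothetical 95% margins of error, in points

mae = mean_absolute_error(forecast, actual)
coverage = interval_coverage(forecast, actual, margin)
r = pearson_r(forecast, actual)
print(f"MAE: {mae:.2f} points, coverage: {coverage:.0%}, r: {r:.3f}")
```

A well-calibrated model should see coverage close to 95% over many races. With only a handful of races, as in this toy example, the figure will bounce around a lot.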

What makes our approach special

We’re obsessed with polling at Deck. Our Slack is littered with interesting new findings from our friends at organizations like Data for Progress and Change Research. There’s no better way to find out what’s motivating voter behavior than to directly ask a random sample of voters and then trust smart pollsters to make statistical sense of their answers.

However, as people change how they communicate, grow more distrustful of the media and the polls they sponsor, and swing between feeling engaged and uninterested in politics depending on who is in power, it’s harder than ever to get a representative sample and to figure out the right way to weight its responses. And while it’s OK to be directionally right on questions of issue support, people (unfairly!) expect much greater precision in election horse race polling.

That’s why our methods completely avoid survey responses. We think a healthy, well-informed progressive ecosystem should have access to forecasts and individual-level predictions developed through a variety of methods — in case any single approach is having a bad year for reasons we couldn’t have anticipated.

We train our models on historic precinct-level election results, data on each candidate’s media coverage gathered through partnerships with organizations like Aylien and Critical Mention, itemized campaign finance data gathered from dozens of state and local jurisdictions, candidate issue stances gathered by organizations like VoteSmart, candidate demographics gathered by organizations like Ballotpedia and the Reflective Democracy Project, district-level economic indicators, and the demographic and socioeconomic traits of the voters responsible for each precinct’s election results.

The idea is to map hard data on how small groups of voters (or, in the case of turnout and voting method, individual voters) actually behaved in past elections to a range of contextual factors that might explain their behavior — or at least correlate with it. Some of the most powerful features in our models include the demographic composition of each candidate’s in-district donors (in non-federal races) and the volume/sentiment of each candidate’s recent local media coverage.

Additionally, by not relying on surveys, we’re able to generate predictions quickly, cheaply, and frequently. This gives us a window into every state legislative and congressional race in the country — including many that would never benefit from polling. And it means we can provide our data to interested organizations without charging nearly as much as we would need to with a survey-based approach.

Why we think our forecasts did so well in 2021

Our models spotted warning signs in Virginia and New Jersey pretty early. Our initial forecasts for the Virginia House of Delegates in May 2021 gave Democrats only a 52% chance of holding the chamber.

To better understand what drove our forecasts’ early skepticism, we can look at the SHAP values for a selection of features in our forecasting model, as applied to the Terry McAuliffe campaign for Governor. (In short, SHAP values estimate how much each variable pushed a particular prediction up or down relative to a baseline. A deeper explanation can be found through this explainer.)
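To make the idea concrete, here is a toy sketch of the quantity that SHAP values approximate: exact Shapley values, computed by trying every combination of features. This is not our forecasting model or the SHAP library itself. The `toy_forecast` function, its coefficients, the feature names, and the baseline values are all invented for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features left out of a subset are set to their baseline value.
    Enumerates every subset, so it is only practical for a few features.
    """
    names = list(x)
    n = len(names)

    def value(subset):
        row = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return predict(row)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a subset of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi

def toy_forecast(row):
    """Hypothetical linear vote share model (percent). Coefficients are made up."""
    return (50.0
            + 0.002 * row["individual_donors"]
            + 10.0 * row["coverage_sentiment"]
            - 0.5 * row["cpi_growth_pct"])

# Hypothetical "typical" campaign and a hypothetical campaign to explain.
baseline = {"individual_donors": 5000, "coverage_sentiment": 0.0, "cpi_growth_pct": 4.0}
campaign = {"individual_donors": 3000, "coverage_sentiment": -0.2, "cpi_growth_pct": 12.0}

phi = shapley_values(toy_forecast, campaign, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f} points")
```

The contributions always add up to the gap between the campaign's prediction and the baseline prediction, which is what makes them handy for explaining a single forecast. For a linear toy model like this one, each contribution is just the coefficient times the feature's distance from the baseline. Real models have many more features, so libraries like SHAP rely on approximations rather than full enumeration.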

This output gives us some clues. Comparing McAuliffe’s data to data from 2017, when Ralph Northam won the governor’s race by 9 points and Democrats flipped 15 Republican-held seats in the House of Delegates, helps us make more sense of it. For example:

  • While Northam and McAuliffe had raised similar amounts of money by this point, Northam had brought it in from nearly twice as many individual donors. Additionally, Northam had more than three times as many in-state donors. While Northam had 76% of total in-state contributions by May 2017, McAuliffe only had 48% of in-state contributions by May 2021. This can help us understand why variables covering the number of individual contributions going to each campaign had a negative influence on McAuliffe’s forecast.
  • While some key economic indicators were stable from 2017 to 2021 despite the pandemic (e.g. the state unemployment rate and first-time jobless claims), others were showing concerning signs. The consumer price index for goods in Virginia had grown by 12%, household expenditures had grown by 11%, and the labor force participation rate had declined 4%. This helps to explain why variables covering state and local economic indicators and the party currently in power had a negative impact on McAuliffe’s forecast.
  • Meanwhile, media coverage of Democratic campaigns up and down the ballot in local news outlets had become more negative. For example, in 2017, the average Democratic candidate in Virginia had 47% of all in-district media coverage in their race and 26% of the negative coverage. In 2021, the average Democratic candidate had 58% of total coverage and 49% of the negative coverage. With more Democratic incumbents in the state legislature, enhanced scrutiny should be expected. But in reviewing common keywords in articles with negative sentiment, we also see many references to right-wing attacks that appear to have gained traction in mainstream press. We use simple NLP models to classify the type of content in each candidate’s media coverage, and we then interact the resulting feature vectors with more basic features covering the volume and sentiment of coverage. This might help to explain the overall negative impact of our media coverage features.
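The interaction step described in the last bullet can be sketched roughly as follows. This assumes the content classifier outputs a share of coverage per topic and that the basic features are an article count and an average sentiment score. The topic labels, feature names, and numbers are hypothetical, and our production feature set is more involved than this.

```python
def interact(topic_shares, basic_features):
    """Cross every topic share with every basic coverage feature.

    Returns a flat dict holding one product per (topic, feature) pair,
    which is a flattened outer product of the two vectors.
    """
    return {
        f"{topic}_x_{name}": share * value
        for topic, share in topic_shares.items()
        for name, value in basic_features.items()
    }

# Hypothetical classifier output: share of a candidate's coverage by topic.
topic_shares = {"economy": 0.5, "education": 0.25, "scandal": 0.25}

# Hypothetical basic features: volume and average sentiment of coverage.
basic_features = {"article_count": 40, "mean_sentiment": -0.2}

features = interact(topic_shares, basic_features)
for name, value in features.items():
    print(f"{name}: {value:.2f}")
```

Interacting the two sets lets a model tell apart, say, a high volume of negative coverage about the economy from the same volume of negative coverage about something else.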

So in the end, our model found a healthy amount of evidence suggesting that the election was going to be very close despite recent Democratic wins across the state.

However, this doesn’t mean we’ve unlocked the secret code for predicting all elections. Our models are probabilistic, and we expect our vote share estimates to be somewhere within the margin of error about 95% of the time. That means it shouldn’t be unusual for us to be a few points off a fair amount of the time.
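To put a rough number on "a fair amount of the time," here is a small sketch. It assumes forecast errors are normally distributed and centered on zero, which is a simplification and not a description of how we estimate uncertainty. The 5-point margin of error is also just an example value.

```python
from statistics import NormalDist

def prob_miss_exceeds(points, margin_95):
    """Chance that the absolute forecast error exceeds `points`.

    Assumes normal errors centered on zero whose 95% interval
    is +/- margin_95 (an illustrative assumption).
    """
    z95 = NormalDist().inv_cdf(0.975)  # roughly 1.96
    sigma = margin_95 / z95            # standard deviation implied by the margin
    return 2 * (1 - NormalDist(0, sigma).cdf(points))

# With a hypothetical 5-point margin of error, how often would a forecast
# miss by more than 2 points?
p = prob_miss_exceeds(2.0, 5.0)
print(f"Chance of missing by more than 2 points: {p:.0%}")
```

Under these assumptions, a forecast with a 5-point margin of error would miss by more than 2 points in a bit over 40% of races, even when the model is working exactly as intended.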

And if there’s an issue in our underlying data or modeling code, or if something foundational about what our variables are capturing has changed, we really might be in trouble! Which, again, is why we’re trying something different. A reliance on multiple approaches can help our movement avoid single points of failure.

How we intend to continue advancing our methods

In 2020, our forecasts had a dip in accuracy — moving from the ~2 point median absolute error we’ve maintained since 2015 to a ~4 point median absolute error. While that’s still pretty good by most standards, it led us to think about what we could do better.

First, we untangled some wrongheaded changes we had made for 2020 — including incorporating a polling average into our forecasts (negating the idea that we were offering something truly different).

We also tried out new data structures to mitigate the risks of ecological inference, given that our models are trained on aggregate data. In the end, we migrated to deep learning models, representing the traits of a precinct’s voters and an election’s candidates with lots of arrays, all nested snugly in a tensor.

And finally, we rebuilt our election results data from scratch. We documented this process here, and we now have a great resource that we can share with the broader progressive community.

Going forward, we’re going to be experimenting with new deep learning architectures and new ways of estimating uncertainty. We also hope to start representing how each voter’s profile has changed over time, in case changes in a person’s circumstances (and not just their present circumstances) have predictive power. And as people get sorted into new districts, we want to be more creative about representing traits like incumbency.

If you have any ideas for us, please let us know! We’re thrilled to be a part of the progressive data ecosystem and want to keep finding new ways to better serve our allies.
