How did we get it so wrong? All of the fancy data-driven electoral maps and visualizations painted a seemingly objective election forecast. Yet the smooth victory for Hillary Clinton that much of the mainstream media anticipated failed to materialize.
But blindly blaming pollsters for not anticipating this year’s election results simplifies a rather complex problem. In fact, the idea that polling forecasts were outrageously out of kilter with reality is based on a gross misconception.
Many people implicitly assume that the national election forecasts were predicting the Electoral College outcome. This, however, is not true. Most of the national polls before the election were anticipating the outcome of the popular vote.
Real Clear Politics averaged the results of major polls before the election and predicted a 3.3 percentage point lead for Clinton over Trump. The reality wasn’t too far off: Clinton did in fact win the popular vote by at least 1.5 points (over 2 million votes), which is well within the margin of error and a smaller miss than the national polls made in the preceding 2012 election.
However, since the electoral vote is all that ultimately matters, state-level polls, not national ones, are the essential building blocks for forming an accurate prediction of the US presidential election. Real Clear Politics calculated averages of state polls in the key states of Pennsylvania, Wisconsin, and Michigan, predicting a win for Clinton in all three. Yet Trump ended up winning each of them (by +1.1% in Pennsylvania, +0.8% in Wisconsin, and by only +0.2% in Michigan).
But why did state polling yield these errors? Since the sample size of opinion polls at the state level is usually smaller, their results tend to show greater variability. Small changes in the demographic make-up of who turns up to vote can therefore have dramatic effects on which candidate wins. This year, many state polls most likely underestimated the extent to which rural voters with, on average, lower levels of education would turn out for Trump. Ahead of the next election, pollsters clearly need to survey the swing states more effectively, with larger samples and more research into who will actually show up on polling day.
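To see why sample size matters so much, here is a minimal sketch in Python of the standard 95% margin-of-error formula for a polled proportion; the sample sizes below are purely illustrative, not taken from any particular poll.

```python
# Why smaller state-level samples are noisier: the 95% margin of error for a
# polled proportion shrinks only with the square root of the sample size.
# Sample sizes are illustrative, not from any real poll.
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p estimated from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

for label, n in [("national poll", 2500), ("state poll", 600)]:
    moe = margin_of_error(0.5, n)
    print(f"{label} (n={n}): +/- {moe * 100:.1f} points")

# national poll (n=2500): +/- 2.0 points
# state poll (n=600): +/- 4.0 points
```

A typical state sample is several times smaller than a national one, so a lead of a point or two sits comfortably inside the noise.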
So how can opinion polling survive in today’s politically polarized, technology-driven landscape? The traditional method of phone interviews on landlines is archaic and in slow decline. It also suffers from bias, because polling companies find it difficult to actually reach people on their landlines and assemble a representative sample of the population.
Some critics also say that phone polling sometimes elicits dishonest responses. November’s election gave rise to the notion of ‘shy Trumpers’: supporters who would find it socially undesirable to declare their true preference to an interviewer on the phone, yet would comfortably vote for Trump in the anonymity of the polling booth. A study by Politico and Morning Consult during the Republican primaries suggested this was largely not the case. It found only a slight tendency for people to under-report their decision to back Trump compared to other Republican candidates, and only among college-educated and higher-income voters.
Many commentators similarly attributed forecasting problems during the UK’s Brexit vote to ‘shy Tories’, and here the evidence seems more promising. Averaging the internet-only polls produced a 1.2 percentage point lead for the “leave” camp, whereas phone-only polls on average forecast a 2.6-point lead for “remain”.
Online polls therefore open up opportunities to conduct surveys in innovative and potentially less biased ways. However, they are still riddled with the same difficulties of ensuring that the sample represents the whole population. There is also the risk that bots and Internet trolls will seize upon these polls to deliberately sway the outcome. But if these barriers are addressed, the potential upside is huge. Pollsters would be able to access a large pool of social media data, giving them a far richer demographic background of the audience than is currently possible with phone polling.
Online surveys are also far less onerous to run in terms of time and money. That lower barrier would encourage more companies to conduct polls, improving overall accuracy when the results are aggregated across different surveys.
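As a toy illustration of what aggregation buys, the sketch below averages the candidate margins from a handful of invented polls; none of the figures come from Real Clear Politics or any real pollster.

```python
# Toy poll aggregation: average the candidate-A-minus-candidate-B margin
# across several polls so that individual polling errors partly cancel out.
# All numbers are invented for illustration.
polls = [
    {"pollster": "Poll A", "margin": 4.0},   # candidate A ahead by 4 points
    {"pollster": "Poll B", "margin": 1.0},
    {"pollster": "Poll C", "margin": -2.0},  # candidate B ahead by 2 points
    {"pollster": "Poll D", "margin": 3.0},
]

average_margin = sum(p["margin"] for p in polls) / len(polls)
print(f"Polling average: candidate A +{average_margin:.1f}")  # candidate A +1.5
```

The more independent polls feed the average, the less any single house effect or sampling fluke dominates the headline number.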
But it’s not just the medium of polling that needs to change. Better use of big data is also essential if pollsters want to be at the top of their game. Many big data startups have already played an important role in political campaigns.
Civis Analytics, a startup cofounded by key members of Obama’s 2012 campaign’s data team, has developed a rich database of 250 million American adults. This data has been used to improve the process of phone polling. For example, Enroll America, a nonprofit set up by the Obama administration to increase enrollment in the Affordable Care Act, wanted to learn how to identify individuals without health insurance.
Civis achieved this by making targeted calls to people who were already in its database to ascertain whether they had health insurance. The startup then married these results with information it already had about the individuals to try to identify predictive factors behind having insurance coverage, such as socioeconomic background, consumer history, and geography. The polling and market research firms of the future need to move beyond simply surveying a snapshot of the population and seek to understand the underlying demographic and historical drivers of voting behavior.
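That workflow might look roughly like the sketch below, written in Python with pandas and scikit-learn. The file names, columns, and choice of model are hypothetical illustrations, not a description of Civis Analytics’ actual system.

```python
# A hedged sketch of the approach described above: join targeted survey
# responses to an existing database of individuals, then fit a model to see
# which attributes predict insurance coverage. All file names, columns, and
# the model choice are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Survey results: person_id plus a yes/no answer on insurance coverage.
survey = pd.read_csv("phone_survey.csv")        # columns: person_id, has_insurance
# Existing database: demographic and consumer attributes per person.
database = pd.read_csv("person_database.csv")   # columns: person_id, age, income, urban, ...

# Marry the survey answers to the attributes already on file.
labeled = survey.merge(database, on="person_id", how="inner")

features = ["age", "income", "urban"]           # hypothetical predictors
X_train, X_test, y_train, y_test = train_test_split(
    labeled[features], labeled["has_insurance"], test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))
print("Coefficients:", dict(zip(features, model.coef_[0])))

# The fitted coefficients give a rough view of which attributes are most
# predictive of coverage, and the model can then score everyone else in the
# database who was never called.
```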
Pollsters have the tricky job of predicting an uncertain future. But they themselves will face an uncertain future if they don’t adapt to the digital world. Opinion polling will need to move online, and big data needs to play a greater role in predicting voter turnout. Even so, it is naïve to believe pollsters are the main culprits behind this year’s election surprise.
Much of the media’s relentless drive to paint Trump as a ruthless demagogue left little room for careful analysis of the data or for reporting on his thriving support among the white working class outside the cities.
For the layperson, this election serves as a reminder to scrutinize the facts, understand the limits of forecasts, and always be ready for a surprise.
Jay Patani is a Tech Evangelist for Valo, a real time analytics software company. He advises large financial services institutions on big data strategy. He is also a freelance writer, covering technology trends and startups.