Some polls predicting the results of the 2020 US presidential election appear to have missed the mark. The aggregator RealClearPolitics showed former Vice President Joe Biden with a 7-point advantage over President Donald Trump, while FiveThirtyEight suggested that Biden had at least an 8-point advantage on average nationally. In reality, the race was much closer. In Florida, for example, where FiveThirtyEight showed a 2.5-point lead for Biden, Trump won after receiving unexpectedly strong support in Miami-Dade County.
Polling is not a perfect science. Polls in the run-up to the 2016 election showed Hillary Clinton leading nationwide, with closer races in states like Wisconsin, Michigan, and Pennsylvania. But Trump ultimately won the 270 Electoral College votes needed to take the presidency. A report by the American Association for Public Opinion Research concluded that state-level surveys "underestimated Trump's support in the upper Midwest," with forecasters pointing to a lack of high-quality survey data from those states.
So is there a more accurate way to project election results than traditional polls, which rely mainly on phone calls and online panels? Companies like KCore Analytics, Expert.ai, and Advanced Symbolics claim that algorithms can capture a more complete picture of election dynamics because they draw on signals like tweets and Facebook posts. However, after the 2020 election, it is still unclear whether AI has proven more or less accurate than the polls.
KCore Analytics predicted from social media posts that Biden would hold a strong lead of around 8 or 9 points in the popular vote, but only a small edge in the Electoral College. Italy-based Expert.ai, which found that Biden ranked higher in social media sentiment, put the Democratic candidate slightly ahead of Trump (50.2% to 47.3%). At the other end of the spectrum, Advanced Symbolics' Polly system, developed by scientists at the University of Ottawa, missed the mark entirely: it predicted that Biden would win 372 Electoral College votes to Trump's 166, thanks to expected victories in Florida, Texas, and Ohio, all states that went to Trump.
As with polls, some of the disparity among the algorithmic predictions can be attributed to methodological differences.
Expert.ai uses a knowledge graph that identifies named entities, including people, companies, and places, and attempts to model the relationships between them. The company says its system, which applies 84 emotional labels to hundreds of thousands of posts from Twitter and other networks, semi-automatically weeds out bot-like social accounts. Expert.ai's algorithm scores each label on a scale from 1 to 100 according to its intensity and multiplies that score by the number of times the label occurs for each candidate. The emotions are also classified as "positive" or "negative," and from these an index is built that allows the two candidates to be compared.
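The intensity-times-occurrences scoring described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Expert.ai's code: the four labels, their intensities, and the per-candidate counts are invented, and because the index formula is not disclosed, the sketch assumes a simple net score, (positive - negative) / (positive + negative), which ranges from -1 to 1.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionLabel:
    intensity: int  # 1-100, per the description above
    positive: bool  # polarity assigned to the emotion


# Hypothetical label set; the real system reportedly uses 84 labels.
LABELS = {
    "trust": EmotionLabel(70, True),
    "hope": EmotionLabel(60, True),
    "anger": EmotionLabel(90, False),
    "fear": EmotionLabel(80, False),
}


def sentiment_index(counts: dict[str, int], labels: dict[str, EmotionLabel] = LABELS) -> float:
    """Assumed net index in [-1, 1]: (positive - negative) / (positive + negative).

    Each label contributes intensity * occurrences to its polarity bucket.
    """
    positive = negative = 0
    for name, occurrences in counts.items():
        label = labels[name]
        score = label.intensity * occurrences
        if label.positive:
            positive += score
        else:
            negative += score
    total = positive + negative
    return 0.0 if total == 0 else (positive - negative) / total


# Made-up label counts found in posts about two candidates.
candidate_a = {"trust": 100, "hope": 50, "anger": 40, "fear": 10}
candidate_b = {"trust": 80, "hope": 30, "anger": 60, "fear": 20}

index_a = sentiment_index(candidate_a)
index_b = sentiment_index(candidate_b)
print(f"{index_a:.3f} {index_b:.3f}")  # prints "0.389 0.028"
```

Normalizing by the total keeps the index comparable between a candidate who is mentioned constantly and one who is mentioned less often; a raw difference of scores would mostly reward volume.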
By contrast, KCore Analytics, which says it mined more than 1 billion tweets to guide its predictions, uses an end-to-end framework to identify influencers and hashtags on networks like Twitter. The data is filtered by both content and frequency, purportedly in real time and free of bots, and then analyzed by an AI model called AWD-LSTM, which classifies opinions with a claimed accuracy of up to 89.5%.
Polly collects a randomized, controlled sample of American voters identified from their posts and conversations on social media. Before November 3, the sample comprised 288,659 people.
One challenge in predicting election results with AI is that the algorithms must be trained to learn state-by-state Electoral College models that are consistent with their national predictions. At the same time, they need to be good at uncovering the issues that matter to specific minority groups and regions. The smaller the group, the harder it is to find.
According to Advanced Symbolics, Polly failed spectacularly in this regard. The model predicted that Biden would win Florida with 52.6% of the statewide vote, but only because the system did not draw a separate sample of Cuban Americans, who typically vote for Republican candidates. Instead, Polly grouped them with Venezuelan and Mexican Americans under the single label "Hispanic."
“We need to consider more ethnic and regional ‘factors’ for the next election,” the Polly team admitted in a blog post this week. “Reinforcing mistakes makes them easier to spot – find out where Polly went astray, problem by problem, state by state.”
Rural areas of the US were also harder to account for in the models. A smaller share of likely voters in these regions use Twitter, which can throw off the models' estimates of, for example, Biden's margins there. In addition, there are fewer potential Trump voters on Twitter, whose user base skews liberal. As a result, tweets from Trump supporters are weighted more heavily in social media-based forecasting models, but sometimes not heavily enough, as was the case with Polly.
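Weighting of this kind is commonly done with post-stratification: split the sample into cells, estimate support within each cell, and combine the cell estimates in proportion to each cell's share of the electorate, so that respondents from under-sampled cells effectively count for more. The sketch below is a generic textbook version, not Advanced Symbolics' or any other vendor's actual method, and every number and cell name in it is invented. It also shows how the Florida problem can arise: if Cuban Americans are pooled into one "Hispanic" cell, their under-sampling is never corrected.

```python
from __future__ import annotations


def cell_weights(sample_sizes: dict[str, int], population: dict[str, float]) -> dict[str, float]:
    """Per-respondent weight for each cell: population share / sample share."""
    total = sum(sample_sizes.values())
    return {cell: population[cell] / (n / total) for cell, n in sample_sizes.items()}


def post_stratified_share(sample: dict[str, tuple[int, int]], population: dict[str, float]) -> float:
    """Sum over cells of (population share * support rate observed in that cell).

    sample maps cell -> (supporters, respondents); population shares sum to 1.
    """
    return sum(
        population[cell] * supporters / respondents
        for cell, (supporters, respondents) in sample.items()
    )


# Invented sample of Hispanic voters in one state: (Biden supporters, respondents).
fine_sample = {"Cuban American": (8, 20), "Other Hispanic": (120, 180)}
# Invented shares of the Hispanic electorate; Cuban Americans are under-sampled.
fine_population = {"Cuban American": 0.4, "Other Hispanic": 0.6}

# Coarse model: everyone in a single "Hispanic" cell, so no reweighting happens.
coarse_sample = {"Hispanic": (8 + 120, 20 + 180)}
coarse = post_stratified_share(coarse_sample, {"Hispanic": 1.0})

# Fine model: separate cells, each weighted by its population share.
fine = post_stratified_share(fine_sample, fine_population)
weights = cell_weights({cell: n for cell, (_, n) in fine_sample.items()}, fine_population)

print(f"coarse={coarse:.2f} fine={fine:.2f} cuban_weight={weights['Cuban American']:.1f}")
# prints "coarse=0.64 fine=0.56 cuban_weight=4.0"
```

With these made-up numbers, the coarse model overstates Biden's share by eight points, because each Cuban American respondent should count four times but is counted once. If the population shares fed into the weights are themselves wrong, the correction is too weak, which is one way an under-represented group can still end up "not weighted heavily enough."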
Trump received more than 68.6 million votes in this year's election, compared with 62.8 million in 2016. And in counties like Miami-Dade, which were expected to "turn blue," registered Republicans had turned out at a higher rate than Democrats (63% of the county's registered Republicans versus 56% of Democrats) as of Oct. 30.
Companies like KCore Analytics claim that their AI models are superior to traditional polls because they can scale to massive groups of potential voters and be adjusted to predict outcomes despite sampling biases (like underrepresented minorities) and other limitations. They correctly predicted that Britain would vote to leave the European Union in 2016, correctly predicted around 80% of the winners in Taiwan's general election, and called close regional races in India and Pakistan.
But they are not infallible. And, as Fortune notes, none of these models accounts for how legal challenges, faithless electors (Electoral College members who don't vote for the candidate they pledged to support), or other disruptive factors could affect the outcome of a race. And if Polly is any indication, these approaches, like traditional polls, appear to have underestimated voter enthusiasm for Trump in 2020, especially among Black and Latino voters and members of the LGBTQ community.
Andrew Gelman, professor of statistics and political science at Columbia University, suggests that models built on specific variables for a given election year are likely to come closer to the mark than forecasts derived from poll averages. "Political scientists have developed models with which the national vote can be predicted well on the basis of so-called 'fundamentals': key variables such as economic growth, presidential approval and term of office," he wrote in a commentary for Wired. "If we had taken one of these models and adjusted it based on the 2016 party voting shares (as opposed to using current poll data), we would have predicted a narrow Biden win."