Much has been made of the role of big data in this year’s US elections. The New York Times has reported how both campaigns have been mining even the most private personal data to try to get an edge in terms of mobilising voters, and Stephen Baker has studied how the ‘numerati’ are being employed by both parties to identify swing voters. Analysts can now classify neighbourhoods and individuals using methods borrowed from marketing analytics which identify lifestyle, education, demographic and consumer characteristics in order to locate these voters and enable the campaigns to target them with advertising and campaign promises.
It’s clear that Obama achieved the edge in terms of voter mobilisation, and did better than most expected by winning nearly all the swing states. But are these wins the result of micro-targeting tactics involving highly granular data, or was it simply greater overall coverage by the campaign that won the election? George Edwards, Winant Professor of American Government at Oxford, prefers the latter explanation. In a morning-after analysis he identified the Obama campaign’s ‘ground game’ as its fundamental advantage. Obama committed serious resources to opening more field offices than Romney, at least one in every state, and thus his campaign was simply more present in voters’ neighbourhoods. Edwards says that although it’s important to be able to identify swing voters, the most important work that goes on at neighbourhood or individual level is knocking on doors and persuading or shaming people who are already registered with your party to go to the polls. Research by political scientists has shown overwhelmingly that this process of getting out the vote the old-fashioned way is the most effective way to influence the result. So which of these tactics was more important: accurately targeting the undecided or leveraging existing affiliation?
The truth is that there is no way to assess accurately which of these methods had the edge. The best exit poll so far, that of the New York Times, covers only 18 states, and unless people can be contacted and persuaded to tell analysts whether they were in fact swing voters, and if so, what made them decide one way or the other, there is no way to tell whether targeting worked. Nor is this a zero-sum game: presumably efforts to micro-target messages toward swing voters complement the ‘ground game’ of getting out registered party voters, and we have no idea of the possible interactions between these two approaches. The NY Times’ recap suggests that the two did indeed work in tandem. Meanwhile the New Yorker quotes Obama campaign analysts as saying that despite the advances in targeting and modelling, ‘none of it is a big, dramatic departure from what we did last time.’ In an attempt to figure out whether social media campaigns have changed voting behaviour, one team has conducted an online experiment using Facebook, but this covers only a small corner of the territory in terms of the use of big data as a campaign tool. We may never know the full story – despite the claims that will be made on behalf of big data, analysts’ ability to verify what actually happened lags far behind their ability to deal with the data on the front end.
However, if we can’t tell whether big data won the election, it was certainly a big contender in the battle to predict the results. Statistician and blogger Nate Silver’s 538 model was clearly correct, and a whole universe of Silver-clones can be expected to spring up among the news media in an effort to grab coverage before the 2016 election. In contrast, Michael Wu showed earlier this year that sentiment analysis using Twitter was more than 93% aligned with the Gallup polling data – the only drawback being that Gallup predicted a Romney win. But then OII’s own Mark Graham and his team used Twitter data – unlike Wu, without sentiment analysis, counting only mentions of each candidate – and correctly predicted the winner.
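The mention-counting approach is strikingly simple compared with sentiment analysis: rather than classifying each tweet as positive or negative, you just tally which candidate each tweet names and predict the most-mentioned one as the winner. A minimal sketch of the idea (hypothetical data and function names – not the OII team’s actual pipeline) might look like this:

```python
from collections import Counter
import re

def count_mentions(tweets, candidates):
    """Count how many tweets mention each candidate (case-insensitive,
    whole-word match). A tweet naming both candidates counts for both."""
    counts = Counter({name: 0 for name in candidates})
    for tweet in tweets:
        text = tweet.lower()
        for name in candidates:
            if re.search(r"\b" + re.escape(name.lower()) + r"\b", text):
                counts[name] += 1
    return counts

# Hypothetical sample tweets, purely for illustration
tweets = [
    "Obama speaks in Ohio tonight",
    "Romney rally draws big crowds",
    "Four more years for Obama?",
]
counts = count_mentions(tweets, ["Obama", "Romney"])
predicted_winner = counts.most_common(1)[0][0]
```

Note that this counts any mention, hostile or friendly, which is exactly the property that distinguishes it from the sentiment-based approach that tracked Gallup so closely.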
So what does this tell us? That big data, just like any other data, is only as good as our ability to understand and manipulate it. That voters’ decision-making is more complex than their consumer preferences and the views they express on social media. That we need to put some time into analysing whether the models were right, whether the key groups were influenced more effectively than before, and what the complementarity is between hyper-sophisticated data analysis and good old-fashioned shoe leather.
And that sometimes, despite all the perceived uncertainty, the right candidate wins.