I’m at the 2013 European big data conference in Brussels, where a mix of private and public sector leaders are discussing how to use big data to make Europe more competitive. It’s a very wide-ranging discussion, which has moved from academic research data infrastructures to public genomes and self-driving cars. The mix of public and private sector participants, though, makes this a more interesting and useful discussion than the hype-filled conversations at purely industry events.
This morning’s discussion focused on how policymakers can enable the sharing of data in ways that may help the EU move out of the financial crisis. Neelie Kroes spoke about the aim of creating ‘a single market for data reuse and sharing in the EU’, and of big data as a recipe for growth and a way to better understand society. At which point one representative sitting next to me commented that it was ironic that, while cheerleading for big data, the EU had rejected the only social science big data project (‘FuturICT’) proposed for the EU’s billion-euro FP7 flagship scheme, in favour of large-scale natural science projects.
Kroes’ positive headline was followed by the reality: an argument between EC representatives about how to balance data availability with data protection, privacy and IP. No resolution was reached. Some rather general figures were offered: ‘data’ could contribute €250bn in annual value to the EU public administration sector, and ‘open data’ had an estimated value of €32bn for the EU27 economies in 2010. The discussion mixed corporate big data evangelism with an EC awareness of real problems around privacy and access. The debate was driven by anxiety that the EU is lagging behind the US in enabling access to and use of digital data, and that this lag is bound up with the ongoing financial slowdown – hence the hope that getting the ‘data economy’ flowing will help revive the financial one.
The discussion puts data protection and privacy concerns at the centre of this perceived data industry bottleneck. One main characteristic of the debate about how big data can be used to its full potential, both at this conference and at large, is that it oscillates between two sets of poles: between examples involving data which does not link directly to individual identities (e.g. vehicle travel patterns) and those which do (sharing individual genetic information); and between the industry definition of big data as that which needs a lot of computing power, and big data as data that is especially rich and multidimensional, unfamiliar in origin, or larger than previous datasets in a particular area. This ambiguity is clearly contributing to the sense of delay people are voicing about the EU’s data policy: there is no single policy which applies to ‘data’ as a unitary whole, and all the available regulatory frameworks leave gaps. Questions raised here have included: ‘What happens when open data mixes with proprietary data?’; ‘Does velocity conflict with democracy?’; ‘How can researchers analyse big data at different scales without running into privacy problems?’
Most of the corporate speakers here attribute the policy bottleneck to a lack of public understanding. They advocate giving the public clear case studies that demonstrate how sharing their personal data can benefit them, so that the EU will have an easier time getting data sharing conventions adopted. The moderators of the discussion push back, asking whether it’s also necessary to present information on the costs of sharing data, but no one is clear on what those are. This seems like yet another example of how the incentives to share data are easy to articulate, whereas privacy is a slippery concept that does not lend itself to user-friendly explanation.
An American corporate representative says that EU regulations such as data minimisation (don’t collect more than you need) and privacy protection (don’t use data in ways which present meaningful harm to individuals) are so fuzzy, and the potential punishment for violating them so high, that they make the EU an unattractive place for companies aiming at data-driven innovation, and therefore undermine data’s potential to fuel growth. He advocates regulating basic rights instead of data collection and sharing: just punish those who use data in ways which violate individuals’ constitutional rights. Of course, this kind of bottom-up, post-hoc regulation only works if people understand what is happening to their data and can convince the FTC to see their point of view. It’s also unclear what place user consent occupies in this American perspective, since corporations are essentially free to do what they want until they violate our rights – in contrast to the proposed EU approach, where user consent has to be specific and binding.
So, is the EU stuck on the issues of privacy and data protection, in need of a massive policy push to catch up with US levels of data sharing and innovation? Or is it wisely set up to regulate in advance rather than in response to NSA-style privacy crises? Is it better to regulate in ways that promote competition in the data market, creating an environment where myriad smaller companies compete, each holding different pieces of our personal data – or to allow a market leader like Google to hoover it all up, and then regulate its behaviour? At the moment the EU’s antitrust case against Google as a leader in the search market indicates that it is aiming for the first situation – but which benefits us most as ‘private’ citizens?
There may be some lessons here from the open data movement. If you can regulate open data in ways that make it a democratic resource, perhaps you can also make big data shareable without infringing privacy and data protection. Paul Suikerbuijk of the Dutch government open data initiative offers an example of taking the data to the people: his office has been doing ethnographic research in small Dutch communities on people’s current problems and priorities, and discussing how open data may be useful to them – for example, predicting the population changes that are causing local schools to close, and organising across districts in order not to lose them. This resembles the ‘positive use case’ approach proposed by corporate representatives at today’s event – except that it is oriented towards user priorities and operates at the micro-level, rather than herding people towards behaviours friendly to corporate priorities.
From a sociological perspective, today’s debate illustrates how ‘big data’ can be seen as an ‘assemblage’: built out of pre-existing types of data, managed and analysed with techniques that are a palimpsest of previous methodologies and approaches, and regulated in a relational and intuitive way that relies on precedent and most-similar examples. The EU assemblage is shaped by EU history, conditions and priorities, and is very different from the US one. Under a simple ‘regulate’ vs. ‘don’t regulate’ dichotomy, it seems unlikely that the EU can compete for the kind of corporate growth the US is currently seeing around data. The EU is, however, evolving a healthy open data movement and a set of privacy guidelines which may even put a spoke in the NSA’s wheel, and it is trying to work through difficult questions which so far have not been addressed in the US. It’s even possible that this effort will pay off over the longer term, as data on individuals becomes ever more ubiquitous and managing it ever more contentious.