I’m at a conference in The Hague this week, on Responsible Data for Humanitarian Response, where I’m about to do a session on ‘checks and balances’ in data governance for the humanitarian field. This is an interesting challenge, because currently there aren’t a lot of them, either in that field or in relation to developing countries more broadly. My challenge is to isolate what we can possibly address – which of course brings up the fact that there is much we can’t. So, for what it’s worth:
First, here’s the problem. We are asking how to regulate data use in sectors – humanitarian response and international development – that are mostly unregulated in any other way. You can pretty much set up your own data analytics outfit in response to a humanitarian crisis, as happened for instance with Crisis Commons and Flowminder after the Haiti earthquake in 2010. In fact, you can set up any kind of disaster response or development organisation unilaterally: this is the standard model. From Bill Gates to Oxfam, organisations dealing with poverty and crisis tend to determine their own approach. What this means for data governance is that the field is highly fragmented, ranging from institutions that are governmental (USAID, DFID) and can be regulated by states to institutions that are completely independent (such as private foundations) and are pretty tough to reach with regulation.
We are also looking at a change in the nature of the data regulation problem. First, data regulation has traditionally been about preventing the identification of individuals, on the assumption that an adversary who wants to harm a specific person needs to identify them in order to do so. With big data, however, the problem potentially moves to the demographic level: if you want to wipe out a whole group, you can do it using sensor data (e.g. satellite imagery or mobile phone location records) without seeking personally identifiable information at any point. Second, the largely private ownership of big data creates new challenges in terms of knowing what has been collected and what is being done with it; the likelihood that states or individuals will know enough to defend themselves from data misuse is very limited. Third, a lot of data misuse takes place in areas which are currently seen as exceptions to the need for clear rules. For instance, the European 1995 Data Protection Directive offers exceptions for purposes of scientific research, national security, defence, public safety, or important economic or financial interests on the national level. These pretty much cover anything one might do with data under the rubric of humanitarian or development interventions.
So data governance appears to be at once a wide open space, and a challenge that needs a major rethink. One proposition that has been made at this conference is that the data governance challenge replicates that of regulating the aviation sector when it first emerged a century ago. Planes seemed like an exciting, harmless innovation until it turned out that you could not only crash them and kill yourself and others, but could drop bombs from them and create widespread destruction. I would contest this, on the basis that you have to be comparatively rich to die in a plane crash, whereas data misuse can touch absolutely anyone on the planet. A better analogy might be trying to regulate emissions to prevent climate change – like fossil fuels, countries now see data analytics as a necessity for economic growth, and the negative effects of data increasingly impact on the poorest and most marginalised, who are least able to resist and seek remedy. (By the way, for anyone who missed it, we have not figured out how to regulate CO2 emissions. There are too many competing short-term incentives, those most at risk have no power, and the political tradeoffs needed are insanely complicated to negotiate. Does this sound familiar?)
So, if we accept that this is a bit of a chaotic space, what are the options? Perhaps we need to approach data governance in a distributed way, addressing the data landscape as a series of villages rather than as a unified whole. For example, research often occurs in coalitions that include academic actors. These configurations can be regulated by university Institutional Review Boards under the regulations that govern human experimentation, as Flowminder’s work in Haiti was. If this had been the case with the now notorious Facebook emotional contagion experiment, published in 2014, it might not have proceeded. However, it seems that the experiment escaped review by Cornell’s IRB on the grounds that it used a pre-existing dataset, not one collected for that study – which underlines that existing ways of preventing data misuse need to be updated and reworked for the big data era, when new uses of pre-collected data are a real category of risk.
A new village appeared recently in the humanitarian/development datasphere: corporate actors. Surprisingly, there is cause to be hopeful here, given the example of Orange. The French telco released a massive mobile call dataset from Côte d’Ivoire in 2012 in its Data for Development Challenge, with relatively little articulation of an ethical framework for the researchers involved. Interestingly, Orange repeated the challenge in 2014 with a dataset from Senegal, but this time put in place a self-developed ethical framework for understanding and dealing with the risk of data misuse that is pretty comprehensive. The new framework has resulted in the company limiting the publication of research that was judged too sensitive in the country context. One reason for this successful self-regulation is that telcos are highly incentivised not to expose their customers to harm, because doing so is a sure way to lose market share. So profit can be an incentive to act more, rather than less, ethically in some cases.
That still leaves the problem of private humanitarian or development activities, particularly small-scale ones such as hackathons and advocacy projects. The large ones such as the Red Cross and UNICEF have very strong and coherent ethical frameworks to govern their work, dating back many decades. The challenge here is to adapt these to deal with data use and misuse, which is not impossible but takes serious high-level will within the organisation, often from people with little technical understanding of the kinds of data available. The small actors, however, do not base their actions in these frameworks. So we have large organisations with coherent ethical frameworks that aren’t yet applicable to digital data, and small organisations working on digital interventions who have no ethical frameworks at all.
To understand what’s possible, given this fragmented landscape and the fundamental change in the type of data and activity that needs regulation, we need to address the overarching question of power. Why are these organisations unregulated? It’s because their activities focus on poor and marginalised people who live abroad, not on people at home who can hold them accountable. It’s also because the label of humanitarianism or development, as I’ve said in previous posts, tends to deflect criticism and regulation – the work is framed as too urgent, important and beneficial to be subject to oversight. So accountability is lacking overall, and it’s unlikely to arise around data use if it hasn’t arisen around other activities. This is why we mainly see whistleblowers calling data misusers to account – Snowden, Assange and others show that a mechanism for preventive accountability is missing.
So what can be done? If there is a lack of accountability, this points to a democracy deficit in the field as a whole. It’s great that some actors are responsibly self-regulating, but the stakes are still radically lower for the data analyst than for the data subject, and until the incentive not to mess up is as high as the incentive to use the data (as it may be for corporate actors such as Orange), it’s unlikely that self-regulation will be a reliable answer. How is accountability generated? Well, in different ways. Each village has its own mechanisms, or potential mechanisms, for accountability, and its own leverage points to make them operational. It’s necessary to separate out the field, and to reconceptualise from ‘data governance’ to the governance of particular groups and classes of actors. For the least accountable, though, such as privately run donors and certain NGOs, the only way to impose any kind of accountability may be through public pressure. This means raising awareness in those organisations’ home countries, where they are accountable at least to authorities such as the taxation system. This would take either a phenomenal level of altruism amongst developed-country civil society, or a real and highly publicised disaster relating to data misuse. Then again, people have died in huge numbers due to misconceived development policies such as structural adjustment and forced population displacement, without any kind of civil uprising in the countries that funded those policies.
In the end the best ethical code comes from Hippocrates: first do no harm. Perhaps anyone training to work with big data analytics should have to take the Hippocratic oath. After all, particularly if they work for Facebook, they are going to get access to people’s insides.