If you’re someone with data analysis skills and a yearning to solve poverty/malaria/the mystery of bee mortality, now’s your time. There are numerous ‘big data’ initiatives emerging to connect development initiatives with people who can code, including programs based around GIS, mobile phone data, crowdsourced crisis reporting and microfinance, to name but a few. But in the happy hype about the potential of big data for making everything better, the ethical implications for the development field – let alone the notion of involving the subjects of development in interventions that will affect their lives and privacy – tend to come as an afterthought.
This is not particular to ‘big data’ development solutions. Old-style development solutions have done their bit to ignore the multidimensional reality of poverty. But it is relevant because addressing big data as a public good only focuses and magnifies what is becoming one of the central technological issues of our time: who owns our data, who should get access to it, and how we can know what our digital traces are being used for. Anonymity is often promised but dubiously delivered: in the developing world as in the industrialised countries, the increase in the amount of ‘data exhaust’ is paralleled by a reduced ability to render it safe for research.

A recent study on mobile phone data found that if your mobile phone company releases subscribers’ call data to researchers under current anonymisation practices, ‘four spatio-temporal points are enough to uniquely identify 95% of the individuals’. That means that if they can see the (approximate and blurred) geo-locations of four calls you have made, they can track your movements throughout the rest of the dataset. The EFF has published a report detailing why this may be a bad idea – for example, you might not want people knowing where you go to church, the health facilities you frequent or where you spent last night. Although your mobile company removes your name from the call data, this is still enough to make the average person nervous, particularly given the likelihood that other datasets may be released in future which make your data easier to connect with your actual identity.
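The intuition behind that finding is easy to demonstrate on invented data. The toy simulation below (all numbers are made up, and real call-detail records are far larger and human mobility far more regular, which makes re-identification easier still) generates pseudonymised traces of (cell tower, hour) points and counts how many users are pinned down uniquely by a handful of observed points:

```python
import random

random.seed(0)

N_USERS, N_CELLS, N_HOURS, POINTS_PER_USER = 500, 50, 24 * 7, 40

# Each "anonymised" subscriber is just a pseudonym mapped to a set of
# (cell tower, hour) points -- no names appear anywhere in the data.
traces = {
    u: {(random.randrange(N_CELLS), random.randrange(N_HOURS))
        for _ in range(POINTS_PER_USER)}
    for u in range(N_USERS)
}

def is_unique(user, k):
    """True if k points sampled from this user's trace match no one else."""
    known = set(random.sample(sorted(traces[user]), k))
    matches = [u for u, t in traces.items() if known <= t]
    return matches == [user]

for k in (1, 2, 4):
    frac = sum(is_unique(u, k) for u in traces) / N_USERS
    print(f"{k} point(s) -> {frac:.0%} of users pinned down uniquely")
```

Even with these uniform, unrealistic traces, a few spatio-temporal points suffice to single most users out; removing names from a dataset is not the same thing as anonymising it.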
It’s clear that releases of personal data are, and should be, regulated. But how does this apply when we start thinking of this kind of transactional data as a public good, or of research using it as an act of ‘development’? What if ‘development’ becomes a shibboleth with which any researcher can potentially access personal data from developing countries with different ethical assumptions, or under less stringent professional criteria, than if the data belonged to citizens in Europe or the US? The UN Global Pulse blog notes that ‘while recognizing the privacy challenges, the [Big Data for Development] report suggests that what is needed is a careful balance between privacy and use of data for the public good.’ But what is the public good, and who gets to define it?
Another question that has so far gone unaddressed is at what point big data about, or generated by, people living in developing countries becomes ‘big data for development’. I would not include in this definition the Open Development Data movement, which advocates taking established ‘development’ data sources and rendering them more effective through better use of metadata, more sharing and, essentially, the democratisation of thinking about development. This, at least in theory, should build healthy debate amongst experts and provide more access to information for the countries and groups who are the subjects of development interventions, and may in turn provide opportunities to voice dissent and make mistakes such as structural adjustment less likely to occur. Among others, William Easterly has very effectively (and satirically) pointed out how ‘development’ is a context where the ‘public good’ often trumps consideration of the individual.
One warning signal being emitted by the ‘big data for development’ movement is that stuff that doesn’t belong in the same category is getting lumped together. An example from the current Global Pulse report: ‘Social-science research has shown that Twitter, mobile phone data analysis, and Google Analytics can reveal issues and trends of concern to global development, such as disease outbreaks or mobility patterns.’ Spot the difference: disease outbreaks are one thing, mobility patterns entirely another. No one can argue with preventing disease, but knowing where people are moving to – for example when driven by ethnic cleansing, flooding or economic opportunity, to take three completely different motives for migration – has huge practical and political implications, both for the authorities in destination areas and for the people moving. Knowing who is coming and when can lead to people being turned back, being channelled to a particular place, being classified in advance as refugees or a security threat, being targeted for violence, or all of the above.
And in fact, perhaps one can even argue with publicising outbreaks of disease. Another potential problem is the cross-platform use of ‘development’ data from the poorest countries, much of which was collected before it was conceivable that datasets could be opened up to everyone, merged or hacked. But now that subsidised health insurance is being promoted across low-income countries, the extensive records held by national and international health authorities could become useful in new ways. Who has TB, who has HIV, who is more susceptible to malaria? This data, as it crosses the border between development and commerce, becomes a potential battleground between human rights and commercial profit.
The internet, as everyone knows, was set up as a commons where cooperation and consensus would rule. What happened next, however, represents a cautionary tale about corporate interests versus the values of the commons. Tim Berners-Lee has repeatedly spoken of the complex balance between openness and privacy, warning that forcing complete openness on strangers is both naïve and authoritarian. Rebecca MacKinnon demonstrates why this is true with her description of the clash between aspirational Silicon Valley understandings of the networked commons, where anonymity is a signal that someone is being dishonest, and the real world where dissidents, minorities and those at risk of violence or oppression need to use pseudonyms to operate safely. She tells how in 2009 Facebook changed its privacy settings without warning in order to make people’s networks and ‘causes’ visible beyond their group of friends. This caused horror and outrage, not to mention very real personal risk, amongst dissidents protesting against the Iranian government, which was understandably very eager to know about their networks and affiliations.
So how is privacy to be regulated in the context of development? Perhaps the first thing we need to do is to recognise that privacy in the context of ‘doing development’ is likely to be different from data protection in developing countries. For one thing, even though privacy exists as a human right; as a regional framework even in West Africa, the world’s income-poorest region; and as specific legislation in many of the world’s lowest-income countries, as Graham Greenleaf wrote in 2012,
‘it is important to note that ‘growth’ or ‘expansion’ of data privacy laws cannot be equated with improvement in privacy protection. Some privacy laws are simply not enforced. Surveillance activities in both the private and public sectors can also grow at the same time as laws are enacted and operational, and quite often do when data privacy laws are a trade‐off for, or a belated response to, more intensive surveillance.’
This is an important point. Even where legislation on privacy exists, we should consider the incentives to ignore that legislation – the increased availability of data being one of them. Anyone living a digitally networked life knows how incentivised corporations are to bend – or completely ignore – the rules; and how governments freely consume our private personal data without even having rules to bend, because they exempted themselves from them.
So who should or could regulate the use of data ‘for development purposes’? This is a tough one to answer because first, ‘development’ tends to erase difference and nuance – ‘the poor’ are not a unit, nor are ‘poor countries’, and the reasons governments want to know where their citizens are and what they are doing have traditionally covered a very wide range. FDR wanted to know about your employment history for a different reason than Senator McCarthy, and the NHS wants your medical data for different reasons than the Tuskegee health department did. However, all the current ‘big data’ sources in the news, from India’s Unique ID programme to the mobile phone records of everyone in Côte d’Ivoire, constitute ways to track, classify and group people; to study populations and to determine what needs to be changed about their circumstances.
Based on the issues I’ve discussed here, a strong privacy framework for transactional data in developing countries would be one based on the assumption that no data is fully anonymous, and that data which seems adequately anonymised now may not remain so in the future, under different technological conditions or when merged with other datasets. This suggests not withholding data, but putting serious thought into anonymisation techniques that go beyond what the (fairly ambivalent) law requires. Furthermore, I’d propose something which is both obvious and yet quite difficult to imagine: that people’s transactional data should remain private until they have had the chance, individually or collectively, to voice an opinion as to whether it’s ok to release it or not. (This goes for people in industrialised countries as well as those in developing ones, but arguably industrialised countries provide greater potential for public pushback if things go wrong.)
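One way to put such thought into practice is to treat anonymity as a property to be measured rather than assumed. A standard measure is k-anonymity: a release is k-anonymous if every combination of quasi-identifiers (the innocuous-looking fields that can be cross-referenced with other datasets) is shared by at least k records. A minimal sketch, with invented records and field names:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest group size over the chosen quasi-identifier columns.

    A release is k-anonymous if every combination of quasi-identifier
    values is shared by at least k records; a result of 1 means at
    least one person stands alone and is trivially re-identifiable.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical released records: names removed, quasi-identifiers kept.
records = [
    {"district": "North", "age_band": "20-29", "calls": 112},
    {"district": "North", "age_band": "20-29", "calls": 87},
    {"district": "South", "age_band": "30-39", "calls": 45},
]

print(k_anonymity(records, ["district", "age_band"]))  # -> 1
```

Checking a measure like this before release – and generalising or suppressing fields until k reaches an acceptable level – is exactly the kind of effort that goes beyond what a minimal reading of the law demands, though even k-anonymity offers no guarantee against future dataset merges.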
Until there is a meaningful public debate about the commercial uses everyone’s data is being put to, and whether there are alternative frameworks for who is allowed access to it, I am not happy with my location/spending patterns/social networking activities/internet search records being used by third parties. This is presumably just as true for people who don’t have access to the debate, or necessarily even to the knowledge that their data is being collected and used. It’s time to make a case for access to information about information.
Alternative approaches such as participatory development have something to offer on this. Paulo Freire said that ‘Leaders who do not act dialogically, but insist on imposing their decisions, do not organize the people – they manipulate them. They do not liberate, nor are they liberated: they oppress.’ To the extent that development initiatives seek to lead, they should take note of his warning – if you don’t involve the poor when you try to remedy poverty, you end up being just another developer.