Today I heard about yet another ‘big data’ research project on ‘refugees-and-migrants’, and I decided it was time to write something. The project aimed to interview undocumented migrants in Europe, then combine those interviews with data from mobile phones, bank accounts and national databases, and also with data from the organisations responsible for controlling the presence of migrants (and often deporting them). The aim of the project was, it appeared, unclear. The researchers variously wanted to create knowledge about the characteristics and situation of the undocumented because it would lead to better outcomes for migrants; to create knowledge because it would lead to social acceptance of migration; and to locate ‘lost’ child migrants.
This was not the only project I have heard presented recently. Since the increased influx of refugees into the EU due to the Syrian conflict, data science has discovered migrants as an object of study, and every new research project seems to be about migration.
There was the project where a leading software provider planned to market its new data analytics environment by conducting a ‘social good’ data science project using migrants’ social media output to find out where they were going, where they planned to claim asylum, and what they were talking about on the way. The senior executive described it in these words:
‘if you assume that a refugee camp is basically the same as a prison, and the migrants are prisoners, then by tracking them after they leave the camp we can stop them reoffending.’
The project was then joined by a group of star professors and researchers from some of Europe’s top universities, all of whom wanted to do ‘something on refugees’ but had no idea what. Conversations revolved for several weeks within the group as they tried to figure out how the tools they had could be used on data from, or about, migrants and what kind of good that might do.
There was another project too: a commercial consultancy specializing in data science planned to combine mobile phone records with satellite data to track migrants travelling from North Africa or the Middle East towards the EU, and to provide intelligence on where particular groups came from. They would then sell the intelligence to the EU’s border protection authorities. Finding that mobile phone data was not readily available for this purpose, they reoriented to use social media data and other online content.
These are just a few of the projects that, over the last few months, I have either seen presented or been asked to serve as ethics advisor to. I have tried to help where possible by asking questions, pointing out legal issues with creating large amounts of personal data with unknown externalities, and orienting the projects, wherever possible, towards aims of protection rather than simply identification and categorisation.
However, there are so many projects out there – I challenge you to find a data science research group that doesn’t have at least one project to track or otherwise research migrants – that it’s time for some critical attention to this body of research. I have laid out (in this post) what I considered a worst-case scenario for human rights, in which migrants could be tracked, their place of origin identified and pre-emptive categorisations made of their likely background and motives. Based on this they could be stopped, persecuted or victimised along the way, by states or groups uncomfortable with their position as part of international migration routes. They could also, possibly more dangerously, be subjected to automated decision-making as the EU, in particular, struggles to distinguish those whose asylum claims align with the current rules from those whose claims do not. And it wasn’t long before a research group called me up to ask if I would be ethics advisor to a project where they realised exactly this scenario.
So, to all the data scientists out there doing work on migrants-and-refugees, here are some thoughts on the problems inherent in many of these projects so far:
- The headline.
All refugees are migrants, but not all migrants are refugees. Putting them together in one research project, unless the aim of the project is to compare the two on some analytical basis, conflates legal statuses in ways that are potentially misrepresentative. It also, more subtly, encourages people to create a hierarchy of need: ‘refugees’ are more deserving than ‘migrants’; people fleeing war are more deserving than people fleeing other forms of danger and hardship.
Similarly, almost no one except a native citizen of a place has an entirely clear migration status because one’s migration status is determined by politics. Today, I am a legal British resident of the Netherlands but due to this year’s referendum in Britain, some time in 2018 I could abruptly become an undocumented migrant. Anyone who studies migration will tell you that people can have multiple and overlapping types of legal status. These days you can (unfortunately) be an asylum seeker, a refugee, a migrant, a refused asylum-seeker and undocumented, all at the same time. All these statuses are created by policies that intersect, or fail to, at the national and international level. They are not stable categories because they are determined by where someone is, and the politics of that place. Although all the researchers I have spoken to about these data science projects claim to be ‘just using legal categories’, the unstable nature of those categories means that any research that uses them uncritically is not being neutral but is inevitably supporting a particular political agenda, even if the researcher does not know what it is.
- The lack of a theory of change
A theory of change is one tool that can help to determine how to act ethically in scientific research. What do you hope will change as a result of your research, and how do you envisage that change happening? This means that if you are conducting a project on an issue as politically volatile as migration, in order to be benevolent towards its subjects, creating knowledge alone is not enough. For academic research grant applications it is usually ok to say that you are creating knowledge for knowledge’s sake, but in this case it is not. If you plan to track, identify or sort some of the most vulnerable people on earth, you need to have a plan for how that information will get used. If the main benefit is to the agencies tasked with controlling migration, it is important to be aware of this, and to be clear about it. But be aware of the politics of your work: if you are conducting research whose primary use will be in identifying and deporting the undocumented (something 90 percent of the projects I have seen can be used for), that is a political position that needs awareness and justification.
- Some kinds of invisibility should be respected.
When I was an undocumented migrant for a period of several years, I did it by choice. It was not a good choice, but (like Churchill’s description of democracy) it was better than the other available ones. I was unable to study, work legally, pay taxes, vote, or go to hospital when I got sick, let alone make plans for the future. I did not have a good time. However, it was a conscious decision. The alternative was being deported for up to ten years and breaking up my family. Making my presence known to the authorities, even through a general statistical approach rather than personal identification, would not have benefited me in any way. And it’s not as if the authorities, local or national, think that being undocumented is a picnic – informing them that the lives of undocumented people are not easy is neither here nor there. Undocumented status is created by politics, and it can only be solved by politics, not by information in isolation from any political influence.
- Predicting is acting.
If you gather the data and have the expertise to make predictions others cannot make, whether about numbers, behaviour, migration trajectories, financial activities or any other feature of migrant flows, you are acting on migrants by creating that information. Data is never raw, and models are never neutral. Those institutions charged with controlling migration are extremely well-funded, have strong political support, and are incentivised by many urgent concerns, not least national security. They are not unaware of all the data scientists out there producing new analyses of migration dynamics, nor are they lacking in technical skill. Together, these projects amount to a massive database on migration at a highly granular level, far beyond what authorities can produce on their own. What are the implications of this? Well, consider that intelligence agencies around the world have expressed their gratitude to all the people who post about their lives online, and to the tech giants who collect, assemble and make that information easily searchable. Anyone doing ‘data for social good’ projects should be aware that they are becoming part of the surveillance economy, regardless of their often excellent motives.
This does not suggest that data science cannot do social good, or that research teams should not be engaging with social problems. It does suggest, however, that if you are a data scientist engaging with a migration-related question, you should have an idea of what your theory of change is. You should be aware that your research is producing actionable data on highly vulnerable people, and there is no inherent reason why its use will be beneficial. Historically, data on undocumented migrants has not been beneficial to those migrants. Finally, your work has politics, whether it is designed for social good or not, and you should make sure you agree with them.