Towards a Sociology of Data

OII recently held a workshop – ‘Towards a Sociology of Data’ – which brought together sociologists, philosophers, political scientists and computer scientists to discuss whether there exists – or needs to be – ‘a sociology of data’. The workshop was based around three main discussions: our relationship to data as private individuals, as citizens, and as researchers.

We started from the idea that data that reflect the activities and identities of individuals are becoming larger in scale and scope, more commercialised, and more likely to be linked, merged or otherwise mashed up with other data. This is part of the way we define how ‘data’ becomes ‘big data’: the latter is a process of combining and defining, rather than just more data.

Data are also more likely to become ‘open’, which creates a conflict between personal and open data. The discussion outlined the three-way relationship between privacy, disclosure and identity, where ‘disclosure is about how we maintain and control our relationships… [and] Privacy is whom we choose to disclose to and how we choose to disclose’. In relation to this, the ‘data layer’ of social interaction is playing an increasingly important role as the way in which we control and release our information – or have our information controlled and released on our behalf.

So what is the connection between personal and private data, between privacy and autonomy, and what does it mean to allow our data out into public or commercial spaces? Arguably, this is the first time that there have been real and effective incentives to release personal information about ourselves to commercial entities such as Amazon or Facebook, which suggests that rather than ‘throwing away’ our privacy in our online activities, we may be offering it – or selling it – in exchange for services or information that we want. Do we effectively gain in autonomy, for instance, when we take part in social networking activities, in return for which we make information about ourselves public?

This leads to a question about the meaning of informed consent when the personal data we emit exists in a constantly changing and developing environment. In 2002 the Supreme Court decided that even if the terms and conditions of a given service change, our consent stands – once we click ‘accept’, we have accepted not just the present, but also unknown future uses of our personal data.

Arguably, however, there is an important difference between having one’s information collected by an institution, as occurs in the case of surveillance, and interacting with an interface, as we experience many companies in our everyday online activities. Presumably there is a difference between a person analysing and using your information and an algorithm doing so, as occurs with people’s use of Gmail. One participant commented:

‘In a sense this sort of critique of invasions of privacy framed in liberal language of self, autonomy, ownership, property, is all about the notion that someone is trying to find out what individuals are up to. But surely they’re not. What they’re trying to find out is how individuals are associating into social units, and then finding out what those social units are up to.’

In other words, Google doesn’t care about you. It cares about its users in the aggregate, and whether the aggregate includes you or not is neither here nor there. A more useful question, therefore, might be how our expressions of our online identities are shaped by the platforms and applications we use, and what it means if we have constrained access to the content we have posted.

The discussion of data’s relation to society began by identifying a three-way interaction between government, corporations and individuals, which varies across countries and can therefore be analysed comparatively. For instance, the census has been influenced in various eras and locations by firms as census-takers (e.g. Lockheed Martin in the UK) or as shapers of the information collected (as in the post-war period when citizens were also visualised as consumers and the census came to include questions on consumption that would feed into firms’ marketing strategies). Data collected from the public en masse has also been used by government to gain information on political opinions, as in China. In light of crowdsourcing and citizen science, we can ask what a citizen economy of data might look like – and whether such a ‘radical remixing’ is possible in an age when data is as valuable as oil.

Personal data may be more valuable than ever, but it is also less contextualised. As data describing people grows, our ability to keep the metadata – the data that describes and qualifies the data – intact decreases, so that it is less and less possible to be certain that the data being produced, logged and used is ‘good’, reliable, or authentic. To some extent the formalisation of data – its formatting, for example – may now have to take the place of metadata and fix it in a form which reflects the decisions made about it.

At the other end of this process of gathering and categorising data is the act of using it, often to predict the future. The availability of large datasets on people’s behaviour and transactions is seeding a new cycle of empirical modelling of the whole, as opposed to elements of the whole. This has occurred periodically over the history of scientific research, and is occurring today in what has been termed ‘the end of theory’, with efforts to precisely model and predict processes ranging from the large scale, such as climatic change and economic trends, to the individual level, such as genetically determined health, voting behaviour, and people’s movements.

Data can be seen as an environment: one participant suggested that the metaphor of a GIS, in which social and physical layers of the environment are merged, helps us to understand the current layering of data in a socio-political context, where people’s activities in different spaces can be layered and mapped over each other to create a particular image of social activity.

Finally, the discussion on data and the self concentrated on the question of whether data can be context free. It can, in the sense that it must come from outside the knower, must be an isolable unit, specifically isolable from its context in the sense that it can be used to build an explanation of a phenomenon which can be independently challenged. This suggests a problem with data from social media: if they are owned and controlled by firms, they cannot be isolated, challenged, or used to replicate a particular analysis.

Predictably, we didn’t resolve the question of whether data is context-free or not. It’s fairly contentious. But the discussion moved us toward a better understanding of how we might usefully think of data as a palimpsest within which we operate and research.

