Privacy International defends the right to privacy across the world, and fights surveillance and other intrusions into private life by governments and corporations.


Big data: A tool for development or a threat to privacy?

Big data consists mainly of data that is openly available, created and stored. It includes public sector data such as national health statistics, procurement and budgetary information, and transport and infrastructure data. While big data may bring benefits to development initiatives, it also carries serious risks, which are often ignored. In pursuing the social benefits that big data promises, it is critical that fundamental human rights and ethical values are not cast aside.

Expanding beyond publicly accessible data

Alongside other humanitarian organisations and UN agencies, one key advocate and user of big data is UN Global Pulse, launched in 2009 in recognition of the need for more timely information to track and monitor the impacts of global and local socio-economic crises. This innovative initiative explores how digital data sources and real-time analytics technologies can help policymakers understand human well-being and emerging vulnerabilities as they unfold, in order to better protect populations from shocks.

UN Global Pulse clearly identified the privacy concerns linked to its use of big data, and the impact on privacy, in its report “Big Data for Development: Challenges & Opportunities”, and has adopted Privacy and Data Protection Principles. While these are positive steps, more needs to be done, given the increasingly complex web of actors involved, the expanding scope of their work, the growing amount of data that can be collected on individuals, and the poor legal protections in place.

Increasingly, big data includes not only openly available information but also information collected by the private sector, such as Twitter feeds, Google searches and call detail records held by network providers. Groups such as UN Global Pulse are focusing their efforts on opening up access to private sector data; UN Global Pulse has described this as a “challenge” and has been encouraging enterprises to participate in “data philanthropy” by providing access to their data for public benefit.

Dangers of big data

While access to such data is posited as opening opportunities for development, it also has the potential to seriously threaten the right of individuals to keep their personal information private.

If private sector data falls into the wrong hands, it could enable the monitoring, identification and surveillance of individuals. Despite guarantees of anonymisation, correlating separate pieces of data can (re)identify an individual and reveal information about them that is even more private than the data they consented to share, such as their religion, ethnicity or sexual orientation. In certain contexts the consequences could be tragic, especially if the data relates to vulnerable groups such as minorities or refugees, or to groups such as journalists, social dissidents and human rights advocates.
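To make the re-identification risk concrete, here is a minimal sketch of a so-called linkage attack, assuming two invented data sets: an "anonymised" one with names removed but a sensitive attribute kept, and a public one that carries names. Joining them on shared quasi-identifiers (here, hypothetically, postcode, birth year and sex) is enough to attach a name to a record whenever the combination is unique. All field names, values and the `link` helper are hypothetical and for illustration only.

```python
from collections import defaultdict

# Hypothetical "anonymised" records: names removed, sensitive attribute kept.
anonymised = [
    {"postcode": "1010", "birth_year": 1984, "sex": "F", "religion": "A"},
    {"postcode": "1010", "birth_year": 1990, "sex": "M", "religion": "B"},
    {"postcode": "2020", "birth_year": 1984, "sex": "F", "religion": "C"},
]

# Hypothetical public records (e.g. a membership list) that do carry names.
public = [
    {"name": "Person X", "postcode": "1010", "birth_year": 1984, "sex": "F"},
    {"name": "Person Y", "postcode": "2020", "birth_year": 1984, "sex": "F"},
    {"name": "Person Z", "postcode": "3030", "birth_year": 1975, "sex": "M"},
]

# Assumed quasi-identifiers present in both data sets.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Combine the quasi-identifier fields into a single join key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anon_rows, named_rows):
    """Return {name: sensitive value} for every named person whose
    quasi-identifiers match exactly one anonymised record."""
    index = defaultdict(list)
    for row in anon_rows:
        index[key(row)].append(row)
    matches = {}
    for person in named_rows:
        candidates = index.get(key(person), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches[person["name"]] = candidates[0]["religion"]
    return matches


print(link(anonymised, public))  # {'Person X': 'A', 'Person Y': 'C'}
```

Neither data set on its own links a name to a religion; the join does. Only an ambiguous match (two anonymised records sharing the same key) protects the person, which is why richer data sets, where combinations are more often unique, are more dangerous.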

Big data development initiatives, such as the one conducted by the French telecommunications company Orange in Côte d’Ivoire, have shown that even a basic mobile phone traffic data set can be used to draw conclusions about social divisions and segregation on the basis of ethnicity, language, religion or political persuasion.

What about consent?

Because big data is derived by aggregating data from various sources (which are not always identifiable), there is no process for requesting a person’s consent to the new data that results. In many cases, that resulting data is more personal than the data the person originally consented to give.

In October 2012, MIT and the Université Catholique de Louvain, in Belgium, published research demonstrating the uniqueness of human mobility traces and the implications this has for protecting privacy. The researchers analysed the anonymised data of 1.5 million mobile phone users in a small European country, collected between April 2006 and June 2007, and found that just four points of reference, with fairly low spatial and temporal resolution, were sufficient to uniquely identify 95 per cent of them. This showed that even when anonymised datasets contain no name, home address, phone number or other obvious identifier, the uniqueness of individuals’ movement patterns (e.g. the locations they visit most often) means the information can be linked back to them.
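The kind of uniqueness test behind such findings can be sketched as follows: for each user, draw a few points at random from their own trace and check whether any other user's trace also contains all of those points. If none does, those points alone single the user out. This is a minimal sketch assuming traces are sets of hypothetical (antenna_id, hour) pairs on a tiny invented data set; the `is_unique` and `unicity` helpers are illustrative and are not the researchers' code or data.

```python
import random

# Hypothetical traces: each user's set of (antenna_id, hour) observations.
traces = {
    "u1": {(1, 8), (2, 12), (3, 18), (1, 22)},
    "u2": {(1, 8), (2, 12), (4, 18), (5, 22)},
    "u3": {(6, 9), (2, 12), (3, 18), (7, 23)},
}


def is_unique(user, points, traces):
    """True if no other user's trace contains every one of `points`."""
    return not any(
        points <= trace for other, trace in traces.items() if other != user
    )


def unicity(traces, p, trials=100, seed=0):
    """Estimate the share of draws in which p random points taken from a
    user's own trace single that user out (averaged over all users)."""
    rng = random.Random(seed)
    unique = 0
    total = 0
    for user, trace in traces.items():
        if len(trace) < p:
            continue  # not enough points in this trace to draw p of them
        for _ in range(trials):
            points = set(rng.sample(sorted(trace), p))
            unique += is_unique(user, points, traces)
            total += 1
    return unique / total if total else 0.0


for p in (1, 2, 4):
    print(p, round(unicity(traces, p), 2))
```

Each extra point can only shrink the set of other traces that contain all the drawn points, so the estimated share of identifiable users tends to rise with p; in this toy data, four points identify every user.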

Advocates of big data for development argue that there is no need to request consent because they work only with anonymised data that cannot identify anyone. Yet even if one actor in one context uses a data set anonymously, that does not prevent another actor from de-anonymising the same data set. UN Global Pulse can promise that it will not do anything that could violate the right to privacy or permit re-identification, but can it guarantee that others further along the process will apply the same ethical safeguards?

Whose data and for what policies and programmes?

New technologies are enabling the creation of new forms and large quantities of data that can inform policy-making processes, improving the effectiveness and efficiency of public policy and administration. However, the data used can be inaccurate, because it is not regularly updated, relates only to a sample of the population, or lacks contextual analysis.

A recurring criticism of big data and its use to analyse socio-economic trends for the purpose of developing policies and programmes is that the data collected does not necessarily represent the people whom these policies target. Data collection may itself be exclusionary when it relates only to users of a certain service (health care, social benefits), platform (e.g. Facebook users, Twitter account holders) or other grouping (e.g. online shoppers, members of airline or supermarket loyalty schemes).

In the developing world, only 31 per cent of the population is online, 63 in every 100 inhabitants have a mobile phone, and 11 per cent have access to mobile broadband. Ninety per cent of the 1.1 billion households that are not connected to the internet are located in the developing world. Some countries in Africa have less than 10 per cent of their population active on the internet. This means whole populations can be excluded from data-based decision-making processes.

So what must be done?

As Linnet Taylor, a researcher at the Oxford Internet Institute working on a project about big data and its meaning for the social sciences, has noted, even a quick analysis of the big data discourse reveals a clear double standard:

“There is a certain irony here: 90% of the discussion at the forum referred to big data as a tool for surveillance, whereas the thread of debate that focused on developing countries alone, treated it as a way to ‘observe’ the poor in order to remedy poverty”.

Data is data. Yet the short- and long-term consequences of collecting data in environments that lack appropriate legal and institutional safeguards have not been properly explored. Amassing and analysing data always has the potential to enable surveillance, regardless of the well-intentioned objectives that may underpin its collection. Development is not merely about economic prosperity and social services. It is about providing individuals with a safe environment in which they can live in dignity.

Towards accountability

In their recently published paper, “Big Data and Due Process: Toward a Framework to Redress Predictive Privacy Harms”, Crawford and Schultz propose a new framework for a “right to procedural data due process”, arguing that “individuals who are privately and often secretly 'judged' by big data should have similar rights to those judged by the courts with respect to how their personal data has been used in such adjudications”.

Unlike data covered by the common model of personally identifiable information, big data does not easily fall within legally protected categories of data. This means that no legal provisions protect either the data that is collected, processed and disclosed, or the rights of the individuals whose data is being analysed.

Crawford and Schultz have therefore innovatively revisited some relevant founding principles of the legal concept of due process. Due process (as understood in the American context) prohibits the government from depriving an individual of life, liberty or property without affording them access to certain basic procedural components of the adjudication process. The concept also exists under European human rights law, though it is more commonly called procedural fairness.

In doing so, Crawford and Schultz challenge the fairness of the process rather than attempting to regulate data collection itself, which would be more complex and contested. They apply these principles to existing privacy concerns linked to the development and use of big data, proposing:

  • requiring those who use big data to “adjudicate” others to post some form of notice disclosing not only the type of predictions they are attempting, but also the general sources of data they are drawing upon as inputs, including a means by which people whose personal data is included can learn of that fact;
  • providing an opportunity for a hearing to challenge the fairness of the predictive process; and
  • establishing an impartial adjudicator and judicial review to hold accountable those who adjudicate others, ensuring that anyone who deprives individuals of a liberty interest does so without unwarranted bias or a direct financial interest in the outcome.

The use of big data is intrinsically linked to ethical values, which means that the starting point must be the development of international guidelines governing access to and analysis of individuals’ data. As Crawford and Schultz conclude:

“Before there can be greater social acceptance of big data’s role in decision-making, especially within government, it must also appear fair, and have an acceptable degree of predictability, transparency, and rationality. Without these characteristics, we cannot trust big data to be part of governance”.