
Will the government fail the crucial privacy test for the National Pupil Database?

10 September 2012

As part of the government’s ambitious Open Data programme, the Cabinet Office announced last year that data from the National Pupil Database (NPD) will be made freely available and accessible to all. The NPD, previously only available to researchers on an academic licence, contains a record for every single state school pupil in the country, covering educational attainment from reception to sixth form, as well as characteristics such as attendance, ethnic background and free school meal eligibility. According to the Cabinet Office, the data will be ‘anonymised’: names and other information that directly identifies individuals will be removed from the version of the NPD that is open to the public. But removing direct identifiers does not, by itself, safeguard pupils’ privacy rights.

A key issue with the open NPD is that each record will, at the very least, include a pupil’s full academic record, along with the school the pupil attended and their school year. These variables are essential if the data is to be useful for comparing the performance of different schools and tracking them over time. But this also makes it much easier to identify people on the database.

Say you know a few details about your friend, such as the school they attend, their year, and what grade they achieved for a handful of their GCSEs. Using the open NPD, you could first narrow down the set of 8,000,000 or so records to perhaps 120 that match the correct school and year. Looking at those 120 records, how many match your friend’s combination of subjects and grades at GCSE? In many cases, it could be just one record. If so, you have identified your friend on the NPD and you now know a lot more about them than before, including their full academic history dating back to primary school.
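To make the two filtering steps concrete, here is a minimal sketch in Python. It assumes a tiny, invented dataset in which each record is a plain dictionary with hypothetical fields (`id`, `school`, `year_group`, `gcse`); the real NPD layout and column names are different, and the schools and grades below are made up for illustration.

```python
def matching_records(records, school, year_group, known_grades):
    """Return every record consistent with what an acquaintance already knows."""
    # Step 1: narrow the whole dataset down to one school and year group.
    cohort = [r for r in records
              if r["school"] == school and r["year_group"] == year_group]
    # Step 2: keep only records whose grades agree on every subject we know.
    return [r for r in cohort
            if all(r["gcse"].get(subject) == grade
                   for subject, grade in known_grades.items())]


# Hypothetical records: no names, yet each still carries school, year and grades.
records = [
    {"id": 1, "school": "School A", "year_group": 11,
     "gcse": {"Maths": "B", "English": "A", "French": "C"}},
    {"id": 2, "school": "School A", "year_group": 11,
     "gcse": {"Maths": "B", "English": "B", "French": "C"}},
    {"id": 3, "school": "School A", "year_group": 11,
     "gcse": {"Maths": "A", "English": "A", "History": "B"}},
    {"id": 4, "school": "School B", "year_group": 11,
     "gcse": {"Maths": "B", "English": "A", "French": "C"}},
]

# Knowing just two grades is enough to single out one record in this cohort.
hits = matching_records(records, "School A", 11, {"Maths": "B", "English": "A"})
print([r["id"] for r in hits])  # [1]
```

Knowing only the Maths grade still leaves two candidates (records 1 and 2); adding the English grade leaves exactly one. Each extra piece of knowledge shrinks the candidate set, which is why removing names alone offers so little protection.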

This is not speculation; at least one pupil has already discovered this first hand. At a ‘Hack Day’ in July, where a sample of NPD data was used as the subject of a programming event, one pupil was able to identify their own record because their combination of exam results was unique.

So what are the implications of this? Is it an issue that people who are close enough to me to know my school, year and some exam results may be able to track down my full academic record?

In the first instance, Privacy International believes that you should be able to choose how much information you disclose to the people you know. Whether you failed Maths GCSE or got an A* in English at A-Level, it is up to you who you share this with, and it is an infringement of the right to privacy if the government effectively tells people on your behalf. Schoolchildren do not, in effect, get to choose whether they are included on the NPD; given that no young person has given consent for his or her record to be made public, the data it contains should be carefully protected.

Secondly, there may well be more serious implications of being identified on the NPD. Given that most CVs include job applicants’ school and A-Level results, employers could identify many job seekers’ records on the NPD. Whether it is fair or appropriate to have your Key Stage 3 performance (years 7-9) influence a decision to employ you is certainly debatable. More concerning, however, is the fact that we do not yet know what other information will be included in the open NPD. Academic performance, school year and school attended are the bare minimum that will be included in the release, but there are around 500 other variables for each pupil contained in the NPD, some of which may be considerably more sensitive than exam results. For example, the full NPD includes columns for free school meal eligibility and special needs status; while the Cabinet Office has said that these will definitely not be included in the release, there may be other similarly sensitive variables (or variables directly related to those considered sensitive) whose privacy implications have not been properly examined.

The Open Data release from the NPD needs to be designed in a way that inherently protects anonymity and prevents records from being identified, so that there simply isn’t the risk of sensitive information being disclosed. A dataset of 500 variables has complex interactions, and the Department for Education (DfE) shows no signs of being aware of the issues that even a superficial assessment should have raised.
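One simple way to start such an assessment is to count how many records share each combination of released values and flag any combination shared by fewer than k records, the idea behind the standard "k-anonymity" measure. The article does not name this technique; it is offered here only as an illustration. The sketch below assumes flat, hypothetical rows with invented column names (`school`, `year`, `maths`, `english`), and the default threshold `k=5` is an arbitrary assumption, not a recommended value.

```python
from collections import Counter


def risky_records(rows, columns, k=5):
    """Return rows whose combination of values in `columns` is shared by fewer than k rows."""
    def key(row):
        # The combination of released values an outsider could match against.
        return tuple(row[c] for c in columns)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] < k]


# Hypothetical flattened rows; the real NPD has hundreds of columns.
rows = [
    {"school": "A", "year": 11, "maths": "B", "english": "A"},
    {"school": "A", "year": 11, "maths": "B", "english": "A"},
    {"school": "A", "year": 11, "maths": "A", "english": "C"},
    {"school": "B", "year": 11, "maths": "B", "english": "A"},
]

# Releasing only school and year: one row falls below k=2.
print(len(risky_records(rows, ("school", "year"), k=2)))  # 1
# Adding two grade columns: two rows now fall below k=2.
print(len(risky_records(rows, ("school", "year", "maths", "english"), k=2)))  # 2
```

Even in this toy example, adding two grade columns doubles the number of rows that stand out. With hundreds of variables the number of possible combinations grows very quickly, which is why checks of this kind need to be run against the actual columns proposed for release.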

The government has failed to put in the work for the NPD exams. It has failed the mock exam, and it shows no sign of doing any better in the test that may determine its future.