This weekend, the Department for Education sponsored an "appathon", allowing attendees access to the National Pupil Database (which holds information like exam results, special educational needs, truancy records and eligibility for free school meals on every child at every state school in the country) and inviting people to build "apps".
The database contains over 400 variables and the records of around 600,000 children. With so many variables, it is a relatively simple task to identify individual children who in any way stand out from the crowd, e.g. those who've performed unusually well in rare subjects. The kind of information the database holds is extremely sensitive and children may have gone out of their way to conceal it from their classmates. Make no mistake - this is intensely personal stuff, not "open data", and any suggestion otherwise betrays a fundamental misunderstanding of both categories. Accordingly, additional safeguards of process and content must be applied.
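To illustrate why many variables make individuals easy to single out, here is a minimal sketch that counts how many records are unique on a given combination of fields. The records, field names and values are entirely made up for illustration and are not the real National Pupil Database schema. The helpers `group_sizes` and `unique_rows` are hypothetical names introduced here.

```python
from collections import Counter

# Entirely synthetic records; field names are illustrative assumptions,
# not the actual National Pupil Database schema.
records = [
    {"school": "A", "subject": "Maths", "grade": "B", "free_meals": False},
    {"school": "A", "subject": "Maths", "grade": "B", "free_meals": True},
    {"school": "A", "subject": "Maths", "grade": "A", "free_meals": False},
    {"school": "A", "subject": "Maths", "grade": "A", "free_meals": False},
    {"school": "A", "subject": "Latin", "grade": "A*", "free_meals": True},
    {"school": "B", "subject": "Maths", "grade": "B", "free_meals": False},
    {"school": "B", "subject": "Maths", "grade": "B", "free_meals": True},
]


def group_sizes(rows, fields):
    """Count how many rows share each combination of values for `fields`."""
    return Counter(tuple(row[f] for f in fields) for row in rows)


def unique_rows(rows, fields):
    """Return the rows whose combination of `fields` no other row shares."""
    sizes = group_sizes(rows, fields)
    return [row for row in rows if sizes[tuple(row[f] for f in fields)] == 1]


# Knowing only the school singles nobody out; adding one more
# field (a rare subject) isolates a single record.
for fields in (["school"], ["school", "subject"]):
    singled_out = unique_rows(records, fields)
    print(fields, "->", len(singled_out), "unique record(s)")
    for row in singled_out:
        # Once a record is isolated, every other field on it is exposed.
        print("   exposed:", row)
```

In this toy data, the one pupil taking a rare subject is isolated by just two fields, and the remaining fields on that record (here the made-up `free_meals` flag) are then attached to a known individual. With hundreds of variables per record, far more combinations of this kind would be expected.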
This data is not suited to an "appathon" environment. It is a good thing that the DfE are considering new approaches, but this is important data of significant utility that must be treated and protected appropriately. A public event, with participants solicited on Twitter and no transparency of process, was not appropriate.
Last week, we asked four simple questions of the organisers in an email:
We have noted with concern the "National Pupil Data Appathon" this weekend, and that, by nature, this data is intensely personal for a protected group: school children. We note that this data is a mandatory, detailed, database of every state school child in England, and significant care should be taken over its use, to prevent harm to those who have no choice on inclusion.
Can you please clarify the following 4 points:
1) What data access arrangements will attendees be provided with?
2) What legal commitments will attendees be required to make?
3) What data protection/management guidance will be given to attendees?
3b) Given the use of an API, will synthetic data be provided for
4) Who gave permission in the first instance and can we see the letter
We thank you for your time, and for your quick and detailed response.
Last Friday, the organisers' response was "No comment".
With Secretary of State Michael Gove attending the event on Sunday, we had hoped that the DfE had satisfied its data custodian requirements. However, this writeup of the event subsequently made it clear that our concerns were in no way theoretical. The author states that one child was identifiable from a single data point, and that another person can connect the other 399 data points to the same child. The DfE's legal responsibility for maintaining the confidentiality of every student seems to have been breached.