PI response to ICO consultation on web scraping by generative AI

Privacy International (PI) responded to the ICO consultation on the lawfulness of web scraping by AI developers when producing generative AI models such as large language models (LLMs). Developers are known to scrape enormous amounts of data from the web in order to train their models on different types of human-generated content. But data collection by AI web scrapers can be indiscriminate, and the outputs of generative AI models can be unpredictable and potentially harmful.

Generative AI models are based on indiscriminate and potentially harmful data scraping

Existing and emergent practices of web scraping for AI are rife with problems. We are not convinced they stand up to the scrutiny and standards expected by existing law. If the wrong balance is struck here, people stand to have their right to privacy further violated by new technologies.

The approach taken by the ICO towards web scraping for generative AI models may therefore have important downstream repercussions for the future of people’s information rights online.

Our response to the consultation discusses the following three matters in more detail:

  1. The risks of an overly permissive approach to the “legitimate interests” test leaving the door wide open for personal data to be misused or abused in the future;
  2. The barriers to exercising information rights in the context of “invisible processing” activities like web scraping; and
  3. The potential benefit of a public registry system for generative AI models.

Download our full response to the consultation below.