No boundaries: Exfiltration of personal data by session-replay scripts
Websites have long used third-party analytics scripts to collect information about how visitors use their sites. In November 2017, researchers at Princeton found that an increasing number of sites use "session replay" scripts that collect every action the user performs while on the site, including mouse movements, keystrokes, scrolling behaviour, and the complete contents of the pages loaded. Users reasonably expect sites to receive typed data only after they've pressed the "submit" button, but all keystrokes are collected, with no indication to the user that this is happening. The collected data is sent to third-party services that can replay the session as if it were live; the data cannot be anonymised. Sites say their purpose is to gain insight into how users interact with their sites and to identify broken or confusing pages.
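The gap between what users expect and what the scripts do can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: `ReplayRecorder` and `Form` are hypothetical stand-ins, not any vendor's code, and real replay scripts run as JavaScript in the browser. The point it shows is that a recorder attached to keystroke events captures text the user typed and then deleted, even though the form itself only ever submits the final value.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReplayRecorder:
    """Hypothetical third-party recorder that logs every input event."""
    events: list = field(default_factory=list)

    def on_key(self, field_name: str, value_so_far: str) -> None:
        # Called on every keystroke, long before (and regardless of) submission.
        self.events.append((field_name, value_so_far))


@dataclass
class Form:
    """Hypothetical form on a page that also embeds the recorder script."""
    recorder: ReplayRecorder
    values: dict = field(default_factory=dict)
    submitted: Optional[dict] = None

    def type_text(self, field_name: str, text: str) -> None:
        for ch in text:
            self.values[field_name] = self.values.get(field_name, "") + ch
            self.recorder.on_key(field_name, self.values[field_name])

    def clear(self, field_name: str) -> None:
        self.values[field_name] = ""
        self.recorder.on_key(field_name, "")

    def submit(self) -> dict:
        # What the user expects the site to receive: only the final values.
        self.submitted = dict(self.values)
        return self.submitted


recorder = ReplayRecorder()
form = Form(recorder)
form.type_text("notes", "diabetes")
form.clear("notes")              # the user thinks better of it and deletes the text
form.type_text("notes", "n/a")
print(form.submit())             # {'notes': 'n/a'}
print(("notes", "diabetes") in recorder.events)  # True
```

The submitted form contains only "n/a", but the recorder's event log still holds every intermediate value, including the deleted word.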
The study went on to examine seven of the leading session-replay companies: Yandex, FullStory (used by US drugstore giant Walgreens), Hotjar, UserReplay, Smartlook, ClickTale, and SessionCam. Their scripts are in use on 482 of the Alexa top 50,000 sites.
These recordings can leak an enormous amount of data to third parties, some of it sensitive, such as credit card details, passwords, and medical conditions. Some of the services nominally require personal data to be redacted before sessions are submitted to them, but the researchers found that because redaction is applied automatically, it is imperfect and partial. In their tests, all displayed page content leaked. Ad-blocking lists such as EasyList and EasyPrivacy do not block FullStory, Smartlook, or UserReplay; EasyPrivacy has rules to block Yandex, Hotjar, ClickTale, and SessionCam. UserReplay is the only one of these services that lets publishers disable data collection from users who have set the Do Not Track (DNT) flag in their browsers; the researchers scanned the configuration settings of the sites among the Alexa top 1 million that use UserReplay and found that none honoured the DNT signal.
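Why automated redaction stays partial can be shown with a toy model. This is a minimal sketch, not how any of the named services actually work: `Element`, `redact`, `capture`, and the `honour_dnt` setting are all hypothetical. The redaction rule masks password inputs and inputs that look like card numbers, but never inspects text displayed on the page, so a medical detail rendered as ordinary page content passes straight through. The DNT check only takes effect if the publisher has opted in, which mirrors the finding that no UserReplay site honoured the signal.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Element:
    tag: str         # e.g. "input" or "p"
    input_type: str  # e.g. "password" or "text"; "" for non-input elements
    text: str        # typed value for inputs, displayed text otherwise


def redact(elements: list) -> list:
    """Hypothetical automated redaction: masks password inputs and inputs
    whose value contains exactly 16 digits (a crude card-number check).
    Displayed page text is never examined, so it passes through unchanged."""
    out = []
    for el in elements:
        is_input = el.tag == "input"
        looks_like_card = len(re.sub(r"\D", "", el.text)) == 16
        if is_input and (el.input_type == "password" or looks_like_card):
            out.append(replace(el, text="*" * len(el.text)))
        else:
            out.append(el)
    return out


def capture(elements: list, dnt_enabled: bool, honour_dnt: bool) -> list:
    # honour_dnt stands in for a publisher-side opt-in setting; it is an
    # assumption of this sketch, not any vendor's real option name.
    if dnt_enabled and honour_dnt:
        return []
    return redact(elements)


page = [
    Element("input", "password", "hunter2"),
    Element("input", "text", "4111 1111 1111 1111"),
    Element("p", "", "Prescription: sertraline 50 mg"),
]
# DNT is set in the browser, but the publisher has not enabled the option.
for el in capture(page, dnt_enabled=True, honour_dnt=False):
    print(el.text)
```

The password and card number come out masked, while the prescription text is recorded verbatim, and the user's DNT flag changes nothing unless the publisher has switched the option on.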
People must know
People must be able to know what data is being generated by devices, the networks and platforms we use, and the infrastructure within which devices become embedded. People should be able to know and ultimately determine the manner of processing.
Limit data analysis by design
As nearly every human interaction now generates some form of data, systems should be designed to limit the invasiveness of data analysis by all parties involved in the transaction and in the networks that carry it.
Control over intelligence
Individuals should have control over the data generated about their activities, conduct, devices, and interactions, and be able to determine who is gaining this intelligence and how it is to be used.
We should know all our data and profiles
Individuals need to have full insight into their profiles. This includes full access to derived, inferred and predicted data about them.