The mystery of the Amazon Echo data
Image: Anatomy of an AI system: a map of the many processes — extracting material resources, data, and human labor — that make an Amazon Echo work. Credit: Kate Crawford and Vladan Joler
With over 6.3 million Amazon Echo devices worldwide, there is a good chance these constantly active devices will record criminal behavior.
Bloomberg, which recently reported on yet another creepy feature, namely that Amazon workers are listening to what you tell Alexa, was told by workers that audio clips shared in internal chat rooms include a child screaming for help and potential sexual assaults.
Smart speakers are constantly active. They listen in short snippets (a few seconds) in order to detect whether the ‘wake word’ has been used, and then delete the information if the word is not detected. There are multiple stories about devices wrongly detecting the wake word, recording unexpectedly, speaking when a wake word has not been used, and even sending audio recordings to strangers.
Whilst Amazon and Google claim that no audio is stored on the device or sent to the Cloud unless the device detects the wake word, this is not the whole picture. The details about the collection of information at this stage of the process are vague. Amazon’s FAQs describe how customers can review voice recordings associated with their accounts and delete them. This relates to data stored in the Cloud. No mention is made of data that persists on the device itself.
The question of data that a device stores but does not transfer to servers arose in a notorious murder case involving an Amazon Echo.
In November 2015, a man in Arkansas had friends over at his house to watch football. In the morning, one of his friends was found dead in the hot tub. Police filed search warrants with Amazon, requesting any information that may have been recorded by the suspect’s Echo speaker. It was widely reported that Amazon refused to give law enforcement access to its servers until the suspect himself decided to hand over data.
What was not widely reported was that there was a disagreement about what kinds of data could be on the device itself. The police filed a warrant not only to get access to Amazon’s servers, but also to get access to the device itself.
The Amazon Echo is not designed to store and process large amounts of data. The device has microphones that are “always on”, and it processes data in the Cloud. As already noted, the devices record snippets of voice data whilst waiting for the ‘wake word’. Only once the wake word is detected do they transmit the audio to the Cloud to be processed. Google states that for its smart speaker these snippets are ‘deleted’.
Is Google Home recording all of my conversations?
No. Google Home listens in short (a few seconds) snippets for the hotword. Those snippets are deleted if the hotword is not detected, and none of that information leaves your device until the hotword is heard. When Google Home detects that you've said "Ok Google" or that you've physically long pressed the top of your Google Home device, the LEDs on top of the device light up to tell you that recording is happening, Google Home records what you say, and sends that recording (including the few-second hotword recording) to Google in order to fulfill your request. You can delete those recordings through My Activity anytime.
Amazon is less clear, merely stating:
3. Is Alexa recording all my conversations?
No. By default, Echo devices are designed to detect only your chosen wake word (Alexa, Amazon, Computer or Echo). The device detects the wake word by identifying acoustic patterns that match the wake word. No audio is stored or sent to the cloud unless the device detects the wake word (or Alexa is activated by pressing a button).
However, referring only to audio data is not the full story.
In the Arkansas case, Amazon referred to the Bentonville Police’s statement that they were able to “extract the data” from the Echo device and did not contest it. It only stated that “Data extracted from the Echo device would not have included any voice recording from Alexa because such data is stored remotely.”
Such disagreements indicate that we do not know enough about our devices and what they may or may not record. We are largely ignorant of the full range of data that connected devices generate about users: what is collected by servers, and what persists on the device itself and could thus be extracted by those with the technical means. Unless we have the requisite skills, it is extremely difficult to gain insight, and mechanisms such as subject access requests, where data protection laws exist, are unlikely to give the full picture. To illustrate this, we made a subject access request to Amazon in relation to an Echo Dot.
Amazon stated in the investigation into James Bates, the suspect in the Arkansas case, that “No audio recording of the user’s requests is stored on the device itself.” Privacy International’s own subject access request to Amazon nevertheless revealed that the Amazon Echo Dot itself holds logs. In correspondence, Amazon stated that:
“With regard to your request for personal data held on your Echo Dot device, we are unfortunately unable to provide you with this. In order for us to extract the data, in this case usage logs, the device needs to be placed online, after which a request of the logs to be uploaded can be processed and the data gathered. As your Echo Dot was not connected online, we hold no data in relation to your Echo Dot.”
Privacy International then connected the device and sought usage logs and other data, and also posed a number of questions. Amazon’s responses are as follows:
“Echo Dot devices store a limited amount of data locally. Although we have no obligation to provide you data that is held locally on your device and that we don’t process, since you have specifically asked for it we have extracted that data from your device…”
What data is held on the Echo Dot which is not uploaded to the cloud i.e. in addition to usage logs?
As mentioned, the amount of data that Echo Dot devices store locally is limited. Some data uses small caches that are constantly overwritten, such as our on-device technology for detecting the “wake word” and device logs. Other data is retained on the device, such as information about the customer’s Wi-Fi network, the wake word the customer has configured for their device (“Alexa,” “Echo,” “Amazon,” or “Computer”), and alarms the customer has set. Customers are able to view and change their device settings in the Alexa app.
Is it possible for the owner of the device to obtain and view the data held locally, when it has not been either uploaded to the cloud or extracted upon request by Amazon?
No, the owner of the device generally does not have access to this cache.
Is it possible for the owner of the device to delete that data without a request to Amazon?
Yes, the owner of the device may perform a factory reset by following the instructions at https://www.amazon.co.uk/gp/help/customer/display.html/ref=help_search_1-1?ie=UTF8&nodeId=202080910&qid=1520948814&sr=1-1.”
As connected devices such as the Amazon Echo play an increasing role in criminal proceedings, we are concerned that:
- We do not know what data connected devices in the home may collect, including accidentally;
- We do not know what they store on the device;
- We do not know how long the data on the device remains and whether it is ever fully erased;
- There is a risk of unequal access to data, whereby the police believe they can access data that the owner of the device cannot or does not know exists.