Metadata

What is metadata?

When we interact over our modern communications infrastructure, the content of our communication is not the only information we send. We also send data about the communication that allows the communication to successfully reach its intended recipient. Traditionally this is what was called communications metadata - data about data.

Under the traditional definition of metadata, this information about our communications would include:

  • the location that it originated from, e.g. home address of the telephone, subscription information, nearest cell tower
  • the device that sent or made the communication, e.g. telephone identifier, IMEI of the mobile phone, relatively unique data from the computer that sent a message
  • the times at which the message was made and sent
  • the recipient of the communication and their location and device, and time received
  • information related to the sender and recipients of a communication, e.g. email address, address book entry information, email providers, ISPs and IP address, and
  • the length of a continuous interaction or the size of a message, e.g. how long was a phone call? how many bits in a message?

And that’s just to name a few. When we use the internet we similarly leave a record – the things we search for, the websites we visit and the time we spend there, and the files we download. Even our actions on social networking sites – to whom we are connected, what things we “like”, what advertising we respond to – create a behavioural record. All of these actions constitute communications metadata.

But metadata also includes data generated by our devices:

  • the precise location of our phone while it is switched on
  • where we were when our device checked for new emails
  • where we were when our device checked for any new social media updates, application updates or any similar automated checks

Again, that's just to name a few. This information is constantly being generated by our devices and in particular, our smartphones. The new and more expansive definition of metadata is such that we send metadata continuously, even when we are asleep, because our devices are in near constant contact with some component of the communications infrastructure.

Without appropriate protections on metadata, we can be tracked and profiled on an almost permanent and continuous basis.

If it's not the content of my communications, how does the collection of metadata invade my privacy?

Companies and governments that collect and store your metadata don't admit that this information can reveal a lot about you. In fact, metadata reveals even more about you than the content of the communications ever could.

The documents released by whistleblower Edward Snowden confirm that governments are actively seeking our metadata, which never lies and is always being generated. Because of antiquated legal interpretations, metadata is ironically considered less sensitive information even though it can be used to map your life, interests, and likely future.

Taken alone, pieces of metadata may not seem to be of much consequence. However, technological advancements mean that metadata can be analysed, mined and combined in ways that make it incredibly revelatory. When accessed and analysed, metadata can create a comprehensive profile of a person’s life – where they are at all times, with whom they talk and for how long, their interests, medical conditions, political and religious viewpoints, and shopping habits.

Metadata paints a picture about an individual’s patterns of behaviour, viewpoints, interactions and associations, revealing even more about that person than the content of their emails or phone calls might. The mere fact that such information is collected and stored by internet companies shows how valuable it can be. Applying this capability across a nation, it is possible to compile a very detailed and invasive picture of the entire population including their behaviours and interactions.

Such a powerful insight could be used to predict future behaviour including voting behaviour or sentiment of the population towards officials. This can fundamentally undermine democracy and decrease confidence in communications technologies that underpin modern societies and economies.

It is for this reason that PI and other privacy organisations have been arguing for stronger and equivalent protections for metadata. According to the International Principles on the Application of Human Rights to Communications Surveillance that we co-led, "Existing legal frameworks distinguish between "content" or "non-content," "subscriber information" or "metadata," stored data or in transit data, data held in the home or in the possession of a third party service provider. However, these distinctions are no longer appropriate for measuring the degree of the intrusion that Communications Surveillance makes into individuals’ private lives and associations. While it has long been agreed that communications content deserves significant protection in law because of its capability to reveal sensitive information, it is now clear that other information arising from communications – metadata and other forms of non-content data – may reveal even more about an individual than the content itself, and thus deserves equivalent protection."

Even though governments want to downplay the value of this information as they rush to gain access to it, the former head of the CIA, Michael Hayden, remarked in 2014: "We kill people based on metadata".

Why do companies and governments want metadata?

By generating a profile of an individual’s private life and his or her interactions, companies are able to more accurately target marketing and advertising. With 96% of Google’s $37.9 billion revenue in 2011 coming from advertising, our information is the life-blood that sustains that industry.

It is no wonder, then, that governments are doing all they can to get their hands on our metadata. Leaked classified NSA documents show that the US government has acquired access to phone metadata from most major US phone providers. In effect, then, the government can generate a complete profile of every single person who communicates within, to, or via the US. Even if, as we’re told, the NSA is only interested in analysing the private lives of terrorists, this nevertheless places an extraordinary amount of power in the hands of the US government. 

The NSA is not the only agency getting its hands on our metadata. In Britain and Australia, governments seek access to metadata held by domestic phone companies and ISPs around 500,000 times a year. In South Korea, there were around 30 million requests for metadata in 2011-2012 alone.

Since metadata reveals relationships, access requests for this information are often broad. They are used to generate a picture of interactions by looking at everyone within a number of "hops" of the person under investigation, usually three in total.

This means that innocently associating with someone who associates with someone being investigated using this type of surveillance automatically places you within the net, without any suspicion of wrongdoing on your part. Furthermore, since you have no control over other people's activities, you may be powerless to prevent the state from inspecting every detail of your life.
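The "hops" idea can be illustrated with a breadth-first search over a contact graph. The following is only a minimal sketch under stated assumptions: the call records, the names and the function `within_hops` are all hypothetical, and real analysis tools are not described in this article.

```python
from collections import deque


def within_hops(contacts, target, max_hops=3):
    """Return {person: hop_count} for everyone within max_hops of target."""
    distances = {target: 0}
    queue = deque([target])
    while queue:
        person = queue.popleft()
        # Do not expand beyond the hop limit.
        if distances[person] == max_hops:
            continue
        for other in contacts.get(person, ()):
            if other not in distances:
                distances[other] = distances[person] + 1
                queue.append(other)
    del distances[target]  # the target is not "swept up", only the contacts
    return distances


# Hypothetical call records: each pair means "these two numbers were in contact".
calls = [
    ("suspect", "alice"),
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "dave"),
]

# Build an undirected contact graph from the records.
contacts = {}
for a, b in calls:
    contacts.setdefault(a, set()).add(b)
    contacts.setdefault(b, set()).add(a)

swept_up = within_hops(contacts, "suspect")
print(swept_up)
```

In this toy data, carol has never been in contact with the suspect or with anyone who has, yet she falls inside the three-hop net; only dave, at four hops, is outside it.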

The companies who are part of the global surveillance industry are marketing and developing tools to analyse metadata. ETI A/S' Evident provides, in a basic package, 1,000 selectors relating to metadata including IP address, Chat nickname, Email address, User login name, Webmail login name, and more.

With so many points of data available to be selected and analysed, the digital trail that we generate and that our devices make available gives, when all the pieces are taken together, a more accurate picture of our lives than even we may be aware of. And the way it is used means that even if you've done nothing wrong, you are likely to be included in dragnets.

Do I have any legal protections against the misuse of my metadata?

Metadata is accorded far fewer legal protections than the content of our communications. Whereas in many democratic States law enforcement authorities require judicial orders to tap our phone calls and read our emails, a simple written request may be all that is necessary to access detailed records about our private lives. It is not only law enforcement authorities that are entitled to make such requests – a large variety of public bodies may also have access to our metadata, from local government authorities to revenue watchdogs. When the entire justification for extensive metadata collection and disclosure is the prevention of terrorism, it is difficult to understand why our sensitive, private data is ripe for the picking by authorities that play no role in policing or security.

Recently, however, there have been significant developments when it comes to collecting metadata. In particular, the European Court of Justice invalidated the European Union’s 2006 Data Retention Directive, stating that the mass collection of metadata is an interference with the right to privacy, and that access to this data cannot be justified under vague references to combating serious crimes or terrorism. The Court said that if access to this sensitive data is granted, such access must be subject to prior review "carried out by a court or by an independent administrative body."

Further afield, the Supreme Court of the Philippines struck down a section of an Act that would have allowed law enforcement to intercept, among other things, a communication's origin, destination, and route. The court decided that the insight that metadata provides was too great and had to be considered an interference with the right to privacy. Because of this, and because no warrant procedure was in place, the court removed the section from the Act.

Even the White House Review in response to the Snowden revelations came to a similar conclusion, while referencing our International Principles. "The assumption behind the argument that meta-data is meaningfully different from other information is that the collection of meta-data does not seriously invade individual privacy. As we have seen, however, that assumption is questionable. In a world of ever more complex technology, it is increasingly unclear whether the distinction between “meta-data” and other information carries much weight.[...] the legal system has been slow to catch up with these major changes in meta-data, it may well be that, as a practical matter, the distinction itself should be discarded."

One modern challenge in protecting metadata, however, is that your metadata can be generated in multiple jurisdictions, and accordingly it may be unclear which legal regime applies and which governments can have access to it.

Are there any technical means to protect my metadata?

While it is possible to secure the content of messages with some ease, it is more challenging to protect metadata. There are a number of approaches to tackling the metadata problem; each of these addresses a narrow part of the issue and cannot deal with the totality of your metadata picture.

The Tor Project uses encryption and a set of computers operated by volunteers to package the metadata in such a way that communications appear to come from this set of Tor computers and not from the actual users. The websites therefore do not know the IP addresses of the users. Furthermore, because multiple intermediate nodes are used, no single Tor node has the complete metadata picture of the communication. An extension to this allows users to interact with hidden services whose ultimate destination address they do not know. In combination, these two approaches make it very difficult for an adversary to collect all information about a user's interaction with a hidden service over the Tor network.
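The point that no single node holds the complete metadata picture can be illustrated with a toy model. This is only a sketch under loud assumptions: nested dictionaries stand in for the layers of encryption, so nothing here is actually encrypted, and the relay names, addresses and the functions `wrap` and `relay_views` are hypothetical and are not the real Tor software or protocol.

```python
def wrap(route, destination, payload):
    """Build nested layers, innermost first. Each layer names only the next hop.

    In a real onion-routing system each layer would be encrypted so that only
    the relay it is meant for can open it; plain dicts stand in for that here.
    """
    packet = {"next": destination, "inner": payload}  # layer for the last relay
    for relay in reversed(route[1:]):
        packet = {"next": relay, "inner": packet}
    return packet


def relay_views(client, route, destination, payload):
    """Record what each relay learns: who handed it the packet, and where it goes next."""
    packet = wrap(route, destination, payload)
    previous = client
    views = {}
    for relay in route:
        views[relay] = {"previous": previous, "next": packet["next"]}
        previous = relay
        packet = packet["inner"]  # the relay peels its own layer and forwards the rest
    return views


# Hypothetical addresses, using documentation-only IP ranges.
views = relay_views(
    client="198.51.100.23",
    route=["entry-relay", "middle-relay", "exit-relay"],
    destination="website.example",
    payload="GET /",
)
for relay, seen in views.items():
    print(relay, seen)
```

In this model only the first relay sees the client's address and only the last relay sees the destination, so linking the two requires observing or controlling more than one point on the path.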

Other solutions, such as proxies and virtual private networks (VPNs), offer some degree of privacy protection with respect to third parties, but the proxy and VPN services themselves know your address and can be compelled to turn over their logs, or to generate logs that they advertised they would not keep.

All of the solutions presented above do suffer from some drawbacks and can have vulnerabilities that unmask a user in certain circumstances. It is important that users familiarise themselves with these issues before using the software. For example, if all your traffic is flowing via one of the solutions above and one of your online accounts checks for updates, it may be possible to correlate your anonymised activity to your actual identity.
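The correlation risk in that last example can be sketched as follows. This is a simplified illustration, not a description of any real tool: the event log, the addresses, the 60-second window and the function `link_sessions` are all assumptions made for the example.

```python
from collections import defaultdict


def link_sessions(events, window_seconds=60):
    """Link anonymous events to an account seen from the same address close in time.

    events: list of (timestamp, source_address, account_or_None) tuples.
    Returns (timestamp, source_address, account) for each anonymous event
    that can be tied to an identified one.
    """
    identified = defaultdict(list)
    for ts, address, account in events:
        if account is not None:
            identified[address].append((ts, account))

    linked = []
    for ts, address, account in events:
        if account is not None:
            continue
        for id_ts, id_account in identified.get(address, ()):
            if abs(ts - id_ts) <= window_seconds:
                linked.append((ts, address, id_account))
                break
    return linked


# Hypothetical observations at a website or network vantage point.
events = [
    (1000, "203.0.113.7", None),                 # "anonymous" browsing via a VPN exit
    (1020, "203.0.113.7", "alice@example.org"),  # automated mail check, same exit
    (5000, "203.0.113.7", None),                 # much later, outside the window
    (1010, "198.51.100.4", None),                # different address entirely
]

linked = link_sessions(events)
print(linked)
```

Here a single automated account check from the same exit address, 20 seconds after the anonymous visit, is enough to attach a name to that visit.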