Why Twitter has taken the wrong fork in the road
Earlier this week it was announced that UK-based DataSift would start offering their customers the ability to mine Twitter’s past two years of tweets for market research purposes. The licensing fees will add another revenue stream to Twitter's portfolio, but at what cost to the company's reputation? Twitter, once the darling of the privacy world, seems to have lost its way.
DataSift don't believe that there are any privacy implications to their new service. In fact, they didn't mention privacy even once in their announcement. They simply asserted that there was no controversy because this was 'public' information anyway.
The irony for DataSift is that either this datastore is unique (which makes DataSift's access to two years of tweets special), or it is 'public', in which case DataSift has nothing new to offer under the sun. If it is the latter, then this is a non-story. But clearly it is a story, because otherwise DataSift wouldn't be so proud.
In the abstract, you still have a right to privacy in a 'public space'. This is a topic for a much longer discussion. Imagine CCTV cameras installed for security whose feeds are then made available to the shops on the high street, so that they can see how long you spent looking into the windows of Victoria's Secret or Boots. Just by walking around outside with your mobile phone on, you allow companies like Path Intelligence to access information about your phone in order to track your movements. Have you no rights in these situations? Of course, you can choose not to go outside. Or, as is the case in a democratic society, you retain rights. Anyone who collects your personal information must take some care in how it is processed.
Of course these examples are different from the current case, but just because something is public doesn't mean that it can be appropriated and sold, or that you can be subjected to unwarranted surveillance. It's not as though you have complete rights over this information either, but you do retain some rights. I know this is a sticky point, but just because something is in the public domain doesn't mean that it's open season on your data.
If Twitter is conversational, which it is, it is akin to speaking in a pub, or speaking amongst friends and colleagues in a large room, or even in a park. You don't expect that others will be around just to judge you, but I guess on Twitter we learn to live with that (we can, for instance, block some people from following us, though they can still read our tweets via the website and RSS feeds). What you don't expect is for two years of your conversations to be monitored, profiled, and sold off to others. If someone were following you and did this, you would be severely annoyed. If a company does this to make money from your information and make your thoughts part of a market-research initiative, you would have some reason to be perturbed.
Twitter controls how you can get access to its historical data. Legally, in this case, it is 'Twitter's data'. DataSift is proclaiming that they are the first company to have access to it, because previously Twitter guarded this information carefully. Yes, other Twitter users could access it, but not easily enough to scrape it.
As a user of Twitter, you expect that what you say will be accessible to others, but you don't expect that this information will be data mined. So yes, one of your tweets could result in a news story (as many celebrities and politicians discover, or enjoy). But you don't expect that your tweets over a two-year period will be dissected to see your attitudes towards a company. Public interest is a legitimate exemption from privacy law; helping market research isn't.
Now, if you use Twitter and say anything about a political party, a product, a place you visited, etc., your tweet will become part of a dataset that will be mined and sold onwards. You have to admit that this isn't the same Twitter we thought it was.
Even Google and Facebook don't do this with your information, even with information that is 'publicly available'. Those two companies are VERY careful to stress that they do not sell information to third parties, and they would be burned alive if they allowed a third party to 'scrape' information from their sites.
Twitter 'owns' your tweets. They govern who has access to them. Law enforcement authorities want access to the detailed data that Twitter holds, and Twitter certainly doesn't hand it over without a fight, because they know it is your personal information.
The fact that they are giving another organisation privileged access shows that they can make decisions about how the information is processed. That makes Twitter, under data protection law, a data controller. As such, they have duties: they need to notify us of how our information is being used, and they need to place restrictions on how the information is further processed.
Access of this nature is unprecedented. DataSift is the first to note this because they want the acclaim. They are also getting more granular access to data; exactly what this covers isn't entirely clear, but it includes location, and possibly IP address information from the device your tweet was sent from, etc. This isn't necessarily information that general Twitter users can access, and certainly not over a two-year period.
Individual users can't even get access to this information about themselves: we recently filed subject access requests with Twitter for our personal information, and Twitter responded with only three months' worth of data. Possibly we should infer that Twitter keeps this non-tweet-content information for only three months, but now it is available for two years to DataSift? This doesn't make sense.
Second, DataSift has to follow some rules. They are taking the tweets, mining them, assigning moods, etc., and then selling the results onwards. Do you know that they aren't going through your tweets and selling your assigned behavioural profile to Coca-Cola/Walmart/etc.? What's to stop that subsequent firm from then targeting you? Presumably, what stops it is that there are things DataSift will and will not do with your data. Well, they have to be more explicit about what they are and are not going to do with this information. They can't just say 'it's all public' and wash their hands of any obligations.
Consider what would happen if there were a large public protest, e.g. a political event, a march, etc. Say the police wanted access to all tweets and geographic information on all users who attended the protest. Previously they would have had to make a legal request to Twitter. Now they could just buy this information from DataSift (and DataSift said this week, in an interview on BBC News24, that the public sector could buy this information).
Governments are investing massively in social media monitoring. Why bother when you can just buy the analysis from a third-party company? The way the law works, it may be legally problematic for a law enforcement agency in a democratic country to mine the information itself, but it is perfectly legal for it to buy the information from a company.
This is all about the ambition of 'Big Data'. Some firms hope to make billions off this elusive concept (which is why I liked the other quotes in the media about how this information isn't particularly useful). They want to predict trends. Social media surveillance is big business: it promises to identify the next political revolution, the next movie that will make it big, the next big product. The grist for this mill of big data is personal information.
This is arguably true, but companies need to make the argument that they have a right to this information in the first place, that they have a right to mine it for their own purposes even though it is yours. And it is all done without you being able to see what is being decided based on your past actions and preferences.
Every day Google gets requests for takedowns of search results emerging from 'public' information. This is possible because individuals can see what information is indexed and identify what is problematic (and what sometimes places people at risk). Google is responsive and responsible most of the time in these cases (provided it doesn't interfere with free expression, and we nearly always agree with them). And, for the most part, we know what Google is doing with this information. DataSift, by contrast, is getting vast amounts of information, and we don't know what they are doing with it, how long they are keeping it, who they are selling it to, or how that information is then being used. Do you remember what you said two years ago? No. Can you easily get access to it? Not really. Can your friends? Not easily. But can DataSift and all their thousands of clients? Absolutely.