Whisper it: there’s more to personal data than “PII”

Inigo Montoya: "You keep using that word..."

If you’re running a social media app that “promises users anonymity and claims to be ‘the safest place on the internet’,” you probably don’t want to be splashed across the Guardian for tracking the location of users, “including some who have specifically asked not to be followed.” And you really don’t want the report to include quotes from your executives like this (concerning a user who claimed to be a “sex-obsessed lobbyist in Washington DC”):

“He’s a guy that we’ll track for the rest of his life and he’ll have no idea we’ll be watching him.”

So, it looks like Whisper may have (to put it mildly) a reputation problem to address.

But the phrase that leapt out at me was this, from the company’s response to the Guardian’s claims:

Whisper does not request or store any personally identifiable information from users, therefore there is never a breach of anonymity.

It’s our old friend, “personally identifiable information”. This one comes up a lot in contracts relating to the use of personal data: service providers will insist that they are not accessing or using any “personally identifiable information”, and so there’s no problem with privacy or data protection compliance.

The problem is that the definition of “personal data” under the Data Protection Act goes rather wider than the popular conception of “personally identifiable information”, or “PII”. This phrase, as used in commercial practice (though see below), seems to lack any precise definition, but is usually taken to mean information such as name, address, email, social security number, etc.

The Data Protection Act, however, defines personal data as

data which relate to a living individual who can be identified:

(a)  from those data, or

(b)  from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller.

The point to note is that there are two limbs to this definition. If “personally identifiable information” has any precise meaning at all, I’d say it’s as the first limb of that definition: information from which a living individual can be identified directly.

But that still leaves the second limb: information which isn’t enough, on its own, to identify an individual, but which could be used to identify that individual if combined with other data. And the point that gets overlooked by people fixated on “personally identifiable information” is that this “second limb” information is just as much personal data for UK/EU data protection purposes as the “first limb” information.
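
For the more technically minded, here is a rough sketch in Python of how "second limb" identification can work in principle. Everything in it is invented for illustration: the device IDs, the grid cells, the names and the `directory` lookup are hypothetical, and this is not a description of how Whisper or any real service processes data. The point is only that records containing no name at all can still single out a person once they are combined with other information.

```python
from collections import Counter

# Hypothetical "non-PII" records: a device ID and a coarse location cell.
# No names, addresses or emails appear anywhere in this data set.
pings = [
    {"device": "a91f", "cell": "cell-17"},
    {"device": "a91f", "cell": "cell-17"},
    {"device": "a91f", "cell": "cell-03"},
    {"device": "c27d", "cell": "cell-42"},
]

# Hypothetical "other information" that a controller might hold or obtain,
# e.g. a directory linking a location to a named occupant.
directory = {
    "cell-17": "Alice Example",
    "cell-42": "Bob Example",
}


def most_frequent_cell(records, device):
    """Return the location cell that appears most often for one device."""
    counts = Counter(r["cell"] for r in records if r["device"] == device)
    return counts.most_common(1)[0][0]


def reidentify(records, lookup):
    """Link each device to a name via its most frequent cell, if known."""
    linked = {}
    for device in sorted({r["device"] for r in records}):
        linked[device] = lookup.get(most_frequent_cell(records, device))
    return linked


linked = reidentify(pings, directory)
print(linked)
```

Neither data set identifies anyone by itself being read in isolation from the other: `pings` has no names, and `directory` says nothing about devices. It is the combination that does the identifying, which is exactly the situation limb (b) is aimed at.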

Indeed, it should be noted that the legal definition of “personally identifiable information” under US law (if Wikipedia is to be believed) also includes this concept of “indirectly” identifying information. So it’s not even a “US vs EU” issue: it’s a “commercially convenient meaning” vs “actual legal meaning” issue.

The Guardian reports that Whisper has updated its privacy policy to warn users (somewhat belatedly) that the app’s geolocation feature may “allow others, over time, to make a determination as to your identity” – which, to my mind, amounts to an admission that this information is personal data within that second limb.

It remains to be seen how the Whisper story will play out, but it’s already a good lesson in the problems of widely-used but vaguely-defined (or misunderstood) terms like “personally identifiable information”, and the need for lawyers to be tediously pedantic in insisting that even so-called “non-PII” may still be subject to the rigours of data protection legislation.

Schofield’s Laws of Computing: your cut-out-and-keep guide

Jack Schofield (and Pipe), by Aleks Krotoski

The Guardian’s computer editor, Jack Schofield, has formulated three “laws of computing” over the years.

I’ve found these useful to keep in mind, both in my own personal use of computers and in advising on IT contracts, so here is a quick post bringing all three together in one short list:

  • Schofield’s First Law: never put data into a program unless you can see exactly how to get it out.
  • Schofield’s Second Law: data doesn’t really exist unless you have at least two copies of it.
  • Schofield’s Third Law: the easier it is for you to access your data, the easier it is for someone else to access your data.
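
As a small illustration of the Second Law, the sketch below checks whether a file has at least one other copy with identical contents. It is only a sketch under stated assumptions: the function names (`sha256_of`, `count_good_copies`, `satisfies_second_law`) and the file names are made up for this example, and a copy sitting on the same disk as the original would not protect you from a drive crash or a stolen laptop.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def count_good_copies(original, candidates):
    """Count the original plus every candidate file whose contents match it."""
    expected = sha256_of(original)
    matching = [
        c for c in candidates
        if Path(c).is_file() and sha256_of(c) == expected
    ]
    return 1 + len(matching)


def satisfies_second_law(original, candidates):
    """True if at least two identical copies of the data exist."""
    return count_good_copies(original, candidates) >= 2


# Demonstration in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    photo = Path(tmp) / "photo.jpg"
    photo.write_bytes(b"not really a photo")
    backup = Path(tmp) / "backup" / "photo.jpg"

    # The backup does not exist yet, so only one copy is found.
    only_one_copy = satisfies_second_law(photo, [backup])

    backup.parent.mkdir()
    shutil.copy2(photo, backup)

    # After copying, the contents match and two copies are found.
    two_copies = satisfies_second_law(photo, [backup])

print(only_one_copy, two_copies)
```

Comparing checksums rather than just checking that a backup file exists catches the case where the "copy" is truncated or corrupted.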

Of those, I’d say the first is the most useful in a commercial IT context. (This post was prompted by reviewing a contract in which our ability to do this isn’t set out as clearly as I’d like, though I’m sure it’s not going to be a problem in practice.)

In my personal IT use, it’s the second law that is the one I always keep in mind – and that I try to find a gentle way of pointing out to people when they are mourning the loss of family photographs in a hard drive crash or laptop theft. Dropbox is your friend, people! 

The third law is both vaguer and probably of wider application – particularly after the Year of Snowden has highlighted how porous supposedly “secure” IT systems can be.