Defining personal information

Last week in DC I had lunch with an old colleague of ours, a CPO at a company. We had a wide-ranging discussion, but the most fascinating part was about the term 'personally identifiable information', or 'PII'. In case you were wondering, privacy advocates do spend much of their time talking shop, and sometimes quite arcane issues arise, but we're not entirely boring people.

I have never liked the term 'PII': it is a very North American term, and it is laden with 1980s notions of identifiability. For a start, what is the difference between personal information (PI), personally identifiable information (PII), sensitive personal information, and personal health information (PHI)? And how do all of these differ from de-identified personal information, from plain old identifiers that may not be personal, or from information that may appear to have nothing personal in it at all?

This matters a lot if you are trying to draft a law to protect personal information. But exactly what constitutes personal information, or even PII, is still uncertain. In 2007, for instance, Europe's privacy regulators tried to set out a process for deciding what counts as personal information. How do you regulate something you can't define?

My colleague then put the question to her network of colleagues and friends (yes, more privacy geeks). I have de-identified the responses (to some extent) below, where 'PI_Friend' is my lunch partner and the De-identified Individuals ('DII') are numbered accordingly.

The interesting thing about the conversation below is that it is a discussion amongst privacy professionals, and yet nothing clear emerges other than a dislike of the term PII and an acceptance that there is much greyness in the very notion of personal information. This is going to make the future of regulation quite interesting.


PI_Friend:  Fellow privacy professionals--looking for the best definition of PII...whaddya got?

DII1: any information that can be used to identify, locate or contact a natural person

DII2: I am a purist. I don't like the second "I", just PI. I like that personal information is "Any information about an identifiable, locatable, or contactable individual." I like that it is "about" the individual. I never know what "natural" is anymore.

DII2: But the other thing is that some information may not be precisely this (individually identifiable, locatable, or contactable) but still may carry privacy-like obligations. Since "non-personal information" really does not cut it, I think we need some new terms for the quasi-personal information out there.

PI_Friend: WHAT is quasi-personal information? And I assume your P and I are just Personal Information?

DII3: First, yes, Personal Information. "Quasi"-personal information is the "non"-personal information of the new decade. It separates information about people who cannot in the traditional or direct ways be identified, located, or contacted. However, the information still may carry privacy-like rights or obligations. Like for OBA profiles. I just think that the way technology is going, we need more than the black and white definitions from the last millennium.

DII4:  I must weigh in with DII3 about the PI as I have avoided the term PII for the last 12 years.

DII5: I think of PII as information that can be used to uniquely identify, contact, or locate a single person OR that can be used with other sources to uniquely identify, contact, or locate a single person. Some information that may appear to be non-PII - using the various existing definitions of PII - can be used to identify an individual when combined with multiple pieces of other information or when used in a specific context.

DII3: In a sort of quasi sort of manner or fashion unlike "somewhat" or "to some people" or, in the case of our OBA friends, "them's fightin' words." Let's all get together and rethink our vocabulary.

DII1: I'm with DII3 on PI (not PII, and also not PHI, NPPI, NPPFI, III or any of the other crazy acronyms we use). I'm not sure about the quasi stuff. The BA folks say their data isn't PI as it's not "identifiable" but it's used for profiling and pushing through ads - to me that's "contact" - maybe we need to define those words "identify" "locate" and "contact" instead.

DII5: Are you looking for a definition that describes specific data objects, or a definition that takes into account the fact that all of this is contextual? For example, my last name and the zip 10025 is not PII, but my last name and the zip 94109 *is*.

DII6: BTW - TRUSTe's definition: "Personally Identifiable Information [PII]" means any information collected, either by itself or in conjunction with other information (i) that can be used to identify, contact, or locate an Individual or (ii) where identification or contact information of an Individual can be derived. To the extent any information (which by itself is not Personally Identifiable Information) in combination with other information (whether or not either element of information is PII or not) can be used to identify, contact, or locate an Individual, then the combination of such information also will be considered Personally Identifiable Information.

DII7: Would quasi-personal information be the kind of information that by itself may not be identifiable, but could be aggregated with other information to identify the person (I'm thinking about the questions that many sites are using to verify people - your car's first color, mother's middle name, etc)?

DII8: if seeking a universally agreed upon definition, perhaps PII stands for Prohibitively Impossible Inquiry, Plausibly Indefinable Item, but then again, Perchance I’m Idiotic...

DII9: I find trying to define PII to be a fairly pointless exercise. The real question is whether the data has any potential privacy impact - now or in the future. What we think is anonymous today could become identifiable tomorrow. So if forced to come up with something, I would strip it down even more than DII3's suggestions - something like "any information about a person." An OBA profile, an IP address, even a cookie ID can potentially have a privacy impact, and should be protected as such. But not all PII is created equal, so the next (and more difficult and more important) question is what level of protection is warranted for this particular PII.

DII3: Right, the point is, "So, I have this information. What are my obligations? What are the business risks and what should I do about them? What are the risks to people and what should I do about them?" Those are the three questions that need to be answered for information about, derived from, associated with, near, generated, etc. people, mutants, extraterrestrials, etc.
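
DII5's point about context (the same last name being PII in one zip code but not in another) can be made concrete with a rough sketch. The Python below is only an illustration under stated assumptions: the records are entirely made up, the helper names `group_sizes` and `is_unique` are hypothetical, and "only one record matches" is just one possible stand-in for "identifiable", not a definition anyone in the thread proposed.

```python
from collections import Counter

# Hypothetical, made-up records; names, zips and counts are illustrative only.
RECORDS = [
    {"last_name": "Smith", "zip": "10025"},
    {"last_name": "Smith", "zip": "10025"},
    {"last_name": "Smith", "zip": "10025"},
    {"last_name": "Smith", "zip": "94109"},
    {"last_name": "Jones", "zip": "94109"},
    {"last_name": "Jones", "zip": "94109"},
]


def group_sizes(records, fields):
    """Count how many records share each combination of the given fields."""
    return Counter(tuple(r[f] for f in fields) for r in records)


def is_unique(records, fields, values):
    """Return True if exactly one record matches this combination of values."""
    return group_sizes(records, fields)[tuple(values)] == 1


# Same two fields, different values, different answers.
print(is_unique(RECORDS, ["last_name", "zip"], ["Smith", "10025"]))  # False
print(is_unique(RECORDS, ["last_name", "zip"], ["Smith", "94109"]))  # True
# The last name on its own does not single anyone out in this data.
print(is_unique(RECORDS, ["last_name"], ["Smith"]))  # False
```

In this toy data the field types never change, only the values and the surrounding population do, which is why a definition based on a fixed list of "PII fields" struggles with cases like this.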

