Give up on privacy. This is what you should worry about instead

Privacy is an illusion in the world of big data and data mining

Privacy is an illusion. It simply doesn’t exist. You probably don’t believe me. Until recently, it was easy for us to have the illusion of privacy. Now, with accelerating increases in computational power and storage, and the incredible sophistication of data mining and machine learning, that fake veil of “privacy” is about to be torn down–and it will be shown to have been nothing more than a mirage in the first place.

The argument is simple, and comes in two parts. First, there is a syllogism:

Given: There is some set of data {x} that is public
Given: There is a set of data {y} that can be inferred or computed based on {x}

Then: The set of data points {y} must also therefore be considered public.

It’s a little abstract, but it’s tough to dispute. If your address is publicly listed as 1234 Main Street, Unit 1 and 1234 Main Street is publicly listed as an apartment building that only contains rental units, then logically one can conclude that you rent your apartment. The fact that you are a renter is therefore also public information. How could it not be?

The second “given” in the syllogism can be stated as a mathematical function,

{y} = F({x})

This is just another way of saying that there is some function F, some calculation that you can perform, that lets you figure out y from x. If that’s true, it’s difficult to justify any argument that {y} could be private if {x} is not. Sure, one might hope that the calculation of F is really, really hard, and that nobody would bother doing it. But that’s not a principled argument.
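To make this concrete, here’s a toy sketch in Python of the renter example above. All of the records and field names are invented, but it shows how {y} (the fact that you rent) falls out mechanically from {x} (public address and building records) once somebody bothers to write down F.

# Toy illustration of {y} = F({x}): deriving a "private" fact (renter status)
# purely from hypothetical public records. Names, addresses and fields are made up.

PUBLIC_PERSON_RECORDS = {
    "Jane Doe": {"address": "1234 Main Street, Unit 1"},
}

PUBLIC_BUILDING_RECORDS = {
    "1234 Main Street": {"type": "apartment building", "all_units_rented": True},
}

def f_is_renter(name):
    """F: compute whether someone rents, using only the public sets above."""
    street = PUBLIC_PERSON_RECORDS[name]["address"].split(",")[0]
    building = PUBLIC_BUILDING_RECORDS[street]
    return building["type"] == "apartment building" and building["all_units_rented"]

print(f_is_renter("Jane Doe"))  # True -- {y} is now just as "public" as {x}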

It’s like what computer security experts refer to as “security through obscurity”: a system isn’t actually secure if you simply “hide” the way to get in and hope nobody finds it. It’s like hiding porn from your spouse by burying it seven folders deep in directories called “New Folder” or “Work Stuff”. That doesn’t really make your porn files private or secure: it just means you’re hoping your spouse doesn’t take the time to crack your James Bond-like system.

And as our technology gets better and better, there will eventually be almost no functions F that are too difficult to compute. This leads us to the second part of my argument, which is this hypothesis:

As our technology and computational abilities improve, the set of “private” things that cannot be computed or deduced from public information will become vanishingly small.

I do not have proof of this hypothesis. But I also cannot think of any scenarios that concretely disprove it either. Let’s take a look at some examples of what we think of as “private” information, so that you can see what I mean.

Where you live

“Doxxing” is usually defined as publishing someone’s private information, such as their home phone number or address, on the internet. But most acts of “doxxing” are nothing more than someone finding a person’s phone number or address in one place on the internet (Google, online whitepages, city or state public records, domain name registration companies) and re-posting it to another place on the internet (Facebook, Twitter, 4chan or Reddit). It hardly seems like a “breach of privacy” to move information from one public place to another.

Twitter has even acknowledged this in its approach to doxxing reports: it has decided that if someone posts your address on Twitter, it is not a breach of privacy if they got that address from Google. This makes sense: if your information is already on Google, it can hardly be considered “private”. And this logic surely isn’t restricted to Google. If any information that is already somewhere on the internet counts as “public”, then Twitter’s policy effectively nullifies doxxing.

Now think about how our technology is improving. Think about what is visible in the backgrounds of your selfies. Think about the houses that are visible in the videos you took of your cat out on the front lawn. Could a large-scale data-mining system match the backgrounds of your photos against publicly available, geotagged imagery to figure out where you live? Even if the answer currently is “no”, you can be sure that eventually the answer will be “yes”.
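To give a flavor of what that might look like, here is a deliberately crude Python sketch that compares a photo against a hypothetical corpus of geotagged public photos using a simple average hash. A real system would use far more sophisticated learned visual features; the corpus and file paths here are entirely made up.

# Crude sketch: guess a photo's location by finding the most visually similar
# image in a (hypothetical) corpus of geotagged public photos.
from PIL import Image  # Pillow

def average_hash(path, size=8):
    """64-bit perceptual hash: downscale, grayscale, threshold each pixel at the mean."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Number of bits on which two hashes differ."""
    return bin(a ^ b).count("1")

def best_location_guess(photo_path, geotagged_corpus):
    """geotagged_corpus maps public photo paths to (lat, lon) pairs."""
    target = average_hash(photo_path)
    closest = min(geotagged_corpus, key=lambda p: hamming(target, average_hash(p)))
    return geotagged_corpus[closest]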

You don’t take Facebook photos or selfies? That’s fine. Most likely you drive on public roads to and from work every day. All of your movements over public highways and land are public data, which means that anything that can be inferred from that data is also public information, including where you live and where you work.
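Here is a minimal Python sketch of that inference, assuming a hypothetical feed of timestamped sightings from cameras at known locations. The data is invented, and the “analysis” is nothing more than counting where you tend to be late at night versus on weekday afternoons.

# Guess home and work from timestamped public sightings (e.g. license-plate reads
# at known camera locations). All of the example data below is invented.
from collections import Counter
from datetime import datetime

sightings = [  # (timestamp, camera location)
    (datetime(2023, 5, 1, 23, 30), "Elm St & 4th Ave"),
    (datetime(2023, 5, 2, 8, 45), "Downtown Garage"),
    (datetime(2023, 5, 2, 13, 10), "Downtown Garage"),
    (datetime(2023, 5, 2, 23, 50), "Elm St & 4th Ave"),
    (datetime(2023, 5, 3, 12, 20), "Downtown Garage"),
]

def most_common_location(rows, keep):
    counts = Counter(loc for ts, loc in rows if keep(ts))
    return counts.most_common(1)[0][0] if counts else None

# Where you are overnight is probably home; weekday middays are probably work.
home_guess = most_common_location(sightings, lambda ts: ts.hour >= 22 or ts.hour < 6)
work_guess = most_common_location(sightings, lambda ts: ts.weekday() < 5 and 9 <= ts.hour < 17)

print(home_guess, "|", work_guess)  # Elm St & 4th Ave | Downtown Garage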

“Isn’t it stalkerish for someone to be researching where I drive every day?” Maybe in the past it was. In the past, your movements over public highways were only recorded if someone was deliberately recording your movements. But that’s just not the case any more: companies like Avigilon, Cisco and Human Recognition Systems are forming massive networks of video infrastructure that have the ability to record and analyze literally every person and every event visible in urban public spaces, every single moment of the day.

We may not be to the point that just anyone can “research” your traffic movements over public land, but that doesn’t mean they are private. It’s still public data. It just means that gathering all of the {x} needed to compute {y} is hard. But as our technology improves, as computational speed and power increase, the ability to casually search and analyze public traffic records will inevitably arrive.

It bears repeating: if where you live ({y}) can in principle be calculated from public traffic records ({x}), then where you live is already public. Just because nobody is bothering to do it yet doesn’t mean it’s actually private.

Your personality and state of mind

Speaking of that network of cameras recording your every move over public property: those cameras can also see details of your face, your posture, your movements. With the massive amounts of data and the incredibly intelligent data-mining techniques we have available today, these public data points will soon be used to compute your state of mind, your mood, and even features of your personality.

Does it sound like science fiction? It’s really not. There is already software used by call centers that applies artificial intelligence to subtle features of your tone of voice when you call into a help line, profiling your mood and your interaction style so you can be connected with a call center service representative who is a “good match” for you.

DirecTV is already using incredibly deep data-mining algorithms to learn things about you based on nothing more than your channel-browsing habits. The millions of micro-data points, ranging from how quickly you flip from channel to channel to which stations you pause on and for how long, can tell DirecTV if you are single or have a spouse, and even whether that spouse is in the room with you at the time.
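I have no idea what DirecTV’s models actually look like, but the general shape of this kind of inference is easy to sketch. In the Python below, the features, the weights and the example viewing session are all invented; the point is only that ordinary channel-flipping telemetry is plenty of raw material for a guess like “single viewer or not”.

# Invented sketch of mapping channel-flipping behavior to a household attribute.
# The features and weights are made up; a real pipeline would learn them from data.
import math

def viewer_features(events):
    """events: ordered list of (channel, seconds_watched) for one evening."""
    total = max(sum(secs for _, secs in events), 1)
    return {
        "flips_per_min": 60 * len(events) / total,
        "avg_dwell": total / len(events),
        "sports_share": sum(s for ch, s in events if ch.startswith("ESPN")) / total,
    }

WEIGHTS = {"flips_per_min": 0.8, "avg_dwell": -0.01, "sports_share": 1.5}  # hypothetical
BIAS = -1.0

def probability_single_viewer(events):
    score = BIAS + sum(WEIGHTS[k] * v for k, v in viewer_features(events).items())
    return 1 / (1 + math.exp(-score))  # logistic squashing to a probability

evening = [("ESPN", 40), ("HBO", 15), ("ESPN2", 300), ("Food Network", 10)]
print(round(probability_single_viewer(evening), 2))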

Remember our equation: if a conclusion can be calculated from public data, then that conclusion is also public. The data obtained by DirecTV is not public, but the complex psychological things they can learn about you illustrate exactly how much you can “give away”, without realizing it, in your casual behavior. Every little movement, every facial expression, every gesture that can be captured by cameras (or microphones) while you are walking in a public place is public data.

Which means that anything that can be computed from those data points, from anxiety level to personality disorders, is also public.

Your naked body

Another classic thing that people like to claim is “private” is their naked body. I have actually argued that this is stupid, but hey it’s how people feel.

But here’s the thing: the sheer computational power and number-crunching ability that we will see with our technology over the next few decades will uncover massive new horizons of things that can be simulated and computed about the physical world. We have data on the exact flexibility and tensile properties of fabrics. We have knowledge of the physics of movement. We can simulate the way we expect objects to interact with each other and move against each other.

Do you really think computer simulations will not be able to figure out what your body looks like based on how your clothing hangs on you and moves around you when you walk?

OK. Sure. Keep thinking that.

In the meantime: what else you got? What’s something that is supposedly “private” that cannot in principle be calculated from public data, given enough computational and mathematical power? Let me know.

This is what we should be thinking about instead

If I’m right, and the number of things that cannot be computed from public data is vanishingly small–given sufficiently powerful computational ability–then privacy is a myth. We actually never had privacy: we only had the illusion of privacy because we didn’t have the technological capabilities to draw out all of the available data to their logical conclusions.

But that doesn’t mean you should freak out. It also doesn’t mean that we should just ignore the legal concerns that people associate with “privacy” in our society today. All it means is that we need to re-frame these problems in a different way.

Instead of focusing on privacy, we need to focus on data abuse.

The simplest example of this goes back to doxxing: You don’t need a concept of privacy to protect people from threats and harassment, or their homes from vandalism. Threats, harassment and vandalism are already crimes, and will remain crimes even when we give up the notion of “privacy” entirely.

Thinking about the big-data work that DirecTV is doing suggests another cautionary tale: you cannot stop DirecTV from obtaining data about your channel-browsing habits, but you can definitely make laws against DirecTV having their policies, rates or service depend on knowledge they gain from those browsing habits. To put it another way: you might not be able to prevent DirecTV from figuring out (based on your television-watching habits) that you are a sad, single man who lives alone with his dog and likes to drink until 2:00 am every night; but we can make laws against DirecTV abusing that knowledge by charging you more than other people for late-night skin flicks.

Are you worried about your employer? You don’t need to cling to the concept of “privacy” to protect yourself: just make it illegal for them to make decisions based on personal traits or activities that are irrelevant to the job.

Are you worried about the government? Instead of desperately trying to retain “privacy”, just make sure there are restrictions on how they can and cannot act on the information they have.

Information is free and will always be free. In a future of incredible technology and artificial intelligence, everyone will be able to compute the most amazing personal details of everyone else’s life–based on public data–if they want to. The goal should not be to stop that from happening, but to stop people from doing bad things with that knowledge.