The New Oil

Understanding Metadata

Ninety-five percent of the web is encrypted. That means that if you visit Proton, your Internet Service Provider (ISP) can see that you visited and how long you stayed, but they can’t see your login credentials (username and password) or which exact pages you went to. This is done with Transport Layer Security (TLS), a powerful and popular encryption protocol used online.
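
As a rough illustration of that split, the sketch below separates a URL into the parts an on-path observer such as an ISP can typically see and the parts TLS hides. It assumes the hostname leaks through the DNS lookup or the TLS Server Name Indication field (that is, Encrypted Client Hello is not in use); the function name `observer_view` and the example URL are hypothetical, chosen only for this illustration.

```python
from urllib.parse import urlsplit


def observer_view(url: str) -> dict:
    """Split a URL into what an on-path observer can typically see
    and what TLS keeps hidden (assumes the hostname is not encrypted)."""
    parts = urlsplit(url)
    # The hostname and port are exposed by DNS, SNI, and the TCP connection.
    visible = {"hostname": parts.hostname, "port": parts.port or 443}
    # The path and query string travel inside the encrypted TLS session.
    hidden = {"path": parts.path, "query": parts.query}
    return {"visible": visible, "hidden": hidden}


view = observer_view("https://example.com/account/login?user=alice")
print("ISP can see:   ", view["visible"])
print("ISP cannot see:", view["hidden"])
```

The website itself still receives everything in both groups; the split only describes what a third party watching the connection can observe.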

There are two problems with relying strictly on the current TLS model of the internet, however. The first is that it only protects “data in transit.” When you connect to Amazon, your ISP can only see that you visited Amazon, but Amazon themselves can still see every page, click, and purchase without restriction. This opens up the risk of data breaches, malicious insider threats, and more.

Second, however, you often don’t need to see the content itself to start making powerful, accurate assumptions.

What is Metadata?

Metadata is often described as “data about the data.” For example, the content of an email is not metadata, but who you emailed, at what time, the subject line, and the size of the email all are. On the surface this may not seem very revealing. However, let’s look at some offline examples from the Electronic Frontier Foundation:

  • They know you called a gynecologist, spoke for a half hour, and then called the local Planned Parenthood’s number later that day. But nobody knows what you spoke about.
  • They know you got an email from an HIV testing service, then called your doctor, then visited an HIV support group website in the same hour. But they don’t know what was in the email or what you talked about on the phone.
  • They know you called the suicide prevention hotline from the Golden Gate Bridge. But the topic of the call remains a secret.
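
To make the earlier email example concrete, here is a minimal sketch that separates a message's metadata from its content using Python's standard `email` module. The sample message, the addresses, and the helper name `split_metadata` are all made up for illustration, and a real mail provider records considerably more headers than the four shown here.

```python
from email import message_from_string

# A made-up message; every address and value here is hypothetical.
RAW_EMAIL = """\
From: alice@example.com
To: clinic@example.org
Date: Mon, 01 Jan 2024 09:30:00 +0000
Subject: Appointment request

I would like to book an appointment next week.
"""


def split_metadata(raw: str) -> tuple[dict, str]:
    """Return (metadata, content) for a raw email message."""
    msg = message_from_string(raw)
    # Headers describe the message: who, to whom, when, and about what.
    metadata = {name: msg[name] for name in ("From", "To", "Date", "Subject")}
    # Size is metadata too, and is visible even if the body is encrypted.
    metadata["Size"] = len(raw.encode("utf-8"))
    # The body is the actual content.
    content = msg.get_payload()
    return metadata, content


metadata, content = split_metadata(RAW_EMAIL)
for name, value in metadata.items():
    print(f"{name}: {value}")
```

Even with `content` discarded entirely, the remaining fields show that Alice contacted a clinic, when she did so, and what the message was about.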

Metadata has the potential to be just as revealing as the content itself, and therefore should be protected just as much as the actual data. These are not hypothetical abuses or situations. A former NSA chief once said “[The US Government] kills people based on metadata,” referring to how metadata can reveal so much information that it can be used to justify military strikes. In another instance, police were able to determine that a man had murdered his wife based on the metadata from his smartwatch and CCTV cameras. I could list many more stories like these. Metadata matters.

How to Deal with Metadata

Unfortunately, any digital action creates metadata. The best you can do when attempting to protect your privacy is to be mindful of what metadata may be created by the action you’re about to take and then determine how to best reduce or mitigate it:

  • Reputable VPNs do not log your activity.
  • Certain messengers use various techniques to make such logging impossible.
  • Using the Tor Browser to access a website makes it much harder for the website to uniquely identify you based on metadata like IP address or browser fingerprint.

These are just a few examples of how certain tools can help with the metadata problem.

The amount of metadata created and recorded can be quite extensive. One smart TV manufacturer was caught scanning the names of nearby WiFi networks, as well as detecting every device on the local network and detailed information about them.

Fortunately, most of us don’t need to be 100% anonymous, and situations like these fall largely outside of the threat model of most people reading this. However, it’s always a good idea to protect or eliminate your metadata wherever possible. Ask what metadata could potentially be leaked and how you can prevent that. Remember to always balance this against your threat model.