By Robert Quinlivan, OneZero | February 12, 2020
All the ways advertisers and others get your data
If you have used the internet at any point in the last 10 years, you’re probably aware that companies are collecting data for advertising purposes. The major players in this market are Google, Facebook, and a handful of startups and small data brokers looking at the new frontiers of data, such as connected appliances and energy grids.
It’s all part of the collection side of the data economy. Targeted advertising requires a high-resolution view of consumer activity in order to build more accurate models for advertisers. While this system of extraction was originally built for advertising, this data is also valuable to the courts, government agencies, and political campaigns.
I will attempt to present a neutral and objective view of these sources of data collection. The only commentary I will make here is that these methods are surprising, by which I mean that the average user of these systems is unlikely to be aware of them.
I feel that “surprise” is an important measuring stick in the data privacy conversation. I don’t think the average person is surprised to learn that Amazon knows their purchase or search history. It may, however, be quite surprising to learn that Alexa passively records conversations.
1. DNS
The first thing that happens when you type a web address into your favorite browser is that the browser must look up where the server is located.
To locate the server, your browser performs a Domain Name System (DNS) lookup. DNS is a registry of domain names and the IP addresses associated with each domain. If you want to see it in action, try typing your favorite website into dnslookup.org.
Remember that this DNS request happens before the first byte from the host server has been sent back to the browser. So before you’ve even loaded the content of the page, your DNS provider knows what website you’re trying to visit.
Google, itself a major data broker, runs its own popular public DNS service at 8.8.8.8. Google says it logs an IP address for every request to its DNS, but claims in its privacy policy not to “correlate or combine information from our temporary or permanent logs with any personal information that you have provided.” However, the potential is there, and Google, which has been known to read user emails to build advertising profiles, could change its policy without disclosing the change to users.
While many requests for content on the internet are encrypted using the HTTPS protocol, DNS requests are often sent in plaintext. That means that your browsing history could be easily intercepted by your internet service provider (ISP) or the DNS provider itself.
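To make "plaintext" concrete, here is a minimal sketch of what a classic DNS query looks like on the wire. It builds the bytes by hand following the standard DNS message layout; the function name `build_dns_query` and the query ID are illustrative choices, and nothing is actually sent over the network.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal, unencrypted DNS query for an A record."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = Internet.
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


packet = build_dns_query("example.com")
# The domain labels appear as readable ASCII inside the raw packet.
print(packet)
```

Anyone positioned between the device and the resolver who sees these bytes can read the domain name directly, with no decryption step involved.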
Support for secure DNS is becoming more common, but until it’s the universal standard, DNS remains a source of browsing habits leaking to data collection companies.
Even if secure and encrypted DNS becomes a standard, the potential for logging of IP addresses remains if it’s controlled by a major data broker, such as Google. It would still be possible for Google to correlate your IP address to your DNS requests and thereby learn about your browsing habits without intercepting any of your actual browsing data.
2. Your location — even if you turn it off
Let’s say that you are very privacy-minded and decide to disable the geolocation feature on your phone or laptop. Unfortunately, it’s still quite easy to locate you, even if the GPS on your device is turned off.
Your device has an associated IP address. Your IP address is a numerical code that represents your location on the vast ocean of the internet. IP addresses are used for routing network traffic, and they are laid out in ranges allocated to an ISP. When you connect to your ISP, an IP address is assigned to your device based on what part of the network you’re connecting from.
It is relatively easy to figure out where in the world you are connected based on your IP address, even if you disable the location feature on your device. If you’re curious, there are a dozen or more websites designed to test geolocation by IP.
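The lookup behind those geolocation sites can be sketched in a few lines. The table below is entirely hypothetical: it uses address ranges reserved for documentation and made-up ISP names and cities, whereas real services rely on large commercial or registry-derived databases. The matching logic, however, is the same idea: find the allocated range that contains the address.

```python
import ipaddress
from typing import Optional

# Hypothetical allocation table: documentation-only ranges, invented locations.
RANGES = {
    "192.0.2.0/24": "Example ISP, Chicago",
    "198.51.100.0/24": "Example ISP, Dublin",
    "203.0.113.0/24": "Sample Telecom, Sydney",
}


def locate(ip: str) -> Optional[str]:
    """Return the location label for the range containing ip, if any."""
    addr = ipaddress.ip_address(ip)
    for cidr, place in RANGES.items():
        if addr in ipaddress.ip_network(cidr):
            return place
    return None


print(locate("198.51.100.42"))
print(locate("10.0.0.1"))
```

Note that nothing here involves GPS or any setting on the device. The address alone is enough to narrow down a rough location.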
Your IP address is also personally identifiable. It typically does not change when you close your browser or restart your device. If a website owner were to log your connection to their site with your IP address, even if you didn’t log in or create an account, your activity on that website could be tied to you based on your IP address.
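As a rough illustration of how that tying-together works, the sketch below groups a made-up access log by IP address. The log entries, paths, and the `profile_by_ip` helper are all assumptions for the example; real server logs carry more fields, but the grouping step is essentially this simple.

```python
from collections import defaultdict

# Hypothetical access log: (visitor IP, page requested). No logins involved.
access_log = [
    ("192.0.2.10", "/articles/privacy"),
    ("198.51.100.7", "/pricing"),
    ("192.0.2.10", "/articles/vpn-guide"),
    ("192.0.2.10", "/search?q=smart+tv+settings"),
]


def profile_by_ip(entries):
    """Group requested pages by the IP address that requested them."""
    profiles = defaultdict(list)
    for ip, path in entries:
        profiles[ip].append(path)
    return dict(profiles)


profiles = profile_by_ip(access_log)
for ip, pages in profiles.items():
    print(ip, pages)
```

As long as the address stays the same across visits, each list becomes a browsing profile for one connection, even though no account was ever created.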
One common workaround for this problem is the use of a Virtual Private Network (VPN) to route all network traffic from your device to another server, even one on the other side of the planet. With a VPN, your true IP address is no longer used to access websites you visit, so your location is no longer exposed. VPNs also provide privacy protection when connecting to a public Wi-Fi network, like those in a coffee shop.
3. Your home appliances
In recent years we have seen popular products like Amazon’s Alexa and other digital home assistants that provide the same search query functionality as Google, but with the added convenience of a voice-activated system. It wasn’t long before privacy issues with these systems surfaced.
Amazon’s Alexa is only the first in a wave of connected household products known collectively as the Internet of Things (IoT). The Internet of Things envisions a world of updated appliances that connect over your home Wi-Fi network or, in the near future, over 5G.
Why, you may very reasonably ask, would I want to connect my refrigerator to the internet? While the “smart fridge” concept may need some tinkering before consumers will buy into it, smart TVs certainly have taken off. As the entertainment industry has pivoted toward streaming services, and increased production lines enable better resolutions at home, consumers are choosing connected TVs to take advantage of new content.
Some smart TV models capture data about what content is playing on your device in order to provide advertisers with viewership data.
The motivations for these systems are the same as for any targeted advertising system. TV viewership statistics, traditionally derived by paying families to report their viewing habits in intricate detail, are vague and depend on a relatively small sample size. Smart TV analytics offer a more detailed picture by capturing the real-time viewing habits of millions of consumers.
Unfortunately, consumers are not always aware of the data collection aspect of these new connected appliance products. It’s possible to opt out of data collection in many of these systems, but you’ll have to dig into the settings for each model.
The internet was designed for open collaboration, but that collaborative design is antithetical to privacy. Even if you were to avoid directly interacting with data brokers like Google and Facebook, and even if you were to install aggressive ad-blocking software in your browser, your internet activity could still be leaked in unexpected ways.
As the horizon of internet-connected devices expands, and our daily lives become ever more enmeshed in the collaborative world of the internet, the trail of data we leave behind as we go about our lives will grow exponentially. The increased bandwidth of 5G will also enable more live streaming services that will increase our data footprint. Once 5G networks arrive, and devices begin to support it by default, it will be even easier to track your location and collect data about your daily habits.
The more data we produce, the more opportunities there will be to collect that data. As these technologies become publicly available, it will behoove the public to decide what degree of “surprise” is acceptable.