Who's Watching Me? A Look Into Online Privacy

Tracking Over Time

Data

We focus on a dataset compiled by WhoTracks.Me. It contains monthly information on companies, sites, and trackers across the Internet, beginning in May 2017.

Reach

A key way to measure how much a company tracks is its reach: the proportion of page loads that include at least one tracker from that company. (Some companies control many trackers.) Reach is expressed as a percentage, so Google, which has maintained a reach of around 80% over the past 2 years, has a tracker on roughly 8 out of every 10 webpages loaded.
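The definition of reach can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the `reach` function and the sample `page_loads` list are hypothetical, and the real dataset is already aggregated rather than stored as one record per page load.

```python
def reach(page_loads, company):
    """Fraction of page loads with at least one tracker owned by `company`.

    `page_loads` is a list of sets; each set holds the companies whose
    trackers appeared on one page load (a hypothetical input format).
    """
    if not page_loads:
        return 0.0
    hits = sum(1 for companies in page_loads if company in companies)
    return hits / len(page_loads)


# Made-up sample: five page loads and the companies seen on each.
page_loads = [
    {"Google", "Facebook"},
    {"Google"},
    {"Amazon"},
    {"Google", "Amazon"},
    set(),  # a page load with no trackers at all
]

google_reach = reach(page_loads, "Google")  # 3 of 5 page loads
```

A company that owns several trackers is still counted once per page load, which is why the sketch checks set membership instead of counting individual trackers.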

Who's Watching?

The top 3 companies by reach have been mostly stable over time. Google, Facebook, and (since November 2017) Amazon have dominated over the past couple of years.

What Do They See?

Companies use a variety of methods to track everything from your location, screen size, and operating system to your IP address and browser history. This information can be combined into a digital fingerprint that is unique to you.

Content Type Usage

Websites use different types of content for tracking.
A content type's usage is measured as the percentage of all page loads on which the particular content type is used.
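As a rough sketch of how content type usage could be computed, the snippet below counts each content type at most once per page load and divides by the total number of page loads. The function name and the sample data are assumptions made for illustration, not part of the dataset.

```python
from collections import Counter


def content_type_usage(page_loads):
    """Percentage of page loads on which each content type is used.

    `page_loads` is a list of lists; each inner list names the content
    types of the tracker requests seen on one page load (hypothetical format).
    """
    total = len(page_loads)
    if total == 0:
        return {}
    counts = Counter()
    for types in page_loads:
        counts.update(set(types))  # count a type once per page load
    return {ctype: 100.0 * n / total for ctype, n in counts.items()}


# Made-up sample of four page loads.
sample = [
    ["script", "image"],
    ["script", "script", "beacon"],
    ["image"],
    ["script"],
]
usage = content_type_usage(sample)
```

Because usage is measured per page load, the two script requests on the second page load count only once.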

Content Type Prevalence Averaged within Company Categories

Data

Reach: Percentage of page loads that have a tracker from that company. Ranges between 0% and 100%.
Site Reach: Percentage of unique websites that have a tracker from that company. Ranges between 0% and 100%.
Tracker Content: Percentage of pages on which a given tracking method was observed. Ranges between 0% and 100% and is represented by circle radius. Specific types include:
  • Cookies: Percentage of pages where a cookie was sent by the browser, or a Set-Cookie header (which is used to send cookies to a user) was sent to the browser by the tracker's server.
  • Bad Queries: Percentage of pages where data was carried in the URL, as measured by the presence of an identifier in the query string parameters sent with a request to this tracker.
  • All Tracked: Percentage of pages with either cookies or bad queries.
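To make the difference between these measures concrete, here is a minimal sketch for a single company. The `PageLoad` record and the sample values are hypothetical. The definitions above do not state the denominator for the tracker content measures, so the sketch assumes they are taken over the pages where the company's tracker was present.

```python
from dataclasses import dataclass


@dataclass
class PageLoad:
    site: str          # website the page belongs to
    has_tracker: bool  # a tracker from the company was loaded
    cookie: bool       # a cookie was sent or set on a tracker request
    bad_query: bool    # an identifier appeared in the query string


def pct(part, whole):
    """Percentage helper that avoids dividing by zero."""
    return 100.0 * part / whole if whole else 0.0


def summarize(loads):
    tracked = [p for p in loads if p.has_tracker]
    all_sites = {p.site for p in loads}
    tracked_sites = {p.site for p in tracked}
    return {
        "reach": pct(len(tracked), len(loads)),
        "site_reach": pct(len(tracked_sites), len(all_sites)),
        # Assumption: denominators below are pages where the tracker appeared.
        "cookies": pct(sum(p.cookie for p in tracked), len(tracked)),
        "bad_queries": pct(sum(p.bad_query for p in tracked), len(tracked)),
        "all_tracked": pct(sum(p.cookie or p.bad_query for p in tracked), len(tracked)),
    }


# Made-up sample: eight page loads spread over five sites.
loads = [
    PageLoad("news.example", True, True, False),
    PageLoad("news.example", True, False, True),
    PageLoad("news.example", True, False, False),
    PageLoad("shop.example", True, True, True),
    PageLoad("blog.example", False, False, False),
    PageLoad("blog.example", False, False, False),
    PageLoad("wiki.example", False, False, False),
    PageLoad("docs.example", False, False, False),
]
stats = summarize(loads)
```

In this sample the tracker appears on half of all page loads but on only two of the five sites, so reach (50%) is higher than site reach (40%). A page with both a cookie and a bad query counts once toward All Tracked.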

Overall Trends


  • Scripts and images are the most popular content types for tracking.
  • Interestingly, beacons, which were originally designed for tracking, are encountered less and less often.

What can these content types be used for?


There are two main methods of tracking:
  • Cookies: Cookies are small pieces of data that a website stores in the user's browser and that the browser sends back with later requests, which lets a tracker recognize a returning user.
  • Fingerprinting: Fingerprinting is a more recent method that looks at the characteristics of a user's device, such as the screen resolution, operating system, and device model, and uses this information to uniquely identify and follow the user as they browse the web.
    The content types used for tracker requests (the petals of the flower visualization) have been found to be used in fingerprinting. For instance, JavaScript scripts can enumerate the fonts available in a browser and use that set to help identify users.
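The idea behind fingerprinting can be sketched as hashing a set of device characteristics into a single identifier. The example below is a simplified illustration, not how any particular tracker works: the `fingerprint` function, its three inputs, and the sample values are all assumptions, and it is written in Python even though scripts running in a browser would do this in JavaScript.

```python
import hashlib
import json


def fingerprint(screen, os_name, fonts):
    """Hash a few device characteristics into a stable identifier.

    Fonts are de-duplicated and sorted so that the order in which they
    are listed does not change the result.
    """
    canonical = json.dumps(
        {"screen": screen, "os": os_name, "fonts": sorted(set(fonts))},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up devices: the first two differ only in font listing order.
device_a = fingerprint("1920x1080", "Windows 10", ["Arial", "Calibri"])
device_b = fingerprint("1920x1080", "Windows 10", ["Calibri", "Arial"])
device_c = fingerprint("1920x1080", "Windows 10", ["Arial", "Calibri", "Futura"])
```

Here `device_a` and `device_b` produce the same identifier, while one extra installed font is enough to give `device_c` a different one. Unlike a cookie, nothing has to be stored on the device for the identifier to be recomputed on a later visit.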

Built with D3 v4

Dong Hur, Jambay Kinley, Alexis Ross, and Elizabeth Yeoh-Wang

Sources

  • Englehardt, Steven, and Arvind Narayanan. 2016. “Online Tracking: A 1-Million-Site Measurement and Analysis.” In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security - CCS’16, 1388–1401. Vienna, Austria: ACM Press. https://doi.org/10.1145/2976749.2978313.
  • Karaj, Arjaldo, Sam Macbeth, Rémi Berson, and Josep M. Pujol. 2018. “WhoTracks.Me: Shedding Light on the Opaque World of Online Tracking,” April. https://arxiv.org/abs/1804.08959v2.