Web tracking is the practice by which websites and third-party companies collect information about users’ online activity. The basis of tracking is the accurate identification of users: you are detected and identified even when you are just passing through a random website that you are not signed in to. The conventional way to implement identification and tracking is to save web cookies in the user’s browser.

How does cookie-based tracking work?

Imagine that user Alice visits an online store and puts a T-shirt in her basket. At this moment, Alice’s user ID and the T-shirt’s product ID are saved to the browser as a cookie, so that the contents of Alice’s basket are known on the checkout page. Alternatively, it is enough to save only the user ID to the browser if the user ID/product ID pair is saved in the online store’s database.
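The second variant (only the user ID in the cookie, the basket in the store’s database) can be sketched roughly as follows. This is an illustration only: the cookie name `user_id`, the helper names, and the in-memory `Map` standing in for the database are all assumptions, not taken from any real store.

```javascript
// Build one "name=value" pair as it would appear in a cookie string.
function serializeCookie(name, value) {
  return `${encodeURIComponent(name)}=${encodeURIComponent(value)}`;
}

// Parse a cookie string such as "a=1; b=2" into a plain object.
function parseCookies(cookieString) {
  const cookies = {};
  for (const pair of cookieString.split(';')) {
    const eq = pair.indexOf('=');
    if (eq === -1) continue; // skip malformed or empty fragments
    const name = decodeURIComponent(pair.slice(0, eq).trim());
    cookies[name] = decodeURIComponent(pair.slice(eq + 1).trim());
  }
  return cookies;
}

// Hypothetical stand-in for the store's database: user ID -> product IDs.
function createBasketStore() {
  const baskets = new Map();
  return {
    add(userId, productId) {
      if (!baskets.has(userId)) baskets.set(userId, []);
      baskets.get(userId).push(productId);
    },
    contents(userId) {
      return baskets.get(userId) ?? [];
    },
  };
}

// Alice adds a T-shirt; only her user ID travels in the cookie.
const demoStore = createBasketStore();
const demoCookie = serializeCookie('user_id', 'alice');
demoStore.add('alice', 'tshirt-42');

// At checkout, the store reads the cookie and looks up the basket.
const demoUser = parseCookies(demoCookie).user_id;
console.log(demoUser, demoStore.contents(demoUser));
```

In a browser the cookie string would come from `document.cookie` or the `Cookie` request header; here it is passed in as a plain string so the sketch also runs outside a browser.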

The previous scenario sounds pretty normal, but cookies can be used for tracking purposes too. Imagine that Alice reads about antidepressants on a medical website. A third-party advertising company that controls a small section of the website puts a cookie in Alice’s browser and records that she has read about product XY at time T. Now assume that Alice visits a totally unrelated website that is also under contract with the same advertising company. Her previous activity can be tracked through the cookie, and as an unpleasant surprise, antidepressant ads pop up on the unrelated website.
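The mechanism in Alice’s story can be modeled in a few lines. The class below is a toy model of the advertising company’s server, not a description of any real ad network: the class name, the `visitor-N` ID scheme, and the example site names are all made up for illustration.

```javascript
// Toy model of a third-party tracker embedded on several websites.
class AdTracker {
  constructor() {
    this.nextId = 1;
    this.visits = new Map(); // cookie ID -> list of { site, topic }
  }

  // Called whenever a page embedding the tracker is loaded.
  // cookieId is null when the browser has no tracker cookie yet.
  // Returns the ID the tracker would store as its third-party cookie.
  recordVisit(cookieId, site, topic) {
    const id = cookieId ?? `visitor-${this.nextId++}`;
    if (!this.visits.has(id)) this.visits.set(id, []);
    this.visits.get(id).push({ site, topic });
    return id;
  }

  // Everything the tracker has learned about one cookie ID so far.
  history(id) {
    return this.visits.get(id) ?? [];
  }
}

// Alice reads about antidepressants; the tracker sets its cookie.
const demoTracker = new AdTracker();
const aliceCookie = demoTracker.recordVisit(null, 'medical.example', 'antidepressants');

// Later she opens an unrelated site that embeds the same tracker.
// Her browser sends the cookie back, linking both visits.
demoTracker.recordVisit(aliceCookie, 'news.example', 'sports');
console.log(demoTracker.history(aliceCookie));
```

The key point is that the same cookie ID comes back on both sites, so the two visits end up in one history even though the sites have nothing to do with each other.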

The previous example shows why the use of third-party cookies is considered a questionable practice that violates users’ privacy. Major browsers have already started to take action against it. Safari has blocked third-party cookies by default since 2017, Firefox has done so since 2019, and Chrome plans to join them.

Cookies can be blocked – what’s next?

As cookie-based tracking becomes more difficult, the tracking business is moving toward different techniques such as browser fingerprinting. The idea behind browser fingerprinting is to collect information about the browser and its environment for the purpose of identification. These attributes include the browser type and version, operating system, language, time zone, active plugins, installed fonts, screen resolution, CPU class, device memory, and various other settings. The attributes are concatenated into a long string, and the fingerprint is defined as a hash value of the string.
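A minimal sketch of the “concatenate, then hash” idea is shown below. It assumes the attributes have already been collected into a plain object, sorts the keys so that the result does not depend on collection order, and uses 32-bit FNV-1a as the hash. The choice of FNV-1a, the `key=value;` layout, and the function names are assumptions made for illustration; real fingerprinting scripts use a variety of hash functions and formats.

```javascript
// 32-bit FNV-1a over the UTF-16 code units of a string,
// returned as an 8-character hexadecimal string.
function fnv1a32(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return (hash >>> 0).toString(16).padStart(8, '0');
}

// Concatenate attributes in a stable order and hash the result.
function fingerprint(attributes) {
  const joined = Object.keys(attributes)
    .sort()
    .map((key) => `${key}=${attributes[key]}`)
    .join(';');
  return fnv1a32(joined);
}

// Example attribute set (values are invented).
const demoAttributes = {
  userAgent: 'ExampleBrowser/1.0',
  language: 'en-US',
  timeZone: 'Europe/Berlin',
  screen: '1920x1080',
  deviceMemory: 8,
};
console.log(fingerprint(demoAttributes));
```

Because the hash depends on every character of the concatenated string, changing a single attribute (for example, the screen resolution) yields a different fingerprint.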

One might ask how unique these browser fingerprints are. It turns out that they tend to be unique in the majority of cases. Curious readers can check it for their own browser at amiunique.org. If a browser fingerprint happens to be non-unique, it can probably be made unique by combining it with the device’s IP address. In other words, browser fingerprints are capable of fully or partially identifying users when cookies are turned off.
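To make the notion of uniqueness concrete, the sketch below computes the fraction of visitors whose identifying key is shared with nobody else, first using the fingerprint alone and then using the fingerprint combined with the IP address. The visitor records are invented, the IP addresses come from ranges reserved for documentation, and the function name is hypothetical.

```javascript
// Fraction of visitors whose key (as computed by keyOf) is unique.
function uniqueFraction(visitors, keyOf) {
  if (visitors.length === 0) return 0;
  const counts = new Map();
  for (const visitor of visitors) {
    const key = keyOf(visitor);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const unique = visitors.filter((v) => counts.get(keyOf(v)) === 1).length;
  return unique / visitors.length;
}

// Invented sample: the first two visitors share a fingerprint.
const demoVisitors = [
  { fingerprint: 'aaa', ip: '192.0.2.1' },
  { fingerprint: 'aaa', ip: '192.0.2.2' },
  { fingerprint: 'bbb', ip: '192.0.2.1' },
  { fingerprint: 'ccc', ip: '198.51.100.7' },
];

const byFingerprint = uniqueFraction(demoVisitors, (v) => v.fingerprint);
const byFingerprintAndIp = uniqueFraction(demoVisitors, (v) => `${v.fingerprint}|${v.ip}`);
console.log(byFingerprint, byFingerprintAndIp);
```

In this toy data set, half of the visitors are unique by fingerprint alone, and all of them become unique once the IP address is added to the key.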

Browser fingerprinting uncovered

To catch real-life browser fingerprinting in action, let’s analyze some websites. I will use Chrome in Incognito mode so that all extensions are turned off. Although I try to present reproducible experiments, keep in mind that browser fingerprinting can be browser- or location-dependent, or it can be turned on only for a random subset of IP addresses. Fingerprinting scripts also receive version updates from time to time. Therefore, 100-percent reproducibility cannot be guaranteed.

That said, let’s open the website mobile.de. The browser’s developer tools contain a performance analyzer that reveals which JavaScript functions have been called after loading the website. If we search for “fingerprint” in the call tree, an interesting function call pops up:

browser fingerprinting call tree

The script is loaded from https://script.ioam.de/iam.js. Here is the source code of the function:

browser fingerprinting script

The fingerprint string is accumulated in the variable t. The components of the fingerprint are the User-Agent string, the installed plugins with their version numbers, the MIME types recognized by the browser, and, if the browser is Internet Explorer, ActiveX-related information.

If we put a breakpoint on line 22 and reload the page, we can observe the final value of t. It is the following for my browser:

browser fingerprint string

After applying the hash() function, the fingerprint becomes “94qaxn”. And it isn’t just mobile.de using this fingerprint() function. For example, immobilienscout24.de, spiegel.de, and wetteronline.de also embed and run it.
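For readers who cannot see the original script, here is a rough sketch of how a fingerprint string of this kind could be assembled from the User-Agent, the plugin list, and the MIME types. It is not the code from iam.js: the function name, the `|` separator, and the optional `version` field on plugins are assumptions, and the ActiveX branch is left out. The function takes a navigator-like object as a parameter so that it can be tried with mock data outside a browser.

```javascript
// Assemble a fingerprint string from a navigator-like object.
// Expected shape (assumed): { userAgent, plugins: [{ name, version? }],
//                             mimeTypes: [{ type }] }
function buildFingerprintString(nav) {
  const parts = [nav.userAgent];

  // Plugins, with the version number when one is available.
  for (const plugin of Array.from(nav.plugins ?? [])) {
    parts.push(plugin.version ? `${plugin.name} ${plugin.version}` : plugin.name);
  }

  // MIME types recognized by the browser.
  for (const mime of Array.from(nav.mimeTypes ?? [])) {
    parts.push(mime.type);
  }

  return parts.join('|');
}

// Mock data for illustration; a page script would pass window.navigator.
const demoNavigator = {
  userAgent: 'ExampleBrowser/1.0',
  plugins: [{ name: 'PDF Viewer', version: '1.0' }, { name: 'Media Plugin' }],
  mimeTypes: [{ type: 'application/pdf' }, { type: 'text/pdf' }],
};
console.log(buildFingerprintString(demoNavigator));
```

`Array.from` is used because `navigator.plugins` and `navigator.mimeTypes` are array-like objects in browsers rather than true arrays.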

A similar type of fingerprinting can be observed on the news site lemonde.fr. The relevant JavaScript code is loaded from https://cdn.keywee.co/dist/sp-2.9.1.js. The code is minified, which makes it more difficult to follow. Nevertheless, if we pretty-print the code, the following snippet can be found, starting from line 1454:

browser fingerprinting script

First, the components of the fingerprint are calculated. Then, on line 16, a string is created from the fingerprint components and an integer hash code is computed using the function k. On my machine, the fingerprint string is

browser fingerprint string

and the computed hash code is 641572758.
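The exact definition of the function k is not reproduced here. As an illustration of how a string can be reduced to an integer hash code of this kind, the sketch below uses the well-known multiply-by-31 scheme (the same one as Java’s `String.hashCode`). That this scheme resembles k is purely an assumption; the function name is hypothetical.

```javascript
// Integer hash of a string using the multiply-by-31 scheme.
// The result is kept in the signed 32-bit range, so it can be negative.
function stringHashCode(text) {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    // Math.imul performs a 32-bit multiplication; "| 0" wraps the sum.
    hash = (Math.imul(31, hash) + text.charCodeAt(i)) | 0;
  }
  return hash;
}

console.log(stringHashCode('abc')); // 96354
console.log(stringHashCode('hello')); // 99162322
```

Whatever the concrete hash function is, the effect is the same: a long, variable-length fingerprint string is turned into a short, fixed-size value that is cheap to store and compare.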

More subtle techniques are also present on the web. For example, express.co.uk uses many external resources. Among others, it loads and executes JavaScript code from https://securepubads.g.doubleclick.net/gpt/pubads_impl_2020020309.js. The domain is owned by the ad serving company DoubleClick. The code is minified again. After pretty-printing, the following snippet can be found, starting from line 8095:

browser fingerprinting script

By placing a breakpoint on line 1 and reloading the website, we can double-check that this code is indeed executed. Then step-by-step execution allows us to investigate what happens here. Lines 1 to 11 prepare the dictionary f and fill it with various browser attributes. At line 12 the function Er initiates a chain of function calls. The first parameter of Er is a complex data structure that was created before. One of its attributes is the array a.B that already has 40 elements when Er is called. The main effect of Er from our perspective is that it appends all key-value pairs of f to array a.B. Then the rest of the code augments a.B with other attributes. For example, line 38 queries the device memory from the navigator object and appends it to the end of a.B.

After line 38 is executed, the content of a.B is the following:

browser fingerprint

The first 40 elements of a.B contain attributes not related to fingerprinting, and they are not shown here. I have found no sign of a hash code being computed from the fingerprinting-related elements of a.B. However, this can easily be done on the server side after the data is transferred to DoubleClick.
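The overall pattern described above (fill a dictionary with browser attributes, append its key-value pairs to an existing array, then add further attributes such as the device memory) can be sketched as follows. This is a simplified reconstruction for illustration, not DoubleClick’s code: the function name, the representation of each entry as a `[key, value]` pair, and the sample attribute names are assumptions.

```javascript
// Append all key-value pairs of `attributes` to `signals` (in place),
// then append the device memory if the navigator-like object reports it.
function appendBrowserSignals(signals, attributes, nav) {
  for (const [key, value] of Object.entries(attributes)) {
    signals.push([key, value]);
  }
  if (nav && nav.deviceMemory !== undefined) {
    signals.push(['deviceMemory', nav.deviceMemory]);
  }
  return signals;
}

// The array already holds unrelated entries before the call,
// just as a.B already had elements when Er was called.
const demoSignals = [['unrelated', 'entry']];
const demoDictionary = { language: 'en-US', timezoneOffset: -60, screenWidth: 1920 };

// In a browser, the third argument would be window.navigator.
appendBrowserSignals(demoSignals, demoDictionary, { deviceMemory: 8 });
console.log(demoSignals);
```

`navigator.deviceMemory` is not available in every browser, which is why the sketch checks for `undefined` before appending it.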

DoubleClick-based fingerprinting is also present, for example, on lequipe.fr, news.com.au, t-online.de and also on all previously mentioned websites (mobile.de, immobilienscout24.de, spiegel.de, wetteronline.de, and lemonde.fr).

The browser fingerprinting landscape

Of course, the landscape of browser fingerprinting is diverse. Here are a few other fingerprinting functions along with websites that apply them:

browser fingerprinting scripts

Countermeasures against browser fingerprinting

Like most users, we believe that anyone should have the right to opt out of any form of web tracking, including browser fingerprinting. That is why we are working on algorithms that detect browser fingerprinting activities.

We collect and analyze known cases of browser fingerprinting and identify patterns based on them. The conventional method for detection would be to exactly match these patterns against websites and find the ones that apply known fingerprinting methods. However, more can be done with artificial intelligence. An AI-based fingerprinting detector is able to perform inexact pattern matching and detect novel fingerprinting methods. Users therefore get a stronger defense against browser fingerprinting.
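The difference between exact and inexact matching can be illustrated with a deliberately simple heuristic. Instead of looking for one known script verbatim, the sketch below scores a script by how many fingerprinting-related browser APIs it mentions. This is far cruder than the AI-based detector described above and is not our actual algorithm; the signal list, the threshold of 0.5, and the function names are assumptions chosen for the example.

```javascript
// Browser APIs that fingerprinting scripts commonly read (illustrative list).
const FINGERPRINT_SIGNALS = [
  'navigator.userAgent',
  'navigator.plugins',
  'navigator.mimeTypes',
  'navigator.deviceMemory',
  'screen.width',
  'screen.height',
  'getTimezoneOffset',
];

// Fraction of the listed signals that occur in the script source.
function fingerprintingScore(source) {
  const hits = FINGERPRINT_SIGNALS.filter((signal) => source.includes(signal));
  return hits.length / FINGERPRINT_SIGNALS.length;
}

// Flag a script when it touches enough of the signals.
function looksLikeFingerprinting(source, threshold = 0.5) {
  return fingerprintingScore(source) >= threshold;
}

// A script that reads many attributes is flagged...
const suspiciousScript = `
  var t = navigator.userAgent + navigator.plugins.length;
  t += navigator.mimeTypes.length + screen.width + 'x' + screen.height;
`;
// ...while one that only checks the User-Agent is not.
const benignScript = `if (/Mobile/.test(navigator.userAgent)) showMobileMenu();`;

console.log(looksLikeFingerprinting(suspiciousScript)); // true
console.log(looksLikeFingerprinting(benignScript)); // false
```

A plain substring check like this is easy to evade, for example by aliasing `navigator` to a short variable in minified code, which is one reason why more flexible, learned pattern matching is attractive.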

This story is republished from TechTalks, the blog that explores how technology is solving problems… and creating new ones.
