On the Robustness of Topics API to a Re-Identification Attack

Author: Nikhil Jha

Third-party cookies have long been a core pillar of Interest-Based Advertising (IBA). They allowed advertisers to build profiles of users browsing the web and to serve them advertisements tailored to their interests.
Privacy advocates have long criticized the use of third-party cookies, prompting legislators to take action and ultimately leading tech companies to move away from this paradigm.
Google plays a central role in the discussion about the post-third-party-cookie era. Most notably, it has introduced the Topics API, a new paradigm that moves the computation of the user profile inside the browser, which then exposes to advertisers only a subset of the topics the user is interested in.
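
To make the idea concrete, here is a minimal sketch of an in-browser profile that reveals only part of itself. It is an illustration under stated assumptions, not the actual Topics API algorithm: the function names, the profile size k, and the uniform random choice of the exposed topic are all hypothetical.

```python
import random
from collections import Counter


def top_topics(visited_topics, k=5):
    """Build the in-browser profile: the k most frequently visited topics.

    The value of k is an assumption made for this illustration.
    """
    counts = Counter(visited_topics)
    return [topic for topic, _ in counts.most_common(k)]


def exposed_topic(profile, rng):
    """Reveal a single topic from the profile to an advertiser.

    Picking uniformly at random is an assumption of this sketch.
    """
    return rng.choice(profile)


# Made-up browsing history, expressed as the topic of each visited site.
history = ["Sports", "News", "Sports", "Travel", "Sports", "News", "Cooking"]

rng = random.Random(0)  # seeded so the sketch is reproducible
profile = top_topics(history, k=3)
revealed = exposed_topic(profile, rng)

print("Profile kept in the browser:", profile)
print("Topic revealed to the advertiser:", revealed)
```

The advertiser never sees the full history, only one topic drawn from the profile. The question the paper raises is whether such partial observations are still enough to re-identify a user.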

In the paper I published with Martino Trevisan, Emilio Leonardi, and Marco Mellia, we noted that even the Topics API cannot rule out the risk of re-identification, i.e., the risk that a user’s identity can be linked to their visit to a website.

The dataset

The experiments in the paper use a dataset of the browsing histories of 268 active users, collected during the PIMCity European project. To protect the users’ privacy, the published data encodes only each user’s visit rate for each of the 349 topics currently included in the taxonomy.
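
As a rough illustration of this kind of encoding, the sketch below turns a list of visited topics into a per-topic rate vector. It rests on assumptions: the "rate" is taken to be the fraction of a user's visits that fall on each topic, topics are identified by hypothetical 0-based integer IDs, and the example history is made up. The published files may use a different definition or layout.

```python
from collections import Counter

NUM_TOPICS = 349  # taxonomy size stated in the text


def visit_rates(topic_ids, num_topics=NUM_TOPICS):
    """Encode a browsing history as one visit rate per topic.

    Assumption: the rate is the share of visits that belong to each topic.
    """
    total = len(topic_ids)
    if total == 0:
        # A user with no visits gets an all-zero vector.
        return [0.0] * num_topics
    counts = Counter(topic_ids)
    return [counts.get(topic, 0) / total for topic in range(num_topics)]


# Made-up history: the topic ID of each visited site (not real data).
example_history = [3, 3, 17, 42, 3, 17]
rates = visit_rates(example_history)

print(len(rates))  # one entry per topic in the taxonomy
print(rates[3], rates[17], rates[42])
```

With this representation, each user is a vector of 349 numbers, and the individual URLs they visited cannot be read back from it.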

The code

You can find the code that backs the results in the paper at this GitHub address.