Fingerprinting the Fingerprinters: Learning to Detect Browser Fingerprinting Behaviors

79 0 0.0 ( 0 )

Download Cite

Added by Umar Iqbal

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Umar Iqbal Then University of Iowa

Cryptography and Security

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Browser fingerprinting is an invasive and opaque stateless tracking technique. Browser vendors, academics, and standards bodies have long struggled to provide meaningful protections against browser fingerprinting that are both accurate and do not degrade user experience. We propose FP-Inspector, a machine learning based syntactic-semantic approach to accurately detect browser fingerprinting. We show that FP-Inspector performs well, allowing us to detect 26% more fingerprinting scripts than the state-of-the-art. We show that an API-level fingerprinting countermeasure, built upon FP-Inspector, helps reduce website breakage by a factor of 2. We use FP-Inspector to perform a measurement study of browser fingerprinting on top-100K websites. We find that browser fingerprinting is now present on more than 10% of the top-100K websites and over a quarter of the top-10K websites. We also discover previously unreported uses of JavaScript APIs by fingerprinting scripts suggesting that they are looking to exploit APIs in new and unexpected ways.

rate research

Actions speak louder than words: Semi-supervised learning for browser fingerprinting detection

196 - Sarah Bird , Vikas Mishra , Steven Englehardt 2020

As online tracking continues to grow, existing anti-tracking and fingerprinting detection techniques that require significant manual input must be augmented. Heuristic approaches to fingerprinting detection are precise but must be carefully curated. Supervised machine learning techniques proposed for detecting tracking require manually generated label-sets. Seeking to overcome these challenges, we present a semi-supervised machine learning approach for detecting fingerprinting scripts. Our approach is based on the core insight that fingerprinting scripts have similar patterns of API access when generating their fingerprints, even though their access patterns may not match exactly. Using this insight, we group scripts by their JavaScript (JS) execution traces and apply a semi-supervised approach to detect new fingerprinting scripts. We detail our methodology and demonstrate its ability to identify the majority of scripts ($geqslant$94.9%) identified by existing heuristic techniques. We also show that the approach expands beyond detecting known scripts by surfacing candidate scripts that are likely to include fingerprinting. Through an analysis of these candidate scripts we discovered fingerprinting scripts that were missed by heuristics and for which there are no heuristics. In particular, we identified over one hundred device-class fingerprinting scripts present on hundreds of domains. To the best of our knowledge, this is the first time device-class fingerprinting has been measured in the wild. These successes illustrate the power of a sparse vector representation and semi-supervised learning to complement and extend existing tracking detection techniques.

Cryptography and Security

New Approaches to Website Fingerprinting Defenses

367 - Xiang Cai , Rishab Nithyanand , 2014

Website fingerprinting attacks enable an adversary to infer which website a victim is visiting, even if the victim uses an encrypting proxy, such as Tor. Previous work has shown that all proposed defenses against website fingerprinting attacks are ineffective. This paper advances the study of website fingerprinting attacks and defenses in two ways. First, we develop bounds on the trade-off between security and bandwidth overhead that any fingerprinting defense scheme can achieve. This enables us to compare schemes with different security/overhead trade-offs by comparing how close they are to the lower bound. We then refine, implement, and evaluate the Congestion Sensitive BuFLO scheme outlined by Cai, et al. CS-BuFLO, which is based on the provably-secure BuFLO defense proposed by Dyer, et al., was not fully-specified by Cai, et al, but has nonetheless attracted the attention of the Tor developers. Our experiments find that CS-BuFLO has high overhead (around 2.3-2.8x) but can get 6x closer to the bandwidth/security trade-off lower bound than Tor or plain SSH.

Cryptography and Security

Teacher Model Fingerprinting Attacks Against Transfer Learning

332 - Yufei Chen , Chao Shen , Cong Wang 2021

Transfer learning has become a common solution to address training data scarcity in practice. It trains a specified student model by reusing or fine-tuning early layers of a well-trained teacher model that is usually publicly available. However, besides utility improvement, the transferred public knowledge also brings potential threats to model confidentiality, and even further raises other security and privacy issues. In this paper, we present the first comprehensive investigation of the teacher model exposure threat in the transfer learning context, aiming to gain a deeper insight into the tension between public knowledge and model confidentiality. To this end, we propose a teacher model fingerprinting attack to infer the origin of a student model, i.e., the teacher model it transfers from. Specifically, we propose a novel optimization-based method to carefully generate queries to probe the student model to realize our attack. Unlike existing model reverse engineering approaches, our proposed fingerprinting method neither relies on fine-grained model outputs, e.g., posteriors, nor auxiliary information of the model architecture or training dataset. We systematically evaluate the effectiveness of our proposed attack. The empirical results demonstrate that our attack can accurately identify the model origin with few probing queries. Moreover, we show that the proposed attack can serve as a stepping stone to facilitating other attacks against machine learning models, such as model stealing.

Cryptography and Security Machine Learning

Robust Website Fingerprinting Through the Cache Occupancy Channel

287 - Anatoly Shusterman , Lachlan Kang , Yarden Haskal 2018

Website fingerprinting attacks, which use statistical analysis on network traffic to compromise user privacy, have been shown to be effective even if the traffic is sent over anonymity-preserving networks such as Tor. The classical attack model used to evaluate website fingerprinting attacks assumes an on-path adversary, who can observe all traffic traveling between the users computer and the Tor network. In this work we investigate these attacks under a different attack model, in which the adversary is capable of running a small amount of unprivileged code on the target users computer. Under this model, the attacker can mount cache side-channel attacks, which exploit the effects of contention on the CPUs cache, to identify the website being browsed. In an important special case of this attack model, a JavaScript attack is launched when the target user visits a website controlled by the attacker. The effectiveness of this attack scenario has never been systematically analyzed, especially in the open-world model which assumes that the user is visiting a mix of both sensitive and non-sensitive sites. In this work we show that cache website fingerprinting attacks in JavaScript are highly feasible, even when they are run from highly restrictive environments, such as the Tor Browser. Specifically, we use machine learning techniques to classify traces of cache activity. Unlike prior works, which try to identify cache conflicts, our work measures the overall occupancy of the last-level cache. We show that our approach achieves high classification accuracy in both the open-world and the closed-world models. We further show that our techniques are resilient both to network-based defenses and to side-channel countermeasures introduced to modern browsers as a response to the Spectre attack.

Cryptography and Security Machine Learning

The Effectiveness of Privacy Enhancing Technologies against Fingerprinting

294 - Amit Datta , Jianan Lu , Michael Carl Tschantz 2018

We measure how effective Privacy Enhancing Technologies (PETs) are at protecting users from website fingerprinting. Our measurements use both experimental and observational methods. Experimental methods allow control, precision, and use on new PETs that currently lack a user base. Observational methods enable scale and drawing from the browsers currently in real-world use. By applying experimentally created models of a PETs behavior to an observational data set, our novel hybrid method offers the best of both worlds. We find the Tor Browser Bundle to be the most effective PET amongst the set we tested. We find that some PETs have inconsistent behaviors, which can do more harm than good.

Cryptography and Security