Actions speak louder than words: Semi-supervised learning for browser fingerprinting detection

Added by Sarah Bird

Publication date 2020

Field: Informatics Engineering

Language: English

Authors Sarah Bird - Vikas Mishra - Steven Englehardt

Cryptography and Security

As online tracking continues to grow, existing anti-tracking and fingerprinting detection techniques that require significant manual input must be augmented. Heuristic approaches to fingerprinting detection are precise but must be carefully curated. Supervised machine learning techniques proposed for detecting tracking require manually generated label-sets. Seeking to overcome these challenges, we present a semi-supervised machine learning approach for detecting fingerprinting scripts. Our approach is based on the core insight that fingerprinting scripts have similar patterns of API access when generating their fingerprints, even though their access patterns may not match exactly. Using this insight, we group scripts by their JavaScript (JS) execution traces and apply a semi-supervised approach to detect new fingerprinting scripts. We detail our methodology and demonstrate its ability to identify the majority of scripts ($\geqslant$94.9%) identified by existing heuristic techniques. We also show that the approach expands beyond detecting known scripts by surfacing candidate scripts that are likely to include fingerprinting. Through an analysis of these candidate scripts, we discovered fingerprinting scripts that were missed by heuristics and for which there are no heuristics. In particular, we identified over one hundred device-class fingerprinting scripts present on hundreds of domains. To the best of our knowledge, this is the first time device-class fingerprinting has been measured in the wild. These successes illustrate the power of a sparse vector representation and semi-supervised learning to complement and extend existing tracking detection techniques.
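The core idea in the abstract, comparing scripts by their API access patterns through a sparse vector representation, can be illustrated with a small sketch. This is not the authors' pipeline: the script names, the API names in the traces, the cosine similarity measure, and the threshold are all assumptions chosen for illustration. Each script's trace becomes a sparse count vector, and unlabeled scripts that sit close to a script already flagged by a heuristic are surfaced as candidates.

```python
from collections import Counter
from math import sqrt


def trace_to_vector(trace):
    """Turn a list of API names into a sparse count vector (a Counter)."""
    return Counter(trace)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[api] for api, count in u.items() if api in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def surface_candidates(traces, seed_labels, threshold=0.8):
    """Return unlabeled scripts whose trace resembles a seed-labeled script.

    `seed_labels` is the set of script names already flagged (for example
    by a heuristic); the threshold is an arbitrary illustrative value.
    """
    vectors = {name: trace_to_vector(t) for name, t in traces.items()}
    candidates = {}
    for name, vec in vectors.items():
        if name in seed_labels:
            continue
        best = max((cosine(vec, vectors[s]) for s in seed_labels), default=0.0)
        if best >= threshold:
            candidates[name] = best
    return candidates


# Hypothetical traces: the API names are examples, not measured data.
traces = {
    "known_fp.js": ["canvas.toDataURL", "navigator.plugins",
                    "screen.width", "navigator.userAgent"],
    "unknown_a.js": ["canvas.toDataURL", "navigator.plugins",
                     "screen.width", "navigator.language"],
    "unknown_b.js": ["document.createElement", "window.addEventListener"],
}
seed_labels = {"known_fp.js"}
candidates = surface_candidates(traces, seed_labels, threshold=0.7)
```

Here `unknown_a.js` shares three of four API calls with the seed script and is surfaced for review, while `unknown_b.js` shares none and is not. The surfaced scripts are candidates only; as in the abstract, they would still need analysis before being treated as fingerprinters.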

Fingerprinting the Fingerprinters: Learning to Detect Browser Fingerprinting Behaviors

78 - Umar Iqbal Then University of Iowa 2020

Browser fingerprinting is an invasive and opaque stateless tracking technique. Browser vendors, academics, and standards bodies have long struggled to provide meaningful protections against browser fingerprinting that are both accurate and do not degrade user experience. We propose FP-Inspector, a machine learning based syntactic-semantic approach to accurately detect browser fingerprinting. We show that FP-Inspector performs well, allowing us to detect 26% more fingerprinting scripts than the state-of-the-art. We show that an API-level fingerprinting countermeasure, built upon FP-Inspector, helps reduce website breakage by a factor of 2. We use FP-Inspector to perform a measurement study of browser fingerprinting on top-100K websites. We find that browser fingerprinting is now present on more than 10% of the top-100K websites and over a quarter of the top-10K websites. We also discover previously unreported uses of JavaScript APIs by fingerprinting scripts suggesting that they are looking to exploit APIs in new and unexpected ways.

Cryptography and Security

Detection of Slang Words in e-Data using semi-Supervised Learning

52 - Alok Ranjan Pal , Diganta Saha 2015

The proposed algorithmic approach deals with finding the sense of a word in electronic data. Nowadays, in communication media such as the internet and mobile services, people use a few words that are slang in nature. This approach detects those abusive words using a supervised learning procedure. In real-life scenarios, however, slang words are not always used in their complete forms. Most of the time, they appear in abbreviated forms such as sound-alike forms and taboo morphemes. The proposed approach can also detect those abbreviated forms using a semi-supervised learning procedure. Using synset and concept analysis of the text, the probability that a suspicious word is a slang word is also evaluated.

Computation and Language

Semi-supervised Learning Framework for UAV Detection

99 - Olusiji O Medaiyese , Martins Ezuma , Adrian P Lauf 2021

Supervised learning with various sensing techniques such as audio, visual imaging, thermal sensing, RADAR, and radio frequency (RF) has been widely applied to the detection of unmanned aerial vehicles (UAVs) in an environment. However, little or no attention has been given to the application of unsupervised or semi-supervised algorithms for UAV detection. In this paper, we propose a semi-supervised technique and architecture for detecting UAVs in an environment by exploiting the RF signals (i.e., fingerprints) between a UAV and its flight-controller communication under wireless interference such as Bluetooth and WiFi. By decomposing the RF signals using a two-level wavelet packet transform, we estimate the second moment statistic (i.e., variance) of the coefficients in each packet as a feature set. We develop a local outlier factor model as the UAV detection algorithm using the coefficient variances of the wavelet packets from WiFi and Bluetooth signals. When detecting the presence of an RF-based UAV, we achieve an accuracy of 96.7% and 86% at a signal-to-noise ratio of 30 dB and 18 dB, respectively. The application of this approach is not limited to UAV detection, as it can be extended to the detection of rogue RF devices in an environment.
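The feature extraction step described above (a two-level wavelet packet transform followed by the variance of each packet's coefficients) can be sketched as follows. The abstract does not name the wavelet, so the Haar wavelet used here is an assumption, as are the function names and the toy signal. The local outlier factor model is not implemented; this sketch stops at the feature vector.

```python
from math import sqrt
from statistics import pvariance


def haar_step(signal):
    """One Haar analysis step: return (approximation, detail) coefficients."""
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / sqrt(2) for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / sqrt(2) for i in pairs]
    return approx, detail


def wavelet_packet_variances(signal, levels=2):
    """Variance of each packet after a full wavelet packet decomposition.

    Unlike a plain wavelet transform, every packet (approximation and
    detail alike) is split again at each level, giving 2**levels packets.
    The signal length should be a multiple of 2**levels.
    """
    packets = [list(signal)]
    for _ in range(levels):
        next_packets = []
        for packet in packets:
            approx, detail = haar_step(packet)
            next_packets.extend([approx, detail])
        packets = next_packets
    return [pvariance(packet) for packet in packets]


# Toy signal (not real RF data): a slow square wave.
features = wavelet_packet_variances([1, 1, 1, 1, -1, -1, -1, -1])
```

For this slow square wave, all of the variance lands in the first (lowest-frequency) packet and the other three packets are zero. A detector in the spirit of the abstract would fit an outlier model on such four-element feature vectors.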

Signal Processing

Interpolation-based semi-supervised learning for object detection

91 - Jisoo Jeong , Vikas Verma , Minsung Hyun 2020

Despite the data labeling cost for the object detection tasks being substantially more than that of the classification tasks, semi-supervised learning methods for object detection have not been studied much. In this paper, we propose an Interpolation-based Semi-supervised learning method for object Detection (ISD), which considers and solves the problems caused by applying conventional Interpolation Regularization (IR) directly to object detection. We divide the output of the model into two types according to the objectness scores of both original patches that are mixed in IR. Then, we apply a separate loss suitable for each type in an unsupervised manner. The proposed losses dramatically improve the performance of semi-supervised learning as well as supervised learning. In the supervised learning setting, our method improves the baseline methods by a significant margin. In the semi-supervised learning setting, our algorithm improves the performance on a benchmark dataset (PASCAL VOC and MSCOCO) in a benchmark architecture (SSD).

Computer Vision and Pattern Recognition

A Simple Semi-Supervised Learning Framework for Object Detection

119 - Kihyuk Sohn , Zizhao Zhang , Chun-Liang Li 2020

Semi-supervised learning (SSL) has the potential to improve the predictive performance of machine learning models using unlabeled data. Although there has been remarkable recent progress, the scope of demonstration in SSL has mainly been on image classification tasks. In this paper, we propose STAC, a simple yet effective SSL framework for visual object detection along with a data augmentation strategy. STAC deploys highly confident pseudo labels of localized objects from an unlabeled image and updates the model by enforcing consistency via strong augmentations. We propose experimental protocols to evaluate the performance of semi-supervised object detection using MS-COCO and show the efficacy of STAC on both MS-COCO and VOC07. On VOC07, STAC improves the AP$^{0.5}$ from $76.30$ to $79.08$; on MS-COCO, STAC demonstrates $2{\times}$ higher data efficiency by achieving 24.38 mAP using only 5% labeled data, whereas the supervised baseline reaches 23.86% using 10% labeled data. The code is available at https://github.com/google-research/ssl_detection/.

Computer Vision and Pattern Recognition