
Actions speak louder than words: Semi-supervised learning for browser fingerprinting detection

Published by Sarah Bird
Publication date: 2020
Research field: Informatics Engineering
Research language: English





As online tracking continues to grow, existing anti-tracking and fingerprinting detection techniques that require significant manual input must be augmented. Heuristic approaches to fingerprinting detection are precise but must be carefully curated. Supervised machine learning techniques proposed for detecting tracking require manually generated label-sets. Seeking to overcome these challenges, we present a semi-supervised machine learning approach for detecting fingerprinting scripts. Our approach is based on the core insight that fingerprinting scripts have similar patterns of API access when generating their fingerprints, even though their access patterns may not match exactly. Using this insight, we group scripts by their JavaScript (JS) execution traces and apply a semi-supervised approach to detect new fingerprinting scripts. We detail our methodology and demonstrate its ability to identify the majority of scripts ($\geqslant$94.9%) identified by existing heuristic techniques. We also show that the approach expands beyond detecting known scripts by surfacing candidate scripts that are likely to include fingerprinting. Through an analysis of these candidate scripts we discovered fingerprinting scripts that were missed by heuristics and for which there are no heuristics. In particular, we identified over one hundred device-class fingerprinting scripts present on hundreds of domains. To the best of our knowledge, this is the first time device-class fingerprinting has been measured in the wild. These successes illustrate the power of a sparse vector representation and semi-supervised learning to complement and extend existing tracking detection techniques.
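The core idea in the abstract (scripts represented as sparse vectors of API accesses, with known fingerprinting scripts used to surface similar unlabeled ones) can be illustrated with a small sketch. This is not the authors' pipeline: the abstract does not specify the grouping algorithm, so the sketch swaps in a simple nearest-seed cosine-similarity rule. The script names, the API names in the traces, the `surface_candidates` helper, and the 0.7 threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt

def trace_to_vector(trace):
    """Turn a list of accessed API names into a sparse count vector."""
    return Counter(trace)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[api] for api, count in a.items() if api in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def surface_candidates(traces, known_fingerprinters, threshold=0.7):
    """Return unlabeled scripts whose API-access pattern is close to a seed.

    Assumption: the seeds (e.g. scripts flagged by existing heuristics)
    are present in `traces`; the threshold is an arbitrary example value.
    """
    vectors = {name: trace_to_vector(t) for name, t in traces.items()}
    candidates = []
    for name, vec in vectors.items():
        if name in known_fingerprinters:
            continue
        best = max(
            (cosine(vec, vectors[seed]) for seed in known_fingerprinters),
            default=0.0,
        )
        if best >= threshold:
            candidates.append(name)
    return sorted(candidates)

# Hypothetical execution traces: one labeled seed and two unlabeled scripts.
traces = {
    "known.js": [
        "CanvasRenderingContext2D.fillText",
        "HTMLCanvasElement.toDataURL",
        "Navigator.userAgent",
        "Screen.width",
    ],
    "unknown_a.js": [
        "CanvasRenderingContext2D.fillText",
        "HTMLCanvasElement.toDataURL",
        "Navigator.userAgent",
        "Navigator.plugins",
    ],
    "unknown_b.js": [
        "Document.createElement",
        "Element.appendChild",
        "Window.setTimeout",
    ],
}

candidates = surface_candidates(traces, {"known.js"})
```

Here `unknown_a.js` shares three of its four API accesses with the seed, so it is surfaced as a candidate even though the access patterns do not match exactly, while `unknown_b.js` shares none and is ignored. Candidates found this way would still need manual review, as the abstract describes.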




Read also

Browser fingerprinting is an invasive and opaque stateless tracking technique. Browser vendors, academics, and standards bodies have long struggled to provide meaningful protections against browser fingerprinting that are both accurate and do not degrade user experience. We propose FP-Inspector, a machine learning based syntactic-semantic approach to accurately detect browser fingerprinting. We show that FP-Inspector performs well, allowing us to detect 26% more fingerprinting scripts than the state-of-the-art. We show that an API-level fingerprinting countermeasure, built upon FP-Inspector, helps reduce website breakage by a factor of 2. We use FP-Inspector to perform a measurement study of browser fingerprinting on top-100K websites. We find that browser fingerprinting is now present on more than 10% of the top-100K websites and over a quarter of the top-10K websites. We also discover previously unreported uses of JavaScript APIs by fingerprinting scripts, suggesting that they are looking to exploit APIs in new and unexpected ways.
The proposed algorithmic approach deals with finding the sense of a word in electronic data. Nowadays, in communication media such as the internet and mobile services, people use a few words that are slang in nature. This approach detects those abusive words using a supervised learning procedure. In real-life scenarios, however, slang words are not always used in their complete forms. Most of the time, they appear in abbreviated forms such as sound-alike forms, taboo morphemes, etc. The proposed approach can also detect those abbreviated forms using a semi-supervised learning procedure. Using synset and concept analysis of the text, the probability that a suspicious word is a slang word is also evaluated.
The use of supervised learning with various sensing techniques such as audio, visual imaging, thermal sensing, RADAR, and radio frequency (RF) has been widely applied in the detection of unmanned aerial vehicles (UAV) in an environment. However, little or no attention has been given to the application of unsupervised or semi-supervised algorithms for UAV detection. In this paper, we propose a semi-supervised technique and architecture for detecting UAVs in an environment by exploiting the RF signals (i.e., fingerprints) between a UAV and its flight-controller communication under wireless interference such as Bluetooth and WiFi. By decomposing the RF signals using a two-level wavelet packet transform, we estimated the second moment statistic (i.e., variance) of the coefficients in each packet as a feature set. We developed a local outlier factor model as the UAV detection algorithm using the coefficient variances of the wavelet packets from WiFi and Bluetooth signals. When detecting the presence of RF-based UAV, we achieved an accuracy of 96.7% and 86% at a signal-to-noise ratio of 30 dB and 18 dB, respectively. The application of this approach is not limited to UAV detection as it can be extended to the detection of rogue RF devices in an environment.
Despite the data labeling cost for the object detection tasks being substantially more than that of the classification tasks, semi-supervised learning methods for object detection have not been studied much. In this paper, we propose an Interpolation-based Semi-supervised learning method for object Detection (ISD), which considers and solves the problems caused by applying conventional Interpolation Regularization (IR) directly to object detection. We divide the output of the model into two types according to the objectness scores of both original patches that are mixed in IR. Then, we apply a separate loss suitable for each type in an unsupervised manner. The proposed losses dramatically improve the performance of semi-supervised learning as well as supervised learning. In the supervised learning setting, our method improves the baseline methods by a significant margin. In the semi-supervised learning setting, our algorithm improves the performance on benchmark datasets (PASCAL VOC and MSCOCO) in a benchmark architecture (SSD).
Semi-supervised learning (SSL) has the potential to improve the predictive performance of machine learning models using unlabeled data. Although there has been remarkable recent progress, the scope of demonstration in SSL has mainly been on image classification tasks. In this paper, we propose STAC, a simple yet effective SSL framework for visual object detection along with a data augmentation strategy. STAC deploys highly confident pseudo labels of localized objects from an unlabeled image and updates the model by enforcing consistency via strong augmentations. We propose experimental protocols to evaluate the performance of semi-supervised object detection using MS-COCO and show the efficacy of STAC on both MS-COCO and VOC07. On VOC07, STAC improves the AP$^{0.5}$ from $76.30$ to $79.08$; on MS-COCO, STAC demonstrates $2\times$ higher data efficiency by achieving 24.38 mAP using only 5% labeled data, compared with a supervised baseline that reaches 23.86 mAP using 10% labeled data. The code is available at https://github.com/google-research/ssl_detection/.