New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Anonymizing Sensor Data on the Edge: A Representation Learning and Transformation Approach

266 0 0.0 ( 0 )

Download Cite

Added by Omid Ardakanian

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Omid Hajihassani - Omid Ardakanian - Hamzeh Khazaei

Machine Learning Artificial Intelligence Cryptography and Security

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The abundance of data collected by sensors in Internet of Things (IoT) devices, and the success of deep neural networks in uncovering hidden patterns in time series data have led to mounting privacy concerns. This is because private and sensitive information can be potentially learned from sensor data by applications that have access to this data. In this paper, we aim to examine the tradeoff between utility and privacy loss by learning low-dimensional representations that are useful for data obfuscation. We propose deterministic and probabilistic transformations in the latent space of a variational autoencoder to synthesize time series data such that intrusive inferences are prevented while desired inferences can still be made with sufficient accuracy. In the deterministic case, we use a linear transformation to move the representation of input data in the latent space such that the reconstructed data is likely to have the same public attribute but a different private attribute than the original input data. In the probabilistic case, we apply the linear transformation to the latent representation of input data with some probability. We compare our technique with autoencoder-based anonymization techniques and additionally show that it can anonymize data in real time on resource-constrained edge devices.

rate research

Privacy Adversarial Network: Representation Learning for Mobile Data Privacy

270 - Sicong Liu , Junzhao Du , Anshumali Shrivastava 2020

The remarkable success of machine learning has fostered a growing number of cloud-based intelligent services for mobile users. Such a service requires a user to send data, e.g. image, voice and video, to the provider, which presents a serious challenge to user privacy. To address this, prior works either obfuscate the data, e.g. add noise and remove identity information, or send representations extracted from the data, e.g. anonymized features. They struggle to balance between the service utility and data privacy because obfuscated data reduces utility and extracted representation may still reveal sensitive information. This work departs from prior works in methodology: we leverage adversarial learning to a better balance between privacy and utility. We design a textit{representation encoder} that generates the feature representations to optimize against the privacy disclosure risk of sensitive information (a measure of privacy) by the textit{privacy adversaries}, and concurrently optimize with the task inference accuracy (a measure of utility) by the textit{utility discriminator}. The result is the privacy adversarial network (systemname), a novel deep model with the new training algorithm, that can automatically learn representations from the raw data. Intuitively, PAN adversarially forces the extracted representations to only convey the information required by the target task. Surprisingly, this constitutes an implicit regularization that actually improves task accuracy. As a result, PAN achieves better utility and better privacy at the same time! We report extensive experiments on six popular datasets and demonstrate the superiority of systemname compared with alternative methods reported in prior work.

Machine Learning Artificial Intelligence Cryptography and Security

Personalised Federated Learning: A Combinational Approach

91 - Sone Kyaw Pye , Han Yu 2021

Federated learning (FL) is a distributed machine learning approach involving multiple clients collaboratively training a shared model. Such a system has the advantage of more training data from multiple clients, but data can be non-identically and independently distributed (non-i.i.d.). Privacy and integrity preserving features such as differential privacy (DP) and robust aggregation (RA) are commonly used in FL. In this work, we show that on common deep learning tasks, the performance of FL models differs amongst clients and situations, and FL models can sometimes perform worse than local models due to non-i.i.d. data. Secondly, we show that incorporating DP and RA degrades performance further. Then, we conduct an ablation study on the performance impact of different combinations of common personalization approaches for FL, such as finetuning, mixture-of-experts ensemble, multi-task learning, and knowledge distillation. It is observed that certain combinations of personalization approaches are more impactful in certain scenarios while others always improve performance, and combination approaches are better than individual ones. Most clients obtained better performance with combined personalized FL and recover from performance degradation caused by non-i.i.d. data, DP, and RA.

Machine Learning Artificial Intelligence Cryptography and Security

Where Does the Robustness Come from? A Study of the Transformation-based Ensemble Defence

135 - Chang Liao , Yao Cheng , Chengfang Fang 2020

This paper aims to provide a thorough study on the effectiveness of the transformation-based ensemble defence for image classification and its reasons. It has been empirically shown that they can enhance the robustness against evasion attacks, while there is little analysis on the reasons. In particular, it is not clear whether the robustness improvement is a result of transformation or ensemble. In this paper, we design two adaptive attacks to better evaluate the transformation-based ensemble defence. We conduct experiments to show that 1) the transferability of adversarial examples exists among the models trained on data records after different reversible transformations; 2) the robustness gained through transformation-based ensemble is limited; 3) this limited robustness is mainly from the irreversible transformations rather than the ensemble of a number of models; and 4) blindly increasing the number of sub-models in a transformation-based ensemble does not bring extra robustness gain.

Machine Learning Artificial Intelligence Cryptography and Security

Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses

103 - Micah Goldblum , Dimitris Tsipras , Chulin Xie 2020

As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance. The absence of trustworthy human supervision over the data collection process exposes organizations to security vulnerabilities; training data can be manipulated to control and degrade the downstream behaviors of learned models. The goal of this work is to systematically categorize and discuss a wide range of dataset vulnerabilities and exploits, approaches for defending against these threats, and an array of open problems in this space. In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.

Machine Learning Artificial Intelligence Cryptography and Security

Fast Wireless Sensor Anomaly Detection based on Data Stream in Edge Computing Enabled Smart Greenhouse

82 - Yihong Yang , Sheng Ding , Yuwen Liu 2021

Edge computing enabled smart greenhouse is a representative application of Internet of Things technology, which can monitor the environmental information in real time and employ the information to contribute to intelligent decision-making. In the process, anomaly detection for wireless sensor data plays an important role. However, traditional anomaly detection algorithms originally designed for anomaly detection in static data have not properly considered the inherent characteristics of data stream produced by wireless sensor such as infiniteness, correlations and concept drift, which may pose a considerable challenge on anomaly detection based on data stream, and lead to low detection accuracy and efficiency. First, data stream usually generates quickly which means that it is infinite and enormous, so any traditional off-line anomaly detection algorithm that attempts to store the whole dataset or to scan the dataset multiple times for anomaly detection will run out of memory space. Second, there exist correlations among different data streams, which traditional algorithms hardly consider. Third, the underlying data generation process or data distribution may change over time. Thus, traditional anomaly detection algorithms with no model update will lose their effects. Considering these issues, a novel method (called DLSHiForest) on basis of Locality-Sensitive Hashing and time window technique in this paper is proposed to solve these problems while achieving accurate and efficient detection. Comprehensive experiments are executed using real-world agricultural greenhouse dataset to demonstrate the feasibility of our approach. Experimental results show that our proposal is practicable in addressing challenges of traditional anomaly detection while ensuring accuracy and efficiency.

Machine Learning Artificial Intelligence Signal Processing

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Anonymizing Sensor Data on the Edge: A Representation Learning and Transformation Approach

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions