No Arabic abstract
Billions of text analysis requests containing private emails, personal text messages, and sensitive online reviews, are processed by recurrent neural networks (RNNs) deployed on public clouds every day. Although prior secure networks combine homomorphic encryption (HE) and garbled circuit (GC) to preserve users privacy, naively adopting the HE and GC hybrid technique to implement RNNs suffers from long inference latency due to slow activation functions. In this paper, we present a HE and GC hybrid gated recurrent unit (GRU) network, CryptoGRU, for low-latency secure inferences. CryptoGRU replaces computationally expensive GC-based $tanh$ with fast GC-based $ReLU$, and then quantizes $sigmoid$ and $ReLU$ with a smaller bit length to accelerate activations in a GRU. We evaluate CryptoGRU with multiple GRU models trained on 4 public datasets. Experimental results show CryptoGRU achieves top-notch accuracy and improves the secure inference latency by up to $138times$ over one of state-of-the-art secure networks on the Penn Treebank dataset.
In the big data era, more and more cloud-based data-driven applications are developed that leverage individual data to provide certain valuable services (the utilities). On the other hand, since the same set of individual data could be utilized to infer the individuals certain sensitive information, it creates new channels to snoop the individuals privacy. Hence it is of great importance to develop techniques that enable the data owners to release privatized data, that can still be utilized for certain premised intended purpose. Existing data releasing approaches, however, are either privacy-emphasized (no consideration on utility) or utility-driven (no guarantees on privacy). In this work, we propose a two-step perturbation-based utility-aware privacy-preserving data releasing framework. First, certain predefined privacy and utility problems are learned from the public domain data (background knowledge). Later, our approach leverages the learned knowledge to precisely perturb the data owners data into privatized data that can be successfully utilized for certain intended purpose (learning to succeed), without jeopardizing certain predefined privacy (training to fail). Extensive experiments have been conducted on Human Activity Recognition, Census Income and Bank Marketing datasets to demonstrate the effectiveness and practicality of our framework.
Deep learning has been widely applied in many computer vision applications, with remarkable success. However, running deep learning models on mobile devices is generally challenging due to the limitation of computing resources. A popular alternative is to use cloud services to run deep learning models to process raw data. This, however, imposes privacy risks. Some prior arts proposed sending the features extracted from raw data to the cloud. Unfortunately, these extracted features can still be exploited by attackers to recover raw images and to infer embedded private attributes. In this paper, we propose an adversarial training framework, DeepObfuscator, which prevents the usage of the features for reconstruction of the raw images and inference of private attributes. This is done while retaining useful information for the intended cloud service. DeepObfuscator includes a learnable obfuscator that is designed to hide privacy-related sensitive information from the features by performing our proposed adversarial training algorithm. The proposed algorithm is designed by simulating the game between an attacker who makes efforts to reconstruct raw image and infer private attributes from the extracted features and a defender who aims to protect user privacy. By deploying the trained obfuscator on the smartphone, features can be locally extracted and then sent to the cloud. Our experiments on CelebA and LFW datasets show that the quality of the reconstructed images from the obfuscated features of the raw image is dramatically decreased from 0.9458 to 0.3175 in terms of multi-scale structural similarity. The person in the reconstructed image, hence, becomes hardly to be re-identified. The classification accuracy of the inferred private attributes that can be achieved by the attacker is significantly reduced to a random-guessing level.
Online users generate tremendous amounts of textual information by participating in different activities, such as writing reviews and sharing tweets. This textual data provides opportunities for researchers and business partners to study and understand individuals. However, this user-generated textual data not only can reveal the identity of the user but also may contain individuals private information (e.g., age, location, gender). Hence, you are what you write as the saying goes. Publishing the textual data thus compromises the privacy of individuals who provided it. The need arises for data publishers to protect peoples privacy by anonymizing the data before publishing it. It is challenging to design effective anonymization techniques for textual information which minimizes the chances of re-identification and does not contain users sensitive information (high privacy) while retaining the semantic meaning of the data for given tasks (high utility). In this paper, we study this problem and propose a novel double privacy preserving text representation learning framework, DPText, which learns a textual representation that (1) is differentially private, (2) does not contain private information and (3) retains high utility for the given task. Evaluating on two natural language processing tasks, i.e., sentiment analysis and part of speech tagging, we show the effectiveness of this approach in terms of preserving both privacy and utility.
Federated analytics has many applications in edge computing, its use can lead to better decision making for service provision, product development, and user experience. We propose a Bayesian approach to trend detection in which the probability of a keyword being trendy, given a dataset, is computed via Bayes Theorem; the probability of a dataset, given that a keyword is trendy, is computed through secure aggregation of such conditional probabilities over local datasets of users. We propose a protocol, named SAFE, for Bayesian federated analytics that offers sufficient privacy for production grade use cases and reduces the computational burden of users and an aggregator. We illustrate this approach with a trend detection experiment and discuss how this approach could be extended further to make it production-ready.
In the digital era, users share their personal data with service providers to obtain some utility, e.g., access to high-quality services. Yet, the induced information flows raise privacy and integrity concerns. Consequently, cautious users may want to protect their privacy by minimizing the amount of information they disclose to curious service providers. Service providers are interested in verifying the integrity of the users data to improve their services and obtain useful knowledge for their business. In this work, we present a generic solution to the trade-off between privacy, integrity, and utility, by achieving authenticity verification of data that has been encrypted for offloading to service providers. Based on lattice-based homomorphic encryption and commitments, as well as zero-knowledge proofs, our construction enables a service provider to process and reuse third-party signed data in a privacy-friendly manner with integrity guarantees. We evaluate our solution on different use cases such as smart-metering, disease susceptibility, and location-based activity tracking, thus showing its versatility. Our solution achieves broad generality, quantum-resistance, and relaxes some assumptions of state-of-the-art solutions without affecting performance.