No Arabic abstract
We provide an end-to-end differentially private spectral algorithm for learning LDA, based on matrix/tensor decompositions, and establish theoretical guarantees on utility/consistency of the estimated model parameters. The spectral algorithm consists of multiple algorithmic steps, named as {edges}, to which noise could be injected to obtain differential privacy. We identify emph{subsets of edges}, named as {configurations}, such that adding noise to all edges in such a subset guarantees differential privacy of the end-to-end spectral algorithm. We characterize the sensitivity of the edges with respect to the input and thus estimate the amount of noise to be added to each edge for any required privacy level. We then characterize the utility loss for each configuration as a function of injected noise. Overall, by combining the sensitivity and utility characterization, we obtain an end-to-end differentially private spectral algorithm for LDA and identify the corresponding configuration that outperforms others in any specific regime. We are the first to achieve utility guarantees under the required level of differential privacy for learning in LDA. Overall our method systematically outperforms differentially private variational inference.
Assume we have potential causes $zin Z$, which produce events $w$ with known probabilities $beta(w|z)$. We observe $w_1,w_2,...,w_n$, what can we say about the distribution of the causes? A Bayesian estimate will assume a prior on distributions on $Z$ (we assume a Dirichlet prior) and calculate a posterior. An average over that posterior then gives a distribution on $Z$, which estimates how much each cause $z$ contributed to our observations. This is the setting of Latent Dirichlet Allocation, which can be applied e.g. to topics producing words in a document. In this setting usually the number of observed words is large, but the number of potential topics is small. We are here interested in applications with many potential causes (e.g. locations on the globe), but only a few observations. We show that the exact Bayesian estimate can be computed in linear time (and constant space) in $|Z|$ for a given upper bound on $n$ with a surprisingly simple formula. We generalize this algorithm to the case of sparse probabilities $beta(w|z)$, in which we only need to assume that the tree width of an interaction graph on the observations is limited. On the other hand we also show that without such limitation the problem is NP-hard.
Large data collections required for the training of neural networks often contain sensitive information such as the medical histories of patients, and the privacy of the training data must be preserved. In this paper, we introduce a dropout technique that provides an elegant Bayesian interpretation to dropout, and show that the intrinsic noise added, with the primary goal of regularization, can be exploited to obtain a degree of differential privacy. The iterative nature of training neural networks presents a challenge for privacy-preserving estimation since multiple iterations increase the amount of noise added. We overcome this by using a relaxed notion of differential privacy, called concentrated differential privacy, which provides tighter estimates on the overall privacy loss. We demonstrate the accuracy of our privacy-preserving dropout algorithm on benchmark datasets.
A major challenge for machine learning is increasing the availability of data while respecting the privacy of individuals. Here we combine the provable privacy guarantees of the differential privacy framework with the flexibility of Gaussian processes (GPs). We propose a method using GPs to provide differentially private (DP) regression. We then improve this method by crafting the DP noise covariance structure to efficiently protect the training data, while minimising the scale of the added noise. We find that this cloaking method achieves the greatest accuracy, while still providing privacy guarantees, and offers practical DP for regression over multi-dimensional inputs. Together these methods provide a starter toolkit for combining differential privacy and GPs.
Deep neural networks with their large number of parameters are highly flexible learning systems. The high flexibility in such networks brings with some serious problems such as overfitting, and regularization is used to address this problem. A currently popular and effective regularization technique for controlling the overfitting is dropout. Often, large data collections required for neural networks contain sensitive information such as the medical histories of patients, and the privacy of the training data should be protected. In this paper, we modify the recently proposed variational dropout technique which provided an elegant Bayesian interpretation to dropout, and show that the intrinsic noise in the variational dropout can be exploited to obtain a degree of differential privacy. The iterative nature of training neural networks presents a challenge for privacy-preserving estimation since multiple iterations increase the amount of noise added. We overcome this by using a relaxed notion of differential privacy, called concentrated differential privacy, which provides tighter estimates on the overall privacy loss. We demonstrate the accuracy of our privacy-preserving variational dropout algorithm on benchmark datasets.
A typical audio signal processing pipeline includes multiple disjoint analysis stages, including calculation of a time-frequency representation followed by spectrogram-based feature analysis. We show how time-frequency analysis and nonnegative matrix factorisation can be jointly formulated as a spectral mixture Gaussian process model with nonstationary priors over the amplitude variance parameters. Further, we formulate this nonlinear models state space representation, making it amenable to infinite-horizon Gaussian process regression with approximate inference via expectation propagation, which scales linearly in the number of time steps and quadratically in the state dimensionality. By doing so, we are able to process audio signals with hundreds of thousands of data points. We demonstrate, on various tasks with empirical data, how this inference scheme outperforms more standard techniques that rely on extended Kalman filtering.