No Arabic abstract
Differential privacy is a promising framework for addressing the privacy concerns in sharing sensitive datasets for others to analyze. However differential privacy is a highly technical area and current deployments often require experts to write code, tune parameters, and optimize the trade-off between the privacy and accuracy of statistical releases. For differential privacy to achieve its potential for wide impact, it is important to design usable systems that enable differential privacy to be used by ordinary data owners and analysts. PSI is a tool that was designed for this purpose, allowing researchers to release useful differentially private statistical information about their datasets without being experts in computer science, statistics, or privacy. We conducted a thorough usability study of PSI to test whether it accomplishes its goal of usability by non-experts. The usability test illuminated which features of PSI are most user-friendly and prompted us to improve aspects of the tool that caused confusion. The test also highlighted some general principles and lessons for designing usable systems for differential privacy, which we discuss in depth.
Charts often contain visually prominent features that draw attention to aspects of the data and include text captions that emphasize aspects of the data. Through a crowdsourced study, we explore how readers gather takeaways when considering charts and captions together. We first ask participants to mark visually prominent regions in a set of line charts. We then generate text captions based on the prominent features and ask participants to report their takeaways after observing chart-caption pairs. We find that when both the chart and caption describe a high-prominence feature, readers treat the doubly emphasized high-prominence feature as the takeaway; when the caption describes a low-prominence chart feature, readers rely on the chart and report a higher-prominence feature as the takeaway. We also find that external information that provides context, helps further convey the captions message to the reader. We use these findings to provide guidelines for authoring effective chart-caption pairs.
WaveCluster is an important family of grid-based clustering algorithms that are capable of finding clusters of arbitrary shapes. In this paper, we investigate techniques to perform WaveCluster while ensuring differential privacy. Our goal is to develop a general technique for achieving differential privacy on WaveCluster that accommodates different wavelet transforms. We show that straightforward techniques based on synthetic data generation and introduction of random noise when quantizing the data, though generally preserving the distribution of data, often introduce too much noise to preserve useful clusters. We then propose two optimized techniques, PrivTHR and PrivTHREM, which can significantly reduce data distortion during two key steps of WaveCluster: the quantization step and the significant grid identification step. We conduct extensive experiments based on four datasets that are particularly interesting in the context of clustering, and show that PrivTHR and PrivTHREM achieve high utility when privacy budgets are properly allocated.
Crowdsourcing information constitutes an important aspect of human-in-the-loop learning for researchers across multiple disciplines such as AI, HCI, and social science. While using crowdsourced data for subjective tasks is not new, eliciting useful insights from such data remains challenging due to a variety of factors such as difficulty of the task, personal prejudices of the human evaluators, lack of question clarity, etc. In this paper, we consider one such subjective evaluation task, namely that of estimating experienced emotions of distressed individuals who are conversing with a human listener in an online coaching platform. We explore strategies to aggregate the evaluators choices, and show that a simple voting consensus is as effective as an optimum aggregation method for the task considered. Intrigued by how an objective assessment would compare to the subjective evaluation of evaluators, we also designed a machine learning algorithm to perform the same task. Interestingly, we observed a machine learning algorithm that is not explicitly modeled to characterize evaluators subjectivity is as reliable as the human evaluation in terms of assessing the most dominant experienced emotions.
Black-box machine learning models are used in critical decision-making domains, giving rise to several calls for more algorithmic transparency. The drawback is that model explanations can leak information about the training data and the explanation data used to generate them, thus undermining data privacy. To address this issue, we propose differentially private algorithms to construct feature-based model explanations. We design an adaptive differentially private gradient descent algorithm, that finds the minimal privacy budget required to produce accurate explanations. It reduces the overall privacy loss on explanation data, by adaptively reusing past differentially private explanations. It also amplifies the privacy guarantees with respect to the training data. We evaluate the implications of differentially private models and our privacy mechanisms on the quality of model explanations.
Computing devices such as laptops, tablets and mobile phones have become part of our daily lives. End users increasingly know more and more information about these devices. Further, more technically savvy end users know how such devices are being built and know how to choose one over the others. However, we cannot say the same about the Internet of Things (IoT) products. Due to its infancy nature of the marketplace, end users have very little idea about IoT products. To address this issue, we developed a method, a crowdsourced peer learning activity, supported by an online platform (OLYMPUS) to enable a group of learners to learn IoT products space better. We conducted two different user studies to validate that our tool enables better IoT education. Our method guide learners to think more deeply about IoT products and their design decisions. The learning platform we developed is open source and available for the community.