ترغب بنشر مسار تعليمي؟ اضغط هنا

Privacy in Sensor-Driven Human Data Collection: A Guide for Practitioners

143   0   0.0 ( 0 )
 نشر من قبل Arkadiusz Stopczynski Mr.
 تاريخ النشر 2014
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

In recent years, the amount of information collected about human beings has increased dramatically. This development has been partially driven by individuals posting and storing data about themselves and friends using online social networks or collecting their data for self-tracking purposes (quantified-self movement). Across the sciences, researchers conduct studies collecting data with an unprecedented resolution and scale. Using computational power combined with mathematical models, such rich datasets can be mined to infer underlying patterns, thereby providing insights into human nature. Much of the data collected is sensitive. It is private in the sense that most individuals would feel uncomfortable sharing their collected personal data publicly. For this reason, the need for solutions to ensure the privacy of the individuals generating data has grown alongside the data collection efforts. Out of all the massive data collection efforts, this paper focuses on efforts directly instrumenting human behavior, and notes that -- in many cases -- the privacy of participants is not sufficiently addressed. For example, study purposes are often not explicit, informed consent is ill-defined, and security and sharing protocols are only partially disclosed. This paper provides a survey of the work related to addressing privacy issues in research studies that collect detailed sensor data on human behavior. Reflections on the key problems and recommendations for future work are included. We hope the overview of the privacy-related practices in massive data collection studies can be used as a frame of reference for practitioners in the field. Although focused on data collection in an academic context, we believe that many of the challenges and solutions we identify are also relevant and useful for other domains where massive data collection takes place, including businesses and governments.



قيم البحث

اقرأ أيضاً

The increasing generation and collection of personal data has created a complex ecosystem, often collaborative but sometimes combative, around companies and individuals engaging in the use of these data. We propose that the interactions between these agents warrants a new topic of study: Human-Data Interaction (HDI). In this paper we discuss how HDI sits at the intersection of various disciplines, including computer science, statistics, sociology, psychology and behavioural economics. We expose the challenges that HDI raises, organised into three core themes of legibility, agency and negotiability, and we present the HDI agenda to open up a dialogue amongst interested parties in the personal and big data ecosystems.
272 - Jie Ding , Bangjun Ding 2021
The emerging public awareness and government regulations of data privacy motivate new paradigms of collecting and analyzing data transparent and acceptable to data owners. We present a new concept of privacy and corresponding data formats, mechanisms , and tradeoffs for privatizing data during data collection. The privacy, named Interval Privacy, enforces the raw data conditional distribution on the privatized data to be the same as its unconditional distribution over a nontrivial support set. Correspondingly, the proposed privacy mechanism will record each data value as a random interval containing it. The proposed interval privacy mechanisms can be easily deployed through most existing survey-based data collection paradigms, e.g., by asking a respondent whether its data value is within a randomly generated range. Another unique feature of interval mechanisms is that they obfuscate the truth but not distort it. The way of using narrowed range to convey information is complementary to the popular paradigm of perturbing data. Also, the interval mechanisms can generate progressively refined information at the discretion of individual respondents. We study different theoretical aspects of the proposed privacy. In the context of supervised learning, we also offer a method such that existing supervised learning algorithms designed for point-valued data could be directly applied to learning from interval-valued data.
Despite decades of research on approximate query processing (AQP), our understanding of sample-based joins has remained limited and, to some extent, even superficial. The common belief in the community is that joining random samples is futile. This b elief is largely based on an early result showing that the join of two uniform samples is not an independent sample of the original join, and that it leads to quadratically fewer output tuples. However, unfortunately, this result has little applicability to the key questions practitioners face. For example, the success metric is often the final approximations accuracy, rather than output cardinality. Moreover, there are many non-uniform sampling strategies that one can employ. Is sampling for joins still futile in all of these settings? If not, what is the best sampling strategy in each case? To the best of our knowledge, there is no formal study answering these questions. This paper aims to improve our understanding of sample-based joins and offer a guideline for practitioners building and using real-world AQP systems. We study limitations of offline samples in approximating join queries: given an offline sampling budget, how well can one approximate the join of two tables? We answer this question for two success metrics: output size and estimator variance. We show that maximizing output size is easy, while there is an information-theoretical lower bound on the lowest variance achievable by any sampling strategy. We then define a hybrid sampling scheme that captures all combinations of stratified, universe, and Bernoulli sampling, and show that this scheme with our optimal parameters achieves the theoretical lower bound within a constant factor. Since computing these optimal parameters requires shuffling statistics across the network, we also propose a decentralized variant where each node acts autonomously using minimal statistics.
Large-scale collection of human behavioral data by companies raises serious privacy concerns. We show that behavior captured in the form of application usage data collected from smartphones is highly unique even in very large datasets encompassing mi llions of individuals. This makes behavior-based re-identification of users across datasets possible. We study 12 months of data from 3.5 million users and show that four apps are enough to uniquely re-identify 91.2% of users using a simple strategy based on public information. Furthermore, we show that there is seasonal variability in uniqueness and that application usage fingerprints drift over time at an average constant rate.
This article presents a set of tools for the modeling of a spatial allocation problem in a large geographic market and gives examples of applications. In our settings, the market is described by a network that maps the cost of travel between each pai r of adjacent locations. Two types of agents are located at the nodes of this network. The buyers choose the most competitive sellers depending on their prices and the cost to reach them. Their utility is assumed additive in both these quantities. Each seller, taking as given other sellers prices, sets her own price to have a demand equal to the one we observed. We give a linear programming formulation for the equilibrium conditions. After formally introducing our model we apply it on two examples: prices offered by petrol stations and quality of services provided by maternity wards. These examples illustrate the applicability of our model to aggregate demand, rank prices and estimate cost structure over the network. We insist on the possibility of applications to large scale data sets using modern linear programming solvers such as Gurobi. In addition to this paper we released a R toolbox to implement our results and an online tutorial (http://optimalnetwork.github.io)
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا