
On Privacy of Socially Contagious Attributes

Published by Aria Rezaei
Publication date: 2019
Language: English





A commonly used method to protect user privacy in data collection is to perform randomized perturbation on users' real data before collection, so that aggregated statistics can still be inferred without endangering secrets held by individuals. In this paper, we take a closer look at the validity of Differential Privacy guarantees when the sensitive attributes are subject to social influence and contagions. We first show that, in the absence of any knowledge about the contagion network, an adversary that tries to predict the real values from perturbed ones cannot achieve an area under the ROC curve (AUC) above $1-(1-\delta)/(1+e^{\varepsilon})$ if the dataset is perturbed using an $(\varepsilon,\delta)$-differentially private mechanism. Then, we show that with knowledge of the contagion network and model, one can do significantly better. We demonstrate that our method surpasses the performance limit imposed by differential privacy. Our experiments also reveal that nodes with high influence on others are at greater risk of revealing their secrets than others. The performance is shown through extensive experiments on synthetic and real-world networks.
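To make the bound in the abstract concrete, the following minimal sketch evaluates $1-(1-\delta)/(1+e^{\varepsilon})$ for a given privacy budget. It also includes binary randomized response, a standard $\varepsilon$-differentially private perturbation, purely as an illustration; the abstract does not say which mechanism the paper uses, and the function names `auc_upper_bound` and `randomized_response` are hypothetical.

```python
import math
import random


def auc_upper_bound(epsilon: float, delta: float = 0.0) -> float:
    """Bound quoted in the abstract: 1 - (1 - delta) / (1 + e^epsilon)."""
    return 1.0 - (1.0 - delta) / (1.0 + math.exp(epsilon))


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Illustrative mechanism (an assumption, not taken from the paper):
    report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < keep_prob else 1 - bit


if __name__ == "__main__":
    rng = random.Random(0)
    true_bits = [rng.randint(0, 1) for _ in range(10)]
    for eps in (0.1, 1.0, 3.0):
        noisy = [randomized_response(b, eps, rng) for b in true_bits]
        print(f"eps={eps}: AUC bound={auc_upper_bound(eps):.3f}, noisy={noisy}")
```

With $\varepsilon = 0$ and $\delta = 0$ the bound is 0.5, i.e. the adversary can do no better than random guessing, and the bound approaches 1 as $\varepsilon$ grows.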




Read also

The presence of correlation is known to make privacy protection more difficult. We investigate the privacy of socially contagious attributes on a network of individuals, where each individual possessing that attribute may influence a number of others into adopting it. We show that for contagions following the Independent Cascade model there exists a giant connected component of infected nodes, containing a constant fraction of all the nodes, all of which receive the contagion from the same set of sources. We further show that it is extremely hard to hide the existence of this giant connected component if we want to obtain an estimate of the number of activated users at an acceptable level of accuracy. Moreover, an adversary possessing this knowledge can predict the real status (active or inactive) with decent probability for many of the individuals, regardless of the privacy (perturbation) mechanism used. As a case study, we show that the Wasserstein mechanism, a state-of-the-art privacy mechanism designed specifically for correlated data, introduces noise of magnitude $\Omega(n)$ in the count estimation in our setting. We provide theoretical guarantees for two classes of random networks, Erdős-Rényi graphs and Chung-Lu power-law graphs, under the Independent Cascade model. Experiments demonstrate that a giant connected component of infected nodes can and does appear in real-world networks and that a simple inference attack can reveal the status of a good fraction of nodes.
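The Independent Cascade model mentioned above can be sketched as follows: each newly activated node gets one chance to activate each of its still-inactive neighbors, succeeding independently with some probability. This is a minimal sketch under stated assumptions: a single uniform activation probability `p` on every edge and an adjacency-dictionary graph representation. Neither detail comes from the abstract, and the function name `independent_cascade` is hypothetical.

```python
import random
from collections import deque


def independent_cascade(adj, seeds, p, rng):
    """Simulate one Independent Cascade run.

    adj   : dict mapping each node to a list of its neighbors
    seeds : initially active nodes
    p     : activation probability per edge (assumed uniform here)
    Returns the set of nodes that are active when the cascade stops.
    """
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, ()):
            # u gets exactly one attempt on each inactive neighbor v.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(independent_cascade(path, [0], 0.5, random.Random(42)))
```

Running many such simulations on a random graph and recording the size of the active set is one way to observe when a large connected cluster of active nodes emerges.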
Recent research has focused on the monitoring of global-scale online data for improved detection of epidemics, mood patterns, movements in the stock market, political revolutions, box-office revenues, consumer behaviour and many other important phenomena. However, privacy considerations and the sheer scale of data available online are quickly making global monitoring infeasible, and existing methods do not take full advantage of local network structure to identify key nodes for monitoring. Here, we develop a model of the contagious spread of information in a global-scale, publicly-articulated social network and show that a simple method can yield not just early detection, but advance warning of contagious outbreaks. In this method, we randomly choose a small fraction of nodes in the network and then we randomly choose a friend of each node to include in a group for local monitoring. Using six months of data from most of the full Twittersphere, we show that this friend group is more central in the network and it helps us to detect viral outbreaks of the use of novel hashtags about 7 days earlier than we could with an equal-sized randomly chosen group. Moreover, the method actually works better than expected due to network structure alone because highly central actors are both more active and exhibit increased diversity in the information they transmit to others. These results suggest that local monitoring is not just more efficient, it is more effective, and it is possible that other contagious processes in global-scale networks may be similarly monitored.
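The two-step sampling described above (pick random nodes, then pick a random friend of each) can be sketched as below. This is an illustrative sketch only: the adjacency-dictionary representation, the `fraction` parameter, and the function name `sample_friend_group` are assumptions, not details from the abstract.

```python
import random


def sample_friend_group(adj, fraction, rng):
    """Pick a random fraction of nodes, then one random friend of each.

    adj      : dict mapping each node to a list of its friends
    fraction : share of nodes (with at least one friend) to pick as seeds
    Returns a list of (seed, friend) pairs; the friends form the monitored group.
    """
    candidates = [u for u in adj if adj[u]]  # skip isolated nodes
    if not candidates:
        return []
    k = min(len(candidates), max(1, round(fraction * len(candidates))))
    seeds = rng.sample(candidates, k)
    return [(u, rng.choice(adj[u])) for u in seeds]


if __name__ == "__main__":
    star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    print(sample_friend_group(star, 0.4, random.Random(0)))
```

A node with many friends is the "random friend" of many seeds, which is why the friend group tends to be more central than an equal-sized uniform sample.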
Xin Liu, Tsuyoshi Murata, 2014
In network science, assortativity refers to the tendency of links to exist between nodes with similar attributes. In social networks, for example, links tend to exist between individuals of similar age, nationality, location, race, income, educational level, religious belief, and language. Thus, various attributes jointly affect the network topology. An interesting problem is to detect community structure beyond some specific assortativity-related attributes $\rho$, i.e., to take out the effect of $\rho$ on network topology and reveal the hidden community structure that is due to other attributes. An approach to this problem is to redefine the null model of the modularity measure, so as to simulate the effect of $\rho$ on network topology. However, a challenge is that we do not know to what extent the network topology is affected by $\rho$ and by other attributes. In this paper, we propose Dist-Modularity, which allows us to freely choose any suitable function to simulate the effect of $\rho$. Such freedom can help us probe the effect of $\rho$ and detect the hidden communities that are due to other attributes. We test the effectiveness of Dist-Modularity on synthetic benchmarks and two real-world networks.
One of the most significant challenges facing systems of collective intelligence is how to encourage participation on the scale required to produce high quality data. This paper details ongoing work with Phrase Detectives, an online game-with-a-purpose deployed on Facebook, and investigates user motivations for participation in social network gaming where the wisdom of crowds produces useful data.
The ability to share social network data at the level of individual connections is beneficial to science: not only for reproducing results, but also for researchers who may wish to use it for purposes not foreseen by the data releaser. Sharing such data, however, can lead to serious privacy issues, because individuals could be re-identified, not only based on possible nodes' attributes, but also from the structure of the network around them. The risk associated with re-identification can be measured, and it is more serious in some networks than in others. Various optimization algorithms have been proposed to anonymize the network while keeping the number of changes minimal. However, existing algorithms do not provide guarantees on where the changes will be made, making it difficult to quantify their effect on various measures. Using network models and real data, we show that the average degree of networks is a crucial parameter for the severity of re-identification risk from nodes' neighborhoods. Dense networks are more at risk, and, apart from a small band of average degree values, either almost all nodes are re-identifiable or they are all safe. Our results allow researchers to assess the privacy risk based on a small number of network statistics which are available even before the data is collected. As a rule of thumb, the privacy risks are high if the average degree is above 10. Guided by these results, we propose a simple method based on edge sampling to mitigate the re-identification risk of nodes. Our method can already be implemented at the data collection phase. Its effect on various network measures can be estimated and corrected using sampling theory. These properties are in contrast with previous methods, which bias the data arbitrarily. In this sense, our work could help in sharing network data in a statistically tractable way.
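As an illustration of how an edge-sampling step and its correction might look, the sketch below keeps each edge independently with probability `q` and then rescales the observed average degree by `1/q`. The abstract does not specify the sampling scheme, so independent (Bernoulli) sampling is an assumption here, and the names `sample_edges` and `estimate_average_degree` are hypothetical.

```python
import random


def sample_edges(edges, q, rng):
    """Keep each edge independently with probability q (assumed scheme)."""
    return [e for e in edges if rng.random() < q]


def estimate_average_degree(sampled_edges, n_nodes, q):
    """Estimate the original average degree from the sampled edge list.

    Under independent sampling, the expected number of kept edges is
    q * |E|, so |E| is estimated by len(sampled_edges) / q and the
    average degree by 2 * |E| / n_nodes. Requires q > 0.
    """
    return 2.0 * len(sampled_edges) / (q * n_nodes)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    # Hypothetical test graph: a ring where each node links to its next 6 nodes.
    edges = [(i, (i + k) % n) for i in range(n) for k in range(1, 7)]
    kept = sample_edges(edges, 0.5, rng)
    print("true average degree:", 2 * len(edges) / n)
    print("estimated from sample:", estimate_average_degree(kept, n, 0.5))
```

Because the released graph is sparser than the original, node neighborhoods carry less identifying structure, while simple statistics such as the average degree remain recoverable in expectation.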