Online Social Networks (OSNs) provide a platform for users to share their thoughts and opinions with their community of friends or with the general public. To keep the platform safe for all users, as well as compliant with local laws, OSNs typically create a set of community standards organized into policy groups and use Machine Learning (ML) models to identify and remove content that violates any of these policies. However, of the billions of content items uploaded each day, only a small fraction is so unambiguously violating that it can be removed by the automated models. Prevalence estimation is the task of estimating the fraction of violating content among the residual items by sending a small sample of them to human labelers for ground-truth labels. This task is exceedingly hard because, even though we can easily obtain ML scores or features for all of the billions of items, practical considerations limit us to ground-truth labels on only a few thousand of them. Indeed, the prevalence can be so low that, even after a judicious choice of items to label, there can be many days in which not a single item is labeled violating. A pragmatic choice in such low-prevalence regimes ($10^{-4}$ to $10^{-5}$) is to report the upper-bound prevalence (UBP), the upper end of a $97.5\%$ confidence interval, which takes the uncertainties of the sampling and labeling processes into account and gives a smoothed estimate. In this work we present two novel techniques for the UBP task, Bucketed-Beta-Binomial and Bucketed-Gaussian Process, and demonstrate on real and simulated data that they provide much better coverage than the commonly used bootstrapping technique.
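The abstract does not spell out the bucketed estimators, but the Beta-Binomial ingredient behind a UBP-style estimate can be sketched as follows; the function name, the Jeffreys prior, and the quantile level are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's Bucketed-Beta-Binomial method): with a
# Beta(a0, b0) prior on the prevalence p and k violating items observed among
# n labeled items, the posterior is Beta(a0 + k, b0 + n - k); reporting its
# 97.5% quantile gives an upper-bound prevalence that stays strictly positive
# even when k == 0.
from scipy import stats

def beta_binomial_ubp(n_labeled: int, n_violating: int,
                      a0: float = 0.5, b0: float = 0.5,
                      q: float = 0.975) -> float:
    """Return the q-quantile of the Beta posterior over the prevalence."""
    posterior = stats.beta(a0 + n_violating, b0 + n_labeled - n_violating)
    return posterior.ppf(q)

# Example: 3,000 labeled items with zero observed violations still yields a
# non-zero upper bound (roughly 1e-3 under the assumed Jeffreys prior).
print(beta_binomial_ubp(3000, 0))
```

In this sketch, the prior keeps the upper bound informative on days when no violating item is labeled, which is the low-prevalence situation the abstract describes.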
We present an empirical study of different social networks obtained from digital repositories. Our analysis reveals the community structure and provides a useful visualising technique. We investigate the scaling properties of the community size distr
In online question-and-answer (QA) websites like Quora, one central issue is to find (invite) users who are able to provide answers to a given question and at the same time would be unlikely to say no to the invitation. The challenge is how to trade
We introduce a new paradigm that is important for community detection in the realm of network analysis. Networks contain a set of strong, dominant communities, which interfere with the detection of weak, natural community structure. When most of the
An increasing number of today's social interactions occur using online social media as communication channels. Some online social networks have become extremely popular in the last decade. They differ among themselves in the character of the service
A common goal in network modeling is to uncover the latent community structure present among nodes. For many real-world networks, observed connections consist of events arriving as streams, which are then aggregated to form edges, ignoring the tempor