Underground online forums are platforms that enable the trade of illicit services and stolen goods. Carding forums, in particular, are known for specialising in the trade of stolen financial information. However, little evidence exists about the sellers active on carding forums, the precise types of products they advertise, and the prices buyers pay. The existing literature mainly focuses on the organisation and structure of the forums, and studies on carding forums are usually based on literature reviews, expert interviews, or data from forums that have already been shut down. This paper provides first-of-its-kind empirical evidence on active forums where stolen financial data is traded. We monitored 5 out of 25 discovered forums, collected posts from them over a three-month period, and analysed them quantitatively and qualitatively. Our analyses focus on products, prices, seller prolificacy, seller specialisation, and seller reputation.
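A minimal sketch of the kind of quantitative analysis described above, assuming a hypothetical post schema with seller, product_type, and price_usd columns (not the paper's actual data format):

```python
import pandas as pd

# Hypothetical schema; real crawled posts would first need parsing into this shape.
posts = pd.DataFrame([
    {"seller": "s1", "product_type": "cvv",  "price_usd": 12.0},
    {"seller": "s1", "product_type": "dump", "price_usd": 40.0},
    {"seller": "s2", "product_type": "cvv",  "price_usd": 10.0},
])

prolificacy    = posts.groupby("seller").size()                          # posts per seller
specialisation = posts.groupby("seller")["product_type"].nunique()       # distinct product types per seller
price_stats    = posts.groupby("product_type")["price_usd"].describe()   # price distribution per product
```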
In this paper, we study the privacy of online health data. We present a novel online health data De-Anonymization (DA) framework, named De-Health. De-Health consists of two phases: Top-K DA, which identifies a candidate set for each anonymized user, and refined DA, which de-anonymizes an anonymized user to a user in its candidate set. By employing both candidate selection and DA verification schemes, De-Health significantly reduces the DA space by several orders of magnitude while achieving promising DA accuracy. Leveraging two real-world online health datasets, WebMD (89,393 users, 506K posts) and HealthBoards (388,398 users, 4.7M posts), we validate the efficacy of De-Health. Further, even when the training data are insufficient, De-Health can still successfully de-anonymize a large portion of anonymized users. We develop the first analytical framework on the soundness and effectiveness of online health data DA. By analyzing the impact of various data features on anonymity, we derive the conditions and probabilities for successfully de-anonymizing one user or a group of users in exact DA and Top-K DA. Our analysis is meaningful to both researchers and policy makers in facilitating the development of more effective anonymization techniques and proper privacy policies. Finally, we present a linkage attack framework that can link online health/medical information to real-world people. Through a proof-of-concept attack, we link 347 out of 2,805 WebMD users to real-world people and find the full names, medical/health information, birthdates, phone numbers, and other sensitive information of most of the re-identified users. This clearly illustrates the fragility of the privacy of those who use online health forums.
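A minimal sketch of the two-phase structure described above (Top-K candidate selection followed by a refined, verified match), using TF-IDF cosine similarity as a stand-in for the paper's actual features and verification scheme:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def top_k_candidates(anon_doc, known_docs, k=10):
    """Phase 1 (Top-K DA): keep the k known users most similar to the anonymized user."""
    vec = TfidfVectorizer().fit(known_docs + [anon_doc])
    sims = cosine_similarity(vec.transform([anon_doc]), vec.transform(known_docs))[0]
    return np.argsort(sims)[::-1][:k], sims

def refined_da(candidates, sims, margin=0.05):
    """Phase 2 (refined DA): accept the top candidate only if it clearly beats the runner-up."""
    best, runner_up = sims[candidates[0]], sims[candidates[1]]
    return candidates[0] if best - runner_up > margin else None   # abstain when uncertain

known = ["posting history of user A ...", "posting history of user B ...", "posting history of user C ..."]
candidates, sims = top_k_candidates("anonymized posting history ...", known, k=3)
match = refined_da(candidates, sims)
```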
In this paper we present a novel interactive multimodal learning system that facilitates search and exploration in large networks of social multimedia users. It allows the analyst to identify and select users of interest, and to find similar users in an interactive learning setting. Our approach is based on novel multimodal representations of users, words, and concepts, which we learn simultaneously by deploying a general-purpose neural embedding model. We show these representations to be useful not only for categorizing users, but also for automatically generating user and community profiles. Inspired by traditional summarization approaches, we create the profiles by selecting diverse and representative content from all available modalities, i.e., the text, image, and user modalities. The usefulness of the approach is evaluated using artificial actors, which simulate user behavior in a relevance feedback scenario. We conducted multiple experiments to evaluate the quality of our multimodal representations, to compare different embedding strategies, and to determine the importance of different modalities. We demonstrate the capabilities of the proposed approach on two different multimedia collections, originating from the violent online extremism forum Stormfront and the microblogging platform Twitter, which are particularly interesting due to the high semantic level of the discussions they feature.
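A minimal sketch of retrieval in a shared multimodal embedding space, with random vectors standing in for embeddings learned by a neural embedding model; the entity names are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 64
embeddings = {                                   # assumed pre-trained multimodal embeddings
    "user:alice":     rng.normal(size=dim),
    "user:bob":       rng.normal(size=dim),
    "word:protest":   rng.normal(size=dim),
    "concept:flag":   rng.normal(size=dim),
}

def most_similar(query, space, top_n=3):
    """Cosine-similarity nearest neighbours of a user, word, or concept in the shared space."""
    q = space[query] / np.linalg.norm(space[query])
    scored = [(k, float(v @ q / np.linalg.norm(v))) for k, v in space.items() if k != query]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top_n]

print(most_similar("user:alice", embeddings))    # users, words, or concepts closest to alice
```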
We present Sapporo, a library for performing high-precision gravitational N-body simulations on NVIDIA Graphics Processing Units (GPUs). Our library mimics the GRAPE-6 library, and N-body codes currently running on GRAPE-6 can switch to Sapporo by simply relinking against the library. The precision of our library is comparable to that of GRAPE-6, even though internally the GPU hardware is limited to single-precision arithmetic. This limitation is effectively overcome by emulating double precision when calculating the distance between particles. The performance loss of this operation is small (< 20%) compared to the advantage of being able to run at high precision. We tested the library with several GRAPE-6-enabled N-body codes, in particular Starlab and phiGRAPE. We measured a peak performance of 800 Gflop/s when running with 10^6 particles on a PC with four commercial G92-architecture GPUs (two GeForce 9800GX2 cards). As a production test, we simulated a 32k Plummer model with equal-mass stars well beyond core collapse. The simulation took 41 days, during which the mean performance was 113 Gflop/s. The GPUs did not show any problems from running in a production environment for such an extended period of time.
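A minimal sketch of the double-single idea behind the distance calculation, written in plain Python/NumPy rather than GPU code: each coordinate is stored as a pair of single-precision floats, and differences are computed using only single-precision arithmetic.

```python
import numpy as np

def split_ds(x):
    """Split a float64 value into a (hi, lo) pair of float32s (double-single representation)."""
    hi = np.float32(x)
    lo = np.float32(x - np.float64(hi))
    return hi, lo

def diff_ds(a_hi, a_lo, b_hi, b_lo):
    """Difference of two double-single numbers using only single-precision operations.

    For nearby particles the high parts are close, so their difference is exact in float32;
    adding the low-part difference recovers most of the precision lost to rounding.
    """
    return np.float32(a_hi - b_hi) + np.float32(a_lo - b_lo)

x_i, x_j = 1000.0001234, 1000.0005678            # two nearby particle coordinates
naive = np.float32(x_i) - np.float32(x_j)        # plain single precision: large relative error
hi_i, lo_i = split_ds(x_i)
hi_j, lo_j = split_ds(x_j)
print(naive, diff_ds(hi_i, lo_i, hi_j, lo_j), x_i - x_j)
```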
Owing to their limited bandwidth and the lopsided student--instructor ratio on online forums, course instructors often need to participate selectively in student discussion threads. We propose the first deep learning models for the binary problem of predicting instructor intervention: novel attention-based models that infer the amount of latent context necessary to predict intervention. Such models can also be tuned to an instructor's preference to intervene early or late. Our three proposed attentive model variants for inferring the latent context improve over the state of the art by a significant, large margin of 11% in F1 and 10% in recall, on average. Further, introspection of the attention weights helps us better understand which aspects of a discussion post propagate through the discussion thread and prompt instructor intervention.
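A minimal sketch of an attentive thread classifier in the spirit described above (PyTorch, with assumed dimensions; not the authors' exact architectures):

```python
import torch
import torch.nn as nn

class AttentiveThreadClassifier(nn.Module):
    """Hypothetical sketch: attend over post representations in a thread to
    predict whether an instructor will intervene (binary)."""
    def __init__(self, post_dim=300, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(post_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)      # scores how much each post contributes as context
        self.clf = nn.Linear(hidden, 1)

    def forward(self, posts):                  # posts: (batch, n_posts, post_dim)
        states, _ = self.encoder(posts)        # (batch, n_posts, hidden)
        weights = torch.softmax(self.attn(states).squeeze(-1), dim=1)
        context = (weights.unsqueeze(-1) * states).sum(dim=1)
        return torch.sigmoid(self.clf(context)), weights

model = AttentiveThreadClassifier()
thread = torch.randn(2, 7, 300)                # 2 threads, 7 posts each, 300-d post embeddings
prob_intervene, attn_weights = model(thread)   # attn_weights can be inspected for introspection
```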
Our opinions, which things we like or dislike, depend on the opinions of those around us. Nowadays, we are influenced by the opinions of online strangers, expressed in comments and ratings on online platforms. Here, we perform novel academic A/B testing experiments with over 2,500 participants to measure the extent of that influence. In our experiments, the participants watch and evaluate videos on mirror proxies of YouTube and Vimeo. We control the comments and ratings that are shown underneath each of these videos. Our study shows that from 5% up to 40% of subjects adopt the majority opinion of strangers expressed in the comments. Using Bayes' theorem, we derive a flexible and interpretable family of models of social influence, in which each individual forms posterior opinions stochastically following a logit model. The variants of our mixture model favored by the Akaike information criterion represent two sub-populations, i.e., non-influenceable and influenceable individuals. The prior opinions of the non-influenceable individuals are strongly correlated with the external opinions and have low standard error, whereas the prior opinions of influenceable individuals have high standard error and become correlated with the external opinions due to social influence. Our findings suggest that opinions are random variables updated via Bayes' rule, whose standard deviation is correlated with opinion influenceability. Based on these findings, we discuss how to hinder opinion manipulation and misinformation diffusion in the online realm.
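A minimal sketch of a logit-style stochastic opinion update in the spirit of the model described above; the weights and opinion scale are assumed for illustration and are not the fitted parameters from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_opinion(prior, external, w_prior=1.0, w_social=1.0):
    """Stochastic opinion update with a logit choice rule.

    `prior` and `external` are signed opinion scores (e.g. -1 = dislike, +1 = like);
    the weights are assumed illustrative parameters. The probability of expressing a
    positive posterior opinion follows a logistic (logit) model of the weighted evidence.
    """
    evidence = w_prior * prior + w_social * external
    p_positive = 1.0 / (1.0 + np.exp(-evidence))
    return rng.random() < p_positive       # True = positive opinion, drawn stochastically

# An "influenceable" individual: weak prior, strong weight on strangers' comments.
print(posterior_opinion(prior=0.2, external=-1.0, w_prior=0.5, w_social=2.0))
```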