No Arabic abstract
In this work we analyze traces of mobility and co-location among a group of nearly 1000 closely interacting individuals. We attempt to reconstruct the Facebook friendship graph, Facebook interaction network, as well as call and SMS networks from longitudinal records of person-to-person offline proximity. We find subtle, yet observable behavioral differences between pairs of people who communicate using each of the different channels and we show that the signal of friendship is strong enough to stand out from the noise of random and schedule-driven offline interactions between familiar strangers. Our study also provides an overview of methods for link inference based on offline behavior and proposes new features to improve the performance of the prediction task.
Early prediction of students at risk (STAR) is an effective and significant means to provide timely intervention for dropout and suicide. Existing works mostly rely on either online or offline learning behaviors which are not comprehensive enough to capture the whole learning processes and lead to unsatisfying prediction performance. We propose a novel algorithm (EPARS) that could early predict STAR in a semester by modeling online and offline learning behaviors. The online behaviors come from the log of activities when students use the online learning management system. The offline behaviors derive from the check-in records of the library. Our main observations are two folds. Significantly different from good students, STAR barely have regular and clear study routines. We devised a multi-scale bag-of-regularity method to extract the regularity of learning behaviors that is robust to sparse data. Second, friends of STAR are more likely to be at risk. We constructed a co-occurrence network to approximate the underlying social network and encode the social homophily as features through network embedding. To validate the proposed algorithm, extensive experiments have been conducted among an Asian university with 15,503 undergraduate students. The results indicate EPARS outperforms baselines by 14.62% ~ 38.22% in predicting STAR.
Objective: Ubiquitous internet access is reshaping the way we live, but it is accompanied by unprecedented challenges in preventing chronic diseases that are usually planted by long exposure to unhealthy lifestyles. This paper proposes leveraging online shopping behaviors as a proxy for personal lifestyle choices to improve chronic disease prevention literacy, targeted for times when e-commerce user experience has been assimilated into most peoples everyday lives. Methods: Longitudinal query logs and purchase records from 15 million online shoppers were accessed, constructing a broad spectrum of lifestyle features covering assorted product categories and buyer personas. Using the lifestyle-related information preceding their first purchases of specific prescription drugs, we could determine associations between online shoppers past lifestyle choices and whether they suffered from a particular chronic disease or not. Results: Novel lifestyle risk factors were discovered in two exemplars -- depression and diabetes, most of which showed cognitive congruence with existing healthcare knowledge. Further, such empirical findings could be adopted to locate online shoppers at high risk of these chronic diseases with decent accuracy (i.e., [area under the receiver operating characteristic curve] AUC=0.68 for depression and AUC=0.70 for diabetes), closely matching the performance of screening surveys benchmarked against medical diagnosis. Conclusions: Mining online shopping behaviors can point medical experts to a series of lifestyle issues associated with chronic diseases that are less explored to date. Hopefully, unobtrusive chronic disease surveillance via e-commerce sites can grant consenting individuals a privilege to be connected more readily with the medical profession and sophistication.
We consider the online stochastic matching problem proposed by Feldman et al. [FMMM09] as a model of display ad allocation. We are given a bipartite graph; one side of the graph corresponds to a fixed set of bins and the other side represents the set of possible ball types. At each time step, a ball is sampled independently from the given distribution and it needs to be matched upon its arrival to an empty bin. The goal is to maximize the number of allocations. We present an online algorithm for this problem with a competitive ratio of 0.702. Before our result, algorithms with a competitive ratio better than $1-1/e$ were known under the assumption that the expected number of arriving balls of each type is integral. A key idea of the algorithm is to collect statistics about the decisions of the optimum offline solution using Monte Carlo sampling and use those statistics to guide the decisions of the online algorithm. We also show that our algorithm achieves a competitive ratio of 0.705 when the rates are integral. On the hardness side, we prove that no online algorithm can have a competitive ratio better than 0.823 under the known distribution model (and henceforth under the permutation model). This improves upon the 5/6 hardness result proved by Goel and Mehta cite{GM08} for the permutation model.
How to obtain an unbiased ranking model by learning to rank with biased user feedback is an important research question for IR. Existing work on unbiased learning to rank (ULTR) can be broadly categorized into two groups -- the studies on unbiased learning algorithms with logged data, namely the textit{offline} unbiased learning, and the studies on unbiased parameters estimation with real-time user interactions, namely the textit{online} learning to rank. While their definitions of textit{unbiasness} are different, these two types of ULTR algorithms share the same goal -- to find the best models that rank documents based on their intrinsic relevance or utility. However, most studies on offline and online unbiased learning to rank are carried in parallel without detailed comparisons on their background theories and empirical performance. In this paper, we formalize the task of unbiased learning to rank and show that existing algorithms for offline unbiased learning and online learning to rank are just the two sides of the same coin. We evaluate six state-of-the-art ULTR algorithms and find that most of them can be used in both offline settings and online environments with or without minor modifications. Further, we analyze how different offline and online learning paradigms would affect the theoretical foundation and empirical effectiveness of each algorithm on both synthetic and real search data. Our findings could provide important insights and guideline for choosing and deploying ULTR algorithms in practice.
With similarity-based content delivery, the request for a content can be satisfied by delivering a related content under a dissimilarity cost. This letter addresses the joint optimization of caching and similarity-based delivery decisions across a network so as to minimize the weighted sum of average delay and dissimilarity cost. A convergent alternate gradient descent ascent algorithm is first introduced for an offline scenario with prior knowledge of the request rates, and then extended to an online setting. Numerical results validate the advantages of the approach with respect to standard per-cache solutions.