
BiRank: Towards Ranking on Bipartite Graphs

Added by Xiangnan He
Publication date: 2017
Language: English





The bipartite graph is a ubiquitous data structure that can model the relationship between two entity types: for instance, users and items, or queries and webpages. In this paper, we study the problem of ranking the vertices of a bipartite graph, based on the graph's link structure as well as prior information about vertices (which we term a query vector). We present a new solution, BiRank, which iteratively assigns scores to vertices and finally converges to a unique stationary ranking. In contrast to traditional random walk-based methods, BiRank iterates towards optimizing a regularization function, which smooths the graph under the guidance of the query vector. Importantly, we establish how BiRank relates to the Bayesian methodology, enabling future extensions in a probabilistic way. To show the rationale and extensibility of the ranking methodology, we further extend it to rank the more generic n-partite graphs. BiRank's generic modeling of both the graph structure and vertex features enables it to model various ranking hypotheses flexibly. To illustrate its functionality, we apply the BiRank and TriRank (ranking for tripartite graphs) algorithms to two real-world applications: a general ranking scenario that predicts the future popularity of items, and a personalized ranking scenario that recommends items of interest to users. Extensive experiments on both synthetic and real-world datasets demonstrate BiRank's soundness (fast convergence), efficiency (linear in the number of graph edges) and effectiveness (achieving state-of-the-art performance in the two real-world tasks).
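The abstract does not reproduce the update equations, but the iteration it describes (symmetrically normalize the bipartite edge weights, then alternately propagate scores between the two vertex sets while blending in the query vectors) can be sketched roughly as follows. This is a minimal NumPy sketch written from the abstract's description; the function name, the damping factors alpha and beta, and the convergence tolerance are illustrative assumptions, not the paper's exact formulation. With a sparse weight matrix, each iteration costs time linear in the number of edges, consistent with the efficiency claim above.

import numpy as np

def birank_sketch(W, u0, p0, alpha=0.85, beta=0.85, tol=1e-8, max_iter=1000):
    """Illustrative BiRank-style propagation on a bipartite graph.

    W      : |U| x |P| non-negative edge-weight matrix (e.g. users x items)
    u0, p0 : query vectors encoding prior scores for the two vertex sets
    alpha, beta : damping factors balancing graph smoothness against the priors
    (all hyperparameter values here are assumptions for illustration)
    """
    # Symmetric normalization: S = Du^{-1/2} W Dp^{-1/2}
    du = W.sum(axis=1)
    dp = W.sum(axis=0)
    inv_sqrt_du = np.where(du > 0, 1.0 / np.sqrt(du), 0.0)
    inv_sqrt_dp = np.where(dp > 0, 1.0 / np.sqrt(dp), 0.0)
    S = inv_sqrt_du[:, None] * W * inv_sqrt_dp[None, :]

    u, p = u0.astype(float).copy(), p0.astype(float).copy()
    for _ in range(max_iter):
        # Alternate propagation, each step blended with the query vector
        p_new = beta * S.T @ u + (1.0 - beta) * p0
        u_new = alpha * S @ p_new + (1.0 - alpha) * u0
        delta = np.abs(u_new - u).sum() + np.abs(p_new - p).sum()
        u, p = u_new, p_new
        if delta < tol:  # converged to the stationary ranking
            break
    return u, p

Called with, say, a user-item interaction matrix and uniform query vectors, the returned u and p give stationary importance scores for the two vertex sets; non-uniform query vectors inject the prior information the abstract refers to.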




Read More

Multi-stage cascade architectures exist widely in many industrial systems such as recommender systems and online advertising, and often consist of sequential modules including matching, pre-ranking, ranking, etc. For a long time, it was believed that pre-ranking is just a simplified version of the ranking module, given the larger size of the candidate set to be ranked. Thus, efforts were mostly made on simplifying the ranking model to handle the explosion of computing power needed for online inference. In this paper, we rethink the challenge of the pre-ranking system from an algorithm-system co-design view. Instead of saving computing power by restricting the model architecture, which causes a loss of model performance, here we design a new pre-ranking system by jointly optimizing both the pre-ranking model and the computing power it costs. We name it COLD (Computing power cost-aware Online and Lightweight Deep pre-ranking system). COLD beats the SOTA in three respects: (i) an arbitrary deep model with cross features can be applied in COLD under a constraint of controllable computing power cost. (ii) computing power cost is explicitly reduced by applying optimization tricks for inference acceleration. This further brings space for COLD to apply more complex deep models to reach better performance. (iii) the COLD model works in an online learning and serving manner, giving it an excellent ability to handle the challenge of data distribution shift. Meanwhile, the fully online pre-ranking system of COLD provides us with a flexible infrastructure that supports efficient new model development and online A/B testing. Since 2019, COLD has been deployed in almost all products involving the pre-ranking module in the display advertising system in Alibaba, bringing significant improvements.
Search and recommendation systems, such as search engines, recruiting tools, online marketplaces, news, and social media, output ranked lists of content, products, and sometimes, people. Credit ratings, standardized tests, and risk assessments output only a score, but are also used implicitly for ranking. Bias in such ranking systems, especially among the top ranks, can worsen social and economic inequalities, polarize opinions, and reinforce stereotypes. On the other hand, a bias correction for minority groups can cause more harm if perceived as favoring group-fair outcomes over meritocracy. In this paper, we formulate the problem of underranking in group-fair rankings, which was not addressed in previous work. Most group-fair ranking algorithms post-process a given ranking and output a group-fair ranking. We define underranking based on how close the group-fair rank of each item is to its original rank, and prove a lower bound on the trade-off achievable for simultaneous underranking and group fairness in ranking. We give a fair ranking algorithm that takes any given ranking and outputs another ranking with simultaneous underranking and group fairness guarantees comparable to the lower bound we prove. Our algorithm works with group fairness constraints for any number of groups. Our experimental results confirm the theoretical trade-off between underranking and group fairness, and also show that our algorithm achieves the best of both when compared to the state-of-the-art baselines.
Xu Ma, Pengjie Wang, Hui Zhao (2021)
In real-world search, recommendation, and advertising systems, the multi-stage ranking architecture is commonly adopted. Such an architecture usually consists of matching, pre-ranking, ranking, and re-ranking stages. In the pre-ranking stage, vector-product based models with representation-focused architectures are commonly adopted to account for system efficiency. However, this brings a significant loss to the effectiveness of the system. In this paper, a novel pre-ranking approach is proposed which supports complicated models with interaction-focused architectures. It achieves a better tradeoff between effectiveness and efficiency by utilizing the proposed learnable Feature Selection method based on feature Complexity and variational Dropout (FSCD). Evaluations in a real-world e-commerce sponsored-search system demonstrate that, with the proposed pre-ranking, the effectiveness of the system is significantly improved. Moreover, it consumes the same amount of computational resources as systems with conventional pre-ranking models.
Numerous neural retrieval models have been proposed in recent years. These models learn to compute a ranking score between the given query and document. The majority of existing models are trained in a pairwise fashion using human-judged labels directly without further calibration. Traditional pairwise schemes can be time-consuming and require pre-defined positive-negative document pairs for training, potentially leading to learning bias due to document distribution mismatch between training and test conditions. Some popular existing listwise schemes rely on strong pre-defined probabilistic assumptions and a stark difference between relevant and non-relevant documents for the given query, which may limit the model's potential under low-quality or ambiguous relevance labels. To address these concerns, we turn to a physics-inspired ranking balance scheme and propose PoolRank, a pooling-based listwise learning framework. The proposed scheme has four major advantages: (1) PoolRank extracts training information from the best candidates at the local level based on model performance and relative ranking among abundant document candidates. (2) By combining four pooling-based loss components in a multi-task learning fashion, PoolRank calibrates the ranking balance for the partially relevant and the highly non-relevant documents automatically without costly human inspection. (3) PoolRank can be easily generalized to any neural retrieval model without requiring additional learnable parameters or model structure modifications. (4) Compared to pairwise learning and existing listwise learning schemes, PoolRank yields better ranking performance for all studied retrieval models while retaining efficient convergence rates.
The problem of ranking is a multi-billion dollar problem. In this paper we present an overview of several production quality ranking systems. We show that due to conflicting goals of employing the most effective machine learning models and responding to users in real time, ranking systems have evolved into a system of systems, where each subsystem can be viewed as a component layer. We view these layers as being data processing, representation learning, candidate selection and online inference. Each layer employs different algorithms and tools, with every end-to-end ranking system spanning multiple architectures. Our goal is to familiarize the general audience with a working knowledge of ranking at scale, the tools and algorithms employed and the challenges introduced by adopting a layered approach.
