We propose two well-motivated ranking-based methods to enhance the performance of current state-of-the-art human activity recognition systems. First, as an improvement over the classic power normalization method, we propose a parameter-free ranking technique called rank normalization (RaN). RaN normalizes each dimension of the video features to address the sparse and bursty distribution problems of Fisher Vectors and VLAD. Second, inspired by curriculum learning, we introduce a training-free re-ranking technique called multi-class iterative re-ranking (MIR). MIR captures relationships among action classes by separating easy and typical videos from difficult ones and re-ranking the prediction scores of classifiers accordingly. We demonstrate that our methods significantly improve the performance of state-of-the-art motion features on six real-world datasets.
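Rank normalization can be illustrated with a short sketch. The abstract does not give the exact formula, so this assumes one plausible reading: each value is replaced by the fraction of training values in the same dimension that are less than or equal to it. The output lies in [0, 1] and has no tunable parameter, unlike the exponent `alpha` of power normalization, which is included for comparison. The function names `rank_normalize` and `power_normalize` are hypothetical.

```python
import numpy as np


def rank_normalize(train: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Per-dimension rank normalization (assumed form, not the authors' code).

    train: (n, D) training features used to define the ranks.
    x:     (m, D) features to normalize.
    Each output value is (# training values <= x[:, d]) / n, so it lies in [0, 1].
    """
    sorted_train = np.sort(train, axis=0)  # sort every dimension independently
    n = train.shape[0]
    out = np.empty_like(x, dtype=float)
    for d in range(train.shape[1]):
        # side="right" counts training values that are <= each query value
        out[:, d] = np.searchsorted(sorted_train[:, d], x[:, d], side="right") / n
    return out


def power_normalize(x: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Classic signed power normalization; alpha must be chosen by hand."""
    return np.sign(x) * np.abs(x) ** alpha
```

A bursty value such as 1000 in a dimension whose training maximum is 40 is mapped to 1.0, exactly like the value 40 itself, so a single dominant component cannot swamp the rest of the vector.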
Nowadays, deep learning is widely applied to extract features for similarity computation in person re-identification (re-ID) and has achieved great success. However, because the training and testing identities do not overlap, the data used to train the model differ from the test data, which degrades the learned features at test time. Re-ranking has therefore been proposed to mitigate this issue, and various algorithms have been developed. However, most existing re-ranking methods replace the Euclidean distance with sophisticated distance metrics, which are not friendly to downstream tasks and are hard to use for fast retrieval over massive data in real applications. In this work, we propose a graph-based re-ranking method that improves the learned features while keeping the Euclidean distance as the similarity metric. Inspired by graph convolution networks, we develop an operator that propagates features over an appropriate graph. Since the graph is the key to the propagation, we consider two important criteria for designing it and explore three different graphs accordingly. Furthermore, we propose a simple yet effective method to generate a profile vector for each tracklet in videos, which extends our method to video re-ID. Extensive experiments on three benchmark datasets, i.e., Market-1501, Duke, and MARS, demonstrate the effectiveness of our proposed approach.
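The idea of propagating features over a graph while keeping Euclidean distance can be sketched as below. This is a minimal illustration under stated assumptions: the graph is a symmetrized k-nearest-neighbour graph and the operator is the GCN-style normalized adjacency with self-loops. Neither choice is taken from the paper, which explores three of its own graphs. The names `knn_graph`, `propagate`, and `rerank` are hypothetical.

```python
import numpy as np


def knn_graph(feats: np.ndarray, k: int) -> np.ndarray:
    """Symmetric k-NN adjacency matrix (hypothetical graph choice)."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                # a node is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]         # k nearest neighbours per node
    adj = np.zeros_like(d2)
    rows = np.repeat(np.arange(len(feats)), k)
    adj[rows, idx.ravel()] = 1.0
    return np.maximum(adj, adj.T)               # symmetrize the graph


def propagate(feats: np.ndarray, adj: np.ndarray) -> np.ndarray:
    """One GCN-style step D^-1/2 (A + I) D^-1/2 X, then L2-normalize rows."""
    a_hat = adj + np.eye(len(adj))              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm_adj = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    out = norm_adj @ feats
    return out / np.linalg.norm(out, axis=1, keepdims=True)


def rerank(query: np.ndarray, gallery: np.ndarray, k: int = 1) -> np.ndarray:
    """Rank gallery items by plain Euclidean distance after propagation."""
    feats = np.vstack([query[None, :], gallery])
    new = propagate(feats, knn_graph(feats, k))
    dists = np.linalg.norm(new[1:] - new[0], axis=1)  # the metric is still Euclidean
    return np.argsort(dists)
```

Because the output is an ordinary feature vector per image, the propagated gallery can still be indexed by any Euclidean nearest-neighbour search structure, which is the property the abstract emphasizes.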
Users of industrial recommender systems are normally suggested a list of items at one time. Ideally, such list-wise recommendation should provide diverse and relevant options to the users. In practice, however, list-wise recommendation is implemented as top-N recommendation, which selects the first N candidates to display. The list is generated by a ranking function learned from labeled data to optimize accuracy. However, top-N recommendation may lead to suboptimal results, as it evaluates the accuracy of each item independently and overlooks the mutual influence between items. We therefore propose a personalized re-ranking model for improving the diversity of the recommendation list in real recommender systems. The proposed re-ranking model can be easily deployed as a follow-up component after any existing ranking function. It improves diversity by employing a personalized Determinantal Point Process (DPP). DPPs have been applied in some recommender systems to improve diversity and increase user engagement. However, a standard DPP does not take into account that users may have individual propensities toward diversity. To overcome this limitation, our re-ranking model uses a personalized DPP to model the trade-off between accuracy and diversity for each individual user. We implement and deploy the personalized DPP model on a large-scale industrial recommender system. Both offline and online experimental results demonstrate the efficiency of our proposed re-ranking model.
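A DPP re-ranker with a per-user trade-off can be sketched as greedy MAP inference. This assumes a common quality-diversity kernel L = diag(q) S diag(q) with q_i = exp(theta * r_i), where r_i is the upstream ranking score, S is cosine similarity between item embeddings, and `theta` is a hypothetical per-user weight (larger values favour relevance, smaller values favour diversity). How the paper actually parameterizes and learns the personalization is not described in the abstract. The naive `slogdet` call per candidate keeps the sketch short; production systems use incremental Cholesky updates instead.

```python
import numpy as np


def dpp_greedy(scores: np.ndarray, feats: np.ndarray, theta: float, n: int) -> list:
    """Greedy MAP selection of n items under L = diag(q) S diag(q) (assumed kernel)."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T                                   # cosine similarity S
    q = np.exp(theta * scores)                      # per-user quality term (hypothetical)
    kernel = q[:, None] * sim * q[None, :]
    selected: list = []
    for _ in range(n):
        best, best_val = None, -np.inf
        for i in range(len(scores)):
            if i in selected:
                continue
            sub = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(sub, sub)])
            val = logdet if sign > 0 else -np.inf   # reject singular/degenerate sets
            if val > best_val:
                best, best_val = i, val
        if best is None:                            # every remaining item is redundant
            break
        selected.append(best)
    return selected
```

With two near-duplicate top items and a less relevant but different third item, plain top-2 would return the two duplicates, whereas this sketch returns the best item plus the different one.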
Image copy detection is a challenging and appealing topic in computer vision and signal processing. Recent advances in multimedia have made distributing images across the globe easy and fast, which raises issues such as forgery and image copy retrieval. Local keypoint descriptors such as SIFT are used to represent images, and images are matched and retrieved by matching those descriptors. The features are quantized so that searching and matching become feasible for large databases, at the cost of some accuracy. In this paper, we propose a binary feature obtained by quantizing SIFT descriptors into binary codes, and the rank list is re-examined to remove false positives. Experiments on a challenging dataset show gains in both accuracy and time.
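Binarizing SIFT and re-examining a rank list can be sketched as follows. The abstract does not state the quantization rule, so this assumes each of the 128 dimensions becomes one bit set when the value exceeds that descriptor's own median. Matching uses Hamming distance on packed bits. The re-examination step is assumed to drop candidates with too few binary matches. The names `binarize`, `hamming`, `count_matches`, `rerank_filter` and the thresholds `max_dist` and `min_matches` are hypothetical.

```python
import numpy as np


def binarize(desc: np.ndarray) -> np.ndarray:
    """(n, 128) SIFT descriptors -> (n, 16) packed 128-bit codes (assumed median rule)."""
    med = np.median(desc, axis=1, keepdims=True)
    return np.packbits(desc > med, axis=1)


def hamming(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise Hamming distances between packed codes a (n, 16) and b (m, 16)."""
    x = np.bitwise_xor(a[:, None, :], b[None, :, :])
    return np.unpackbits(x, axis=2).sum(axis=2)


def count_matches(q_codes: np.ndarray, db_codes: np.ndarray, max_dist: int = 20) -> int:
    """Number of query codes whose nearest database code is within max_dist bits."""
    d = hamming(q_codes, db_codes)
    return int((d.min(axis=1) <= max_dist).sum())


def rerank_filter(query: np.ndarray, candidates: list, min_matches: int = 5) -> list:
    """Re-examine a rank list of (id, descriptors) pairs and drop likely false positives."""
    qc = binarize(query)
    return [cid for cid, desc in candidates
            if count_matches(qc, binarize(desc)) >= min_matches]
```

Because the median threshold is relative to each descriptor, a uniformly rescaled copy of a descriptor yields exactly the same code, while unrelated descriptors typically differ in about half of their bits.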
In this paper, we propose a two-stage depth ranking based method (DRPose3D) to tackle the problem of 3D human pose estimation. Unlike accurate 3D positions, depth rankings can be identified intuitively by humans and are easier for a deep neural network to learn, since they can be obtained by solving classification problems. Moreover, depth rankings contain rich 3D information, which prevents the 2D-to-3D pose regression in two-stage methods from being ill-posed. In our method, we first design a Pairwise Ranking Convolutional Neural Network (PRCNN) to extract depth rankings of human joints from images. Second, we propose a coarse-to-fine 3D Pose Network (DPNet) to estimate 3D poses from both depth rankings and 2D human joint locations. Additionally, to improve the generalization of our model, we introduce a statistical method to augment depth rankings. Our approach outperforms the state-of-the-art methods on the Human3.6M benchmark for all three testing protocols, indicating that depth ranking is an essential geometric feature that can be learned to improve 3D pose estimation.
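Turning 3D depths into the pairwise classification targets that a network like PRCNN would learn can be sketched as below. It assumes smaller z means closer to the camera and a three-way label per joint pair with a tolerance `tol` for near-equal depths; the tolerance, the label convention, and the names `pairwise_depth_labels` and `ranking_from_labels` are hypothetical, not taken from the paper. The second function shows that a front-to-back ordering can be read back from the pairwise matrix by counting how many joints each joint is in front of.

```python
import numpy as np


def pairwise_depth_labels(z: np.ndarray, tol: float = 0.05) -> np.ndarray:
    """Labels[i, j] = +1 if joint i is closer than j by more than tol,
    -1 if it is farther by more than tol, 0 if roughly equal (assumed scheme)."""
    diff = z[None, :] - z[:, None]          # diff[i, j] = z_j - z_i
    labels = np.zeros(diff.shape, dtype=int)
    labels[diff > tol] = 1                  # j is deeper, so i is in front
    labels[diff < -tol] = -1
    return labels


def ranking_from_labels(labels: np.ndarray) -> np.ndarray:
    """Front-to-back joint order obtained by counting pairwise 'wins'."""
    wins = (labels == 1).sum(axis=1)
    return np.argsort(-wins, kind="stable")
```

The label matrix is antisymmetric by construction, so a classifier only needs to predict the upper triangle.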
Monocular 3D human pose estimation from static images is a challenging problem, due to the curse of dimensionality and the ill-posed nature of 2D-to-3D lifting. In this paper, we propose a Deep Conditional Variational Autoencoder (CVAE) based model that synthesizes diverse, anatomically plausible 3D-pose samples conditioned on the estimated 2D pose. We show that the CVAE-based 3D-pose sample set is consistent with the 2D pose and helps tackle the inherent ambiguity in 2D-to-3D lifting. We propose two strategies for obtaining the final 3D pose: (a) using depth-ordering/ordinal relations to score and weight-average the candidate 3D poses, referred to as OrdinalScore, and (b) using supervision from an Oracle. We report close to state-of-the-art results on two benchmark datasets using OrdinalScore, and state-of-the-art results using the Oracle. We also show that our pipeline yields competitive results without paired image-to-3D annotations. The training and evaluation code is available at https://github.com/ssfootball04/generative_pose.
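The scoring and weight-averaging step can be sketched as follows. This is not the released code at the URL above; it assumes each candidate is scored by the fraction of predicted ordinal relations its joint depths satisfy, and that the scores are turned into weights with a softmax at a hypothetical temperature `temp`. The ordinal matrix uses +1 for "joint i is closer than joint j", -1 for "farther", and 0 for "unknown", and the function name `ordinal_score` is hypothetical.

```python
import numpy as np


def ordinal_score(candidates: np.ndarray, ordinal: np.ndarray, temp: float = 0.1):
    """Weight-average candidate 3D poses by agreement with predicted depth order.

    candidates: (k, J, 3) sampled 3D poses, e.g. drawn from a CVAE.
    ordinal:    (J, J) predicted relations (+1 closer, -1 farther, 0 unknown).
    Returns the (J, 3) averaged pose and the (k,) agreement scores.
    """
    z = candidates[..., 2]                                  # (k, J) joint depths
    sample_order = np.sign(z[:, None, :] - z[:, :, None])  # [k, i, j] = sign(z_j - z_i)
    mask = ordinal != 0                                     # only score known relations
    agree = (sample_order == ordinal[None]) & mask[None]
    scores = agree.sum(axis=(1, 2)) / max(int(mask.sum()), 1)
    w = np.exp((scores - scores.max()) / temp)              # numerically stable softmax
    w /= w.sum()
    return np.tensordot(w, candidates, axes=1), scores
```

A low temperature makes the result close to picking the single best-scoring sample, while a high temperature approaches a plain average of all samples.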