Object detectors are typically learned from fully annotated training data with fixed, pre-defined categories. However, not all categories of interest can be known beforehand, and in many realistic applications classes must be added progressively. In such a scenario, only the original training set annotated with the old classes and some new training data labeled with the new classes are available. Given these limited datasets and no extra manual labeling, a unified detector that can handle all categories is strongly needed. Plain joint training leads to heavy biases and poor performance because the annotations are incomplete. To avoid this, we propose a practical framework in this paper. A conflict-free loss is designed to avoid label ambiguity, yielding an acceptable detector in a single training round. To further improve performance, we propose a retraining phase in which Monte Carlo Dropout is employed to estimate localization confidence, which is combined with classification confidence to mine more accurate bounding boxes, and an overlap-weighted method makes better use of the pseudo annotations during retraining to obtain more powerful detectors. Extensive experiments on multiple datasets demonstrate the effectiveness of our framework for category-extended object detection.
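The retraining phase hinges on scoring candidate boxes by both classification and localization confidence. Below is a minimal PyTorch-style sketch of that idea, assuming a detector whose stochastic passes share a fixed set of proposals so the predicted boxes align across passes; the function names, the variance-to-confidence mapping, and the geometric-mean combination rule are all illustrative assumptions, not the paper's actual implementation.

```python
import torch

def mc_dropout_localization_confidence(model, image, num_passes=10):
    """Estimate per-box localization confidence via Monte Carlo Dropout.

    Runs several stochastic forward passes with dropout active and treats
    low variance of the predicted box coordinates as high localization
    confidence. `model(image)` is assumed to return a (boxes, scores)
    pair for one image, with boxes aligned across passes (e.g., a fixed
    proposal set feeding a stochastic head); both names are placeholders.
    """
    model.train()  # keep dropout layers stochastic at inference time
    all_boxes = []
    with torch.no_grad():
        for _ in range(num_passes):
            boxes, _ = model(image)   # boxes: (N, 4) tensor per pass
            all_boxes.append(boxes)
    stacked = torch.stack(all_boxes)           # (num_passes, N, 4)
    variance = stacked.var(dim=0).mean(dim=1)  # mean coordinate variance per box
    # Map variance to (0, 1]: identical boxes across passes give 1.0.
    loc_conf = 1.0 / (1.0 + variance)
    return stacked.mean(dim=0), loc_conf

def mine_pseudo_boxes(boxes, cls_conf, loc_conf, thresh=0.8):
    """Keep boxes whose combined confidence exceeds a threshold.

    The geometric mean of classification and localization confidence is
    one simple way to fuse the two signals; the paper's exact rule for
    combining them may differ.
    """
    combined = (cls_conf * loc_conf).sqrt()
    keep = combined > thresh
    return boxes[keep], combined[keep]
```

The mined boxes would then serve as pseudo annotations for the missing classes in each dataset, with the overlap-weighted scheme down-weighting pseudo boxes during retraining rather than trusting them as ground truth.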
In this paper, we propose a novel conditional-generative-adversarial-nets-based image captioning framework as an extension of the traditional reinforcement-learning (RL)-based encoder-decoder architecture. To deal with the inconsistent evaluation problem among different objective language metrics, we design discriminator networks that automatically and progressively determine whether a generated caption is human-described or machine-generated. Two kinds of discriminator architectures (CNN- and RNN-based structures) are introduced, since each has its own advantages. The proposed algorithm is generic, so it can enhance any existing RL-based image captioning framework, and we show that the conventional RL training method is a special case of our approach. Empirically, we show consistent improvements across all language evaluation metrics for different state-of-the-art image captioning models. In addition, the well-trained discriminators can also be used as objective image captioning evaluators.
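The claim that conventional RL training is a special case of the adversarial framework suggests a reward that blends an objective language metric with the discriminator's output. The sketch below illustrates that idea with a CNN-based caption discriminator, one of the two architectures the abstract mentions; the layer sizes, the blending rule, and all names are assumptions for illustration, and the paper's conditional discriminator would additionally be conditioned on image features, which this sketch omits.

```python
import torch
import torch.nn as nn

class CNNCaptionDiscriminator(nn.Module):
    """Minimal CNN-based caption discriminator: 1-D convolutions over the
    word-embedding sequence feed a binary human-vs-machine classifier.
    All dimensions are illustrative, not the paper's configuration.
    """
    def __init__(self, vocab_size, embed_dim=256, num_filters=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, kernel_size=k) for k in (3, 4, 5)
        )
        self.fc = nn.Linear(3 * num_filters, 1)

    def forward(self, captions):                  # captions: (B, T) token ids
        x = self.embed(captions).transpose(1, 2)  # (B, embed_dim, T)
        feats = [conv(x).relu().max(dim=2).values for conv in self.convs]
        logits = self.fc(torch.cat(feats, dim=1))
        return torch.sigmoid(logits).squeeze(1)   # P(caption is human-written)

def mixed_reward(metric_score, disc_prob, lam=0.5):
    """Blend an objective language metric score (e.g., CIDEr) with the
    discriminator's probability as the RL reward. With lam = 1 this
    reduces to plain metric-based RL training, matching the abstract's
    claim that conventional RL is a special case of the approach.
    """
    return lam * metric_score + (1.0 - lam) * disc_prob
```

During training, the discriminator and the captioning policy would be updated alternately, with sampled captions rewarded by `mixed_reward` inside a standard policy-gradient update.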