أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Wenwei Zhang

Integrated Satellite-HAP-Terrestrial Networks for Dual-Band Connectivity

310 - Wenwei Zhang , Ruoqi Deng , Boya Di 2021

The recent development of high-altitude platforms (HAPs) has attracted increasing attention since they can serve as a promising communication method to assist satellite-terrestrial networks. In this paper, we consider an integrated three-layer satell ite-HAP-terrestrial network where the HAP support dual-band connectivity. Specifically, the HAP can not only communicate with terrestrial users over C-band directly, but also provide backhaul services to terrestrial user terminals over Ka-band. We formulate a sum-rate maximization problem and then propose a fractional programming based algorithm to solve the problem by optimizing the bandwidth and power allocation iteratively. The closed-form optimal solutions for bandwidth allocation and power allocation in each iteration are also derived. Simulation results show the capacity enhancement brought by the dual-band connectivity of the HAP. The influence of the power of the HAP and the power of the satellite is also discussed.

معالجة الإشارات

K-Net: Towards Unified Image Segmentation

119 - Wenwei Zhang , Jiangmiao Pang , Kai Chen 2021

Semantic, instance, and panoptic segmentations have been addressed using different and specialized frameworks despite their underlying connections. This paper presents a unified, simple, and effective framework for these essentially similar tasks. Th e framework, named K-Net, segments both instances and semantic categories consistently by a group of learnable kernels, where each kernel is responsible for generating a mask for either a potential instance or a stuff class. To remedy the difficulties of distinguishing various instances, we propose a kernel update strategy that enables each kernel dynamic and conditional on its meaningful group in the input image. K-Net can be trained in an end-to-end manner with bipartite matching, and its training and inference are naturally NMS-free and box-free. Without bells and whistles, K-Net surpasses all previous state-of-the-art single-model results of panoptic segmentation on MS COCO and semantic segmentation on ADE20K with 52.1% PQ and 54.3% mIoU, respectively. Its instance segmentation performance is also on par with Cascade Mask R-CNNon MS COCO with 60%-90% faster inference speeds. Code and models will be released at https://github.com/open-mmlab/mmdetection.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

Exploring Data Augmentation for Multi-Modality 3D Object Detection

109 - Wenwei Zhang , Zhe Wang , Chen Change Loy 2020

It is counter-intuitive that multi-modality methods based on point cloud and images perform only marginally better or sometimes worse than approaches that solely use point cloud. This paper investigates the reason behind this phenomenon. Due to the f act that multi-modality data augmentation must maintain consistency between point cloud and images, recent methods in this field typically use relatively insufficient data augmentation. This shortage makes their performance under expectation. Therefore, we contribute a pipeline, named transformation flow, to bridge the gap between single and multi-modality data augmentation with transformation reversing and replaying. In addition, considering occlusions, a point in different modalities may be occupied by different objects, making augmentations such as cut and paste non-trivial for multi-modality detection. We further present Multi-mOdality Cut and pAste (MoCa), which simultaneously considers occlusion and physical plausibility to maintain the multi-modality consistency. Without using ensemble of detectors, our multi-modality detector achieves new state-of-the-art performance on nuScenes dataset and competitive performance on KITTI 3D benchmark. Our method also wins the best PKL award in the 3rd nuScenes detection challenge. Code and models will be released at https://github.com/open-mmlab/mmdetection3d.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

Seesaw Loss for Long-Tailed Instance Segmentation

185 - Jiaqi Wang , Wenwei Zhang , Yuhang Zang 2020

Instance segmentation has witnessed a remarkable progress on class-balanced benchmarks. However, they fail to perform as accurately in real-world scenarios, where the category distribution of objects naturally comes with a long tail. Instances of hea d classes dominate a long-tailed dataset and they serve as negative samples of tail categories. The overwhelming gradients of negative samples on tail classes lead to a biased learning process for classifiers. Consequently, objects of tail categories are more likely to be misclassified as backgrounds or head categories. To tackle this problem, we propose Seesaw Loss to dynamically re-balance gradients of positive and negative samples for each category, with two complementary factors, i.e., mitigation factor and compensation factor. The mitigation factor reduces punishments to tail categories w.r.t. the ratio of cumulative training instances between different categories. Meanwhile, the compensation factor increases the penalty of misclassified instances to avoid false positives of tail categories. We conduct extensive experiments on Seesaw Loss with mainstream frameworks and different data sampling strategies. With a simple end-to-end training pipeline, Seesaw Loss obtains significant gains over Cross-Entropy Loss, and achieves state-of-the-art performance on LVIS dataset without bells and whistles. Code is available at https://github.com/open-mmlab/mmdetection.

الرؤية الحاسوبية وتمييز الأنماط

More Information Supervised Probabilistic Deep Face Embedding Learning

118 - Ying Huang , Shangfeng Qiu , Wenwei Zhang 2020

Researches using margin based comparison loss demonstrate the effectiveness of penalizing the distance between face feature and their corresponding class centers. Despite their popularity and excellent performance, they do not explicitly encourage th e generic embedding learning for an open set recognition problem. In this paper, we analyse margin based softmax loss in probability view. With this perspective, we propose two general principles: 1) monotonic decreasing and 2) margin probability penalty, for designing new margin loss functions. Unlike methods optimized with single comparison metric, we provide a new perspective to treat open set face recognition as a problem of information transmission. And the generalization capability for face embedding is gained with more clean information. An auto-encoder architecture called Linear-Auto-TS-Encoder(LATSE) is proposed to corroborate this finding. Extensive experiments on several benchmarks demonstrate that LATSE help face embedding to gain more generalization capability and it boosted the single model performance with open training dataset to more than $99%$ on MegaFace test.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Deep Frequent Spatial Temporal Learning for Face Anti-Spoofing

96 - Ying Huang , Wenwei Zhang , 2020

Face anti-spoofing is crucial for the security of face recognition system, by avoiding invaded with presentation attack. Previous works have shown the effectiveness of using depth and temporal supervision for this task. However, depth supervision is often considered only in a single frame, and temporal supervision is explored by utilizing certain signals which is not robust to the change of scenes. In this work, motivated by two stream ConvNets, we propose a novel two stream FreqSaptialTemporalNet for face anti-spoofing which simultaneously takes advantage of frequent, spatial and temporal information. Compared with existing methods which mine spoofing cues in multi-frame RGB image, we make multi-frame spectrum image as one input stream for the discriminative deep neural network, encouraging the primary difference between live and fake video to be automatically unearthed. Extensive experiments show promising improvement results using the proposed architecture. Meanwhile, we proposed a concise method to obtain a large amount of spoofing training data by utilizing a frequent augmentation pipeline, which contributes detail visualization between live and fake images as well as data insufficiency issue when training large networks.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

EcoNAS: Finding Proxies for Economical Neural Architecture Search

142 - Dongzhan Zhou , Xinchi Zhou , Wenwei Zhang 2020

Neural Architecture Search (NAS) achieves significant progress in many computer vision tasks. While many methods have been proposed to improve the efficiency of NAS, the search progress is still laborious because training and evaluating plausible arc hitectures over large search space is time-consuming. Assessing network candidates under a proxy (i.e., computationally reduced setting) thus becomes inevitable. In this paper, we observe that most existing proxies exhibit different behaviors in maintaining the rank consistency among network candidates. In particular, some proxies can be more reliable -- the rank of candidates does not differ much comparing their reduced setting performance and final performance. In this paper, we systematically investigate some widely adopted reduction factors and report our observations. Inspired by these observations, we present a reliable proxy and further formulate a hierarchical proxy strategy. The strategy spends more computations on candidate networks that are potentially more accurate, while discards unpromising ones in early stage with a fast proxy. This leads to an economical evolutionary-based NAS (EcoNAS), which achieves an impressive 400x search time reduction in comparison to the evolutionary-based state of the art (8 vs. 3150 GPU days). Some new proxies led by our observations can also be applied to accelerate other NAS methods while still able to discover good candidate networks with performance matching those found by previous proxy strategies.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي الحوسبة العصبية والتطورية

Side-Aware Boundary Localization for More Precise Object Detection

129 - Jiaqi Wang , Wenwei Zhang , Yuhang Cao 2019

Current object detection frameworks mainly rely on bounding box regression to localize objects. Despite the remarkable progress in recent years, the precision of bounding box regression remains unsatisfactory, hence limiting performance in object det ection. We observe that precise localization requires careful placement of each side of the bounding box. However, the mainstream approach, which focuses on predicting centers and sizes, is not the most effective way to accomplish this task, especially when there exists displacements with large variance between the anchors and the targets. In this paper, we propose an alternative approach, named as Side-Aware Boundary Localization (SABL), where each side of the bounding box is respectively localized with a dedicated network branch. To tackle the difficulty of precise localization in the presence of displacements with large variance, we further propose a two-step localization scheme, which first predicts a range of movement through bucket prediction and then pinpoints the precise position within the predicted bucket. We test the proposed method on both two-stage and single-stage detection frameworks. Replacing the standard bounding box regression branch with the proposed design leads to significant improvements on Faster R-CNN, RetinaNet, and Cascade R-CNN, by 3.0%, 1.7%, and 0.9%, respectively. Code is available at https://github.com/open-mmlab/mmdetection.

الرؤية الحاسوبية وتمييز الأنماط

Robust Multi-Modality Multi-Object Tracking

151 - Wenwei Zhang , Hui Zhou , Shuyang Sun 2019

Multi-sensor perception is crucial to ensure the reliability and accuracy in autonomous driving system, while multi-object tracking (MOT) improves that by tracing sequential movement of dynamic objects. Most current approaches for multi-sensor multi- object tracking are either lack of reliability by tightly relying on a single input source (e.g., center camera), or not accurate enough by fusing the results from multiple sensors in post processing without fully exploiting the inherent information. In this study, we design a generic sensor-agnostic multi-modality MOT framework (mmMOT), where each modality (i.e., sensors) is capable of performing its role independently to preserve reliability, and further improving its accuracy through a novel multi-modality fusion module. Our mmMOT can be trained in an end-to-end manner, enables joint optimization for the base feature extractor of each modality and an adjacency estimator for cross modality. Our mmMOT also makes the first attempt to encode deep representation of point cloud in data association process in MOT. We conduct extensive experiments to evaluate the effectiveness of the proposed framework on the challenging KITTI benchmark and report state-of-the-art performance. Code and models are available at https://github.com/ZwwWayne/mmMOT.

الرؤية الحاسوبية وتمييز الأنماط

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد