أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Qiang Zhou

Quasinormal mode theory for nanoscale electromagnetism with quantum surface responses

121 - Qiang Zhou , Pu Zhang , Xue-Wen Chen 2021

We report a self-consistent quasinormal mode theory for nanometer scale electromagnetism where the possible nonlocal and quantum effects are treated through quantum surface responses. With Feibelmans frequency-dependent textit{d} parameters to descri be the quantum surface responses, we formulate the source-free Maxwells equations into a generalized linear eigenvalue problem to define the quasinormal modes. We then construct an orthonormal relation for the modes and consequently unlock the powerful toolbox of modal analysis. The orthonormal relation is validated by the reconstruction of the full numerical results through modal contributions. Significant changes in the landscape of the modes are observed due to the incorporation of the quantum surface responses for a number of nanostructures. Our semi-analytical modal analysis enables transparent physical interpretation of the spontaneous emission enhancement of a dipolar emitter as well as the near-field and far-field responses of planewave excitations in the nanostructures.

بصريات

CAPTRA: CAtegory-level Pose Tracking for Rigid and Articulated Objects from Point Clouds

75 - Yijia Weng , He Wang , Qiang Zhou 2021

In this work, we tackle the problem of category-level online pose tracking of objects from point cloud sequences. For the first time, we propose a unified framework that can handle 9DoF pose tracking for novel rigid object instances as well as per-pa rt pose tracking for articulated objects from known categories. Here the 9DoF pose, comprising 6D pose and 3D size, is equivalent to a 3D amodal bounding box representation with free 6D pose. Given the depth point cloud at the current frame and the estimated pose from the last frame, our novel end-to-end pipeline learns to accurately update the pose. Our pipeline is composed of three modules: 1) a pose canonicalization module that normalizes the pose of the input depth point cloud; 2) RotationNet, a module that directly regresses small interframe delta rotations; and 3) CoordinateNet, a module that predicts the normalized coordinates and segmentation, enabling analytical computation of the 3D size and translation. Leveraging the small pose regime in the pose-canonicalized point clouds, our method integrates the best of both worlds by combining dense coordinate prediction and direct rotation regression, thus yielding an end-to-end differentiable pipeline optimized for 9DoF pose accuracy (without using non-differentiable RANSAC). Our extensive experiments demonstrate that our method achieves new state-of-the-art performance on category-level rigid object pose (NOCS-REAL275) and articulated object pose benchmarks (SAPIEN , BMVC) at the fastest FPS ~12.

الرؤية الحاسوبية وتمييز الأنماط

Optimization for Oriented Object Detection via Representation Invariance Loss

89 - Qi Ming , Zhiqiang Zhou , Lingjuan Miao 2021

Arbitrary-oriented objects exist widely in natural scenes, and thus the oriented object detection has received extensive attention in recent years. The mainstream rotation detectors use oriented bounding boxes (OBB) or quadrilateral bounding boxes (Q BB) to represent the rotating objects. However, these methods suffer from the representation ambiguity for oriented object definition, which leads to suboptimal regression optimization and the inconsistency between the loss metric and the localization accuracy of the predictions. In this paper, we propose a Representation Invariance Loss (RIL) to optimize the bounding box regression for the rotating objects. Specifically, RIL treats multiple representations of an oriented object as multiple equivalent local minima, and hence transforms bounding box regression into an adaptive matching process with these local minima. Then, the Hungarian matching algorithm is adopted to obtain the optimal regression strategy. We also propose a normalized rotation loss to alleviate the weak correlation between different variables and their unbalanced loss contribution in OBB representation. Extensive experiments on remote sensing datasets and scene text datasets show that our method achieves consistent and substantial improvement. The source code and trained models are available at https://github.com/ming71/RIDet.

الرؤية الحاسوبية وتمييز الأنماط

Human De-occlusion: Invisible Perception and Recovery for Humans

146 - Qiang Zhou , Shiyin Wang , Yitong Wang 2021

In this paper, we tackle the problem of human de-occlusion which reasons about occluded segmentation masks and invisible appearance content of humans. In particular, a two-stage framework is proposed to estimate the invisible portions and recover the content inside. For the stage of mask completion, a stacked network structure is devised to refine inaccurate masks from a general instance segmentation model and predict integrated masks simultaneously. Additionally, the guidance from human parsing and typical pose masks are leveraged to bring prior information. For the stage of content recovery, a novel parsing guided attention module is applied to isolate body parts and capture context information across multiple scales. Besides, an Amodal Human Perception dataset (AHP) is collected to settle the task of human de-occlusion. AHP has advantages of providing annotations from real-world scenes and the number of humans is comparatively larger than other amodal perception datasets. Based on this dataset, experiments demonstrate that our method performs over the state-of-the-art techniques in both tasks of mask completion and content recovery. Our AHP dataset is available at url{https://sydney0zq.github.io/ahp/}.

الرؤية الحاسوبية وتمييز الأنماط

Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

171 - Qiang Zhou , Chaohui Yu , Zhibin Wang 2021

Supervised learning based object detection frameworks demand plenty of laborious manual annotations, which may not be practical in real applications. Semi-supervised object detection (SSOD) can effectively leverage unlabeled data to improve the model performance, which is of great significance for the application of object detection models. In this paper, we revisit SSOD and propose Instant-Teaching, a completely end-to-end and effective SSOD framework, which uses instant pseudo labeling with extended weak-strong data augmentations for teaching during each training iteration. To alleviate the confirmation bias problem and improve the quality of pseudo annotations, we further propose a co-rectify scheme based on Instant-Teaching, denoted as Instant-Teaching$^*$. Extensive experiments on both MS-COCO and PASCAL VOC datasets substantiate the superiority of our framework. Specifically, our method surpasses state-of-the-art methods by 4.2 mAP on MS-COCO when using $2%$ labeled data. Even with full supervised information of MS-COCO, the proposed method still outperforms state-of-the-art methods by about 1.0 mAP. On PASCAL VOC, we can achieve more than 5 mAP improvement by applying VOC07 as labeled data and VOC12 as unlabeled data.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

High-speed silicon microring modulator at 2-um waveband

97 - Weihong Shen , Gangqiang Zhou , Jiangbing Du 2021

We demonstrated a silicon integrated microring modulator working at the 2-um waveband with an L-shaped PN junction. 15-GHz 3-dB electro-optic bandwidth and <1 Vcm modulation efficiency for 45-Gbps NRZ-OOK signaling is achieved at 1960 nm.

الفيزياء التطبيقية بصريات

Modeling Heterogeneous Relations across Multiple Modes for Potential Crowd Flow Prediction

89 - Qiang Zhou , Jingjing Gu , Xinjiang Lu 2021

Potential crowd flow prediction for new planned transportation sites is a fundamental task for urban planners and administrators. Intuitively, the potential crowd flow of the new coming site can be implied by exploring the nearby sites. However, the transportation modes of nearby sites (e.g. bus stations, bicycle stations) might be different from the target site (e.g. subway station), which results in severe data scarcity issues. To this end, we propose a data driven approach, named MOHER, to predict the potential crowd flow in a certain mode for a new planned site. Specifically, we first identify the neighbor regions of the target site by examining the geographical proximity as well as the urban function similarity. Then, to aggregate these heterogeneous relations, we devise a cross-mode relational GCN, a novel relation-specific transformation model, which can learn not only the correlations but also the differences between different transportation modes. Afterward, we design an aggregator for inductive potential flow representation. Finally, an LTSM module is used for sequential flow prediction. Extensive experiments on real-world data sets demonstrate the superiority of the MOHER framework compared with the state-of-the-art algorithms.

التعلم الآلي

CFC-Net: A Critical Feature Capturing Network for Arbitrary-Oriented Object Detection in Remote Sensing Images

98 - Qi Ming , Lingjuan Miao , Zhiqiang Zhou 2021

Object detection in optical remote sensing images is an important and challenging task. In recent years, the methods based on convolutional neural networks have made good progress. However, due to the large variation in object scale, aspect ratio, an d arbitrary orientation, the detection performance is difficult to be further improved. In this paper, we discuss the role of discriminative features in object detection, and then propose a Critical Feature Capturing Network (CFC-Net) to improve detection accuracy from three aspects: building powerful feature representation, refining preset anchors, and optimizing label assignment. Specifically, we first decouple the classification and regression features, and then construct robust critical features adapted to the respective tasks through the Polarization Attention Module (PAM). With the extracted discriminative regression features, the Rotation Anchor Refinement Module (R-ARM) performs localization refinement on preset horizontal anchors to obtain superior rotation anchors. Next, the Dynamic Anchor Learning (DAL) strategy is given to adaptively select high-quality anchors based on their ability to capture critical features. The proposed framework creates more powerful semantic representations for objects in remote sensing images and achieves high-performance real-time object detection. Experimental results on three remote sensing datasets including HRSC2016, DOTA, and UCAS-AOD show that our method achieves superior detection performance compared with many state-of-the-art approaches. Code and models are available at https://github.com/ming71/CFC-Net.

الرؤية الحاسوبية وتمييز الأنماط

Molecular CT: Unifying Geometry and Representation Learning for Molecules at Different Scales

93 - Jun Zhang , Yaqiang Zhou , Yao-Kun Lei 2020

Deep learning is changing many areas in molecular physics, and it has shown great potential to deliver new solutions to challenging molecular modeling problems. Along with this trend arises the increasing demand of expressive and versatile neural net work architectures which are compatible with molecular systems. A new deep neural network architecture, Molecular Configuration Transformer (Molecular CT), is introduced for this purpose. Molecular CT is composed of a relation-aware encoder module and a computationally universal geometry learning unit, thus able to account for the relational constraints between particles meanwhile scalable to different particle numbers and invariant w.r.t. the trans-rotational transforms. The computational efficiency and universality make Molecular CT versatile for a variety of molecular learning scenarios and especially appealing for transferable representation learning across different molecular systems. As examples, we show that Molecular CT enables representational learning for molecular systems at different scales, and achieves comparable or improved results on common benchmarks using a more light-weighted structure compared to baseline models.

التعلم الآلي مادة مكثفة ناعمة

Dynamic Anchor Learning for Arbitrary-Oriented Object Detection

249 - Qi Ming , Zhiqiang Zhou , Lingjuan Miao 2020

Arbitrary-oriented objects widely appear in natural scenes, aerial photographs, remote sensing images, etc., thus arbitrary-oriented object detection has received considerable attention. Many current rotation detectors use plenty of anchors with diff erent orientations to achieve spatial alignment with ground truth boxes, then Intersection-over-Union (IoU) is applied to sample the positive and negative candidates for training. However, we observe that the selected positive anchors cannot always ensure accurate detections after regression, while some negative samples can achieve accurate localization. It indicates that the quality assessment of anchors through IoU is not appropriate, and this further lead to inconsistency between classification confidence and localization accuracy. In this paper, we propose a dynamic anchor learning (DAL) method, which utilizes the newly defined matching degree to comprehensively evaluate the localization potential of the anchors and carry out a more efficient label assignment process. In this way, the detector can dynamically select high-quality anchors to achieve accurate object detection, and the divergence between classification and regression will be alleviated. With the newly introduced DAL, we achieve superior detection performance for arbitrary-oriented objects with only a few horizontal preset anchors. Experimental results on three remote sensing datasets HRSC2016, DOTA, UCAS-AOD as well as a scene text dataset ICDAR 2015 show that our method achieves substantial improvement compared with the baseline model. Besides, our approach is also universal for object detection using horizontal bound box. The code and models are available at https://github.com/ming71/DAL.

الرؤية الحاسوبية وتمييز الأنماط

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد