An important scenario for image quality assessment (IQA) is to evaluate image restoration (IR) algorithms. The state-of-the-art approaches adopt a full-reference paradigm that compares restored images with their corresponding pristine-quality images. However, pristine-quality images are usually unavailable in blind image restoration tasks and real-world scenarios. In this paper, we propose a practical solution named degraded-reference IQA (DR-IQA), which exploits the inputs of IR models, i.e., degraded images, as references. Specifically, we extract reference information from degraded images by distilling knowledge from pristine-quality images. The distillation is achieved by learning a reference space in which various degraded images are encouraged to share the same feature statistics with pristine-quality images, and the reference space is optimized to capture deep image priors that are useful for quality assessment. Note that pristine-quality images are used only during training. Our work provides a powerful and differentiable metric for blind IR methods, especially GAN-based ones. Extensive experiments show that our results can even approach the performance of full-reference settings.
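A minimal PyTorch sketch of the feature-statistics matching idea above: degraded and pristine images are mapped into a shared reference space, and their per-channel feature statistics are pulled together. The encoder architecture and all names (RefSpaceEncoder, stats_matching_loss) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RefSpaceEncoder(nn.Module):
    """Maps an image into the learned reference space (hypothetical architecture)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

def feature_stats(f):
    # Per-channel mean and std over spatial dimensions.
    return f.mean(dim=(2, 3)), f.std(dim=(2, 3))

def stats_matching_loss(f_degraded, f_pristine):
    # Encourage degraded-image features to share first- and second-order
    # statistics with pristine-image features in the reference space.
    mu_d, sig_d = feature_stats(f_degraded)
    mu_p, sig_p = feature_stats(f_pristine)
    return (mu_d - mu_p).pow(2).mean() + (sig_d - sig_p).pow(2).mean()

encoder = RefSpaceEncoder()
degraded = torch.rand(4, 3, 64, 64)
pristine = torch.rand(4, 3, 64, 64)   # available only at training time
loss = stats_matching_loss(encoder(degraded), encoder(pristine))
loss.backward()
```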
Multiview detection incorporates multiple camera views to deal with occlusions, and its central problem is multiview aggregation. Given feature map projections from multiple views onto a common ground plane, the state-of-the-art method addresses this problem via convolution, which applies the same calculation regardless of object locations. However, such translation-invariant behavior might not be the best choice, as object features undergo various projection distortions according to their positions and cameras. In this paper, we propose a novel multiview detector, MVDeTr, that adopts a newly introduced shadow transformer to aggregate multiview information. Unlike convolutions, the shadow transformer attends differently at different positions and cameras to deal with various shadow-like distortions. We also propose an effective training scheme that includes a new view-coherent data augmentation method, which applies random augmentations while maintaining multiview consistency. On two multiview detection benchmarks, we report new state-of-the-art accuracy with the proposed system. Code is available at https://github.com/hou-yz/MVDeTr.
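The sketch below illustrates, in simplified form, how attention with learned per-camera and per-location embeddings can aggregate ground-plane projections differently at each position, unlike a translation-invariant convolution. It is an illustrative stand-in for the shadow transformer, not the MVDeTr code; the class and parameter names are assumptions.

```python
import torch
import torch.nn as nn

class PositionAwareAggregator(nn.Module):
    def __init__(self, num_views, channels, h, w):
        super().__init__()
        # Learned embeddings let attention differ per camera and per location,
        # unlike a convolution, which applies the same kernel everywhere.
        self.view_embed = nn.Parameter(torch.zeros(num_views, channels))
        self.pos_embed = nn.Parameter(torch.zeros(h * w, channels))
        self.attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)

    def forward(self, feats):
        # feats: (B, V, C, H, W) feature maps projected onto the ground plane.
        b, v, c, h, w = feats.shape
        tokens = feats.flatten(3).permute(0, 3, 1, 2)       # (B, HW, V, C)
        tokens = tokens + self.view_embed                   # camera-dependent
        tokens = tokens + self.pos_embed[None, :, None, :]  # location-dependent
        tokens = tokens.reshape(b * h * w, v, c)
        out, _ = self.attn(tokens, tokens, tokens)          # attend across views
        out = out.mean(dim=1).reshape(b, h, w, c).permute(0, 3, 1, 2)
        return out                                          # (B, C, H, W)

agg = PositionAwareAggregator(num_views=4, channels=64, h=16, w=16)
fused = agg(torch.rand(2, 4, 64, 16, 16))
```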
Sampling-based motion planning algorithms such as RRT* are well-known for their ability to quickly find an initial solution and then converge to the optimal solution asymptotically. However, the convergence rate can be slow for high-dimensional planning problems, particularly for dynamical systems where the sampling space is not just the configuration space but the full state space. In this paper, we introduce the idea of using a partial-final-state-free (PFF) optimal controller in kinodynamic RRT* [1] to reduce the dimensionality of the sampling space. Instead of sampling the full state space, the proposed accelerated kinodynamic RRT*, called Kino-RRT*, only samples part of the state space, while the rest of the states are selected by the PFF optimal controller. We also propose a delayed and intermittent update of the optimal arrival time of all the edges in the RRT* tree to decrease the computational complexity of the algorithm. We tested the proposed algorithm using 4-D and 10-D state-space linear systems and showed that Kino-RRT* converges much faster than the kinodynamic RRT* algorithm.
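The following is a structural sketch of the reduced-dimension sampling loop on a toy 4-D double integrator, where only positions are sampled and the remaining (velocity) states are filled in by a controller. The pff_optimal_steer function is a crude placeholder for the PFF optimal controller of [1], not the paper's derivation.

```python
import numpy as np

def pff_optimal_steer(x_from, pos_to):
    """Steer from full state x_from toward a sampled position pos_to.

    Returns the full arrival state, with the free (velocity) components
    chosen by the controller; here a simple heuristic fills them in.
    """
    vel = pos_to - x_from[:2]            # placeholder choice of free states
    x_to = np.concatenate([pos_to, vel])
    cost = np.linalg.norm(x_to - x_from)
    return x_to, cost

rng = np.random.default_rng(0)
tree = [np.zeros(4)]                     # root state [px, py, vx, vy]
for _ in range(100):
    pos_sample = rng.uniform(-10, 10, size=2)  # sample only the position subspace
    nearest = min(tree, key=lambda x: np.linalg.norm(x[:2] - pos_sample))
    x_new, cost = pff_optimal_steer(nearest, pos_sample)
    tree.append(x_new)                   # (rewiring and arrival-time updates omitted)
```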
Understanding classifier decisions under novel environments is central to the community, and a common practice is to evaluate them on labeled test sets. However, in real-world testing, image annotations are difficult and expensive to obtain, especially when the test environment is changing. A natural question then arises: given a trained classifier, can we evaluate its accuracy on varying unlabeled test sets? In this work, we train semantic classification and rotation prediction in a multi-task way. On a series of datasets, we report an interesting finding: the semantic classification accuracy exhibits a strong linear relationship with the accuracy of the rotation prediction task (Pearson's correlation r > 0.88). This finding allows us to utilize linear regression to estimate classifier performance from the accuracy of rotation prediction, which can be obtained on the test set through freely generated rotation labels.
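A minimal sketch of the implied estimation procedure: fit a linear regressor on datasets where both accuracies can be measured, then estimate semantic accuracy on an unlabeled set from its (measurable) rotation accuracy. All numbers below are illustrative, not results from the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Accuracies measured on a series of labeled datasets (illustrative values).
rotation_acc = np.array([0.62, 0.70, 0.75, 0.81, 0.88]).reshape(-1, 1)
semantic_acc = np.array([0.48, 0.59, 0.66, 0.74, 0.85])

reg = LinearRegression().fit(rotation_acc, semantic_acc)

# On a new unlabeled test set, rotation labels can be generated for free, so
# rotation accuracy is measurable; semantic accuracy is then estimated.
rot_acc_unlabeled = np.array([[0.78]])
estimated = reg.predict(rot_acc_unlabeled)[0]
print(f"estimated classification accuracy: {estimated:.3f}")
```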
A new belief space planning algorithm, called covariance steering Belief RoadMap (CS-BRM), is introduced, which is a multi-query algorithm for motion planning of dynamical systems under simultaneous motion and observation uncertainties. CS-BRM extends the probabilistic roadmap (PRM) approach to belief spaces and is based on the recently developed theory of covariance steering (CS) that enables guaranteed satisfaction of terminal belief constraints in finite time. The nodes in the CS-BRM are sampled in belief space and represent distributions of the system states. A covariance steering controller steers the system from one BRM node to another, thus acting as an edge controller of the corresponding belief graph that ensures belief constraint satisfaction. After the edge controller is computed, a specific edge cost is assigned to that edge. The CS-BRM algorithm allows the sampling of non-stationary belief nodes, and thus is able to explore the velocity space and find efficient motion plans. The performance of CS-BRM is evaluated and compared to a previous belief space planning method, demonstrating the benefits of the proposed approach.
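A structural sketch of the roadmap construction: belief nodes carry a mean and covariance, and each candidate edge is assigned a cost once its edge controller is computed. The cs_edge function below is a placeholder; the actual covariance steering controller synthesis follows the CS theory cited above and is not reproduced here.

```python
import numpy as np

class BeliefNode:
    def __init__(self, mean, cov):
        self.mean = mean    # may include velocities (non-stationary nodes)
        self.cov = cov

def cs_edge(node_a, node_b):
    """Placeholder for the covariance steering edge controller.

    In CS-BRM this step synthesizes a finite-time controller that guarantees
    the terminal belief matches node_b; here we only return a toy cost.
    """
    return np.linalg.norm(node_b.mean - node_a.mean)

rng = np.random.default_rng(1)
nodes = [BeliefNode(rng.uniform(-5, 5, size=4), 0.1 * np.eye(4))
         for _ in range(20)]

edges = {}
for i, a in enumerate(nodes):
    for j, b in enumerate(nodes):
        if i != j and np.linalg.norm(a.mean[:2] - b.mean[:2]) < 3.0:
            edges[(i, j)] = cs_edge(a, b)  # edge cost assigned after controller
# Standard graph search (e.g., Dijkstra) over `edges` yields multi-query plans.
```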
The striking resemblance of high multiplicity proton-proton (pp) collisions at the LHC to heavy ion collisions challenges our conventional wisdom on the formation of the Quark-Gluon Plasma (QGP). A consistent explanation of the collectivity phenomena in pp collisions will help us understand the mechanism that leads to the QGP-like signals in small systems. In this study, we introduce a transport model approach that connects the initial conditions provided by PYTHIA8 with subsequent AMPT rescatterings to study the collective behavior in high energy pp collisions. The multiplicity dependence of light hadron production from this model is in reasonable agreement with the pp $\sqrt{s}=13$ TeV experimental data. The comparisons show that both the partonic and hadronic final state interactions are important for generating the radial flow feature of the pp transverse momentum spectra. The study also shows that the long range two-particle azimuthal correlation in high multiplicity pp events is sensitive to the proton sub-nucleon spatial fluctuations.
Visual and audio signals often coexist in natural environments, forming audio-visual events (AVEs). Given a video, we aim to localize video segments containing an AVE and identify its category. In order to learn discriminative features for a classifier, it is pivotal to identify the helpful (or positive) audio-visual segment pairs while filtering out the irrelevant ones, regardless of whether they are synchronized or not. To this end, we propose a new positive sample propagation (PSP) module to discover and exploit closely related audio-visual pairs by evaluating the relationship within every possible pair. This is done by constructing an all-pair similarity map between each audio and visual segment, and aggregating features only from the pairs with high similarity scores. To encourage the network to extract highly correlated features for positive samples, a new audio-visual pair similarity loss is proposed. We also propose a new weighting branch to better exploit the temporal correlations in the weakly supervised setting. We perform extensive experiments on the public AVE dataset and achieve new state-of-the-art accuracy in both fully and weakly supervised settings, thus verifying the effectiveness of our method.
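A minimal sketch of the all-pair similarity construction and positive-pair aggregation described above. The fixed threshold tau and the residual-style feature update are illustrative choices, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def psp_aggregate(audio, visual, tau=0.5):
    # audio: (T, D) segment features; visual: (T, D) segment features.
    a = F.normalize(audio, dim=1)
    v = F.normalize(visual, dim=1)
    sim = a @ v.t()                        # (T, T) all-pair similarity map
    mask = (sim > tau).float()             # keep only likely-positive pairs
    weights = sim * mask
    weights = weights / weights.sum(dim=1, keepdim=True).clamp(min=1e-6)
    audio_enhanced = audio + weights @ visual       # propagate from positives
    visual_enhanced = visual + weights.t() @ audio
    return audio_enhanced, visual_enhanced

a, v = torch.rand(10, 128), torch.rand(10, 128)
a2, v2 = psp_aggregate(a, v)
```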
We extensively study the system size dependence of nuclear collisions with a multi-phase transport model. Previously, certain key parameters for the initial condition needed significantly different values for $pp$ and central $AA$ collisions for the model to reasonably describe the yields and transverse momentum spectra of the bulk matter in those collision systems. Here we scale two key parameters, the Lund string fragmentation parameter $b_L$ and the minijet transverse momentum cutoff $p_0$, with local nuclear thickness functions from the two colliding nuclei. This allows the model to use the parameter values for $pp$ collisions with the local nuclear scaling to describe the system size and centrality dependences of nuclear collisions self-consistently. In addition to providing good descriptions of $pp$ collisions from 23.6 GeV to 13 TeV and reasonable descriptions of the centrality dependence of charged particle yields for Au+Au collisions from $7.7A$ GeV to $200A$ GeV and Pb+Pb collisions at LHC energies, the improved model can now well describe the centrality dependence of the mean transverse momentum of charged particles for $p_{\rm T} \lesssim 2$ GeV. It works similarly well for smaller systems including $p$Pb, Cu+Cu and Xe+Xe collisions.
Quantum memory is the core device for the construction of large-scale quantum networks. For scalable and convenient practical applications, integrated optical memories, especially on-chip optical memories, are crucial because they can be easily integrated with other on-chip devices. Here, we report coherent optical memory based on a type-IV waveguide fabricated on the surface of a rare-earth ion-doped crystal (i.e., $\mathrm{Eu^{3+}}$:$\mathrm{Y_2SiO_5}$). The properties of the optical transition ($\mathrm{{}^7F_0 \rightarrow {}^5D_0}$) of the $\mathrm{Eu^{3+}}$ ions inside the surface waveguide are well preserved compared to those of the bulk crystal. Spin-wave atomic frequency comb storage is demonstrated inside the type-IV waveguide. The reliability of this device is confirmed by the high interference visibility of $97\pm 1\%$ between the retrieval pulse and the reference pulse. The developed on-chip optical memory paves the way towards integrated quantum nodes.
This paper proposes a self-supervised learning method for the person re-identification (re-ID) problem, where existing unsupervised methods usually rely on pseudo labels, such as those from video tracklets or clustering. A potential drawback of using pseudo labels is that errors may accumulate and it is challenging to estimate the number of pseudo IDs. We introduce a different unsupervised method that allows us to learn pedestrian embeddings from raw videos, without resorting to pseudo labels. The goal is to construct a self-supervised pretext task that matches the person re-ID objective. Inspired by the \emph{data association} concept in multi-object tracking, we propose the \textbf{Cyc}le \textbf{As}sociation (\textbf{CycAs}) task: after performing data association between a pair of video frames forward and then backward, a pedestrian instance is supposed to be associated with itself. To fulfill this goal, the model must learn a meaningful representation that can well describe correspondences between instances in frame pairs. We adapt the discrete association process to a differentiable form, such that end-to-end training becomes feasible. Experiments are conducted in two aspects: We first compare our method with existing unsupervised re-ID methods on seven benchmarks and demonstrate the superiority of CycAs. Then, to further validate the practical value of CycAs in real-world applications, we perform training on self-collected videos and report promising performance on standard test sets.
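A minimal sketch of a differentiable cycle association objective: soft forward and backward assignments between two frames are composed, and the result is pushed toward the identity. The row-softmax relaxation shown here is one plausible realization under assumed names; the paper's exact adaptation may differ.

```python
import torch
import torch.nn.functional as F

def cycle_association_loss(emb1, emb2, temperature=0.1):
    # emb1: (N, D) instance embeddings in frame 1; emb2: (N, D) in frame 2.
    e1 = F.normalize(emb1, dim=1)
    e2 = F.normalize(emb2, dim=1)
    affinity = e1 @ e2.t() / temperature
    forward = affinity.softmax(dim=1)        # soft assignment frame1 -> frame2
    backward = affinity.t().softmax(dim=1)   # soft assignment frame2 -> frame1
    cycle = forward @ backward               # (N, N): should be near identity
    target = torch.eye(emb1.size(0))
    return F.mse_loss(cycle, target)

emb1 = torch.rand(8, 256, requires_grad=True)
emb2 = torch.rand(8, 256, requires_grad=True)
loss = cycle_association_loss(emb1, emb2)
loss.backward()
```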