
Video super-resolution (VSR), which aims to restore a high-resolution video from its low-resolution counterpart, is a spatial-temporal sequence prediction problem. Recently, the Transformer has gained popularity due to its parallel computing ability for sequence-to-sequence modeling, so it seems straightforward to apply the vision Transformer to VSR. However, the typical Transformer block design, with a fully connected self-attention layer and a token-wise feed-forward layer, does not fit VSR well for two reasons. First, the fully connected self-attention layer fails to exploit data locality because it relies on linear layers to compute attention maps. Second, the token-wise feed-forward layer lacks the feature alignment that is important for VSR, since it processes each input token embedding independently without any interaction among them. In this paper, we make the first attempt to adapt the Transformer for VSR. Specifically, to tackle the first issue, we present a spatial-temporal convolutional self-attention layer, with a theoretical analysis, that exploits locality information. For the second issue, we design a bidirectional optical flow-based feed-forward layer that discovers correlations across different video frames and aligns features. Extensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed method. The code will be available at https://github.com/caojiezhang/VSR-Transformer.
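As a rough illustration of the first idea, the sketch below (PyTorch, with illustrative names; not the authors' implementation) replaces the linear Q/K/V projections of self-attention with 3x3 convolutions, so attention maps are computed from locally aggregated features. The paper's full layer also spans the temporal dimension across frames, which is omitted here for brevity.

```python
import torch
import torch.nn as nn

class ConvSelfAttention(nn.Module):
    """Illustrative self-attention whose Q/K/V projections use 3x3
    convolutions instead of linear layers, so each query/key already
    carries local context (assumed simplification of the paper's
    spatial-temporal convolutional self-attention)."""
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Conv2d(dim, dim, 3, padding=1)
        self.to_k = nn.Conv2d(dim, dim, 3, padding=1)
        self.to_v = nn.Conv2d(dim, dim, 3, padding=1)
        self.scale = dim ** -0.5

    def forward(self, x):  # x: (B, C, H, W), features of one frame
        b, c, h, w = x.shape
        q = self.to_q(x).flatten(2).transpose(1, 2)  # (B, HW, C)
        k = self.to_k(x).flatten(2)                  # (B, C, HW)
        v = self.to_v(x).flatten(2).transpose(1, 2)  # (B, HW, C)
        attn = (q @ k * self.scale).softmax(dim=-1)  # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out

layer = ConvSelfAttention(dim=64)
out = layer(torch.randn(2, 64, 16, 16))  # -> (2, 64, 16, 16)
```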
We study how to introduce locality mechanisms into vision transformers. The transformer network originates from machine translation and is particularly good at modelling long-range dependencies within a long sequence. Although the global interaction between token embeddings is well modelled by the self-attention mechanism of transformers, what is lacking is a locality mechanism for information exchange within a local region. Yet locality is essential for images, since it pertains to structures like lines, edges, shapes, and even objects. We add locality to vision transformers by introducing depth-wise convolution into the feed-forward network. This seemingly simple solution is inspired by a comparison between feed-forward networks and inverted residual blocks. The importance of locality mechanisms is validated in two ways: 1) a wide range of design choices (activation function, layer placement, expansion ratio) is available for incorporating locality mechanisms, and all proper choices lead to a performance gain over the baseline; and 2) the same locality mechanism is successfully applied to four vision transformers, which shows the generality of the locality concept. In particular, for ImageNet2012 classification, the locality-enhanced transformers outperform the baselines DeiT-T and PVT-T by 2.6% and 3.1%, respectively, with a negligible increase in the number of parameters and computational effort. Code is available at https://github.com/ofsoundof/LocalViT.
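A minimal sketch of the described locality mechanism, assuming PyTorch with illustrative names and a hypothetical expansion ratio of 4: tokens are reshaped onto their 2-D grid, passed through an inverted-residual-style feed-forward block whose depth-wise 3x3 convolution exchanges information within a local region, and flattened back.

```python
import torch
import torch.nn as nn

class LocalityFeedForward(nn.Module):
    """Feed-forward block with a depth-wise 3x3 convolution between
    the two point-wise projections, mirroring an inverted residual
    block. Names and the expansion ratio are illustrative assumptions,
    not the released LocalViT code."""
    def __init__(self, dim, expansion=4):
        super().__init__()
        hidden = dim * expansion
        self.net = nn.Sequential(
            nn.Conv2d(dim, hidden, 1),   # point-wise expansion
            nn.GELU(),
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden),    # depth-wise conv: the locality mechanism
            nn.GELU(),
            nn.Conv2d(hidden, dim, 1),   # point-wise projection
        )

    def forward(self, x, h, w):
        # x: (B, N, C) token embeddings with N = h * w; reshape to a
        # 2-D grid so the depth-wise convolution sees spatial neighbors.
        b, n, c = x.shape
        y = x.transpose(1, 2).reshape(b, c, h, w)
        y = self.net(y)
        return y.flatten(2).transpose(1, 2)  # back to (B, N, C)

ffn = LocalityFeedForward(dim=64)
out = ffn(torch.randn(2, 14 * 14, 64), h=14, w=14)  # -> (2, 196, 64)
```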
Yawei Li, He Chen, Zhaopeng Cui (2021)
In this paper, we aim to improve the computational efficiency of graph convolutional networks (GCNs) for learning on point clouds. We examine the basic graph convolution, which is typically composed of a $K$-nearest neighbor (KNN) search and a multilayer perceptron (MLP). By mathematically analyzing these operations, we obtain two findings for improving the efficiency of GCNs. (1) The local geometric structure information of 3D representations propagates smoothly across a GCN that relies on KNN search to gather neighborhood features; this motivates simplifying the multiple KNN searches in GCNs. (2) Shuffling the order of graph feature gathering and an MLP leads to equivalent or similar composite operations. Based on these findings, we optimize the computational procedure in GCNs. A series of experiments shows that the optimized networks have reduced computational complexity, decreased memory consumption, and accelerated inference speed while maintaining comparable accuracy for learning on point clouds. Code will be available at https://github.com/ofsoundof/EfficientGCN.git.
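Finding (2) can be illustrated with a toy sketch: a shared point-wise MLP acts on each point independently, so it commutes with neighbor gathering, and applying it before the gather processes N vectors instead of N*K. The shapes and names below are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

def gather_neighbors(feats, knn_idx):
    # feats: (N, C), knn_idx: (N, K) -> gathered features (N, K, C)
    return feats[knn_idx]

N, K, C = 1024, 16, 64
mlp = nn.Linear(C, C)                    # shared point-wise MLP
feats = torch.randn(N, C)
knn_idx = torch.randint(0, N, (N, K))    # stand-in for a KNN search

slow = mlp(gather_neighbors(feats, knn_idx))  # MLP applied to N*K vectors
fast = gather_neighbors(mlp(feats), knn_idx)  # MLP on N vectors, then gather
print(torch.allclose(slow, fast, atol=1e-6))  # True: equivalent composite op
```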
This paper reviews the AIM 2020 challenge on efficient single image super-resolution, with a focus on the proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor of x4, based on a set of prior examples of low- and corresponding high-resolution images. The goal was to devise a network that reduces one or several aspects such as runtime, parameter count, FLOPs, activations, and memory consumption while at least maintaining the PSNR of MSRResNet. The track had 150 registered participants, and 25 teams submitted final results, which gauge the state of the art in efficient single image super-resolution.
Avalanche phenomena leverage steeply nonlinear dynamics to generate disproportionately high responses from small perturbations and are found in a multitude of events and materials, enabling technologies including optical phase-conjugate imaging, infrared quantum counting, and efficient upconverted lasing. However, the photon avalanching (PA) mechanism underlying these optical innovations has been observed only in bulk materials and aggregates, and typically at cryogenic temperatures, limiting its utility and impact. Here, we report the realization of PA at room temperature in single nanostructures (small, Tm-doped upconverting nanocrystals) and demonstrate their use in superresolution imaging at near-infrared (NIR) wavelengths within spectral windows of maximal biological transparency. Avalanching nanoparticles (ANPs) can be pumped by continuous-wave or pulsed lasers and exhibit all of the defining features of PA. These hallmarks include excitation power thresholds, long rise time at threshold, and a dominant excited-state absorption that is >13,000x larger than ground-state absorption. Beyond the avalanching threshold, ANP emission scales nonlinearly with the 26th power of pump intensity. This enables photon-avalanche single-beam superresolution imaging (PASSI), achieving sub-70 nm spatial resolution using only simple scanning confocal microscopy and before any computational analysis. Pairing their steep nonlinearity with existing superresolution techniques and computational methods, ANPs allow for imaging with higher resolution and at ca. 100-fold lower excitation intensities than is possible with other probes. The low PA threshold and exceptional photostability of ANPs also suggest their utility in a diverse array of applications including sub-wavelength bioimaging, IR detection, temperature and pressure transduction, neuromorphic computing, and quantum optics.
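As a back-of-the-envelope note (a standard Gaussian-PSF argument, not taken from the paper's text): if emission scales as the s-th power of excitation intensity, the effective point-spread function is the excitation PSF raised to the s-th power, which narrows a Gaussian by a factor of the square root of s:

```latex
I_{\mathrm{em}}(r) \propto \left[I_{\mathrm{exc}}(r)\right]^{s},
\qquad
\exp\!\left(-\frac{r^{2}}{2\sigma^{2}}\right)
\;\longrightarrow\;
\exp\!\left(-\frac{s\,r^{2}}{2\sigma^{2}}\right)
\;\Rightarrow\;
\sigma_{\mathrm{eff}} = \frac{\sigma}{\sqrt{s}}
```

With s = 26 as reported here, the narrowing factor is about 5.1, so a diffraction-limited NIR confocal spot of a few hundred nanometers shrinks to roughly the sub-70 nm scale quoted for PASSI.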