Research papers, master's theses, and doctoral dissertations published by Kai Wan

Harnessing Perceptual Adversarial Patches for Crowd Counting

112 - Shunchang Liu, Jiakai Wang, Aishan Liu 2021

Crowd counting, which is significantly important for estimating the number of people in safety-critical scenes, has been shown to be vulnerable to adversarial examples in the physical world (e.g., adversarial patches). Though harmful, adversarial examples are also valuable for assessing and better understanding model robustness. However, existing adversarial example generation methods in crowd counting scenarios lack strong transferability among different black-box models. Motivated by the fact that transferability is positively correlated to the model-invariant characteristics, this paper proposes the Perceptual Adversarial Patch (PAP) generation framework to learn the shared perceptual features between models by exploiting both the model scale perception and position perception. Specifically, PAP exploits differentiable interpolation and density attention to help learn the invariance between models during training, leading to better transferability. In addition, we surprisingly found that our adversarial patches could also be utilized to benefit the performance of vanilla models for alleviating several challenges including cross datasets and complex backgrounds. Extensive experiments under both digital and physical world scenarios demonstrate the effectiveness of our PAP.

Computer Vision and Pattern Recognition

R-PCC: A Baseline for Range Image-based Point Cloud Compression

82 - Sukai Wang, Jianhao Jiao, Peide Cai 2021

In autonomous vehicles or robots, point clouds from LiDAR can provide accurate depth information of objects compared with 2D images, but they also suffer from a large volume of data, which is inconvenient for data storage or transmission. In this paper, we propose a Range image-based Point Cloud Compression method, R-PCC, which can reconstruct the point cloud with uniform or non-uniform accuracy loss. We segment the original large-scale point cloud into small and compact regions for spatial redundancy and salient region classification. Compared with other voxel-based or image-based compression methods, our method can keep and align all points from the original point cloud in the reconstructed point cloud. It can also control the maximum reconstruction error for each point through a quantization module. In the experiments, we show that our simpler FPS-based segmentation method can achieve better performance than instance-based segmentation methods such as DBSCAN. To verify the advantages of our proposed method, we evaluate the reconstruction quality and fidelity for 3D object detection and SLAM, as the downstream tasks. The experimental results show that our elegant framework can achieve a $30\times$ compression ratio without affecting downstream tasks, and our non-uniform compression framework shows a great improvement on the downstream tasks compared with the state-of-the-art large-scale point cloud compression methods. Our real-time method is efficient and effective enough to act as a baseline for range image-based point cloud compression. The code is available on https://github.com/StevenWang30/R-PCC.git.

Robotics
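
The R-PCC abstract above says a quantization module bounds the maximum reconstruction error of each point. The sketch below is not taken from the R-PCC repository; it only illustrates the general idea under the assumption of plain uniform scalar quantization of range values. The function names, the synthetic ranges, and the 0.02 m bound are all hypothetical.

```python
import random

def quantize(values, max_error):
    """Map each value to an integer code using a step of 2 * max_error."""
    step = 2.0 * max_error
    return [round(v / step) for v in values]

def dequantize(codes, max_error):
    """Reconstruct values from integer codes with the same step."""
    step = 2.0 * max_error
    return [c * step for c in codes]

random.seed(0)
# Synthetic range readings in metres (illustrative only, not real LiDAR data).
ranges = [random.uniform(0.5, 80.0) for _ in range(10_000)]
max_error = 0.02  # hypothetical per-point error bound in metres

codes = quantize(ranges, max_error)
recon = dequantize(codes, max_error)
worst = max(abs(r - v) for r, v in zip(recon, ranges))

# Rounding to the nearest multiple of the step keeps every error within half a step.
print(worst <= max_error + 1e-9)
```

Because rounding moves a value by at most half a step, choosing the step as twice the error bound guarantees the bound for every point. A non-uniform variant could, under the same assumption, pass a different `max_error` for each segmented region.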

The Low-Energy Spectral Index of Gamma-Ray Burst Prompt Emission from Internal Shocks

78 - Kai Wang, Zi-Gao Dai 2021

The prompt emission of most gamma-ray bursts (GRBs) typically exhibits a non-thermal Band component. The synchrotron radiation in the popular internal shock model is generally put forward to explain such a non-thermal component. However, the low-energy photon index $\alpha \sim -1.5$ predicted by the synchrotron radiation is inconsistent with the observed value $\alpha \sim -1$. Here, we investigate the evolution of a magnetic field during propagation of internal shocks within an ultrarelativistic outflow, and revisit the fast cooling of shock-accelerated electrons via synchrotron radiation for this evolving magnetic field. We find that the magnetic field is first nearly constant and then decays as $B \propto t^{-1}$, which leads to a reasonable range of the low-energy photon index, $-3/2 < \alpha < -2/3$. In addition, if a rising electron injection rate during a GRB is introduced, we find that $\alpha$ reaches $-2/3$ more easily. We thus fit the prompt emission spectra of GRB 080916c and GRB 080825c.

High Energy Astrophysical Phenomena
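
The two endpoints of the range quoted in the abstract above, $-3/2$ and $-2/3$, coincide with the standard textbook synchrotron limits. The short reminder below is not the paper's derivation; it assumes the usual convention that the photon number spectrum is $N(E) \propto E^{\alpha}$, and the symbols $F_\nu$, $\nu_c$, and $\nu_m$ (flux density, cooling frequency, injection frequency) do not appear in the abstract.

$$
\begin{aligned}
N(E) \propto E^{\alpha} \;&\Rightarrow\; F_\nu \propto \nu^{\alpha+1},\\
\text{fast cooling, } \nu_c < \nu < \nu_m:\quad F_\nu \propto \nu^{-1/2} \;&\Rightarrow\; \alpha = -\tfrac{3}{2},\\
\text{below the lowest break frequency}:\quad F_\nu \propto \nu^{1/3} \;&\Rightarrow\; \alpha = -\tfrac{2}{3}.
\end{aligned}
$$

Under these conventions, the observed value $\alpha \sim -1$ falls between the two limits.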

On the Optimal Load-Memory Tradeoff of Coded Caching for Location-Based Content

97 - Kai Wan, Minquan Cheng, Mari Kobayashi 2021

Caching at the wireless edge nodes is a promising way to boost the spatial and spectral efficiency, for the sake of alleviating networks from content-related traffic. Coded caching, originally introduced by Maddah-Ali and Niesen, significantly improves communication efficiency by transmitting multicast messages simultaneously useful to multiple users. Most prior works on coded caching are based on the assumption that each user may request all content in the library. However, in many applications the users are interested only in a limited set of content items that depends on their location. For example, visitors in a museum may stream audio and video related to the artworks in the room they are visiting, or assisted self-driving vehicles may access super-high definition maps of the area through which they are travelling. Motivated by these considerations, this paper formulates the coded caching problem for location-based content with edge cache nodes. The considered problem includes a content server with access to N location-based files, K edge cache nodes located at different regions, and K users each of which is in the serving region of one cache node and can retrieve the cached content of this cache node with negligible cost. Depending on the location, each user only requests a file from a location-dependent subset of the library. The objective is to minimize the worst-case load transmitted from the content server among all possible demands. We propose a highly non-trivial converse bound under uncoded cache placement, which shows that a simple achievable scheme is optimal. In addition, this achievable scheme is generally order optimal within 3. Finally, we extend the coded caching problem for location-based content to the multiaccess coded caching topology, where each user is connected to L nearest cache nodes. When $L \geq 2$ we characterize the exact optimality on the worst-case load.

Information Theory

Graphine: A Dataset for Graph-aware Terminology Definition Generation

82 - Zequn Liu, Shukai Wang, Yiyang Gu 2021

Precisely defining the terminology is the first step in scientific communication. Developing neural text generation models for definition generation can circumvent the labor-intensive curation, further accelerating scientific discovery. Unfortunately, the lack of a large-scale terminology definition dataset hinders the progress toward definition generation. In this paper, we present a large-scale terminology definition dataset Graphine covering 2,010,648 terminology definition pairs, spanning 227 biomedical subdisciplines. Terminologies in each subdiscipline further form a directed acyclic graph, opening up new avenues for developing graph-aware text generation models. We then propose a novel graph-aware definition generation model Graphex that integrates transformer with graph neural network. Our model outperforms existing text generation models by exploiting the graph structure of terminologies. We further demonstrate how Graphine can be used to evaluate pretrained language models, compare graph representation learning methods and predict sentence granularity. We envision Graphine to be a unique resource for definition generation and many other NLP tasks in biomedicine.

Computation and Language

Passive Mechanical Realizations of Bicubic Impedances with No More Than Five Elements for Inerter-Based Control Design with the Supplementary Material

212 - Kai Wang, Michael Z. Q. Chen 2021

This report includes the original manuscript (pp. 2-40) and the supplementary material (pp. 41-48) of Passive Mechanical Realizations of Bicubic Impedances with No More Than Five Elements for Inerter-Based Control Design.

Optimization and Control

Novel Frameworks for Coded Caching via Cartesian Product with Reduced Subpacketization

118 - Jinyu Wang, Minquan Cheng, Kai Wan 2021

Caching prefetches some library content at users' memories during the off-peak times (i.e., placement phase), such that the number of transmissions during the peak-traffic times (i.e., delivery phase) is reduced. A coded caching strategy was originally proposed by Maddah-Ali and Niesen (MN) leading to a multicasting gain compared to the conventional uncoded caching, where each message in the delivery phase is useful to multiple users simultaneously. However, the MN coded caching scheme suffers from high subpacketization, which makes it impractical. In order to reduce the subpacketization while retaining the multicast opportunities in the delivery phase, Yan et al. proposed a combinatorial structure called placement delivery array (PDA) to design coded caching schemes. In this paper, we propose two novel frameworks for constructing PDA via Cartesian product, which constructs a PDA for $mK_1$ users by the $m$-fold Cartesian product of a PDA for $K_1$ users. By applying the proposed frameworks to some existing PDAs, three novel caching schemes are obtained which can significantly reduce the subpacketization of the MN scheme while slightly increasing the needed number of transmissions. For instance, for the third scheme which works for any number of users and any memory regime, while reducing the coded caching gain by one, the needed subpacketization is at most $O\left(\sqrt{\frac{K}{q}}2^{-\frac{K}{q}}\right)$ of that of the MN scheme, where $K$ is the number of users, $0<z/q<1$ is the memory ratio of each user, and $q,z$ are coprime.

Information Theory
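
To make the subpacketization problem named in the abstract above concrete, the sketch below evaluates the well-known closed forms for the original MN scheme. It does not implement the paper's Cartesian-product frameworks or any PDA. It assumes the standard notation `t = K * M / N` (an integer), which equals `K * z / q` in the abstract's notation; the function name `mn_scheme` is hypothetical.

```python
from fractions import Fraction
from math import comb

def mn_scheme(K, t):
    """Closed forms for the MN scheme with K users and integer t = K*M/N.

    Returns (subpacketization, load, coded caching gain).
    """
    if not 0 <= t <= K:
        raise ValueError("t must satisfy 0 <= t <= K")
    subpacketization = comb(K, t)   # each file is split into C(K, t) packets
    load = Fraction(K - t, t + 1)   # transmissions, normalized by file size
    gain = t + 1                    # users served by each multicast message
    return subpacketization, load, gain

# Memory ratio fixed at 1/2; the packet count grows exponentially with K.
for K in (10, 20, 40):
    F, R, g = mn_scheme(K, K // 2)
    print(f"K={K:2d}  subpacketization={F}  load={R}  gain={g}")
```

At a fixed memory ratio, the number of packets per file grows exponentially in `K`, which is why schemes that give up a little coded caching gain for a much smaller packet count are of practical interest.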

Elastic Tactile Simulation Towards Tactile-Visual Perception

128 - Yikai Wang, Wenbing Huang, Bin Fang 2021

Tactile sensing plays an important role in robotic perception and manipulation tasks. To overcome the real-world limitations of data collection, simulating tactile response in a virtual environment comes as a desirable direction of robotic research. In this paper, we propose Elastic Interaction of Particles (EIP) for tactile simulation. Most existing works model the tactile sensor as a rigid multi-body, which is incapable of reflecting the elastic property of the tactile sensor as well as characterizing the fine-grained physical interaction between the two objects. By contrast, EIP models the tactile sensor as a group of coordinated particles, and the elastic property is applied to regulate the deformation of particles during contact. With the tactile simulation by EIP, we further propose a tactile-visual perception network that enables information fusion between tactile data and visual images. The perception network is based on a global-to-local fusion mechanism where multi-scale tactile features are aggregated to the corresponding local region of the visual modality with the guidance of tactile positions and directions. The fusion method exhibits superiority regarding the 3D geometric reconstruction task.

Robotics; Computer Vision and Pattern Recognition

Learning Deep Multimodal Feature Representation with Asymmetric Multi-layer Fusion

88 - Yikai Wang, Fuchun Sun, Ming Lu 2021

We propose a compact and effective framework to fuse multimodal features at multiple layers in a single network. The framework consists of two innovative fusion schemes. Firstly, unlike existing multimodal methods that necessitate individual encoders for different modalities, we verify that multimodal features can be learnt within a shared single network by merely maintaining modality-specific batch normalization layers in the encoder, which also enables implicit fusion via joint feature representation learning. Secondly, we propose a bidirectional multi-layer fusion scheme, where multimodal features can be exploited progressively. To take advantage of such scheme, we introduce two asymmetric fusion operations including channel shuffle and pixel shift, which learn different fused features with respect to different fusion directions. These two operations are parameter-free and strengthen the multimodal feature interactions across channels as well as enhance the spatial feature discrimination within channels. We conduct extensive experiments on semantic segmentation and image translation tasks, based on three publicly available datasets covering diverse modalities. Results indicate that our proposed framework is general, compact and is superior to state-of-the-art fusion frameworks.

Computer Vision and Pattern Recognition

Physics-informed generative neural network: an application to troposphere temperature prediction

192 - Zhihao Chen, Jie Gao, Weikai Wang 2021

The troposphere is one of the atmospheric layers where most weather phenomena occur. Temperature variations in the troposphere, especially at 500 hPa, a typical level of the middle troposphere, are significant indicators of future weather changes. Numerical weather prediction is effective for temperature prediction, but its computational complexity hinders a timely response. This paper proposes a novel temperature prediction approach in the framework of physics-informed deep learning. The new model, called PGnet, builds upon a generative neural network with a mask matrix. The mask is designed to distinguish the low-quality predicted regions generated by the first physical stage. The generative neural network takes the mask as prior for the second-stage refined predictions. A mask-loss and a jump pattern strategy are developed to train the generative neural network without accumulating errors when making time-series predictions. Experiments on ERA5 demonstrate that PGnet can generate more refined temperature predictions than the state-of-the-art.

Machine Learning; Atmospheric and Oceanic Physics
