أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل An Liu

Harnessing Perceptual Adversarial Patches for Crowd Counting

112 - Shunchang Liu , Jiakai Wang , Aishan Liu 2021

Crowd counting, which is significantly important for estimating the number of people in safety-critical scenes, has been shown to be vulnerable to adversarial examples in the physical world (e.g., adversarial patches). Though harmful, adversarial exa mples are also valuable for assessing and better understanding model robustness. However, existing adversarial example generation methods in crowd counting scenarios lack strong transferability among different black-box models. Motivated by the fact that transferability is positively correlated to the model-invariant characteristics, this paper proposes the Perceptual Adversarial Patch (PAP) generation framework to learn the shared perceptual features between models by exploiting both the model scale perception and position perception. Specifically, PAP exploits differentiable interpolation and density attention to help learn the invariance between models during training, leading to better transferability. In addition, we surprisingly found that our adversarial patches could also be utilized to benefit the performance of vanilla models for alleviating several challenges including cross datasets and complex backgrounds. Extensive experiments under both digital and physical world scenarios demonstrate the effectiveness of our PAP.

الرؤية الحاسوبية وتمييز الأنماط

Multi View Spatial-Temporal Model for Travel Time Estimation

215 - ZiChuan Liu , Zhaoyang Wu , Meng Wang 2021

Taxi arrival time prediction is an essential part of building intelligent transportation systems. Traditional arrival time estimation methods mainly rely on traffic map feature extraction, which can not model complex situations and nonlinear spatial and temporal relationships. Therefore, we propose a Multi-View Spatial-Temporal Model (MVSTM) to capture the dependence of spatial-temporal and trajectory. Specifically, we use graph2vec to model the spatial view, dual-channel temporal module to model the trajectory view, and structural embedding to model the traffic semantics. Experiments on large-scale taxi trajectory data show that our approach is more effective than the novel method. The source code can be obtained from https://github.com/775269512/SIGSPATIAL-2021-GISCUP-4th-Solution.

التعلم الآلي أجهزة الكمبيوتر والمجتمع

A Unified Framework for Biphasic Facial Age Translation with Noisy-Semantic Guided Generative Adversarial Networks

306 - Muyi Sun , Jian Wang , Yunfan Liu 2021

Biphasic facial age translation aims at predicting the appearance of the input face at any age. Facial age translation has received considerable research attention in the last decade due to its practical value in cross-age face recognition and variou s entertainment applications. However, most existing methods model age changes between holistic images, regardless of the human face structure and the age-changing patterns of individual facial components. Consequently, the lack of semantic supervision will cause infidelity of generated faces in detail. To this end, we propose a unified framework for biphasic facial age translation with noisy-semantic guided generative adversarial networks. Structurally, we project the class-aware noisy semantic layouts to soft latent maps for the following injection operation on the individual facial parts. In particular, we introduce two sub-networks, ProjectionNet and ConstraintNet. ProjectionNet introduces the low-level structural semantic information with noise map and produces soft latent maps. ConstraintNet disentangles the high-level spatial features to constrain the soft latent maps, which endows more age-related context into the soft latent maps. Specifically, attention mechanism is employed in ConstraintNet for feature disentanglement. Meanwhile, in order to mine the strongest mapping ability of the network, we embed two types of learning strategies in the training procedure, supervised self-driven generation and unsupervised condition-driven cycle-consistent generation. As a result, extensive experiments conducted on MORPH and CACD datasets demonstrate the prominent ability of our proposed method which achieves state-of-the-art performance.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

Efficient and Probabilistic Adaptive VoxelnMapping for Accurate Online3D SLAM

90 - Chongjian Yuan , Wei xu , Xiyuan Liu 2021

This paper proposes an efficient and probabilistic adaptive voxel mapping method for 3D SLAM. An accurate uncertainty model of point and plane is proposed for probabilistic plane representation. We analyze the need for coarse-to-fine voxel mapping an d then use a novel voxel map organized by a Hash table and octrees to build and update the map efficiently. We apply the voxel map to the iterated Kalman filter and construct the maximum posterior probability problem for pose estimation. The experiments on the open KITTI dataset show the high accuracy and efficiency of our method in contrast with other state-of-the-art. Outdoor experiments on unstructured environments with non-repetitive scanning LiDAR further verify the adaptability of our mapping method to different environments and LiDAR scanning patterns.

علم الروبوتات

The Promise of Dataflow Architectures in the Design of Processing Systems for Autonomous Machines

424 - Shaoshan Liu , Yuhao Zhu , Bo Yu 2021

The commercialization of autonomous machines is a thriving sector, and likely to be the next major computing demand driver, after PC, cloud computing, and mobile computing. Nevertheless, a suitable computer architecture for autonomous machines is mis sing, and many companies are forced to develop ad hoc computing solutions that are neither scalable nor extensible. In this article, we analyze the demands of autonomous machine computing, and argue for the promise of dataflow architectures in autonomous machines.

هندسة العتاد الذكاء الاصطناعي علم الروبوتات

Design Guidelines for Prompt Engineering Text-to-Image Generative Models

62 - Vivian Liu , Lydia B. Chilton 2021

Text-to-image generative models are a new and powerful way to generate visual artwork. The free-form nature of text as interaction is double-edged; while users have access to an infinite range of generations, they also must engage in brute-force tria l and error with the text prompt when the result quality is poor. We conduct a study exploring what prompt components and model parameters can help produce coherent outputs. In particular, we study prompts structured to include subject and style and investigate success and failure modes within these dimensions. Our evaluation of 5493 generations over the course of five experiments spans 49 abstract and concrete subjects as well as 51 abstract and figurative styles. From this evaluation, we present design guidelines that can help people find better outcomes from text-to-image generative models.

تفاعل الإنسان والحاسوب

Non-autoregressive Transformer with Unified Bidirectional Decoder for Automatic Speech Recognition

139 - Chuan-Fei Zhang , Yan Liu , Tian-Hao Zhang 2021

Non-autoregressive (NAR) transformer models have been studied intensively in automatic speech recognition (ASR), and a substantial part of NAR transformer models is to use the casual mask to limit token dependencies. However, the casual mask is desig ned for the left-to-right decoding process of the non-parallel autoregressive (AR) transformer, which is inappropriate for the parallel NAR transformer since it ignores the right-to-left contexts. Some models are proposed to utilize right-to-left contexts with an extra decoder, but these methods increase the model complexity. To tackle the above problems, we propose a new non-autoregressive transformer with a unified bidirectional decoder (NAT-UBD), which can simultaneously utilize left-to-right and right-to-left contexts. However, direct use of bidirectional contexts will cause information leakage, which means the decoder output can be affected by the character information from the input of the same position. To avoid information leakage, we propose a novel attention mask and modify vanilla queries, keys, and values matrices for NAT-UBD. Experimental results verify that NAT-UBD can achieve character error rates (CERs) of 5.0%/5.5% on the Aishell1 dev/test sets, outperforming all previous NAR transformer models. Moreover, NAT-UBD can run 49.8x faster than the AR transformer baseline when decoding in a single step.

الحساب واللغة أنظمة الصوت في الحاسوب معالجة الصوت والكلام

Fast and Accurate Extrinsic Calibration for Multiple LiDARs and Cameras

95 - Xiyuan Liu , Chongjian Yuan , Fu Zhang 2021

In this letter, we propose a fast, accurate, and targetless extrinsic calibration method for multiple LiDARs and cameras based on adaptive voxelization. On the theory level, we incorporate the LiDAR extrinsic calibration with the bundle adjustment me thod. We derive the second-order derivatives of the cost function w.r.t. the extrinsic parameter to accelerate the optimization. On the implementation level, we apply the adaptive voxelization to dynamically segment the LiDAR point cloud into voxels with non-identical sizes, and reduce the computation time in the process of feature correspondence matching. The robustness and accuracy of our proposed method have been verified with experiments in outdoor test scenes under multiple LiDAR-camera configurations.

علم الروبوتات

Sensor Adversarial Traits: Analyzing Robustness of 3D Object Detection Sensor Fusion Models

80 - Won Park , Nan Liu , Qi Alfred Chen 2021

A critical aspect of autonomous vehicles (AVs) is the object detection stage, which is increasingly being performed with sensor fusion models: multimodal 3D object detection models which utilize both 2D RGB image data and 3D data from a LIDAR sensor as inputs. In this work, we perform the first study to analyze the robustness of a high-performance, open source sensor fusion model architecture towards adversarial attacks and challenge the popular belief that the use of additional sensors automatically mitigate the risk of adversarial attacks. We find that despite the use of a LIDAR sensor, the model is vulnerable to our purposefully crafted image-based adversarial attacks including disappearance, universal patch, and spoofing. After identifying the underlying reason, we explore some potential defenses and provide some recommendations for improved sensor fusion models.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Cross-Modality Domain Adaptation for Vestibular Schwannoma and Cochlea Segmentation

188 - Han Liu , Yubo Fan , Can Cui 2021

Automatic methods to segment the vestibular schwannoma (VS) tumors and the cochlea from magnetic resonance imaging (MRI) are critical to VS treatment planning. Although supervised methods have achieved satisfactory performance in VS segmentation, the y require full annotations by experts, which is laborious and time-consuming. In this work, we aim to tackle the VS and cochlea segmentation problem in an unsupervised domain adaptation setting. Our proposed method leverages both the image-level domain alignment to minimize the domain divergence and semi-supervised training to further boost the performance. Furthermore, we propose to fuse the labels predicted from multiple models via noisy label correction. Our results on the challenge validation leaderboard showed that our unsupervised method has achieved promising VS and cochlea segmentation performance with mean dice score of 0.8261 $pm$ 0.0416; The mean dice value for the tumor is 0.8302 $pm$ 0.0772. This is comparable to the weakly-supervised based method.

معالجة الصور والفيديو الرؤية الحاسوبية وتمييز الأنماط

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد