أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Yu Ding

DPGen: Automated Program Synthesis for Differential Privacy

199 - Yuxin Wang , Zeyu Ding , Yingtai Xiao 2021

Differential privacy has become a de facto standard for releasing data in a privacy-preserving way. Creating a differentially private algorithm is a process that often starts with a noise-free (non-private) algorithm. The designer then decides where to add noise, and how much of it to add. This can be a non-trivial process -- if not done carefully, the algorithm might either violate differential privacy or have low utility. In this paper, we present DPGen, a program synthesizer that takes in non-private code (without any noise) and automatically synthesizes its differentially private version (with carefully calibrated noise). Under the hood, DPGen uses novel algorithms to automatically generate a sketch program with candidate locations for noise, and then optimize privacy proof and noise scales simultaneously on the sketch program. Moreover, DPGen can synthesize sophisticated mechanisms that adaptively process queries until a specified privacy budget is exhausted. When evaluated on standard benchmarks, DPGen is able to generate differentially private mechanisms that optimize simple utility functions within 120 seconds. It is also powerful enough to synthesize adaptive privacy mechanisms.

التشفير والأمن لغات البرمجة

AugLimb: Compact Robotic Limb for Human Augmentation

79 - Zeyu Ding , Shogo Yoshida , Toby Chong 2021

This work proposes a compact robotic limb, AugLimb, that can augment our body functions and support the daily activities. AugLimb adopts the double-layer scissor unit for the extendable mechanism which can achieve 2.5 times longer than the forearm le ngth. The proposed device can be mounted on the users upper arm, and transform into compact state without obstruction to wearers. The proposed device is lightweight with low burden exerted on the wearer. We developed the prototype of AugLimb to demonstrate the proposed mechanisms. We believe that the design methodology of AugLimb can facilitate human augmentation research for practical use. see http://www.jaist.ac.jp/~xie/auglimb.html

تفاعل الإنسان والحاسوب علم الروبوتات

Information retrieval and eigenstates coalescence in a non-Hermitian quantum system with anti-$mathcal{PT}$ symmetry

124 - Liangyu Ding , Kaiye Shi , Yuxin Wang 2021

Non-Hermitian systems with parity-time reversal ($mathcal{PT}$) or anti-$mathcal{PT}$ symmetry have attracted a wide range of interest owing to their unique characteristics and counterintuitive phenomena. One of the most extraordinary features is the presence of an exception point (EP), across which a phase transition with spontaneously broken $mathcal{PT}$ symmetry takes place. We implement a Floquet Hamiltonian of a single qubit with anti-$mathcal{PT}$ symmetry by periodically driving a dissipative quantum system of a single trapped ion. With stroboscopic emission and quantum state tomography, we obtain the time evolution of density matrix for an arbitrary initial state, and directly demonstrate information retrieval, eigenstates coalescence, and topological energy spectra as unique features of non-Hermitian systems.

فيزياء الكم غازات الكم

Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion

108 - Suzhen Wang , Lincheng Li , Yu Ding 2021

We propose an audio-driven talking-head method to generate photo-realistic talking-head videos from a single reference image. In this work, we tackle two key challenges: (i) producing natural head motions that match speech prosody, and (ii) maintaini ng the appearance of a speaker in a large head motion while stabilizing the non-face regions. We first design a head pose predictor by modeling rigid 6D head movements with a motion-aware recurrent neural network (RNN). In this way, the predicted head poses act as the low-frequency holistic movements of a talking head, thus allowing our latter network to focus on detailed facial movement generation. To depict the entire image motions arising from audio, we exploit a keypoint based dense motion field representation. Then, we develop a motion field generator to produce the dense motion fields from input audio, head poses, and a reference image. As this keypoint based representation models the motions of facial regions, head, and backgrounds integrally, our method can better constrain the spatial and temporal consistency of the generated videos. Finally, an image generation network is employed to render photo-realistic talking-head videos from the estimated keypoint based motion fields and the input reference image. Extensive experiments demonstrate that our method produces videos with plausible head motions, synchronized facial expressions, and stable backgrounds and outperforms the state-of-the-art.

الرؤية الحاسوبية وتمييز الأنماط الحساب واللغة

Only Train Once: A One-Shot Neural Network Training And Pruning Framework

70 - Tianyi Chen , Bo Ji , Tianyu Ding 2021

Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices. However, the existing pruning methods are usually heuristic, task-specified, and require an extra fine-tuning procedure. To ov ercome these limitations, we propose a framework that compresses DNNs into slimmer architectures with competitive performances and significant FLOPs reductions by Only-Train-Once (OTO). OTO contains two keys: (i) we partition the parameters of DNNs into zero-invariant groups, enabling us to prune zero groups without affecting the output; and (ii) to promote zero groups, we then formulate a structured-sparsity optimization problem and propose a novel optimization algorithm, Half-Space Stochastic Projected Gradient (HSPG), to solve it, which outperforms the standard proximal methods on group sparsity exploration and maintains comparable convergence. To demonstrate the effectiveness of OTO, we train and compress full models simultaneously from scratch without fine-tuning for inference speedup and parameter reduction, and achieve state-of-the-art results on VGG16 for CIFAR10, ResNet50 for CIFAR10/ImageNet and Bert for SQuAD.

التعلم الآلي

HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers

370 - Mingyu Ding , Xiaochen Lian , Linjie Yang 2021

High-resolution representations (HR) are essential for dense prediction tasks such as segmentation, detection, and pose estimation. Learning HR representations is typically ignored in previous Neural Architecture Search (NAS) methods that focus on im age classification. This work proposes a novel NAS method, called HR-NAS, which is able to find efficient and accurate networks for different tasks, by effectively encoding multiscale contextual information while maintaining high-resolution representations. In HR-NAS, we renovate the NAS search space as well as its searching strategy. To better encode multiscale image contexts in the search space of HR-NAS, we first carefully design a lightweight transformer, whose computational complexity can be dynamically changed with respect to different objective functions and computation budgets. To maintain high-resolution representations of the learned networks, HR-NAS adopts a multi-branch architecture that provides convolutional encoding of multiple feature resolutions, inspired by HRNet. Last, we proposed an efficient fine-grained search strategy to train HR-NAS, which effectively explores the search space, and finds optimal architectures given various tasks and computation resources. HR-NAS is capable of achieving state-of-the-art trade-offs between performance and FLOPs for three dense prediction tasks and an image classification task, given only small computational budgets. For example, HR-NAS surpasses SqueezeNAS that is specially designed for semantic segmentation while improving efficiency by 45.9%. Code is available at https://github.com/dingmyu/HR-NAS

الرؤية الحاسوبية وتمييز الأنماط

The Permute-and-Flip Mechanism is Identical to Report-Noisy-Max with Exponential Noise

76 - Zeyu Ding , Daniel Kifer , Sayed M. Saghaian N. E. 2021

The permute-and-flip mechanism is a recently proposed differentially private selection algorithm that was shown to outperform the exponential mechanism. In this paper, we show that permute-and-flip is equivalent to the well-known report noisy max algorithm with exponential noise.

التشفير والأمن

PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds

75 - Mutian Xu , Runyu Ding , Hengshuang Zhao 2021

We introduce Position Adaptive Convolution (PAConv), a generic convolution operation for 3D point cloud processing. The key of PAConv is to construct the convolution kernel by dynamically assembling basic weight matrices stored in Weight Bank, where the coefficients of these weight matrices are self-adaptively learned from point positions through ScoreNet. In this way, the kernel is built in a data-driven manner, endowing PAConv with more flexibility than 2D convolutions to better handle the irregular and unordered point cloud data. Besides, the complexity of the learning process is reduced by combining weight matrices instead of brutally predicting kernels from point positions. Furthermore, different from the existing point convolution operators whose network architectures are often heavily engineered, we integrate our PAConv into classical MLP-based point cloud pipelines without changing network configurations. Even built on simple networks, our method still approaches or even surpasses the state-of-the-art models, and significantly improves baseline performance on both classification and segmentation tasks, yet with decent efficiency. Thorough ablation studies and visualizations are provided to understand PAConv. Code is released on https://github.com/CVMI-Lab/PAConv.

الرؤية الحاسوبية وتمييز الأنماط

Learning Versatile Neural Architectures by Propagating Network Codes

56 - Mingyu Ding , Yuqi Huo , Haoyu Lu 2021

This work explores how to design a single neural network that is capable of adapting to multiple heterogeneous tasks of computer vision, such as image segmentation, 3D detection, and video recognition. This goal is challenging because network archite cture designs in different tasks are inconsistent. We solve this challenge by proposing Network Coding Propagation (NCP), a novel neural predictor, which is able to predict an architectures performance in multiple datasets and tasks. Unlike prior arts of neural architecture search (NAS) that typically focus on a single task, NCP has several unique benefits. (1) NCP can be trained on different NAS benchmarks, such as NAS-Bench-201 and NAS-Bench-MR, which contains a novel network space designed by us for jointly searching an architecture among multiple tasks, including ImageNet, Cityscapes, KITTI, and HMDB51. (2) NCP learns from network codes but not original data, enabling it to update the architecture efficiently across datasets. (3) Extensive experiments evaluate NCP on object classification, detection, segmentation, and video recognition. For example, with 17% fewer FLOPs, a single architecture returned by NCP achieves 86% and 77.16% on ImageNet-50-1000 and Cityscapes respectively, outperforming its counterparts. More interestingly, NCP enables a single architecture applicable to both image segmentation and video recognition, which achieves competitive performance on both HMDB51 and ADE20K compared to the singular counterparts. Code is available at https://github.com/dingmyu/NCP}{https://github.com/dingmyu/NCP.

الرؤية الحاسوبية وتمييز الأنماط

ERNIE-Doc: A Retrospective Long-Document Modeling Transformer

126 - Siyu Ding , Junyuan Shang , Shuohuan Wang 2020

Transformers are not suited for processing long documents, due to their quadratically increasing memory and time consumption. Simply truncating a long document or applying the sparse attention mechanism will incur the context fragmentation problem or lead to an inferior modeling capability against comparable model sizes. In this paper, we propose ERNIE-Doc, a document-level language pretraining model based on Recurrence Transformers. Two well-designed techniques, namely the retrospective feed mechanism and the enhanced recurrence mechanism, enable ERNIE-Doc, which has a much longer effective context length, to capture the contextual information of a complete document. We pretrain ERNIE-Doc to explicitly learn the relationships among segments with an additional document-aware segment-reordering objective. Various experiments were conducted on both English and Chinese document-level tasks. ERNIE-Doc improved the state-of-the-art language modeling result of perplexity to 16.8 on WikiText-103. Moreover, it outperformed competitive pretraining models by a large margin on most language understanding tasks, such as text classification and question answering.

الحساب واللغة

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد